GANs4RE – Artificial SCADA dataset for benchmarking anomaly detection approaches
Anomaly detection methods are often used in monitoring wind power plants to recognize unusual plant behavior early on the basis of SCADA data and thus enable optimal operation. However, no uniform benchmark for such methods has yet been established in the field of renewable energies. Most open datasets with labeled errors and anomalies focus on detecting attacks in network data, which differs greatly from SCADA data. GANs4RE is designed to address this problem by providing an artificial dataset and a model for generating such datasets using generative adversarial networks (GANs).
The project is of interest to:
Model developers, providers of monitoring technology
Preparing existing sensor data of wind power plants to clearly label normal and abnormal behavior
Developing a generative model that generates realistic sensor data of wind power plants during normal and abnormal operation
Evaluating the models by using an existing anomaly detection method that is trained on generated data and applied on real data
Publishing the results in a paper
GANs, neural networks, autoencoder
What are GANs? GANs (generative adversarial networks) are a class of machine learning frameworks consisting of two neural networks. During training, the first network (the generator) learns to generate new data that resembles the real data, while the second network (the discriminator) tries to distinguish between generated and real data. Both networks are trained simultaneously: the generator produces increasingly realistic data, and the discriminator becomes increasingly good at telling synthetic data from real data. After the training phase, the generator can be used to generate new data.
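The alternating training described above can be illustrated with a deliberately tiny example. The sketch below is not the GANs4RE model: it assumes a one-dimensional "real" signal, a linear generator G(z) = a*z + b and a logistic discriminator D(x) = sigmoid(w*x + c), with gradients written out by hand. All names and parameter values are hypothetical and chosen only for illustration. A discriminator this simple can only react to differences in the mean, and GAN training is not guaranteed to converge, so the output should be read as a demonstration of the training loop rather than of realistic data generation.

```python
import numpy as np

def sigmoid(s):
    # Clip the logit so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(s, -50.0, 50.0)))

def train_toy_gan(real_mean=4.0, real_std=0.5, steps=500, batch=64, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on 1-D toy data."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        # Discriminator step: minimise -log D(x_real) - log(1 - D(x_fake)).
        x_real = rng.normal(real_mean, real_std, batch)
        x_fake = a * rng.normal(size=batch) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = np.mean((d_real - 1.0) * x_real) + np.mean(d_fake * x_fake)
        grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimise -log D(G(z)) (non-saturating loss).
        z = rng.normal(size=batch)
        d_fake = sigmoid(w * (a * z + b) + c)
        grad_a = np.mean((d_fake - 1.0) * w * z)
        grad_b = np.mean((d_fake - 1.0) * w)
        a -= lr * grad_a
        b -= lr * grad_b
    return (a, b), (w, c)

def generate(gen_params, n, seed=1):
    """After training, only the generator is needed to draw new samples."""
    a, b = gen_params
    rng = np.random.default_rng(seed)
    return a * rng.normal(size=n) + b

gen_params, disc_params = train_toy_gan()
samples = generate(gen_params, 1000)
print("generated mean/std:", samples.mean(), samples.std())
```

For multivariate SCADA time series, both networks would be neural networks trained with an automatic differentiation framework; the loop structure (one discriminator update, one generator update) stays the same.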
Training a GAN makes it possible to provide an anomaly-free dataset, which is then supplemented with artificially generated anomalies and errors modeled on known errors. For this purpose, a second ML model, trained on anomalous data, is to be used to overwrite selected time periods in the normal dataset. This yields a dataset with clearly labeled errors and anomalies that can be used for benchmarking and publications. By using various real datasets for training, artificial benchmark datasets can be created for different systems at different locations. The project initially focuses on creating a benchmark dataset for wind power plants. After successful evaluation, however, the method can also be applied in other areas. Making the method available, for example as a Python library, could establish a procedure for creating reference datasets that would facilitate the development of anomaly detection methods in the field of renewable energies.
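The overwriting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function `inject_anomalies`, the window list, and the `placeholder_anomaly_model` (which simply draws shifted noise in place of a trained anomaly generator) are all hypothetical. The point it shows is that the labels are produced together with the data, so every anomalous time step is known exactly.

```python
import numpy as np

def inject_anomalies(normal, anomaly_model, windows):
    """Overwrite selected time windows of a normal dataset with generated
    anomalies and return the data together with per-time-step labels.

    normal        : array of shape (time_steps, n_features)
    anomaly_model : callable (length, n_features) -> array of that shape
    windows       : list of (start, end) index pairs, end exclusive
    """
    data = np.array(normal, dtype=float, copy=True)  # leave the input untouched
    labels = np.zeros(len(data), dtype=int)          # 0 = normal, 1 = anomaly
    for start, end in windows:
        data[start:end] = anomaly_model(end - start, data.shape[1])
        labels[start:end] = 1
    return data, labels

rng = np.random.default_rng(0)

# Placeholder for the anomaly-free dataset produced by the first generator.
normal = rng.normal(0.0, 1.0, size=(200, 3))

def placeholder_anomaly_model(length, n_features):
    # Hypothetical stand-in for a model trained on anomalous data.
    return rng.normal(5.0, 0.1, size=(length, n_features))

windows = [(50, 60), (120, 140)]
benchmark, labels = inject_anomalies(normal, placeholder_anomaly_model, windows)
print("anomalous time steps:", labels.sum(), "of", len(labels))
```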
The project aims to evaluate the use of GANs for generating artificial datasets for anomaly detection in wind power plants. To evaluate the artificial datasets, the autoencoder-based anomaly detection methods from the ModernWindABS and ADWENTURE projects are to be used. The quality of an artificially generated dataset is checked by training the autoencoder on it and then applying the trained model to a real dataset. After successful evaluation, a paper is planned for publication at ECML PKDD 2022.
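The evaluation idea (train on generated data, test on real data) can be sketched with reconstruction errors. The example below makes several assumptions that are not part of the project description: a PCA-based linear autoencoder stands in for the neural autoencoder, both the "synthetic" and the "real" data are simulated placeholders, and the 99% quantile threshold is an arbitrary illustrative choice. If the generated data is realistic, a detector trained on it should flag the known fault window in the real data while raising few false alarms elsewhere.

```python
import numpy as np

class LinearAutoencoder:
    """PCA-based linear autoencoder, used here as a simple stand-in."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        # Rows of vt are the principal directions of the centred data.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def reconstruction_error(self, X):
        centered = X - self.mean_
        # Encode (project) and decode (back-project) in one step.
        reconstructed = centered @ self.components_.T @ self.components_
        return np.mean((centered - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)
loading = np.array([1.0, 2.0, -1.0])  # hypothetical coupling of three sensors

def simulate(n):
    # Three sensor channels driven by one latent factor plus small noise.
    latent = rng.normal(size=(n, 1))
    return latent * loading + rng.normal(0.0, 0.05, size=(n, 3))

synthetic_train = simulate(1000)  # placeholder for GAN-generated normal data
real_test = simulate(300)         # placeholder for real SCADA data
real_test[100:120, 1] += 5.0      # known fault: one sensor drifts away
true_labels = np.zeros(300, dtype=int)
true_labels[100:120] = 1

model = LinearAutoencoder(n_components=1).fit(synthetic_train)
threshold = np.quantile(model.reconstruction_error(synthetic_train), 0.99)
flags = model.reconstruction_error(real_test) > threshold

recall = flags[true_labels == 1].mean()
false_alarm_rate = flags[true_labels == 0].mean()
print("recall:", recall, "false alarm rate:", false_alarm_rate)
```

Comparing these scores with those of the same detector trained directly on real normal data would indicate how much is lost by training on the artificial dataset.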
Literature search on the current state of the art for generating multivariate time series using GANs and selecting suitable methods for SCADA data
Selecting and preparing training datasets with clearly labeled data (normal or anomaly)
Developing and evaluating the models for generating time series that represent the normal and abnormal behavior of a wind power plant
Describing and publishing the results in a paper
Project participants: Mira Jürgens, Edison Guava, Florian Rehwald, René Heinrich, Christian Gück, Dr. Christoph Scholz, Cyriana Roelofs
A publication is planned at ECML PKDD 2022.