
PARADISE (PARtition based Anomaly Detection in multivariate tIme Series)

This repository contains the implementation of PARADISE, a new methodology for anomaly detection in multivariate time series based on the partitioning of the variables.

This repository contains a few important directories and files:

  • algorithms_params.json: Contains the hyperparameter values of the algorithms we used.
  • algorithms: Contains the code of the algorithms you want to use. Implementations found in TimeEval are a good place to start. More information on how to add new algorithms can be found later in this README.
  • config: Contains some example configurations used in the original paper.
  • output: Contains all the generated data and the results of each algorithm you have run.
  • paradise: Contains the implementation of the methodology.
  • scripts: Contains a few scripts used to generate batches of configurations or to run the code on Slurm-capable computing clusters.

Installation

To use this program, we recommend using Poetry, as it simplifies dependency installation. If you do not want to use Poetry, you can find all the required libraries and their respective versions in the pyproject.toml file.

First, install all the required dependencies using the following command:

poetry install

Once the dependencies are installed, you can run the program as described in the Usage section below.

Usage

Our program is designed to perform all the steps required to test our method.

Data Generation

The first thing to do is to generate synthetic data. To do so, we tell the program which configuration to use and where to write the generated data.

poetry run python paradise/main.py generate -c <config_path> -o <output_dir>

Training

Then, we tell the program which data configuration(s) to train on, which algorithm(s) to train, and where to find the data. The -s option tells the program to use the data partitioning step; it can be omitted if you only want to train the algorithm on non-partitioned data.

poetry run python paradise/main.py train -c <config_path> -a <algorithm_name> -i <input_dir> [-s]

Result Extraction

Last but not least, we want to extract the results. To do so, we tell the program where to find them.

poetry run python paradise/main.py all -i <input_dir>

All in one

If you want the program to do everything at once, you can use the command below. It performs all the tasks discussed above.

poetry run python paradise/main.py all -c <config_path> -a <algorithm_name> -o <output_dir> [-s]

For more information on the possible arguments, please refer to the output of the following command:

poetry run python paradise/main.py --help

Algorithms to test

As stated earlier, the algorithm implementations we used are taken from TimeEval. Most of these implementations are usable without much modification. Some of them require a few fixes to work with the library versions we use, but these should not take long to apply.

Because our pipeline is based on those implementations, we expect a few things if you want to test your own algorithm. First, the CSV format your model accepts should contain the timestamp as the first column and the label as the last, for both train and test data. Second, it should use the same call interface as the TimeEval algorithms. Third and last, your algorithm should output its results in a file readable with numpy's loadtxt method.
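As an illustration of these conventions, here is a minimal sketch in Python. The file and column names are invented for the example; only the column order (timestamp first, label last) and the loadtxt-readable score file reflect the requirements above.

```python
# Sketch of the expected data layout: timestamp first, feature columns
# in between, label last. File and column names here are illustrative.
import csv
import numpy as np

rows = [
    ("timestamp", "value_0", "value_1", "is_anomaly"),
    (0, 0.42, 1.30, 0),
    (1, 0.45, 1.28, 0),
    (2, 3.10, 7.90, 1),  # anomalous point
]
with open("train.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

# An algorithm following the third convention would write one anomaly
# score per line, so the pipeline can read the file back with loadtxt.
np.savetxt("scores.txt", np.array([0.01, 0.02, 0.97]))
scores = np.loadtxt("scores.txt")
print(scores.shape)  # (3,)
```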

If your algorithm follows those principles, it should be easy to run it as part of our pipeline. The last thing you need to do is to add an entry for your algorithm in the file called algorithm_params.json with the proper hyperparameters and their values.
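As a sketch only, such an entry could look like the fragment below; the algorithm name and parameter keys are invented here, so copy the structure of an existing entry in the file rather than this one.

```json
{
  "my_algorithm": {
    "window_size": 100,
    "random_state": 42
  }
}
```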