Commit 2c2224a8 authored by Pierre LOTTE

A few modifications

parent 2d7c775a
@@ -12,7 +12,7 @@ This repository contains a few important directories and files:
## Installation
To use this program, you can use poetry.
To use this program, we recommend using poetry, as it makes installing the required libraries easy. If you do not want to use poetry, you can find all the required libraries and their respective versions in the `pyproject.toml` file.
First, install all the required dependencies using the following command:
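```bash
# Standard poetry command; assumes poetry is already installed on your machine.
poetry install
```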
@@ -22,24 +22,51 @@ poetry install
Then, to run the program you can run:
## Usage
Our program is designed to perform all the steps required to test our method.
### Data Generation
The first thing to do is to generate synthetic data. To do so, we tell the program which configuration to use and where to write the generated data.
```bash
poetry run python paradise/main.py <args>
poetry run python paradise/main.py generate -c <config_path> -o <output_dir>
```
## Usage
### Training
The main use of our program is to generate some data, test the PARADISE methodology on it and extract the results. All of that can be done using a single command:
Then, we tell the program which data configuration(s) to train our algorithm on, which algorithm(s) to train, and where to find the data. The `-s` option tells the program to use the data partitioning step. It can be omitted if you only want to train the algorithm on non-partitioned data.
```bash
poetry run python paradise/main.py all -c <config_path> -a <algorithm_name> -o <output_dir> -s
poetry run python paradise/main.py train -c <config_path> -a <algorithm_name> -i <input_dir> [-s]
```
In this command we do a few different things. First, we tell the program to perform the data generation, training and result extraction steps one after another. Then, we tell it where to find the configuration file, which algorithm to use (the algorithm name is the name of the directory containing the algorithm implementation) and where to put the results. Finally, we tell it to perform the partitioning step. The splitting argument can be omitted if the partitioning has already been performed and you want to save some time.
### Result Extraction
Last but not least, we want to extract the results. To do so, we need to tell the program where to find them.
```bash
poetry run python paradise/main.py all -i <input_dir>
```
> For more information on the possible arguments and their values, please refer to the following command's output:
### All in one
If you want the program to do everything at once, you can use the command shown just below. It will perform all the tasks discussed above.
```bash
poetry run python paradise/main.py all -c <config_path> -a <algorithm_name> -o <output_dir> [-s]
```
> For more information on the possible arguments, please refer to the following command's output:
> ```bash
> poetry run python paradise/main.py --help
> ```
## Algorithms to test
\ No newline at end of file
## Algorithms to test
As stated earlier, the algorithm implementations we used are taken from [TimeEval](https://github.com/TimeEval/TimeEval-algorithms). Many of these implementations are usable without much modification. Some of them require a few fixes to work with the library versions we use, but these should not take much time.
Because we build on those implementations, we expect a few things if you want to test your own algorithm. First, the CSV format your model accepts should contain the timestamp as the first column and the label as the last column, for both train and test data. Second, it should use the same call interface; for more information on this interface, refer to the TimeEval repository linked above. Third and last, your algorithm should output its results in a file readable with numpy's `loadtxt` function.
If your algorithm follows those principles, it should be easy to run it as part of our pipeline. The last thing you need to do is add an entry for your algorithm in the `algorithm_params.json` file with the proper hyperparameters and their values.
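For illustration, here is a minimal sketch of these conventions. All file names, column names and values below are hypothetical; only the column order (timestamp first, label last) and the `loadtxt`-readable output format are the actual requirements.
```python
import numpy as np
import pandas as pd

# Hypothetical dataset: timestamp first, feature columns in between, label last.
df = pd.DataFrame({
    "timestamp": range(6),
    "value_0": [0.10, 0.12, 0.11, 0.95, 0.13, 0.12],
    "value_1": [1.00, 1.05, 0.98, 3.40, 1.02, 0.99],
    "is_anomaly": [0, 0, 0, 1, 0, 0],
})
df.to_csv("example.test.csv", index=False)

# The algorithm is expected to write one anomaly score per time point to a
# plain-text file that numpy's loadtxt can parse.
np.savetxt("example_scores.txt", np.random.rand(len(df)))
scores = np.loadtxt("example_scores.txt")
assert scores.shape[0] == len(df)
```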
\ No newline at end of file
@@ -39,7 +39,7 @@ class VFAnomaly(BaseAnomaly):
frequencies = np.tile(frqs, self.data.shape[1]//term["period"])
missing_values = self.data.shape[1] - len(frequencies)
frequencies = np.concatenate((frequencies, frequencies[:missing_values]))
frequencies[start:end] *= 3
frequencies[start:end] *= 0.7
# Find the differences between two consecutive points of the data
signal = np.zeros(self.data.shape[1])
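For context, the snippet above tiles a base frequency profile over the whole series, pads it to the exact series length, and then scales it inside the anomalous window. Below is a standalone sketch of that pattern; the series length, period and window are hypothetical values chosen for illustration.
```python
import numpy as np

n_points = 1000        # hypothetical series length (self.data.shape[1] in the class)
period = 64            # hypothetical base period of the anomaly term
start, end = 300, 400  # hypothetical anomalous window

# Base frequency profile over one period, tiled to cover the series.
frqs = np.linspace(1.0, 2.0, period)
frequencies = np.tile(frqs, n_points // period)

# Pad with the beginning of the profile so it matches the series length exactly.
missing_values = n_points - len(frequencies)
frequencies = np.concatenate((frequencies, frequencies[:missing_values]))

# Slow the oscillation down inside the anomalous window.
frequencies[start:end] *= 0.7
```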
@@ -24,7 +24,7 @@ if __name__ == "__main__":
# =================================================================================================================
# Fetch arguments from CLI
# =================================================================================================================
# Create the parser
# Create the parser
parser = ArgumentParser(prog="Time Series Generator", description="Create time series.")
# Add arguments
@@ -5,6 +5,7 @@ import os
import numpy as np
import pandas as pd
from sklearn.preprocessing import minmax_scale
from tqdm import tqdm
from utils import exec_cmd, reverse_window
@@ -138,8 +139,14 @@ class ResultExtractor():
"Algorithm",
"Dataset",
"F1",
"Precision",
"Recall",
"F1 paradise",
"Precision paradise",
"Recall paradise",
"F1 ideal",
"Precision ideal",
"Recall ideal",
"Resp. subset paradise",
"Resp. subset ideal",
"ROC",
@@ -149,7 +156,8 @@ class ResultExtractor():
]
df = pd.DataFrame(columns=columns)
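# One row is appended to this DataFrame per (algorithm, dataset) pair in the loop below.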
for algo, datasets in self.results.items():
for algo, datasets in tqdm(self.results.items()):
for dataset_name, dataset_results in datasets.items():
f1_results = F1Metric().compute(dataset_results, self.ground_truth[dataset_name])
roc_results = ROCMetric().compute(dataset_results, self.ground_truth[dataset_name])
@@ -159,9 +167,15 @@ class ResultExtractor():
df.loc[len(df)] = [
algo,
dataset_name,
f1_results[0],
f1_results[1],
f1_results[2],
f1_results[0]["f1"],
f1_results[0]["prec"],
f1_results[0]["rec"],
f1_results[1]["f1"],
f1_results[1]["prec"],
f1_results[1]["rec"],
f1_results[2]["f1"],
f1_results[2]["prec"],
f1_results[2]["rec"],
f1_results[3],
f1_results[4],
roc_results[0],
@@ -4,12 +4,14 @@ This module contains the code needed to compute the F1 score.
from typing import Iterable, Tuple
import numpy as np
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, roc_curve, f1_score, precision_score, recall_score
class F1Metric:
"""
This class defines the code needed to compute the F1 score.
This class defines the code needed to compute the F1 score. The precision and recall metrics are
computed as a by-product, since they are needed to compute the F1 score.
"""
def compute(self, results: dict, labels: Iterable) -> Tuple[float, float, float]:
"""
@@ -39,12 +41,20 @@ class F1Metric:
else:
final_scores = predictions
prec, rec, thresholds = precision_recall_curve(truth, final_scores)
fscore = (2 * prec * rec) / (prec + rec)
fscore = np.nan_to_num(fscore)
idx = np.argmax(fscore)
fpr, tpr, thresholds = roc_curve(truth, final_scores)
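# Pick the decision threshold whose ROC point lies closest to the ideal corner (FPR = 0, TPR = 1).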
distances = np.zeros(len(thresholds))
for idx, (x, y) in enumerate(zip(fpr, tpr)):
distances[idx] = np.sqrt((0 - x)**2 + (1 - y)**2)
# fscore = (2 * prec * rec) / (prec + rec)
# fscore = np.nan_to_num(fscore)
idx = np.argmin(distances)
print(thresholds[idx], idx)
pred_labels = (final_scores > thresholds[idx]).astype(int)
fscore = f1_score(truth, pred_labels)
prec = precision_score(truth, pred_labels)
rec = recall_score(truth, pred_labels)
return fscore[idx], final_scores, thresholds[idx]
return fscore, prec, rec, final_scores, thresholds[idx]
def __find_responsible_subset(global_scores, local_scores, threshold):
"""
@@ -74,14 +84,21 @@ class F1Metric:
return responsibles
# Compute F1 score for classic approach
classic_f1, _, _ = __find_optimal_f1(labels, results["classic"])
classic_f1, classic_prec, classic_rec, _, _ = __find_optimal_f1(labels, results["classic"])
# Compute F1 score for paradise approach
paradise_f1, paradise_scores, threshold = __find_optimal_f1(labels, results["paradise"])
paradise_f1, paradise_prec, paradise_rec, paradise_scores, threshold = __find_optimal_f1(labels, results["paradise"])
responsible_subset_paradise = __find_responsible_subset(paradise_scores, results["paradise"], threshold)
# Compute F1 score for ideal approach
ideal_f1, ideal_scores, threshold = __find_optimal_f1(labels, results["ideal"])
ideal_f1, ideal_prec, ideal_rec, ideal_scores, threshold = __find_optimal_f1(labels, results["ideal"])
responsible_subset_ideal = __find_responsible_subset(ideal_scores, results["ideal"], threshold)
return classic_f1, paradise_f1, ideal_f1, responsible_subset_paradise, responsible_subset_ideal
return (
{"f1": classic_f1, "prec": classic_prec, "rec": classic_rec},
{"f1": paradise_f1, "prec": paradise_prec, "rec": paradise_rec},
{"f1": ideal_f1, "prec": ideal_prec, "rec": ideal_rec},
responsible_subset_paradise,
responsible_subset_ideal
)
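For reference, here is a self-contained sketch of the threshold-selection scheme used in `__find_optimal_f1` above, with synthetic labels and scores for illustration only; the vectorised distance computation is equivalent to the loop above.
```python
import numpy as np
from sklearn.metrics import roc_curve, f1_score, precision_score, recall_score

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=200)          # synthetic ground-truth labels
scores = 0.6 * truth + 0.5 * rng.random(200)  # synthetic anomaly scores

# Threshold whose ROC point is closest to the ideal corner (FPR = 0, TPR = 1).
fpr, tpr, thresholds = roc_curve(truth, scores)
distances = np.sqrt(fpr ** 2 + (1 - tpr) ** 2)
threshold = thresholds[np.argmin(distances)]

pred_labels = (scores > threshold).astype(int)
print(f1_score(truth, pred_labels), precision_score(truth, pred_labels), recall_score(truth, pred_labels))
```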