Commit 2ac9c5a8 authored by Millian Poquet

prediction: jupyter->Rmd, update guide

parent 35a95a42
......@@ -281,12 +281,11 @@ The step-by-step instructions of this document can be used in several ways *depe
+ You can *check* the final analyses (code + plots) done in Article @lightpredenergy by reading the provided pre-rendered notebooks available on #link(zenodo-url)[Zenodo].
+ You can *reproduce* the *final analyses* by first downloading the provided aggregated results of the experiments from #link(zenodo-url)[Zenodo], and then by running the notebooks yourself.
This enables you to *edit* our notebooks before running them, so that you can modify the analyses or add your own.
// - Refer to #todo[link to Danilo's notebook section] for the machine learning experiment.
- Refer to @sec-analyze-prediction-results for instructions to analyze the results of the machine learning experiment.
- Refer to @sec-analyze-simu-campaign-results for instructions to analyze the results of the scheduling experiment.
+ You can *reproduce* our *experimental campaigns* by downloading the provided experiment input files from #link(zenodo-url)[Zenodo],
and then by running the experiment yourself.
This can enable you to make sure that our experiment can be reproduced with the *exact same parameters and configuration*.
//- Refer to #todo[link to Danilo's expe section?] for the machine learning experiment.
- Refer to @sec-run-simu-campaign for instructions to reproduce the scheduling experiment.
+ You can *fully reproduce* our *experimental campaigns* by downloading original traces of the Marconi100,
by generating the experimental campaigns parameters yourself (enabling you to hack provided command-line parameters or provided code),
......@@ -453,7 +452,7 @@ nix develop .#py-scripts --command m100-pred-merge-jobfiles -d ./m100-data/
]
=== Compressing prediction output into single files
The expected output data of has been stored on #link(zenodo-url)[Zenodo].
The expected output data has been stored on #link(zenodo-url)[Zenodo].
#fullbox(footer:[Disk: 82 Mo.])[
```sh
......@@ -469,19 +468,45 @@ The expected output data of has been stored on #link(zenodo-url)[Zenodo].
))
]
== Analyzing prediction results
== Analyzing prediction results <sec-analyze-prediction-results>
This analysis requires that the two job power prediction archives (outputs of @sec-job-power-pred, available on #link(zenodo-url)[Zenodo]) are available on your disk in the `./user-power-predictions` directory.
The following command populates the `./user-power-predictions/data` directory by extracting the archives and decompressing all the required files on your disk.
=== Required data
Output from the previous section
#fullbox(footer: [Disk: 519 Mo. Time: 00:00:05.])[
```sh
mkdir ./user-power-predictions/data
nix develop .#merge-m100-power-predictions --command \
tar xf ./user-power-predictions/*mean.tar.gz --directory ./user-power-predictions/data
nix develop .#merge-m100-power-predictions --command \
tar xf ./user-power-predictions/*max.tar.gz --directory ./user-power-predictions/data
nix develop .#merge-m100-power-predictions --command \
gunzip ./user-power-predictions/data/*/*.gz
```
]
- `m100-data/power_pred_users_allmethods_mean.tar.gz`, the jobs mean power predictions.
- `m100-data/power_pred_users_allmethods_max.tar.gz`, the jobs maximum power predictions.
The analysis of the predictions, which also generates Figures 2 and 3 of Article @lightpredenergy, can be reproduced with the following command.
=== Reproducing the paper's plots
#fullbox(footer:[Time (laptop): 00:00:20.])[
```sh
nix develop .#r-py-notebook --command \
Rscript notebooks/run-rmarkdown-notebook.R \
notebooks/prediction-results-analysis.Rmd
```
Please refer to this #link("./notebooks/m100_process_prediction_results.ipynb")[Notebook] for
the scripts to reproduce the paper's plots, notably Figures 2 and 3.
#filehashes((
"6ce534dd1bf017444f05c354a1b3b767", "notebooks/fig2a-distrib-job-power-mean.svg",
"0b1cdfcf017c2cba7057a544e19cd698", "notebooks/fig2b-distrib-job-power-max.svg",
"0bc88e65ae495a8d6ec7d3cbcfca12ae", "notebooks/fig3a-pred-mape-mean-power.svg",
"a19b1a7c5dc72ec73a5349d85fc68fa3", "notebooks/fig3b-pred-mape-max-power.svg",
"04c2d5ef412b791a4d5515ec0719b3d0", "notebooks/prediction-results-analysis.html",
), fill: (x, y) => {
if y > 0 { red.lighten(80%) }
},
)
We could not make HTML notebooks and Python-generated images binary reproducible despite our best efforts.
Their content should be completely reproducible though.
]
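The hashes listed above can also be checked programmatically. The following is a minimal sketch, not part of the artifact: it assumes the digests are MD5 (they are 32 hexadecimal characters long) and that it is run from the repository root after the notebook has been rendered. The helper names `md5_of_file` and `check_hashes` are hypothetical. Since the HTML notebook and the Python-generated images are not binary reproducible, a `mismatch` status on these files does not by itself indicate a problem.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_hashes(expected):
    """Map each path to 'ok', 'mismatch' or 'missing'."""
    report = {}
    for path, expected_md5 in expected.items():
        if not Path(path).is_file():
            report[path] = "missing"
        elif md5_of_file(path) == expected_md5:
            report[path] = "ok"
        else:
            report[path] = "mismatch"
    return report


# Digests copied from the table above (assumed to be MD5).
EXPECTED = {
    "notebooks/fig2a-distrib-job-power-mean.svg": "6ce534dd1bf017444f05c354a1b3b767",
    "notebooks/fig2b-distrib-job-power-max.svg": "0b1cdfcf017c2cba7057a544e19cd698",
    "notebooks/fig3a-pred-mape-mean-power.svg": "0bc88e65ae495a8d6ec7d3cbcfca12ae",
    "notebooks/fig3b-pred-mape-max-power.svg": "a19b1a7c5dc72ec73a5349d85fc68fa3",
    "notebooks/prediction-results-analysis.html": "04c2d5ef412b791a4d5515ec0719b3d0",
}

if __name__ == "__main__":
    for path, status in check_hashes(EXPECTED).items():
        print(f"{status:8} {path}")
```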
== Job scheduling with power prediction <sec-sched>
This section shows how to reproduce Sections 6.4 and 6.5 of article @lightpredenergy.
......@@ -601,7 +626,7 @@ Required input files.
- The `/tmp/wlds` directory (#emph-overhead[1.4 Go]) that contains all the workload files (output of @sec-gen-workloads).
Please note that all input files can be downloaded from #link(zenodo-url)[Zenodo] if you have not generated them yourself.
In particular to populate the `/tmp/wlds`directory you can *download file* `workloads.tar.xz` and then *extract it* into `/tmp/` via a command such as the following. `tar xf workloads.tar.xz --directory=/tmp/`
In particular to populate the `/tmp/wlds` directory you can *download file* `workloads.tar.xz` and then *extract it* into `/tmp/` via a command such as the following. `tar xf workloads.tar.xz --directory=/tmp/`
#fullbox(footer: [#emph-overhead[Disk: 7.6 Go.] Time: 00:06:00.])[
```sh
......
......@@ -85,7 +85,7 @@
});
easypower-sched-lib = easy-powercap-pkgs.easypower;
};
devShells = {
devShells = rec {
download-m100-months = pkgs.mkShell {
buildInputs = [
packages.python-scripts
......@@ -127,6 +127,14 @@
pkgs.rPackages.viridis
];
};
r-py-notebook = pkgs.mkShell {
buildInputs = r-notebook.buildInputs ++ [
pkgs.rPackages.reticulate
pyPkgs.pandas
pyPkgs.seaborn
pyPkgs.scikit-learn
];
};
typst-shell = pkgs.mkShell {
buildInputs = [
typstpkgs.typst-dev
......
%% Cell type:markdown id: tags:
## Processing the mean power prediction results (script `run_prediction_per_user_allmethods_mean.py`)
%% Cell type:code id: tags:
``` python
import pandas as pd
import seaborn as sns
import os
RESULTS_PATH = "../m100-data/total_power_mean_predictions_users_allmethods_mean/"
PRED_COLS = ["hist_pred_total_power_mean",
"LinearRegression_total_power_mean_watts",
"RandomForestRegressor_total_power_mean_watts",
"LinearSVR_total_power_mean_watts",
"SGDRegressor_total_power_mean_watts"]
result_filenames = os.listdir(RESULTS_PATH)
df_all_results = pd.concat([pd.read_csv(RESULTS_PATH+filename, low_memory=False) for filename in result_filenames])
df_all_results = df_all_results.dropna(subset=PRED_COLS)
df_all_results
```
%% Cell type:code id: tags:
``` python
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error
lst_users = df_all_results["user_id"].drop_duplicates().to_list()
#print(lst_users)
df_results_user_group = df_all_results.groupby("user_id")
lst_stats_per_user = []
for user in lst_users:
results_user = df_results_user_group.get_group(user)
hist_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["hist_pred_total_power_mean"])
LR_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["LinearRegression_total_power_mean_watts"])
RF_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["RandomForestRegressor_total_power_mean_watts"])
LSVR_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["LinearSVR_total_power_mean_watts"])
SGD_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["SGDRegressor_total_power_mean_watts"])
res = {"user_id": user,
"hist_mape": hist_mape,
"LinearRegression_mape": LR_mape,
"RandomForestRegressor_mape": RF_mape,
"LinearSVR_mape": LSVR_mape,
"SGDRegressor_mape": SGD_mape}
lst_stats_per_user.append(res)
#break
df_stats_per_user = pd.DataFrame(lst_stats_per_user)
df_stats_per_user
```
%% Cell type:code id: tags:
``` python
COLS = ["hist_mape","LinearRegression_mape","RandomForestRegressor_mape","LinearSVR_mape","SGDRegressor_mape"]
df_stats_per_user[COLS].describe()
```
%% Cell type:code id: tags:
``` python
COLS = ["hist_mape","LinearRegression_mape","RandomForestRegressor_mape","LinearSVR_mape","SGDRegressor_mape"]
df_stats_per_user_pivot = pd.melt(df_stats_per_user, id_vars="user_id")
df_stats_per_user_pivot
```
%% Cell type:markdown id: tags:
### Figure 3 A
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
TINY_SIZE = 2
SMALL_SIZE = 5
MEDIUM_SIZE = 20
BIGGER_SIZE = 50
FIG_WIDTH = 40
FIG_HEIGHT = 10
#plt.rc('font', size=16) # controls default text sizes
plt.rc('font', size=20) # controls default text sizes
plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels
plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize
plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title
#g = sns.boxplot(x="variable", y="value", data=df_stats_per_user_pivot, showfliers=False)
#plt.xticks(ticks=[0,1,2,3,4],labels=["History", "LinearRegression", "RandomForest", "LinearSVR", "SGDRegressor"],rotation=30)
g = sns.boxplot(y="variable", x="value", data=df_stats_per_user_pivot, showfliers=False)
plt.yticks(ticks=[0,1,2,3,4],labels=["History", "LinearRegression", "RandomForest", "LinearSVR", "SGDRegressor"],rotation=0)
g.set_ylabel("Prediction Method")
g.set_xlabel("Mean Absolute Percentage Error (MAPE) ")
```
%% Cell type:markdown id: tags:
## Processing the max power prediction results (script `run_prediction_per_user_allmethods_max.py`)
%% Cell type:code id: tags:
``` python
import pandas as pd
import seaborn as sns
import os
RESULTS_PATH = "./m100-data/total_power_mean_predictions_users_allmethods_max/"
PRED_COLS = ["hist_pred_total_power_max",
"LinearRegression_total_power_max_watts",
"RandomForestRegressor_total_power_max_watts",
"LinearSVR_total_power_max_watts",
"SGDRegressor_total_power_max_watts"]
result_filenames = os.listdir(RESULTS_PATH)
df_all_results = pd.concat([pd.read_csv(RESULTS_PATH+filename, low_memory=False) for filename in result_filenames])
df_all_results = df_all_results.dropna(subset=PRED_COLS)
df_all_results
```
%% Cell type:code id: tags:
``` python
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error
lst_users = df_all_results["user_id"].drop_duplicates().to_list()
#print(lst_users)
df_results_user_group = df_all_results.groupby("user_id")
lst_stats_per_user = []
for user in lst_users:
results_user = df_results_user_group.get_group(user)
hist_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["hist_pred_total_power_max"])
LR_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["LinearRegression_total_power_max_watts"])
RF_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["RandomForestRegressor_total_power_max_watts"])
LSVR_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["LinearSVR_total_power_max_watts"])
SGD_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["SGDRegressor_total_power_max_watts"])
res = {"user_id": user,
"hist_mape": hist_mape,
"LinearRegression_mape": LR_mape,
"RandomForestRegressor_mape": RF_mape,
"LinearSVR_mape": LSVR_mape,
"SGDRegressor_mape": SGD_mape}
lst_stats_per_user.append(res)
#break
df_stats_per_user = pd.DataFrame(lst_stats_per_user)
df_stats_per_user
```
%% Cell type:code id: tags:
``` python
COLS = ["hist_mape","LinearRegression_mape","RandomForestRegressor_mape","LinearSVR_mape","SGDRegressor_mape"]
df_stats_per_user[COLS].describe()
```
%% Cell type:code id: tags:
``` python
COLS = ["hist_mape","LinearRegression_mape","RandomForestRegressor_mape","LinearSVR_mape","SGDRegressor_mape"]
df_stats_per_user_pivot = pd.melt(df_stats_per_user, id_vars="user_id")
df_stats_per_user_pivot
```
%% Cell type:markdown id: tags:
### Figure 3 B
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
TINY_SIZE = 2
SMALL_SIZE = 5
MEDIUM_SIZE = 20
BIGGER_SIZE = 50
FIG_WIDTH = 40
FIG_HEIGHT = 10
plt.rc('font', size=20) # controls default text sizes
plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels
plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize
plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title
#g = sns.boxplot(x="variable", y="value", data=df_stats_per_user_pivot, showfliers=False)
#plt.xticks(ticks=[0,1,2,3,4],labels=["History", "LinearRegression", "RandomForest", "LinearSVR", "SGDRegressor"],rotation=30)
#g.set_xlabel("Prediction Method")
#g.set_ylabel("Mean Absolute Percentage Error (MAPE) ")
g = sns.boxplot(y="variable", x="value", data=df_stats_per_user_pivot, showfliers=False)
plt.yticks(ticks=[0,1,2,3,4],labels=["History", "LinearRegression", "RandomForest", "LinearSVR", "SGDRegressor"],rotation=0)
g.set_ylabel("Prediction Method")
g.set_xlabel("Mean Absolute Percentage Error (MAPE)")
```
%% Cell type:markdown id: tags:
## Getting the actual mean and max power distributions
%% Cell type:markdown id: tags:
### Mean (Figure 2 A)
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
import seaborn as sns
TINY_SIZE = 2
SMALL_SIZE = 5
MEDIUM_SIZE = 20
BIGGER_SIZE = 50
FIG_WIDTH = 40
FIG_HEIGHT = 10
plt.clf()
plt.rc('figure', figsize=(8, 6))
plt.rc('font', size=MEDIUM_SIZE) # controls default text sizes
plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels
plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize
plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title
g = sns.histplot(x="total_power_mean_watts", data=df_all_results, bins=25, fill=False)
#g.ax.set_yscale('log')
g.set_xlabel("Total Power (watts)")
g.set_ylabel("Number of Jobs")
plt.xticks(ticks=[0,250,500,750,1000,1250,1500], rotation=30)
```
%% Cell type:markdown id: tags:
### Max (Figure 2 B)
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
import seaborn as sns
TINY_SIZE = 2
SMALL_SIZE = 5
MEDIUM_SIZE = 20
BIGGER_SIZE = 50
FIG_WIDTH = 40
FIG_HEIGHT = 10
plt.clf()
plt.rc('figure', figsize=(8, 6))
plt.rc('font', size=MEDIUM_SIZE) # controls default text sizes
plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels
plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize
plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title
#g = sns.displot(x="total_power_max_watts", data=df_all_results)
g = sns.histplot(x="total_power_max_watts", data=df_all_results, bins=25, fill=False)
#g.ax.set_yscale('log')
g.set_xlabel("Total Power (watts)")
g.set_ylabel("Number of Jobs")
plt.xticks(ticks=[0,250,500,750,1000,1250,1500,1750,2000], rotation=30)
```
---
title: "Job power prediction result analysis"
author: "Danilo Carastan-Santos"
date: "2024-05-15"
output:
rmdformats::readthedown
---
## Processing the mean power prediction results
Outputs of script `run_prediction_per_user_allmethods_mean.py`.
```{python}
import pandas as pd
import seaborn as sns
import os
RESULTS_PATH = "../user-power-predictions/data/total_power_mean_predictions_users_allmethods_mean/"
PRED_COLS = ["hist_pred_total_power_mean",
"LinearRegression_total_power_mean_watts",
"RandomForestRegressor_total_power_mean_watts",
"LinearSVR_total_power_mean_watts",
"SGDRegressor_total_power_mean_watts"]
result_filenames = os.listdir(RESULTS_PATH)
df_all_results = pd.concat([pd.read_csv(RESULTS_PATH+filename, low_memory=False) for filename in result_filenames])
df_all_results = df_all_results.dropna(subset=PRED_COLS)
df_all_results
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error
lst_users = df_all_results["user_id"].drop_duplicates().to_list()
#print(lst_users)
df_results_user_group = df_all_results.groupby("user_id")
lst_stats_per_user = []
for user in lst_users:
results_user = df_results_user_group.get_group(user)
hist_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["hist_pred_total_power_mean"])
LR_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["LinearRegression_total_power_mean_watts"])
RF_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["RandomForestRegressor_total_power_mean_watts"])
LSVR_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["LinearSVR_total_power_mean_watts"])
SGD_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["SGDRegressor_total_power_mean_watts"])
res = {"user_id": user,
"hist_mape": hist_mape,
"LinearRegression_mape": LR_mape,
"RandomForestRegressor_mape": RF_mape,
"LinearSVR_mape": LSVR_mape,
"SGDRegressor_mape": SGD_mape}
lst_stats_per_user.append(res)
#break
df_stats_per_user = pd.DataFrame(lst_stats_per_user)
df_stats_per_user
COLS = ["hist_mape","LinearRegression_mape","RandomForestRegressor_mape","LinearSVR_mape","SGDRegressor_mape"]
df_stats_per_user[COLS].describe()
COLS = ["hist_mape","LinearRegression_mape","RandomForestRegressor_mape","LinearSVR_mape","SGDRegressor_mape"]
df_stats_per_user_pivot = pd.melt(df_stats_per_user, id_vars="user_id")
df_stats_per_user_pivot
```
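As an illustration of what the chunk above computes, here is a minimal standard-library sketch of the per-user MAPE. It is not the code used for the paper: the notebook relies on scikit-learn's `mean_absolute_percentage_error`, which returns a fraction rather than a percentage. The helper names `mape` and `mape_per_user` and the toy rows are hypothetical.

```python
from collections import defaultdict

# float64 machine epsilon, used to avoid dividing by zero when a true value is 0
EPSILON = 2.220446049250313e-16


def mape(y_true, y_pred):
    """Mean of |y_true - y_pred| / max(|y_true|, EPSILON), as a fraction."""
    errors = [abs(t - p) / max(abs(t), EPSILON) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


def mape_per_user(rows, true_col, pred_col):
    """Group rows by 'user_id' and compute one MAPE value per user."""
    grouped = defaultdict(lambda: ([], []))
    for row in rows:
        truths, preds = grouped[row["user_id"]]
        truths.append(row[true_col])
        preds.append(row[pred_col])
    return {user: mape(t, p) for user, (t, p) in grouped.items()}


# Hypothetical toy data using two of the notebook's column names.
rows = [
    {"user_id": 1, "total_power_mean_watts": 100.0, "hist_pred_total_power_mean": 110.0},
    {"user_id": 1, "total_power_mean_watts": 200.0, "hist_pred_total_power_mean": 180.0},
    {"user_id": 2, "total_power_mean_watts": 400.0, "hist_pred_total_power_mean": 300.0},
]
stats = mape_per_user(rows, "total_power_mean_watts", "hist_pred_total_power_mean")
print(stats)
```

In this toy example, user 1 has two jobs that are each off by 10%, and user 2 has one job that is off by 25%.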
### Figure 3 (a)
```{python}
import matplotlib.pyplot as plt
TINY_SIZE = 2
SMALL_SIZE = 5
MEDIUM_SIZE = 20
BIGGER_SIZE = 50
FIG_WIDTH = 40
FIG_HEIGHT = 10
#plt.rc('font', size=16) # controls default text sizes
plt.rc('font', size=20) # controls default text sizes
plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels
plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize
plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title
plt.rc('figure', figsize=(8,4))
#g = sns.boxplot(x="variable", y="value", data=df_stats_per_user_pivot, showfliers=False)
#plt.xticks(ticks=[0,1,2,3,4],labels=["History", "LinearRegression", "RandomForest", "LinearSVR", "SGDRegressor"],rotation=30)
g = sns.boxplot(y="variable", x="value", data=df_stats_per_user_pivot, showfliers=False)
plt.yticks(ticks=[0,1,2,3,4],labels=["History", "LinearRegression", "RandomForest", "LinearSVR", "SGDRegressor"],rotation=0)
g.set_ylabel("Prediction Method")
g.set_xlabel("Mean Absolute Percentage Error (MAPE) ")
plt.tight_layout(pad=0)
plt.savefig("./fig3a-pred-mape-mean-power.svg")
```
## Processing the max power prediction results
Outputs of script `run_prediction_per_user_allmethods_max.py`.
```{python}
import pandas as pd
import seaborn as sns
import os
RESULTS_PATH = "../user-power-predictions/data/total_power_mean_predictions_users_allmethods_max/"
PRED_COLS = ["hist_pred_total_power_max",
"LinearRegression_total_power_max_watts",
"RandomForestRegressor_total_power_max_watts",
"LinearSVR_total_power_max_watts",
"SGDRegressor_total_power_max_watts"]
result_filenames = os.listdir(RESULTS_PATH)
df_all_results = pd.concat([pd.read_csv(RESULTS_PATH+filename, low_memory=False) for filename in result_filenames])
df_all_results = df_all_results.dropna(subset=PRED_COLS)
#df_all_results
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error
lst_users = df_all_results["user_id"].drop_duplicates().to_list()
#print(lst_users)
df_results_user_group = df_all_results.groupby("user_id")
lst_stats_per_user = []
for user in lst_users:
results_user = df_results_user_group.get_group(user)
hist_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["hist_pred_total_power_max"])
LR_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["LinearRegression_total_power_max_watts"])
RF_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["RandomForestRegressor_total_power_max_watts"])
LSVR_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["LinearSVR_total_power_max_watts"])
SGD_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["SGDRegressor_total_power_max_watts"])
res = {"user_id": user,
"hist_mape": hist_mape,
"LinearRegression_mape": LR_mape,
"RandomForestRegressor_mape": RF_mape,
"LinearSVR_mape": LSVR_mape,
"SGDRegressor_mape": SGD_mape}
lst_stats_per_user.append(res)
#break
df_stats_per_user = pd.DataFrame(lst_stats_per_user)
#df_stats_per_user
COLS = ["hist_mape","LinearRegression_mape","RandomForestRegressor_mape","LinearSVR_mape","SGDRegressor_mape"]
df_stats_per_user[COLS].describe()
COLS = ["hist_mape","LinearRegression_mape","RandomForestRegressor_mape","LinearSVR_mape","SGDRegressor_mape"]
df_stats_per_user_pivot = pd.melt(df_stats_per_user, id_vars="user_id")
df_stats_per_user_pivot
```
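The `pd.melt` call at the end of the chunk above reshapes the per-user table from wide to long format, which is why the box plots below use the default column names `variable` and `value`. The following standard-library sketch illustrates that reshaping on hypothetical toy values; the function `melt` defined here is an illustrative stand-in, not the pandas implementation.

```python
def melt(records, id_var):
    """Turn wide records into long rows with 'variable' and 'value' keys.

    Rows are emitted one value column at a time, which is also the order
    in which pandas stacks the melted columns.
    """
    if not records:
        return []
    value_vars = [key for key in records[0] if key != id_var]
    long_rows = []
    for var in value_vars:
        for rec in records:
            long_rows.append({id_var: rec[id_var], "variable": var, "value": rec[var]})
    return long_rows


# Hypothetical per-user statistics with two of the notebook's MAPE columns.
wide = [
    {"user_id": 1, "hist_mape": 0.10, "LinearRegression_mape": 0.12},
    {"user_id": 2, "hist_mape": 0.25, "LinearRegression_mape": 0.20},
]
long_rows = melt(wide, id_var="user_id")
for row in long_rows:
    print(row)
```

Each (user, method) pair becomes one row, so a single categorical axis (`variable`) can carry all prediction methods.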
### Figure 3 (b)
```{python}
import matplotlib.pyplot as plt
TINY_SIZE = 2
SMALL_SIZE = 5
MEDIUM_SIZE = 20
BIGGER_SIZE = 50
FIG_WIDTH = 40
FIG_HEIGHT = 10
plt.rc('font', size=20) # controls default text sizes
plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels
plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize
plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title
plt.rc('figure', figsize=(8,4))
#g = sns.boxplot(x="variable", y="value", data=df_stats_per_user_pivot, showfliers=False)
#plt.xticks(ticks=[0,1,2,3,4],labels=["History", "LinearRegression", "RandomForest", "LinearSVR", "SGDRegressor"],rotation=30)
#g.set_xlabel("Prediction Method")
#g.set_ylabel("Mean Absolute Percentage Error (MAPE) ")
g = sns.boxplot(y="variable", x="value", data=df_stats_per_user_pivot, showfliers=False)
plt.yticks(ticks=[0,1,2,3,4],labels=["History", "LinearRegression", "RandomForest", "LinearSVR", "SGDRegressor"],rotation=0)
g.set_ylabel("Prediction Method")
g.set_xlabel("Mean Absolute Percentage Error (MAPE)")
plt.tight_layout(pad=0)
plt.savefig("./fig3b-pred-mape-max-power.svg")
```
## Getting the actual mean and max power distributions
### Mean: Figure 2 (a)
```{python}
import matplotlib.pyplot as plt
import seaborn as sns
TINY_SIZE = 2
SMALL_SIZE = 5
MEDIUM_SIZE = 20
BIGGER_SIZE = 50
FIG_WIDTH = 40
FIG_HEIGHT = 10
plt.clf()
plt.rc('figure', figsize=(8, 6))
plt.rc('font', size=MEDIUM_SIZE) # controls default text sizes
plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels
plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize
plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title
plt.rc('figure', figsize=(6,4))
g = sns.histplot(x="total_power_mean_watts", data=df_all_results, bins=25, fill=False)
#g.ax.set_yscale('log')
g.set_xlabel("Total Power (watts)")
g.set_ylabel("Number of Jobs")
plt.xticks(ticks=[0,250,500,750,1000,1250,1500], rotation=30)
plt.tight_layout(pad=0)
plt.savefig("./fig2a-distrib-job-power-mean.svg")
```
### Max: Figure 2 (b)
```{python}
import matplotlib.pyplot as plt
import seaborn as sns
TINY_SIZE = 2
SMALL_SIZE = 5
MEDIUM_SIZE = 20
BIGGER_SIZE = 50
FIG_WIDTH = 40
FIG_HEIGHT = 10
plt.clf()
plt.rc('figure', figsize=(8, 6))
plt.rc('font', size=MEDIUM_SIZE) # controls default text sizes
plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels
plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize
plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title
plt.rc('figure', figsize=(6,4))
#g = sns.displot(x="total_power_max_watts", data=df_all_results)
g = sns.histplot(x="total_power_max_watts", data=df_all_results, bins=25, fill=False)
#g.ax.set_yscale('log')
g.set_xlabel("Total Power (watts)")
g.set_ylabel("Number of Jobs")
plt.xticks(ticks=[0,250,500,750,1000,1250,1500,1750,2000], rotation=30)
plt.tight_layout(pad=0)
plt.savefig("./fig2b-distrib-job-power-max.svg")
```