diff --git a/artifact-overview.typ b/artifact-overview.typ index 388c1de6ba8203d72aec28496f0a28962a31a48c..f84abd20b76b4d92416a93f12f09bae4f8511828 100644 --- a/artifact-overview.typ +++ b/artifact-overview.typ @@ -281,12 +281,11 @@ The step-by-step instructions of this document can be used in several ways *depe + You can *check* the final analyses (code + plots) done in Article @lightpredenergy by reading the provided pre-rendered notebooks available on #link(zenodo-url)[Zenodo]. + You can *reproduce* the *final analyses* by first downloading the provided aggregated results of the experiments from #link(zenodo-url)[Zenodo], and then by running the notebooks yourself. This enables you to *edit* our notebooks before running them, so that you can modify the analyses or add your own. - // - Refer to #todo[link to Danilo's notebook section] for the machine learning experiment. + - Refer to @sec-analyze-prediction-results for instructions to analyze the results of the machine learning experiment. - Refer to @sec-analyze-simu-campaign-results for instructions to analyze the results of the scheduling experiment. + You can *reproduce* our *experimental campaigns* by downloading the provided experiment input files from #link(zenodo-url)[Zenodo], and then by running the experiment yourself. This enables you to check that our experiments can be reproduced with the *exact same parameters and configuration*. - //- Refer to #todo[link to Danilo's expe section?] for the machine learning experiment. - Refer to @sec-run-simu-campaign for instructions to reproduce the scheduling experiment.
+ You can *fully reproduce* our *experimental campaigns* by downloading original traces of the Marconi100, by generating the experimental campaign parameters yourself (enabling you to hack provided command-line parameters or provided code), @@ -453,7 +452,7 @@ nix develop .#py-scripts --command m100-pred-merge-jobfiles -d ./m100-data/ ] === Compressing prediction output into single files -The expected output data of has been stored on #link(zenodo-url)[Zenodo]. +The expected output data has been stored on #link(zenodo-url)[Zenodo]. #fullbox(footer:[Disk: 82 Mo.])[ ```sh @@ -469,19 +468,45 @@ The expected output data of has been stored on #link(zenodo-url)[Zenodo]. )) ] -== Analyzing prediction results +== Analyzing prediction results <sec-analyze-prediction-results> +This analysis requires that the two job power prediction archives (outputs of @sec-job-power-pred, available on #link(zenodo-url)[Zenodo]) are available on your disk in the `./user-power-predictions` directory. +The following commands populate the `./user-power-predictions/data` directory by extracting the archives and decompressing all the required files on your disk. -=== Required data -Output from the previous section #fullbox(footer: [Disk: 519 Mo. Time: 00:00:05.])[ + ```sh + mkdir ./user-power-predictions/data + nix develop .#merge-m100-power-predictions --command \ + tar xf ./user-power-predictions/*mean.tar.gz --directory ./user-power-predictions/data + nix develop .#merge-m100-power-predictions --command \ + tar xf ./user-power-predictions/*max.tar.gz --directory ./user-power-predictions/data + nix develop .#merge-m100-power-predictions --command \ + gunzip ./user-power-predictions/data/*/*.gz + ``` +] -- `m100-data/power_pred_users_allmethods_mean.tar.gz`, the jobs mean power predictions. -- `m100-data/power_pred_users_allmethods_max.tar.gz`, the jobs maximum power predictions.
+The analysis of the predictions, which also generates Figures 2 and 3 of Article @lightpredenergy, can be reproduced with the following command. -=== Reproducing the paper's plots +#fullbox(footer:[Time (laptop): 00:00:20.])[ + ```sh + nix develop .#r-py-notebook --command \ + Rscript notebooks/run-rmarkdown-notebook.R \ + notebooks/prediction-results-analysis.Rmd + ``` -Please refer to this #link("./notebooks/m100_process_prediction_results.ipynb")[Notebook] for -the scripts to reproduce the paper's plots, notably Figures 2 and 3. + #filehashes(( + "6ce534dd1bf017444f05c354a1b3b767", "notebooks/fig2a-distrib-job-power-mean.svg", + "0b1cdfcf017c2cba7057a544e19cd698", "notebooks/fig2b-distrib-job-power-max.svg", + "0bc88e65ae495a8d6ec7d3cbcfca12ae", "notebooks/fig3a-pred-mape-mean-power.svg", + "a19b1a7c5dc72ec73a5349d85fc68fa3", "notebooks/fig3b-pred-mape-max-power.svg", + "04c2d5ef412b791a4d5515ec0719b3d0", "notebooks/prediction-results-analysis.html", + ), fill: (x, y) => { + if y > 0 { red.lighten(80%) } + }, + ) + We could not make HTML notebooks and Python-generated images binary reproducible despite our best efforts. + Their content should be completely reproducible though. +] == Job scheduling with power prediction <sec-sched> This section shows how to reproduce Sections 6.4 and 6.5 of article @lightpredenergy. @@ -601,7 +626,7 @@ Required input files. - The `/tmp/wlds` directory (#emph-overhead[1.4 Go]) that contains all the workload files (output of @sec-gen-workloads). Please note that all input files can be downloaded from #link(zenodo-url)[Zenodo] if you have not generated them yourself. -In particular to populate the `/tmp/wlds`directory you can *download file* `workloads.tar.xz` and then *extract it* into `/tmp/` via a command such as the following. 
`tar xf workloads.tar.xz --directory=/tmp/` +In particular to populate the `/tmp/wlds` directory you can *download file* `workloads.tar.xz` and then *extract it* into `/tmp/` via a command such as the following. `tar xf workloads.tar.xz --directory=/tmp/` #fullbox(footer: [#emph-overhead[Disk: 7.6 Go.] Time: 00:06:00.])[ ```sh diff --git a/flake.nix b/flake.nix index bec15ed36dda353cad3d26c9c1abdd1c1ff88db9..80340f5ec05447d4618fcd329c34085ffac34b2a 100644 --- a/flake.nix +++ b/flake.nix @@ -85,7 +85,7 @@ }); easypower-sched-lib = easy-powercap-pkgs.easypower; }; - devShells = { + devShells = rec { download-m100-months = pkgs.mkShell { buildInputs = [ packages.python-scripts @@ -127,6 +127,14 @@ pkgs.rPackages.viridis ]; }; + r-py-notebook = pkgs.mkShell { + buildInputs = r-notebook.buildInputs ++ [ + pkgs.rPackages.reticulate + pyPkgs.pandas + pyPkgs.seaborn + pyPkgs.scikit-learn + ]; + }; typst-shell = pkgs.mkShell { buildInputs = [ typstpkgs.typst-dev diff --git a/notebooks/m100_process_prediction_results.ipynb b/notebooks/m100_process_prediction_results.ipynb deleted file mode 100644 index 6540dc68f5e06f3d2eedba89fb2783403b6b1029..0000000000000000000000000000000000000000 --- a/notebooks/m100_process_prediction_results.ipynb +++ /dev/null @@ -1,387 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Processing the mean power prediction results (script `run_prediction_per_user_allmethods_mean.py`)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "import seaborn as sns\n", - "\n", - "import os\n", - "\n", - "RESULTS_PATH = \"../m100-data/total_power_mean_predictions_users_allmethods_mean/\"\n", - "PRED_COLS = [\"hist_pred_total_power_mean\",\n", - " \"LinearRegression_total_power_mean_watts\",\n", - " \"RandomForestRegressor_total_power_mean_watts\", \n", - " \"LinearSVR_total_power_mean_watts\", \n", - " 
\"SGDRegressor_total_power_mean_watts\"]\n", - "\n", - "\n", - "result_filenames = os.listdir(RESULTS_PATH)\n", - "\n", - "df_all_results = pd.concat([pd.read_csv(RESULTS_PATH+filename, low_memory=False) for filename in result_filenames])\n", - "\n", - "df_all_results = df_all_results.dropna(subset=PRED_COLS)\n", - "df_all_results\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error\n", - "\n", - "lst_users = df_all_results[\"user_id\"].drop_duplicates().to_list()\n", - "#print(lst_users)\n", - "\n", - "df_results_user_group = df_all_results.groupby(\"user_id\")\n", - "\n", - "lst_stats_per_user = []\n", - "\n", - "for user in lst_users:\n", - " results_user = df_results_user_group.get_group(user)\n", - " hist_mape = mean_absolute_percentage_error(results_user[\"total_power_mean_watts\"], results_user[\"hist_pred_total_power_mean\"])\n", - " LR_mape = mean_absolute_percentage_error(results_user[\"total_power_mean_watts\"], results_user[\"LinearRegression_total_power_mean_watts\"])\n", - " RF_mape = mean_absolute_percentage_error(results_user[\"total_power_mean_watts\"], results_user[\"RandomForestRegressor_total_power_mean_watts\"])\n", - " LSVR_mape = mean_absolute_percentage_error(results_user[\"total_power_mean_watts\"], results_user[\"LinearSVR_total_power_mean_watts\"])\n", - " SGD_mape = mean_absolute_percentage_error(results_user[\"total_power_mean_watts\"], results_user[\"SGDRegressor_total_power_mean_watts\"])\n", - " res = {\"user_id\": user, \n", - " \"hist_mape\": hist_mape, \n", - " \"LinearRegression_mape\": LR_mape, \n", - " \"RandomForestRegressor_mape\": RF_mape, \n", - " \"LinearSVR_mape\": LSVR_mape,\n", - " \"SGDRegressor_mape\": SGD_mape}\n", - " lst_stats_per_user.append(res)\n", - " #break\n", - "\n", - "df_stats_per_user = pd.DataFrame(lst_stats_per_user)\n", - 
"df_stats_per_user\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "COLS = [\"hist_mape\",\"LinearRegression_mape\",\"RandomForestRegressor_mape\",\"LinearSVR_mape\",\"SGDRegressor_mape\"]\n", - "\n", - "df_stats_per_user[COLS].describe()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "COLS = [\"hist_mape\",\"LinearRegression_mape\",\"RandomForestRegressor_mape\",\"LinearSVR_mape\",\"SGDRegressor_mape\"]\n", - "\n", - "df_stats_per_user_pivot = pd.melt(df_stats_per_user, id_vars=\"user_id\")\n", - "df_stats_per_user_pivot" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Figure 3 A" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "import matplotlib.pyplot as plt\n", - "\n", - "TINY_SIZE = 2\n", - "SMALL_SIZE = 5\n", - "MEDIUM_SIZE = 20\n", - "BIGGER_SIZE = 50\n", - "FIG_WIDTH = 40\n", - "FIG_HEIGHT = 10\n", - "\n", - "\n", - "#plt.rc('font', size=16) # controls default text sizes\n", - "plt.rc('font', size=20) # controls default text sizes\n", - "plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title\n", - "plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels\n", - "plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels\n", - "plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels\n", - "plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize\n", - "plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title\n", - "\n", - "#g = sns.boxplot(x=\"variable\", y=\"value\", data=df_stats_per_user_pivot, showfliers=False)\n", - "#plt.xticks(ticks=[0,1,2,3,4],labels=[\"History\", \"LinearRegression\", \"RandomForest\", \"LinearSVR\", \"SGDRegressor\"],rotation=30)\n", - "g = sns.boxplot(y=\"variable\", x=\"value\", data=df_stats_per_user_pivot, showfliers=False)\n", - 
"plt.yticks(ticks=[0,1,2,3,4],labels=[\"History\", \"LinearRegression\", \"RandomForest\", \"LinearSVR\", \"SGDRegressor\"],rotation=0)\n", - "\n", - "g.set_ylabel(\"Prediction Method\")\n", - "g.set_xlabel(\"Mean Absolute Percentage Error (MAPE) \")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Processing the max power prediction results (script `run_prediction_per_user_allmethods_max.py`)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "import seaborn as sns\n", - "\n", - "import os\n", - "\n", - "RESULTS_PATH = \"./m100-data/total_power_mean_predictions_users_allmethods_max/\"\n", - "\n", - "PRED_COLS = [\"hist_pred_total_power_max\",\n", - " \"LinearRegression_total_power_max_watts\",\n", - " \"RandomForestRegressor_total_power_max_watts\", \n", - " \"LinearSVR_total_power_max_watts\", \n", - " \"SGDRegressor_total_power_max_watts\"]\n", - "\n", - "\n", - "result_filenames = os.listdir(RESULTS_PATH)\n", - "\n", - "df_all_results = pd.concat([pd.read_csv(RESULTS_PATH+filename, low_memory=False) for filename in result_filenames])\n", - "\n", - "df_all_results = df_all_results.dropna(subset=PRED_COLS)\n", - "df_all_results" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error\n", - "\n", - "lst_users = df_all_results[\"user_id\"].drop_duplicates().to_list()\n", - "#print(lst_users)\n", - "\n", - "df_results_user_group = df_all_results.groupby(\"user_id\")\n", - "\n", - "lst_stats_per_user = []\n", - "\n", - "for user in lst_users:\n", - " results_user = df_results_user_group.get_group(user)\n", - " hist_mape = mean_absolute_percentage_error(results_user[\"total_power_max_watts\"], results_user[\"hist_pred_total_power_max\"])\n", - " LR_mape = 
mean_absolute_percentage_error(results_user[\"total_power_max_watts\"], results_user[\"LinearRegression_total_power_max_watts\"])\n", - " RF_mape = mean_absolute_percentage_error(results_user[\"total_power_max_watts\"], results_user[\"RandomForestRegressor_total_power_max_watts\"])\n", - " LSVR_mape = mean_absolute_percentage_error(results_user[\"total_power_max_watts\"], results_user[\"LinearSVR_total_power_max_watts\"])\n", - " SGD_mape = mean_absolute_percentage_error(results_user[\"total_power_max_watts\"], results_user[\"SGDRegressor_total_power_max_watts\"])\n", - " res = {\"user_id\": user, \n", - " \"hist_mape\": hist_mape, \n", - " \"LinearRegression_mape\": LR_mape, \n", - " \"RandomForestRegressor_mape\": RF_mape, \n", - " \"LinearSVR_mape\": LSVR_mape,\n", - " \"SGDRegressor_mape\": SGD_mape}\n", - " lst_stats_per_user.append(res)\n", - " #break\n", - "\n", - "df_stats_per_user = pd.DataFrame(lst_stats_per_user)\n", - "df_stats_per_user" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "COLS = [\"hist_mape\",\"LinearRegression_mape\",\"RandomForestRegressor_mape\",\"LinearSVR_mape\",\"SGDRegressor_mape\"]\n", - "\n", - "df_stats_per_user[COLS].describe()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "COLS = [\"hist_mape\",\"LinearRegression_mape\",\"RandomForestRegressor_mape\",\"LinearSVR_mape\",\"SGDRegressor_mape\"]\n", - "\n", - "df_stats_per_user_pivot = pd.melt(df_stats_per_user, id_vars=\"user_id\")\n", - "df_stats_per_user_pivot" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Figure 3 B" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "import matplotlib.pyplot as plt\n", - "\n", - "TINY_SIZE = 2\n", - "SMALL_SIZE = 5\n", - "MEDIUM_SIZE = 20\n", - "BIGGER_SIZE = 50\n", - "FIG_WIDTH = 40\n", - "FIG_HEIGHT = 
10\n", - "\n", - "\n", - "plt.rc('font', size=20) # controls default text sizes\n", - "plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title\n", - "plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels\n", - "plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels\n", - "plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels\n", - "plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize\n", - "plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title\n", - "\n", - "#g = sns.boxplot(x=\"variable\", y=\"value\", data=df_stats_per_user_pivot, showfliers=False)\n", - "#plt.xticks(ticks=[0,1,2,3,4],labels=[\"History\", \"LinearRegression\", \"RandomForest\", \"LinearSVR\", \"SGDRegressor\"],rotation=30)\n", - "#g.set_xlabel(\"Prediction Method\")\n", - "#g.set_ylabel(\"Mean Absolute Percentage Error (MAPE) \")\n", - "\n", - "g = sns.boxplot(y=\"variable\", x=\"value\", data=df_stats_per_user_pivot, showfliers=False)\n", - "plt.yticks(ticks=[0,1,2,3,4],labels=[\"History\", \"LinearRegression\", \"RandomForest\", \"LinearSVR\", \"SGDRegressor\"],rotation=0)\n", - "g.set_ylabel(\"Prediction Method\")\n", - "g.set_xlabel(\"Mean Absolute Percentage Error (MAPE)\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Getting the actual mean and max power distributions" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Mean (Figure 2 A)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "\n", - "TINY_SIZE = 2\n", - "SMALL_SIZE = 5\n", - "MEDIUM_SIZE = 20\n", - "BIGGER_SIZE = 50\n", - "FIG_WIDTH = 40\n", - "FIG_HEIGHT = 10\n", - "\n", - "plt.clf()\n", - "\n", - "plt.rc('figure', figsize=(8, 6))\n", - "plt.rc('font', size=MEDIUM_SIZE) # controls default text sizes\n", - "plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of 
the axes title\n", - "plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels\n", - "plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels\n", - "plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels\n", - "plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize\n", - "plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title\n", - "\n", - "g = sns.histplot(x=\"total_power_mean_watts\", data=df_all_results, bins=25, fill=False)\n", - "#g.ax.set_yscale('log')\n", - "g.set_xlabel(\"Total Power (watts)\")\n", - "g.set_ylabel(\"Number of Jobs\")\n", - "plt.xticks(ticks=[0,250,500,750,1000,1250,1500], rotation=30)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Max (Figure 2 B)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "\n", - "TINY_SIZE = 2\n", - "SMALL_SIZE = 5\n", - "MEDIUM_SIZE = 20\n", - "BIGGER_SIZE = 50\n", - "FIG_WIDTH = 40\n", - "FIG_HEIGHT = 10\n", - "\n", - "plt.clf()\n", - "\n", - "plt.rc('figure', figsize=(8, 6))\n", - "plt.rc('font', size=MEDIUM_SIZE) # controls default text sizes\n", - "plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title\n", - "plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels\n", - "plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels\n", - "plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels\n", - "plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize\n", - "plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title\n", - "\n", - "#g = sns.displot(x=\"total_power_max_watts\", data=df_all_results)\n", - "g = sns.histplot(x=\"total_power_max_watts\", data=df_all_results, bins=25, fill=False)\n", - "\n", - "#g.ax.set_yscale('log')\n", - "g.set_xlabel(\"Total Power (watts)\")\n", - "g.set_ylabel(\"Number of Jobs\")\n", - 
"plt.xticks(ticks=[0,250,500,750,1000,1250,1500,1750,2000], rotation=30)\n" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.9" - }, - "orig_nbformat": 4 - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/notebooks/prediction-results-analysis.Rmd b/notebooks/prediction-results-analysis.Rmd new file mode 100644 index 0000000000000000000000000000000000000000..e20a77bb435d13092d2c97891a8fb78939b22d24 --- /dev/null +++ b/notebooks/prediction-results-analysis.Rmd @@ -0,0 +1,265 @@ +--- +title: "Job power prediction result analysis" +author: "Danilo Carastan-Santos" +date: "2024-05-15" +output: + rmdformats::readthedown +--- + +## Processing the mean power prediction results +Outputs of script `run_prediction_per_user_allmethods_mean.py`. 
+ +```{python} +import pandas as pd +import seaborn as sns + +import os + +RESULTS_PATH = "../user-power-predictions/data/total_power_mean_predictions_users_allmethods_mean/" +PRED_COLS = ["hist_pred_total_power_mean", + "LinearRegression_total_power_mean_watts", + "RandomForestRegressor_total_power_mean_watts", + "LinearSVR_total_power_mean_watts", + "SGDRegressor_total_power_mean_watts"] + + +result_filenames = os.listdir(RESULTS_PATH) + +df_all_results = pd.concat([pd.read_csv(RESULTS_PATH+filename, low_memory=False) for filename in result_filenames]) + +df_all_results = df_all_results.dropna(subset=PRED_COLS) +df_all_results + + +from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error + +lst_users = df_all_results["user_id"].drop_duplicates().to_list() +#print(lst_users) + +df_results_user_group = df_all_results.groupby("user_id") + +lst_stats_per_user = [] + +for user in lst_users: + results_user = df_results_user_group.get_group(user) + hist_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["hist_pred_total_power_mean"]) + LR_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["LinearRegression_total_power_mean_watts"]) + RF_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["RandomForestRegressor_total_power_mean_watts"]) + LSVR_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["LinearSVR_total_power_mean_watts"]) + SGD_mape = mean_absolute_percentage_error(results_user["total_power_mean_watts"], results_user["SGDRegressor_total_power_mean_watts"]) + res = {"user_id": user, + "hist_mape": hist_mape, + "LinearRegression_mape": LR_mape, + "RandomForestRegressor_mape": RF_mape, + "LinearSVR_mape": LSVR_mape, + "SGDRegressor_mape": SGD_mape} + lst_stats_per_user.append(res) + #break + +df_stats_per_user = pd.DataFrame(lst_stats_per_user) +df_stats_per_user + 
+COLS = ["hist_mape","LinearRegression_mape","RandomForestRegressor_mape","LinearSVR_mape","SGDRegressor_mape"] +df_stats_per_user[COLS].describe() + +COLS = ["hist_mape","LinearRegression_mape","RandomForestRegressor_mape","LinearSVR_mape","SGDRegressor_mape"] +df_stats_per_user_pivot = pd.melt(df_stats_per_user, id_vars="user_id") +df_stats_per_user_pivot +``` + +### Figure 3 (a) +```{python} +import matplotlib.pyplot as plt + +TINY_SIZE = 2 +SMALL_SIZE = 5 +MEDIUM_SIZE = 20 +BIGGER_SIZE = 50 +FIG_WIDTH = 40 +FIG_HEIGHT = 10 + +#plt.rc('font', size=16) # controls default text sizes +plt.rc('font', size=20) # controls default text sizes +plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title +plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels +plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels +plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels +plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize +plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title +plt.rc('figure', figsize=(8,4)) + +#g = sns.boxplot(x="variable", y="value", data=df_stats_per_user_pivot, showfliers=False) +#plt.xticks(ticks=[0,1,2,3,4],labels=["History", "LinearRegression", "RandomForest", "LinearSVR", "SGDRegressor"],rotation=30) +g = sns.boxplot(y="variable", x="value", data=df_stats_per_user_pivot, showfliers=False) +plt.yticks(ticks=[0,1,2,3,4],labels=["History", "LinearRegression", "RandomForest", "LinearSVR", "SGDRegressor"],rotation=0) + +g.set_ylabel("Prediction Method") +g.set_xlabel("Mean Absolute Percentage Error (MAPE) ") +plt.tight_layout(pad=0) +plt.savefig("./fig3a-pred-mape-mean-power.svg") +``` + +## Processing the max power prediction results +Outputs of script `run_prediction_per_user_allmethods_max.py`. 
+ +```{python} +import pandas as pd +import seaborn as sns + +import os + +RESULTS_PATH = "../user-power-predictions/data/total_power_mean_predictions_users_allmethods_max/" + +PRED_COLS = ["hist_pred_total_power_max", + "LinearRegression_total_power_max_watts", + "RandomForestRegressor_total_power_max_watts", + "LinearSVR_total_power_max_watts", + "SGDRegressor_total_power_max_watts"] + + +result_filenames = os.listdir(RESULTS_PATH) + +df_all_results = pd.concat([pd.read_csv(RESULTS_PATH+filename, low_memory=False) for filename in result_filenames]) + +df_all_results = df_all_results.dropna(subset=PRED_COLS) +#df_all_results + + +from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error + +lst_users = df_all_results["user_id"].drop_duplicates().to_list() +#print(lst_users) + +df_results_user_group = df_all_results.groupby("user_id") + +lst_stats_per_user = [] + +for user in lst_users: + results_user = df_results_user_group.get_group(user) + hist_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["hist_pred_total_power_max"]) + LR_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["LinearRegression_total_power_max_watts"]) + RF_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["RandomForestRegressor_total_power_max_watts"]) + LSVR_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["LinearSVR_total_power_max_watts"]) + SGD_mape = mean_absolute_percentage_error(results_user["total_power_max_watts"], results_user["SGDRegressor_total_power_max_watts"]) + res = {"user_id": user, + "hist_mape": hist_mape, + "LinearRegression_mape": LR_mape, + "RandomForestRegressor_mape": RF_mape, + "LinearSVR_mape": LSVR_mape, + "SGDRegressor_mape": SGD_mape} + lst_stats_per_user.append(res) + #break + +df_stats_per_user = pd.DataFrame(lst_stats_per_user) +#df_stats_per_user + +COLS = 
["hist_mape","LinearRegression_mape","RandomForestRegressor_mape","LinearSVR_mape","SGDRegressor_mape"] +df_stats_per_user[COLS].describe() + +COLS = ["hist_mape","LinearRegression_mape","RandomForestRegressor_mape","LinearSVR_mape","SGDRegressor_mape"] +df_stats_per_user_pivot = pd.melt(df_stats_per_user, id_vars="user_id") +df_stats_per_user_pivot +``` + +### Figure 3 (b) +```{python} +import matplotlib.pyplot as plt + +TINY_SIZE = 2 +SMALL_SIZE = 5 +MEDIUM_SIZE = 20 +BIGGER_SIZE = 50 +FIG_WIDTH = 40 +FIG_HEIGHT = 10 + + +plt.rc('font', size=20) # controls default text sizes +plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title +plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels +plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels +plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels +plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize +plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title +plt.rc('figure', figsize=(8,4)) + +#g = sns.boxplot(x="variable", y="value", data=df_stats_per_user_pivot, showfliers=False) +#plt.xticks(ticks=[0,1,2,3,4],labels=["History", "LinearRegression", "RandomForest", "LinearSVR", "SGDRegressor"],rotation=30) +#g.set_xlabel("Prediction Method") +#g.set_ylabel("Mean Absolute Percentage Error (MAPE) ") + +g = sns.boxplot(y="variable", x="value", data=df_stats_per_user_pivot, showfliers=False) +plt.yticks(ticks=[0,1,2,3,4],labels=["History", "LinearRegression", "RandomForest", "LinearSVR", "SGDRegressor"],rotation=0) +g.set_ylabel("Prediction Method") +g.set_xlabel("Mean Absolute Percentage Error (MAPE)") +plt.tight_layout(pad=0) +plt.savefig("./fig3b-pred-mape-max-power.svg") +``` + +## Getting the actual mean and max power distributions +### Mean: Figure 2 (a) +```{python} +import matplotlib.pyplot as plt +import seaborn as sns + +TINY_SIZE = 2 +SMALL_SIZE = 5 +MEDIUM_SIZE = 20 +BIGGER_SIZE = 50 +FIG_WIDTH = 40 +FIG_HEIGHT = 10 + +plt.clf() + 
+plt.rc('figure', figsize=(8, 6)) +plt.rc('font', size=MEDIUM_SIZE) # controls default text sizes +plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title +plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels +plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels +plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels +plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize +plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title +plt.rc('figure', figsize=(6,4)) + +g = sns.histplot(x="total_power_mean_watts", data=df_all_results, bins=25, fill=False) +#g.ax.set_yscale('log') +g.set_xlabel("Total Power (watts)") +g.set_ylabel("Number of Jobs") +plt.xticks(ticks=[0,250,500,750,1000,1250,1500], rotation=30) +plt.tight_layout(pad=0) +plt.savefig("./fig2a-distrib-job-power-mean.svg") +``` + +### Max : Figure 2 (b) +```{python} +import matplotlib.pyplot as plt +import seaborn as sns + +TINY_SIZE = 2 +SMALL_SIZE = 5 +MEDIUM_SIZE = 20 +BIGGER_SIZE = 50 +FIG_WIDTH = 40 +FIG_HEIGHT = 10 + +plt.clf() + +plt.rc('figure', figsize=(8, 6)) +plt.rc('font', size=MEDIUM_SIZE) # controls default text sizes +plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title +plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels +plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels +plt.rc('ytick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels +plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize +plt.rc('figure', titlesize=MEDIUM_SIZE) # fontsize of the figure title +plt.rc('figure', figsize=(6,4)) + +#g = sns.displot(x="total_power_max_watts", data=df_all_results) +g = sns.histplot(x="total_power_max_watts", data=df_all_results, bins=25, fill=False) + +#g.ax.set_yscale('log') +g.set_xlabel("Total Power (watts)") +g.set_ylabel("Number of Jobs") +plt.xticks(ticks=[0,250,500,750,1000,1250,1500,1750,2000], rotation=30) +plt.tight_layout(pad=0) 
+plt.savefig("./fig2b-distrib-job-power-max.svg") +```
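
Aside: the per-user MAPE computation added in `prediction-results-analysis.Rmd` (an explicit loop over users with `get_group`, repeated per prediction column) can be written more compactly with a pandas `groupby`. The sketch below is illustrative only and uses synthetic stand-in data rather than the real per-user prediction CSVs; the MAPE helper reimplements the fractional metric that sklearn's `mean_absolute_percentage_error` computes, so as to stay dependency-light.

```python
import pandas as pd

# Synthetic stand-in for the merged prediction results
# (the real data lives in per-user CSVs under user-power-predictions/data/).
df_all_results = pd.DataFrame({
    "user_id": ["a", "a", "b", "b"],
    "total_power_mean_watts": [100.0, 200.0, 300.0, 400.0],
    "hist_pred_total_power_mean": [110.0, 190.0, 330.0, 360.0],
})

def mape(actual, pred):
    # Mean Absolute Percentage Error as a fraction (not a percentage),
    # matching sklearn.metrics.mean_absolute_percentage_error.
    return ((actual - pred).abs() / actual.abs()).mean()

# One MAPE value per user for the history-based predictor.
hist_mape_per_user = (
    df_all_results
    .groupby("user_id")[["total_power_mean_watts", "hist_pred_total_power_mean"]]
    .apply(lambda g: mape(g["total_power_mean_watts"],
                          g["hist_pred_total_power_mean"]))
)
print(hist_mape_per_user)
```

This avoids the explicit user loop and per-column `get_group` calls while producing the same per-user values, ready to be `pd.melt`-ed for the box plots.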