
Multi-behavior experiment

This directory contains the necessary scripts for the experiments with the Batmen multi-behavior class using the MetaCentrum dataset. It is a fork of demand-response-user (https://gitlab.irit.fr/sepia-pub/open-science/demand-response-user).

Description of the main files

  • scripts/prepare_workload.sh: shell script to prepare the workload.
  • scripts/run_expe.sh: shell script which launches the experiments once the workload is prepared.
  • scripts/compute_stat.sh: shell script which computes the stats once the experiments are finished.
  • default.nix: Nix file giving all the necessary dependencies.
  • campaign.py: Python script preparing and launching the experiments in parallel. Each experiment corresponds to one instance of instance.py.
  • analyse_campaign.ipynb: Jupyter notebook analysing the results.

Steps to reproduce the experiments

The version used for the experiments is the one tagged experiments-version. Once the repository is cloned, you can switch to the tagged version with this command:

git checkout tags/experiments-version

To reproduce the following steps, make sure you are on this version.

1. Install dependencies

The main software packages used (and configured in the file default.nix) are:

  • Batsim and SimGrid for the infrastructure simulation
  • Batmen: our set of schedulers for Batsim and a plugin to simulate users
  • Batmen-tools: a set of tools to manipulate SWF files
  • python3, pandas, jupyter, matplotlib, etc. for the data analysis

All the dependencies of the project are given in the default.nix file. If you do not have Nix installed on your machine, you can install it with:

scripts/install_nix.sh

Additional commands might be required for the nix command to become available in the current shell. If so, the Nix installer will indicate them in its output.
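
On a typical single-user Nix installation, for instance, this amounts to sourcing the profile script whose path the installer prints, usually something like:

. ~/.nix-profile/etc/profile.d/nix.sh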

Open a shell with all the dependencies managed:

nix-shell -A exp_env --pure

This will compile and install all the dependencies needed for the experiments. It can take some time (in our case, it took 6 minutes).

2. Prepare input workload

Inside the nix-shell, run the following script to download (from the Parallel Workloads Archive) and filter the input workload used in the experiments:

scripts/prepare_workload.sh

3. Launch the campaign

By default, the run_expe script only runs one seed per experiment. To run the 30 experiments, you have to modify the --nb-replicat argument; the command should look like this:

python3 campaign.py --nb-replicat 30 --expe-range -1 --window-mode 8 --nb-days 164 \
--json-behavior behavior_file/big_effort.json behavior_file/low_effort.json behavior_file/max_effort.json behavior_file/medium_effort.json \
--compress-mode --production-file data_energy/energy_trace_sizing_solar.csv data_energy/energy_trace_sizing_solar.csv

As every experiment can take up to 20 GB of RAM, you might be limited by the memory of your system. While running the experiments, you can limit the number of parallel runs using the --threads n command-line argument, where n is the maximum number of experiments to run in parallel. By default, every available physical core is used.
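
For example, with the 188 GB of RAM reported below and roughly 20 GB per experiment, capping at 8 parallel runs keeps memory usage within bounds (the value 8 is only an illustration; adjust it to your machine):

python3 campaign.py --nb-replicat 30 --expe-range -1 --window-mode 8 --nb-days 164 \
--json-behavior behavior_file/big_effort.json behavior_file/low_effort.json behavior_file/max_effort.json behavior_file/medium_effort.json \
--compress-mode --production-file data_energy/energy_trace_sizing_solar.csv data_energy/energy_trace_sizing_solar.csv \
--threads 8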

Once you have done all the previous steps, launch the bash script inside the nix-shell:

scripts/run_expe.sh

If you see this line:

Rigid finished, Now monolithic behavior

It means that the program has finished computing the simulations and is now computing statistics on the obtained values. You can stop the program at this point if you only want the raw simulation results.

4. Generate the metrics

The tagged experiments-version omits some metrics from the computation. After the experiments, to compute the metrics, switch to the tag metrics-compute-version and then compute the desired metrics using the command:

scripts/compute_stat.sh
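
For reference, the full sequence mirrors the tag checkout from the beginning of this guide:

git checkout tags/metrics-compute-version
scripts/compute_stat.sh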

The stats will be computed and placed in the shell directory under the names campaign_metrics.csv and campaign_metrics_relative.csv. They will likely differ from the ones provided in result_big_expe: in our reproduction, we noticed a relative difference of 0.5% in energy-related metrics and 2% in user-behavior-related metrics.

5. Generate the graph

To generate the graphs, launch the notebook analyse_campaign.ipynb. You will have to change the variables RAW_DATA_DIR and OUT_DIR to match your setup.
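
For example, the configuration cell at the top of the notebook would contain assignments like the following (the paths are placeholders for your own setup):

RAW_DATA_DIR = "out/"
OUT_DIR = "figures/"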

Tips

Along with the experiments, various scripts are provided to help you manage the experiment data:

  • scripts/compress_out.sh out/ out.tar.zst allows you to compress the output directory into a tar archive for archival purposes. In our case, it divided the space used by the experiment results by 7.
  • scripts/sync_expe_out.sh out/ path_to_backup/ allows you to back up the simulation data into a backup directory. It uses rsync, so running the command a second time will only write the changes.
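
To restore a compressed archive later, any tar build with zstd support can extract it (a generic example, not a script shipped with this repository):

tar --zstd -xf out.tar.zst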

Energy Data

The provided example_trace_occitanie_2019.csv comes from a modification of the Open Data Réseaux Electrique energy production dataset for Occitanie in 2019 (the original file can be directly downloaded here). The provided energy_trace_sizing_solar.csv is the energy trace of the energy produced by the DataZero2 sizing algorithm.

Advanced options

Inside the nix shell exp_env, launch the command:

python3 campaign.py --help

You will get details of every possible argument. You will have to give at least the following argument:

  • --expe-range to select which experiments to run: use --expe-range -1 for all available experiment types, or provide a list of experiment indices (e.g., for experiments [0,1], use --expe-range 0 1).
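
For instance, a run restricted to experiments 0 and 1 could start from a command like this (a sketch only; campaign.py may also require the behavior and production files shown in step 3):

python3 campaign.py --expe-range 0 1 --nb-replicat 1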

Information about experiments

The experiments took 7 hours on two Intel Xeon Gold 6130 CPUs, and the output took 55 GB to store; once compressed using scripts/compress_out.sh, it takes 7.8 GB. As each experiment took approximately 20 GB of RAM and we had 188 GB available, we were not able to exploit the CPUs at full capacity, so with more memory we could have achieved better speed.

List of metrics

The metrics computed for the experiments are the following (available in result_big_expe):

  • XP, dir, behavior, seed: the experiment number (always 0 in our case), the directory from which the data were computed, the behavior the users have (the probability distributions can be found in behavior_file), and the seed of the random generator used for the computation.
  • #jobs: the number of jobs submitted; renounced jobs are not counted in this total.
  • Duration_red (seconds), Duration_yellow (seconds), Duration_total (seconds): the duration during which the red state occurred, the duration during which the yellow state occurred, and the total duration over which measures were taken.
  • NRJ_red (Joules), NRJ_yellow (Joules), NRJ_total (Joules): the energy consumed in the red state, in the yellow state, and during the whole duration of the experiments.
  • mean_waiting_time, max_waiting_time, mean_slowdown, max_slowdown: the slowdown and waiting time metrics computed by Batsim; they do not take the user behavior into account.
  • energy overproduced (Joules), energy underproduced (Joules): the energy differences between consumption and production. When the energy produced is higher than the energy consumed, the absolute difference is added to the overproduced energy; when the energy produced is lower than the energy consumed, the absolute difference is added to the underproduced energy (see the sketch after this list).
  • energy balance (Joules): the difference between the energy produced and the energy consumed. It has been computed in two ways: from the overproduced and underproduced energy (1), or as the total energy produced minus the total energy consumed (2). Relative Accuracy is the normalized difference between both computations.
  • true_rigid_jobs: the number of jobs that were left unmodified.
  • mean_delay, max_delay: the amount of time jobs have been delayed by a see_you_later behavior.
  • renonce_jobs, reconfig_jobs, degrad_jobs, rigid_jobs: the number of jobs that were renounced, reconfigured, degraded, or submitted rigidly (the latter might still have been delayed by see_you_later).
  • number_of_see_you_later, C_you_later_jobs: the number of see_you_later events (a job can be subject to more than one see_you_later) and the number of jobs that got a see_you_later. It was computed in two ways: by looking only for see_you_later in the logged behaviors (1), or by checking whether the original submission time matches the actual submission time in the Batsim job records (2).
  • mean_corrected_wtime, max_corrected_wtime, mean_corrected_sdown, max_corrected_sdown: the waiting time and slowdown computed while taking the behaviors into account (the submission time is the original one without see_you_later; for reconfigured jobs, the execution time is the one before reconfiguring). They have been computed from the behavior stats and job data (1), or by crossing the data with the rigid case (2). As the differences (sanity) are quite big, we are not sure of the accuracy of these computed values.
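
To clarify how the over/under-production and balance metrics relate, here is a minimal pandas sketch (column names and values are hypothetical; the actual computation is done by the campaign scripts):

import pandas as pd

# One row per time step; per-step energy in Joules (hypothetical values).
df = pd.DataFrame({"produced": [10.0, 5.0, 8.0],
                   "consumed": [7.0, 9.0, 8.0]})

diff = df["produced"] - df["consumed"]
overproduced = diff.clip(lower=0).sum()       # steps where production exceeded consumption
underproduced = (-diff).clip(lower=0).sum()   # steps where consumption exceeded production

balance_1 = overproduced - underproduced                  # way (1): from over/under production
balance_2 = df["produced"].sum() - df["consumed"].sum()   # way (2): difference of totals

# Relative Accuracy: normalized difference between the two computations.
relative_accuracy = abs(balance_1 - balance_2) / abs(balance_2)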
  • mean_corrected_wtime, max_corrected_wtime, mean_corrected_sdown, max_corrected_sdown. Compute the waiting time and slowdown while taking into account of behaviors (submission time is the original one without see you later, reconfig jobs execution time is the one before reconfiguring) It has been computed by using behaviors stat and jobs data (1), or by crossing the data with the rigid case (2). As the difference (sanity) are quite big we are not sure of the accuracy of these data computed.