Measure Energy of Flower FL in G5K
This project provides tools to measure the energy consumption of Flower-based federated learning (FL) experiments on the Grid'5000 (G5K) testbed. It includes scripts to manage distributed nodes, run FL experiments, and monitor energy usage.
Getting Started
The repository includes a Flower example (using TensorFlow) in the Flower_v1 directory and the source of the measurement framework in the Run directory. The example demonstrates how to use the framework to measure energy consumption.
Installation
Clone the repository and navigate to the eflwr directory:
git clone https://gitlab.irit.fr/sepia-pub/delight/eflwr.git
cd eflwr
This framework requires:
- Python 3.9.2 or higher.
- Additional dependencies listed in requirements.txt. Install them with:
  pip install -r requirements.txt
Note: You also need to install tensorflow, tensorflow-datasets, scikit-learn and numpy if you want to run the provided Flower example.
Navigate to the Run directory:
cd Run
Usage
FL framework
The FL scripts (server and client scripts) can be modified to suit your experiment; see the Flower_v1 directory for an example.
Configure instance for CPU
Configure the experiment instances in a JSON file with the structure shown below.
- instances: maps an identifier ("1", "2", ...) to each instance.
- instance: name of the instance.
- output_dir: directory where the output files (experiment log and energy monitoring output) are stored.
- dvfs_cpu: choose exactly one of the three settings.
  - dummy: test at the minimum and maximum CPU frequencies (false or true).
  - baseline: test at the maximum CPU frequency (false or true).
  - frequencies: restrict the test to the provided list of frequencies (null or a list of integers).

  Remark: check the available frequencies before using the frequencies option.
  - Set the permissions and disable Turbo Boost first:
    bash "$(python3 -c "import expetator, os; print(os.path.join(os.path.dirname(expetator.__file__), 'leverages', 'dvfs_pct.sh'))")" init
  - Run this command to get the available frequencies:
    python3 get_frequencies.py
  - Copy the extracted frequency values into the configuration file.
- Structure of the JSON config:
{
  "instances": {
    "1": {
      "instance": "",
      "output_dir": "",
      "dvfs_cpu": {
        "dummy": true,
        "baseline": false,
        "frequencies": null
      },
      "server": {
        "command": "python3",
        "args": [],
        "ip": "",
        "port": 8080
      },
      "clients": [
        {
          "name": "client1",
          "command": "python3",
          "args": [],
          "ip": ""
        },
        {...},
        {...}
      ]
    },
    "2": {
      "instance": "",
      ...
    }
  }
}
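The "choose exactly one setting" rule is easy to break when editing the file by hand. The sketch below shows one way to check it before launching a run. It is not part of the framework: the function names, the extra "clients" check, and the tiny example config are assumptions made for illustration.

```python
import json


def active_dvfs_settings(dvfs_cpu):
    """Return the names of the dvfs_cpu settings that are switched on."""
    active = [key for key in ("dummy", "baseline") if dvfs_cpu.get(key) is True]
    if dvfs_cpu.get("frequencies"):  # null or [] counts as "off"
        active.append("frequencies")
    return active


def check_config(config):
    """Return a list of problems found in the config (empty if none)."""
    problems = []
    for ident, inst in config.get("instances", {}).items():
        active = active_dvfs_settings(inst.get("dvfs_cpu", {}))
        if len(active) != 1:
            problems.append(
                f"instance {ident}: exactly one dvfs_cpu setting expected, found {active}"
            )
        if not inst.get("clients"):  # assumed rule: at least one client
            problems.append(f"instance {ident}: no clients defined")
    return problems


# Hypothetical minimal config, used only to exercise the check.
example = json.loads("""
{"instances": {"1": {
    "instance": "demo", "output_dir": "/tmp/out",
    "dvfs_cpu": {"dummy": true, "baseline": false, "frequencies": null},
    "server": {"command": "python3", "args": [], "ip": "", "port": 8080},
    "clients": [{"name": "client1", "command": "python3", "args": [], "ip": ""}]
}}}
""")
print(check_config(example))
```

To check a real file, load it with json.load and pass the resulting dictionary to check_config.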
Configure instance for GPU
The configuration is the same as for CPU, except for the DVFS key: in a GPU config it is dvfs_gpu.
Choose exactly one of the three settings (steps, zoomfrom and zoomto together form one setting).
- dummy: test at the minimum and maximum GPU frequencies (false or true).
- baseline: test at the maximum GPU frequency (false or true).
- steps: step size used to move through the range (window) of frequencies (int).
  - zoomfrom: start frequency of the window.
  - zoomto: stop frequency of the window.

"dvfs_gpu": {
  "dummy": true,
  "baseline": false,
  "steps": 2,
  "zoomfrom": 0,
  "zoomto": 0
},
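Since a GPU instance differs from a CPU instance only in its DVFS key, a GPU config can be derived from an existing CPU one. The following is a minimal sketch of that conversion, not a tool shipped with the framework: to_gpu_instance is a hypothetical helper, and its default values simply mirror the dvfs_gpu fragment above.

```python
import copy


def to_gpu_instance(cpu_instance, dummy=True, baseline=False,
                    steps=2, zoomfrom=0, zoomto=0):
    """Return a copy of a CPU instance with dvfs_cpu swapped for dvfs_gpu."""
    gpu_instance = copy.deepcopy(cpu_instance)  # leave the input untouched
    gpu_instance.pop("dvfs_cpu", None)
    gpu_instance["dvfs_gpu"] = {
        "dummy": dummy,
        "baseline": baseline,
        "steps": steps,
        "zoomfrom": zoomfrom,
        "zoomto": zoomto,
    }
    return gpu_instance


# Hypothetical CPU instance, reduced to the fields relevant here.
cpu = {
    "instance": "demo",
    "output_dir": "/tmp/out",
    "dvfs_cpu": {"dummy": True, "baseline": False, "frequencies": None},
}
gpu = to_gpu_instance(cpu)
print(sorted(gpu))
```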
Run exp
There are two ways to run an experiment: a single instance, or all instances (a campaign).
Run single instance:
python3 measure_instance.py -c [config_file] -i [instance] -x [experiment_name] -r [repetitions]
- [config_file]: The instances configuration file.
- [instance]: The identifier of the instance to run.
- [experiment_name]: The name you use to identify your experiment.
- [repetitions]: Number of repetitions for the experiment.
Run campaign:
python3 measure_campaign.py -x [experiment_name] -c [config_file] -r [repetitions]
A campaign runs every instance defined in [config_file].
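When launching runs from another script, the two command lines can be assembled programmatically. The sketch below only builds the argument list from the flags documented above (-c, -i, -x, -r); build_command is a hypothetical helper, and nothing is executed.

```python
import shlex


def build_command(experiment_name, config_file, repetitions, instance=None):
    """Build the argument list for a single-instance or campaign run."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    if instance is None:
        # No instance given: run the whole campaign.
        return ["python3", "measure_campaign.py",
                "-x", experiment_name, "-c", config_file, "-r", str(repetitions)]
    return ["python3", "measure_instance.py",
            "-c", config_file, "-i", str(instance),
            "-x", experiment_name, "-r", str(repetitions)]


single = build_command("SingleTest", "config_instances.json", 2, instance=1)
campaign = build_command("CampaignTest", "config_instances.json", 2)
print(shlex.join(single))
print(shlex.join(campaign))
```

The returned list can be passed to subprocess.run from inside the Run directory.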
Quickstart
Step 1. Reserve the Hosts in G5K
Reserve the required number of hosts (see the G5K documentation for more details).
For example:
Reserve 4 hosts (CPU) (1 server + 3 clients) for 2 hours:
oarsub -I -l host=4,walltime=2
Reserve 4 hosts (GPU) (1 server + 3 clients) on the drac cluster in Grenoble; append ,walltime=2 to the -l value to reserve them for 2 hours:
oarsub -I -t exotic -p "gpu_count>0" -l {"cluster='drac'"}/host=4 # grenoble
Make sure you are in eflwr/Run/:
cd Run
Step 2. Configure
Use a JSON configuration file (e.g. config_instances.json for CPU or config_instances_1.json for GPU) to specify the experiment details of one or more instances.
cat config_instances.json
For example, config_instances.json provides two instance configurations. All fields are already set, except "output_dir" and "args", which must be updated to match your own directories.
- instance "1": fedAvg, cifar10, DVFS with min and max CPU frequencies, 1 round.
- instance "2": fedAvg2Clients, cifar10, DVFS with min and max CPU frequencies, 1 round.
Step 3. Collect IP
Run the following command to collect/generate a node list:
uniq $OAR_NODEFILE > nodelist
Automatically populate missing IP addresses in the JSON file:
python3 collect_ip.py -n nodelist -c config_instances.json
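To make this step concrete, here is a minimal sketch of what populating the missing addresses could look like. It is not the implementation of collect_ip.py: the host-to-role mapping (first host is the server, the rest are clients in order) and the function names are assumptions, and the resolver is injectable so the example runs without network access.

```python
import socket


def unique_hosts(lines):
    """Deduplicate host names while keeping their order, skipping blank lines."""
    hosts = []
    for line in lines:
        host = line.strip()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def fill_missing_ips(instance, hosts, resolve=socket.gethostbyname):
    """Fill empty "ip" fields: first host -> server, remaining hosts -> clients."""
    clients = instance.get("clients", [])
    if len(hosts) < 1 + len(clients):
        raise ValueError("not enough hosts for 1 server and all clients")
    if not instance["server"].get("ip"):
        instance["server"]["ip"] = resolve(hosts[0])
    for client, host in zip(clients, hosts[1:]):
        if not client.get("ip"):
            client["ip"] = resolve(host)
    return instance


# Hypothetical node list and a fake resolver, for illustration only.
nodelist = ["node-1\n", "node-1\n", "node-2\n", "node-3\n"]
fake_dns = {"node-1": "10.0.0.1", "node-2": "10.0.0.2", "node-3": "10.0.0.3"}
instance = {
    "server": {"ip": "", "port": 8080},
    "clients": [{"name": "client1", "ip": ""}, {"name": "client2", "ip": "10.9.9.9"}],
}
fill_missing_ips(instance, unique_hosts(nodelist), resolve=fake_dns.__getitem__)
print(instance)
```

Addresses that are already set (client2 in the example) are left untouched, matching the "missing IP addresses" wording above.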
Step 4. Run the Campaign or Single Instance
Run a single instance (instance 1) with 2 repetitions:
python3 measure_instance.py -c config_instances.json -i 1 -x SingleTest -r 2
Run a campaign with all instances (1 and 2) and 2 repetitions:
python3 measure_campaign.py -x CampaignTest -c config_instances.json -r 2
Step 5. Output
The logs and energy monitoring data will be saved in the directory specified in the JSON configuration.
Output dir structure:
/Flower_<x>
├── Flower_instance_<instance_name>
│   ├── Expetator
│   │   ├── config_instance*.json
│   ├── Expetator_<host_info>_<timestamp>_mojitos: mojitos outputs
│   ├── Expetator_<host_info>_<timestamp>_power: wattmeter outputs
│   ├── Expetator_<host_info>_<timestamp>: measurement log
│   ├── Flwr_<timestamp>: Flower log
│   │   ├── Client_<ip>
│   │   ├── Server_<ip>
│   ├── Flwr_<timestamp>
│   │   ├── Client_<ip>
│   │   ├── Server_<ip>
├── Flower_instance_<instance_name>
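Given the layout above, the results of a campaign can be located with a short directory walk. The sketch below groups the Flwr_* and Expetator_* entries of each Flower_instance_* directory; summarize_output is a hypothetical helper, and the temporary tree it is run against is made up for the demonstration.

```python
import tempfile
from pathlib import Path


def summarize_output(root):
    """Map each Flower_instance_* directory to its Flower logs and energy outputs."""
    summary = {}
    for inst_dir in sorted(Path(root).glob("Flower_instance_*")):
        if not inst_dir.is_dir():
            continue
        summary[inst_dir.name] = {
            "flower_logs": sorted(p.name for p in inst_dir.glob("Flwr_*")),
            "energy": sorted(p.name for p in inst_dir.glob("Expetator_*")),
        }
    return summary


# Build a small fake output tree (names are placeholders) and summarize it.
with tempfile.TemporaryDirectory() as tmp:
    inst = Path(tmp) / "Flower_instance_fedAvg"
    (inst / "Flwr_20250101_120000").mkdir(parents=True)
    (inst / "Expetator_host_20250101_power").touch()
    (inst / "Expetator_host_20250101_mojitos").touch()
    demo = summarize_output(tmp)

print(demo)
```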
Step 6. Clean Up
After the experiment, exit the host and delete the job if needed:
exit
oardel <job_id>
License
This project is licensed under GPLv3.