huongdm1896 authored
Measure Energy of Flower FL in G5K
This project provides tools to measure the energy consumption of Flower-based federated learning (FL) experiments on the Grid'5000 (G5K) testbed. It includes scripts to manage distributed nodes, run FL experiments, and monitor energy usage.
Getting Started
The repository includes a Flower example (using TensorFlow) in the Flower_v1 directory and the source of the measurement framework in the Run directory. The example demonstrates how to use the framework to measure energy consumption.
Installation
Clone the repository and navigate to the eflwr directory:
git clone https://gitlab.irit.fr/sepia-pub/delight/eflwr.git
cd eflwr
This framework requires:
- Python 3.9.2 or higher.
- Additional dependencies listed in requirements.txt. Install them with:
pip install -r requirements.txt
Note: requirements.txt includes TensorFlow for running the provided Flower example.
Navigate to the Run directory:
cd Run
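Before moving on, it can help to confirm that the environment matches these requirements. The sketch below is a hypothetical helper, not part of the repository: it checks the Python 3.9.2 minimum stated above and whether the packages flwr, tensorflow and expetator can be found. That package list is an assumption (expetator is the module used by the DVFS helper in Step 2); requirements.txt remains the authoritative list.

```python
import importlib.util
import sys

# Minimum Python version stated in this README.
MIN_PYTHON = (3, 9, 2)

# Assumed package names: flwr (Flower), tensorflow (Flower_v1 example) and
# expetator (DVFS helper). requirements.txt is the authoritative list.
PACKAGES = ("flwr", "tensorflow", "expetator")


def check_environment(packages=PACKAGES):
    """Return (python_ok, {package: installed}) for the current interpreter."""
    python_ok = sys.version_info[:3] >= MIN_PYTHON
    # find_spec locates a top-level package without importing it.
    installed = {name: importlib.util.find_spec(name) is not None for name in packages}
    return python_ok, installed


if __name__ == "__main__":
    ok, pkgs = check_environment()
    print(f"Python {sys.version.split()[0]}: {'OK' if ok else 'too old'}")
    for name, present in pkgs.items():
        print(f"{name}: {'installed' if present else 'MISSING'}")
```

Running it on a reserved node after pip install -r requirements.txt should report every package as installed.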
Usage
- FL scripts can be updated in Flower_v1.
- Configure each experiment instance in Run/config_instance*.json.
- Follow these steps to configure and run your experiment (or jump to Quickstart to run an example).
Step 1. Reserve the Hosts in G5K
Reserve the required number of hosts (see the Grid'5000 documentation for more details):
oarsub -I -l host=[number_of_hosts],walltime=[duration]
Step 2. Configure
Edit the JSON configuration file (config_instance*.json) to specify the experiment details. You can create multiple config_instance*.json files, where * is the instance number (the numbers must be consecutive positive integers starting from 1).
vim config_instance1.json
Example structure:
{
  "instance": "fedAvg_cifar10",
  "output_dir": "/home/mdo/Huong_DL/Log",
  "dvfs": {
    "dummy": false,
    "baseline": false,
    "frequencies": [2000000, 2200000]
  },
  "server": {
    "command": "python3",
    "args": [
      "Flower_v1/server.py",
      "-r 50",
      "-s fedAvg"
    ],
    "ip": "172.16.66.18",
    "port": 8080
  },
  "clients": [
    {
      "name": "client1",
      "command": "python3",
      "args": [
        "Flower_v1/client_1.py",
        "cifar10",
        "1",
        "3"
      ],
      "ip": "172.16.66.2"
    },
    {
      "name": "client2",
      "command": "python3",
      "args": [
        "Flower_v1/client_1.py",
        "cifar10",
        "1",
        "3"
      ],
      "ip": "172.16.66.3"
    }
  ]
}
- instance: The name of your experiment.
- output_dir: Where to store the log files (experiment log and energy monitoring log).
- dvfs: choose only one of the three settings below. If none is enabled, the framework detects all available frequencies and goes through all of them.
  - dummy: false or true (only uses the min and max frequencies)
  - baseline: false or true (only uses the max frequency)
  - frequencies: null or a list of ints (limits the run to the provided frequencies)
- Remark: check the available frequencies before using the frequencies option.
- Set the permissions and disable Turbo Boost first:
bash "$(python3 -c "import expetator, os; print(os.path.join(os.path.dirname(expetator.__file__), 'leverages', 'dvfs_pct.sh'))")" init
- Run this command to get available frequencies:
python3 get_frequencies.py
- Update the configuration files with the extracted frequency values.
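The dvfs rules above can be checked before launching an experiment. The following is a minimal sketch, not the repository's get_frequencies.py: it assumes the available frequencies can be read (in kHz, the unit used in the example config) from the Linux cpufreq file scaling_available_frequencies, which may be absent depending on the node's cpufreq driver. The helper names parse_frequencies, check_dvfs and check_config_file are hypothetical.

```python
import json
from pathlib import Path

# Assumption: Linux cpufreq sysfs file listing available frequencies in kHz.
# It only exists with some cpufreq drivers; get_frequencies.py may differ.
SYSFS_FREQS = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies")


def parse_frequencies(text):
    """Parse a whitespace-separated list of frequencies into sorted ints."""
    return sorted(int(token) for token in text.split())


def read_available_frequencies(path=SYSFS_FREQS):
    """Read and parse the available frequencies from a sysfs-style file."""
    return parse_frequencies(Path(path).read_text())


def check_dvfs(config, available):
    """Return a list of problems found in the "dvfs" block of one config dict."""
    dvfs = config.get("dvfs", {})
    # A setting counts as enabled if dummy/baseline is true or frequencies is not null.
    enabled = [key for key in ("dummy", "baseline") if dvfs.get(key)]
    if dvfs.get("frequencies") is not None:
        enabled.append("frequencies")
    problems = []
    if len(enabled) > 1:
        problems.append(f"only one dvfs setting may be used, got {enabled}")
    for freq in dvfs.get("frequencies") or []:
        if freq not in available:
            problems.append(f"frequency {freq} is not available on this node")
    return problems


def check_config_file(config_path, freq_path=SYSFS_FREQS):
    """Load a config_instance*.json file and check its dvfs block."""
    config = json.loads(Path(config_path).read_text())
    return check_dvfs(config, read_available_frequencies(freq_path))
```

On a reserved node, check_config_file("config_instance1.json") returns an empty list when only one dvfs setting is used and every listed frequency is available.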
Step 3. Collect IP
Run the following command to generate a node list:
uniq $OAR_NODEFILE > nodelist
Automatically populate missing IP addresses in the JSON file:
python3 collect_ip.py
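To make this step less opaque, here is a hypothetical sketch of filling missing IPs from the nodelist file. It is not the actual collect_ip.py: it assumes that an empty or absent "ip" field counts as missing, that the first host in nodelist runs the server and the following hosts run the clients in order, and that hostnames resolve with socket.gethostbyname.

```python
import json
import socket


def read_nodelist(path="nodelist"):
    """Return the hostnames written by `uniq $OAR_NODEFILE > nodelist`."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def fill_missing_ips(config, hostnames, resolve=socket.gethostbyname):
    """Fill empty or absent "ip" fields, server first, then clients in order.

    Assumption: host 1 runs the server and hosts 2..n run the clients;
    collect_ip.py may assign hosts differently.
    """
    nodes = [config["server"], *config["clients"]]
    if len(hostnames) < len(nodes):
        raise ValueError(f"need {len(nodes)} hosts, nodelist has {len(hostnames)}")
    for node, host in zip(nodes, hostnames):
        if not node.get("ip"):  # keep IPs that are already set
            node["ip"] = resolve(host)
    return config


def update_config_file(config_path, nodelist_path="nodelist"):
    """Fill missing IPs in one config file in place."""
    with open(config_path) as f:
        config = json.load(f)
    fill_missing_ips(config, read_nodelist(nodelist_path))
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
```

The resolve parameter exists so the host-to-IP mapping can be replaced, for example by a dictionary lookup when testing without DNS.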
Step 4. Run the Campaign or Single Instance
Run a campaign (all config_instance*.json files in Run):
python3 run_measure.py -x [experiment_name] -r [repetitions]
Run a single instance:
python3 measure.py -c [config_file] -x [experiment_name] -r [repetitions]
- [experiment_name]: The name you use to identify your experiment.
- [repetitions]: Number of repetitions for the experiment.
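Because the instance files must be numbered consecutively from 1, a campaign can be pictured as running measure.py once per instance in numeric order. The sketch below illustrates that idea under this assumption only; run_measure.py may organize the campaign differently, and order_instance_configs, campaign_commands and run_campaign are hypothetical names.

```python
import re
import subprocess
from pathlib import Path

# Matches config_instance<N>.json and captures N.
CONFIG_RE = re.compile(r"config_instance(\d+)\.json")


def order_instance_configs(filenames):
    """Return config_instance<N>.json names sorted by N, requiring N = 1, 2, ..., k."""
    numbered = {}
    for name in filenames:
        match = CONFIG_RE.fullmatch(name)
        if match:
            numbered[int(match.group(1))] = name
    numbers = sorted(numbered)
    if numbers != list(range(1, len(numbers) + 1)):
        raise ValueError(f"instance numbers must be consecutive from 1, got {numbers}")
    return [numbered[n] for n in numbers]


def campaign_commands(filenames, experiment_name, repetitions):
    """Build one measure.py command line per instance, in instance order."""
    return [
        ["python3", "measure.py", "-c", name, "-x", experiment_name, "-r", str(repetitions)]
        for name in order_instance_configs(filenames)
    ]


def run_campaign(run_dir, experiment_name, repetitions):
    """Hypothetical: run every instance sequentially, stopping on the first failure."""
    names = [path.name for path in Path(run_dir).glob("config_instance*.json")]
    for argv in campaign_commands(names, experiment_name, repetitions):
        subprocess.run(argv, cwd=run_dir, check=True)
```

For example, run_campaign(".", "CampaignTest", 2) from the Run directory would call measure.py for config_instance1.json, then config_instance2.json, each with -x CampaignTest -r 2. A gap such as config_instance1.json plus config_instance3.json is rejected before anything runs.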
Step 5. Output
The logs and energy monitoring data will be saved in the directory specified in the JSON configuration.
Step 6. Clean Up
After the experiment:
Exit the host:
exit
Check the job ID:
oarstat -u
Kill the job:
oardel <job_id>
Quickstart
Follow these steps to run an example:
- Reserve 4 hosts (1 server + 3 clients) for 2 hours:
oarsub -I -l host=4,walltime=2
Make sure you are in eflwr/Run/:
cd Run
- Configure: config_instance1.json and config_instance2.json provide two example instance configurations. All fields are preconfigured, except that "output_dir" and "args" must be updated with your own directory settings.
  - config_instance1.json: fedAvg, cifar10, dvfs with min and max freq, 1 round.
  - config_instance2.json: fedAvg2Clients, cifar10, dvfs with min and max freq, 1 round.
- Collect IP
uniq $OAR_NODEFILE > nodelist
python3 collect_ip.py
- Run the Single Instance or Campaign
Run a single instance with config_instance1.json, 2 repetitions:
python3 measure.py -c config_instance1.json -x SingleTest -r 2
Run a campaign with all config_instance*.json files in Run/, 2 repetitions:
python3 run_measure.py -x CampaignTest -r 2
Output Structure
Example output directory:
/Flower_<x>
├── Flower_instance_<instance_name>
│   ├── Expetator
│   │   ├── config_instance*.json
│   ├── Expetator_<host_info>
│   ├── Expetator_<host_info>_power
│   │   ├── <client_logs>
│   ├── Flwr_<timestamp>
│   │   ├── Client_<ip>
│   │   ├── Server_<ip>
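To locate the Flower client and server logs of each run in this tree, a small hypothetical helper can walk the output directory. It assumes only the naming shown above (Flwr_<timestamp> directories containing Client_<ip> and Server_<ip> entries) and does not parse the log contents.

```python
from pathlib import Path


def list_flower_runs(root):
    """Map each Flwr_<timestamp> directory under root to its Client_/Server_ entries.

    Assumes the naming of the output tree above; entries may be files or directories.
    """
    root = Path(root)
    runs = {}
    for run_dir in sorted(root.glob("**/Flwr_*")):
        if run_dir.is_dir():
            runs[str(run_dir.relative_to(root))] = sorted(
                entry.name
                for entry in run_dir.iterdir()
                if entry.name.startswith(("Client_", "Server_"))
            )
    return runs
```

Calling list_flower_runs with the output_dir from your configuration gives one key per Flower run, which makes it easy to pair each run with the Expetator energy logs of the same instance.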
License
This project is licensed under GPLv3.