Measure Energy of Flower FL in G5K

This project provides tools to measure the energy consumption of Flower-based federated learning (FL) experiments on the Grid'5000 (G5K) testbed. It includes scripts to manage distributed nodes, run FL experiments, and monitor energy usage.

Getting Started

The repository includes a Flower example (using TensorFlow) in the Flower_v1 directory and the source of the measurement framework in Run. The example demonstrates how to use the framework to measure energy consumption.

Installation

Clone the repository and navigate to the eflwr directory:

git clone https://gitlab.irit.fr/sepia-pub/delight/eflwr.git
cd eflwr

This framework requires:

  • Python 3.9.2 or higher.
  • Additional dependencies listed in requirements.txt. Install them with:
    pip install -r requirements.txt

Note: requirements.txt includes TensorFlow for running the provided Flower example.

Navigate to the Run directory:

cd Run

Usage

  • FL scripts can be updated in Flower_v1.
  • Configure each experiment instance in Run/config_instance*.json.
  • Follow these steps to configure and run your experiment (or jump to Quickstart to run an example).

Step 1. Reserve the Hosts in G5K

Reserve the required number of hosts (see the G5K documentation for more details):

oarsub -I -l host=[number_of_hosts],walltime=[duration]
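As a rough guide to choosing the host count, the sketch below counts one host for the server plus one per client, which matches the Quickstart reservation (1 server + 3 clients = 4 hosts). The helper names hosts_needed and oarsub_command are hypothetical and not part of this repository, and the one-host-per-role layout is an assumption.

```python
def hosts_needed(config: dict) -> int:
    """One host for the server plus one per client (assumed layout)."""
    return 1 + len(config.get("clients", []))


def oarsub_command(config: dict, walltime_hours: int) -> str:
    """Build the interactive reservation command in the form shown above."""
    return f"oarsub -I -l host={hosts_needed(config)},walltime={walltime_hours}"


# Illustrative configuration with three clients (names are placeholders).
example = {
    "server": {},
    "clients": [{"name": "client1"}, {"name": "client2"}, {"name": "client3"}],
}
print(oarsub_command(example, 2))
```

The printed string can be pasted into the G5K frontend shell; the helper only formats the command and does not submit anything.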

Step 2. Configure

Edit the JSON configuration file (config_instance*.json) to specify the experiment details. You can create multiple config_instance*.json files, where * is the instance number (the numbers must be consecutive positive integers starting from 1).

vim config_instance1.json
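Because the instance numbers must be consecutive and start from 1, a quick check before launching a campaign can save a failed run. The following is a minimal sketch, not part of the repository; the function name numbering_gaps is hypothetical.

```python
import re

# Matches file names such as config_instance1.json and captures the number.
PATTERN = re.compile(r"^config_instance(\d+)\.json$")


def numbering_gaps(filenames):
    """Return the instance numbers missing from 1..max (empty list means OK)."""
    numbers = set()
    for name in filenames:
        match = PATTERN.match(name)
        if match:
            numbers.add(int(match.group(1)))
    if not numbers:
        return []
    return [n for n in range(1, max(numbers) + 1) if n not in numbers]


print(numbering_gaps(["config_instance1.json", "config_instance3.json"]))
```

To check a real directory, the same function can be called with os.listdir("."), run from inside Run.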

Example structure:

{
    "instance": "fedAvg_cifar10",
    "output_dir": "/home/mdo/Huong_DL/Log",
    "dvfs": {
        "dummy": false,
        "baseline": false,
        "frequencies": [2000000,2200000]
    },
    "server": {
        "command": "python3",
        "args": [
            "Flower_v1/server.py",
            "-r 50",
            "-s fedAvg"
        ],
        "ip": "172.16.66.18",
        "port": 8080
    },
    "clients": [
        {
            "name": "client1",
            "command": "python3",
            "args": [
                "Flower_v1/client_1.py",
                "cifar10",
                "1",
                "3"
            ],
            "ip": "172.16.66.2"
        },
        {
            "name": "client2",
            "command": "python3",
            "args": [
                "Flower_v1/client_1.py",
                "cifar10",
                "1",
                "3"
            ],
            "ip": "172.16.66.3"
        }
    ]
}
  • instance: The name of your experiment.
  • output_dir: Where to store the log files (experiment log and energy monitoring log).
  • dvfs: choose only one of the three settings below. When none of them is set, all available frequencies are detected and the experiment goes through each of them.
    • dummy: false or true (true uses only the min and max frequencies)
    • baseline: false or true (true uses only the max frequency)
    • frequencies: null or a list of integers (limits the run to the listed frequencies)
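A configuration file can be sanity-checked before use. The sketch below parses the JSON text, reports missing top-level fields, and flags the case where more than one dvfs setting is active. It is an illustration only: the names active_dvfs_settings and check_config are hypothetical, and treating all five top-level fields as required is an assumption based on the example structure.

```python
import json

# Top-level fields seen in the example structure (assumed to be required).
EXPECTED_FIELDS = ("instance", "output_dir", "dvfs", "server", "clients")


def active_dvfs_settings(dvfs: dict) -> list:
    """List which of the three dvfs settings are switched on."""
    active = []
    if dvfs.get("dummy"):
        active.append("dummy")
    if dvfs.get("baseline"):
        active.append("baseline")
    if dvfs.get("frequencies") is not None:
        active.append("frequencies")
    return active


def check_config(text: str) -> list:
    """Return a list of problems found in the config text (empty means OK)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    problems = [f"missing field: {key}" for key in EXPECTED_FIELDS if key not in config]
    active = active_dvfs_settings(config.get("dvfs", {}))
    if len(active) > 1:
        problems.append(f"more than one dvfs setting is active: {active}")
    return problems
```

A syntax slip such as a missing comma between two client entries is reported as "invalid JSON" together with the position given by the parser.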

Remark: check the available frequencies before using the frequencies option.

  • Set the permissions and disable Turbo Boost first:
bash "$(python3 -c "import expetator, os; print(os.path.join(os.path.dirname(expetator.__file__), 'leverages', 'dvfs_pct.sh'))")" init
  • Run this command to get available frequencies:
python3 get_frequencies.py
  • Update the configuration files with the extracted frequency values.
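Once the available frequencies are known, the configured list can be compared against them. This is a small sketch with a hypothetical helper name; the available values below are made up for illustration and are not the output of get_frequencies.py.

```python
def unsupported_frequencies(configured, available):
    """Return the configured frequencies that the host does not offer."""
    offered = set(available)
    return [freq for freq in configured if freq not in offered]


# Hypothetical values, standing in for what a host might report.
available = [1200000, 2000000, 2200000]
print(unsupported_frequencies([2000000, 2200000], available))  # -> []
print(unsupported_frequencies([2000000, 2500000], available))  # -> [2500000]
```

An empty result means every value in "frequencies" is offered by the host.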

Step 3. Collect IP

Run the following command to generate a node list:

uniq $OAR_NODEFILE > nodelist

Automatically populate missing IP addresses in the JSON file:

python3 collect_ip.py
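After this step, every server and client entry should carry an IP address. The sketch below lists the entries that still lack one. It does not reproduce collect_ip.py; the function name entries_without_ip is hypothetical, and it assumes the field layout from the example structure in Step 2.

```python
def entries_without_ip(config: dict) -> list:
    """Return the names of server/client entries that have no "ip" value."""
    missing = []
    if not config.get("server", {}).get("ip"):
        missing.append("server")
    for index, client in enumerate(config.get("clients", []), start=1):
        if not client.get("ip"):
            # Fall back to a positional label when the entry has no name.
            missing.append(client.get("name", f"client #{index}"))
    return missing


# Illustrative configuration: client2 has no address yet.
sample = {
    "server": {"ip": "172.16.66.18", "port": 8080},
    "clients": [
        {"name": "client1", "ip": "172.16.66.2"},
        {"name": "client2", "ip": ""},
    ],
}
print(entries_without_ip(sample))  # -> ['client2']
```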

Step 4. Run the Campaign or Single Instance

Run a campaign:

python3 run_measure.py -x [experiment_name] -r [repetitions]

Run a single instance:

python3 measure.py -c [config_file] -x [experiment_name] -r [repetitions]
  • [experiment_name]: The name you use to identify your experiment.
  • [repetitions]: Number of repetitions for the experiment.
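When the commands are launched from another script, it can help to build them as argument lists. The sketch below only assembles the two commands shown above, using the -c, -x, and -r options as documented here; the helper names measure_command and campaign_command are hypothetical.

```python
import shlex


def measure_command(config_file: str, experiment_name: str, repetitions: int) -> list:
    """Argument list for running a single instance."""
    return ["python3", "measure.py", "-c", config_file,
            "-x", experiment_name, "-r", str(repetitions)]


def campaign_command(experiment_name: str, repetitions: int) -> list:
    """Argument list for running a campaign."""
    return ["python3", "run_measure.py", "-x", experiment_name, "-r", str(repetitions)]


print(shlex.join(measure_command("config_instance1.json", "SingleTest", 2)))
print(shlex.join(campaign_command("CampaignTest", 2)))
```

Either list can be passed to subprocess.run(..., check=True) from inside the Run directory.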

Step 5. Output

The logs and energy monitoring data will be saved in the directory specified in the JSON configuration.

Step 6. Clean Up

After the experiment:

Exit the host:

exit

Check the job ID:

oarstat -u

Kill the job:

oardel <job_id>

Quickstart

Follow these steps to run an example:

  1. Reserve 4 hosts (1 server + 3 clients) for 2 hours:
oarsub -I -l host=4,walltime=2

Make sure you are in eflwr/Run/:

cd Run
  2. Configure

config_instance1.json and config_instance2.json provide two example instance configurations. All fields are already set, but "output_dir" and the paths in "args" must be updated to match your own directories.

  • config_instance1.json: fedAvg, cifar10, dvfs with min and max freq, 1 round.
  • config_instance2.json: fedAvg2Clients, cifar10, dvfs with min and max freq, 1 round.
  3. Collect IP
uniq $OAR_NODEFILE > nodelist
python3 collect_ip.py
  4. Run the Single Instance or Campaign

Run a single instance with config_instance1.json, 2 repetitions:

python3 measure.py -c config_instance1.json -x SingleTest -r 2

Run a campaign with all config_instance*.json files in Run/, 2 repetitions:

python3 run_measure.py -x CampaignTest -r 2

Output Structure

Example output directory:

/Flower_<x>
├── Flower_instance_<instance_name>
│   ├── Expetator
│   │   ├── config_instance*.json
│   ├── Expetator_<host_info>
│   ├── Expetator_<host_info>_power
│   │   ├── <client_logs>
│   ├── Flwr_<timestamp>
│   │   ├── Client_<ip>
│   │   ├── Server_<ip>
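To locate the Flower logs in such a tree, the entries under each Flwr_<timestamp> directory can be collected per instance. The sketch below follows the layout shown above; the function name flower_log_dirs is hypothetical, and the demo tree it builds in a temporary directory uses made-up names and a made-up timestamp format.

```python
import tempfile
from pathlib import Path


def flower_log_dirs(root) -> dict:
    """Map each Flower_instance_* directory to its Client_*/Server_* entries."""
    result = {}
    for instance_dir in sorted(Path(root).glob("Flower_instance_*")):
        logs = []
        for run_dir in sorted(instance_dir.glob("Flwr_*")):
            logs += sorted(
                entry.name
                for entry in run_dir.iterdir()
                if entry.name.startswith(("Client_", "Server_"))
            )
        result[instance_dir.name] = logs
    return result


# Build a throwaway tree that mimics the layout, then scan it.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "Flower_instance_demo" / "Flwr_20250101"
    (run_dir / "Server_10.0.0.1").mkdir(parents=True)
    (run_dir / "Client_10.0.0.2").mkdir()
    demo = flower_log_dirs(tmp)
print(demo)
```

For a real run, root would be the Flower_<x> directory created under the configured output_dir.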

License

This project is licensed under GPLv3.