
Measure Energy of Flower FL in G5K

This project provides tools to measure the energy consumption of Flower-based federated learning (FL) experiments on the Grid'5000 (G5K) testbed. It includes scripts to manage distributed nodes, run FL experiments, and monitor energy usage.


Getting Started

The repository includes a Flower example (using TensorFlow) in the Flower_v1 directory and the source code of the measurement framework in the Run directory. The example demonstrates how to use the framework to measure energy consumption.

Installation

Clone the repository and navigate to the eflwr directory:

git clone https://gitlab.irit.fr/sepia-pub/delight/eflwr.git
cd eflwr

This framework requires:

  • Python 3.9.2 or higher.
  • Additional dependencies listed in requirements.txt. Install them with:
    pip install -r requirements.txt

Note: To run the provided Flower example, you also need to install tensorflow, tensorflow-datasets, scikit-learn, and numpy.
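The requirements above can be checked before starting an experiment. The following is a minimal sketch, not part of the repository: the helper names python_ok and missing_modules are hypothetical, and the only facts taken from this README are the minimum Python version and the package names (the import names tensorflow_datasets and sklearn correspond to the pip packages tensorflow-datasets and scikit-learn).

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9, 2)  # minimum version stated in this README

# Import names of the optional packages used by the Flower example.
# pip "tensorflow-datasets" imports as "tensorflow_datasets",
# pip "scikit-learn" imports as "sklearn".
EXAMPLE_MODULES = ["tensorflow", "tensorflow_datasets", "sklearn", "numpy"]


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if a (major, minor, micro) tuple meets the minimum."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= tuple(minimum)


def missing_modules(names=EXAMPLE_MODULES):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling python_ok() and missing_modules() with no arguments reports on the current interpreter; an empty list from missing_modules() means the example's extra packages are importable.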

Navigate to the Run directory:

cd Run

Usage

FL framework

The FL scripts (server and client scripts) can be replaced with your own; see the Flower_v1 directory for an example.

Configure instance for CPU

Configure the instances of an experiment in a JSON file with the structure shown below.

  • instances: a map whose keys ("1", "2", ...) are the identifiers of the instances.
  • instance: name of the instance.
  • output_dir: directory where the output files (experiment log and energy monitoring output) are stored.
  • dvfs_cpu: choose exactly one of the three settings below.
    • dummy: test at the minimum and maximum CPU frequencies (false or true).

    • baseline: test at the maximum CPU frequency (false or true).

    • frequencies: restrict testing to the provided list of frequencies (null or a list of integers).

      Remark: check the available frequencies before using the frequencies option.

      • Set the permissions and disable Turbo Boost first:
      bash "$(python3 -c "import expetator, os; print(os.path.join(os.path.dirname(expetator.__file__), 'leverages', 'dvfs_pct.sh'))")" init
      • Run this command to get available frequencies:
      python3 get_frequencies.py
      • Update the configuration files with the extracted frequency values.
  • Structure of the JSON config:
    {
        "instances": {
            "1": {
                "instance": "",
                "output_dir": "",
                "dvfs_cpu": {
                    "dummy": true,
                    "baseline": false,
                    "frequencies": null
                },
                "server": {
                    "command": "python3",
                    "args": [
                    ],
                    "ip": "",
                    "port": 8080
                },
                "clients": [
                    {
                        "name": "client1",
                        "command": "python3",
                        "args": [
                        ],
                        "ip": ""
                    },
                    {...},
                    {...}
                ]
            },
            "2": {
                "instance": "",
                ...
            }
        }
    }
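Because exactly one dvfs_cpu setting should be chosen per instance, it can help to check a configuration before launching a run. The sketch below is an illustration only and is not part of the repository: the function names are hypothetical, and it assumes a setting counts as "chosen" when dummy or baseline is true, or when frequencies is not null.

```python
def active_dvfs_cpu_settings(dvfs_cpu):
    """Return the names of the settings switched on in a dvfs_cpu block."""
    active = []
    if dvfs_cpu.get("dummy") is True:
        active.append("dummy")
    if dvfs_cpu.get("baseline") is True:
        active.append("baseline")
    # Assumption: any non-null value (even an empty list) selects this option.
    if dvfs_cpu.get("frequencies") is not None:
        active.append("frequencies")
    return active


def check_instances(config):
    """Map each instance id to a list of problems (empty list means OK)."""
    problems = {}
    for inst_id, inst in config["instances"].items():
        issues = []
        active = active_dvfs_cpu_settings(inst.get("dvfs_cpu", {}))
        if len(active) != 1:
            issues.append(f"expected exactly one dvfs_cpu setting, got {active}")
        # Top-level keys taken from the structure shown above.
        for key in ("instance", "output_dir", "server", "clients"):
            if key not in inst:
                issues.append(f"missing key: {key}")
        problems[inst_id] = issues
    return problems
```

The config argument is the dictionary obtained by loading the JSON file, for example with json.load.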

Configure instance for GPU

  • The configuration is the same as for CPU, except for the DVFS key: in a GPU config, the key is dvfs_gpu.
    Choose exactly one of the three settings (steps, zoomfrom, and zoomto together form one setting).

    • dummy: test at the minimum and maximum GPU frequencies (false or true).
    • baseline: test at the maximum GPU frequency (false or true).
    • steps: steps to jump within the range/window of frequencies (int).
    • zoomfrom: start frequency of the window.
    • zoomto: end frequency of the window.
    "dvfs_gpu": {
        "dummy": true,
        "baseline": false,
        "steps": 2,
        "zoomfrom": 0,
        "zoomto": 0
    },
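Since a GPU instance differs from a CPU instance only in its DVFS key, one can derive one from the other. The following sketch is a hypothetical helper (to_gpu_instance is not part of the repository) that copies a CPU instance dictionary and swaps dvfs_cpu for dvfs_gpu; it does not interpret the values of steps, zoomfrom, or zoomto.

```python
import copy


def to_gpu_instance(cpu_instance, dvfs_gpu):
    """Return a copy of a CPU instance with dvfs_cpu replaced by dvfs_gpu."""
    gpu_instance = copy.deepcopy(cpu_instance)  # leave the input untouched
    gpu_instance.pop("dvfs_cpu", None)
    gpu_instance["dvfs_gpu"] = dict(dvfs_gpu)
    return gpu_instance


# Values copied from the dvfs_gpu example above.
EXAMPLE_DVFS_GPU = {
    "dummy": True,
    "baseline": False,
    "steps": 2,
    "zoomfrom": 0,
    "zoomto": 0,
}
```

The deep copy keeps the original CPU instance usable, so both variants can live in separate configuration files.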

Run the experiment

There are two options: run a single instance or run all instances (a campaign).

Run a single instance:

python3 measure_instance.py -c [config_file] -i [instance] -x [experiment_name] -r [repetitions]
  • [config_file]: The instances configuration file.
  • [instance]: The identifier of the single instance to run.
  • [experiment_name]: The name you use to identify your experiment.
  • [repetitions]: Number of repetitions for the experiment.

Run a campaign:

python3 measure_campaign.py -x [experiment_name] -c [config_file] -r [repetitions]

When running a campaign, all instances defined in [config_file] are used.
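When launching runs from another Python script, the two command lines above can be assembled as argument lists. This is a minimal sketch using only the flags documented in this section; the function names instance_command and campaign_command are hypothetical, and the scripts are assumed to be invoked from the Run directory.

```python
import shlex


def instance_command(config_file, instance, experiment_name, repetitions):
    """Build the argument list for a single-instance run."""
    return [
        "python3", "measure_instance.py",
        "-c", str(config_file),
        "-i", str(instance),
        "-x", str(experiment_name),
        "-r", str(repetitions),
    ]


def campaign_command(config_file, experiment_name, repetitions):
    """Build the argument list for a campaign over all instances."""
    return [
        "python3", "measure_campaign.py",
        "-x", str(experiment_name),
        "-c", str(config_file),
        "-r", str(repetitions),
    ]


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell line."""
    return shlex.join(cmd)
```

An argument list can be passed directly to subprocess.run(cmd, check=True); keeping it as a list avoids shell quoting issues with experiment names or paths that contain spaces.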

Quickstart

Step 1. Reserve the Hosts in G5K

Reserve the required number of hosts (see the G5K documentation for more details).
For example:

Reserve 4 hosts (CPU) (1 server + 3 clients) for 2 hours:

oarsub -I -l host=4,walltime=2

Reserve 4 hosts (GPU) (1 server + 3 clients) for 2 hours:

oarsub -I -t exotic -p "gpu_count>0" -l {"cluster='drac'"}/host=4,walltime=2 # grenoble

Make sure you are in eflwr/Run/:

cd Run

Step 2. Configure

Two JSON configuration files are provided (e.g. config_instances.json for CPU and config_instances_1.json for GPU) to specify the experiment details; each includes one or more instances.

cat config_instances.json

For example, config_instances.json provides two example instance configurations. All fields are already set except "output_dir" and "args", which must be updated to match your directory layout.

  • instance "1": fedAvg, cifar10, dvfs with min and max CPU freq, 1 round.
  • instance "2": fedAvg2Clients, cifar10, dvfs with min and max CPU freq, 1 round.

Step 3. Collect IP

Run the following command to collect/generate a node list:

uniq $OAR_NODEFILE > nodelist

Automatically populate missing IP addresses in the JSON file:

python3 collect_ip.py -n nodelist -c config_instances.json
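To give an idea of what populating the missing addresses involves, here is a rough sketch. It is not the implementation of collect_ip.py: the function names are hypothetical, and the assignment order (first node to the server, following nodes to the clients, the same nodes reused for every instance) is an assumption made for illustration. Fields that already hold a value are left unchanged.

```python
from itertools import groupby


def read_nodelist_lines(lines):
    """Collapse adjacent duplicate lines, as `uniq` does, skipping blanks."""
    stripped = [line.strip() for line in lines if line.strip()]
    return [key for key, _ in groupby(stripped)]


def fill_missing_ips(config, addresses):
    """Fill empty "ip" fields in place and return the config.

    Assumption: within each instance, nodes are assigned positionally,
    server first and then clients in listed order.
    """
    for inst in config["instances"].values():
        pool = iter(addresses)  # every instance starts from the first node
        targets = [inst["server"]] + list(inst["clients"])
        for target in targets:
            addr = next(pool, None)
            if addr is None:
                raise ValueError("not enough nodes for this instance")
            if not target.get("ip"):  # keep values that are already set
                target["ip"] = addr
    return config
```

Note that $OAR_NODEFILE lists one line per reserved core, which is why the node list is deduplicated first.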

Step 4. Run the Campaign or Single Instance

Run a single instance (instance 1) with 2 repetitions:

python3 measure_instance.py -c config_instances.json -i 1 -x SingleTest -r 2

Run a campaign with all instances (1 and 2) and 2 repetitions:

python3 measure_campaign.py -x CampaignTest -c config_instances.json -r 2

Step 5. Output

The logs and energy monitoring data will be saved in the directory specified in the JSON configuration.

Output dir structure:

/Flower_<x>
├── Flower_instance_<instance_name>
│   ├── Expetator
│   │   ├── config_instance*.json
│   ├── Expetator_<host_info>_<timestamp>_mojitos: mojitos outputs
│   ├── Expetator_<host_info>_<timestamp>_power: wattmeter outputs
│   ├── Expetator_<host_info>_<timestamp>: measurement log
│   ├── Flwr_<timestamp>: Flower log
│   │   ├── Client_<ip>
│   │   ├── Server_<ip>
│   ├── Flwr_<timestamp>
│   │   ├── Client_<ip>
│   │   ├── Server_<ip>
├── Flower_instance_<instance_name>
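The layout above can be walked programmatically to locate the results of each instance. The following sketch relies only on the name prefixes and suffixes shown in the tree; summarize_outputs is a hypothetical helper, and it makes no assumption about whether the mojitos and power outputs are files or directories.

```python
from pathlib import Path


def summarize_outputs(experiment_dir):
    """Group result names per Flower_instance_* directory."""
    summary = {}
    for inst_dir in sorted(Path(experiment_dir).glob("Flower_instance_*")):
        if not inst_dir.is_dir():
            continue
        summary[inst_dir.name] = {
            # Energy monitoring outputs, matched by their suffix.
            "mojitos": sorted(p.name for p in inst_dir.glob("Expetator_*_mojitos")),
            "power": sorted(p.name for p in inst_dir.glob("Expetator_*_power")),
            # One Flwr_<timestamp> entry per repetition's Flower log.
            "flower_logs": sorted(p.name for p in inst_dir.glob("Flwr_*")),
        }
    return summary
```

Pass the Flower_<x> directory of one experiment; the returned dictionary is keyed by instance directory name.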

Step 6. Clean Up

After the experiment, exit the host and delete the job if needed:

exit
oardel <job_id>

License

This project is licensed under GPLv3.