diff --git a/README.md b/README.md
index da91c6c200d83e540b55b5a5490ac4e4521e0df5..5466e6b995b8cad51804161c74b9d2855217e474 100644
--- a/README.md
+++ b/README.md
@@ -1,144 +1,229 @@
 # Measure Energy of Flower FL in G5K
+This project provides tools to measure the energy consumption of Flower-based federated learning (FL) experiments on the Grid'5000 (G5K) testbed. It includes scripts to manage distributed nodes, run FL experiments, and monitor energy usage.
+
+## Table of Contents
+
+- [Getting Started](#getting-started)
+- [Installation](#installation)
+- [Usage](#usage)
+  - [Step 1. Reserve the Hosts in G5K](#step-1-reserve-the-hosts-in-g5k)
+  - [Step 2. Configure](#step-2-configure)
+  - [Step 3. Collect IP](#step-3-collect-ip)
+  - [Step 4. Run the Campaign or Single Instance](#step-4-run-the-campaign-or-single-instance)
+  - [Step 5. Output](#step-5-output)
+  - [Step 6. Clean Up](#step-6-clean-up)
+- [Quickstart](#quickstart)
+- [Output Structure](#output-structure)
+- [License](#license)
+
 ## Getting Started
-These instructions will let you know how to run it.
+The repository includes an example Flower application (using TensorFlow) in the `Flower_v1` directory and the source of the measurement framework in the `Run` directory. The example shows how to use the framework to measure energy consumption.
-An example of Flower (using tensorflow) is stored in Flower_v1. The test example will be use this source.
+## Installation
-### Prerequisites
+This framework requires:
+- **Python 3.9.2** or higher.
+- Additional dependencies listed in `requirements.txt`. Install them with:
+  ```bash
+  pip install -r requirements.txt
+  ```
+*Note:* `requirements.txt` includes TensorFlow, which the provided Flower example needs.
-This framework requires **Python 3.9.2** or higher. Check your Python package version:
-```bash
-python3 --version
-```
-Other dependencies are stored in *requirements.txt*.
Run:
-```bash
-pip install -r requirements.txt
-```
-This *requirements.txt* includes tensorflow to run the provided example Flwr.

-### Installing
-Download code:
+
+Clone the repository and navigate to the `Run` directory:
 ```bash
 git clone https://gitlab.irit.fr/sepia-pub/delight/eflwr.git
-```
-
-Go to Run directory:
-```bash
 cd eflwr/Run
 ```
-## Running the tests
-
-**Remark:** EFLWR is configured for Flower only. Your Flower framework must contain server and client script to run on distributed nodes. The Flower_v1 directory provides an example. Once configured, EFLWR will automatically execute flower on the specified nodes and measure the energy consumption.
-
-### Step 1. Configure
-Configure the information of each exp in `config_instance*.json`, you can add more `config_instance*.json` with * is a number. Use your editor instead of *vim* (*nano* etc). See the examples in 2 config files which provied in Run directory.
-```bash
-vim config_instance1.json
-```
-```
-"instance": the name of your exp, anything you want, I suggest to input the specific name of your testing.
+## Usage

-"output_dir": where you want to store your log (your exp log and energy mornitoring log)
+- FL scripts can be modified in `Flower_v1`.
+- Configure each experiment instance in `Run/config_instance*.json`.
+- Follow these steps to configure and run your experiment (or jump to [Quickstart](#quickstart) to run an example).

-"server":
-    "command": which cmd sever use, my case is python3.
-    "args": includes file to run and arguments
-    "ip": address of server, this one will be automatic input when you run the code, just leave it blank
-    "port": default 8080m dont change it.

+### Step 1. Reserve the Hosts in G5K

-"clients":
-    "name": numbering/name your client, should be client1.2.3...
-    "command": which cmd client use, my case is python3
-    "args": includes file to run and arguments
-    "ip": address of client, this one will be automatic input when you run the code, just leave it blank.
+Reserve the required number of hosts: one for the server plus one per client, so *x* clients need *x*+1 hosts (*see the [G5K documentation](https://www.grid5000.fr/w/Getting_Started#Reserving_resources_with_OAR:_the_basics) for more details*):
+```bash
+oarsub -I -l host=[number_of_hosts],walltime=[duration]
 ```
-Note that if you create *x* clients in json config then you have to researve *x+1* hosts in g5k (1 for server and *x* for *x* clients).
+### Step 2. Configure
+Edit the JSON configuration files (`config_instance*.json`) to specify the experiment details. You can create several `config_instance*.json` files, where `*` is the instance number (the numbers must be consecutive positive integers starting from 1).

-### Step 2. Reserve the hosts in g5k:
-Run below cmd:
 ```bash
-oarsub -I -l host=<number_of_hosts>,walltime=<number_of_during>
+vim config_instance1.json
 ```

-For example: 1 sever and 3 clients need 4 hosts.
-```bash
-oarsub -I -l host=4,walltime=2
-```
+Example structure:
+
+```json
+{
+  "instance": "fedAvg_cifar10",
+  "output_dir": "/home/mdo/Huong_DL/Log",
+  "dvfs": {
+    "dummy": false,
+    "baseline": false,
+    "frequencies": [2000000,2200000]
+  },
+  "server": {
+    "command": "python3",
+    "args": [
+      "Flower_v1/server.py",
+      "-r 50",
+      "-s fedAvg"
+    ],
+    "ip": "172.16.66.18",
+    "port": 8080
+  },
+  "clients": [
+    {
+      "name": "client1",
+      "command": "python3",
+      "args": [
+        "Flower_v1/client_1.py",
+        "cifar10",
+        "1",
+        "3"
+      ],
+      "ip": "172.16.66.2"
+    },
+    {
+      "name": "client2",
+      "command": "python3",
+      "args": [
+        "Flower_v1/client_1.py",
+        "cifar10",
+        "1",
+        "3"
+      ],
+      "ip": "172.16.66.3"
+    }
+  ]
+}
+```
+
+- **instance**: The name of your experiment.
+- **output_dir**: Where to store the log files (experiment log and energy monitoring log).
+- **dvfs**: Enable at most one of the three settings below. If none is enabled, the framework detects all available frequencies and runs the experiment at each of them.
+  - `dummy`: `true` or `false` (uses only the minimum and maximum frequencies)
+  - `baseline`: `true` or `false` (uses only the maximum frequency)
+  - `frequencies`: `null` or a list of integers (restricts the runs to the listed frequencies)
+
+**Remark:** Check the available frequencies before using the `frequencies` option.
+
+- Set the permissions and disable Turbo Boost first:
+```bash
+bash "$(python3 -c "import expetator, os; print(os.path.join(os.path.dirname(expetator.__file__), 'leverages', 'dvfs_pct.sh'))")" init
+```
+- Run this command to list the available frequencies:
+```bash
+python3 get_frequencies.py
+```
+- Copy the extracted frequency values into your configuration files.

 ### Step 3. Collect IP
-Run below cmd:
+
+Run the following command to generate a node list:
 ```bash
 uniq $OAR_NODEFILE > nodelist
 ```
-Then automatically fill out missing IP addresses in `json`:
+Then automatically fill in the missing IP addresses in the JSON configuration files:
 ```bash
 python3 collect_ip.py
 ```
-### Step 4. Run the campaign or instance:
+### Step 4. Run the Campaign or Single Instance

-Now you can run the monitoring campaign by run_measure.py. Note that, this function will scan all the json file in the Run directory with the name "config_instance*.json".
+Run a campaign (all `config_instance*.json` files in the `Run` directory):
 ```bash
-python3 run_measure.py -x <your_str> -r <number_of_repetation>
+python3 run_measure.py -x [experiment_name] -r [repetitions]
 ```
-
-For example:
+Run a single instance (`[config_file]` is one of the `config_instance*.json` files):
 ```bash
-python3 run_measure.py -x IamPretty -r 2
+python3 measure.py -c [config_file] -x [experiment_name] -r [repetitions]
 ```
-<your_str>: whatever_you_want to recorgnize your testing.
-<number_of_repetation>: number of repeatation of your exp
+- **[experiment_name]**: The name you use to identify your experiment.
+- **[repetitions]**: Number of repetitions of the experiment.

-In case you only need to run 1 instance, you can use measure.py instead:
-```bash
-python3 measure.py -c <config_instance*.json> -x <your_str> -r <number_of_repetation>
-```
+### Step 5.
Output
+
+The experiment logs and energy monitoring data are saved in the directory set by `output_dir` in the JSON configuration (see [Output Structure](#output-structure)).
+
+### Step 6. Clean Up

-For example:
+After the experiment, release the G5K resources:
+
+Exit the host:
+  ```bash
+  exit
+  ```
+
+Check the job ID:
+  ```bash
+  oarstat -u
+  ```
+
+Kill the job:
+  ```bash
+  oardel <job_id>
+  ```
+
+## Quickstart
+
+Follow these steps to run an example:
+
+1. Reserve 4 hosts (1 server + 3 clients) for 2 hours:
 ```bash
-python3 measure.py -c config_instance1.json -x IamPretty -r 2
+oarsub -I -l host=4,walltime=2
 ```
+2. Configure

-### Step 6. Output
-Check the output in the directory where you set in json file. The output structure:
-```plaintext
-/Flower_Test1
-├── Flower_instance_fedAvg_cifar10
-│   ├── Expetator
-│   ├── Expetator_gros-26.nancy.grid5000.fr_1732808824
-│   ├── Expetator_gros-26.nancy.grid5000.fr_1732808824_mojitos
-│   │   ├── gros-26.nancy.grid5000.fr_flower_1732808838
-│   │   ├── gros-38.nancy.grid5000.fr_flower_1732808838
-│   │   ├── gros-4.nancy.grid5000.fr_flower_1732808838
-│   │   └── gros-65.nancy.grid5000.fr_flower_1732808838
-│   ├── Expetator_gros-26.nancy.grid5000.fr_1732808824_power
-│   │   └── gros-26.nancy.grid5000.fr_flower_1732808838
-│   ├── Flwr_20241128_164718
-│   │   ├── Client_172.16.66.38
-│   │   ├── Client_172.16.66.4
-│   │   ├── Client_172.16.66.65
-│   │   ├── flower_log_summary.txt
-│   │   └── Server_172.16.66.26
-```
+`config_instance1.json` and `config_instance2.json` are two example instance configurations. All fields are preset, except that `output_dir` and the script paths in `args` must be updated to match your own directories.
+- `config_instance1.json`: fedAvg, cifar10, DVFS with only the minimum and maximum frequencies (`dummy`), 1 round.
+- `config_instance2.json`: fedAvg2Clients, cifar10, DVFS with only the minimum and maximum frequencies (`dummy`), 1 round.

-### Step 7. Kill job in g5k after finish
-Exit the host:
+3. Collect IP
+
 ```bash
-exit
+uniq $OAR_NODEFILE > nodelist
+python3 collect_ip.py
 ```
-Check the job id:
+
+4.
Run the Single Instance or Campaign
+
+Run a single instance using `Run/config_instance1.json`, with 2 repetitions:
 ```bash
-oarstat -u
+python3 measure.py -c config_instance1.json -x SingleTest -r 2
 ```
-Kill the job:
+Run a campaign over all `config_instance*.json` files in `Run`, with 2 repetitions:
 ```bash
-oardel <job_id>
+python3 run_measure.py -x CampaignTest -r 2
 ```

+## Output Structure
+
+Example output directory:
+
+```plaintext
+/Flower_<x>
+├── Flower_instance_<instance_name>
+│   ├── Expetator
+│   │   ├── config_instance*.json
+│   ├── Expetator_<host_info>
+│   ├── Expetator_<host_info>_power
+│   │   ├── <client_logs>
+│   ├── Flwr_<timestamp>
+│   │   ├── Client_<ip>
+│   │   ├── Server_<ip>
+```
+
+## License
+
+This project is licensed under the GNU General Public License v3.0 (GPLv3).
\ No newline at end of file
diff --git a/Run/config_instance1.json b/Run/config_instance1.json
index d69ccd7ea651d16bd82bbd3a17b5eb548c0dc50e..a26dbc8d760905ee6a7de5ac1bc2b6230d7ef638 100644
--- a/Run/config_instance1.json
+++ b/Run/config_instance1.json
@@ -2,20 +2,15 @@
     "instance": "fedAvg_cifar10",
     "output_dir": "/home/mdo/Framework/eflwr/Log",
     "dvfs": {
-        "dummy": false,
+        "dummy": true,
         "baseline": false,
-        "frequencies": [
-            1000000,
-            1400000,
-            1800000,
-            2200000
-        ]
+        "frequencies": null
     },
     "server": {
         "command": "python3",
         "args": [
             "/home/mdo/Framework/eflwr/Flower_v1/server_1.py",
-            "-r 25",
+            "-r 1",
             "-s fedAvg"
         ],
         "additional_env_var": [
diff --git a/Run/config_instance2.json b/Run/config_instance2.json
index fdc92f69c487b2a6a1b17e70941d78fda52dfd89..ce0122cddbec50bf9194619d6a2e5854904b33b8 100644
--- a/Run/config_instance2.json
+++ b/Run/config_instance2.json
@@ -2,20 +2,15 @@
     "instance": "fedAvg2Clients_cifar10",
     "output_dir": "/home/mdo/Framework/eflwr/Log",
     "dvfs": {
-        "dummy": false,
+        "dummy": true,
         "baseline": false,
-        "frequencies": [
-            1000000,
-            1400000,
-            1800000,
-            2200000
-        ]
+        "frequencies": null
     },
     "server": {
         "command": "python3",
         "args": [
             "/home/mdo/Framework/eflwr/Flower_v1/server_1.py",
-            "-r 25",
+            "-r 1",
             "-s fedAvg2Clients"
         ],
         "additional_env_var": [