From 75b3b627245eae93e7a06f5fc019431ee6d80efc Mon Sep 17 00:00:00 2001 From: huongdm1896 <domaihuong1451997@gmail.com> Date: Sat, 3 May 2025 01:52:27 +0200 Subject: [PATCH] Add GPU instructions --- GPU_cuda.md | 131 ++++++++++++++++++++++++++++++++++++++++++++ README.md | 21 +++++-- requirement_GPU.txt | 8 +++ 3 files changed, 155 insertions(+), 5 deletions(-) create mode 100644 GPU_cuda.md create mode 100644 requirement_GPU.txt diff --git a/GPU_cuda.md b/GPU_cuda.md new file mode 100644 index 0000000..180e188 --- /dev/null +++ b/GPU_cuda.md @@ -0,0 +1,131 @@ +# CUDA Installation and Configuration Guide with TensorFlow on NVIDIA GPU (G5K) +(This guide has been tested on chifflot - Lille) +## 1. Check GPU Status + +```bash +nvidia-smi +``` + +The result will look like this: + +``` +Fri Apr 25 14:53:18 2025 ++---------------------------------------------------------------------------------------+ +| NVIDIA-SMI 535.183.06 Driver Version: 535.183.06 CUDA Version: 12.2 | +|-----------------------------------------+----------------------+----------------------+ +| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | +| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | +| | | MIG M. 
|
+|=========================================+======================+======================|
+|   0  Tesla V100-PCIE-32GB           On  | 00000000:3B:00.0 Off |                    0 |
+| N/A   33C    P0              26W / 250W |      0MiB / 32768MiB |      0%      Default |
+|                                         |                      |                  N/A |
++-----------------------------------------+----------------------+----------------------+
+|   1  Tesla V100-PCIE-32GB           On  | 00000000:D8:00.0 Off |                    0 |
+| N/A   30C    P0              27W / 250W |      0MiB / 32768MiB |      0%      Default |
+|                                         |                      |                  N/A |
++-----------------------------------------+----------------------+----------------------+
++---------------------------------------------------------------------------------------+
+| Processes:                                                                            |
+|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
+|        ID   ID                                                             Usage      |
+|=======================================================================================|
+|  No running processes found                                                           |
++---------------------------------------------------------------------------------------+
+```
+
+## 2. Check Current CUDA Version
+
+```bash
+nvcc --version
+```
+
+The result will look like this:
+
+```
+nvcc: NVIDIA (R) Cuda compiler driver
+Copyright (c) 2005-2021 NVIDIA Corporation
+Built on Sun_Feb_14_21:12:58_PST_2021
+Cuda compilation tools, release 11.2, V11.2.152
+Build cuda_11.2.r11.2/compiler.29618528_0
+```
+
+### Note:
+You need to check which CUDA versions the GPU supports and which version is currently in use.
+If you need a different CUDA version, you will have to switch to it, as described in the next steps.
+
+## 3. Check Available CUDA Versions with `module`
+
+```bash
+module av cuda
+```
+
+The result will look like this:
+
+```
+-------------------- /grid5000/spack/v1/share/spack/modules/linux-debian11-x86_64_v2 --------------------
+   cuda/11.4.0_gcc-10.4.0    cuda/11.8.0_gcc-10.4.0    cuda/12.2.1_gcc-10.4.0 (D)
+   cuda/11.6.2_gcc-10.4.0    cuda/12.0.0_gcc-10.4.0    mpich/4.1_gcc-10.4.0-ofi-cuda
+   cuda/11.7.1_gcc-10.4.0    cuda/12.1.1_gcc-10.4.0    mpich/4.1_gcc-10.4.0-ucx-cuda
+```
+
+## 4. 
Load the Desired CUDA Version
+
+To load CUDA 12.2.1 (the version the GPU driver reports), use the following command:
+
+```bash
+module load cuda/12.2.1_gcc-10.4.0
+```
+
+The default CUDA version will be replaced by CUDA 12.2.1.
+
+## 5. Set CUDA Environment Variables (If Needed)
+
+Set up the environment variables to use the newly loaded CUDA version:
+
+```bash
+export PATH=/usr/local/cuda-12.2.1/bin:$PATH
+export LD_LIBRARY_PATH=/usr/local/cuda-12.2.1/lib64:$LD_LIBRARY_PATH
+```
+
+## 6. Check CUDA Version Again
+
+After loading the new CUDA version, check the version again:
+
+```bash
+nvcc --version
+```
+
+The result will show:
+
+```
+nvcc: NVIDIA (R) Cuda compiler driver
+Copyright (c) 2005-2023 NVIDIA Corporation
+Built on Tue_Jul_11_02:20:44_PDT_2023
+Cuda compilation tools, release 12.2, V12.2.128
+Build cuda_12.2.r12.2/compiler.33053471_0
+```
+
+## 7. Install TensorFlow with CUDA Support
+
+```bash
+python3 -m pip install 'tensorflow[and-cuda]'
+```
+
+## 8. Verify Available GPUs in TensorFlow
+
+Verify that TensorFlow recognizes the available GPUs:
+
+```bash
+python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
+```
+
+The result will show a list of available GPUs:
+
+```
+[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
+```
+
+---
+
+Congratulations, you have successfully installed and configured TensorFlow with GPU support on G5K! (Or not? :stuck_out_tongue_winking_eye:)
\ No newline at end of file
diff --git a/README.md b/README.md
index 818f918..1facfef 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,6 @@ This framework requires:
 ```bash
 pip install -r requirements.txt
 ```
-*Note:* `requirements.txt` includes `tensorflow`, `tensorflow-datasets` `scikit-learn` and `numpy` using for the provided Flower example.
 
 Navigate to `Run` directory:
 
@@ -198,7 +197,7 @@ Choose only one in 3 settings:
 
 ## Quickstart
 
-### Step 1. 
Reserve the Hosts in G5K
+### Step 0. Reserve the Hosts in G5K
 Reserve the required number of hosts (*See the [document of G5K](https://www.grid5000.fr/w/Getting_Started#Reserving_resources_with_OAR:_the_basics) for more details*)
 <u>For example</u>:
@@ -209,17 +208,29 @@
 oarsub -I -l host=4,walltime=2
 ```
 Reserve 4 hosts (GPU) (1 server + 3 clients) for 2 hours:
 ```bash
-oarsub -I -t exotic -p "gpu_count>0" -l {"cluster='drac'"}/host=4 # grenoble
-oarsub -I -p "gpu_count>0" -l {"cluster='chifflot'"}/host=4 # lille
+oarsub -I -p "gpu_count>0" -l {"cluster='chifflot'"}/host=4,walltime=2 # lille
 ```
-**Remark**: for now only 2 clusters, `chifflot` in Lille and `drac` in Grenoble are available for testing in more than 3 GPU nodes, maximum is 8 (`chifflot`) or 12 (`drac`) nodes.
+**Remark**: for now only one cluster, `chifflot` in Lille, is available for testing with more than 3 GPU nodes (and supports the required setup); the maximum is 8 nodes.
+You need to configure CUDA before using the GPUs; check out the quick guide [here](./GPU_cuda.md) or the [G5K website](https://www.grid5000.fr/w/GPUs_on_Grid5000).
 
 Make sure you are in `eflwr/Run/`:
 ```bash
 cd Run
 ```
+### Step 1. Install requirements
+
+If you use CPU nodes:
+ ```bash
+ pip install -r requirements.txt # needed for the provided Flower example
+ ```
+
+If you use GPU nodes:
+ ```bash
+ pip install -r requirement_GPU.txt # needed for the provided Flower example
+ ```
+*Note:* the requirements include `tensorflow` (or `tensorflow[and-cuda]` for GPU), `tensorflow-datasets`, `scikit-learn`, and `numpy`, used by the provided Flower example.
+
 ### Step 2. Configure
 Two JSON configuration files (e.g. `config_instances_CPU.json` for CPU and `config_instances_GPU.json` for GPU) specify the experiment details; each includes one or more instances. 
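After Step 1's `pip install`, it can be worth confirming that the pinned packages actually landed before launching an experiment. The sketch below is an illustration, not part of the patch: it uses only the standard library's `importlib.metadata`, and the package names are those pinned in `requirement_GPU.txt`.

```python
# Sketch: report installed versions of the packages pinned for the GPU setup.
# Package names follow requirement_GPU.txt; None means "not installed".
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    out = {}
    for name in names:
        try:
            out[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            out[name] = None  # missing: the pip install step likely failed
    return out


if __name__ == "__main__":
    pins = ["flwr", "flwr-datasets", "tensorflow", "tensorflow-datasets",
            "scikit-learn", "numpy"]
    for pkg, ver in installed_versions(pins).items():
        print(f"{pkg}: {ver or 'MISSING'}")
```

Running this on the reserved node immediately shows which of the CPU or GPU requirement sets is in effect.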
diff --git a/requirement_GPU.txt b/requirement_GPU.txt new file mode 100644 index 0000000..bbf1a72 --- /dev/null +++ b/requirement_GPU.txt @@ -0,0 +1,8 @@ +flwr==1.13.0 +flwr-datasets==0.4.0 +expetator==0.3.25 +tensorflow[and-cuda]>=2.16.1,<2.17.0 +tensorflow-datasets==4.4.0 +tensorboard>=2.16.2,<2.17.0 +scikit-learn==1.1.3 +numpy>=1.23.0,<1.24.0 -- GitLab
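The `export` lines in step 5 of GPU_cuda.md are easy to get subtly wrong (a missing `bin` or `lib64`, or a mistyped prefix). As an illustration, not part of the patch, this Python sketch checks whether a given CUDA root appears on `PATH` and `LD_LIBRARY_PATH`; the default root is the path assumed in step 5, so adjust it to wherever your CUDA module actually lives.

```python
# Sketch: verify that a CUDA root's bin/ and lib64/ directories are on the
# environment variables set in step 5. The default path is an assumption
# taken from the guide, not a guaranteed install location.
import os


def cuda_on_path(cuda_root="/usr/local/cuda-12.2.1", env=None):
    """Return {var: bool} indicating whether each variable contains the CUDA dir."""
    env = os.environ if env is None else env
    checks = {
        "PATH": os.path.join(cuda_root, "bin"),
        "LD_LIBRARY_PATH": os.path.join(cuda_root, "lib64"),
    }
    return {
        var: wanted in env.get(var, "").split(":")
        for var, wanted in checks.items()
    }


# Example with a synthetic environment, so the result is deterministic:
fake_env = {
    "PATH": "/usr/local/cuda-12.2.1/bin:/usr/bin",
    "LD_LIBRARY_PATH": "/usr/local/cuda-12.2.1/lib64",
}
print(cuda_on_path(env=fake_env))  # {'PATH': True, 'LD_LIBRARY_PATH': True}
```

Calling `cuda_on_path()` with no arguments inspects the real environment, which is useful right after running the exports.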