Unverified commit 5f30ad9d, authored by Michael, committed by GitHub

Merge pull request #9 from leahcimali/Occidata

Occidata Conflict with main
parents 283a5bb1 7debfc97
Showing 1282 additions and 479 deletions
.gitignore (merged version, old and new columns of the side-by-side diff combined):

```
*.py~
results/*
.vscode/*
src/__pycache__/*
info.log
launch.json
backup_results/*
*.sh
tests/__pycache__/*
datasets/*
pub/*
data/*
*.tgz
*.pyc
*.ipynb
```
------------------------------------------------------------
SOFTWARE EVALUATION LICENSE
IRIT computer science research laboratory, Toulouse, France.
------------------------------------------------------------
Definitions
SOFTWARE: The "Comparative-Evaluation-of-Clustered-Federated-Learning-Methods" software, in source-code form, written by leahcimali and omar-rifai at the IRIT computer science research laboratory.
LICENSOR: L’UNIVERSITE TOULOUSE III - PAUL SABATIER, a public scientific, cultural and professional establishment, having SIRET No. 193 113 842 00010, APE code 8542 Z, having its registered office at 118, route de Narbonne, 31062 Toulouse Cedex 9, France, acting in its own name and on its own behalf, and on behalf of l'Institut de Recherche en Informatique de Toulouse (IRIT), UMR N°5055.
1. INTENT/PURPOSE
This agreement sets out the conditions under which the LICENSOR, who holds the rights to the SOFTWARE, grants the LICENSEE a license for research and evaluation purposes only, excluding any commercial use.
2. LICENSEE
Any person or organization who receives the SOFTWARE with a copy of this license.
3. RIGHTS GRANTED
The rights to use and copy the SOFTWARE, subject to the restrictions described in this agreement.
The rights to modify and compile the SOFTWARE when it is provided in source code form, subject to the restrictions described in this agreement.
For the SOFTWARE provided in binary form only, LICENSEE undertakes to not decompile, disassemble, decrypt, extract the components or perform reverse engineering except to the extent expressly provided by law.
5. SCOPE OF THE LICENSE
- NON-COMMERCIAL license for research and evaluation purposes ONLY.
- NO right to commercialize the SOFTWARE, or any derivative work, without separate agreement with the LICENSOR.
6. MODIFICATION OF THE SOFTWARE
License permits LICENSEE to modify the SOFTWARE provided in source code form for research and evaluation purposes only.
7. REDISTRIBUTION
- License permits LICENSEE to redistribute verbatim copies of the SOFTWARE, accompanied with a copy of this license.
- License DOES NOT permit LICENSEE to redistribute modified versions of the SOFTWARE provided in source code form.
- License DOES NOT permit LICENSEE to commercialize the SOFTWARE or any derivative work of the SOFTWARE.
8. FEE/ROYALTY
- LICENSEE pays no royalty for this license.
- LICENSEE and any third parties must enter into a new agreement for any use beyond the scope of this license. Please contact the IRIT technology transfer office (email numerique@toulouse-tech-transfer.com) for further information.
9. NO WARRANTY
The SOFTWARE is provided "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the quality and performance of the program is with the LICENSEE.
10. NO LIABILITY
In no event unless required by applicable law or agreed to in writing will any copyright owner be liable to LICENSEE for damages, including any general, special, incidental or consequential damages arising out of the use or inability to use the program (including but not limited to loss of data or data being rendered inaccurate or losses sustained by LICENSEE or third parties or a failure of the program to operate with other programs), even if such holder has been advised of the possibility of such damages.
#### Code for the paper: *Comparative Evaluation of Clustered Federated Learning Methods*
##### Submitted to 'The 2nd IEEE International Conference on Federated Learning Technologies and Applications (FLTA24), VALENCIA, SPAIN'
1. To reproduce the results in the paper, run `driver.py` with the parameters in `exp_configs.csv`
2. Each experiment will output a `.csv` file with the resulting metrics
3. Histogram plots and a summary table of the various experiments can be obtained by running `src/utils_results.py`
To use driver.py, supply the following parameters:
`python driver.py --exp_type --dataset --nn_model --heterogeneity_type --num_clients --num_samples_by_label --num_clusters --centralized_epochs --federated_rounds --seed`
To run all experiments in exp_configs.csv, use `run_exp.py`.
Once all experiments are done, run `src/utils_results.py` to get the results.
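For example, a single run with one parameter set from the experiment grid (values taken from one row of the former configuration file; `--nn_model linear` assumes the default named in the option help) could look like:
`python driver.py --exp_type client --dataset mnist --nn_model linear --heterogeneity_type concept-shift-on-features --num_clients 48 --num_samples_by_label 100 --num_clusters 4 --centralized_epochs 10 --federated_rounds 20 --seed 42`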
%% Cell type:code id:c1649e65-6fb0-4af7-8ecd-d94f44511d9d tags:
``` python
import tarfile
import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, TensorDataset
from torch.utils.data import random_split
import torchvision
from torchvision.datasets.utils import download_url
from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor
import torchvision.transforms as transforms
from src.utils_results import plot_img
from src.fedclass import Client
from src.utils_training import train_central, test_model
from src.utils_data import get_clients_data, data_preparation
from src.models import GenericConvModel
from sklearn.model_selection import train_test_split
```
%% Cell type:markdown id:59c7fe59-a1ce-4925-bdca-8ecb777902e8 tags:
## Three Methods to load the dataset
%% Cell type:code id:60951c57-4f25-4e62-9255-a57d120c6370 tags:
``` python
### 1- Using raw image folder
dataset_url = "https://s3.amazonaws.com/fast-ai-imageclas/cifar10.tgz"
download_url(dataset_url, '.')
with tarfile.open('./cifar10.tgz', 'r:gz') as tar:
tar.extractall(path='./data')
data_dir = './data/cifar10'
classes = os.listdir(data_dir + "/train")
dataset1 = ImageFolder(data_dir+'/train', transform=ToTensor())
### 2- Using project functions
dict_clients = get_clients_data(num_clients = 1, num_samples_by_label = 600, dataset = 'cifar10', nn_model = 'convolutional')
x_data, y_data = dict_clients[0]['x'], dict_clients[0]['y']
x_data = np.transpose(x_data, (0, 3, 1, 2))
dataset2 = TensorDataset(torch.tensor(x_data, dtype=torch.float32), torch.tensor(y_data, dtype=torch.long))
### 3 - Using CIFAR10 dataset from Pytorch
cifar10 = torchvision.datasets.CIFAR10("datasets", download=True, transform=ToTensor())
(x_data, y_data) = cifar10.data, cifar10.targets
x_data = np.transpose(x_data, (0, 3, 1, 2))
dataset3 = TensorDataset(torch.tensor(x_data, dtype=torch.float32), torch.tensor(y_data, dtype=torch.long))
```
%% Output
Using downloaded and verified file: ./cifar10.tgz
/tmp/ipykernel_6044/2990241823.py:6: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
tar.extractall(path='./data')
Files already downloaded and verified
Files already downloaded and verified
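%% Cell type:markdown id: tags:
A quick sanity check (a sketch, not part of the original notebook run): all three loading methods should yield CIFAR-10-sized samples, with `dataset2` holding `num_samples_by_label * 10` images.
%% Cell type:code id: tags:
``` python
# Sanity check (sketch): compare dataset sizes and per-sample shapes across the three methods.
print(len(dataset1), len(dataset2), len(dataset3))  # e.g. 50000 6000 50000
print(dataset1[0][0].shape, dataset3[0][0].shape)   # both torch.Size([3, 32, 32])
```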
%% Cell type:code id:2ff13653-be89-4b0b-97f0-ffe7ee9c23ab tags:
``` python
model = GenericConvModel(32, 3)  # in_size=32, n_channels=3 for CIFAR-10
```
%% Cell type:markdown id:c4f0dc1d-4cdc-47cb-b200-2c58984ac171 tags:
## Conversion to dataloaders
%% Cell type:code id:bcefeb34-f9f4-4086-8af9-73469c3fd375 tags:
``` python
random_seed = 42
torch.manual_seed(random_seed);
val_size = 5000
train_size = len(dataset1) - val_size
train_ds, val_ds = random_split(dataset1, [train_size, val_size])
batch_size=128
train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=4, pin_memory=True)
val_dl = DataLoader(val_ds, batch_size*2, num_workers=4, pin_memory=True)
```
%% Cell type:code id:91bec335-be85-4ef8-a27f-298ed08b80fc tags:
``` python
num_epochs = 10
opt_func = torch.optim.Adam
lr = 0.001

# NOTE: the src/models.py version in this commit gives training_step / validation_step an
# extra `device` argument; the calls below match an earlier signature of these methods.
@torch.no_grad()
def evaluate(model, val_loader):
model.eval()
outputs = [model.validation_step(batch) for batch in val_loader]
return model.validation_epoch_end(outputs)
def fit(epochs, lr, model, train_loader, val_loader, opt_func=opt_func):
history = []
optimizer = opt_func(model.parameters(), lr)
for epoch in range(epochs):
# Training Phase
model.train()
train_losses = []
for batch in train_loader:
loss = model.training_step(batch)
train_losses.append(loss)
loss.backward()
optimizer.step()
optimizer.zero_grad()
# Validation phase
result = evaluate(model, val_loader)
result['train_loss'] = torch.stack(train_losses).mean().item()
model.epoch_end(epoch, result)
history.append(result)
return history
```
%% Cell type:code id:56aa9198-0a07-4ec8-802b-792352667795 tags:
``` python
history = fit(num_epochs, lr, model, train_dl, val_dl, opt_func)
```
%% Output
Epoch [0], train_loss: 1.7809, val_loss: 1.4422, val_acc: 0.4745
Epoch [1], train_loss: 1.2344, val_loss: 1.0952, val_acc: 0.6092
Epoch [2], train_loss: 0.9971, val_loss: 0.9526, val_acc: 0.6552
Epoch [3], train_loss: 0.8338, val_loss: 0.8339, val_acc: 0.7085
Epoch [4], train_loss: 0.7093, val_loss: 0.7892, val_acc: 0.7239
Epoch [5], train_loss: 0.6082, val_loss: 0.7572, val_acc: 0.7490
%% Cell type:code id:a106673e-a9a9-4525-bc94-98d3b64f2a7d tags:
``` python
# `test_loader` is not defined in this notebook; one plausible construction (an assumption,
# using the CIFAR-10 test split) would be:
test_loader = DataLoader(torchvision.datasets.CIFAR10("datasets", train=False, download=True, transform=ToTensor()), batch_size*2)
result = evaluate(model, test_loader)
```
%% Cell type:code id:24941b20-3aed-4336-9f79-87e4fcf0bba7 tags:
``` python
result
```
%% Output
{'val_loss': 2.3049447536468506, 'val_acc': 0.10572139918804169}
driver.py (new version after the merge: adds the --nn_model option, renames the 'benchmark' experiment type to 'global-federated' / 'pers-centralized', replaces cprint/setup_logging with plain print, and saves each experiment's results to csv; regions collapsed in the diff are marked with `...`):

``` python
import os

# Set the environment variable for deterministic behavior with CuBLAS (gives reproducibility with CUDA)
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import click

@click.command()
@click.option('--exp_type', help="The experiment type to run")
@click.option('--dataset')
@click.option('--nn_model', help="The training model to use ('linear' (default) or 'convolutional')")
@click.option('--heterogeneity_type', help="The data heterogeneity to test (or dataset)")
@click.option('--num_clients', type=int)
@click.option('--num_samples_by_label', type=int)
@click.option('--num_clusters', type=int)
@click.option('--centralized_epochs', type=int)  # collapsed in the diff; reconstructed from the function signature
@click.option('--federated_rounds', type=int)    # collapsed in the diff; reconstructed from the function signature
@click.option('--seed', type=int)
def main_driver(exp_type, dataset, nn_model, heterogeneity_type, num_clients, num_samples_by_label, num_clusters, centralized_epochs, federated_rounds, seed):

    from pathlib import Path
    import pandas as pd

    from src.utils_data import setup_experiment, get_uid

    row_exp = pd.Series({"exp_type": exp_type, "dataset": dataset, "nn_model": nn_model, "heterogeneity_type": heterogeneity_type,
                         "num_clients": num_clients, "num_samples_by_label": num_samples_by_label, "num_clusters": num_clusters,
                         "centralized_epochs": centralized_epochs, "federated_rounds": federated_rounds, "seed": seed})

    output_name = row_exp.to_string(header=False, index=False, name=False).replace(' ', "").replace('\n', '_')
    hash_outputname = get_uid(output_name)

    pathlist = Path("results").rglob('*.json')
    for file_name in pathlist:
        if get_uid(str(file_name.stem)) == hash_outputname:
            print(f"Experiment {str(file_name.stem)} already executed, with results in \n {output_name}.json")
            return

    try:
        model_server, list_clients = setup_experiment(row_exp)  # collapsed in the diff; reconstructed from context
    except Exception as e:
        print(f"Could not run experiment with parameters {output_name}. Exception {e}")
        return

    launch_experiment(model_server, list_clients, row_exp, output_name)

# ... (collapsed in the diff)

def launch_experiment(model_server, list_clients, row_exp, output_name, save_results=True):

    from src.utils_training import run_cfl_client_side, run_cfl_server_side
    from src.utils_training import run_benchmark

    str_row_exp = ':'.join(row_exp.to_string().replace('\n', '/').split())

    if row_exp['exp_type'] == "global-federated" or row_exp['exp_type'] == "pers-centralized":
        print(f"Launching benchmark experiment with parameters:\n{str_row_exp}")
        df_results = run_benchmark(model_server, list_clients, row_exp)

    elif row_exp['exp_type'] == "client":
        print(f"Launching client-side experiment with parameters:\n {str_row_exp}")
        df_results = run_cfl_client_side(model_server, list_clients, row_exp)

    elif row_exp['exp_type'] == "server":
        print(f"Launching server-side experiment with parameters:\n {str_row_exp}")
        df_results = run_cfl_server_side(model_server, list_clients, row_exp)

    else:
        str_exp_type = row_exp['exp_type']
        raise Exception(f"Unrecognized experiment type {str_exp_type}. Please check the config file and try again.")

    if save_results:
        df_results.to_csv("results/" + output_name + ".csv")

    return

if __name__ == "__main__":
    # ... (collapsed in the diff)
```
exp_configs.csv (new version, with the added nn_model column):

```
exp_type,dataset,nn_model,heterogeneity_type,num_clients,num_samples_by_label,num_clusters,centralized_epochs,federated_rounds,seed
pers-centralized,cifar10,convolutional,concept-shift-on-features,48,100,4,200,0,42
global-federated,cifar10,convolutional,concept-shift-on-features,48,100,4,20,50,42
global-federated,cifar10,convolutional,concept-shift-on-features,48,100,4,20,100,42
global-federated,cifar10,convolutional,concept-shift-on-features,48,100,4,20,150,42
global-federated,cifar10,convolutional,concept-shift-on-features,48,100,4,20,200,42
global-federated,cifar10,convolutional,concept-shift-on-features,48,100,4,50,50,42
global-federated,cifar10,convolutional,concept-shift-on-features,48,100,4,50,100,42
global-federated,cifar10,convolutional,concept-shift-on-features,48,100,4,50,150,42
global-federated,cifar10,convolutional,concept-shift-on-features,48,100,4,50,200,42
```

Rows removed from the previous version (the old experiment grid):

```
exp_type,dataset,heterogeneity_type,num_clients,num_samples_by_label,num_clusters,centralized_epochs,federated_rounds,seed
server,kmnist,concept-shift-on-labels,48,100,6,10,20,42
server,fashion-mnist,concept-shift-on-labels,48,100,6,10,20,42
server,mnist,concept-shift-on-features,48,100,4,10,20,42
server,kmnist,concept-shift-on-features,48,100,4,10,20,42
server,fashion-mnist,concept-shift-on-features,48,100,4,10,20,42
server,mnist,labels-distribution-skew,48,100,4,10,20,42
server,kmnist,labels-distribution-skew,48,100,4,10,20,42
server,fashion-mnist,labels-distribution-skew,48,100,4,10,20,42
server,mnist,features-distribution-skew,48,100,3,10,20,42
server,kmnist,features-distribution-skew,48,100,3,10,20,42
server,fashion-mnist,features-distribution-skew,48,100,3,10,20,42
server,mnist,quantity-skew,48,100,4,10,20,42
server,kmnist,quantity-skew,48,100,4,10,20,42
server,fashion-mnist,quantity-skew,48,100,4,10,20,42
client,mnist,concept-shift-on-labels,48,100,6,10,20,42
client,kmnist,concept-shift-on-labels,48,100,6,10,20,42
client,fashion-mnist,concept-shift-on-labels,48,100,6,10,20,42
client,mnist,concept-shift-on-features,48,100,4,10,20,42
client,kmnist,concept-shift-on-features,48,100,4,10,20,42
client,fashion-mnist,concept-shift-on-features,48,100,4,10,20,42
client,mnist,labels-distribution-skew,48,100,4,10,20,42
client,kmnist,labels-distribution-skew,48,100,4,10,20,42
client,fashion-mnist,labels-distribution-skew,48,100,4,10,20,42
client,mnist,features-distribution-skew,48,100,3,10,20,42
client,kmnist,features-distribution-skew,48,100,3,10,20,42
client,fashion-mnist,features-distribution-skew,48,100,3,10,20,42
client,mnist,quantity-skew,48,100,4,10,20,42
client,kmnist,quantity-skew,48,100,4,10,20,42
client,fashion-mnist,quantity-skew,48,100,4,10,20,42
benchmark,mnist,concept-shift-on-labels,48,100,6,10,50,42
benchmark,kmnist,concept-shift-on-labels,48,100,6,10,50,42
benchmark,fashion-mnist,concept-shift-on-labels,48,100,6,10,50,42
benchmark,mnist,concept-shift-on-features,48,100,4,10,50,42
benchmark,kmnist,concept-shift-on-features,48,100,4,10,50,42
benchmark,fashion-mnist,concept-shift-on-features,48,100,4,10,50,42
benchmark,mnist,labels-distribution-skew,48,100,4,10,50,42
benchmark,kmnist,labels-distribution-skew,48,100,4,10,50,42
benchmark,fashion-mnist,labels-distribution-skew,48,100,4,10,50,42
benchmark,mnist,features-distribution-skew,48,100,3,10,50,42
benchmark,kmnist,features-distribution-skew,48,100,3,10,50,42
benchmark,fashion-mnist,features-distribution-skew,48,100,3,10,50,42
benchmark,mnist,quantity-skew,48,100,4,10,50,42
benchmark,kmnist,quantity-skew,48,100,4,10,50,42
benchmark,fashion-mnist,quantity-skew,48,100,4,10,50,42
```
requirements.txt (unchanged):

```
# Automatically generated by https://github.com/damnever/pigar.
imbalanced-learn==0.12.3
inputimeout==1.0.4
kiwisolver==1.4.5
matplotlib==3.9.0
numpy==1.26.4
opencv-python==4.10.0.84
pandas==2.2.2
scikit-learn==1.5.0
scipy==1.14.0
tensorflow==2.16.2
```
run_exp.py (new version; the nn_model column is now read and forwarded; unchanged context collapsed in the diff is marked with `...`):

``` python
with open(csv_file, newline='') as csvfile:
    # ... (collapsed in the diff)
    row = next(reader)  # Reading the second row

    # Assigning CSV values to variables
    exp_type, dataset, nn_model, heterogeneity_type, num_clients, num_samples_by_label, num_clusters, centralized_epochs, federated_rounds, seed = row

    # Building the command
    command = [
        "python", "driver.py",
        "--exp_type", exp_type,
        "--dataset", dataset,
        "--nn_model", nn_model,
        "--heterogeneity_type", heterogeneity_type,
        "--num_clients", num_clients,
        "--num_samples_by_label", num_samples_by_label,
        "--num_clusters", num_clusters,
        "--centralized_epochs", centralized_epochs,
        "--federated_rounds", federated_rounds,
        "--seed", seed]

    # Run the command
    subprocess.run(command)
```
(Four binary files changed; no preview available.)
src/fedclass.py (new version; typed constructors, docstrings and __eq__ methods added; collapsed regions are marked with `...`):

``` python
class Client:
    """ Client object used in the Federated Learning protocol

    Attributes:
        client_id : unique client identifier
        data : client data in the form {'x': [], 'y': []}, where x and y are
            respectively the features and labels of the dataset
    """

    def __init__(self, client_id: int, data: dict):
        """Initialize the Client object

        Arguments:
            client_id : int
                unique client identifier
            data : dict
                local data dict of the form {'x': [], 'y': []}
            model : nn.Module
                The local nn model of the Client
            cluster_id : int
                ID of the cluster the client belongs to, or None if not applicable
            heterogeneity_class : int
                ID of the heterogeneity class the client's data belongs to, or None if not applicable
            accuracy : float
                The current accuracy of the client's model on a test set
        """
        self.id = client_id
        self.data = data
        self.model = None
        # ... (collapsed in the diff)
        self.heterogeneity_class = None
        self.accuracy = 0

    def __eq__(self, value: object) -> bool:
        return (self.id == value.id and
                self.model == value.model and
                all((self.data['x'] == value.data['x']).flatten()) and
                all((self.data['y'] == value.data['y']).flatten()) and
                self.cluster_id == value.cluster_id and
                self.heterogeneity_class == value.heterogeneity_class)

    def to_dict(self):
        """Return a dictionary with the attributes of the Client"""
        return {
            'id': self.id,
            'cluster_id': self.cluster_id,
            'heterogeneity_class': self.heterogeneity_class,
            'accuracy': self.accuracy
        }


class Server:
    """ Server object used in the Federated Learning protocol

    Attributes:
        model : nn.Module
            The nn learning model the server is associated with
        num_clusters : int
            Number of clusters the server defines for a CFL protocol
    """

    def __init__(self, model, num_clusters: int = None):
        """Initialize a Server object with an empty dictionary of cluster models

        Arguments:
            model : nn.Module
                The nn learning model the server is associated with
            num_clusters : int
                Number of clusters the server defines for a CFL protocol
        """
        self.model = model
        self.num_clusters = num_clusters
        self.clusters_models = {}

    def __eq__(self, value: object) -> bool:
        return (str(self.model.state_dict()) == str(value.model.state_dict()) and
                self.num_clusters == value.num_clusters and
                self.clusters_models == value.clusters_models)
```
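A minimal usage sketch of these classes (illustrative values; `GenericLinearModel` is the model class from src/models.py):

``` python
# Sketch: one client and a server for a 4-cluster CFL run (illustrative values).
import numpy as np
from src.fedclass import Client, Server
from src.models import GenericLinearModel

client = Client(client_id=0, data={'x': np.zeros((10, 28, 28)), 'y': np.zeros(10)})
my_server = Server(GenericLinearModel(in_size=28, n_channels=1), num_clusters=4)

print(client.to_dict())  # expected: {'id': 0, 'cluster_id': None, 'heterogeneity_class': None, 'accuracy': 0}
```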
Clustering metrics (new version of calc_global_metrics; collapsed regions are marked with `...`):

``` python
def calc_global_metrics(labels_true: list, labels_pred: list) -> dict:
    """ Calculate global clustering metrics based on model weights

    Arguments:
        labels_true : list
            list of ground-truth labels
        labels_pred : list
            list of predicted labels to compare with the ground truth

    Returns:
        A dictionary with the following metrics:
        'ARI', 'AMI', 'hom' (homogeneity score), 'cmplt' (completeness score), 'vm' (v-measure)
    """
    from sklearn.metrics import adjusted_rand_score, homogeneity_completeness_v_measure, adjusted_mutual_info_score

    homogeneity_score, completness_score, v_measure = homogeneity_completeness_v_measure(labels_true, labels_pred)
    # ... (collapsed in the diff)
    dict_metrics = {"ARI": ARI_score, "AMI": AMI_score, "hom": homogeneity_score, "cmplt": completness_score, "vm": v_measure}

    return dict_metrics
```
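A quick usage sketch (hypothetical label lists, assuming calc_global_metrics is in scope) of how these metrics compare a clustering result against the clients' true heterogeneity classes:

``` python
# Hypothetical example: 6 clients, 3 true heterogeneity classes, one mis-clustered client.
labels_true = [0, 0, 1, 1, 2, 2]  # ground-truth heterogeneity classes
labels_pred = [0, 0, 1, 2, 2, 2]  # cluster ids found by a CFL run
dict_metrics = calc_global_metrics(labels_true, labels_pred)
print(dict_metrics)  # {'ARI': ..., 'AMI': ..., 'hom': ..., 'cmplt': ..., 'vm': ...}
```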
Removed in this commit:

``` python
def report_CFL(list_clients, output_name):
    """
    Save results as a csv
    """
    import pandas as pd
    df_results = pd.DataFrame.from_records([c.to_dict() for c in list_clients])
    df_results.to_csv("results/" + output_name + ".csv")
    return


def plot_mnist(image, label):
    """Plot an MNIST image with its label as the title"""
    import matplotlib.pyplot as plt
    plt.imshow(image, cmap='gray')
    plt.title(f'MNIST Digit: {label}')  # Add the label as the title
    plt.axis('off')  # Turn off the axis
    plt.show()
```
src/models.py — the merge replaces the MNIST-specific SimpleLinear and SimpleConv classes with an ImageClassificationBase and generic linear/convolutional models.

Removed:

``` python
class SimpleLinear(nn.Module):
    # Simple fully connected neural network with ReLU activations and a single hidden layer of size 200
    def __init__(self, h1=200):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, h1)
        self.fc2 = nn.Linear(h1, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


class SimpleConv(nn.Module):
    def __init__(self):
        super(SimpleConv, self).__init__()
        # convolutional layers
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.conv3 = nn.Conv2d(32, 16, 3, padding=1)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # fully connected layer
        self.fc1 = nn.Linear(16 * 4 * 4, 10)
        # dropout
        self.dropout = nn.Dropout(p=0.2)

    def flatten(self, x):
        return x.reshape(x.size()[0], -1)

    def forward(self, x):
        # sequence of convolutional and max pooling layers
        x = self.dropout(self.pool(F.relu(self.conv1(x))))
        x = self.dropout(self.pool(F.relu(self.conv2(x))))
        x = self.dropout(self.pool(F.relu(self.conv3(x))))
        x = self.flatten(x)
        x = self.fc1(x)
        return F.log_softmax(x, dim=1)
```

New version:

``` python
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))


class ImageClassificationBase(nn.Module):

    def training_step(self, batch, device):
        images, labels = batch
        images, labels = images.to(device), labels.to(device).long()
        out = self(images)
        loss = F.cross_entropy(out, labels)
        return loss

    def validation_step(self, batch, device):
        images, labels = batch
        images, labels = images.to(device), labels.to(device).long()
        out = self(images)
        loss = F.cross_entropy(out, labels)
        acc = accuracy(out, labels)
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['train_loss'], result['val_loss'], result['val_acc']))


class GenericLinearModel(ImageClassificationBase):

    def __init__(self, in_size, n_channels):
        super().__init__()
        self.in_size = in_size
        self.network = nn.Sequential(
            nn.Linear(in_size * in_size, 200),
            nn.Linear(200, 10)
        )

    def forward(self, xb):
        xb = xb.view(-1, self.in_size * self.in_size)
        return self.network(xb)


class GenericConvModel(ImageClassificationBase):

    def __init__(self, in_size, n_channels):
        super().__init__()
        self.img_final_size = int(in_size / (2 ** 3))
        self.network = nn.Sequential(
            nn.Conv2d(n_channels, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output: 64 x 16 x 16
            nn.Dropout(0.25),
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output: 128 x 8 x 8
            nn.Dropout(0.25),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output: 256 x 4 x 4
            nn.Dropout(0.25),
            nn.Flatten(),
            nn.Linear(256 * self.img_final_size * self.img_final_size, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, xb):
        return self.network(xb)
```
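A minimal shape check for the new models (a sketch; CIFAR-10-like and MNIST-like inputs assumed):

``` python
import torch
from src.models import GenericConvModel, GenericLinearModel

# Convolutional model on a batch of 8 CIFAR-10-sized images (3 x 32 x 32).
conv = GenericConvModel(in_size=32, n_channels=3)
print(conv(torch.randn(8, 3, 32, 32)).shape)  # torch.Size([8, 10])

# The linear model flattens to in_size * in_size, matching single-channel 28 x 28 inputs.
lin = GenericLinearModel(in_size=28, n_channels=1)
print(lin(torch.randn(8, 1, 28, 28)).shape)   # torch.Size([8, 10])
```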
(The diff for one file is collapsed and not shown.)
Federated-learning utilities (new version; type hints, docstrings and a seeded k-means added, with loss_calculation moved in from src.utils_training; collapsed regions are marked with `...`):

``` python
from src.fedclass import Server
import torch
import torch.nn as nn
import pandas as pd
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def send_server_model_to_client(list_clients: list, my_server: Server) -> None:
    """ Function to copy the Server model to client attributes in a FL protocol

    Arguments:
        list_clients : List of Client objects on which to set the parameter `model`
        my_server : Server object with the model to copy
    """
    import copy

    for client in list_clients:
        setattr(client, 'model', copy.deepcopy(my_server.model))

    return


def send_cluster_models_to_clients(list_clients: list, my_server: Server) -> None:
    """ Function to copy the Server cluster models to clients based on the attribute client.cluster_id

    Arguments:
        list_clients : List of Clients to update
        my_server : Server from which to fetch models
    """
    import copy

    for client in list_clients:
        if client.cluster_id is None:
            setattr(client, 'model', copy.deepcopy(my_server.model))
        else:
            # collapsed in the diff; presumably copies my_server.clusters_models[client.cluster_id]
            # ...
    return


def model_avg(list_clients: list) -> nn.Module:
    """ Utility function for fedavg which creates a new model whose weights are the
    weighted average of the input clients' model weights, weighted by local dataset size

    Arguments:
        list_clients : List of Clients whose models we want to use to perform the weighted average

    Returns:
        New model with weights equal to the weighted average of those in the input Clients list
    """
    import copy
    import torch

    new_model = copy.deepcopy(list_clients[0].model)

    total_data_size = sum(len(client.data_loader['train'].dataset) for client in list_clients)

    for name, param in new_model.named_parameters():
        weighted_avg_param = torch.zeros_like(param)

        for client in list_clients:
            data_size = len(client.data_loader['train'].dataset)
            weight = data_size / total_data_size
            weighted_avg_param += client.model.state_dict()[name] * weight

        param.data = weighted_avg_param  # TODO: make more explicit

    return new_model


def fedavg(my_server: Server, list_clients: list) -> None:
    """
    Implementation of the (clustered) federated aggregation algorithm with one model per cluster.
    The code modifies the cluster models `my_server.clusters_models[i]`.

    Arguments:
        my_server : Server object which contains the cluster models
        list_clients : List of clients, each containing a PyTorch model and a data loader
    """
    if my_server.num_clusters is None:
        my_server.model = model_avg(list_clients)
    else:
        for cluster_id in range(my_server.num_clusters):
            cluster_clients_list = [client for client in list_clients if client.cluster_id == cluster_id]
            if len(cluster_clients_list) > 0:
                my_server.clusters_models[cluster_id] = model_avg(cluster_clients_list)


# FOR SERVER-SIDE CFL

def model_weight_matrix(list_clients: list) -> pd.DataFrame:
    """ Create a weight matrix DataFrame using the weights of local federated models,
    for use in server-side CFL

    Arguments:
        list_clients : List of Clients with respective models

    Returns:
        DataFrame with the weights of each model as rows
    """
    import numpy as np
    import pandas as pd

    model_dict = {client.id: client.model for client in list_clients}

    shapes = [param.data.cpu().numpy().shape for param in next(iter(model_dict.values())).parameters()]
    weight_matrix_np = np.empty((len(model_dict), sum(np.prod(shape) for shape in shapes)))

    for idx, (_, model) in enumerate(model_dict.items()):
        model_weights = np.concatenate([param.data.cpu().numpy().flatten() for param in model.parameters()])
        weight_matrix_np[idx, :] = model_weights

    weight_matrix = pd.DataFrame(weight_matrix_np, columns=[f'w_{i+1}' for i in range(weight_matrix_np.shape[1])])

    return weight_matrix


def k_means_cluster_id(weight_matrix: pd.DataFrame, k: int, seed: int) -> pd.Series:
    """ Define cluster identities using k-means

    Arguments:
        weight_matrix : Weight matrix of all federated models
        k : K-means parameter (number of clusters)
        seed : Random seed to allow reproducibility

    Returns:
        Pandas Series with the cluster identity of each model
    """
    from sklearn.cluster import KMeans

    kmeans = KMeans(n_clusters=k, random_state=seed)
    kmeans.fit(weight_matrix)

    weight_matrix['cluster'] = kmeans.labels_
    clusters_identities = weight_matrix['cluster']

    return clusters_identities


def k_means_clustering(list_clients: list, num_clusters: int, seed: int) -> None:
    """ Performs k-means clustering and sets the cluster_id attribute of each client based on the result

    Arguments:
        list_clients : List of Clients on which to perform clustering
        num_clusters : Parameter to set the number of clusters needed
        seed : Random seed to allow reproducibility
    """
    weight_matrix = model_weight_matrix(list_clients)
    clusters_identities = k_means_cluster_id(weight_matrix, num_clusters, seed)
    for client in list_clients:
        setattr(client, 'cluster_id', clusters_identities[client.id])
    return


# FOR CLIENT-SIDE CFL

def init_server_cluster(my_server: Server, list_clients: list, row_exp: dict, imgs_params: dict, p_expert_opinion: float = 0) -> None:
    """ Function to initialize cluster membership for client-side CFL (sets the cluster_id attribute),
    using a given distribution or completely at random.

    Arguments:
        my_server : Server object containing one model per cluster
        list_clients : List of Clients whose models we want to initialize
        row_exp : Dictionary containing the global experiment parameters
        imgs_params : Image parameters (in_size, n_channels) used to build the cluster models
        p_expert_opinion : Parameter to avoid completely random assignment if needed (defaults to 0)
    """
    from src.models import GenericLinearModel, GenericConvModel
    import numpy as np
    import copy

    np.random.seed(row_exp['seed'])

    list_heterogeneities = list(dict.fromkeys([client.heterogeneity_class for client in list_clients]))

    if not p_expert_opinion or p_expert_opinion == 0:
        p_expert_opinion = 1 / row_exp['num_clusters']

    p_rest = (1 - p_expert_opinion) / (row_exp['num_clusters'] - 1)

    my_server.num_clusters = row_exp['num_clusters']
    my_server.clusters_models = {cluster_id: GenericConvModel(in_size=imgs_params[0], n_channels=imgs_params[1])
                                 for cluster_id in range(row_exp['num_clusters'])}

    for client in list_clients:
        probs = [p_rest if x != list_heterogeneities.index(client.heterogeneity_class) % row_exp['num_clusters']
                 else p_expert_opinion for x in range(row_exp['num_clusters'])]

        client.cluster_id = np.random.choice(range(row_exp['num_clusters']), p=probs)
        client.model = copy.deepcopy(my_server.clusters_models[client.cluster_id])

    return


def loss_calculation(model: nn.Module, train_loader: DataLoader) -> float:
    """ Utility function to calculate the average loss across all samples of <train_loader>

    Arguments:
        model : the input server model
        train_loader : DataLoader with the dataset to use for the loss calculation
    """
    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()

    model.to(device)
    model.eval()

    total_loss = 0.0
    total_samples = 0

    with torch.no_grad():
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device).long()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            total_loss += loss.item() * inputs.size(0)
            total_samples += inputs.size(0)

    average_loss = total_loss / total_samples
    return average_loss


def set_client_cluster(my_server: Server, list_clients: list, row_exp: dict) -> None:
    """ Function to calculate cluster membership for client-side CFL (sets the cluster_id attribute),
    assigning each client to the cluster model with the lowest loss on its local data

    Arguments:
        my_server : Server object containing one model per cluster
        list_clients : List of Clients to assign to clusters
        row_exp : Dictionary containing the global experiment parameters
    """
    import numpy as np
    import copy

    for client in list_clients:
        cluster_losses = []

        for cluster_id in range(row_exp['num_clusters']):
            cluster_loss = loss_calculation(my_server.clusters_models[cluster_id], client.data_loader['train'])
            cluster_losses.append(cluster_loss)

        index_of_min_loss = np.argmin(cluster_losses)

        client.model = copy.deepcopy(my_server.clusters_models[index_of_min_loss])
        client.cluster_id = index_of_min_loss
```
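A sketch of how these helpers compose into one server-side CFL round (assumes `my_server` and `list_clients` were built by the setup code, with trained local models and `data_loader` dicts in place):

``` python
# One server-side CFL round (sketch):
k_means_clustering(list_clients, num_clusters=4, seed=42)  # cluster clients by model weights
fedavg(my_server, list_clients)                            # one averaged model per cluster
send_cluster_models_to_clients(list_clients, my_server)    # push cluster models back to clients
```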
src/utils_logging.py (the cprint docstring heading is renamed from 'Args' to 'Arguments'; unchanged context collapsed in the diff is marked with `...`):

``` python
def cprint(msg: str, lvl: str = "info") -> None:
    """
    Print message to the console at the desired logging level.

    Arguments:
        msg (str): Message to print.
        lvl (str): Logging level between "debug", "info", "warning", "error" and "critical".
            The default value is "info".
    """
    # ... (collapsed in the diff)
```
src/utils_results.py (new version; type hints, docstrings and plot_img added; collapsed regions are marked with `...`):

``` python
from pandas import DataFrame
from pathlib import Path
from torch import tensor


def save_histograms() -> None:
    """Read the csv files found in 'results/' and generate and save histogram plots of client assignments

    Raises:
        Warning when a csv file is not of the expected format (a code-generated results csv)
    """
    import pandas as pd
    # ... (collapsed in the diff)
            print(f"Error: Unable to open result file {file_path}.", e)
            continue
    return


def get_clusters(df_results: DataFrame) -> list:
    """ Returns a list of clusters ranging from 0 to max_cluster (uses append_empty_clusters())
    """
    list_clusters = list(df_results['cluster_id'].unique())
    list_clusters = append_empty_clusters(list_clusters)
    # ... (collapsed in the diff)
    return list_clusters


def append_empty_clusters(list_clusters: list) -> list:
    """
    Utility function for get_clusters() to handle the situation where some clusters are empty,
    by appending the missing cluster IDs

    Arguments:
        list_clusters : List of clusters with clients

    Returns:
        List of clusters with or without clients
    """
    list_clusters_int = [int(x) for x in list_clusters]
    # ... (collapsed in the diff)


def get_z_nclients(df_results: dict, x_het: list, y_clust: list, labels_heterogeneity: list) -> list:
    """ Returns the number of clients associated with a given heterogeneity class for each cluster"""

    z_nclients = [0] * len(x_het)

    for i in range(len(z_nclients)):
        # ... (collapsed in the diff)


def plot_img(img: tensor) -> None:
    """Utility function to plot an image of any shape"""
    from torchvision import transforms
    import matplotlib.pyplot as plt
    plt.imshow(transforms.ToPILImage()(img))


def plot_histogram_clusters(df_results: DataFrame, title: str) -> None:
    """ Creates 3D histograms of client-to-cluster assignments showing each client's heterogeneity class

    Arguments:
        df_results : DataFrame containing all parameters from the resulting csv files
        title : The plot title. The image is saved to 'results/plots/histogram_' + title + '.png'
    """
    import matplotlib.pyplot as plt
    import numpy as np

    labels_heterogeneity = list(df_results['heterogeneity_class'].unique())

    bar_width = bar_depth = 0.5
    # ... (collapsed in the diff)
    plt.title(title, fontdict=None, loc='center', pad=None)
    plt.savefig('results/plots/histogram_' + title + '.png')
    plt.close()
    return


def normalize_results(results_accuracy: float, results_std: float) -> tuple:
    """Utility function to convert float accuracy and standard deviation to percentages"""

    if results_accuracy < 1:
        # ... (collapsed in the diff)
    return results_accuracy, results_std


def summarize_results() -> None:
    """ Creates a summary of all the results files under 'results/summarized_results.csv'"""

    from pathlib import Path
    import pandas as pd
    from numpy import mean, std
    # ... (collapsed in the diff)
        list_params = path.stem.split('_')

        dict_exp_results = {"exp_type": list_params[0], "dataset": list_params[1], "nn_model": list_params[2],
                            "dataset_type": list_params[3], "number_of_clients": list_params[4],
                            "samples by_client": list_params[5], "num_clusters": list_params[6],
                            "centralized_epochs": list_params[7], "federated_rounds": list_params[8],
                            "accuracy": accuracy}

        try:
            # ... (collapsed in the diff)
```