Commit f53c201a authored by Maël Madon

slight changes in the workload selection

parent c3fba52c
%% Cell type:markdown id:forced-resolution tags:
# Downloading and preparing the workload and platform
## Workload
We use the reconverted log `METACENTRUM-2013-3.swf`, available on the [Parallel Workload Archive](https://www.cs.huji.ac.il/labs/parallel/workload/l_metacentrum2/index.html).
%% Cell type:code id:f66eb756 tags:
``` python
# Download the workload
!curl https://www.cs.huji.ac.il/labs/parallel/workload/l_metacentrum2/METACENTRUM-2013-3.swf.gz \
--output workload/METACENTRUM-2013-3.swf.gz
```
%% Cell type:code id:bound-harvey tags:
``` python
# Unzip the workload
!gunzip workload/METACENTRUM-2013-3.swf.gz
```
%% Cell type:markdown id:graphic-rabbit tags:
It is a 2-year-long trace from MetaCentrum, the national grid of the Czech Republic. As mentioned in the [original paper releasing the log](https://www.cs.huji.ac.il/~feit/parsched/jsspp15/p5-klusacek.pdf), the platform is **very heterogeneous** and underwent major changes during the logging period. For the purpose of our study, we perform the following selection.
First:
- we remove from the workload all the clusters whose nodes have **more than 16 cores**
- we truncate the workload to keep only 6 months (June to November 2014) during which no major change was made to the infrastructure (no cluster < 16 cores added or removed, no reconfiguration of the scheduling system)
Second:
- we remove from the workload the jobs with an **execution time greater than one day**
- we remove from the workload the jobs with a **number of requested cores greater than 16**
To do so, we use a homemade SWF parser.
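The parser itself is not shown in this notebook. As a rough illustration of what the second selection step amounts to, here is a minimal sketch of a filter over SWF lines. It is not the code of `swf_moulinette.py`: the function names `keep_job` and `filter_swf` are hypothetical, and it assumes that `nb_res` corresponds to the "requested number of processors" field (field 8 of the Standard Workload Format) and `run_time` to field 4.

```python
# 0-based positions of the SWF fields used below (the format has 18 fields per job line).
RUN_TIME, REQUESTED_PROCS = 3, 7

ONE_DAY = 24 * 3600  # seconds


def keep_job(swf_line, max_cores=16, max_run_time=ONE_DAY):
    """Return True if a job line passes the selection.

    Assumption: nb_res is read from the requested-processors field.
    """
    fields = swf_line.split()
    run_time = float(fields[RUN_TIME])
    nb_res = int(fields[REQUESTED_PROCS])
    return nb_res <= max_cores and run_time <= max_run_time


def filter_swf(lines):
    """Yield header comments unchanged and only the job lines accepted by keep_job."""
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith(";"):
            yield line  # SWF header/comment lines start with ';'
        elif keep_job(stripped):
            yield line


# Made-up sample: a header line and three jobs.
sample = [
    "; UnixStartTime: 1356994806",
    "1 0 10 3600 4 -1 -1 4 7200 -1 1 1 1 -1 1 1 -1 -1",     # kept
    "2 5 10 90000 8 -1 -1 8 100000 -1 1 1 1 -1 1 1 -1 -1",  # runs longer than one day
    "3 9 10 100 32 -1 -1 32 200 -1 1 1 1 -1 1 1 -1 -1",     # requests more than 16 cores
]
kept = list(filter_swf(sample))
print(len(kept))
```

Only the header and the first job survive in this made-up sample.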
%% Cell type:code id:ff40dcdd tags:
``` python
from time import mktime, strptime

begin_trace = 1356994806  # according to original SWF header

# mktime interprets the parsed dates in the local time zone of the machine
jun1_unix_time = mktime(strptime('Sun Jun 1 00:00:00 2014'))
nov30_unix_time = mktime(strptime('Sun Nov 30 23:59:59 2014'))

# Offsets in seconds relative to the beginning of the trace
jun1, nov30 = int(jun1_unix_time - begin_trace), int(nov30_unix_time - begin_trace)

print("Unix Time Jun 1st 2014: {:.0f}".format(jun1_unix_time))
print("Unix Time Nov 30th 2014: {:.0f}".format(nov30_unix_time))
print("We should keep all the jobs submitted between {:d} and {:d}".format(jun1, nov30))
# Create a swf with only the selected clusters and the 6 selected months
! scripts/swf_moulinette.py workload/METACENTRUM-2013-3.swf \
-o workload/METACENTRUM_6months.swf \
--keep_only="submit_time >= {jun1} and submit_time <= {nov30}" \
--partitions_to_select 1 2 3 5 7 8 9 10 11 12 14 15 18 19 20 21 22 23 25 26 31
```
%% Output
Unix Time Jun 1st 2014: 1401573600
Unix Time Nov 30th 2014: 1417388399
We should keep all the jobs submitted between 44578794 and 60393593
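Because `mktime` works in the local time zone, the offsets above depend on where the notebook is run. The following sketch recomputes them with explicit UTC offsets. It assumes the values above were produced in Central European time (UTC+2 in June, UTC+1 in November), which is consistent with the printed Unix times; the helper name `swf_offset` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

BEGIN_TRACE = 1356994806  # according to original SWF header

# Assumption: Central European time, UTC+2 in summer (CEST) and UTC+1 in winter (CET).
CEST = timezone(timedelta(hours=2))
CET = timezone(timedelta(hours=1))


def swf_offset(moment, begin_trace=BEGIN_TRACE):
    """Seconds between the beginning of the trace and a timezone-aware datetime."""
    return int(moment.timestamp()) - begin_trace


jun1 = swf_offset(datetime(2014, 6, 1, 0, 0, 0, tzinfo=CEST))
nov30 = swf_offset(datetime(2014, 11, 30, 23, 59, 59, tzinfo=CET))
print(jun1, nov30)  # 44578794 60393593
```

Under this assumption the result matches the bounds printed above, whatever the time zone of the machine.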
%% Cell type:code id:6ec15ee8 tags:
``` python
# Keep only the selected jobs
! ./scripts/swf_moulinette.py workload/METACENTRUM_6months.swf \
-o workload/MC_selection_article.swf \
--keep_only="nb_res <= 16 and run_time <= 24*3600"
```
%% Output
Processing swf line 100000
Processing swf line 200000
Processing swf line 300000
Processing swf line 400000
Processing swf line 500000
Processing swf line 600000
Processing swf line 700000
Processing swf line 800000
Processing swf line 900000
Processing swf line 1000000
Processing swf line 1100000
Processing swf line 1200000
Processing swf line 1300000
Processing swf line 1400000
Processing swf line 1500000
Processing swf line 1600000
Processing swf line 1700000
Processing swf line 1800000
Processing swf line 1900000
Processing swf line 2000000
Processing swf line 2100000
Processing swf line 2200000
Processing swf line 2300000
Processing swf line 2400000
Processing swf line 2500000
Processing swf line 2600000
Processing swf line 2700000
Processing swf line 2800000
Processing swf line 2900000
Processing swf line 3000000
Processing swf line 3100000
Processing swf line 3200000
Processing swf line 3300000
Processing swf line 3400000
Processing swf line 3500000
Processing swf line 3600000
Processing swf line 3700000
Processing swf line 3800000
Processing swf line 3900000
Processing swf line 4000000
Processing swf line 4100000
Processing swf line 4200000
Processing swf line 4300000
Processing swf line 4400000
Processing swf line 4500000
Processing swf line 4600000
Processing swf line 4700000
Processing swf line 4800000
Processing swf line 4900000
Processing swf line 5000000
Processing swf line 5100000
Processing swf line 5200000
Processing swf line 5300000
Processing swf line 5400000
Processing swf line 5500000
Processing swf line 5600000
Processing swf line 5700000
-------------------
End parsing
Total 4836170 jobs and 838 users have been created.
Total number of core-hours: 16805967
887919 valid jobs were not selected (keep_only) for 77201656 core-hour
Jobs not selected: 15.5% in number, 82.1% in core-hour
7119 out of 5731209 lines in the file did not match the swf format
30 jobs were not valid
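As a sanity check on the summary above, the two percentages can be recomputed from the reported counts. This sketch assumes that the "Total ... jobs" and "Total number of core-hours" lines refer to the selected jobs only, and that the percentages are relative to all valid jobs (selected plus not selected); the variable names are illustrative.

```python
# Figures copied from the parser output above.
selected_jobs = 4836170
selected_core_hours = 16805967
rejected_jobs = 887919
rejected_core_hours = 77201656

# Share of valid jobs that were filtered out, in number and in core-hours.
pct_jobs = 100 * rejected_jobs / (selected_jobs + rejected_jobs)
pct_core_hours = 100 * rejected_core_hours / (selected_core_hours + rejected_core_hours)

print(f"Jobs not selected: {pct_jobs:.1f}% in number, {pct_core_hours:.1f}% in core-hour")
```

With these assumptions the computation gives 15.5% and 82.1%, matching the parser's report.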
%% Cell type:markdown id:afde35e8 tags:
## Platform
According to the system specifications given on the [corresponding page of the Parallel Workload Archive](https://www.cs.huji.ac.il/labs/parallel/workload/l_metacentrum2/index.html), if we exclude nodes with >16 cores, there were $\#cores_{total} = 6416$ cores on May 1st 2014 (1).
We build a platform file adapted to the remaining workload. We choose to make it homogeneous with 16-core nodes. To have a coherent number of nodes, we count:
$\#nodes = \frac{\#cores_{total} \times \%kept_{core.hour}}{\#corePerNode} = \frac{6416 \times 0.294}{16} \approx 118$
In SimGrid platform language, this corresponds to such a cluster:
```xml
<cluster id="cluster_MC" prefix="MC_" suffix="" radical="0-117" core="16">
```
The corresponding SimGrid platform file can be found in `platform/average_metacentrum.xml`.
(1) Clusters decommissioned before or commissioned after May 1st 2014 have also been removed: $8+480+160+1792+256+576+88+416+108+168+752+112+588+152+160+160+192+24+224 = 6416$
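The arithmetic of this section can be checked with a few lines of Python. The per-cluster core counts are those of footnote (1) and the kept fraction 0.294 is the one used in the formula above; the variable names are illustrative.

```python
# Core counts of the remaining clusters, as listed in footnote (1).
cluster_cores = [8, 480, 160, 1792, 256, 576, 88, 416, 108, 168,
                 752, 112, 588, 152, 160, 160, 192, 24, 224]

cores_total = sum(cluster_cores)
kept_core_hour_fraction = 0.294  # value used in the formula above
cores_per_node = 16

# Number of homogeneous 16-core nodes, rounded to the nearest integer.
nb_nodes = round(cores_total * kept_core_hour_fraction / cores_per_node)

# SimGrid's "radical" attribute lists node indices, starting at 0.
radical = f"0-{nb_nodes - 1}"

print(cores_total, nb_nodes, radical)  # 6416 118 0-117
```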