%% Cell type:markdown id:forced-resolution tags:
# Downloading and preparing the workload and platform
## Workload
We use the reconverted log `METACENTRUM-2013-3.swf` available on the [Parallel Workload Archive](https://www.cs.huji.ac.il/labs/parallel/workload/l_metacentrum2/index.html).
%% Cell type:code id:f66eb756 tags:
``` python
# Download the workload (548.3 MB unzipped)
!wget https://www.cs.huji.ac.il/labs/parallel/workload/l_metacentrum2/METACENTRUM-2013-3.swf.gz \
    --no-check-certificate -nc -P workload
```
%% Cell type:code id:bound-harvey tags:
``` python
# Unzip the workload
!gunzip workload/METACENTRUM-2013-3.swf.gz
```
%% Output
gzip: workload/METACENTRUM-2013-3.swf already exists; do you wish to overwrite (y or n)? ^C
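%% Cell type:markdown id:gunzip-note tags:
Note: as the output above shows, `gunzip` prompts for confirmation when `workload/METACENTRUM-2013-3.swf` already exists. A non-interactive alternative is to decompress from Python and skip the step when the file is already there (a minimal sketch using only the standard library, with the same paths as above):
%% Cell type:code id:gunzip-alternative tags:
``` python
import gzip
import shutil
from pathlib import Path

src = Path("workload/METACENTRUM-2013-3.swf.gz")
dst = Path("workload/METACENTRUM-2013-3.swf")

# Decompress only if the uncompressed trace is not already present.
if src.exists() and not dst.exists():
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
```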
%% Cell type:markdown id:graphic-rabbit tags:
It is a 2-year-long trace from MetaCentrum, the national grid of the Czech Republic. As mentioned in the [original paper releasing the log](https://www.cs.huji.ac.il/~feit/parsched/jsspp15/p5-klusacek.pdf), the platform is **very heterogeneous** and underwent major changes during the logging period. For the purpose of our study, we perform the following selection.
First:
- we remove from the workload all the clusters whose nodes have **more than 16 cores**
- we truncate the workload to keep only 6 months (June to November 2014), during which no major change was made to the infrastructure (no cluster with ≤ 16 cores per node added or removed, no reconfiguration of the scheduling system)
Second:
- we remove from the workload the jobs with an **execution time greater than one day**
- we remove from the workload the jobs with a **number of requested cores greater than 16**
To do so, we use the home-made SWF parser `swf_moulinette.py`:
%% Cell type:code id:ff40dcdd tags:
``` python
# First selection
# Create a swf with only the selected clusters and the 6 selected months
from time import mktime, strptime

begin_trace = 1356994806  # according to original SWF header
jun1_unix_time = mktime(strptime('Sun Jun 1 00:00:00 2014'))
nov30_unix_time = mktime(strptime('Sun Nov 30 23:59:59 2014'))
jun1 = int(jun1_unix_time - begin_trace)
nov30 = int(nov30_unix_time - begin_trace)

print("Unix Time Jun 1st 2014: {:.0f}".format(jun1_unix_time))
print("Unix Time Nov 30th 2014: {:.0f}".format(nov30_unix_time))
print("We should keep all the jobs submitted between {:d} and {:d}".format(jun1, nov30))

! ./scripts/swf_moulinette.py workload/METACENTRUM-2013-3.swf \
    -o workload/METACENTRUM_6months.swf \
    --keep_only="submit_time >= {jun1} and submit_time <= {nov30}" \
    --partitions_to_select 1 2 3 5 7 8 9 10 11 12 14 15 18 19 20 21 22 23 25 26 31
```
%% Output
Unix Time Jun 1st 2014: 1401573600
Unix Time Nov 30th 2014: 1417388399
We should keep all the jobs submitted between 44578794 and 60393593
Processing swf line 100000
Processing swf line 200000
Processing swf line 300000
Processing swf line 400000
Processing swf line 500000
Processing swf line 600000
Processing swf line 700000
Processing swf line 800000
Processing swf line 900000
Processing swf line 1000000
Processing swf line 1100000
Processing swf line 1200000
Processing swf line 1300000
Processing swf line 1400000
Processing swf line 1500000
Processing swf line 1600000
Processing swf line 1700000
Processing swf line 1800000
Processing swf line 1900000
Processing swf line 2000000
Processing swf line 2100000
Processing swf line 2200000
Processing swf line 2300000
Processing swf line 2400000
Processing swf line 2500000
Processing swf line 2600000
Processing swf line 2700000
Processing swf line 2800000
Processing swf line 2900000
Processing swf line 3000000
Processing swf line 3100000
Processing swf line 3200000
Processing swf line 3300000
Processing swf line 3400000
Processing swf line 3500000
Processing swf line 3600000
Processing swf line 3700000
Processing swf line 3800000
Processing swf line 3900000
Processing swf line 4000000
Processing swf line 4100000
Processing swf line 4200000
Processing swf line 4300000
Processing swf line 4400000
Processing swf line 4500000
Processing swf line 4600000
Processing swf line 4700000
Processing swf line 4800000
Processing swf line 4900000
Processing swf line 5000000
Processing swf line 5100000
Processing swf line 5200000
Processing swf line 5300000
Processing swf line 5400000
Processing swf line 5500000
Processing swf line 5600000
Processing swf line 5700000
-------------------
End parsing
Total 1649029 jobs and 556 users have been created.
Total number of core-hours: 18222722
4075060 valid jobs were not selected (keep_only) for 75784902 core-hour
Jobs not selected: 71.2% in number, 80.6% in core-hour
7119 out of 5731209 lines in the file did not match the swf format
30 jobs were not valid
%% Cell type:code id:6ec15ee8 tags:
``` python
# Second selection
# Keep only the selected jobs
! ./scripts/swf_moulinette.py workload/METACENTRUM_6months.swf \
    -o workload/MC_selection_article.swf \
    --keep_only="nb_res <= 16 and run_time <= 24*3600"
```
%% Output
Processing swf line 100000
Processing swf line 200000
Processing swf line 300000
Processing swf line 400000
Processing swf line 500000
Processing swf line 600000
Processing swf line 700000
Processing swf line 800000
Processing swf line 900000
Processing swf line 1000000
Processing swf line 1100000
Processing swf line 1200000
Processing swf line 1300000
Processing swf line 1400000
Processing swf line 1500000
Processing swf line 1600000
-------------------
End parsing
Total 1604201 jobs and 546 users have been created.
Total number of core-hours: 4785357
44828 valid jobs were not selected (keep_only) for 13437365 core-hour
Jobs not selected: 2.7% in number, 73.7% in core-hour
0 out of 1649030 lines in the file did not match the swf format
1 jobs were not valid
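%% Cell type:markdown id:swf-sanity-check tags:
As a sanity check, the filtered trace can be verified against the selection criteria. The sketch below is illustrative: it assumes the output keeps the standard 18-field SWF layout (field 2 = submit time, field 4 = run time, field 5 = number of allocated processors, presumably what `nb_res` refers to in `swf_moulinette.py`) and uses pandas, which the rest of this notebook does not require.
%% Cell type:code id:swf-sanity-check-code tags:
``` python
import pandas as pd

# SWF is whitespace-separated; header/comment lines start with ';'.
cols = ["job_id", "submit_time", "wait_time", "run_time", "nb_res",
        "cpu_time", "memory", "req_res", "req_time", "req_memory",
        "status", "user_id", "group_id", "executable", "queue",
        "partition", "preceding_job", "think_time"]
swf = pd.read_csv("workload/MC_selection_article.swf", sep=r"\s+",
                  comment=";", names=cols, header=None)

# Both criteria of the second selection should hold for every remaining job.
print("max allocated cores:", swf["nb_res"].max())            # expected <= 16
print("max run time (hours):", swf["run_time"].max() / 3600)  # expected <= 24
```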
%% Cell type:markdown id:afde35e8 tags:
## Platform
According to the system specifications given on the [corresponding page of the Parallel Workload Archive](https://www.cs.huji.ac.il/labs/parallel/workload/l_metacentrum2/index.html), there is no change in the platform from June 1st 2014 to Nov 30th 2014 for the clusters considered in our study (≤ 16 cores per node). There is a total of **6304 cores**.(1)
We build a platform file adapted to the remaining workload. We see above that the second selection cuts 73.7% of the core-hours of the original workload. We choose to model a homogeneous cluster with 16-core nodes. To obtain a coherent number of nodes, we compute:
$\#nodes = \frac{\#cores_{total} \times \%kept_{core.hour}}{\#corePerNode} = 6304 \times 0.263 / 16 \approx 104$
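The same count can be reproduced directly; 0.263 is the fraction of core-hours kept by the second selection (100% − 73.7%):
``` python
total_cores = 6304         # cores of the clusters kept for the study
kept_fraction = 1 - 0.737  # the second selection discards 73.7% of the core-hours
cores_per_node = 16
print(round(total_cores * kept_fraction / cores_per_node))  # -> 104
```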
In the SimGrid platform language, this corresponds to the following cluster:
```xml
<cluster id="cluster_MC" prefix="MC_" suffix="" radical="0-103" core="16">
```
The corresponding SimGrid platform file can be found in `platform/average_metacentrum.xml`.
(1) Clusters decommissioned before or commissioned after the 6-month period have been removed: $8+480+160+1792+256+576+88+416+108+168+752+112+588+48+152+160+192+24+224 = 6304$