{ "cells": [ { "cell_type": "markdown", "id": "forced-resolution", "metadata": {}, "source": [ "# Downloading and preparing the workload and platform\n", "## Workload\n", "We use the reconverted log `METACENTRUM-2013-3.swf`, available from the [Parallel Workload Archive](https://www.cs.huji.ac.il/labs/parallel/workload/l_metacentrum2/index.html)." ] }, { "cell_type": "code", "execution_count": 7, "id": "f66eb756", "metadata": {}, "outputs": [], "source": [ "# Download the workload (548.3 MB unzipped)\n", "# -nc skips the download if the file already exists, -P sets the target directory\n", "!wget https://www.cs.huji.ac.il/labs/parallel/workload/l_metacentrum2/METACENTRUM-2013-3.swf.gz \\\n", " --no-check-certificate -nc -P workload" ] }, { "cell_type": "code", "execution_count": 6, "id": "bound-harvey", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gzip: workload/METACENTRUM-2013-3.swf already exists; do you wish to overwrite (y or n)? ^C\n" ] } ], "source": [ "# Unzip the workload\n", "!gunzip workload/METACENTRUM-2013-3.swf.gz" ] }, { "cell_type": "markdown", "id": "graphic-rabbit", "metadata": {}, "source": [ "It is a 2-year-long trace from MetaCentrum, the national grid of the Czech Republic. As mentioned in the [original paper releasing the log](https://www.cs.huji.ac.il/~feit/parsched/jsspp15/p5-klusacek.pdf), the platform is **very heterogeneous** and underwent major changes during the logging period. 
For the purpose of our study, we perform the following selection.\n", "\n", "First:\n", "- we remove from the workload all the clusters whose nodes have **more than 16 cores**\n", "- we truncate the workload to keep only 6 months (June to November 2014) during which no major change was made to the infrastructure (no cluster with nodes of at most 16 cores was added or removed, and the scheduling system was not reconfigured)\n", "\n", "Second:\n", "- we remove from the workload the jobs with an **execution time greater than one day**\n", "- we remove from the workload the jobs with a **number of requested cores greater than 16**\n", "\n", "To do so, we use the home-made SWF parser `swf_moulinette.py`:" ] }, { "cell_type": "code", "execution_count": 3, "id": "ff40dcdd", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Unix Time Jun 1st 2014: 1401573600\n", "Unix Time Nov 30th 2014: 1417388399\n", "We should keep all the jobs submitted between 44578794 and 60393593\n", "Processing swf line 100000\n", "Processing swf line 200000\n", "Processing swf line 300000\n", "Processing swf line 400000\n", "Processing swf line 500000\n", "Processing swf line 600000\n", "Processing swf line 700000\n", "Processing swf line 800000\n", "Processing swf line 900000\n", "Processing swf line 1000000\n", "Processing swf line 1100000\n", "Processing swf line 1200000\n", "Processing swf line 1300000\n", "Processing swf line 1400000\n", "Processing swf line 1500000\n", "Processing swf line 1600000\n", "Processing swf line 1700000\n", "Processing swf line 1800000\n", "Processing swf line 1900000\n", "Processing swf line 2000000\n", "Processing swf line 2100000\n", "Processing swf line 2200000\n", "Processing swf line 2300000\n", "Processing swf line 2400000\n", "Processing swf line 2500000\n", "Processing swf line 2600000\n", "Processing swf line 2700000\n", "Processing swf line 2800000\n", "Processing swf line 2900000\n", "Processing swf line 3000000\n", "Processing 
swf line 3100000\n", "Processing swf line 3200000\n", "Processing swf line 3300000\n", "Processing swf line 3400000\n", "Processing swf line 3500000\n", "Processing swf line 3600000\n", "Processing swf line 3700000\n", "Processing swf line 3800000\n", "Processing swf line 3900000\n", "Processing swf line 4000000\n", "Processing swf line 4100000\n", "Processing swf line 4200000\n", "Processing swf line 4300000\n", "Processing swf line 4400000\n", "Processing swf line 4500000\n", "Processing swf line 4600000\n", "Processing swf line 4700000\n", "Processing swf line 4800000\n", "Processing swf line 4900000\n", "Processing swf line 5000000\n", "Processing swf line 5100000\n", "Processing swf line 5200000\n", "Processing swf line 5300000\n", "Processing swf line 5400000\n", "Processing swf line 5500000\n", "Processing swf line 5600000\n", "Processing swf line 5700000\n", "-------------------\n", "End parsing\n", "Total 1649029 jobs and 556 users have been created.\n", "Total number of core-hours: 18222722\n", "4075060 valid jobs were not selected (keep_only) for 75784902 core-hour\n", "Jobs not selected: 71.2% in number, 80.6% in core-hour\n", "7119 out of 5731209 lines in the file did not match the swf format\n", "30 jobs were not valid\n" ] } ], "source": [ "# First selection\n", "# Create a swf with only the selected clusters and the 6 selected months\n", "from time import mktime, strptime\n", "\n", "begin_trace = 1356994806 # Unix start time of the trace, according to the original SWF header\n", "# Note: mktime() interprets the dates in the local time zone of the machine\n", "jun1_unix_time = mktime(strptime('Sun Jun 1 00:00:00 2014'))\n", "nov30_unix_time = mktime(strptime('Sun Nov 30 23:59:59 2014'))\n", "# SWF submit times are expressed in seconds since the start of the trace\n", "jun1, nov30 = int(jun1_unix_time - begin_trace), int(nov30_unix_time - begin_trace)\n", "print(\"Unix Time Jun 1st 2014: {:.0f}\".format( jun1_unix_time ))\n", "print(\"Unix Time Nov 30th 2014: {:.0f}\".format( nov30_unix_time ))\n", "print(\"We should keep all the jobs submitted between {:d} and {:d}\".format(jun1, nov30))\n", "\n", "! 
./scripts/swf_moulinette.py workload/METACENTRUM-2013-3.swf \\\n", " -o workload/METACENTRUM_6months.swf \\\n", " --keep_only=\"submit_time >= {jun1} and submit_time <= {nov30}\" \\\n", " --partitions_to_select 1 2 3 5 7 8 9 10 11 12 14 15 18 19 20 21 22 23 25 26 31" ] }, { "cell_type": "code", "execution_count": 4, "id": "6ec15ee8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Processing swf line 100000\n", "Processing swf line 200000\n", "Processing swf line 300000\n", "Processing swf line 400000\n", "Processing swf line 500000\n", "Processing swf line 600000\n", "Processing swf line 700000\n", "Processing swf line 800000\n", "Processing swf line 900000\n", "Processing swf line 1000000\n", "Processing swf line 1100000\n", "Processing swf line 1200000\n", "Processing swf line 1300000\n", "Processing swf line 1400000\n", "Processing swf line 1500000\n", "Processing swf line 1600000\n", "-------------------\n", "End parsing\n", "Total 1604201 jobs and 546 users have been created.\n", "Total number of core-hours: 4785357\n", "44828 valid jobs were not selected (keep_only) for 13437365 core-hour\n", "Jobs not selected: 2.7% in number, 73.7% in core-hour\n", "0 out of 1649030 lines in the file did not match the swf format\n", "1 jobs were not valid\n" ] } ], "source": [ "# Second selection\n", "# Keep only the selected jobs\n", "! 
./scripts/swf_moulinette.py workload/METACENTRUM_6months.swf \\\n", " -o workload/MC_selection_article.swf \\\n", " --keep_only=\"nb_res <= 16 and run_time <= 24*3600\"" ] }, { "cell_type": "code", "execution_count": 2, "id": "747ba154", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Processing swf line 100000\n", "Processing swf line 200000\n", "Processing swf line 300000\n", "Processing swf line 400000\n", "Processing swf line 500000\n", "Processing swf line 600000\n", "Processing swf line 700000\n", "Processing swf line 800000\n", "Processing swf line 900000\n", "Processing swf line 1000000\n", "Processing swf line 1100000\n", "Processing swf line 1200000\n", "Processing swf line 1300000\n", "Processing swf line 1400000\n", "Processing swf line 1500000\n", "Processing swf line 1600000\n", "-------------------\n", "End parsing\n", "Total 1358376 jobs and 416 users have been created.\n", "Total number of core-hours: 929847\n", "245825 valid jobs were not selected (keep_only) for 3855510 core-hour\n", "Jobs not selected: 15.3% in number, 80.6% in core-hour\n", "0 out of 1604202 lines in the file did not match the swf format\n", "1 jobs were not valid\n" ] } ], "source": [ "# Check how many jobs are 1-core-jobs\n", "! 
./scripts/swf_moulinette.py workload/MC_selection_article.swf \\\n", " --keep_only=\"nb_res == 1\"" ] }, { "cell_type": "code", "execution_count": 71, "id": "38296cb6", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "<Figure size 800x300 with 2 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "from matplotlib.ticker import PercentFormatter\n", "import matplotlib.pyplot as plt\n", "\n", "# Names of the 18 SWF fields, in file order\n", "header = [\"JOB_ID\",\"SUBMIT_TIME\",\"WAIT_TIME\",\"RUN_TIME\",\"ALLOCATED_PROCESSOR_COUNT\",\"AVERAGE_CPU_TIME_USED\",\"USED_MEMORY\",\"REQUESTED_NUMBER_OF_PROCESSORS\",\"REQUESTED_TIME\",\"REQUESTED_MEMORY\",\"STATUS\",\"USER_ID\",\"GROUP_ID\",\"APPLICATION_ID\",\"QUEUE_ID\",\"PARTITION_ID\",\"PRECEDING_JOB_ID\",\"THINK_TIME_FROM_PRECEDING_JOB\"]\n", "df = pd.read_csv(\"workload/MC_selection_article.swf\", delim_whitespace=True, header=None, names=header)\n", "\n", "fig, ax = plt.subplots(1,2, sharey=True, figsize=(8,3), layout=\"tight\")\n", "\n", "# Number of jobs per requested core count\n", "procs = df[\"REQUESTED_NUMBER_OF_PROCESSORS\"].value_counts().sort_index()\n", "# RUN_TIME is in seconds, so this is core-seconds; the unit cancels out in the proportions below\n", "df[\"coreh\"] = df[\"REQUESTED_NUMBER_OF_PROCESSORS\"] * df[\"RUN_TIME\"]\n", "coreh = df.groupby(by=[\"REQUESTED_NUMBER_OF_PROCESSORS\"])[\"coreh\"].sum()\n", "coreh_prct = coreh / coreh.sum()\n", "procs_prct = procs / procs.sum()\n", "procs_prct.plot.barh(ax=ax[0], ylabel=\"number of requested cores\", title=\"proportion of jobs\")\n", "coreh_prct.plot.barh(ax=ax[1], title=\"proportion of core-hours\")\n", "\n", "ax[0].grid(); ax[1].grid(); ax[0].set_axisbelow(True); ax[1].set_axisbelow(True)\n", "ax[0].xaxis.set_major_formatter(PercentFormatter(1)); ax[1].xaxis.set_major_formatter(PercentFormatter(1,decimals=0))\n", "\n", "plt.savefig(\"out/stats_filtered_wl.pdf\", bbox_inches='tight', dpi=1000)\n" ] }, { "cell_type": "code", "execution_count": 74, "id": "8d3d2fb8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"count 1.604201e+06\n", "mean 3.213063e+03\n", "std 9.773328e+03\n", "min 0.000000e+00\n", "25% 6.600000e+01\n", "50% 1.360000e+02\n", "75% 1.123000e+03\n", "max 8.640000e+04\n", "Name: RUN_TIME, dtype: float64" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "<Figure size 640x480 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df[\"RUN_TIME\"].hist(bins=100, density=1)\n", "df[\"RUN_TIME\"].describe()" ] }, { "cell_type": "markdown", "id": "afde35e8", "metadata": {}, "source": [ "## Platform\n", "According to the system specifications given in the [corresponding page in Parallel Workload Archive](https://www.cs.huji.ac.il/labs/parallel/workload/l_metacentrum2/index.html): from June 1st 2014 to Nov 30th 2014 there is no change in the platform for the clusters considered in our study (<16 cores). There is a total of **6304 cores**.(1)\n", "\n", "We build a platform file adapted to the remaining workload. We see above that the second selection cuts 73.7\\% of core-hours from the original workload. We choose to make an homogeneous cluster with 16-core nodes. 
To obtain a consistent number of nodes, we compute:\n", "\n", "$\\#nodes = \\frac{\\#cores_{total} \\times \\%kept_{core.hour}}{\\#corePerNode} = \\frac{6304 \\times 0.263}{16} \\approx 104$\n", "\n", "In SimGrid platform language, this corresponds to the following cluster (104 nodes, numbered 0 to 103):\n", "```xml\n", "<cluster id=\"cluster_MC\" prefix=\"MC_\" suffix=\"\" radical=\"0-103\" core=\"16\">\n", "```\n", "\n", "The corresponding SimGrid platform file can be found in `platform/average_metacentrum.xml`.\n", "\n", "(1) Clusters decommissioned before or commissioned after the 6-month period have been excluded: $8+480+160+1792+256+576+88+416+108+168+752+112+588+48+152+160+192+24+224 = 6304$" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }