Commit 4c90c165 authored by Millian Poquet

write simulation output analysis into a notebook

parent 2bbb83c8
@@ -127,8 +127,16 @@
    buildInputs = [
      pkgs.R
      pkgs.rPackages.tidyverse
      pkgs.rPackages.multidplyr
      pkgs.rPackages.gaussplotR
      pkgs.rPackages.viridis
    ];
  };
  r-notebook = pkgs.mkShell {
    buildInputs = [
      pkgs.pandoc
      pkgs.R
      pkgs.rPackages.rmarkdown
      pkgs.rPackages.knitr
      pkgs.rPackages.tidyverse
      pkgs.rPackages.viridis
    ];
  };
#!/usr/bin/env Rscript
# Render the R Markdown notebook given as the only argument.
args = commandArgs(trailingOnly=TRUE)
if (length(args) == 1) {
  library(rmarkdown)
  rmarkdown::render(args[1])
} else {
  write("usage: run-rmarkdown-notebook <NOTEBOOK-FILE>", stderr())
  invokeRestart("abort")
}
---
title: "Simulation output analysis"
author: "Millian Poquet"
date: "2024-05-05"
params:
simulation_aggregated_output: "simulation-aggregated-output.csv"
output:
html_document:
toc: true
theme: united
---
# Introduction
This notebook analyzes the results of the resource management experimental campaign of the article "Light-weight prediction for improving energy consumption in HPC platforms" (sections 6.4 and 6.5), published at Euro-Par 2024. For the full context of this experiment, please refer to the article preprint, which is available at the [[HAL long-term open-access link]](https://hal.science/hal-04566184).
**Summary of the experiment.** 30 different 1-day workloads have been extracted at random points in time from [the Marconi 100 trace](https://gitlab.com/ecs-lab/exadata). Each workload has been replayed in simulation with the Batsim simulator [[thesis link]](https://hal.science/tel-01757245v2) [[long-term Software Heritage code permalink]](https://archive.softwareheritage.org/swh:1:rev:ee797ccebbb95410479663ee0547e752112fc83e;origin=https://framagit.org/batsim/batsim.git;visit=swh:1:snp:ec2c0ac3c1fb85d35cc35049f314234cc34124cd). A constraint is set on the power that the whole platform can use during the first 3 hours of the simulation. This constraint is enforced by our EASY backfilling implementation with powercap support [[GitLab code link]](https://gitlab.irit.fr/sepia-pub/batsim/easy-powercap) [[Software Heritage long-term code permalink]](https://archive.softwareheritage.org/swh:1:rev:659660c35650e9f46ec47e8c0743d75649e68d7b;origin=https://framagit.org/batsim/easy-powercap.git;visit=swh:1:snp:03d429f7d797ff3acbea78c29236f22f04984c95), which uses a prediction of each job's power consumption to make its decisions.
The goal of this notebook is to determine the impact of the job power predictor on the schedules produced by this scheduling algorithm. The notebook takes an aggregation of all the simulation executions as input. It outputs the image files used as Figures 4 and 5 of the article, and also provides additional analyses (images + short text analyses) that could not fit within the article's page limit.
## Power predictor naming differences w.r.t. the article
- `upper_bound` is the predictor named `naive` in the article. It assumes that all the nodes allocated to the job run at full power during the whole job execution. This is an upper bound on the job's power consumption that can safely be used from the scheduler's point of view (see the sketch after this list).
- `real_mean` is the predictor that uses the real mean power of each job (a perfect oracle, unfeasible in practice, but showing the best we could get with an error-free mean predictor).
- `real_max` is the predictor that uses the real maximum power of each job (a perfect oracle, unfeasible in practice, but showing the best we could get with an error-free maximum predictor).
- `mean` is the history-based (light-weight) mean job power predictor described in section 4.1 of the article.
- `max` is the history-based (light-weight) maximum job power predictor described in section 4.1 of the article.
- `zero` assumes that all jobs consume 0 W. This is strictly equivalent to EASY backfilling without powercap support, and is used as the baseline for scheduling metrics.
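To make the naming concrete, here is a minimal sketch of the `upper_bound`/naive predictor. It is not taken from the campaign's code: `nb_req_nodes` is a hypothetical parameter name, and the 2100 W constant is the per-node maximum power observed on Marconi 100 (cf. the constants in the next section).
```{r, eval = FALSE}
# Minimal sketch (not the campaign's code) of the naive `upper_bound`
# predictor: every allocated node is assumed to draw its maximum power
# for the whole job execution.
predict_upper_bound = function(nb_req_nodes, max_power_per_node = 2100.0) {
  nb_req_nodes * max_power_per_node
}
predict_upper_bound(16) # -> 33600 W
```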
## Code to read and prepare data
```{r, echo = TRUE}
library(tidyverse)
library(viridis)
library(knitr)
# data extracted from the analysis of the M100 real trace from 2022-01 to 2022-09
nb_nodes = 980
max_observed_total_power = 955080
max_power_per_node = 2100.0
min_power_per_node = 240.0
max_dynamic_power = max_observed_total_power - min_power_per_node * nb_nodes
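# i.e., max_dynamic_power = 955080 - 240.0 * 980 = 719880 W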
# data from the simulation campaign definition
constrained_time_window_duration_seconds = 60*60*3 # 3 hours
# read input data, fix types, reorder predictor and split predictors in categories
data = read_csv(params$simulation_aggregated_output, show_col_types = FALSE) %>% mutate(
start_dt_s = as.factor(start_dt_s),
job_power_estimation_field = as.factor(job_power_estimation_field)
)
data$predictor_name = factor(data$predictor_name,
levels=c('upper_bound', 'max', 'real_max', 'real_mean', 'mean', 'zero'))
data = data %>% mutate(
predictor_metrics = ifelse(predictor_name %in% c('real_max', 'max'), 'max',
ifelse(predictor_name %in% c('real_mean', 'mean'), 'mean',
'naive'
)),
predictor_method = ifelse(predictor_name %in% c('mean', 'max'), 'predicted', 'real')
)
data$predictor_metrics = factor(data$predictor_metrics, levels=c('naive', 'max', 'mean'))
data$predictor_method = factor(data$predictor_method, levels=c('predicted', 'real'))
# compute scheduling metrics against their matching EASY baseline
data_nz = data %>% filter(predictor_name != 'zero')
data_z = data %>% filter(predictor_name == 'zero' &
powercap_dynamic_value_ratio == max(data$powercap_dynamic_value_ratio))
data_z_joinable = data_z %>% transmute(
start_dt_s = start_dt_s,
zero_mean_utilization = mean_utilization,
zero_max_utilization = max_utilization,
zero_mean_turnaround_time = mean_turnaround_time,
)
data_nz = inner_join(data_nz, data_z_joinable, by='start_dt_s') %>% mutate(
mean_turnaround_time_minus_zero = mean_turnaround_time - zero_mean_turnaround_time,
) %>% mutate(
mean_turnaround_time_increase_ratio = mean_turnaround_time_minus_zero / zero_mean_turnaround_time
)
```
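Before analyzing anything, a quick sanity check on the prepared data frames can catch input issues early. The following checks are a sketch only; the expected number of workloads (30) comes from the campaign description above.
```{r, eval = FALSE}
# Hedged sanity checks (assume the 30-workload campaign described in
# the introduction).
stopifnot(nlevels(data$start_dt_s) == 30)  # one factor level per workload
data_nz %>% count(predictor_name)          # simulation instances per predictor
summary(data_nz$mean_turnaround_time_increase_ratio)
```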
# Consistency checks
This section inspects the simulation data to make sure the values are consistent with our expectations about the algorithm.
## During the constrained time window, is the utilization proportional to the powercap value for each (predictor, workload)? **It should be.**
```{r, dev="jpeg", fig.width=10, fig.height=16}
data_nz %>% ggplot(aes(x=powercap_dynamic_value_ratio, y=mean_utilization / nb_nodes, color=predictor_name)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_abline(slope=1) +
theme_bw() +
theme(legend.position='top', legend.title=element_blank()) +
guides(color = guide_legend(nrow = 1)) +
scale_x_continuous(breaks=seq(0.1,0.7,0.2), labels = scales::percent) +
scale_y_continuous(breaks=seq(0,1,0.2), labels = scales::percent) +
scale_color_viridis(discrete=TRUE) +
expand_limits(x=0) +
facet_wrap(vars(start_dt_s)) +
labs(
y="Utilization (proportion of nodes)",
x="Powercap value (proportion of the maximum dynamic power range)"
)
```
**Conclusion**: Yes, the relation is almost perfectly proportional for every (workload, predictor) pair until the utilization saturates.
On 5/30 workloads the `max` predictor jumps slightly from one linear trend to another. This is consistent with the first-fit policy of the scheduling algorithm: we think that EASY becomes able to execute a job whose power consumption is overestimated by `max`, and that EASY then cannot backfill smaller jobs since it believes there is not enough available power.
## During the constrained time window, is the utilization roughly proportional to the powercap value for each predictor regardless of the workload?
```{r, dev="jpeg", fig.width=10}
data_nz %>% ggplot(aes(x=powercap_dynamic_value_ratio, y=mean_utilization / nb_nodes, color=predictor_name)) +
geom_jitter(width=1/100, height=0) +
geom_smooth(method = "lm", se = FALSE) +
geom_abline(slope=1) +
theme_bw() +
theme(legend.position='top', legend.title=element_blank()) +
guides(color = guide_legend(nrow = 1)) +
scale_x_continuous(breaks=seq(0,0.7,0.1), labels = scales::percent) +
scale_y_continuous(breaks=seq(0,1,0.2), labels = scales::percent) +
scale_color_viridis(discrete=TRUE) +
expand_limits(x=0) +
labs(
y="Utilization (proportion of nodes)",
x="Powercap value (proportion of the maximum dynamic power range). Shown with horizontal jitter."
)
```
**Conclusion**: Yes, this is roughly proportional. For some workloads the utilization saturates with the `real_mean`/`mean` predictors for powercaps > 55 %.
# Per-workload analysis
This section analyzes how the algorithm behaves on each workload.
We believe that this is the most important analysis section of this notebook, as scheduling results must be looked at for each workload to make sense.
## During the constrained time window, how much power is consumed on average for each (predictor, workload)?
```{r, dev="jpeg", fig.width=10, fig.height=16}
data_nz %>% ggplot(aes(x=powercap_dynamic_value_ratio, y=mean_power/max_dynamic_power, color=predictor_name)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_abline(slope=1) +
theme_bw() +
theme(legend.position='top', legend.title=element_blank()) +
guides(color = guide_legend(nrow = 1)) +
scale_x_continuous(breaks=seq(0.1,0.7,0.2), labels = scales::percent) +
scale_y_continuous(breaks=seq(0,1,0.2), labels = scales::percent) +
scale_color_viridis(discrete=TRUE) +
expand_limits(x=0) +
facet_wrap(vars(start_dt_s)) +
labs(
y="Mean power (proportion of maximum observed M100 power)",
x="Powercap value (proportion of the maximum dynamic power range)"
)
```
**Conclusions**: The platform's dynamic mean power consumption is linear in the powercap value (unless the platform is already saturated, in which case increasing the powercap has little effect), which is expected. Furthermore, we can clearly see that:
- This plot has the same shape as the corresponding utilization plot, which is expected. The main difference is that the maximum mean power consumption is around 70 % while the utilization goes up to 100 %.
- The `real_mean` > `real_max` > `upper_bound` predictor ordering in terms of mean power consumption holds for all workloads (checked programmatically in the sketch after this list).
- The `mean` > `max` > `upper_bound` predictor ordering in terms of mean power consumption holds for all workloads.
- Using the `mean` history-based predictor instead of the real value `real_mean` (which cannot be used in practice as it is unknown at decision time, but which represents a perfect, error-free oracle) has almost no impact on the power used during the constrained time window.
- Using the `max` history-based predictor instead of the real value `real_max` (same remark as above) decreases the mean power consumption. The decrease is very small on some workloads (*e.g.*, almost no impact on workload 19389030), but quite strong on others (*e.g.*, on workload 10061708 with powercap=70 %, the mean power consumption moves from ~50 % with `real_max` to ~30 % with `max`).
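The two ordering claims above can be checked programmatically. The following sketch computes, over all (workload, powercap) instances, the proportion of instances for which each pairwise ordering holds; it assumes `data_nz` contains exactly one row per predictor per instance.
```{r, eval = FALSE}
# Hedged sketch: proportion of (workload, powercap) instances for which
# the claimed mean-power orderings hold. Assumes one row per predictor
# per instance.
data_nz %>%
  select(start_dt_s, powercap_dynamic_value_ratio, predictor_name, mean_power) %>%
  pivot_wider(names_from = predictor_name, values_from = mean_power) %>%
  summarize(
    real_mean_ge_real_max = mean(real_mean >= real_max),
    real_max_ge_upper_bound = mean(real_max >= upper_bound),
    mean_ge_max = mean(`mean` >= `max`),
    max_ge_upper_bound = mean(`max` >= upper_bound)
  ) %>%
  knitr::kable()
```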
## How is the scheduling performance (as measured by mean turnaround time) impacted by each predictor, for all workloads?
```{r, dev="jpeg", fig.width=10, fig.height=16}
data_nz %>% ggplot(aes(x=powercap_dynamic_value_ratio, y=mean_turnaround_time_minus_zero, color=predictor_name)) +
geom_point() +
#geom_smooth(method = "lm", se = FALSE) +
geom_hline(yintercept=0) +
theme_bw() +
theme(legend.position='top', legend.title=element_blank()) +
guides(color = guide_legend(nrow = 1)) +
scale_x_continuous(breaks=seq(0,0.7,0.2), labels = scales::percent) +
scale_y_continuous() +
scale_color_viridis(discrete=TRUE) +
facet_wrap(vars(start_dt_s)) +
expand_limits(x=0) +
labs(
y="Mean turnaround time difference against EASY without any powercap for each simulation (seconds)",
x="Powercap value (proportion of the maximum dynamic power range)"
)
```
**Problem**: The scheduling performance is expected to be **degraded** on most instances, as 1. this algorithm does not directly change the job ordering and 2. the lower the powercap, the lower the utilization. This means the mean turnaround time difference should be positive. **Workload 18474670 is clearly an outlier here**, as its scheduling performance has been **greatly improved** by the powercap on most instances. We think that on most instances of this workload the powercap prevented big jobs (in number of requested resources and in area) from being executed, which enabled many small jobs to run and thereby improved the mean turnaround time metric.
Here is a look at the data without the outlier workload.
```{r, dev="jpeg", fig.width=10, fig.height=16}
outlier_workload_start_dt_s = 18474670 # sched metrics are strongly better than EASY on it
data_nz %>%
filter(start_dt_s != outlier_workload_start_dt_s) %>%
ggplot(aes(x=powercap_dynamic_value_ratio, y=mean_turnaround_time_minus_zero, color=predictor_name)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
geom_hline(yintercept=0) +
theme_bw() +
theme(legend.position='top', legend.title=element_blank()) +
guides(color = guide_legend(nrow = 1)) +
scale_x_continuous(breaks=seq(0,0.7,0.2), labels = scales::percent) +
scale_y_continuous() +
scale_color_viridis(discrete=TRUE) +
facet_wrap(vars(start_dt_s)) +
expand_limits(x=0) +
labs(
y="Mean turnaround time difference against EASY without any powercap for each simulation (seconds)",
x="Powercap value (proportion of the maximum dynamic power range)"
)
```
**Conclusions**: The scheduling performance degradation is clearly linear in the powercap value on some workloads (*e.g.*, 3079185 and 7934521), follows a linear trend with noise on most workloads (*e.g.*, 17539280), and is not linear on a few workloads (*e.g.*, 19389030 for the predictors other than `upper_bound`). Additionally:
- The `mean` > `max` > `upper_bound` predictor ordering in terms of scheduling performance holds for most workloads.
- The `real_mean` > `real_max` > `upper_bound` predictor ordering in terms of scheduling performance holds for most workloads.
- The mean turnaround time difference metric spans the same range of values for most workloads, which means **this metric can be aggregated over all workloads** without a per-workload normalization step (see the sketch after this list).
- Similarly to the mean power consumption metric, using `mean` instead of `real_mean` has very little impact on the mean turnaround time metric on most workloads.
- Using `max` instead of `real_max` has a small impact (a small performance degradation) on the mean turnaround time metric on most workloads.
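The range claim above can be inspected directly; here is a sketch summarizing the per-workload range of the metric (column names as defined in the data-preparation code).
```{r, eval = FALSE}
# Hedged sketch: per-workload range of the mean turnaround time
# difference, to support aggregating the metric without a per-workload
# normalization step.
data_nz %>%
  group_by(start_dt_s) %>%
  summarize(
    min_mtt_diff = min(mean_turnaround_time_minus_zero),
    max_mtt_diff = max(mean_turnaround_time_minus_zero)
  ) %>%
  knitr::kable()
```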
# Analysis aggregating all workloads together
While we think that the per-workload analysis is the most relevant, it obviously cannot fit in the 1.5 pages dedicated to the analysis in the article, as the per-workload view of the data takes a lot of space.
This section aggregates the results seen previously into smaller figures that can fit in the paper, and performs additional analyses on the whole dataset.
## During the constrained time window, how far is the mean power compared to the powercap value for each predictor?
```{r, fig.width=10, fig.height=6}
data_nz %>%
#filter(powercap_dynamic_value_ratio %in% powercap_ratios_values_to_show) %>%
mutate(powercap_label = sprintf("pcap=%g", powercap_dynamic_value_ratio)) %>%
ggplot() +
geom_hline(aes(yintercept=powercap_dynamic_value_ratio)) +
geom_boxplot(aes(y=mean_power/max_dynamic_power, fill=predictor_method, x=predictor_metrics)) +
theme_bw() +
theme(
legend.position=c(0.2, 0.9),
legend.direction='horizontal',
legend.title=element_blank(),
legend.background=element_rect(color='black'),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
) +
expand_limits(x=0) +
scale_y_continuous(breaks=seq(0,0.7,0.1)) +
facet_wrap(vars(powercap_label), nrow=1) +
labs(
y="Mean platform power consumption",
x="Job power estimator"
) +
scale_fill_grey(start=0.8, end=1)
```
**Description**. This figure is very similar to the per-workload mean power consumption plot of the previous section, but the mean power values are now aggregated over workloads, and the predictor naming uses the predicted metric (x axis) and whether it is the real value or the predicted one (fill color). Standard [ggplot boxplots](https://ggplot2.tidyverse.org/reference/geom_boxplot.html) are used: they show the first (25 %), second (median, 50 %) and third (75 %) quartiles, and values lying more than 1.5 times the inter-quartile range away from the box are drawn as outlier points.
The final version seen in the article (Figure 4) is very similar, but for the sake of font readability only half of the powercap values are shown.
```{r, fig.width=8, fig.height=4}
powercap_ratios_values_to_show = seq(0.1, 0.7, 0.1)
scale=0.9
width_scale=0.3
data_nz %>%
filter(powercap_dynamic_value_ratio %in% powercap_ratios_values_to_show) %>%
mutate(powercap_label = sprintf("pcap=%g", powercap_dynamic_value_ratio)) %>%
ggplot() +
geom_hline(aes(yintercept=powercap_dynamic_value_ratio), linewidth=width_scale) +
geom_boxplot(aes(y=mean_power/max_dynamic_power, fill=predictor_method, x=predictor_metrics), linewidth=width_scale, outlier.size=width_scale) +
theme_bw() +
theme(
legend.position=c(0.2, 0.9),
legend.direction='horizontal',
legend.title=element_blank(),
legend.background=element_rect(color='black'),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
) +
expand_limits(x=0) +
scale_y_continuous(breaks=seq(0,0.7,0.1)) +
facet_wrap(vars(powercap_label), nrow=1) +
labs(
y="Mean platform power consumption",
x="Job power estimator"
) +
scale_fill_grey(start=0.8, end=1)
ggsave("./fig4-sched-mean-power-distribution.pdf", width=8*scale, height=4*scale)
```
Here is the code that produces the summarized power underutilization values seen in Section 6.5 of the article.
The power difference to the powercap is normalized by the powercap value of each instance, so that aggregating the values makes sense (otherwise large powercap values would dominate the aggregation).
The **average** value is the one used in the article.
```{r}
t = data_nz %>%
mutate(power_underutilization_ratio = (powercap_dynamic_watts - mean_power)/powercap_dynamic_watts) %>%
group_by(predictor_name) %>%
summarize(
average_power_underutilization_ratio = mean(power_underutilization_ratio),
median_power_underutilization_ratio = median(power_underutilization_ratio),
)
knitr::kable(t)
```
## How is the scheduling performance degraded by each predictor?
Very similarly to the previous plot, here is how Figure 5 of the article is produced.
```{r, fig.width=8, fig.height=4}
data_nz %>%
filter(start_dt_s != outlier_workload_start_dt_s) %>%
filter(powercap_dynamic_value_ratio %in% powercap_ratios_values_to_show) %>%
mutate(powercap_label = sprintf("pcap=%g", powercap_dynamic_value_ratio)) %>%
ggplot() +
geom_boxplot(aes(y=mean_turnaround_time_minus_zero, fill=predictor_method, x=predictor_metrics), linewidth=width_scale, outlier.size=width_scale) +
theme_bw() +
theme(
legend.position=c(0.16, 0.12),
legend.direction='horizontal',
legend.background=element_rect(color='black'),
legend.title=element_blank(),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
) +
facet_wrap(vars(powercap_label), nrow=1) +
labs(
y="Mean turnaround time increase (s)",
x="Job power estimator"
) +
scale_fill_grey(start=0.8, end=1)
ggsave("./fig5-sched-mtt-diff-distribution.pdf", width=8*scale, height=4*scale)
```
Here is the code that produces the summarized scheduling performance degradation values seen in Section 6.5 of the article.
The scheduling performance has already been normalized by the performance of EASY without a powercap for each instance.
The `average_mtt_increase_ratio` value (the **average** of the normalized mean turnaround time differences) is the one used in the article.
```{r}
t = data_nz %>%
filter(start_dt_s != outlier_workload_start_dt_s) %>%
group_by(predictor_name) %>%
summarize(
average_mtt_increase = mean(mean_turnaround_time_minus_zero),
average_mtt_increase_ratio = mean(mean_turnaround_time_increase_ratio),
median_mtt_increase = median(mean_turnaround_time_minus_zero),
)
knitr::kable(t)
```
## How much energy is consumed during the time window compared to the energy that would be used by staying exactly at the powercap value for the whole window duration?
```{r, fig.height = 4}
data_nz %>% ggplot() +
geom_hline(yintercept=0) +
geom_violin(aes(x=predictor_name, y=energy_from_powercap / 1e9)) +
geom_jitter(aes(x=predictor_name, y=energy_from_powercap / 1e9), alpha=0.1) +
geom_boxplot(aes(x=predictor_name, y=energy_from_powercap / 1e9), width=0.025, outlier.shape=NA) +
theme_bw() +
labs(
x="Power predictor",
y="Distribution of the energy consumed (GJ)"
)
```
**Conclusions**: The energy values are consistent with the previous power plots. Only `mean` and `real_mean` used more energy than the powercap allows on the analyzed workloads, and `mean` does so frequently.
## How is the powercap exceeded during the time window for each predictor?
Whether the powercap has been exceeded or not has been computed for each second of each simulation.
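The aggregated input already contains these per-second results. As an illustration only, here is a sketch of how the per-second aggregates could be derived for one simulation; `power_trace` (with columns `time_s` and `power_watts`) and the scalar `powercap_watts` are hypothetical names, not part of this notebook's input.
```{r, eval = FALSE}
# Hedged sketch: deriving the per-simulation powercap-break aggregates
# from a hypothetical per-second power trace. `power_trace`, `time_s`,
# `power_watts` and `powercap_watts` are assumed names.
power_trace %>%
  filter(time_s < constrained_time_window_duration_seconds) %>%
  summarize(
    nb_seconds_above_powercap = sum(power_watts > powercap_watts),
    max_power_from_powercap = max(power_watts - powercap_watts)
  )
```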
```{r, fig.height = 4}
data_nz %>% ggplot() +
geom_hline(yintercept=0) +
geom_violin(aes(x=predictor_name, y=nb_seconds_above_powercap/constrained_time_window_duration_seconds)) +
geom_jitter(aes(x=predictor_name, y=nb_seconds_above_powercap/constrained_time_window_duration_seconds), alpha=0.1) +
geom_boxplot(aes(x=predictor_name, y=nb_seconds_above_powercap/constrained_time_window_duration_seconds), width=0.025, outlier.shape=NA) +
theme_bw() +
labs(
x="Power predictor",
y="Proportion of time above powercap"
) +
scale_y_continuous(labels = scales::percent)
```
**Conclusions**: Only `real_mean` and `mean` exceed the powercap on the analyzed workloads. Both exceed it frequently, but `mean` breaks the powercap more often than `real_mean`.
Here is the code that produces the summarized maximum instantaneous powercap break values seen in Section 6.5 of the article.
The maximum instantaneous powercap break (in watts) is used (named `max_power_from_powercap` in the code).
For each simulation, the powercap break (current power minus powercap) has been computed for each second of the time window, and `max_power_from_powercap` is the maximum of all these values.
The value is normalized by the powercap so that the aggregation makes sense (otherwise large powercap values would dominate the result).
The **average** and **median** values have been used in the article.
```{r}
t = data_nz %>%
mutate(powercap_break = pmax(max_power_from_powercap, 0)) %>%
mutate(powercap_break_ratio = powercap_break / powercap_dynamic_watts) %>%
group_by(predictor_name) %>%
summarize(
mean_powercap_break_ratio = mean(powercap_break_ratio),
median_powercap_break_ratio = median(powercap_break_ratio),
)
knitr::kable(t)
```
Similarly, here is the code that computes in how many instances the powercap is exceeded, for each predictor.
```{r}
nb_simus = data_nz %>%
group_by(predictor_name) %>%
summarize(
total_count = n()
)
breaks = data_nz %>%
mutate(powercap_break = pmax(max_power_from_powercap, 0)) %>%
filter(powercap_break > 0) %>%
group_by(predictor_name) %>%
summarize(
break_count = n()
)
t = inner_join(nb_simus, breaks, by="predictor_name") %>%
mutate(break_ratio = break_count / total_count)
knitr::kable(t)
```
**Erratum note**: The first submitted version of the article states that `mean` breaks the powercap in 38 % of instances. That computation was wrong: `mean` breaks the powercap in 95 % of instances, and `real_mean` in 94 % of instances.