Maël Madon authored

RM4ES Practicals

This repository contains the material for the practical sessions of the RM4ES course, given in the spring semester of 2024.

Session 1: getting started

Objective for this session:

  • setting up the whole simulation environment
  • understanding the workload representation in Batsim
  • having a first scheduling algorithm running

Install

Ensure that you are in your home directory (run cd).

Clone this repository:

git clone https://gitlab.irit.fr/sepia-pub/mael/RM4ES-practicals.git

Install Nix, which will manage all the dependencies you need for the course:

# download nix-user-chroot, a hack to install nix without root privileges
curl -L https://github.com/nix-community/nix-user-chroot/releases/download/1.2.2/nix-user-chroot-bin-1.2.2-x86_64-unknown-linux-musl --output nix-user-chroot
# give exec rights
chmod u+x nix-user-chroot
# download and install nix
mkdir -p ~/.nix && ./nix-user-chroot ~/.nix bash -c "curl -L https://nixos.org/nix/install | bash"
# enable cachix
mkdir -p ~/.config/nix && cp ~/RM4ES-practicals/util/nix.conf ~/.config/nix
# add a line to your bashrc
echo 'source ~/.nix-profile/etc/profile.d/nix.sh' >> ~/.bashrc

IMPORTANT: From now on, each time you want to test your schedulers, you need to run the following from the RM4ES-practicals repository directory:

# enter the nix-user-chroot shell
~/nix-user-chroot ~/.nix bash
# enter the nix shell defined in file default.nix, with all dependencies managed.
# this can take a few minutes the first time you run it, to download everything.
nix-shell -A expe

Introduction to Batsim

The resource and job management simulator Batsim, which we will use during these practicals, needs three inputs:

  • a platform: description of the IT infrastructure;
  • a workload: list of jobs that are going to arrive and their characteristics;
  • a scheduling algorithm: the "brain" of the system, deciding when and where to execute the jobs.

The platform is represented by an XML file. You have a simple example (monomachine platform) in the file expe/1machine.xml. This is enough for now! We will come back to it in more detail in session 3.

The workload is represented by a JSON file (see expe/2jobs.json):

{
    "jobs": [
        {"id": "1", "profile": "3UT", "res": 1, "walltime":10, "subtime": 0},
        {"id": "2", "profile": "4UT", "res": 1, "walltime":6, "subtime": 0}
    ],
    "profiles": {
        "3UT": {"delay": 3,"type": "delay"},
        "4UT": {"delay": 4,"type": "delay"}
    }
}

The meaning of each field in the JSON is explained in the Batsim workload documentation.
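Before feeding a workload to Batsim, a quick sanity check can save time. The sketch below is not part of the course material: check_workload is a hypothetical helper that parses a workload with Python's standard json module and verifies that every job refers to a profile defined in the same file. The two-job example is inlined as strict JSON so that the snippet runs on its own.

```python
import json

# The two-job example workload, inlined as strict JSON for a self-contained run.
WORKLOAD_TEXT = """
{
    "jobs": [
        {"id": "1", "profile": "3UT", "res": 1, "walltime": 10, "subtime": 0},
        {"id": "2", "profile": "4UT", "res": 1, "walltime": 6, "subtime": 0}
    ],
    "profiles": {
        "3UT": {"delay": 3, "type": "delay"},
        "4UT": {"delay": 4, "type": "delay"}
    }
}
"""


def check_workload(text):
    """Hypothetical helper: parse a workload and return its list of jobs.

    Raises json.JSONDecodeError on malformed JSON (e.g. a trailing comma)
    and ValueError when a job refers to a profile that is not defined.
    """
    workload = json.loads(text)
    profiles = workload["profiles"]
    for job in workload["jobs"]:
        if job["profile"] not in profiles:
            raise ValueError(f"job {job['id']}: unknown profile {job['profile']}")
    return workload["jobs"]


jobs = check_workload(WORKLOAD_TEXT)
for job in jobs:
    print(job["id"], job["profile"], job["subtime"], job["walltime"])
```

To check a file instead of the inlined text, the same helper can be given the content of the file, for instance the result of open("expe/2jobs.json").read().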

The scheduling algorithm has to be developed separately, and communicates with Batsim through the messages defined in the Batsim protocol. In this tutorial, we will develop our schedulers in Python using pybatsim (version 3.2.0). The scheduler implementations will be located in the folder sched/.

Exercise 1: workload

Represent the workload used previously in this course (exercise 1 from the exercise document) in Batsim's JSON format. Put it in a new file named expe/ex1.json. We will use the field "walltime" to store the "deadline".
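Writing the JSON by hand works fine for a few jobs, but generating it avoids syntax slips. The following is only a sketch: make_workload is a hypothetical helper, and the task values are placeholders that reproduce the two jobs of expe/2jobs.json, not the tasks of the exercise document. It assumes one resource per job and reuses the "<duration>UT" profile naming of the example.

```python
import json


def make_workload(tasks):
    """Hypothetical helper: build a Batsim-style workload dictionary.

    `tasks` is a list of (duration, deadline, submission_time) tuples.
    The deadline is stored in "walltime", as this exercise asks.
    """
    jobs, profiles = [], {}
    for index, (duration, deadline, subtime) in enumerate(tasks, start=1):
        profile_name = f"{duration}UT"  # naming convention of expe/2jobs.json
        profiles[profile_name] = {"delay": duration, "type": "delay"}
        jobs.append({"id": str(index), "profile": profile_name, "res": 1,
                     "walltime": deadline, "subtime": subtime})
    return {"jobs": jobs, "profiles": profiles}


# Placeholder values (the two jobs of expe/2jobs.json), NOT the exercise tasks.
placeholder_tasks = [(3, 10, 0), (4, 6, 0)]
workload = make_workload(placeholder_tasks)
print(json.dumps(workload, indent=4))
```

Replacing the placeholder tuples with the tasks of the exercise and writing the result with json.dump to expe/ex1.json would give a file in the expected format.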

Exercise 2: run your first simulation

We will now check that the installation from the Install step worked, by running a first simulation.

For this, you will need two processes, launched in two separate terminals: one for the infrastructure simulator (batsim), managing the workload and the platform, and one for the scheduler.

In one Nix shell, launch batsim with platform input expe/1machine.xml, workload input expe/2jobs.json and export prefix out/:

batsim -p expe/1machine.xml -w expe/2jobs.json -e out/

The simulation should start, and wait for the scheduler to be started as well. Try to understand the output from batsim in the terminal.

In another Nix shell, run a scheduler. You are given a very simple scheduler, sched/rejector.py, that rejects all the jobs:

pybatsim sched/rejector.py

Try to understand the output of the scheduler in the terminal as well.

That's it, you have run your first simulation!

Exercise 3: understand and visualize Batsim outputs

Simulation outputs are stored in the out directory. You have schedule-centric outputs (schedule.csv) and job-centric outputs (jobs.csv). Open them, read the doc, and make sure you understand most of the fields.

Also have a look at the file out/example_jobs.csv to see what the job output looks like when the jobs are not rejected.

Finally, to get a visual representation of the outputs, we will use the visualization software evalys. We provide a wrapper around evalys, with the main functions that you need, in the file util/plot.py. The jobs.csv output produced earlier is not suitable for visualization, since none of its jobs were executed. Visualize the jobs of the example instead:

python util/plot.py out/example_jobs.csv

NB: the script has an option to store the output in a PDF file. Check available options with python util/plot.py -h.

Exercise 4: FCFS monomachine

Now, it's time to develop your own scheduler. We will start with an FCFS (first come, first served) monomachine scheduler: it schedules the jobs in order of arrival.
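To know what schedule to expect before writing the scheduler itself, the FCFS policy can be modelled in a few lines of plain Python. This sketch does not use Batsim or pybatsim: fcfs_mono is a hypothetical function, and it assumes that jobs submitted at the same time are served in their input order.

```python
def fcfs_mono(jobs):
    """Return (job_id, start, finish) for each job on a single machine.

    `jobs` is a list of (job_id, submit_time, duration) tuples.
    """
    schedule = []
    machine_free_at = 0
    # sorted() is stable: jobs submitted at the same time keep their input order.
    for job_id, submit_time, duration in sorted(jobs, key=lambda j: j[1]):
        # A job starts once it has arrived and the machine is free.
        start = max(machine_free_at, submit_time)
        finish = start + duration
        schedule.append((job_id, start, finish))
        machine_free_at = finish
    return schedule


# The two jobs of expe/2jobs.json: both submitted at t=0, lasting 3 and 4 time units.
schedule = fcfs_mono([("1", 0, 3), ("2", 0, 4)])
print(schedule)  # [('1', 0, 3), ('2', 3, 7)]
```

Under these assumptions, job 1 runs from 0 to 3 and job 2 from 3 to 7, which is what the visualization of your own scheduler's output should show for this workload.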

Create a file fcfsMono.py in the sched directory. Draw inspiration from the template in sched/template.py to create your scheduler.

Warning: the file name and class name must match (for example, fcfsMono.py for the file name and FcfsMono for the class name).

You can use the following attributes and methods:

  • job.id, job.submit_time, job.profile.name, job.requested_resources, etc. to retrieve information about the job
  • self.bs.reject_jobs(list_of_jobs): to reject all the jobs in the list list_of_jobs (message REJECT_JOB in Batsim protocol)
  • self.bs.execute_jobs(list_of_jobs): to execute all the jobs in the list list_of_jobs (message EXECUTE_JOB in Batsim protocol). Warning: this function expects the jobs to have been allocated to one or several machines first. Use job.allocation = 0 in the monomachine context.

Test your algorithm on 2jobs.json and ex1.json inputs, and other inputs that you will create. Visualize the outputs with the visualization script to see if it worked correctly.

Session 2: monomachine schedulers

Objective for this session:

  • implement another monomachine scheduler: EDF
  • compare FCFS and EDF in terms of mean waiting time and tardiness
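The two metrics can be computed from the job-centric output with the standard csv module. This is a sketch under several assumptions: the column names below are those we expect in Batsim's job output and should be checked against your own file; the sample is hand-written, not real simulator output; every job is assumed to have been executed; and the deadline stored in "walltime" is taken to be relative to the submission time.

```python
import csv
import io

# Hand-written sample, NOT real Batsim output. Column names are an assumption:
# check them against the header of the jobs file in your out/ directory.
SAMPLE = """job_id,submission_time,requested_time,starting_time,finish_time
1,0,10,0,3
2,0,6,3,7
"""


def mean_metrics(csv_file):
    """Return (mean waiting time, mean tardiness) over all jobs in the file."""
    waits, tardinesses = [], []
    for row in csv.DictReader(csv_file):
        submit = float(row["submission_time"])
        # Assumption: the deadline is counted from the submission time.
        deadline = submit + float(row["requested_time"])
        waits.append(float(row["starting_time"]) - submit)
        # Tardiness is zero when the job finishes before its deadline.
        tardinesses.append(max(0.0, float(row["finish_time"]) - deadline))
    return sum(waits) / len(waits), sum(tardinesses) / len(tardinesses)


mean_wait, mean_tardiness = mean_metrics(io.StringIO(SAMPLE))
print(mean_wait, mean_tardiness)  # 1.5 0.5
```

Running the same function on the outputs of the FCFS and EDF schedulers for one workload gives directly comparable numbers.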

Session 3: multimachine schedulers

  • session 3: multimachine, heterogeneous in speed, with deadlines
  • session 2: monomachine + other algorithms: RMS, EDF (3 cases: under-provisioned / normally provisioned / over-provisioned infrastructure; note that on an over-provisioned infrastructure, the algorithm makes no difference) => simple example with 1 large task + 1 small task with a deadline: with EDF without preemption, the deadline is missed.
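The scenario with one large task and one small task with a tight deadline can be checked with a small model of non-preemptive EDF on one machine. This is a plain Python sketch, independent of Batsim: edf_non_preemptive is a hypothetical function, and the task values are made up for illustration, with absolute deadlines.

```python
def edf_non_preemptive(jobs):
    """Simulate non-preemptive EDF on one machine; return finish time per job id.

    Each job is a dict with keys: id, submit, duration, deadline (absolute).
    """
    pending = sorted(jobs, key=lambda j: j["submit"])
    time = 0
    finish_times = {}
    while pending:
        ready = [j for j in pending if j["submit"] <= time]
        if not ready:
            time = pending[0]["submit"]  # machine idles until the next arrival
            continue
        # Among the jobs already arrived, pick the earliest deadline.
        job = min(ready, key=lambda j: j["deadline"])
        time += job["duration"]  # once started, a job runs to completion
        finish_times[job["id"]] = time
        pending.remove(job)
    return finish_times


# Illustrative values only: a large task arrives first, a small urgent one at t=1.
jobs = [
    {"id": "big", "submit": 0, "duration": 10, "deadline": 20},
    {"id": "small", "submit": 1, "duration": 2, "deadline": 4},
]
finish = edf_non_preemptive(jobs)
missed = [j["id"] for j in jobs if finish[j["id"]] > j["deadline"]]
print(finish, missed)  # {'big': 10, 'small': 12} ['small']
```

With these values the small task finishes at 12 and misses its deadline of 4, because the large task cannot be interrupted. With preemption, the small task could run from 1 to 3 and the large one would still finish at 12, before its deadline of 20.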

Session 4: energy constraint

  • session 4: EDF with an energy constraint (powercap or energycap)

Session 5: preemptive scheduling

  • session 5: implement preemption