Test fails in CI but not when run on my machine

Job #26268 failed for a220413d.

I ran the test suite on my machine but couldn't reproduce the failure, even with the --pure option passed to nix-shell.

I then ran the test suite on my machine inside a docker container, and there I managed to reproduce the failure.

Test log

_____________________________ test_expected_output _____________________________
    def test_expected_output():
        for log_file in os.listdir('test/expected_log'):
            instance_name = log_file[:-9]
>           assert_expected_output(instance_name)
test/test_zexpected_output.py:6: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_file = 'fb_user_think_time_only-many_start_sessions'
    def assert_expected_output(test_file):
        expected = 'test/expected_log/' + test_file + '_jobs.csv'
        obtained = 'test-out/' +  test_file + '/_jobs.csv'
>       assert filecmp.cmp(expected, obtained), f"\
            Files {expected} and {obtained} should be equal but are not.\n\
            Run `diff {expected} {obtained}` to investigate why.\n\
            Run `cp {obtained} {expected}` to override the expected file with the obtained."
E       AssertionError:         Files test/expected_log/fb_user_think_time_only-many_start_sessions_jobs.csv and test-out/fb_user_think_time_only-many_start_sessions/_jobs.csv should be equal but are not.
E               Run `diff test/expected_log/fb_user_think_time_only-many_start_sessions_jobs.csv test-out/fb_user_think_time_only-many_start_sessions/_jobs.csv` to investigate why.
E               Run `cp test-out/fb_user_think_time_only-many_start_sessions/_jobs.csv test/expected_log/fb_user_think_time_only-many_start_sessions_jobs.csv` to override the expected file with the obtained.
test/helper.py:63: AssertionError

The two files that differ: obtained_jobs.csv expected_jobs.csv

Batsim outputs

Batsim outputs from the run on docker and the run on my machine, for comparison: on_docker_batsim.log on_machine_batsim.log

The problem is that events occurring simultaneously are not handled in the same order. See for example line 31 of both logs, where 3 jobs (1:s1, 2:s2 and 3:s3) are submitted at the same time. They don't appear in the same order in the list ([1,3,2] on docker, [2,1,3] on my machine).

On line 32 of the logs, we see that the scheduler processes them in the order they appear in the list:

  • docker: jobs 1 and 3 are executed immediately, job 2 is queued
  • machine: jobs 2 and 1 are executed immediately, job 3 is queued
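
To illustrate why the ordering matters, here is a minimal sketch of the effect. It is a toy model, not Batsim's or the scheduler's actual code: the `schedule` function and the assumption of exactly 2 free slots are hypothetical, chosen only because two jobs start and one is queued in both logs. It also shows one possible tie-breaker (sorting simultaneous submissions by job id), which is an idea to discuss rather than a confirmed fix.

```python
def schedule(submitted, free_slots=2):
    """Toy first-come-first-served step (hypothetical, not Batsim's API).

    Starts jobs in list order while slots remain and queues the rest.
    """
    started = submitted[:free_slots]
    queued = submitted[free_slots:]
    return started, queued


# Orders in which the simultaneous submissions appear at line 31 of each log
docker_order = [1, 3, 2]
machine_order = [2, 1, 3]

# Same jobs, same timestamp, different outcome depending on list order
print("docker: ", schedule(docker_order))
print("machine:", schedule(machine_order))

# Possible tie-breaker (assumption): sort simultaneous jobs by id first,
# so both environments reach the same decision
print("docker, sorted: ", schedule(sorted(docker_order)))
print("machine, sorted:", schedule(sorted(machine_order)))
```

With a deterministic tie-breaker of this kind, the result no longer depends on the order in which simultaneous events happen to be delivered, so the expected _jobs.csv would be the same on both environments.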