Test fails in the CI but not when run on my machine
Job #26268 failed for a220413d.
I ran the test suite on my machine but couldn't reproduce the failure, even with the option --pure passed to nix-shell.
I then ran the test suite on my machine inside a Docker container, and managed to reproduce the failure.
Test log
_____________________________ test_expected_output _____________________________
def test_expected_output():
for log_file in os.listdir('test/expected_log'):
instance_name = log_file[:-9]
> assert_expected_output(instance_name)
test/test_zexpected_output.py:6:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_file = 'fb_user_think_time_only-many_start_sessions'
def assert_expected_output(test_file):
expected = 'test/expected_log/' + test_file + '_jobs.csv'
obtained = 'test-out/' + test_file + '/_jobs.csv'
> assert filecmp.cmp(expected, obtained), f"\
Files {expected} and {obtained} should be equal but are not.\n\
Run `diff {expected} {obtained}` to investigate why.\n\
Run `cp {obtained} {expected}` to override the expected file with the obtained."
E AssertionError: Files test/expected_log/fb_user_think_time_only-many_start_sessions_jobs.csv and test-out/fb_user_think_time_only-many_start_sessions/_jobs.csv should be equal but are not.
E Run `diff test/expected_log/fb_user_think_time_only-many_start_sessions_jobs.csv test-out/fb_user_think_time_only-many_start_sessions/_jobs.csv` to investigate why.
E Run `cp test-out/fb_user_think_time_only-many_start_sessions/_jobs.csv test/expected_log/fb_user_think_time_only-many_start_sessions_jobs.csv` to override the expected file with the obtained.
test/helper.py:63: AssertionError
The two files that differ: obtained_jobs.csv expected_jobs.csv
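To see exactly which rows differ without leaving Python, a small helper along these lines can be used. This is only a sketch: the name `first_differences` and its `limit` parameter are hypothetical and not part of the test suite; it relies solely on the standard `difflib` module.

```python
import difflib
from pathlib import Path


def first_differences(expected_path, obtained_path, limit=10):
    """Return up to `limit` removed/added lines between two text files.

    Hypothetical helper, not part of test/helper.py.
    """
    expected = Path(expected_path).read_text().splitlines()
    obtained = Path(obtained_path).read_text().splitlines()
    diff = list(difflib.unified_diff(
        expected, obtained,
        fromfile=str(expected_path), tofile=str(obtained_path),
        lineterm="", n=0,  # n=0: no context lines, only changes
    ))
    # Skip the two file-header lines, then drop the "@@" hunk markers.
    changed = [line for line in diff[2:] if not line.startswith("@@")]
    return changed[:limit]
```

Calling it with the expected and obtained `_jobs.csv` paths from the assertion message returns the differing rows, prefixed with `-` (expected) or `+` (obtained).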
Batsim outputs
I compared the Batsim outputs from the run on Docker and the run on my machine: on_docker_batsim.log on_machine_batsim.log
The problem is that events occurring simultaneously are not handled in the same order. See for example line 31 of both logs, where 3 jobs (1:s1, 2:s2 and 3:s3) are submitted at the same time. They don't appear in the same order in the list ([1,3,2] on docker, [2,1,3] on my machine).
On line 32 of both logs, we see that the scheduler handles them in that order:
- docker: jobs 1 and 3 are executed immediately, job 2 is queued
- machine: jobs 2 and 1 are executed immediately, job 3 is queued
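
To illustrate why the submission order matters, here is a minimal sketch in Python. It is not Batsim's or the scheduler's actual code: the `schedule` function, the two free slots, and the tie-break on job ID are all assumptions made for illustration. It shows that a first-come-first-served policy gives different results for the two observed orders, and that sorting simultaneous submissions by a stable key would make both environments agree.

```python
def schedule(submitted, free_slots=2, tie_break=False):
    """Hypothetical FCFS scheduler: start jobs in list order while slots remain.

    `submitted` is the list of job IDs received in one simultaneous event.
    Returns a (started, queued) pair of lists.
    """
    # Optionally impose a deterministic order on simultaneous submissions.
    jobs = sorted(submitted) if tie_break else list(submitted)
    started = jobs[:free_slots]
    queued = jobs[free_slots:]
    return started, queued


docker_order = [1, 3, 2]   # order reported on docker
machine_order = [2, 1, 3]  # order reported on my machine

print(schedule(docker_order))   # ([1, 3], [2])
print(schedule(machine_order))  # ([2, 1], [3])

# With a tie-break on job ID (an assumption), both orders give the same result.
print(schedule(docker_order, tie_break=True))   # ([1, 2], [3])
print(schedule(machine_order, tie_break=True))  # ([1, 2], [3])
```

Whether the tie-break belongs in the simulator, in the scheduler, or in the test comparison is a separate design question; the sketch only shows that any stable ordering key removes the dependence on the environment.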