Stop sending CALL_ME_LATER when workload too big

I tried to run batmen with a large input and got an error from batsim:

Line 1269232 of batsim.log:

[70055051.000466] ../src/kernel/EngineImpl.cpp:718: [ker_engine/CRITICAL] Oops! Deadlock detected, some activities are still around but will never complete. This usually happens when the user code is not perfectly clean.

Looking at sched.err.log, I notice that the last CALL_ME_LATER message is sent at line 666377 of 713393, which is weird: the scheduler keeps logging after that point without sending any further CALL_ME_LATER.

Experiment command file: robinfile.yaml

TODO:

  • run with --debug? (problem: this will make the log file explode; it is already 221M)
  • try to isolate a MWE with a user sending a million CALL_ME_LATER messages
  • use valgrind to see if memory management is OK

[Edit] The problem seems to come from fb_user_think_time_only. We can try:

  • rounding the dates in the CALL_ME_LATER messages up to the nearest second,
  • isolating a MWE that runs fb_user_think_time_only on a SABjson of 500k jobs.
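
As a rough illustration of the rounding idea (not the actual batmen code), the sketch below rounds each requested wake-up date up to the next whole second and drops duplicates. It assumes dates are floating-point seconds, as in the batsim log timestamp above. The helper names and the deduplication step are hypothetical: whether duplicate dates can safely be merged is something to check against the user model.

```python
import math


def round_up_to_second(date):
    """Round a simulation date (float, in seconds) up to the next whole second."""
    return math.ceil(date)


def coalesce_wakeups(dates):
    """Round every requested date up, then keep each distinct second only once."""
    return sorted({round_up_to_second(d) for d in dates})


# Four requested wake-ups collapse into two distinct dates after rounding up.
requested = [70055050.2, 70055050.9, 70055051.000466, 70055052.0]
print(coalesce_wakeups(requested))  # [70055051, 70055052]
```

Rounding up rather than to the nearest value means a wake-up never fires before the date originally requested, at the cost of up to one second of extra delay.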