Batmen stops sending CALL_ME_LATER when the workload is too big
I tried to execute batmen with a large input and got an error from batsim (line 1269232 of batsim.log):
[70055051.000466] ../src/kernel/EngineImpl.cpp:718: [ker_engine/CRITICAL] Oops! Deadlock detected, some activities are still around but will never complete. This usually happens when the user code is not perfectly clean.
When looking at sched.err.log, I noticed that the last CALL_ME_LATER message is sent at line 666377 of 713393, i.e. well before the end of the log, which is weird...
Experiment command file: robinfile.yaml
TODO:
- run with --debug? (problem: this would make the log file explode; it is already 221M)
- try to isolate a MWE with a user sending a million CALL_ME_LATER...
- use valgrind to check that memory management is OK
[Edit] The problem seems to come from fb_user_think_time_only. We can try:
- rounding up the dates in the CALL_ME_LATER to the next whole second,
- isolating a MWE using fb_user_think_time_only with a SABjson of 500k jobs.
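A minimal sketch of the rounding idea, assuming the CALL_ME_LATER dates are plain floating-point simulation times in seconds. The helpers round_up_to_second and distinct_wakeup_dates are hypothetical and not part of batmen; the sample dates are made up. The sketch only shows how rounding up merges wake-ups that fall within the same second, which could reduce the number of CALL_ME_LATER messages. Whether this avoids the deadlock is untested.

```python
import math


def round_up_to_second(date: float) -> float:
    """Hypothetical helper (not part of batmen): round a requested
    CALL_ME_LATER date up to the next whole second."""
    return float(math.ceil(date))


def distinct_wakeup_dates(dates):
    """Hypothetical helper: sorted list of the distinct dates that would
    still need a CALL_ME_LATER once every requested date is rounded up."""
    return sorted({round_up_to_second(d) for d in dates})


# Made-up fractional wake-up dates, e.g. submission time + user think time.
requested = [10.2, 10.7, 10.9, 11.0, 11.4, 12.05]
print(len(requested), "requests ->", distinct_wakeup_dates(requested))
# prints: 6 requests -> [11.0, 12.0, 13.0]
```

Rounding up (ceil) rather than to the nearest second guarantees that a user is never woken before its requested date; the cost is a delay of less than one second per wake-up.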
Edited by Maël Madon