Hi guys,
We migrated our prod platform from 2020.2.5 HF1 to 2020.2.6 HF2, and since then RabbitMQ has been crashing every 2 to 3 minutes (sometimes it runs for 20 minutes without a crash).
I checked our preprod platform and it has shown the same behaviour since the same update (installed there 2 months before prod).
I have fully reinstalled/cleared RabbitMQ 6 or 7 times, but the problem persists.
Are you seeing the same thing on your platforms?
rabbit@xx.log
** Last message in was poll
** When Server state == {state,#Ref<0.1616053395.2357723137.59872>,5000,0.99,#{},#{}}
** Reason for termination ==
** {{timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}},[{gen_server,call,2,[{file,"gen_server.erl"},{line,238}]},{aten_detector,handle_info,2,[{file,"src/aten_detector.erl"},{line,103}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,689}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,765}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
2021-12-06 08:28:11.677 [error] <0.26432.40> CRASH REPORT Process aten_detector with 0 neighbours exited with reason: {timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}} in gen_server:call/2 line 238
2021-12-06 08:28:11.677 [error] <0.252.0> Supervisor aten_sup had child aten_detector started with aten_detector:start_link() at <0.26432.40> exit with reason {timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}} in context child_terminated
2021-12-06 08:32:17.367 [info] <0.26553.40> RabbitMQ is asked to start...
2021-12-06 08:33:56.890 [info] <0.30067.40> RabbitMQ is asked to start...
2021-12-06 08:37:00.430 [error] <0.29882.40> ** Generic server aten_detector terminating
** Last message in was poll
** When Server state == {state,#Ref<0.1616053395.2366373902.169554>,5000,0.99,#{},#{}}
** Reason for termination ==
** {{timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}},[{gen_server,call,2,[{file,"gen_server.erl"},{line,238}]},{aten_detector,handle_info,2,[{file,"src/aten_detector.erl"},{line,103}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,689}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,765}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
2021-12-06 08:37:00.744 [error] <0.29882.40> CRASH REPORT Process aten_detector with 0 neighbours exited with reason: {timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}} in gen_server:call/2 line 238
2021-12-06 08:37:01.587 [error] <0.252.0> Supervisor aten_sup had child aten_detector started with aten_detector:start_link() at <0.29882.40> exit with reason {timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}} in context child_terminated
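To put a number on how often the aten_detector crash actually happens (rather than "every 2/3 mn"), the CRASH REPORT timestamps can be pulled out of the log and diffed. This is only a minimal sketch, assuming every crash line follows the `YYYY-MM-DD HH:MM:SS.mmm [error] <pid> CRASH REPORT Process aten_detector ...` format shown in the excerpt above; the script name and the functions `crash_times` / `intervals_seconds` are hypothetical helpers, not part of RabbitMQ.

```python
import re
import sys
from datetime import datetime

# Assumed log format (as in the excerpt above):
# "YYYY-MM-DD HH:MM:SS.mmm [error] <pid> CRASH REPORT Process aten_detector ..."
CRASH_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[error\] \S+ "
    r"CRASH REPORT Process aten_detector"
)


def crash_times(lines):
    """Return the timestamps of aten_detector crash reports, in file order."""
    times = []
    for line in lines:
        m = CRASH_RE.match(line)
        if m:
            # %f accepts the 3-digit millisecond field used in the log
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def intervals_seconds(times):
    """Seconds elapsed between consecutive crash reports."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python aten_crashes.py /path/to/rabbit@xx.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        found = crash_times(f)
    for t in found:
        print(t.isoformat(sep=" "))
    print("intervals (s):", intervals_seconds(found))
```

On the two CRASH REPORT lines quoted above (08:28:11.677 and 08:37:00.744), this gives one interval of about 529 seconds.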
crash.log
2021-12-06 08:19:00 =CRASH REPORT====
crasher:
initial call: aten_detector:init/1
pid: <0.29644.40>
registered_name: aten_detector
exception exit: {{timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}},[{gen_server,call,2,[{file,"gen_server.erl"},{line,238}]},{aten_detector,handle_info,2,[{file,"src/aten_detector.erl"},{line,103}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,689}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,765}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
ancestors: [aten_sup,<0.247.0>]
message_queue_len: 2
messages: [{#Ref<0.1616053395.2364014603.60194>,#{}},poll]
links: [<0.252.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 987
stack_size: 28
reductions: 1570
neighbours:
2021-12-06 08:19:00 =SUPERVISOR REPORT====
Supervisor: {local,aten_sup}
Context: child_terminated
Reason: {timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}}
Offender: [{pid,<0.29644.40>},{id,aten_detector},{mfargs,{aten_detector,start_link,[]}},{restart_type,permanent},{shutdown,5000},{child_type,worker}]
Case 00944505 is already open with support, but since the full RabbitMQ reinstallation did not fix it, he is now asking me to completely reinstall the main server!!