RabbitMQ crashes periodically with the error below. From what I've found, it could be related to the disk running out of space, or to something blocking file access, such as antivirus. We removed the AV, and I validated the disk space at the time of the crash: usage was under 90%.
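For anyone who wants to double-check the disk-space angle, something along these lines will log usage over time so it can be lined up against the crash timestamps (a rough, untested sketch; assumes Python 3 on the box, and the drive letter and output file name are placeholders, not part of the Orion install):

    import shutil
    import time
    from datetime import datetime

    DRIVE = "C:\\"               # assumption: db lives on C:, per the paths in the log
    LOG_FILE = "disk_usage.csv"  # hypothetical output file

    # Record percent used and bytes free once a minute.
    while True:
        usage = shutil.disk_usage(DRIVE)
        pct_used = 100 * usage.used / usage.total
        with open(LOG_FILE, "a") as f:
            f.write(f"{datetime.now().isoformat()},{pct_used:.1f}%,{usage.free}\n")
        time.sleep(60)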
I have had cases open with SolarWinds; we've rebuilt RabbitMQ, cleared subscriptions, repaired/rebuilt the services, and validated that we're not over-utilized, but the errors continue. This is on a server hosted in AWS, if that matters. Any help is greatly appreciated.
Has anyone seen anything like this before? Here is the log:
2020-11-08 10:52:35 =CRASH REPORT====
crasher:
initial call: disk_log:init/2
pid: <0.202.0>
registered_name: []
exception exit: {{{failed,{error,{file_error,"c:/PROGRA~3/SOLARW~1/Orion/RabbitMQ/db/RABBIT~1/PREVIOUS.LOG",enoent}}},[{disk_log,reopen,2}]},[{disk_log,do_exit,4,[{file,"disk_log.erl"},{line,1155}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [disk_log_sup,kernel_safe_sup,kernel_sup,<0.46.0>]
message_queue_len: 0
messages: []
links: [<0.190.0>]
dictionary: [{write_cache_timer_is_running,true},{quiet,false}]
trap_exit: true
status: running
heap_size: 4185
stack_size: 27
reductions: 475761
neighbours:
2020-11-08 10:52:35 =SUPERVISOR REPORT====
Supervisor: {local,disk_log_sup}
Context: child_terminated
Reason: {{failed,{error,{file_error,"c:/PROGRA~3/SOLARW~1/Orion/RabbitMQ/db/RABBIT~1/PREVIOUS.LOG",enoent}}},[{disk_log,reopen,2}]}
Offender: [{pid,<0.202.0>},{id,disk_log},{mfargs,{disk_log,istart_link,undefined}},{restart_type,temporary},{shutdown,1000},{child_type,worker}]
2020-11-08 10:52:35 =ERROR REPORT====
Mnesia(rabbit@^MAINPOLLINGENGINENAME^): ** ERROR ** (core dumped to file: "c:/PROGRA~3/SOLARW~1/Orion/RabbitMQ/MnesiaCore.rabbit@^MAINPOLLINGENGINENAME^_1604_832755_816870")
** FATAL ** {error,{"Cannot rename disk_log file",latest_log,"c:/PROGRA~3/SOLARW~1/Orion/RabbitMQ/db/RABBIT~1/PREVIOUS.LOG",{log_header,trans_log,"4.3","4.15.6",rabbit@^mainpollingenginename^,{1604,832755,675869}},{file_error,"c:/PROGRA~3/SOLARW~1/Orion/RabbitMQ/db/RABBIT~1/PREVIOUS.LOG",enoent}}}
2020-11-08 10:52:45 =SUPERVISOR REPORT====
Supervisor: {local,mnesia_sup}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.179.0>},{id,mnesia_kernel_sup},{mfargs,{mnesia_kernel_sup,start,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]
2020-11-08 10:52:45 =SUPERVISOR REPORT====
Supervisor: {local,mnesia_sup}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,<0.179.0>},{id,mnesia_kernel_sup},{mfargs,{mnesia_kernel_sup,start,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]
2020-11-08 10:52:45 =ERROR REPORT====
Mnesia(rabbit@^MAINPOLLINGENGINENAME^): ** ERROR ** mnesia_event got unexpected event: {'EXIT',<0.181.0>,killed}
2020-11-08 10:52:45 =CRASH REPORT====
crasher:
initial call: gen_event:init_it/6
pid: <0.177.0>
registered_name: mnesia_event
exception exit: {killed,[{gen_event,terminate_server,4,[{file,"gen_event.erl"},{line,354}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [mnesia_sup,<0.175.0>]
message_queue_len: 0
messages: []
links: []
dictionary: []
trap_exit: true
status: running
heap_size: 2586
stack_size: 27
reductions: 51327
neighbours:
2020-11-08 10:52:45 =CRASH REPORT====
crasher:
initial call: application_master:init/4
pid: <0.174.0>
registered_name: []
exception exit: {killed,[{application_master,terminate,2,[{file,"application_master.erl"},{line,232}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [<0.173.0>]
message_queue_len: 0
messages: []
links: [<0.43.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 6772
stack_size: 27
reductions: 104517
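One more thought: enoent is the POSIX "no such file or directory" error, so the rename of latest_log to PREVIOUS.LOG failed because a file or path in the Mnesia db directory had gone missing, which would fit something (AV, backup, or a cleanup job) still removing files out from under RabbitMQ. To try to catch that in the act, I'm considering leaving a polling watcher like this running between crashes (rough, untested sketch in Python; DB_DIR is the 8.3 short path copied from the log above, which on a default install should resolve to C:\ProgramData\SolarWinds\Orion\RabbitMQ\db):

    import os
    import time
    from datetime import datetime

    # Assumption: short 8.3 path copied from the crash report above.
    DB_DIR = r"c:\PROGRA~3\SOLARW~1\Orion\RabbitMQ\db"

    def snapshot(root):
        # Record every file currently under the db directory.
        seen = set()
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                seen.add(os.path.join(dirpath, name))
        return seen

    before = snapshot(DB_DIR)
    while True:
        time.sleep(5)
        after = snapshot(DB_DIR)
        # Anything present on the last scan but gone now was deleted or renamed away.
        for gone in sorted(before - after):
            print(f"{datetime.now().isoformat()} file removed: {gone}")
        before = after

If that turns up deletions that line up with the crash times, it would point at whatever process is touching the directory rather than at RabbitMQ itself.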