
Detect Linux "OOM Killer" action

I'm now working in an environment with a lot of Linux servers.  Full disclosure: my background is 90% Windows admin.

I have a request from the Linux admins to detect when the "OOM Killer" starts running. I gather OOM means "Out Of Memory" and that the OOM killer is a kernel mechanism that kicks in, and starts killing processes, when the system runs out of memory.

The distribution is Red Hat Enterprise Linux of fairly recent vintage (I don't have the exact version number).

Thanks in advance for any inputs from the community.

  • Not a Linuxy person here. Your Linux admins really should have given you more to work with here!

    If you've just got SNMP you're probably screwed. I'd start by simply alerting when spare memory runs low.

    The SolarWinds agent would be able to report on the processes and/or scrape syslog for the OOM killer entries (the exact log wording is an easy Google; there's a rough sketch of a log scraper at the end of this reply).
    You could also set syslog to forward to SolarWinds so the entries show up as events. Ask the Linux folk exactly what process/daemon name to watch for.

    I'd probably set up a memory alert, a process (PID) alert, and an event alert: the memory one will fire early as a warning, the PID one is unlikely to catch it because the OOM killer runs and finishes between polling cycles, and the event one will catch it most of the time, but only after the issue has already occurred.
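    Something like this could do the log-scraping part as a script monitor. It's a rough sketch only: the log path and the exact kernel wording are assumptions (they vary a bit between RHEL/kernel versions), so check a server that actually had an OOM kill first.

        #!/usr/bin/env python3
        """Rough sketch of an OOM-killer log scraper, not a finished monitor."""
        import re
        import sys

        LOG_FILE = "/var/log/messages"  # typical RHEL location; adjust as needed

        # Lines the kernel commonly writes when the OOM killer fires.
        OOM_PATTERN = re.compile(
            r"invoked oom-killer|Out of memory: Kill(ed)? process",
            re.IGNORECASE,
        )

        def find_oom_events(path):
            """Yield log lines that look like OOM-killer activity."""
            with open(path, errors="replace") as log:
                for line in log:
                    if OOM_PATTERN.search(line):
                        yield line.rstrip()

        if __name__ == "__main__":
            hits = list(find_oom_events(LOG_FILE))
            for hit in hits:
                print(hit)
            # Exit non-zero when something was found, so a script monitor
            # or cron job can treat that as the alert condition.
            sys.exit(1 if hits else 0)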

  • Thank you! The answer provided confirms my own reading on the subject.

    We have advised our Linux team that OOM killer activity gets logged to syslog, and asked whether they want to configure syslog forwarding into SolarWinds for the couple of servers where they see OOM misbehavior (a sample rsyslog forwarding rule is at the end of this post).

    Marking answer from Adam.Beedell as Verified.
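    For reference, the kind of forwarding rule being discussed is a one-liner in rsyslog. This is only an illustration: the file name, destination address, and port below are placeholders for whatever the SolarWinds syslog listener actually uses.

        # /etc/rsyslog.d/99-solarwinds.conf  (example file name)
        # Forward kernel-facility messages (where OOM killer lines land)
        # to the SolarWinds syslog listener; 192.0.2.10:514 is a placeholder.
        kern.*    @192.0.2.10:514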