Can someone tell me why this alert has been triggered? Roughly 6 of my 75 or so servers are triggering this alert when they shouldn't be. Maybe some fresh eyes can spot something.
Server is Oracle Linux.
Thanks
One thing to remember is that by default CPU polling in NPM happens every 9 minutes, so by the time you receive an alert the issue might have subsided. You do have "condition must exist" checked, which is what I was going to recommend (and it's what I do too).
Have you watched the CPU directly on that Linux host to see how it behaves in real time? It might just be that the CPU spikes at the same moment NPM polls the node. Those 6 nodes in question probably have an average CPU load.
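To make the polling point concrete, here is a minimal Python sketch. It is not how NPM is implemented. It assumes the alert engine only sees the last polled value and treats it as current until the next poll, and the function name, parameters, and sample values are all made up for illustration.

```python
def alert_fires(samples, threshold, poll_minutes, sustain_minutes):
    """Return True if polled CPU values breach the threshold for long enough.

    samples: CPU percentages, one per poll, oldest first (hypothetical data).
    Assumption: each polled value is treated as current until the next poll.
    """
    held = 0  # minutes the condition has appeared to hold
    for value in samples:
        if value > threshold:
            held += poll_minutes
            if held > sustain_minutes:
                return True
        else:
            held = 0  # a poll below the threshold resets the timer
    return False


# One spike that happens to coincide with a poll: held for 9 minutes only.
single_spike = alert_fires([40, 95, 45, 50], threshold=90,
                           poll_minutes=9, sustain_minutes=10)

# Two consecutive high polls: appears to hold for 18 minutes.
two_high_polls = alert_fires([40, 95, 96, 50], threshold=90,
                             poll_minutes=9, sustain_minutes=10)
```

Under these assumptions a single spike caught by one poll would not trip a 10-minute condition, but two high polls in a row would, even if the CPU was quiet in between.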
Thanks chad.every - unfortunately I don't have direct access other than the real-time process explorer from SolarWinds. While I was watching that, the server never went over 60% CPU.
Is it possible that it's triggering on the level of one cpu versus the total cpu?
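For what it's worth, here is a small Python sketch of why that distinction would matter. The per-core numbers are invented, and whether NPM actually evaluates this alert per CPU is exactly the open question, so treat it as an illustration only.

```python
def total_cpu(per_core):
    """Average load across all cores, as a percentage."""
    return sum(per_core) / len(per_core)


def any_core_over(per_core, threshold):
    """True if at least one individual core exceeds the threshold."""
    return max(per_core) > threshold


# Hypothetical 4-core server with one busy core.
cores = [95.0, 40.0, 35.0, 30.0]

overall = total_cpu(cores)               # average across the four cores
total_breach = overall > 90              # check against the total
core_breach = any_core_over(cores, 90)   # check against any single core
```

With these made-up values the total sits at 50% while one core is above 90%, so a per-core check would fire and a total check would not.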
I upped the time threshold from 10 minutes to 20, just to see if some of the alerts would clear. No luck yet; in fact, a new server came in on the same alert. It's also at about 50% utilization. Ugh
My CPU alert is a little different. Here is what I have.
Not having direct access to those servers does make it more difficult. Maybe see if you can work with that Linux team to check for any abnormalities. If everything checks out OK and you're still getting CPU alerts, then those 6 servers are probably just in normal operation. I would edit the CPU thresholds on those servers to accommodate that.
Thanks - I restructured my rules to be like yours.
Here's what I'm still seeing: memory is the issue, but the CPU alert is being triggered. The alert for memory utilization has already triggered.
I'd try setting those nodes to use dynamic baseline thresholds and see if that improves.
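As a rough idea of what a baseline-derived threshold looks like, here is a Python sketch. It assumes warning and critical levels sit at the mean plus 2 and 3 standard deviations of past samples. That is an assumption for illustration, not a statement of how the product calculates its baselines, and the history values are made up.

```python
from statistics import mean, pstdev


def baseline_thresholds(history, warn_sigmas=2.0, crit_sigmas=3.0):
    """Derive warning/critical CPU thresholds from past samples.

    history: past CPU percentages for one node (hypothetical data).
    Assumption: thresholds are mean + N standard deviations, capped at 100.
    """
    mu = mean(history)
    sigma = pstdev(history)
    warning = min(100.0, mu + warn_sigmas * sigma)
    critical = min(100.0, mu + crit_sigmas * sigma)
    return warning, critical


# A node that normally runs around 50% with moderate variation.
history = [50, 55, 45, 60, 40]
warning, critical = baseline_thresholds(history)
```

The point is that a node which normally runs hot gets thresholds that follow its own history instead of a global fixed value.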
Hi @chad.every,
Hope you are doing well.
Did you notice whether this alert works in version 2020.2.5 when you apply this setting?
I set up a new CPU load alert in version 2020.2.5 with "critical value reached" set to "Yes", and set a new value of "10" under the critical section on one node. At that time the node's CPU load was 15%.
So in this situation the alert should fire, but it does not.
Can you check on your end as well?
Regards