Discrepancies between CPU load percentage alerts and CPU utilization...

Hello everyone,

We are continuously breaching thresholds with CPU load alerts in SAM. Upon investigation, I am finding that the actual CPU utilization is much less. (CPU Load=100% vs. CPU Utilization=40%). Has anyone else run into this? Where do you suggest that we start?

  • When you say that the actual CPU utilization is less, how are you determining that?

    Is this a virtual machine or cloud server?

    Is it monitored as a node? 

    There are some cases where virtualization or cloud platforms report virtual CPU utilization in a completely different manner than an agent or a user at the keyboard would see it. Basically, it is the difference between looking inside the operating system to determine how much of the system's perceived CPU is being used versus looking from the virtualization layer at how much of a VM's potential CPU allocation is being used. This phenomenon is sometimes referred to as "CPU steal".

    I'm not sure that this is what you're seeing, but it is a possibility.
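
    If the guest happens to be Linux, one way to check for steal from inside the OS is to compare two snapshots of the aggregate "cpu" line in /proc/stat. The following is only a minimal sketch under that assumption; the function names and the sample counter values are made up for illustration, and it says nothing about how Orion itself calculates "CPU Load".

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    # Field order as documented for Linux /proc/stat. The trailing guest and
    # guest_nice fields are already counted in user and nice, so they are skipped.
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:1 + len(names)]]
    if len(values) != len(names):
        raise ValueError("kernel does not report a steal counter")
    return dict(zip(names, values))


def cpu_percentages(before, after):
    """Return busy, steal and idle percentages between two snapshots."""
    prev = parse_cpu_line(before)
    curr = parse_cpu_line(after)
    delta = {name: curr[name] - prev[name] for name in curr}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    idle = delta["idle"] + delta["iowait"]
    steal = delta["steal"]
    busy = total - idle - steal
    return {
        "busy_pct": 100.0 * busy / total,
        "steal_pct": 100.0 * steal / total,
        "idle_pct": 100.0 * idle / total,
    }


# Hypothetical sample snapshots, not taken from any real server.
sample_before = "cpu  1000 0 500 8000 100 0 0 400 0 0"
sample_after = "cpu  1400 0 700 8300 100 0 0 1000 0 0"
result = cpu_percentages(sample_before, sample_after)
print(result)
```

    On a real Linux guest, the two input strings would come from reading the first line of /proc/stat a few seconds apart. A consistently high steal percentage would suggest the hypervisor is withholding CPU time from the VM. A Windows guest does not have /proc/stat, so this sketch would not apply there as written.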

  • I know this post is a year old but... bump.

    I have an Orion server (primary poller, a VM) showing itself as having all 4 CPU cores consistently (for hours) at 98-100% "CPU Load" (it's showing this in the web UI because I have the Orion server in there with standard node monitoring). All the OS tools (perfmon), vSphere tools, and anything else I find *but* Orion show the CPU running waaaaaay below this (20-50%, with some spikes to 90%), but these are reporting % CPU Usage (or other metrics) and not "CPU Load". From every perspective *but* Orion itself, the server looks like it's not using much CPU. Orion thinks it's almost on fire.

    Is "CPU Load" some sort of calculation over x time points? How is "CPU Load" determined?

    The reason I'm a little concerned is that if we're alerting on "CPU Load" and it's not indicative of what people think it is (the total % CPU usage, or an average, or the total at the time of polling), then we may need to adjust our alerting thresholds.

  • BUMP.

    We're also getting this. Does anyone have an explanation? TIA