11 Replies Latest reply: Aug 27, 2013 9:51 AM by fitzy141

% Processor Time   vs   CPU Utilization

clausendb

What is the difference?  I have a Windows 2008 R2 x64 virtual machine (VMware).  Orion NPM says that the CPU has not gone over 25% in the past day.  Orion APM, Windows 2003-2008 Template, says that % Processor Time has been riding about 75-80 % constant in the past day.  There are no alarms in NPM, but APM is in a near constant alarm state.

The virtual server is a vSphere 5, vCenter 5 server - 4GB RAM and 2 x CPU.

Any help is appreciated.

- Dave Claussen

 
  • Re: % Processor Time   vs   CPU Utilization
    Andy McBride

    Hi Dave - Can you add a couple of screen shots showing this? I am having trouble picturing exactly which resources you are describing.

    Andy

    • Re: % Processor Time   vs   CPU Utilization
      clausendb

      Because of the alerts, I removed the application template.  I added it back this morning, but the issue has not yet shown itself.

      Picture this:

       - Date/Time 3/2/12 1:00 PM to 3/2/12 5:00 PM

      If you look at a chart for CPU utilization (in NPM), it will show between 10 and 20 %.

      If you look at a chart for " % Processor Time " (in APM), it will show between 70 and 80%.

      I am asking about this because the threshold for " % Processor Time " (in APM) is 70% and the element is alerting and showing RED on the console.

       

      As soon as this thing generates enough data for charts, I will post the screen shots.

       

      Thank you Andy.

  • Re: % Processor Time   vs   CPU Utilization
    clausendb

    The charts below do not illustrate the actual problem that I am having (that seems to have stopped of course), but they still pose the same question.

     

    The time frame for both charts is 3/8/12, 1:00AM to 4:00AM.

     

        NPM shows the CPU at or below ~18%

     

     

     

       APM shows % Processor Time at or above 50%.

     

    What is the difference?

  • Re: % Processor Time   vs   CPU Utilization
    clausendb

    Task Manager on the server matches the CPU numbers from NPM.

     

     

    • Re: % Processor Time   vs   CPU Utilization
      aLTeReGo

      These charts are actually not as different as they may appear. The one from APM shows the minimum and maximum values over a given polling period (1am - 4am it appears), where the average is a little under 20%. The NPM chart you're showing is an average of an already averaged number ("Average of Average CPU load"), so I would expect it to be lower than actual. As you can see below, % Processor Time is the most accurate value you can use for monitoring a Windows host. This is also the counter monitored for WMI nodes in Orion. It should always be in lockstep with the Windows Performance Monitor, as that's what's represented in the CPU Usage and CPU Usage History charts of the Windows Task Manager. The value being collected and displayed by SAM/APM is raw and unchanged from the monitored host. SNMP, however, is a trickier beast that requires multiple polls of the host, because CPU usage is a calculation based on time tick differences between individual polling intervals.

      Below is a screenshot showing % Processor Time in comparison to the Windows Task Manager running on my server.
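      As a rough illustration of the "average of an already averaged number" effect described in this reply, here is a minimal Python sketch. The sample values and the two-bucket grouping are made up for illustration and are not taken from the charts in this thread, nor from how Orion actually summarizes data.

```python
from statistics import mean

# Hypothetical raw CPU samples (%), one per poll, grouped into two
# summary buckets. The first bucket contains a short spike.
raw_buckets = [
    [12, 15, 75, 14],
    [13, 16, 12, 15],
]

# First level of averaging: one value per bucket.
bucket_averages = [mean(bucket) for bucket in raw_buckets]

# Second level: the chart averages the already averaged bucket values.
chart_value = mean(bucket_averages)

# A min/max view of the same raw samples keeps the spike visible.
all_samples = [value for bucket in raw_buckets for value in bucket]

print(f"average of averages: {chart_value}%")
print(f"min/max of raw samples: {min(all_samples)}% / {max(all_samples)}%")
```

      With these made-up numbers the averaged chart value lands in the low twenties, while the min/max view still shows the 75% peak.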

  • Re: % Processor Time   vs   CPU Utilization
    clausendb

    Look at the charts below.  They are from 10:00 AM to 11:00 AM today.

     

    You will see that the APM chart below (Green) shows that the % Processor Time went above 75% around 10:45AM.  The default warning threshold for % Processor Time in APM is 75%.  An alert was generated.

     

    BUT, in the NPM chart below (Red), the CPU utilization for this machine was at < 20% during that same time frame.

     

     

    That is the question.  If NPM says that the CPU is at < 20%, why am I receiving an alert from APM that my % Processor Time is at 75%?

    I am thinking of disabling the % Processor Time monitor altogether.

    • Re: % Processor Time   vs   CPU Utilization
      aLTeReGo

      You will see that the APM chart below (Green) shows that the % Processor Time went above 75% around 10:45AM.  The default warning threshold for % Processor Time in APM is 75%.  An alert was generated.

      Correct. That is what occurred on the server at this time in absolute values. The server spiked for some reason at 10:45am to 75%. That is a statement of fact, and if you were on the server looking at the Windows Task Manager or the Windows Performance Monitor at 10:45am you would've seen the same thing there.

      BUT, in the NPM chart below (Red), the CPU utilization for this machine was at < 20% during that same time frame.

      There are a few reasons this would occur. The first is that the default polling interval for APM component monitors is 5 minutes, and each poll collects an absolute value from the operating system. That is twice as frequent as NPM, which collects statistics every 10 minutes, so the most obvious reason for differences between the graphs is that NPM wasn't polling at the time of the CPU spike. The second reason you wouldn't see the same information in NPM is that your chart is showing an average of an already averaged number. This double averaging influences how the data is charted and displayed. Lastly, SNMP requires two polling cycles, 10 minutes apart in NPM, to calculate CPU utilization, so what you're seeing is a calculation based on the difference in tick values between polling intervals. The more frequent the polling interval, the closer this figure comes to what's actually occurring on the server; the longer the time between polls, the more averaged the data becomes. Unfortunately, Windows doesn't update SNMP statistics more frequently than once every two minutes, whereas WMI/RPC information is updated as frequently as every second and provides real values that don't need to be calculated over multiple polling intervals.
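      To make the tick-difference point concrete, here is a minimal Python sketch. It is not how Orion or the Windows SNMP agent is implemented; the per-minute load profile, the function names, and the tick rate are all assumptions chosen for illustration. It simulates a host that idles at 10% with a two-minute burst to 80%, then derives utilization from the difference in cumulative busy ticks between two polls.

```python
# Hypothetical per-minute CPU load (%) for one 10-minute window:
# idle at 10% with a two-minute burst to 80%.
per_minute_load = [10, 10, 10, 10, 80, 80, 10, 10, 10, 10]

TICKS_PER_MINUTE = 6000  # assumed tick rate (100 ticks/s), illustration only


def cumulative_busy_ticks(loads):
    """Return the running total of busy ticks after each minute."""
    totals = [0]
    for load in loads:
        totals.append(totals[-1] + load * TICKS_PER_MINUTE // 100)
    return totals


def utilization_between_polls(busy_ticks, start_minute, end_minute):
    """CPU % derived from the busy-tick difference between two polls."""
    busy = busy_ticks[end_minute] - busy_ticks[start_minute]
    elapsed = (end_minute - start_minute) * TICKS_PER_MINUTE
    return 100 * busy / elapsed


busy_ticks = cumulative_busy_ticks(per_minute_load)

# Two polls 10 minutes apart: the burst is spread over the whole window.
ten_minute_value = utilization_between_polls(busy_ticks, 0, 10)

# Two polls that bracket only the burst: the burst is fully visible.
two_minute_value = utilization_between_polls(busy_ticks, 4, 6)

print(f"delta over 10 minutes: {ten_minute_value:.0f}%")
print(f"delta over the 2-minute burst: {two_minute_value:.0f}%")
print(f"highest per-minute reading: {max(per_minute_load)}%")
```

      In this made-up profile the 10-minute delta works out to 24%, even though the host sat at 80% for two of those minutes.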

      If NPM says that the CPU is at < 20%, why am I receiving an alert from APM that my % Processor Time is at 75%?

      The long answer is above. The short answer is that the CPU did spike to 75% and APM/SAM reported it as such. That condition may not have been sustained for very long, as CPU utilization can spike occasionally when a process spins up, a job is executed, or someone logs into the server via RDP. SAM will always be more accurate than NPM via SNMP because it's collecting the true raw value as provided by the operating system at twice the polling frequency of NPM.

      I am thinking of disabling the % Processor Time monitor altogether.

      That's certainly one way to go. Another might be to adjust the alert so that the condition must be sustained for longer than 2 or 3 polling intervals before an alert is generated. That's what I would recommend.
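      The idea of requiring a sustained condition can be sketched in a few lines of Python. This is only an illustration of the logic, not the Orion alert engine or its configuration; the function name, the sample values, and the choice of three polls are hypothetical.

```python
def sustained_breaches(samples, threshold, required_polls):
    """Return (sample, alert) pairs. alert is True only when the threshold
    has been exceeded on `required_polls` consecutive polls."""
    consecutive = 0
    results = []
    for value in samples:
        # Count consecutive polls above the threshold; reset on any dip.
        consecutive = consecutive + 1 if value > threshold else 0
        results.append((value, consecutive >= required_polls))
    return results


# Hypothetical % Processor Time readings, one per polling interval.
polls = [18, 22, 78, 19, 76, 81, 79, 20]

immediate = [alert for _, alert in sustained_breaches(polls, 75, 1)]
sustained = [alert for _, alert in sustained_breaches(polls, 75, 3)]

print("alert on any breach:      ", immediate)
print("alert after 3 in a row:   ", sustained)
```

      With these readings, the single spike to 78% triggers an alert only under the immediate rule, while the three-poll rule fires once, on the third consecutive reading above 75%.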