3 Replies Latest reply on Mar 25, 2014 8:29 AM by marunderwood

    CPU Load at 100% but is really 16% for set-snmp Linux server

    marunderwood

      Hi!  I have an advanced alert that keeps triggering, and the system's CPU Load status shows 100%.  However, when we look at the server itself and at the real-time process explorer, the load is about 16%.

      I suspect the CPU load is being calculated from the delta (rawstatus) counter values instead of the new (total) values for these devices.

       

      Current environment:  Orion Platform 2013.2.1, SAM 6.0.2, IPAM 4.1, NCM 7.2.2, NPM 10.6.1, NTA 3.11.0, IVIM 1.9.0

      Background:  Upgraded NPM from 10.5 to 10.6.1 a few weeks ago.  Upgrading to 10.7 isn't an option right now without first moving the server to x64 and upgrading the SQL server.

      This is the only server with this error, although we have 5 other Linux Oracle DB servers.

      Analysis:  After reading "Re: NPM is showing 100% CPU usage for my linux server when it's actually 2%" (2009), I have this additional information:

      • Pollers are N.Cpu.SNMP.HrProcessorLoad.  (I haven't played around with changing these.)
      • Nodes are using SNMPv2c.
      • Server has 4 CPUs.
      • I added the UnDP from the article above for all of the Oracle DB servers on Linux.  It collects the cpuRaw counters with Unit left blank and TimeFrame=None.  The delta (difference) between polls is stored in the status and rawstatus columns, and the new counter value is stored in the total column.  All of the servers appear to calculate the CPU load the same way (with incorrect values).  On the server showing 100%, the ssCpuRawIdle value (1.3.6.1.4.1.2021.11.53) never changes, so the delta stored in the tables is 0.  That delta of 0 makes the CPU load calculation come out to 100%, apparently because the calculation uses the delta values rather than the new values (the total column in the table).
      • When I view the UnDP (Universal Device Poller) values, especially the NetSNMP transforms, they appear to be calculated correctly (using the totals) in the UnDP poller application.  However, other views, such as the node's web page showing the poller values, display the delta values, which is misleading.  This is a secondary design issue.
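
      To make the arithmetic concrete, here is a minimal Python sketch of how a CPU busy percentage can be derived from two polls of the cumulative UCD-SNMP ssCpuRaw* counters.  It is not NPM's actual formula; the function names and the sample tick values are made up for illustration, and only four of the raw counters are used.

```python
# Illustrative sketch only: not the formula Orion/NPM uses internally.
# Inputs are cumulative tick counters such as ssCpuRawUser, ssCpuRawNice,
# ssCpuRawSystem and ssCpuRawIdle (1.3.6.1.4.1.2021.11.50 through .53).

COUNTER32_MODULUS = 2 ** 32  # the raw counters are 32-bit and can wrap


def counter_delta(previous, current):
    """Ticks elapsed between two polls, tolerating a single counter wrap."""
    return (current - previous) % COUNTER32_MODULUS


def cpu_busy_percent(previous, current):
    """Busy percentage between two polls, or None if no ticks elapsed.

    previous and current map a counter name ("user", "nice", "system",
    "idle") to its cumulative value at each poll.
    """
    deltas = {name: counter_delta(previous[name], current[name])
              for name in current}
    total = sum(deltas.values())
    if total == 0:
        return None  # nothing advanced: no data, not 0% and not 100%
    return 100.0 * (total - deltas["idle"]) / total


# Hypothetical sample values.
first_poll = {"user": 1000, "nice": 0, "system": 500, "idle": 8500}
healthy_poll = {"user": 1100, "nice": 0, "system": 560, "idle": 9340}
stuck_idle_poll = {"user": 1100, "nice": 0, "system": 560, "idle": 8500}

print(cpu_busy_percent(first_poll, healthy_poll))     # 16.0
print(cpu_busy_percent(first_poll, stuck_idle_poll))  # 100.0
```

      The second call shows the symptom described above: if the idle counter stops advancing while the others keep counting, the idle delta is 0 and the result is 100% regardless of the real load.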

      Other articles:  "Linux CPU usage" (July 2012), "Re: MIB for CPU Utilization - Linux Server" (Feb 2010).

       

      Are my suspicions correct?  Do I have to 'turn off' the CPU monitoring for the Linux servers and handle these via a SAM monitor or has this been fixed in a later version?

        • Re: CPU Load at 100% but is really 16% for set-snmp Linux server
          RichardLetts

          hrProcessorLoad is:

          "The average, over the last minute, of the percentage of time that this processor was not idle. Implementations may approximate this one minute smoothing period if necessary."

          It may be that the system really was running at 100% for a minute, but by the time you looked at it the load had dropped down...
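
          Since hrProcessorLoad is reported once per processor, a 4-CPU server returns four values.  A minimal Python sketch of combining them, assuming the node-level figure is a plain average of the per-processor rows (how Orion actually aggregates them is not confirmed here, and the function name and sample values are made up):

```python
# Illustrative sketch only: assumes the node-level CPU load is the simple
# mean of the per-processor hrProcessorLoad values.

def node_cpu_load(per_processor_loads):
    """Average a list of per-processor load percentages (0 to 100)."""
    if not per_processor_loads:
        raise ValueError("no hrProcessorLoad rows were returned")
    return sum(per_processor_loads) / len(per_processor_loads)


# Hypothetical 4-CPU server with one moderately busy core.
print(node_cpu_load([64, 0, 0, 0]))  # 16.0
```

          Under that assumption, a node-level 100% requires every processor to report 100% over the last minute.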

          I have a similar issue with Juniper routers -- they only report(ed) the instantaneous CPU load, which was not much use.

          I'd suggest requiring that the CPU load be at 100% for more than two or three polling cycles.
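
          The idea of requiring several consecutive polls can be sketched in Python as below.  This is only an illustration of the logic, not how an Orion advanced alert is configured; the function name, the defaults, and the sample readings are made up.

```python
# Illustrative sketch only: raise the alarm when the most recent polls are
# all at or above the threshold, so a single spike does not trigger it.

def sustained_breach(samples, threshold=100.0, min_polls=3):
    """True if the last min_polls samples are all >= threshold."""
    if len(samples) < min_polls:
        return False  # not enough history yet
    return all(value >= threshold for value in samples[-min_polls:])


# Hypothetical CPU load readings, oldest first.
print(sustained_breach([20, 100, 18, 17]))     # False: one-poll spike
print(sustained_breach([20, 100, 100, 100]))   # True: three polls in a row
```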

           

          On your UnDP, this should not be storing the deltas.  Did you configure it as a counter type instead of a raw value type?

            • Re: CPU Load at 100% but is really 16% for set-snmp Linux server
              marunderwood

              The CPU has been at 100% for several days (since 3/17 7 am), and the alert is set to trigger only if the load has been > 80% for over 1 hour.  (I've had this alert running for months.  Plus the "average cpu load & memory" gauge has been reporting 100%.)  I'm not getting any SNMP results from this one server now.  I'll have to get with a server admin to see what's up.

               

              On the UnDP, yes, it was set up as a counter.  I've changed it to raw value and it's working.  (I thought I tried raw value yesterday and it failed.  There must have been another issue going on too.)  Thanks!!  At least I have another way to verify the CPU Load value.

                • Re: CPU Load at 100% but is really 16% for set-snmp Linux server
                  marunderwood

                  Since we weren't getting any SNMP results, we rebooted the server.  This reset the alert.  We assume there was a hung process (or SNMP wasn't returning the results), which was causing the false alert.  Before removing the UnDPs, I compared their CPU percentages to the ones shown on the web page's "Min/Max/Average of Average CPU Load".  To me, they didn't seem to agree.  The UnDP for one server was consistently reporting around 50%, but the average was showing 2%.  Hopefully, it's due to the way the averages are calculated, but I'm not entirely convinced.