NPM is showing 100% CPU usage for my linux server when it's actually 2%

darkfiber

SolarWinds shows my Linux server at 100% CPU usage all the time, but actual usage is around 0-10%.

The server is Red Hat Enterprise ES R4 with Net-SNMP version 5.1.2

What can I do to get it to read the CPU usage correctly? Thanks!


justty

I am having the same issue with a Citrix XenServer that I am trying to monitor. Not sure what version of Net-SNMP I am running, but I have XenServer 5.5 (Xen is based on CentOS which is based on RedHat)

I see the 100% CPU on my main screen, NODES WITH HIGH CPU and on the pop-up window for the node. However, when I drill into the node, I do not see the CPU and memory gauge even though I have chosen them for this display.

I just added the memory and CPU gauges separately and now I see CPU usage (pegged 100%) but no memory gauge.

mcampbell

We are seeing something similar with our Secure Computing SnapGear devices, which run a Linux kernel. I'm not sure of the version.

SolarWinds shows 80-100% CPU or memory usage.

The boxes themselves show nominal usage.

MarieB

Hi Everyone-

I've sent this thread to the Product Manager, who should get back to you asap!

Question: have you all run diagnostics on your NPM? Also, have you opened a support ticket?

justty

Opened case for my issue

Case #126229

Karlo.Zatylny

Hi,

This sounds like our known issue of using the old ssCpuIdle OID for net-snmp:

1.3.6.1.4.1.2021.11.11.0

We are switching to use the new OID ssCpuRawIdle:

1.3.6.1.4.1.2021.11.53.0

For each of these OIDs we will take 100 - OID.Value for your CPU utilization. If people on this thread could verify that the number we are showing now is wrong because of the value your device reports from the first OID and would be corrected by using the second one, we would greatly appreciate the verification.

Thanks
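
The "100 - OID.Value" step above applies to the percentage-valued idle OID (1.3.6.1.4.1.2021.11.11.0). A minimal sketch of that inversion follows. The function name, host name, and community string are placeholders for illustration and do not come from the thread; as later posts point out, the raw OID is a tick counter and cannot be inverted this way.

```shell
#!/bin/sh
# Convert an idle percentage, as reported by ssCpuIdle
# (.1.3.6.1.4.1.2021.11.11.0), into a utilization percentage.
# Prints nothing and returns 1 for input that is not an integer in 0..100.
cpu_used_from_idle() {
    idle=$1
    case $idle in
        ''|*[!0-9]*) echo "not an integer: $idle" >&2; return 1 ;;
    esac
    if [ "$idle" -gt 100 ]; then
        echo "idle value out of range: $idle" >&2
        return 1
    fi
    echo $((100 - $idle))
}

# Hypothetical use against a live agent (host and community are placeholders):
#   idle=$(snmpget -v2c -c public -Oqv linux-host .1.3.6.1.4.1.2021.11.11.0)
#   cpu_used_from_idle "$idle"
cpu_used_from_idle 98   # prints 2
```

Comparing this result with what the box itself reports is one way to do the verification requested above.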

justty

When I walk my XenServer, this OID is not found. I only get 38 OIDs returned.

When I try to create a UnDP based on this OID and assign it to the server, I get "No such Name".

Karlo.Zatylny

We're looking at justty's diagnostics to see what might be going on.

Thanks

FormerMember

I just set up SNMP on my XenServer and I am getting the same thing. Any thoughts on how to correct it?

branden

I am having the same issue on two (out of several dozen) Windows servers.

-branden

jbaulsir

Any update on this? We are seeing incorrect CPU percentages on 75% of our Linux boxes. The OID 1.3.6.1.4.1.2021.11.53.0 is not a percentage; it is actual time. How can we use it to calculate a percentage of processor time?

Karlo.Zatylny

Hi,

Yeah, you're right that isn't a percentage. Guess I was multi-tasking when I wrote the earlier post. If users could confirm that this OID is being populated on their devices that are having problems, then we're on the right track for your device. If not, you may have to search the MIBs your device supports to see if you can find the CPU OIDs.

Thanks

jbaulsir

OK, I think I managed to figure this out...

The OIDs (ssCpuRawUser, ssCpuRawNice, ssCpuRawSystem, ssCpuRawIdle, ssCpuRawWait, ssCpuRawKernel, ssCpuRawInterrupt) are the number of 'ticks' (typically 1/100 s) spent processing in those categories.

The formula Truncate({100-{ssCpuRawIdle}/({ssCpuRawKernel}+{ssCpuRawIdle}+{ssCpuRawSystem}+{ssCpuRawUser}+{ssCpuRawWait}+{ssCpuRawNice})*100},2)

seems to give correct results, but this is all done through UnDP and doesn't integrate with the canned charting and alerting.

I'd like to know your thoughts on what Solarwinds will do to address this.
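
To make the arithmetic in that formula concrete, here is a small sketch that evaluates it outside of Orion. The function name and the sample counter values are made up for illustration; only the formula itself comes from the post.

```shell
#!/bin/sh
# cpu_used_pct USER NICE SYSTEM IDLE WAIT KERNEL
# Evaluates 100 - idle / (sum of the six counters) * 100, truncated
# (not rounded) to two decimal places, mirroring Truncate(x, 2).
cpu_used_pct() {
    awk -v usr="$1" -v nic="$2" -v sys="$3" \
        -v idl="$4" -v wai="$5" -v ker="$6" 'BEGIN {
        total = usr + nic + sys + idl + wai + ker
        if (total <= 0) exit 1          # no ticks counted, nothing to divide by
        # (total - idl) / total is the busy fraction; scale to hundredths of
        # a percent and drop the remainder to truncate.
        printf "%.2f\n", int((total - idl) * 10000 / total) / 100
    }'
}

# Made-up sample: 9500 idle ticks out of 10000 total.
cpu_used_pct 300 0 100 9500 50 50   # prints 5.00
```

Applied to the cumulative counter values, this gives the average utilization since the counters started, not the utilization over the last polling interval. The UCD-SNMP-MIB description of ssCpuRawSystem also warns that it may be implemented as the combination of the wait and kernel counters, so summing all of them may double count on some agents.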

olivern

Another option is to use the pass function of net-snmp. This can also be used to monitor other values of your linux system.

In your /etc/snmp/snmpd.conf include this line:

pass .1.3.6.1.4.1.4413.4.1 /bin/bash /root/cpu.sh

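The cpu.sh file will look like this:

# Reply format for a pass script: OID, then type, then value.
echo ".1.3.6.1.4.1.4413.4.1.1"
echo integer
# Column 15 of the vmstat output is the idle CPU percentage.
vmstat | grep -v procs | grep -v free | awk '{print $15}'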

Once you restart the snmpd service you can now poll that OID and report the values. I've used this for other purposes like number of apache connections, mysql connections, number of items in the mailq, etc.

I threw this together real quick. In production environments I would include a pid lock in the script.

Let me know if you have any questions.
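
One way to add the locking mentioned above is sketched below. It uses a lock directory rather than a PID file, which is a substitution made here for simplicity and not the poster's method; the function name and lock path are hypothetical.

```shell
#!/bin/sh
# with_lock LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created. mkdir either creates the
# directory or fails, so two concurrent invocations cannot both proceed.
with_lock() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "lock held: $lockdir" >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    rmdir "$lockdir"        # release the lock even if COMMAND failed
    return $status
}

# Hypothetical use: wrap the part of the script that does the sampling.
with_lock "${TMPDIR:-/tmp}/cpu.sh.lock.$$" echo "sampling would run here"
```

If the script is killed between mkdir and rmdir, the directory is left behind and has to be cleaned up by hand; a PID file allows a stale-lock check at the cost of more code.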

jbaulsir

That's pretty nice but wouldn't that give you processor free %? We still need to invert it.

olivern

That's correct, this will give you percent used:

echo ".1.3.6.1.4.1.4413.4.1.1"
echo integer
# Column 15 of the vmstat output is idle CPU %; subtract from 100 for % used.
CPU=$(vmstat | grep -v procs | grep -v free | awk '{print $15}')
USED=$((100 - CPU))
echo "$USED"
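
The scripts above print the same reply no matter how snmpd calls them. snmpd runs a pass program with -g and an OID for a GET and with -n and an OID for a GETNEXT, and treats empty output as "no such object". The sketch below answers only matching requests. The function names are hypothetical, the GETNEXT handling is simplified to the single case of walking in from the subtree root, and the use of `vmstat 1 2` (which samples over one second rather than reporting the average since boot) is a change from the original script.

```shell
#!/bin/sh
# Sketch of a pass handler for the single OID used in the posts above.
BASE_OID=".1.3.6.1.4.1.4413.4.1.1"

# pass_reply OP REQUESTED_OID USED_PCT
# Prints the three-line reply (OID, type, value) or nothing if the
# request does not match.
pass_reply() {
    op=$1
    req=$2
    used=$3
    case "$op" in
        -g) [ "$req" = "$BASE_OID" ] || return 0 ;;       # GET: exact match only
        -n) [ "$req" = "${BASE_OID%.*}" ] || return 0 ;;  # GETNEXT from the subtree root
        *)  return 0 ;;                                   # SET and anything else: ignore
    esac
    printf '%s\ninteger\n%s\n' "$BASE_OID" "$used"
}

# Assumes, as in the posts above, that column 15 of vmstat is idle CPU %.
cpu_used() {
    idle=$(vmstat 1 2 | tail -n 1 | awk '{print $15}')
    echo $((100 - idle))
}

# In the real script the last line would be:
#   pass_reply "$1" "$2" "$(cpu_used)"
# Demonstration with a fixed value instead of a live sample:
pass_reply -g "$BASE_OID" 7
```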

contracer

Hi,

Is there any way to use this script to get FS data (FS capacity, FS used, FS available)?

Thanks

Jaja28

Hi Karlo,

I tried testing 1.3.6.1.4.1.2021.11.53.0 with UnDP, and it responded with "A value was not returned". May we know what to do next? Do we need the MIB for Red Hat Enterprise Linux AS release 4 (Nahant Update 7)?

Thank you very much

Karlo.Zatylny

Hi,

We have confirmed that the new OIDs we are using are working with many customers' Linux machines with net-snmp. Perhaps you need to update your net-snmp to the latest version.

What OIDs are getting used for CPU currently on your device and is it accurate?

Thanks

Jaja28

Hi,

What is the latest version of net-snmp? At present, we are using version 2.

I do not know how to extract the OIDs through SolarWinds, and I can't compare values with the server because we are not the ones managing the equipment. But I tried the MIB Walk tool in the toolset. Kindly see the OIDs below. No OID value matches the value seen in SolarWinds, where the CPU utilization of the server is 100%.

MIB                 OID                    Name               Value
SNMPv2-MIB          1.3.6.1.2.1.1.1.0      sysDescr.0         Linux phcab-x4100-LDAP01 2.6.9-78.ELsmp #1 SMP Wed Jul 9 15:39:47 EDT 2008 i686
SNMPv2-MIB          1.3.6.1.2.1.1.2.0      sysObjectID.0      1.3.6.1.4.1.8072.3.2.10
SNMPv2-MIB          1.3.6.1.2.1.1.3.0      sysUpTime.0        30761049
SNMPv2-MIB          1.3.6.1.2.1.1.4.0      sysContact.0       Root <root@localhost> (configure /etc/snmp/snmp.local.conf)
SNMPv2-MIB          1.3.6.1.2.1.1.5.0      sysName.0          phcab-x4100-LDAP01
SNMPv2-MIB          1.3.6.1.2.1.1.6.0      sysLocation.0      Unknown (edit /etc/snmp/snmpd.conf)
SNMPv2-MIB          1.3.6.1.2.1.1.8.0      sysORLastChange.0  0
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.2.1  sysORID.1          1.3.6.1.2.1.31
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.2.2  sysORID.2          1.3.6.1.6.3.1
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.2.3  sysORID.3          1.3.6.1.2.1.49
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.2.4  sysORID.4          1.3.6.1.2.1.4
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.2.5  sysORID.5          1.3.6.1.2.1.50
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.2.6  sysORID.6          1.3.6.1.6.3.16.2.2.1
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.2.7  sysORID.7          1.3.6.1.6.3.10.3.1.1
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.2.8  sysORID.8          1.3.6.1.6.3.11.3.1.1
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.2.9  sysORID.9          1.3.6.1.6.3.15.2.1.1
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.3.1  sysORDescr.1       The MIB module to describe generic objects for network interface sub-layers
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.3.2  sysORDescr.2       The MIB module for SNMPv2 entities
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.3.3  sysORDescr.3       The MIB module for managing TCP implementations
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.3.4  sysORDescr.4       The MIB module for managing IP and ICMP implementations
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.3.5  sysORDescr.5       The MIB module for managing UDP implementations
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.3.6  sysORDescr.6       View-based Access Control Model for SNMP.
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.3.7  sysORDescr.7       The SNMP Management Architecture MIB.
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.3.8  sysORDescr.8       The MIB for Message Processing and Dispatching.
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.3.9  sysORDescr.9       The management information definitions for the SNMP User-based Security Model.
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.4.1  sysORUpTime.1      0
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.4.2  sysORUpTime.2      0
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.4.3  sysORUpTime.3      0
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.4.4  sysORUpTime.4      0
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.4.5  sysORUpTime.5      0
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.4.6  sysORUpTime.6      0
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.4.7  sysORUpTime.7      0
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.4.8  sysORUpTime.8      0
SNMPv2-MIB          1.3.6.1.2.1.1.9.1.4.9  sysORUpTime.9      0
HOST-RESOURCES-MIB  1.3.6.1.2.1.25.1.1.0   hrSystemUptime.0   889475924

vinay.by

What version of NPM are you using?

marunderwood

10.6.1

marunderwood

Not sure if this will help, but to troubleshoot I installed the UnDP from earlier in this thread. The NetSNMP (transforms) TEST value is correct (= 16.17) when I run a test on the device from the UnDP edit/test screens. However, the Node Details page (Universal Device Poller Status resource) shows a different value (100), as does my report (100 for both raw status and status) and the historical poll results in the Orion Universal Device Poller application (also 100). So why is the test showing the correct value while the stored and displayed values are different?

Ahh... I added the 'total' column to the report, and it contains the correct value. So it probably has something to do with this being captured as a COUNTER (I tried other types but couldn't get any to work), which means it's storing the differences. When the transform computation runs, it's probably using the difference instead of the total column. Is there a way to force it to use the total? The formula is currently: Truncate({100-{ssCpuRawIdle}/({ssCpuRawKernel}+{ssCpuRawIdle}+{ssCpuRawSystem}+{ssCpuRawUser}+{ssCpuRawWait}+{ssCpuRawNice})*100},2).

I also noticed the UnDP for the ssCpuRaw fields have unit = blank and Timeframe=None. Are these right?
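
For reference, the difference between evaluating the formula on totals and on deltas can be sketched as follows. This thread does not establish which of the two Orion's transform actually uses, so this only illustrates the arithmetic. The function name and sample numbers are made up, and the "total" arguments are assumed to be the sum of the same counters used in the formula's denominator.

```shell
#!/bin/sh
# cpu_used_between IDLE1 TOTAL1 IDLE2 TOTAL2
# Utilization over the interval between two polls of the raw tick counters,
# truncated to two decimal places.
cpu_used_between() {
    awk -v i1="$1" -v t1="$2" -v i2="$3" -v t2="$4" 'BEGIN {
        didle  = i2 - i1
        dtotal = t2 - t1
        if (dtotal <= 0) exit 1     # no elapsed ticks, or a counter wrapped or reset
        printf "%.2f\n", int((dtotal - didle) * 10000 / dtotal) / 100
    }'
}

# Made-up sample: between polls, idle grew by 840 ticks and the total by 1000.
cpu_used_between 9500 10000 10340 11000   # prints 16.00
```

With the same made-up numbers, the formula applied to the totals at the second poll gives 100 - 10340/11000*100, which is 6%. That is the average since the counters started, while the delta calculation gives the utilization for the last interval only.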

marunderwood

Continued on CPU Load at 100% but is really 16% for set-snmp Linux server.