SolarWinds is showing that my Linux server is at 100% CPU usage all the time, but it's actually around 0-10%.
The server is Red Hat Enterprise ES R4 with Net-SNMP version 5.1.2
What can I do to get it to read the CPU usage correctly? Thanks!
I am having the same issue with a Citrix XenServer that I am trying to monitor. Not sure what version of Net-SNMP I am running, but I have XenServer 5.5 (Xen is based on CentOS which is based on RedHat)
I see the 100% CPU on my main screen, NODES WITH HIGH CPU and on the pop-up window for the node. However, when I drill into the node, I do not see the CPU and memory gauge even though I have chosen them for this display.
I just added the memory and CPU gauges separately and now I see CPU usage (pegged 100%) but no memory gauge.
We are seeing something similar from our Secure Computing SnapGear devices, which run a Linux kernel. Not sure of the version.
SolarWinds shows 80-100% CPU or memory.
The actual box shows nominal usage.
Hi Everyone-
I've sent this thread to the Product Manager, who should get back to you asap!
Question: have you all run diagnostics on your NPM? Also, have you opened a support ticket?
M
Opened case for my issue
Case #126229
Hi,
This sounds like our known issue of using the old ssCpuIdle OID for net-snmp:
1.3.6.1.4.1.2021.11.11.0
We are switching to use the new OID ssCpuRawIdle:
1.3.6.1.4.1.2021.11.53.0
For each of these OIDs we will take 100 - OID.Value for your CPU utilization. If people on this thread could verify that the number we are showing now is wrong because of the value your device reports from the first OID and would be corrected by using the second one, we would greatly appreciate the verification.
Thanks
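For anyone who wants to run the verification asked for above from a shell, here is a rough sketch using snmpget from the net-snmp command-line tools. It assumes SNMP v2c and that snmpget is installed on the machine you run it from; the function names (idle_to_used, is_number, check_host) and the host and community string in the usage line are made up for illustration.

```shell
#!/bin/bash
# OIDs quoted in the post above.
SS_CPU_IDLE="1.3.6.1.4.1.2021.11.11.0"      # idle percentage
SS_CPU_RAW_IDLE="1.3.6.1.4.1.2021.11.53.0"  # raw idle tick counter

# Turn an idle percentage into a used percentage (100 - idle).
idle_to_used() {
    echo $((100 - $1))
}

# Succeed only if the argument is a non-empty string of digits.
is_number() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

# check_host HOST COMMUNITY
# Assumption: SNMP v2c; adjust the -v/-c options for your setup.
check_host() {
    idle=$(snmpget -v 2c -c "$2" -Oqv "$1" "$SS_CPU_IDLE")
    raw=$(snmpget -v 2c -c "$2" -Oqv "$1" "$SS_CPU_RAW_IDLE")
    if is_number "$idle"; then
        echo "ssCpuIdle=$idle, which implies $(idle_to_used "$idle")% used"
    else
        echo "ssCpuIdle: no numeric value returned ($idle)"
    fi
    if is_number "$raw"; then
        echo "ssCpuRawIdle=$raw ticks"
    else
        echo "ssCpuRawIdle: no numeric value returned ($raw)"
    fi
}
```

Calling it would look like `check_host 192.0.2.10 public` (placeholder address and community). If the first OID comes back as 0 while the box is mostly idle, that would line up with the 100% reading; if the second OID returns nothing at all, the agent probably does not expose it.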
When I walk my XenServer, this OID is not found. I only get 38 OIDs returned.
When I try to create a UnDP based on this OID and assign it to the server, I get "No such Name".
We're looking at justty's diagnostics to see what might be going on.
I just set up SNMP on my XenServer and I am getting the same thing. Any thoughts on how to correct it?
RC
I am having the same issue on two (out of several dozen) Windows servers.
-branden
Any update on this? We are seeing incorrect CPU percentages on 75% of our Linux boxes. The OID 1.3.6.1.4.1.2021.11.53.0 is not a percentage; it is actual time. How can we use it to calculate a percentage of processor time?
JB
Yeah, you're right that isn't a percentage. Guess I was multi-tasking when I wrote the earlier post. If users could confirm that this OID is being populated on their devices that are having problems, then we're on the right track for your device. If not, you may have to search the MIBs your device supports to see if you can find the CPU OIDs.
OK, I think I managed to figure this out...
The OIDs (ssCpuRawUser, ssCpuRawNice, ssCpuRawSystem, ssCpuRawIdle, ssCpuRawWait, ssCpuRawKernel, ssCpuRawInterrupt) are the number of 'ticks' (typically 1/100 s) spent processing in each of those categories.
The formula Truncate({100-{ssCpuRawIdle}/({ssCpuRawKernel}+{ssCpuRawIdle}+{ssCpuRawSystem}+{ssCpuRawUser}+{ssCpuRawWait}+{ssCpuRawNice})*100},2)
seems to give correct results, but this is all done through UnDP and doesn't integrate with all the canned charting and alerting.
I'd like to know your thoughts on what Solarwinds will do to address this.
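To sanity-check the transform by hand, the same arithmetic can be done in a shell with awk. This is only a sketch of the formula quoted above, not of how Orion evaluates it; the function name cpu_used_pct and the argument order are my own. It takes one set of tick counts and prints the used percentage truncated to two decimals.

```shell
#!/bin/bash
# cpu_used_pct USER NICE SYSTEM IDLE WAIT KERNEL
# Same ratio as the UnDP transform: 100 - idle / total * 100.
cpu_used_pct() {
    awk -v u="$1" -v n="$2" -v s="$3" -v i="$4" -v w="$5" -v k="$6" 'BEGIN {
        total = u + n + s + i + w + k
        if (total <= 0) { print "0.00"; exit }
        # (total - idle) / total * 100 equals 100 - idle / total * 100.
        # Work in hundredths of a percent so int() truncates at two decimals.
        hundredths = int((total - i) * 10000 / total)
        printf "%.2f\n", hundredths / 100
    }'
}
```

For example, `cpu_used_pct 100 0 50 750 50 50` has 750 idle ticks out of 1000 in total, so it prints 25.00.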
Another option is to use the pass function of net-snmp. This can also be used to monitor other values of your linux system.
In your /etc/snmp/snmpd.conf include this line:
pass .1.3.6.1.4.1.4413.4.1 /bin/bash /root/cpu.sh
The cpu.sh file will look like this:
echo ".1.3.6.1.4.1.4413.4.1.1"
echo integer
vmstat | grep -v procs | grep -v free | awk '{print $15}'
Once you restart the snmpd service you can now poll that OID and report the values. I've used this for other purposes like number of apache connections, mysql connections, number of items in the mailq, etc.
I threw this together real quick; in production environments I would include a PID lock in the script.
Let me know if you have any questions.
That's pretty nice but wouldn't that give you processor free %? We still need to invert it.
That's correct, this will give you percent used:
echo ".1.3.6.1.4.1.4413.4.1.1"
echo integer
CPU=$(vmstat | grep -v procs | grep -v free | awk '{print $15}')
USED=$((100-$CPU))
echo $USED
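One thing the short scripts above skip: snmpd calls a pass program with -g OID for a GET and -n OID for a GETNEXT, and expects three lines back (OID, type, value) or no output if there is nothing to return. Below is a slightly fuller sketch that checks those arguments. The handle_request and read_idle names, and the optional third argument used to feed in an idle value, are mine and only there for illustration; the OIDs are the ones from the pass line in the thread.

```shell
#!/bin/bash
BASE=".1.3.6.1.4.1.4413.4.1"   # subtree registered on the pass line
LEAF="$BASE.1"                 # the single OID this script answers for

# Idle percentage from vmstat. Column 15 is "id" in the layout used in
# this thread; other vmstat versions may order columns differently.
# Note: a single vmstat run reports averages since the last reboot.
read_idle() {
    vmstat | tail -n 1 | awk '{print $15}'
}

# handle_request MODE OID [IDLE]
# IDLE is optional, so the logic can be exercised without running vmstat.
handle_request() {
    mode="$1"; oid="$2"; idle="$3"
    case "$mode" in
        -g) [ "$oid" = "$LEAF" ] || return 0 ;;  # GET: exact match only
        -n) [ "$oid" = "$BASE" ] || return 0 ;;  # GETNEXT: only from the subtree root
        *)  return 0 ;;                          # anything else (e.g. -s) is ignored
    esac
    [ -n "$idle" ] || idle=$(read_idle)
    echo "$LEAF"
    echo integer
    echo $((100 - idle))
}

# When snmpd runs the script it passes arguments; answer that request.
if [ $# -gt 0 ]; then
    handle_request "$1" "$2"
fi
```

Printing nothing for an OID the script does not own lets a walk move on past this subtree instead of getting the same OID back again.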
Is there any way to use this script to get FS data (FS capacity, FS used, FS available)?
Hi Karlo,
I tried testing 1.3.6.1.4.1.2021.11.53.0 with a UnDP, and "A value was not returned" was displayed. May we know what to do next? Do we need a MIB for Red Hat Enterprise Linux AS release 4 (Nahant Update 7)?
Thank you very much
We have confirmed that the new OIDs we are using are working with many customers' Linux machines with net-snmp. Perhaps you need to update your net-snmp to the latest version.
What OIDs are getting used for CPU currently on your device and is it accurate?
What is the latest version of net-snmp? Presently, we are using version 2.
I do not know how to extract the OIDs through SolarWinds, and I can't compare the values with the server because we are not the ones handling the equipment. But I tried using the MIB walk toolset. Kindly see the OIDs below. No OID value appears to be the same as the value seen in SolarWinds. CPU utilization of the server is 100%.
What version of NPM are you using?
10.6.1
Not sure if this will help, but to troubleshoot, I installed the UnDP (from earlier in this thread). The NetSNMP (transforms) TEST value is correct (= 16.17) when I run a test on the device (in the UnDP editing/test screens). However, the Node Details page (Universal Device Poller Status resource) does not show the same value (it's 100), nor does my report (it's 100 for raw status and status), nor do the Orion Universal Device Poller application's historical poll results (also 100). So why is the 'Test' showing the correct value while the stored and displayed values are different?
Ahh... I added the 'total' column to the report, and it contains the correct value. So, it probably has something to do with this being captured as a COUNTER (I tried other types but couldn't get any to work) so it's storing the differences. So, when the Transform computation runs, it's probably using the difference instead of the total column. Is there a way to force it to use the total? The formula is currently .. Truncate({100-{ssCpuRawIdle}/({ssCpuRawKernel}+{ssCpuRawIdle}+{ssCpuRawSystem}+{ssCpuRawUser}+{ssCpuRawWait}+{ssCpuRawNice})*100},2).
I also noticed the UnDP for the ssCpuRaw fields have unit = blank and Timeframe=None. Are these right?
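On the totals-versus-differences point: the raw counters only ever increase, so the ratio taken on the totals is an average over the whole life of the counters, while the same ratio taken on the differences between two polls is the usage during that polling interval. I can't say which one Orion feeds into the transform, but the arithmetic on differences can be sketched outside of Orion like this. The function name cpu_used_between and the sample format are my own, and counter wrap is not handled.

```shell
#!/bin/bash
# cpu_used_between "PREV" "CURR"
# PREV and CURR are space-separated tick counts from two polls, each in
# the order: user nice system idle wait kernel
cpu_used_between() {
    echo "$1 $2" | awk '{
        d_total = 0
        # Fields 1-6 are the earlier poll, fields 7-12 the later one.
        for (f = 1; f <= 6; f++) d_total += $(f + 6) - $f
        d_idle = $10 - $4
        if (d_total <= 0) { print "0.00"; exit }
        # Hundredths of a percent, truncated, then printed with two decimals.
        printf "%.2f\n", int((d_total - d_idle) * 10000 / d_total) / 100
    }'
}
```

For example, `cpu_used_between "1000 0 500 8000 300 200" "1100 0 550 8750 350 250"` sees 1000 new ticks, 750 of them idle, and prints 25.00.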
Continued on CPU Load at 100% but is really 16% for set-snmp Linux server.