We recently patched our Linux servers, and afterwards SolarWinds NPM reports memory as negative numbers, but only on the Red Hat 7 servers. I tried upgrading to the latest NPM/SAM version (12.5/6.9), but this did not help. I'm reaching out to OS support as well, but also wanted to get suggestions from the SW community. Has anyone seen this before? It only appears to affect RHEL 7 memory; CPU, response time, disk, etc. all appear to be accurate.
I know I am a bit late to the party, but I just ran into the exact same issue that you are all experiencing. I noticed that no real answer had been posted, so here we go. After looking into the problem, it seems that SNMP is targeting the wrong OID. The default that it looks for is N.Memory.SNMP.NetSnmpReal. While this still returns information, it is incorrect on some RHEL 7.x versions. I think the OID tied to it got changed, but I cannot find that in the patch notes. What I did find, though, is this bit of information from redhat.com:
|HOST-RESOURCES-MIB::hrSystem|Contains general system information such as uptime, number of users, and number of running processes.|
|HOST-RESOURCES-MIB::hrStorage|Contains data on memory and file system usage.|
|HOST-RESOURCES-MIB::hrDevices|Contains a listing of all processors, network devices, and file systems.|
|HOST-RESOURCES-MIB::hrSWRun|Contains a listing of all running processes.|
|HOST-RESOURCES-MIB::hrSWRunPerf|Contains memory and CPU statistics on the process table from HOST-RESOURCES-MIB::hrSWRun.|
||Contains a listing of the RPM database.|
Using HOST-RESOURCES-MIB::hrStorage returns accurate information on memory-related resources. With the Poller Checker tool you can confirm what I am seeing on RHEL 7.7 (results may vary with different versions and distros). With that same tool you can change which OID you are targeting.
For me this resolved the issue on the very next polling cycle. Please let me know if it works for you as well.
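For anyone who wants to sanity-check the hrStorage numbers by hand, here is a minimal Python sketch. It assumes you have already read one row of the hrStorageTable (the hrStorageAllocationUnits, hrStorageSize and hrStorageUsed columns) for the physical memory entry. The function name and the sample values are made up for illustration, and this is not how the Poller Checker itself computes anything.

```python
def storage_usage(allocation_units: int, size: int, used: int) -> dict:
    """Convert one hrStorageTable row into bytes and a used percentage.

    hrStorageSize and hrStorageUsed are counted in allocation units,
    so both are multiplied by hrStorageAllocationUnits to get bytes.
    """
    total_bytes = allocation_units * size
    used_bytes = allocation_units * used
    percent_used = 100.0 * used / size if size else 0.0
    return {
        "total_bytes": total_bytes,
        "used_bytes": used_bytes,
        "percent_used": percent_used,
    }


# Hypothetical row: 1024-byte units, sample size and used counts.
row = storage_usage(allocation_units=1024, size=8010464, used=4005232)
print(row)
```

If the percentage from this calculation matches what the server reports locally, the hrStorage values are likely the ones to trust.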
The issue also occurs in our environment after updating our Linux servers.
Memory is reported as 0% instead of the correct value.
I followed @scarelette's workaround and used hrStorage via the Poller Checker tool, and we got a result (not exact, but close to the real value: 866 MB on Linux vs. 831 MB in SW).
We upgraded from CentOS 7 version 1004 to CentOS 7 version 1005.
I hope upcoming SW versions will include updated OIDs for the newer versions of Linux.
Thanks @scarelette for sharing!
Did you open a case on this or find a solution? I just noticed I'm having the same set of issues on a new group of servers that were recently spun up and added into NPM/SAM. These boxes are being monitored via SNMPv2.
Bump - Anyone else run into this issue? I'm following the documentation below for how Net-SNMP calculates things, but I'm definitely seeing negative numbers on these new boxes with 192 GB of memory installed. I'm trying to determine whether this is some sort of Integer32 limitation in Net-SNMP.
I haven't looked in about a year, but the information should be coming from /proc/meminfo. Does that return the expected statistics, or are those numbers negative or off in some way?
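One way to reason about the Integer32 idea is to see what a value looks like after it is forced into a signed 32-bit field. The Python sketch below is only an illustration of that wrap-around; the helper name is made up, the memory sizes are round example figures, and nothing here confirms where (or whether) such a conversion happens in the polling path.

```python
def as_int32(value: int) -> int:
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


INT32_MAX = 2**31 - 1

# Example figure: 192 GiB expressed in kB still fits in a signed 32-bit integer.
total_kb = 192 * 1024 * 1024
print(total_kb, total_kb <= INT32_MAX, as_int32(total_kb))

# Example figure: 190 GiB expressed in bytes does not fit and wraps negative.
total_bytes = 190 * 1024**3
print(total_bytes, total_bytes <= INT32_MAX, as_int32(total_bytes))
```

So a kB-based counter on its own should not overflow at this size, but a byte count squeezed into 32 signed bits would. That makes a units conversion somewhere a plausible suspect, not a proven cause.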
The only calculation that should be done (again, assuming nothing's fundamentally changed in a year) is FREE + BUFFERS + CACHED as unused memory.
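To check that calculation locally, here is a small Python sketch that parses /proc/meminfo-style text and adds MemFree + Buffers + Cached. The function names and the sample figures are made up for illustration; on a real server you would pass in the contents of /proc/meminfo instead of the sample string. This is a sketch of the formula described above, not a copy of the agent's scripts.

```python
def parse_meminfo(text: str) -> dict:
    """Parse 'Key:   value kB' lines into a dict of integers (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def unused_kb(meminfo: dict) -> int:
    """Unused memory as FREE + BUFFERS + CACHED, in kB."""
    return meminfo["MemFree"] + meminfo["Buffers"] + meminfo["Cached"]


# Hypothetical sample; replace with open("/proc/meminfo").read() on a server.
SAMPLE = """MemTotal:       16266264 kB
MemFree:         1200000 kB
Buffers:          300000 kB
Cached:          4500000 kB
"""

info = parse_meminfo(SAMPLE)
unused = unused_kb(info)
used = info["MemTotal"] - unused
print(f"unused={unused} kB, used={used} kB")
```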
You can review the scripts directly if you were so inclined (as I was) at:
I may have had to help our Linux Admins understand some memory related polling once.
If the numbers are correct, then it's likely something with the agent/Orion. You can try unmanaging/remanaging to start, as that's fixed some weirdness I've seen before.
The /proc/meminfo shows accurate numbers, nothing negative. I don't have a /opt because the agent is not installed; we are monitoring using SNMPv3. Should the agent be used instead? Also, unmanaging/remanaging didn't seem to work either.
Another strange thing to note: if I do real-time polling with Perf Analysis, the number changes from negative to around 100 MB, which is also inaccurate. When I stop Perf Analysis, it goes back to a negative number.
Most of my information was assuming the agent. I'd give it a try on one system and see if that helps clear things up for you, but if you're using SNMP I'm sure there's a place you could use to confirm the information locally on the server, I just don't know offhand.
I'd start with an SNMP walk and make sure of what you're seeing. For example, I'd expect it to be the same negative number information that Orion is reporting. If it is, then there's an issue server side or with SNMP. If you're seeing accurate information then there's some weird issue with Orion that you'd need to take up with Support.
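If you save the output of a walk of the UCD-SNMP-MIB memory objects to a file, a short Python sketch like the one below can flag any negative values so you can compare them with what Orion shows. It assumes the usual Net-SNMP text output of the form `NAME = TYPE: value unit`. The function names are made up, and the sample lines are invented values, not output from a real server.

```python
import re

# Matches lines such as "UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 123 kB".
LINE_RE = re.compile(r"^(?P<oid>\S+)\s*=\s*(?P<type>\w+):\s*(?P<value>-?\d+)")


def parse_walk(text: str) -> dict:
    """Return {object name: integer value} for every numeric line."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group("oid")] = int(match.group("value"))
    return result


def negative_oids(values: dict) -> list:
    """List the object names whose value is below zero."""
    return sorted(oid for oid, value in values.items() if value < 0)


# Hypothetical walk output for illustration only.
SAMPLE_WALK = """UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 16266264 kB
UCD-SNMP-MIB::memAvailReal.0 = INTEGER: -1200000 kB
UCD-SNMP-MIB::memBuffer.0 = INTEGER: 300000 kB
"""

walk = parse_walk(SAMPLE_WALK)
print(negative_oids(walk))
```

If the walk itself contains negative values, the problem sits on the server or SNMP side; if the walk is clean and Orion is still negative, that points at Orion.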