Level 9

NPM displaying negative memory after OS patches

We recently patched our Linux servers, and afterwards SolarWinds NPM reports memory as negative numbers, but only on the Red Hat 7 servers. I tried upgrading to the latest NPM/SAM versions (12.5/6.9), but this did not help. I'm reaching out to OS support as well, but also wanted to get suggestions from the SW community. Has anyone seen this before? It only appears to affect RHEL 7 memory; CPU, response time, disk, etc. all appear to be accurate.

0 Kudos
12 Replies
Level 8

Hello everyone, 
    I know I am a bit late to the party, but I just ran into the exact same issue that you are all experiencing. I noticed that no real answer had been posted, so here we go. After looking into the problem, it seems that SNMP polling is targeting the wrong OID. The default poller it uses is N.Memory.SNMP.NetSnmpReal. While this still returns data, the data is incorrect on some RHEL 7.x versions. I think the OID tied to that poller changed, but I cannot find it in the patch notes. What I did find is this bit of information from redhat.com.

OID                                Description

HOST-RESOURCES-MIB::hrSystem       Contains general system information such as uptime, number of users, and number of running processes.
HOST-RESOURCES-MIB::hrStorage      Contains data on memory and file system usage.
HOST-RESOURCES-MIB::hrDevices      Contains a listing of all processors, network devices, and file systems.
HOST-RESOURCES-MIB::hrSWRun        Contains a listing of all running processes.
HOST-RESOURCES-MIB::hrSWRunPerf    Contains memory and CPU statistics on the process table from HOST-RESOURCES-MIB::hrSWRun.
HOST-RESOURCES-MIB::hrSWInstalled  Contains a listing of the RPM database.

 

Using HOST-RESOURCES-MIB::hrStorage returns accurate information on memory-related resources. With the Poller Checker tool you can confirm what I am seeing on RHEL 7.7 (results may vary with different versions and distros), and with that same tool you can change which OID you are targeting.

For me this resolved the issue on the very next polling cycle. Please let me know if it works for you as well. 
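For anyone checking the hrStorage numbers by hand, here is a minimal sketch of how a row of that table turns into bytes and a percentage. hrStorageAllocationUnits, hrStorageSize, and hrStorageUsed are the standard HOST-RESOURCES-MIB columns; the StorageRow class, the helper names, and the sample values are made up for illustration, and this is not necessarily how the Orion poller does the math.

```python
from dataclasses import dataclass


@dataclass
class StorageRow:
    """One row of hrStorageTable (hypothetical container for illustration)."""
    descr: str             # hrStorageDescr, e.g. "Physical memory"
    allocation_units: int  # hrStorageAllocationUnits: bytes per unit
    size: int              # hrStorageSize: total, in allocation units
    used: int              # hrStorageUsed: used, in allocation units


def usage_bytes(row):
    """Return (total_bytes, used_bytes) for a storage row."""
    total = row.size * row.allocation_units
    used = row.used * row.allocation_units
    return total, used


def percent_used(row):
    """Return the used percentage, or 0.0 for an empty row."""
    if row.size == 0:
        return 0.0
    return 100.0 * row.used / row.size


# Sample values are invented, not taken from a real server.
row = StorageRow("Physical memory", 1024, 8009572, 3105648)
total, used = usage_bytes(row)
print(f"{row.descr}: {used} of {total} bytes, {percent_used(row):.1f}% used")
```

Both size and used are counted in allocation units, so they have to be multiplied by hrStorageAllocationUnits before comparing them with what the OS reports in kB or MB.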

Links:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_...

https://support.solarwinds.com/SuccessCenter/s/article/Use-the-Poller-Checker-tool

Level 11

Bump! 🙂

The issue also occurs in our environment after updating our Linux servers.
Memory usage is reported as 0% instead of the correct value.

I followed @scarelette's workaround and used hrStorage via the Poller Checker tool, and we got a result (not exact, but close to the real value: 866 MB on Linux vs. 831 MB in SW).

We upgraded from CentOS 7 version 1004 to CentOS 7 version 1005.
I hope we get updated OIDs for the newer Linux versions in upcoming SW releases.

Thanks @scarelette for sharing!

0 Kudos
Level 9

I have the exact same issue after patching a whole heap of CentOS servers

0 Kudos
Level 13

We have also opened a case on this issue and heard back today that it is a known issue and development is working on a fix.

Level 16

Any tentative date given by support?

0 Kudos
Level 13

No, they haven't given a timeline. I expect that the more cases are opened on it, the more priority it will be given.

Level 11

Did you open a case on this or find a solution?  I just noticed I'm having the same set of issues on a new group of servers that were recently spun up and added into NPM/SAM.  These boxes are being monitored via SNMPv2.

0 Kudos
Level 11

Bump - Has anyone else run into this issue? I'm following the documentation below for how Net-SNMP calculates things, but I'm definitely seeing negative numbers on these new boxes with 192 GB of memory installed. I'm trying to determine whether this is some sort of Integer32 limitation in Net-SNMP.
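To make the Integer32 question concrete, here is a small sketch of what signed 32-bit wraparound looks like. It assumes the memory value ends up in a signed 32-bit field somewhere along the way; as_int32 is a hypothetical helper, and nothing here confirms that this is the actual cause.

```python
INT32_MAX = 2**31 - 1


def as_int32(value):
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - 2**32 if value > INT32_MAX else value


GIB = 1024**3

total_bytes = 192 * GIB         # 192 GB expressed in bytes
total_kb = total_bytes // 1024  # the same amount expressed in kB

# Anything larger than INT32_MAX cannot survive a signed 32-bit field intact.
print("kB value fits in Integer32:   ", total_kb <= INT32_MAX)
print("byte value fits in Integer32: ", total_bytes <= INT32_MAX)
print("3 GiB in bytes read as int32: ", as_int32(3 * GIB))
```

Expressed in kB, 192 GB is still well below the Integer32 maximum, so a wrap would only come into play if a value is converted to bytes (or summed with other large values) before being stored in 32 bits.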

Success Center

0 Kudos
Level 13

It seems the agent is the way to go instead of SNMP lately for these kinds of anomalies.

Level 14

I haven't looked in about a year, but the information should be coming from /proc/meminfo. Does that return the expected statistics, or are those numbers negative or off in some way?

The only calculation that should be done (again, assuming nothing's fundamentally changed in a year) is FREE + BUFFERS + CACHED as unused memory.

You can review the scripts directly if you were so inclined (as I was) at:

/opt/SolarWinds/Agent/bin/Plugins/Core/

I may have had to help our Linux Admins understand some memory related polling once.

If the numbers are correct, then it's likely something with the agent or Orion. You can try unmanaging and remanaging the node to start, as that's fixed some weirdness I've seen before.
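The FREE + BUFFERS + CACHED calculation mentioned above can be sketched roughly as follows. This is an illustration under the assumption that the fields map to MemFree, Buffers, and Cached in /proc/meminfo; it is not the agent's actual script, and the function names and sample values are made up.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_used_kb(info):
    """Used memory = MemTotal - (MemFree + Buffers + Cached)."""
    unused = info["MemFree"] + info["Buffers"] + info["Cached"]
    return info["MemTotal"] - unused


# Invented sample in the /proc/meminfo layout, not from a real server.
SAMPLE = """\
MemTotal:        8009572 kB
MemFree:         1200000 kB
Buffers:          300000 kB
Cached:          2500000 kB
"""

info = parse_meminfo(SAMPLE)
print("used kB:", memory_used_kb(info))
```

On a real server you would pass the contents of /proc/meminfo instead of SAMPLE. If the result is sane locally but negative in Orion, the problem is further up the chain.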

0 Kudos
Level 9

/proc/meminfo shows accurate numbers, nothing negative. I don't have a /opt/SolarWinds directory because the agent is not installed; we are monitoring using SNMPv3. Should the agent be used instead? Also, unmanaging/remanaging didn't seem to work either.

Another strange thing to note: if I do real-time polling with Perf Analysis, the number changes from negative to around 100 MB, which is also inaccurate. When I stop Perf Analysis, it goes back to a negative number.

0 Kudos
Level 14

Most of my information was assuming the agent. I'd give it a try on one system and see if that helps clear things up for you. If you're using SNMP, I'm sure there's a way to confirm the information locally on the server; I just don't know it offhand.

I'd start with an SNMP walk and make sure of what you're seeing. For example, I'd expect it to be the same negative number information that Orion is reporting. If it is, then there's an issue server side or with SNMP. If you're seeing accurate information then there's some weird issue with Orion that you'd need to take up with Support.
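As a rough aid for reading the walk output, here is a sketch that picks the UCD-SNMP-MIB memory scalars out of text saved from snmpwalk and flags values that look wrong. The line format is assumed to match typical Net-SNMP output, the function names are hypothetical, and the sample values (including the negative one) are invented.

```python
import re

# Assumed line shape: "UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 8009572 kB"
LINE_RE = re.compile(r"^UCD-SNMP-MIB::(\w+)\.0 = INTEGER: (-?\d+)")


def parse_walk(text):
    """Return {object_name: value} for matching lines of snmpwalk output."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def suspicious(values):
    """Names whose value is negative or larger than memTotalReal."""
    total = values.get("memTotalReal")
    flagged = []
    for name, value in values.items():
        if value < 0 or (total is not None and value > total):
            flagged.append(name)
    return flagged


# Invented sample output; the negative memCached is only for illustration.
SAMPLE_WALK = """\
UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 8009572 kB
UCD-SNMP-MIB::memAvailReal.0 = INTEGER: 1200000 kB
UCD-SNMP-MIB::memBuffer.0 = INTEGER: 300000 kB
UCD-SNMP-MIB::memCached.0 = INTEGER: -1694967 kB
"""

print(suspicious(parse_walk(SAMPLE_WALK)))
```

If the walk itself already contains odd values, the issue is on the server or SNMP side; if the walk looks clean and Orion still shows negatives, that points at Orion.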

0 Kudos