TrapService.log: ERROR Main - SNMPManager::InternalQuery() unable to send request

Question

System: I'm on a single physical application server running NPM v12.0.0, with other SolarWinds apps installed on it. The NPM database is on a separate server.

Question: Is there a configuration setting somewhere that limits (or can expand) the amount of memory an NPM polling engine may use? I believe I only need this for the SNMP polling engine, unless there is some other memory-related or global setting on the system that normally assigns the polling engines a fixed amount or percentage of the available resources.

Info: I monitor the Orion/NPM polling engines on my system so that I get an alert if any of them has not updated the database in 5 minutes. These alerts are rare, and when they do fire it usually indicates an issue *somewhere* on the Orion server (unless we are patching the database server and the DB is unavailable).
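The alert rule above is essentially a staleness check. A minimal sketch, assuming you already have each engine's last database-update time; how those timestamps are read from the Orion database is not shown, and the engine names and dictionary layout are hypothetical:

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Hypothetical threshold matching the 5-minute alert described above.
STALE_AFTER = timedelta(minutes=5)


def stale_engines(last_updates: dict[str, datetime], now: datetime,
                  threshold: timedelta = STALE_AFTER) -> list[str]:
    """Return the names of engines whose last DB update is older than threshold."""
    return sorted(name for name, ts in last_updates.items()
                  if now - ts > threshold)


if __name__ == "__main__":
    now = datetime(2017, 1, 30, 12, 50, 0)
    engines = {  # hypothetical engine names and timestamps
        "SNMP": datetime(2017, 1, 30, 12, 44, 0),   # 6 minutes old
        "ICMP": datetime(2017, 1, 30, 12, 49, 30),  # 30 seconds old
    }
    print(stale_engines(engines, now))  # ['SNMP']
```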

I have recently started seeing several alerts (maybe 3-6 per day), but only for the SNMP polling engine. They usually clear in about 5-10 minutes. When I started looking around, I found that whenever the SNMP polling engine stops updating for a few minutes (and I get my alert), several of these entries appear in TrapService.log:

2017-01-30 12:46:06,365 [967] ERROR Main - SNMPManager::InternalQuery() unable to send request

2017-01-30 12:46:07,021 [1883] ERROR Main - SendRequest Error: Exception of type 'System.OutOfMemoryException' was thrown.
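To line these errors up against alert times, the log lines can be parsed into timestamps. A minimal sketch, assuming every TrapService.log line follows the same `<date> <time>,<ms> [<thread>] <LEVEL> <logger> - <message>` layout as the two samples above (inferred from those samples, not from any SolarWinds documentation):

```python
import re
from datetime import datetime

# Layout inferred from the two sample lines; an assumption, not a documented format.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>\d+)\] (?P<level>[A-Z]+) (?P<logger>\S+) - (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, message), or None if the line doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # %f accepts the 3-digit millisecond field and pads it to microseconds.
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return ts, m["level"], m["msg"]


def oom_times(lines):
    """Timestamps of parsed lines whose message mentions OutOfMemoryException."""
    times = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and "OutOfMemoryException" in parsed[2]:
            times.append(parsed[0])
    return times


if __name__ == "__main__":
    sample = [
        "2017-01-30 12:46:06,365 [967] ERROR Main - SNMPManager::InternalQuery() unable to send request",
        "2017-01-30 12:46:07,021 [1883] ERROR Main - SendRequest Error: Exception of type 'System.OutOfMemoryException' was thrown.",
    ]
    for line in sample:
        print(parse_line(line))
    print(oom_times(sample))
```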

I don't believe the process is running out of physical memory. This is a physical server with 72 GB of memory, and it normally uses about 12-17 GB (it was a borrowed server, so there's lots of room to grow). Unless something suddenly grabs a *lot* of memory and then releases it, I'm not seeing spikes that large in Orion. In addition, the only process on the server using more than 1 GB of memory is "SolarWinds.Collector.Service.exe", at 1.2 GB. I'm not sure whether the SNMP poller also handles SNMP traps, but if it does, we only process about 400-500 traps per hour, and I don't see any trap floods large enough that I could imagine them taking up 50+ GB of memory.

Anyway, this makes me think there's a software-level configuration limiting how much memory the SNMP trap polling engine will use, and that I'm hitting some sort of soft limit.

Edit: I know there are differences between SNMP polling (actively pulling data from devices) and traps (data sent in by devices); I'm just not sure how NPM builds it all out, or whether the two are related or share anything in software. I'm grasping at straws a bit, trying to tie together what I'm seeing: the "polling engine" has DB issues and the trap log shows memory issues around the same time, but maybe it's a coincidence.
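One rough way to test the "maybe it's a coincidence" question is to count how many logged errors fall near each alert. A minimal sketch, assuming you have alert start/end times and error timestamps as `datetime` values; the 5-minute margin and all sample times are hypothetical:

```python
from datetime import datetime, timedelta


def errors_near_alerts(alerts, error_times, margin=timedelta(minutes=5)):
    """For each (start, end) alert window, count error timestamps that fall
    inside the window widened by `margin` on both sides. The margin allows
    for the alert firing only after 5 minutes without a DB update."""
    counts = []
    for start, end in alerts:
        lo, hi = start - margin, end + margin
        counts.append(sum(1 for t in error_times if lo <= t <= hi))
    return counts


if __name__ == "__main__":
    alerts = [  # hypothetical alert windows
        (datetime(2017, 1, 30, 12, 50), datetime(2017, 1, 30, 12, 57)),
        (datetime(2017, 1, 30, 15, 10), datetime(2017, 1, 30, 15, 16)),
    ]
    errors = [  # hypothetical error timestamps
        datetime(2017, 1, 30, 12, 46, 6),
        datetime(2017, 1, 30, 12, 46, 7),
        datetime(2017, 1, 30, 9, 0, 0),
    ]
    print(errors_near_alerts(alerts, errors))  # [2, 0]
```

If most errors cluster inside alert windows over several days, coincidence becomes less likely; this is a rough heuristic, not a statistical test.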

tigger2 · Answer

Update: Support did the standard "shut down Orion, uninstall and reinstall services" procedure on several services. They also did some research and indicated (I'm paraphrasing) that there is a known crashing issue with some component (I forgot the specifics, but it wasn't specifically the SNMP trap poller), and that it has either been patched or the patch is in the next RC. The issue has cropped up again, but it isn't hurting us too badly, so I'm going to wait for the next GA or point release. After upgrading from 12.0.0, I'll re-open this if it doesn't stop, and keep you updated.

tigger2 · Answer

Update: I opened a ticket on it. I'm also having the SolarWinds Job Engine (64-bit) crash infrequently.