Since so much of NPM relies on SNMP working, I started wondering how to monitor SNMP itself.
At this time I am only concerned with unix systems running some variant of net-snmp.
What have other users done to ensure SNMP is up and running on their monitored nodes?
This is a common problem with most NMSs -- they do not allow for the fact that monitoring SNMP is a requirement for determining monitored system health. The thought seems to be that an ICMP ping is sufficient.
The answer is, there really is no good answer. I've set up simple monitors to test that something is responding on port 161 (poor at best), but after a little thought one will realize that this is all ICMP does as well -- ensure that *something* is responding to echo requests. Pings have little to do with system health.
I personally like the idea of a two-pronged approach -- having the port test and polling sysUpTime via APM.
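For anyone who wants to try the two-pronged idea outside of APM first, here is a rough Python sketch. It assumes the net-snmp `snmpget` command is installed on the polling machine and that the agent answers SNMP v2c; the function names, the `public` community string, and the status labels are all made up for illustration. The port test result is passed in as a plain boolean from whatever port monitor you already have.

```python
import subprocess

# sysUpTime.0 from the standard MIB-II system group
SYSUPTIME_OID = "1.3.6.1.2.1.1.3.0"


def parse_uptime_ticks(output):
    """Return sysUpTime in hundredths of a second, or None if unparsable."""
    text = output.strip()
    return int(text) if text.isdecimal() else None


def poll_sysuptime(host, community="public", timeout=2):
    """Ask the agent for sysUpTime via the net-snmp snmpget CLI.

    -Oqvt prints only the value, with TimeTicks as a raw number.
    Returns None on any failure (CLI missing, timeout, SNMP error).
    """
    cmd = ["snmpget", "-v2c", "-c", community, "-t", str(timeout),
           "-r", "1", "-Oqvt", host, SYSUPTIME_OID]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout * 2 + 5)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return parse_uptime_ticks(result.stdout)


def classify(port_ok, uptime_ticks):
    """Combine the port test and the sysUpTime poll (labels are hypothetical)."""
    if uptime_ticks is not None:
        return "snmp-ok"
    if port_ok:
        return "port-answers-but-agent-does-not"
    return "snmp-down"
```

The point of keeping the two checks separate is that "something answers on 161" and "the agent returns a real value" fail in different ways, and the middle state is the one a ping or port test alone would miss.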
I usually have a couple of alerts.
In SW, monitor the 'Unknown' attribute over a specific period of time.
When SNMP polling is not working you'll get a value of '-2' in CPULoad, TotalMemory, MemoryUsed, PercentMemoryUsed.
(Same thing for Volumes / Interfaces, but I usually do a separate alert for those, because the cause could be an index change or something else legitimate, like a cluster failover.)
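As a rough illustration of the '-2' check, here is a small Python sketch. It assumes you have already pulled the node statistics into a dictionary per node (for example from a report export); the field names are the ones mentioned above, while the function name and the choice to require all four fields to be -2 are my own assumptions, not how the product's alert engine evaluates it.

```python
# Fields named in the post that show -2 when SNMP polling is not working
SNMP_FIELDS = ("CPULoad", "TotalMemory", "MemoryUsed", "PercentMemoryUsed")


def snmp_polling_failed(node_row):
    """Return True when every SNMP-polled field in the row holds -2.

    node_row is a hypothetical dict of one node's latest statistics.
    Missing fields count as "not failed" so partial rows do not alert.
    """
    return all(node_row.get(field) == -2 for field in SNMP_FIELDS)


if __name__ == "__main__":
    healthy = {"CPULoad": 12, "TotalMemory": 8192,
               "MemoryUsed": 4096, "PercentMemoryUsed": 50}
    broken = {field: -2 for field in SNMP_FIELDS}
    print(snmp_polling_failed(healthy), snmp_polling_failed(broken))
```

Requiring all four fields keeps a single odd value from firing the alert; loosen it to `any` if you would rather be told early.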
You can also run a report on a daily basis. I usually run one to find the 'NULL' status of Custom Pollers. (I have a lot of devices, such as F5s, NetScreens, and BlueCoats, that are monitored exclusively by UnDPs.)
It's not perfect, but it does work.