
How do people monitor SNMP availability?

Since so much of NPM relies on SNMP working, I started wondering how to monitor SNMP itself.

At this time I am only concerned with Unix systems running some variant of net-snmp.

 

Option 1:

  • Build a custom poller to retrieve sysUpTime from the agent.

Problem:

  • TimeTicks is an ever-increasing value (except when the agent restarts), so there is no fixed threshold to alert on. I can find no way to construct an alert on the non-existence of data from sysUpTime.
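One way around this, outside of the Custom Poller itself, is to alert on the age of the last good sample rather than on the value. A minimal Python sketch, assuming whatever does the polling hands back `None` when the agent does not answer; `UptimeTracker`, its state names, and the 600-second threshold are hypothetical and not part of NPM:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UptimeTracker:
    """Tracks sysUpTime samples for one node (hypothetical helper)."""
    max_silence_s: float = 600.0          # assumed threshold, tune to the poll interval
    last_ticks: Optional[int] = None      # last sysUpTime value seen (TimeTicks)
    last_seen_s: Optional[float] = None   # when the last good sample arrived

    def record(self, ticks: Optional[int], now_s: float) -> str:
        """Feed one poll result; ticks is None when the agent did not answer."""
        if ticks is None:
            # No data: the alert is driven by how long the silence has lasted.
            # A node that has never answered is reported as down right away.
            if self.last_seen_s is None or now_s - self.last_seen_s > self.max_silence_s:
                return "snmp_down"
            return "missed_poll"
        # A lower value than before means the agent restarted
        # (or the 32-bit counter wrapped, after roughly 497 days).
        restarted = self.last_ticks is not None and ticks < self.last_ticks
        self.last_ticks = ticks
        self.last_seen_s = now_s
        return "agent_restarted" if restarted else "ok"
```

Usage would be one tracker per node, with `record()` called after every poll and an alert raised on `"snmp_down"`.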

Option 2:

  • Build an APM template to retrieve sysUpTime.

Problem:

  • Works better, but doesn't tell me whether the Host Resources MIB is responding.
  • Takes more resources than a Custom Poller.

Option 3:

  • Build an APM template to test for the existence of the snmpd process.

Problem:

  • Sun systems have a different process name
  • Takes more resources than a Custom Poller.

 

 

What have other users done to ensure SNMP is up and running on their monitored nodes?

2 Replies

This is a common problem with most NMSs -- they do not allow for the fact that monitoring SNMP is a requirement for determining monitored system health.  The thinking seems to be that an ICMP ping is sufficient.

 

The answer is, there really is no good answer.  I've set up simple monitors to test that something is responding on port 161 (poor at best), but after a little thought one will realize that this is all ICMP does -- ensure that *something* is responding to echo requests.  Pings have little to do with system health.

I personally like the idea of a two-pronged approach -- having the port test and polling sysUpTime via APM.
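Both prongs can be covered by one probe that sends a real SNMP GET to UDP port 161 and only counts a well-formed TimeTicks answer as "up". Below is a rough Python sketch using nothing but the standard library; it hand-encodes an SNMPv2c GetRequest, and the function names, the `public` community, the timeout, and the fixed request ID are all assumptions for illustration. Passing `HR_SYSTEM_UPTIME_OID` (hrSystemUptime) instead of the default would also show whether the Host Resources MIB answers.

```python
import socket
import struct
from typing import Optional, Tuple

SYS_UPTIME_OID = bytes([0x2B, 6, 1, 2, 1, 1, 3, 0])            # 1.3.6.1.2.1.1.3.0
HR_SYSTEM_UPTIME_OID = bytes([0x2B, 6, 1, 2, 1, 25, 1, 1, 0])  # 1.3.6.1.2.1.25.1.1.0


def tlv(tag: int, value: bytes) -> bytes:
    """Encode one BER tag-length-value using the short length form only."""
    if len(value) > 127:
        raise ValueError("value too long for short-form length")
    return bytes([tag, len(value)]) + value


def build_get(community: str, request_id: int, oid: bytes = SYS_UPTIME_OID) -> bytes:
    """Build an SNMPv2c GetRequest for a single pre-encoded OID."""
    if not 0x01000000 <= request_id <= 0x7FFFFFFF:
        raise ValueError("request_id must fit a positive 4-byte INTEGER")
    varbind = tlv(0x30, tlv(0x06, oid) + tlv(0x05, b""))       # OID + NULL
    pdu = tlv(0xA0, tlv(0x02, struct.pack(">I", request_id))   # GetRequest-PDU
              + tlv(0x02, b"\x00") + tlv(0x02, b"\x00")        # error-status, error-index
              + tlv(0x30, varbind))                            # varbind list
    return tlv(0x30, tlv(0x02, b"\x01")                        # version 1 = v2c
               + tlv(0x04, community.encode("ascii")) + pdu)


def read_tlv(data: bytes, pos: int = 0) -> Tuple[int, bytes, int]:
    """Return (tag, value, next_position); handles long-form lengths too."""
    tag, length = data[pos], data[pos + 1]
    pos += 2
    if length & 0x80:
        n = length & 0x7F
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    return tag, data[pos:pos + length], pos + length


def parse_uptime(reply: bytes, request_id: int) -> Optional[int]:
    """Return the TimeTicks value from a matching GetResponse, else None."""
    try:
        tag, msg, _ = read_tlv(reply)
        if tag != 0x30:
            return None
        _, _, pos = read_tlv(msg)              # version
        _, _, pos = read_tlv(msg, pos)         # community
        tag, pdu, _ = read_tlv(msg, pos)
        if tag != 0xA2:                        # not a GetResponse-PDU
            return None
        _, rid, pos = read_tlv(pdu)
        _, err, pos = read_tlv(pdu, pos)       # error-status
        _, _, pos = read_tlv(pdu, pos)         # error-index
        if int.from_bytes(rid, "big") != request_id or err != b"\x00":
            return None
        _, varbinds, _ = read_tlv(pdu, pos)
        _, varbind, _ = read_tlv(varbinds)
        _, _, pos = read_tlv(varbind)          # echoed OID
        tag, value, _ = read_tlv(varbind, pos)
        return int.from_bytes(value, "big") if tag == 0x43 else None  # 0x43 = TimeTicks
    except IndexError:
        return None                            # truncated or garbage reply


def probe(host: str, community: str = "public", timeout_s: float = 2.0,
          request_id: int = 0x01020304, oid: bytes = SYS_UPTIME_OID) -> Optional[int]:
    """Send one GET to UDP/161; None means no usable SNMP answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        try:
            sock.sendto(build_get(community, request_id, oid), (host, 161))
            reply, _ = sock.recvfrom(4096)
        except OSError:                        # timeout, unreachable, DNS failure
            return None
    return parse_uptime(reply, request_id)
```

A `None` from `probe()` covers every failure case (nothing listening, wrong community, malformed reply), which is the distinction a bare port test cannot make. A production check would also want retries and a random request ID per call.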

0 Kudos

I usually have a couple of alerts.

In SolarWinds, monitor the 'Unknown' attribute over a specific period of time.

When SNMP polling is not working, you'll get a value of '-2' in CPULoad, TotalMemory, MemoryUsed, and PercentMemoryUsed.

(Same thing for Volumes / Interfaces, but I usually do a separate alert for those because the cause could be an index change or something else legitimate, like a cluster failover.)
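For anyone who wants to run the same check against exported data, here is a small Python sketch of the '-2' test. It assumes each sample is a dict keyed by the four column names above and that samples arrive oldest first; the function names and the three-sample window are made up for the example:

```python
from typing import Mapping, Sequence

# Columns named in the post; -2 is the value seen when SNMP polling fails.
SNMP_FIELDS = ("CPULoad", "TotalMemory", "MemoryUsed", "PercentMemoryUsed")
UNKNOWN = -2


def sample_is_unknown(sample: Mapping[str, float]) -> bool:
    """True when every SNMP-derived statistic in this sample is -2."""
    return all(sample.get(field) == UNKNOWN for field in SNMP_FIELDS)


def snmp_unknown_for(samples: Sequence[Mapping[str, float]],
                     min_consecutive: int = 3) -> bool:
    """True when the most recent min_consecutive samples are all unknown.

    Requiring several samples in a row stands in for "over a specific
    period of time" and avoids alerting on a single missed poll.
    """
    if min_consecutive < 1 or len(samples) < min_consecutive:
        return False
    return all(sample_is_unknown(s) for s in samples[-min_consecutive:])
```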

 

You can also run a report on a daily basis.  I usually run one to find Custom Pollers with a 'NULL' status.  (I have a lot of devices -- F5s, NetScreens, BlueCoats, etc. -- that are monitored exclusively by UnDPs.)

 

It's not perfect, but it does work.

 

-v
