Understanding SNMP Polling and Counters
Two of the things that I get asked about a lot are a) how often you should poll your devices and b) how SNMP counters work. Let's address the question about polling frequency first.
There are two primary types of polling: polling for status (up/down/warning) and polling for statistics (latency, traffic, errors, CPU, memory, etc.). There are many schools of thought on how often you need to poll for status. At the extremes, I've worked with customers who want everything from "once per day" to "real-time or every second". Across the industry, 5-minute polling for status is pretty standard, while some products like Orion use a default of 2 minutes.
When thinking about how often you need to poll for status, there are a few things you should know. First, a strong fault management strategy will include both status polling and monitoring for SNMP traps. Status polling gives you a guaranteed way of finding outages, but it doesn't happen in real time: an outage can go unnoticed for up to a full polling interval. Traps happen in real time, but you have no guarantee of receiving the trap, since traps are typically sent over UDP and are not acknowledged. By leveraging both technologies together you gain both speed and accuracy. Second, you need to understand that there are different types of status polling. Most polling for device status is done via ICMP (ping). Status for interfaces, CPUs, volumes, and other sub-elements is usually checked via SNMP, and status checking for applications may use SNMP, WMI, or the actual application protocol.
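To make the speed-versus-guarantee tradeoff concrete, here's a rough Python sketch. Everything in it is a simplifying assumption for illustration: the function name `detection_time`, polls landing at exact multiples of the interval, a poll noticing the outage immediately with no retries, and a trap either arriving after a fixed delay or being lost entirely.

```python
import math

def detection_time(outage_start, poll_interval, trap_delay=None):
    """Return the time (in seconds) at which an outage is first noticed.

    Assumes polls run at 0, poll_interval, 2 * poll_interval, ...
    trap_delay is how long the trap takes to arrive, or None if it is lost.
    """
    # The first poll at or after the outage is the guaranteed fallback.
    next_poll = math.ceil(outage_start / poll_interval) * poll_interval
    if trap_delay is None:
        return next_poll
    # If the trap arrives, whichever comes first wins.
    return min(next_poll, outage_start + trap_delay)

# Outage 130 seconds in, with 5-minute status polling.
print(detection_time(130, 300))                # trap lost: 300
print(detection_time(130, 300, trap_delay=1))  # trap received: 131
```

With the trap you know about the outage one second after it starts. Without it you still find out, just 170 seconds late.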
Polling for statistics is a little different. The first thing that you should know is that most of this type of data is exposed through two types of MIB object: gauges and counters. If you're polling a stat that is stored in a gauge, then you need to know what the gauge represents. For instance, most Cisco devices support three different MIB objects for CPU load: real-time, 1-minute average, and 5-minute average. So, if you're polling every 2 minutes, stick to the 1-minute average, but if you're polling every 9 minutes, go with the 5-minute average. Mature products like Orion figure this stuff out for you, but you can also manually force the behavior so that you can experience the different results.
The other common type of MIB object used for polling statistics is a counter. Using an SNMP counter to calculate a rate is much like using the odometer in your car to calculate gas mileage. For instance, if your tank is full and you drive 100 miles and then can add 10 more gallons to the tank to get it back to full, then you've gotten 10 miles to the gallon (not uncommon in a diesel-guzzling 4x4 behemoth like I drive). An example using SNMP counters would be the counter used to measure traffic on a network interface. It shows you a running total of the octets (bytes) of traffic that have gone in or out of the interface. So, if you poll the counter and find that the interface has sent 1,000 octets, and then poll it again 10 minutes later and find that it has now sent 10,000 octets, you know that in 10 minutes it sent 9,000 octets, which works out to 120 bps: (9000 * 8) / (10 * 60) = 120. Generally speaking, most people poll these statistics every 15 minutes, but poll critical connections every 1-2 minutes. Orion pulls this data every 9 minutes by default. Just as it's important to use status polling along with SNMP traps, it's important to use a technology like NetFlow along with polling for traffic rates.
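Here's a minimal Python sketch of the counter arithmetic above. The function name `rate_bps` and its parameters are made up for illustration and don't come from any product. The wrap handling goes beyond the example in the text: a 32-bit SNMP counter rolls over to zero once it passes 2^32 - 1, so the sketch subtracts modulo the counter size, which only works if the counter wrapped at most once between polls.

```python
def rate_bps(prev_octets, curr_octets, interval_seconds, counter_bits=32):
    """Average bits per second between two readings of an octet counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    modulus = 2 ** counter_bits
    # Modulo subtraction gives the right delta even if the counter
    # rolled over to zero once between the two polls.
    delta_octets = (curr_octets - prev_octets) % modulus
    # 8 bits per octet, averaged over the polling interval.
    return delta_octets * 8 / interval_seconds

# The example from the text: 1,000 octets, then 10,000 octets 10 minutes later.
print(rate_bps(1000, 10000, 10 * 60))  # 120.0
```

Keep in mind that this is an average over the whole interval, so a short burst inside those 10 minutes gets flattened out. That's one reason to poll critical connections more often.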
Anyhow, that's a really brief introduction to this subject. Ping me if you have questions.
Flame on...
Josh