After my post “Trapped! …but not in a bad way…”, I still seem to have traps on the brain. So here’s another trap-related feature introduced in NPM 10. The problem we were trying to solve with this feature is how to mark an interface as down as quickly as possible.
By default, NPM checks the status of interfaces every 120 seconds. The way NPM determines interface status is different from the way Orion determines node status (see “How Does Orion Mark a Node as Down?”). For interfaces, we check the status with an SNMP query of the appropriate MIB on the node. If the MIB says the interface is down, we mark it down; if the MIB says it’s up, we mark it up. If the whole node is down or otherwise unavailable, the query fails and we mark the interfaces on that node as unknown. Because polls happen every 120 seconds, an interface that goes down waits, on average, about a minute (and at most about two minutes) before it is marked down. In most cases that is more than soon enough. Still, there are times when you want the interface marked down immediately, and we introduced a relatively simple way to accomplish this.
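The polling logic above can be sketched in a few lines. This is a minimal illustration, not NPM’s actual code: it assumes the polled value is the IF-MIB `ifOperStatus` object (where 1 means up and 2 means down), models a failed query as `None`, and, as a simplification the post doesn’t cover, treats every other `ifOperStatus` value as unknown. The function names are hypothetical. The delay calculation assumes an interface is equally likely to fail at any moment between polls, so the expected wait is half the polling interval.

```python
from typing import Optional

POLL_INTERVAL_S = 120  # NPM's default interface status polling interval, per the post


def interface_status(if_oper_status: Optional[int]) -> str:
    """Map a polled ifOperStatus value to a displayed status (hypothetical helper).

    None stands for a failed SNMP query, e.g. because the whole node is down.
    """
    if if_oper_status is None:
        return "Unknown"  # query failed: node down or unreachable
    if if_oper_status == 1:  # IF-MIB up(1)
        return "Up"
    if if_oper_status == 2:  # IF-MIB down(2)
        return "Down"
    # Simplifying assumption: other ifOperStatus values are not covered by the post.
    return "Unknown"


def average_detection_delay(interval_s: float = POLL_INTERVAL_S) -> float:
    """Expected wait until the next poll if a failure is equally likely at any time between polls."""
    return interval_s / 2


print(interface_status(2))        # Down
print(interface_status(None))     # Unknown
print(average_detection_delay())  # 60.0
```

With the default interval, the expected wait works out to 120 / 2 = 60 seconds, which is where the “about a minute” figure comes from.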
The key scenario where two-minute polling isn’t good enough is when an interface is “flapping”, that is, alternating rapidly between up and down. A flapping interface can go down and come back up between two polls, so the poller never sees the outage. That’s where the new feature comes in: when an interface goes down, the node hosting that interface can send an SNMP trap reporting the change in status. In NPM 9.x (or even 8.x), you can create a trap rule that sends an email telling you the interface is down, but the status of the interface stays up (i.e., a happy green dot) until the next poll detects the change. For those times when you just can’t wait, we added a new alert action: in addition to sending the email, you can now change the status of the interface to down directly.
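To see how a flap can slip past the poller, here is a small sketch. It assumes, for illustration only, that polls fire exactly every 120 seconds starting at t = 0 (real poll timing will not be this regular), and the function name is hypothetical. An outage is visible to polling only if some poll time falls inside the window during which the interface is down.

```python
import math


def outage_seen_by_poll(down_at: float, up_at: float, interval_s: float = 120.0) -> bool:
    """Return True if any poll at t = 0, interval_s, 2 * interval_s, ... lands in [down_at, up_at).

    Assumes perfectly regular polls starting at t = 0 (an illustrative simplification).
    """
    # First poll at or after the moment the interface went down.
    next_poll = math.ceil(down_at / interval_s) * interval_s
    # The outage is seen only if that poll happens before the interface comes back up.
    return next_poll < up_at


print(outage_seen_by_poll(130, 170))  # False: the 40-second outage falls between the polls at 120 s and 240 s
print(outage_seen_by_poll(130, 300))  # True: the poll at 240 s catches it
```

Any outage shorter than the polling interval can be missed this way, which is why a trap sent at the moment of the change is the only reliable signal for a flapping interface.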
Note that this works only with nodes that send the standard MIB-2 linkDown trap (OID 1.3.6.1.6.3.1.1.5.3; generic trap type 2 in SNMPv1), which should be true of all MIB-2-compliant devices. When you set up the alert, you’ll need to create a trap rule that matches on the string “linkdown” in the Trap Details tab (see the post “Trapped! …but not in a bad way…” for how to do this). Once you’ve created the rule, add the new alert action: go to the Alert Actions tab, click “Add Action”, scroll to the bottom of the list, and select “Change the status of an interface” (it’s the last one in the list).
Click “OK” and you’ll be asked to choose what status you want to set. Most of the time it’ll be “down”, but feel free to go crazy with it.
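The trap rule’s job is simply to recognize a linkDown trap from its details text. The sketch below illustrates that idea only; it is not how Orion evaluates trap rules. It assumes a case-insensitive substring match (so “linkDown” matches the pattern “linkdown”), and both the function name and the sample trap text are made up for illustration; the real Trap Details format will differ.

```python
def matches_trap_rule(trap_details: str, pattern: str = "linkdown") -> bool:
    """Return True if the trap details text contains the pattern (hypothetical helper).

    Assumption: matching is a case-insensitive substring test.
    """
    return pattern.lower() in trap_details.lower()


# Made-up trap details text, for illustration only.
print(matches_trap_rule("Trap type: linkDown, interface index 3"))  # True
print(matches_trap_rule("Trap type: linkUp, interface index 3"))    # False
```

Matching on “linkdown” rather than just “link” matters here: a rule that also fired on linkUp traps would mark interfaces down exactly when they recover.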
And that’s it. In the bowels of Orion, the alert action updates the status field in the database for the target interface. On the next regularly scheduled poll of that interface, NPM queries it again and updates the status with what SNMP reports. If the interface is truly down, it will still be down when the SNMP poll checks. Of course, the interface may have gone down for just an instant, triggered the trap, and then come back up; in that case, the status goes back to up. Either way, you’ll have had a very quick notification of the status change.
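The interplay between the alert action and the next poll can be modeled as two writers to the same status field. This is a toy model under stated assumptions: the class and method names are hypothetical, and a dictionary stands in for the status column in the Orion database. The alert action writes the chosen status immediately, and the next scheduled poll overwrites it with whatever SNMP reports.

```python
from typing import Dict


class InterfaceStatusTable:
    """Toy stand-in for the interface status field in the Orion database (hypothetical)."""

    def __init__(self) -> None:
        self.status: Dict[int, str] = {}

    def alert_action_set_status(self, interface_id: int, new_status: str) -> None:
        # "Change the status of an interface": write the chosen status right away.
        self.status[interface_id] = new_status

    def scheduled_poll(self, interface_id: int, snmp_status: str) -> None:
        # The next regular poll replaces whatever the alert action wrote.
        self.status[interface_id] = snmp_status


table = InterfaceStatusTable()
table.status[7] = "Up"
table.alert_action_set_status(7, "Down")  # linkDown trap arrives: marked down at once
print(table.status[7])                    # Down
table.scheduled_poll(7, "Up")             # the interface had already recovered
print(table.status[7])                    # Up
```

Because the poll always has the last word, a status set by the alert action can never stick if the interface is actually up; the worst case is a brief, accurate-at-the-time “down” that corrects itself on the next poll.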