As of now, without some complicated SQL/alerting hackery, you're stuck with building two alerts per spoke. One alert is for the router that provides access to the site and the second alert is for the switches. You should configure the router alert to go off all the time, but configure the switch alert with a suppression that follows the logic of "If router X status is Down, don't alert".
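Here is a rough sketch of the two-alert pattern described above, in case it helps to see the logic spelled out. This is plain Python, not NPM's alert engine or data model. The node names (`router-x`, `switch-1`, `switch-2`), the `statuses` snapshot, and the `upstream` dependency map are all hypothetical and only for illustration.

```python
from typing import Dict

# Hypothetical snapshot of polled statuses (names and values are illustrative).
statuses: Dict[str, str] = {
    "router-x": "Down",
    "switch-1": "Down",
    "switch-2": "Up",
}

# Hypothetical dependency map: each spoke switch sits behind router-x.
upstream: Dict[str, str] = {"switch-1": "router-x", "switch-2": "router-x"}


def router_alert(statuses: Dict[str, str], router: str) -> bool:
    """Alert 1: fires whenever the router is Down, no suppression."""
    return statuses.get(router) == "Down"


def switch_alert(statuses: Dict[str, str], upstream: Dict[str, str], switch: str) -> bool:
    """Alert 2: fires for a Down switch unless its upstream router is Down."""
    if statuses.get(switch) != "Down":
        return False
    parent = upstream.get(switch)
    # Suppression: "If router X status is Down, don't alert".
    if parent is not None and statuses.get(parent) == "Down":
        return False
    return True


for name in statuses:
    if name in upstream:
        fired = switch_alert(statuses, upstream, name)
    else:
        fired = router_alert(statuses, name)
    print(name, "alert" if fired else "quiet")
```

With the router Down, only the router alert fires and the Down switch behind it stays quiet; if the router were Up, the same Down switch would alert on its own.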
That said, I've never seen a router that's dropping pings go into "Unknown" status - it always goes into "Warning", then "Down". About the only time I've seen things go into "Unknown" is when an interface is not responding to SNMP (or the node itself is no longer responding to SNMP). There are other reasons for "Unknown", but that's the majority of what I come up against.
Thank you for the quick response!
Since Elizabeth is also monitoring this thread: is functionality being built to address this suppression need? Having seen and used similar tools (I came from the OpenView world), I understand how difficult this is to develop, but I also understand the need for it. If there is anything I can do to help develop such functionality (beta testing, discussions on what I would like to see, etc.), I'm all in... :)
As for the "Unknown" status: the router is not reachable at all (neither ICMP nor SNMP) because of a data T1 failure. The serial interface goes down, the router becomes unreachable, and it shows as Down; then, on the second poll after the outage, it changes to Unknown. The switches subsequently follow the same pattern.
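To illustrate why that Down-then-Unknown sequence matters for a suppression that only checks "router is Down", here is a small hypothetical sketch in plain Python (not NPM's engine). The poll sequence is made up, assuming the switches lag the router by one poll, and the `suppress_on` parameter is a hypothetical knob for comparison, not an NPM setting.

```python
from typing import Set


def switch_alert(switch_status: str, router_status: str, suppress_on: Set[str]) -> bool:
    """Fire for a Down switch unless the router's status is in suppress_on."""
    if switch_status != "Down":
        return False
    return router_status not in suppress_on


# Hypothetical (router status, switch status) per poll after the T1 fails.
# The one-poll lag between router and switches is an assumption.
polls = [("Down", "Down"), ("Unknown", "Down"), ("Unknown", "Unknown")]

down_only = {"Down"}                  # "If router X status is Down, don't alert"
down_or_unknown = {"Down", "Unknown"}  # hypothetical wider condition

for i, (router, switch) in enumerate(polls, start=1):
    print(i,
          switch_alert(switch, router, down_only),
          switch_alert(switch, router, down_or_unknown))
```

In this made-up timeline, the Down-only condition lets the switch alert through on poll 2, once the router has flipped to Unknown while the switch is still Down.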
Am I monitoring these devices incorrectly? As stated above, I'm not new to the monitoring game, but that doesn't mean I'm doing it correctly. I just need NPM to provide value to our team. It is doing so, but not yet to the level management expects.