We are building a tiered alerting system, and I would like some help with setting it up, timing it, and testing it.
What this means for us is that we identify the failure points and set up alerts so that, for instance, if the core switch is down, we are not alerted on the nodes connected to or dependent on that switch. The same goes for a WAN router in a satellite office: if it is down, we really don't need to know that it can't talk to the servers there.
The problem with this is that our setup has become quite complex. We have two core switches, for instance, so my alert suppressions all look something like this:
Suppress Alert if ANY are true
  Suppress Alert if ALL are true
    Suppress Alert if ALL are true
      Suppress Alert if ANY are true: Node Status = DOWN or UNKNOWN
      Node Name = CORESWITCHA
    Suppress Alert if ALL are true
      Suppress Alert if ANY are true: Node Status = DOWN or UNKNOWN
      Node Name = CORESWITCHB
  Suppress Alert if ALL are true
    Suppress Alert if ANY are true: Node Status = DOWN or UNKNOWN
    Node Name = WANROUTER
If I am doing this right, that suppresses the alert if BOTH core switches are down OR if the WANROUTER is down. Because there are two core switches, I can't just assign a numeric code to a TIER value (say 1 for the switch and 2 for the router) and set up an alert that says: if TIER <= 2 and SITE = DATACENTER and STATUS = DOWN or UNKNOWN, then SUPPRESS the alert. Is there a better way of doing this?
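For reference, the intended suppression logic described above can be written out as a small Python sketch. This is only a model of the boolean condition, not Orion syntax or an Orion API; the function names and the `statuses` dictionary are made up for illustration. Feeding it invented node statuses is also a cheap way to check on paper which scenarios would suppress the alert.

```python
# Hypothetical model of the suppression condition; not Orion syntax or API.
DOWN_STATES = {"DOWN", "UNKNOWN"}

def is_down(statuses, node_name):
    """True if the named node is reported DOWN or UNKNOWN.

    A node missing from `statuses` is treated as not down (an assumption).
    """
    return statuses.get(node_name) in DOWN_STATES

def suppress_alert(statuses):
    """Suppress if BOTH core switches are down, OR the WAN router is down."""
    both_cores_down = (is_down(statuses, "CORESWITCHA")
                       and is_down(statuses, "CORESWITCHB"))
    wan_router_down = is_down(statuses, "WANROUTER")
    return both_cores_down or wan_router_down

# Invented scenarios to walk through the logic.
scenarios = {
    "one core switch down": {"CORESWITCHA": "DOWN", "CORESWITCHB": "UP", "WANROUTER": "UP"},
    "both core switches down": {"CORESWITCHA": "DOWN", "CORESWITCHB": "UNKNOWN", "WANROUTER": "UP"},
    "WAN router down": {"CORESWITCHA": "UP", "CORESWITCHB": "UP", "WANROUTER": "DOWN"},
}

for label, statuses in scenarios.items():
    print(f"{label}: suppress = {suppress_alert(statuses)}")
```

With a single core switch down the alert is not suppressed, which is the behavior a simple TIER number cannot express on its own.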
Also, when you set up an alert to check every 2 minutes but only fire if the node has been down for 5 minutes, does it check every 2 minutes until the failure and then continuously for the next 5 minutes, or does it ALWAYS check every 2 minutes and only alert after the 3rd failure?
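To put numbers on the second interpretation, here is a minimal arithmetic sketch. It assumes the alert is evaluated only at poll times and that the 5 minutes are counted from the first failed poll; whether Orion actually behaves this way is exactly the open question, and `polls_until_alert` is a hypothetical helper, not an Orion function.

```python
import math

def polls_until_alert(poll_interval_min, required_down_min):
    """Return (failed_poll_count, minutes_since_first_failed_poll) at firing.

    Assumes fixed-interval polling and that the down duration is measured
    from the first failed poll (both are assumptions, not Orion facts).
    """
    # Polls needed after the first failure to reach the required duration.
    extra_polls = math.ceil(required_down_min / poll_interval_min)
    return extra_polls + 1, extra_polls * poll_interval_min

count, elapsed = polls_until_alert(2, 5)
print(f"Alert fires on failed poll #{count}, {elapsed} minutes after the first failure")
```

Under these assumptions, a 2-minute poll with a 5-minute threshold would fire on the 4th consecutive failed poll (6 minutes in), since the 3rd failed poll lands at only 4 minutes.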
On testing alerts: I know we have a way to do so, but that just test-fires the alert. Is there a way to simulate the failure of a node and see how the alerts would come out?
I've become THE Orion guy as we revamp our setup, so I can use and will appreciate any help I can get: advice, concerns, whatever.
Thanks!
Roger