Level 7

Orion alerts: Setup, testing and timing

We are building a tiered alerting system, and I would like some help with setting it up, timing it, and testing it.

What this means for us is that we identify the failure points and set up alerts so that when, for instance, the core switch is down, Orion does not alert on the nodes connected to or dependent on that switch. The same goes for a WAN router in a satellite office: if it is down, we really don't need to be told that we can't talk to the servers there.

The problem is that our setup has become quite complex. We have two core switches, for instance, so my alert suppressions all look something like this:

Suppress Alert if ANY are true
    Suppress Alert if ALL are true
        Suppress Alert if ALL are true
            Suppress Alert if ANY are true: Node Status = DOWN or UNKNOWN
            Node Name = CORESWITCHA
        Suppress Alert if ALL are true
            Suppress Alert if ANY are true: Node Status = DOWN or UNKNOWN
            Node Name = CORESWITCHB
    Suppress Alert if ALL are true
        Suppress Alert if ANY are true: Node Status = DOWN or UNKNOWN
        Node Name = WANROUTER

 

If I am doing this right, that suppresses the alert if BOTH core switches are down OR if the WANROUTER is down. Because there are two core switches, I can't just assign a numeric code to a TIER value (say 1 for the switch and 2 for the router) and set up an alert that says: if TIER <= 2 and SITE = DATACENTER and STATUS = DOWN or UNKNOWN, then SUPPRESS the alert. Is there a better way of doing this?
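For reference, the intent described above ("both core switches down, or the WAN router down") can be written as a plain boolean expression over a map of node statuses. This is only a sketch of the intended logic, not of how Orion evaluates suppression conditions; the function names and the `status` dictionary are hypothetical.

```python
# States that count as "not reachable" for suppression purposes.
DOWN_STATES = {"DOWN", "UNKNOWN"}


def is_down(status, node):
    """True if the named node is DOWN or UNKNOWN in the status map.

    A node missing from the map is treated as not down (assumption).
    """
    return status.get(node) in DOWN_STATES


def should_suppress(status):
    """Suppress if BOTH core switches are down OR the WAN router is down."""
    both_cores_down = is_down(status, "CORESWITCHA") and is_down(status, "CORESWITCHB")
    wan_router_down = is_down(status, "WANROUTER")
    return both_cores_down or wan_router_down
```

Note that the expression needs the status of all three devices at once. With one core switch down and the other up, it returns False, which is the behavior a redundant pair calls for.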

Also, when you set up an alert to check every 2 minutes but fire only if the node has been down for 5 minutes, does it check every two minutes until the failure and then continuously for the next five minutes, or does it ALWAYS check every 2 minutes and alert only after the 3rd failure?

On testing alerts, I know there is a way to do so, but that just test-fires the alert. Is there a way to simulate a failure of a node and see how the alerts would come out?

I've become THE Orion guy as we revamp our setup, so I would appreciate any help I can get: advice, concerns, whatever.

Thanks!

Roger 

 

 



 

0 Kudos
4 Replies
Level 13


That is never going to work.


 


 


JB

0 Kudos
Level 15

If I am doing this right, that suppresses the alert if BOTH core switches are down OR if the WANROUTER is down. Because there are two core switches, I can't just assign a numeric code to a TIER value (say 1 for the switch and 2 for the router) and set up an alert that says: if TIER <= 2 and SITE = DATACENTER and STATUS = DOWN or UNKNOWN, then SUPPRESS the alert. Is there a better way of doing this?

Oh boy. Yeah, giving Orion some level of intelligence about the network infrastructure has been a long-requested feature. SolarWinds seems to indicate that it will be a feature in a future revision of Orion. It is unknown whether it will be available this year or next.

Also, when you set up an alert to check every 2 minutes but fire only if the node has been down for 5 minutes, does it check every two minutes until the failure and then continuously for the next five minutes, or does it ALWAYS check every 2 minutes and alert only after the 3rd failure?

As I understand it, Orion will check the status again (not continuously) after five minutes. If the node is still down at that point, the alert will fire.
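To make the difference between the two readings concrete, here is a small arithmetic sketch. It models neither reading as confirmed Orion behavior; both functions and their parameter names are hypothetical, and times are in minutes from the poll that first saw the node down.

```python
import math


def fire_time_fixed_polling(first_failed_poll, poll_interval, hold_time):
    """Reading 1: status is sampled only at the fixed poll interval.

    The alert can fire no earlier than the first poll at which the node
    has been seen down for at least hold_time.
    """
    polls_needed = math.ceil(hold_time / poll_interval)
    return first_failed_poll + polls_needed * poll_interval


def fire_time_recheck(first_failed_poll, hold_time):
    """Reading 2 (the one described above): one re-check after hold_time."""
    return first_failed_poll + hold_time


# 2-minute polling with a 5-minute hold, node stays down the whole time.
print(fire_time_fixed_polling(0, 2, 5))  # 6
print(fire_time_recheck(0, 5))           # 5
```

Under the first reading the alert would fire at the 6-minute poll, since the 4-minute poll is still short of five minutes. Under the second it would fire at 5 minutes.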

On testing alerts, I know there is a way to do so, but that just test-fires the alert. Is there a way to simulate a failure of a node and see how the alerts would come out?

None that I know of, unless you actively deny ICMP packets to that node as part of the simulation.
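Short of that, one option is to check the suppression design on paper before touching Orion. The sketch below is an offline dry run and does not talk to Orion at all; the `UPSTREAM` map, the node names SERVER1 and BRANCHSRV, and the `dry_run` function are all hypothetical. Each node lists groups of upstream devices, and a down node is treated as suppressed when every device in any one group is also down.

```python
DOWN_STATES = {"DOWN", "UNKNOWN"}

# Hypothetical dependency map: node -> list of upstream groups.
# A group of two models a redundant pair; both must be down to suppress.
UPSTREAM = {
    "SERVER1": [["CORESWITCHA", "CORESWITCHB"]],
    "BRANCHSRV": [["WANROUTER"]],
}


def dry_run(status, upstream):
    """Return {node: "alert" | "suppressed"} for every node that is down."""
    report = {}
    for node, state in status.items():
        if state not in DOWN_STATES:
            continue  # healthy nodes produce no alert at all
        groups = upstream.get(node, [])
        suppressed = any(
            group and all(status.get(parent) in DOWN_STATES for parent in group)
            for group in groups
        )
        report[node] = "suppressed" if suppressed else "alert"
    return report


simulated = {
    "CORESWITCHA": "DOWN",
    "CORESWITCHB": "DOWN",
    "WANROUTER": "UP",
    "SERVER1": "UNKNOWN",
    "BRANCHSRV": "UP",
}
print(dry_run(simulated, UPSTREAM))
```

With both core switches marked down, the two switches show as "alert" and SERVER1 shows as "suppressed". Setting only one switch to DOWN turns SERVER1 back into an "alert".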
0 Kudos
Level 8

I would suggest contacting support and requesting an additional license for a QA environment. I test everything by mocking it up in my lab. In a lab you can create failures by modifying your IP address.

0 Kudos

 I would if I could, but we do not have the budget for it.

0 Kudos