Hello all. I've been tasked by my manager to create alerts based on certain conditions for our network & devices. However, his request on how the alerts should notify our staff is somewhat different than how I've seen alerts configured in the past.
First, there are two groups involved. They are the IT Group and the Technicians Group. He wants alerts to notify a certain group when a condition arises... so let's get to it.
Let's say an office has router X that supplies network connectivity for servers A, B, & C. Router X is the responsibility of the IT Group. Servers A, B, & C are cared for by the Technicians Group. My manager's request is that when server A, B, or C goes down, the alert needs to check whether router X is up. If router X is up, then send an e-mail notification to the Technicians Group. If router X is down when those servers are down, then the notification should go to the IT Group instead. The reasoning is that he doesn't want a notification for each device that is offline. If the router goes down, then he already knows that everything behind it is down.
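To make the requested logic concrete, here's a rough sketch in Python. This is not SolarWinds syntax, just the decision I'm trying to reproduce; the function name `pick_recipient` and the constants are made up for illustration.

```python
from typing import Optional

# Group names as described in this post; the constants are illustrative only.
IT_GROUP = "IT Group"
TECH_GROUP = "Technicians Group"


def pick_recipient(server_up: bool, router_up: bool) -> Optional[str]:
    """Return the group that should be e-mailed, or None if no alert is needed."""
    if server_up:
        # The server is fine, so nothing is sent.
        return None
    # Server is down: the router's state decides who hears about it.
    return TECH_GROUP if router_up else IT_GROUP


if __name__ == "__main__":
    print(pick_recipient(server_up=False, router_up=True))
    print(pick_recipient(server_up=False, router_up=False))
```

So the same "server down" condition has two possible recipients, and which one applies depends on a different node (the router) than the one that triggered the alert.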
Now I've spent hours on this and I don't think there is a way to do all that within a single alert... especially a condition to notify specific groups. So I assume that you need to do it with 2 or more alerts. I initially tried using an Advanced Alert for the servers, but found that if you specify several NodeNames within the alert, it won't trigger. So I added a custom property and placed all the respective servers within that group, and that worked. However, when I added a suppression that checks whether router X is up/down, it didn't work. This is what my Advanced Alert looks like:
Trigger Condition (Node) - Trigger Alert when all of the following apply:
Node Status is equal to Down
Comments is equal to Group1
Reset Condition - Reset Alert when all of the following apply:
Node Status is equal to Up
Alert Suppression - Suppress Alert when all of the following apply:
Node Status is equal to Up
Node Name is equal to RouterX
I'm finding in my tests that when the Alert Suppression is written as above, the alert no longer triggers at all. I'm not sure how else to configure the suppression. If it did work, I could just configure two rules and be done with it.
So then I tried Basic Alerts, where you select the condition of Node Status Up/Down & then select your nodes (servers A, B, C). I added the suppression "Suppress Alerts when Status of Router X is Up". This works in that it won't send a notification to IT, but in that case it needs to send an e-mail to the Technicians Group instead... and because of the suppression, it never sends a notification at all. On top of that, I can't figure out a way to monitor the status of servers A, B, C & send a notification to the IT Group if Router X is down as well.
I'm thinking that I'm going to have to configure several alerts for each device/device group just to accomplish what he wants... but that is a huge hassle because we have at least 100 or so devices we're monitoring in Solarwinds just for my office. Then he wants me to set up alerts for the 200+ devices that belong to another IT department in another state. So that means at least 1000 or so alerts will need to be created for his conditions.
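What I'd ideally like is one rule driven by a "which router is this device behind" mapping, rather than one alert per device. Here's a rough Python sketch of that idea. Again, this is not anything SolarWinds provides as far as I know; `build_notifications`, the `parent` mapping, and the message text are all hypothetical. It also assumes a single message to IT per down router is enough, which is my reading of the request.

```python
from typing import Dict, List, Tuple


def build_notifications(
    status: Dict[str, bool],   # node name -> True if up (hypothetical input)
    parent: Dict[str, str],    # server name -> upstream router (hypothetical input)
) -> List[Tuple[str, str]]:
    """Return (group, message) pairs, collapsing outages behind a down router."""
    notifications: List[Tuple[str, str]] = []

    # One message to IT per down router, no matter how many servers sit behind it.
    down_routers = sorted({r for r in parent.values() if not status[r]})
    for router in down_routers:
        notifications.append(("IT Group", f"{router} is down"))

    # Servers that are down while their router is up go to the Technicians.
    for server in sorted(parent):
        if not status[server] and status[parent[server]]:
            notifications.append(("Technicians Group", f"{server} is down"))

    return notifications


if __name__ == "__main__":
    parent = {"A": "RouterX", "B": "RouterX", "C": "RouterX"}
    status = {"RouterX": True, "A": False, "B": True, "C": True}
    for group, message in build_notifications(status, parent):
        print(group, "<-", message)
```

With something like this, adding a device would only mean adding one entry to the mapping instead of creating more alerts.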
Does anyone have any suggestions?