So maybe I'm doing something wrong, but this seems much harder than it should be. Here is what I'm trying to accomplish:
I have a group of applications that are maintained by a specific department, and they want to know when these applications have problems. My department wants to know about basic node health: hardware health status, drive space, CPU usage, etc.
So I made an alert for them and an alert for us. The "us" alert basically says: "Tell us about every problem that happens on every component. Depending on the time of day, email this address."
The "them" alert has the following logic.
Trigger when any of:
    node name = server1
    node name = server2
    node name = server3
    node name = server5
Trigger when any of:
    application component is down
    application component is critical
    application component is warning
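To make the two condition groups explicit, here is a minimal Python model of the "them" alert. It is not the monitoring tool's API; the names `WATCHED_NODES`, `ALERT_STATES`, and `them_alert_fires` are hypothetical, and it assumes both groups must be true (node is in the list AND the component is in a bad state) for the alert to fire.

```python
from dataclasses import dataclass

# Hypothetical model of the "them" alert; not the monitoring tool's API.
WATCHED_NODES = {"server1", "server2", "server3", "server5"}
ALERT_STATES = {"down", "critical", "warning"}


@dataclass
class ComponentStatus:
    node_name: str
    status: str  # e.g. "up", "warning", "critical", "down"


def them_alert_fires(c: ComponentStatus) -> bool:
    # Group 1: the node name matches any of the listed servers.
    # Group 2: the component status matches any of the listed states.
    # Assumption: both groups must match for the alert to fire.
    return c.node_name in WATCHED_NODES and c.status in ALERT_STATES
```

With this model, a down component on server2 fires the alert, while the same failure on an unlisted node, or a healthy component on server1, does not.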
This is all well and good, except that server2, server3, and server5 have scripts that shut the applications down during a backup window. It turns out that no one wants to be alerted that their application is down during a maintenance window at 4am. Crazy, right?
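The behavior I'm after can be stated as a single rule: the same conditions as before, except that a node with a backup window is ignored while that window is open. The Python below is only a model of that desired logic, not a feature or setting of the monitoring tool. The `MAINTENANCE_WINDOWS` mapping and the 03:30 to 04:30 times are hypothetical (all I know is that the backups run around 4am), and server1 deliberately has no window.

```python
from datetime import time

# Hypothetical model of the desired behavior; not the monitoring tool's API.
WATCHED_NODES = {"server1", "server2", "server3", "server5"}
ALERT_STATES = {"down", "critical", "warning"}

# Hypothetical per-node backup windows (start, end); server1 has none.
MAINTENANCE_WINDOWS = {
    "server2": (time(3, 30), time(4, 30)),
    "server3": (time(3, 30), time(4, 30)),
    "server5": (time(3, 30), time(4, 30)),
}


def in_window(now: time, start: time, end: time) -> bool:
    # Half-open interval [start, end); also handles windows that wrap past midnight.
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def them_alert_fires(node_name: str, status: str, now: time) -> bool:
    # Same two condition groups as the original alert.
    if node_name not in WATCHED_NODES or status not in ALERT_STATES:
        return False
    # Suppress only for nodes that have a window, and only while it is open.
    window = MAINTENANCE_WINDOWS.get(node_name)
    if window is not None and in_window(now, *window):
        return False
    return True
```

Under this model, server2 going down at 04:00 is suppressed, while server1 going down at 04:00, or server2 at noon, still alerts, all from one rule rather than two copies of the alert.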
So do I need to copy the existing alert, remove server1 from the copy and set a time-of-day restriction on it, then remove server2, server3, and server5 from the original alert? That seems overly complicated to me. There has to be a better way to do this. Why is this so complicated?