Grouping applications and adding exceptions

Question

So maybe I'm doing something wrong, but this seems much harder than it should be. Here is what I'm trying to accomplish:

I have a group of applications that are maintained by a specific department. They want to know when these applications have problems. My department wants to know basic node health like hardware health status, drive space, CPU usage, etc.

So I made alerts for them and an alert for us. The "us" alert basically says "tell us about every problem that happens on every component. Depending on the time of day, email this address."

The "them" alert has the following logic.

Trigger when any

node name = server1

node name = server2

node name = server3

node name = server5

Trigger when any

application component is down

application component is critical

application component is warning
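
Written out as a boolean expression, the trigger above looks roughly like the sketch below. This is only an illustration of the logic, assuming the two "any" groups are combined with AND; the names `WATCHED_NODES`, `ALERT_STATUSES`, and `them_alert_triggers` are made up for the example and are not anything Orion exposes.

```python
# Hypothetical names: this mirrors the trigger logic described above,
# it is not how Orion stores or evaluates alerts.
WATCHED_NODES = {"server1", "server2", "server3", "server5"}
ALERT_STATUSES = {"down", "critical", "warning"}


def them_alert_triggers(node_name: str, component_status: str) -> bool:
    """Return True when the node is in the group AND the component is unhealthy."""
    in_group = node_name in WATCHED_NODES                      # first "any" block
    unhealthy = component_status.lower() in ALERT_STATUSES     # second "any" block
    return in_group and unhealthy
```

For example, `them_alert_triggers("server2", "Down")` is true, while a healthy component or a node outside the group does not trigger.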

So this is all well and good, except server2, server3, and server5 have scripts that shut the applications down during a backup window. It turns out that no one wants to be alerted if their machine is down during a maintenance window at 4am. Crazy, right?

So do I need to copy the existing alert, remove server1 and set the time of day in the new alert, then remove server2, server3, and server5 from the old alert? This seems overly complicated to me. There has to be a better way to do this. Why is this so complicated?

mjbehrendt · Answer

This sounds along the lines of what I'm looking for. I didn't know this tool existed. Now off to its manual to figure out how it works...

mjbehrendt · Answer

The unmanage scheduler tool sounds like what I'm looking for. The application shuts down for about 3 hours each night for backups.

I am really complaining about dividing it into two alerts. It introduces an extra level of complexity that isn't necessary, in my opinion. In 3-6 months, when someone wants to change the formatting of an alert email, add something to a group, change something around, etc., I have to remember to change both alerts.

aLTeReGo · Answer

If you don't want to be alerted during a maintenance window, overnight, or whenever, then I recommend unmanaging the node. You can do this in a one-off fashion through the web interface by clicking the "unmanage" button and entering when you want the node or application unmanaged and remanaged. You can also do this for multiple nodes and/or applications on a recurring basis automatically using the "Unmanage Scheduling Utility" on the Orion server. It can be found under [Start -> SolarWinds Orion -> Advanced Features].

Nodes and applications that are unmanaged will not be alerted upon, so if your trigger condition works as-is, with the notable exception being the maintenance windows, then using the Unmanage Scheduling Utility seems like just what you're looking for.
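
To make the effect concrete, here is a minimal Python sketch of how an unmanage schedule sits in front of a single, unchanged alert. It is an illustration only, not Orion's implementation: the function names are invented, and the 03:00 to 06:00 window is an assumed stand-in for the roughly three-hour nightly backup mentioned above.

```python
from datetime import datetime, time

# Assumptions for illustration: the window times and all names are hypothetical.
BACKUP_NODES = {"server2", "server3", "server5"}
WINDOW_START = time(3, 0)   # assumed start of the nightly backup window
WINDOW_END = time(6, 0)     # assumed end, about three hours later


def is_unmanaged(node_name: str, now: datetime) -> bool:
    """A node counts as unmanaged only while its recurring window is open."""
    return node_name in BACKUP_NODES and WINDOW_START <= now.time() < WINDOW_END


def should_alert(node_name: str, condition_met: bool, now: datetime) -> bool:
    """The one alert stays as-is; unmanaged nodes are simply skipped."""
    return condition_met and not is_unmanaged(node_name, now)
```

With this shape, server1 still alerts at 4am, server2 stays quiet at 4am but alerts at noon, and there is only one alert definition to maintain.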