I am looking for anyone who has experience in setting up a custom alert for groups.
We created the group and use a custom attribute in the trigger condition.
The group's members are all of the APM, WPM, Windows Servers, and SQL Servers that make up a critical application.
The logic we need is as follows:
If Server X is down - Group is Down
If both Server Y and Z are down - Group is Down
If APM component A and WPM A are both down for X minutes - Group is Down.
Our goal is a customized trigger condition that reflects the nature of our application infrastructure, fail-overs, and redundancies, so that the Group is Down alert triggers only when the application is actually down, and not when a single component is 'critical' without being 'down'.
Thank you for any info or examples.
The only part you can't really do in Orion is the X minutes part. Groups automatically build their status from the status of all their direct members. You can wait 10 minutes before triggering your alert logic, but the group itself will change status pretty much right away once a child's status changes.
So the tree could possibly look something like this:
MyApplication - Parent group folder, set to worst status so if any of these are down, mark the whole thing down
    MyApplicationRedundantServerGroup - child group with status set to best (group stays green as long as any child is up)
    MyApplicationAGroup - child group with status set to best (group stays green as long as any child is up)
        Application A (I can't remember off the top of my head whether you can add components directly. If you can, just use the component; if not, make a template containing only that one component.)
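To sanity-check that a tree like this gives the behavior asked for, here is a rough Python model of the worst/best rollup. It is only a sketch and not Orion code: the `Member` and `Group` classes are made up for illustration, statuses are simplified to Up/Down, and the placement of Server X directly under the parent, Servers Y and Z in the redundant group, and APM component A and WPM A in the A group is an assumption based on the original question. The X minutes delay is not modeled here; that would sit in the alert's own trigger delay.

```python
from dataclasses import dataclass, field
from typing import List, Union

UP, DOWN = "Up", "Down"


@dataclass
class Member:
    """A monitored object (node, application, transaction). Hypothetical model."""
    name: str
    status: str = UP


@dataclass
class Group:
    """A group whose status is rolled up from its direct members only."""
    name: str
    rollup: str  # "worst" or "best"
    members: List[Union["Group", Member]] = field(default_factory=list)

    @property
    def status(self) -> str:
        statuses = [m.status for m in self.members]
        if self.rollup == "worst":
            # Worst status: any member down marks the group down.
            return DOWN if DOWN in statuses else UP
        # Best status: the group stays up as long as any member is up.
        return UP if UP in statuses else DOWN


server_x = Member("Server X")
server_y = Member("Server Y")
server_z = Member("Server Z")
apm_a = Member("APM component A")
wpm_a = Member("WPM A")

app = Group("MyApplication", "worst", [
    server_x,
    Group("MyApplicationRedundantServerGroup", "best", [server_y, server_z]),
    Group("MyApplicationAGroup", "best", [apm_a, wpm_a]),
])

print(app.status)        # Up: everything is up
server_y.status = DOWN
print(app.status)        # Up: Server Z still covers the redundant pair
server_z.status = DOWN
print(app.status)        # Down: both Y and Z are down
```

The parent only ever looks at its three direct children, which is why the redundant pair has to live in its own child group rather than directly under MyApplication.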
Thank you, I'll work on this and see if we are able to obtain the results and behavior we are looking for.
I will let you know how it goes and mark the answer accordingly.
One major point I see people get wrong is that groups only inherit up/down status, so something going critical will not impact the group statuses at all. Just something to keep in mind as you build out your monitors.
Yes, I agree, and that won't be an issue for this situation. We already have good monitoring on all of those individual components. Our goal was to use a Group is Down / Group is Up alert that would trigger a Slack message to let people know that an entire critical application is down. Anything else regarding the details and components that may be in critical or warning is handled by my team. So this is purely for automated messaging that the application is either up or down.