What I want to do is:
- Create groups (e.g. Database Administration) and add nodes as applicable
- Attach application monitors as applicable to the nodes in the group (e.g. SQL / DB2)
- Trigger an alert that emails the team responsible for the group when any monitored component or application goes critical or down, including node down. The goal is to have one all-encompassing alert defined for each group and call it a day.
I'd like to use this approach to streamline administration, using template overrides on individual nodes where necessary. It seems I can accomplish the above with the ${GroupStatusRootCause} variable. However, when a node in my test group (Database Administration) has an application monitor that is down (i.e. service down) or critical (e.g. CPU > 90%), the group still shows a status of "Up" (green). This in turn renders the "Group Status is equal to Critical" and "Group Status is equal to Down" triggers useless. Am I misunderstanding how Group Status is calculated, or do I have a setting configured incorrectly?
In addition, I'm pretty new to this and my alert creation skills are lacking. Shouldn't something like the attached be sufficient?
Thanks in advance!
Orion Core 2012.1.0, SAM 5.0.1, IPAM 3.0, NPM 10.3