Monitoring Clusters and Alerts

Hey Everyone,


I'm looking for some feedback on how people have tackled monitoring HA clusters, for example Exchange.  To be clear about the terminology I use, I'll briefly explain.  Exchange provides a service, in this case email.  If I have two Exchange servers for HA, say EXCH01 and EXCH02, we can afford to lose EXCH01 without the email service being affected, because clients will use EXCH02.

What I am trying to achieve is a Serious alert if either EXCH01 or EXCH02 goes down on its own, but a Critical alert if both EXCH01 and EXCH02 are down at the same time, because that is the point at which the email service to the business is affected.  Do I simply monitor the VIP as Critical and EXCH01 and EXCH02 as Serious?  Or, since I have multiple clusters, is there a clever way to achieve this with custom attributes and dynamic queries?

Do people do this, and if so, how?  Is there a best practice to follow for this type of setup?  We have multiple clusters for multiple services, such as Exchange and SQL, so I was hoping to achieve the above with as few alert definitions as possible.

Any feedback would be appreciated.  

Thank you

For starters, I would monitor every node individually (both nodes in a two-node cluster, all of them if there are more... 🙂). On the nodes I monitor the resources that "don't move", such as local disks.

Then I monitor the Exchange cluster VIP for the services and resources that move from node to node.


Whether to raise critical alerts on all three parts (2 nodes and 1 VIP) or to split them into serious vs. critical is up to your organization and its needs. Some would say the loss of one node is already critical, as we really don't want downtime on that system.

In your case I would probably say serious on the 2 nodes and critical on the VIP.
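
If it helps to see the rule written out, here is a rough sketch of the "serious on the nodes, critical on the VIP" logic in Python. It is not tied to any monitoring product; the `Severity` enum, the `cluster_severity` function, and the up/down inputs are all made up for illustration, and in a real tool you would express the same thing with its own alert conditions.

```python
from enum import Enum
from typing import Mapping


class Severity(Enum):
    OK = 0
    SERIOUS = 1
    CRITICAL = 2


def cluster_severity(node_up: Mapping[str, bool], vip_up: bool) -> Severity:
    """Return the alert severity for one cluster (hypothetical helper).

    node_up maps node name -> True if the node responds.
    vip_up is True if the cluster VIP (the service itself) responds.
    """
    # The service is affected if the VIP is down or no node is left.
    # An empty node list is also treated as critical.
    if not vip_up or not any(node_up.values()):
        return Severity.CRITICAL
    # The service is still up, but redundancy is reduced.
    if not all(node_up.values()):
        return Severity.SERIOUS
    return Severity.OK


# Example with the two nodes from the question: EXCH01 is down,
# EXCH02 and the VIP still respond, so this is only a Serious alert.
status = cluster_severity({"EXCH01": False, "EXCH02": True}, vip_up=True)
print(status.name)
```

Because the function only takes a name-to-status mapping and a VIP status, the same rule could be applied to every cluster (Exchange, SQL, ...) instead of writing one alert per cluster.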

