I have been working with the Windows Failover Cluster Monitoring templates for Windows 2008, and I would like some input on the best way to set up the actual monitoring alerts from this template.
I would think that we would want to page out for a couple of different things depending on which cluster we were talking about.
On all of them we would want to be alerted on an actual failover from one node to the other. Which component type would you monitor? How would you structure the actual alert? Which variables do you suggest using so we can identify it properly in the email that it sends out?
It would be nice if we could construct an alert that sends specific information when any of a number of things fails on one of the nodes and causes the issue, e.g. failure to join the cluster, storage functionality issues, witness problems, etc.
I would not want to construct one alert for each unless it is necessary. I think that if the alert and the email were constructed properly, you could do it with one alert.
More than anything, I am looking for input on the best way and the accepted best practices for monitoring this.
The template was designed to be applied to all nodes in the cluster, not the VIP, so you should alert on any of the Windows Event Log Monitors in the template. You can then include the node name variable as part of the alert so you know which member of the cluster is having issues. In SAM 5.2 we've greatly enhanced the Windows Event Log Monitor to include the full message details of the matching event. This should provide even greater clarity in your alerts as to what the problem is and which node it affects.
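To illustrate the "one alert, one email" idea above, here is a minimal Python sketch of a single message template that is filled in with the node name, the monitor that triggered, and the event message. It is not SAM's alert engine: the placeholder names (node_name, component_name, status, event_message) and the sample values are made up for this example, and the real alert variable names should be taken from the variable picker in the alert editor.

```python
from string import Template

# Placeholder names below are hypothetical, chosen for this sketch only.
# They are NOT the product's actual alert variable names.
SUBJECT = Template("Cluster alert on ${node_name}: ${component_name} is ${status}")
BODY = Template(
    "Node: ${node_name}\n"
    "Monitor: ${component_name}\n"
    "Status: ${status}\n"
    "Event message: ${event_message}\n"
)


def render_alert(fields):
    """Return (subject, body) for one triggered monitor."""
    # safe_substitute leaves an unknown placeholder in place instead of raising.
    return SUBJECT.safe_substitute(fields), BODY.safe_substitute(fields)


# Sample values are invented for illustration.
example = {
    "node_name": "CLUSTER-NODE-A",
    "component_name": "Witness problems",
    "status": "Down",
    "event_message": "(full text of the matching event)",
}

subject, body = render_alert(example)
print(subject)
print(body)
```

Because the node and the monitor name are filled in per event, the same template covers a failed cluster join, a storage issue, or a witness problem without needing a separate alert for each.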
I need to better understand how to use this template. When applied to both nodes in the cluster, it just seems to go off over and over again, and sometimes it isn't a real failover event.
What else can you tell me about using this template?
What monitors within the template are not really needed?
What should I be alerting on to indicate that the node has failed over?
I sure would like some clarity and suggestions around the configuration and setup of the actual alerts. What should they look like, and how are they configured?
The performance metrics aren't required if you're simply interested in being alerted to failover events. The Windows Event Log monitors are the primary method for determining when a failover occurs.
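As a rough sketch of that scoping, the Python below keeps only the event log monitors that are down and ignores the performance metrics when deciding what to alert on. The Monitor class, the "event_log" and "performance" labels, and the sample monitor names are all assumptions made up for this example. In the product itself this would be expressed as a trigger condition on the component type rather than as code.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    name: str
    kind: str    # "event_log" or "performance" (labels invented for this sketch)
    status: str  # "Up" or "Down"
    node: str


def monitors_to_alert_on(monitors):
    """Keep only event log monitors that are currently Down."""
    return [m for m in monitors if m.kind == "event_log" and m.status == "Down"]


# Sample data is hypothetical and only loosely named after the issues in this thread.
sample = [
    Monitor("Failed to join the cluster", "event_log", "Down", "NODE-A"),
    Monitor("Witness problems", "event_log", "Up", "NODE-A"),
    Monitor("Example performance counter", "performance", "Down", "NODE-B"),
]

for m in monitors_to_alert_on(sample):
    print(f"{m.node}: {m.name}")
```

Here the performance counter on NODE-B is down but is skipped, so only the event log monitor on NODE-A would produce an alert.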