With the key application I support, our production environment is spread across a Citrix farm of 24 servers connected to an AppServer farm of 6 servers, all with load-balanced Web and App services. So the question is: when is my application down? If a Citrix server is offline? If a Web or App service is down on one AppServer? We have to assess the criticality of our application status.
We have determined that if one service is down, it does not affect the availability of the application or even the experience of our users. Truth be told, the application can support all users even if only one AppServer is running the Document Service, for example. Of course, in that scenario we have no redundancy and no safety net.
So I created a script that looks at a particular service and determines a criticality based on the number of instances running.
Within this script you can identify a list of servers to poll and a minimum critical value; it returns up, warning, or critical for the application component based on the number of instances running.
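
To make the idea concrete, here is a minimal sketch of that logic in Python. It is not the actual script. The function name `assess_service`, the `is_running` callback, and the server names are all made up for illustration, and the exact thresholds are an assumption: every instance running counts as up, the minimum critical value or fewer counts as critical, and anything in between is a warning. How you actually poll a server is left to the callback you pass in.

```python
from typing import Callable, Sequence


def assess_service(
    servers: Sequence[str],
    service: str,
    min_critical: int,
    is_running: Callable[[str, str], bool],
) -> str:
    """Return "up", "warning" or "critical" for one application component.

    Assumed rule (not taken from the original script):
      - running instances <= min_critical  -> "critical"
      - some, but not all, instances running -> "warning"
      - every polled instance running       -> "up"
    """
    # Count the servers on which the service is reported as running.
    running = sum(1 for server in servers if is_running(server, service))

    if running <= min_critical:
        return "critical"
    if running < len(servers):
        return "warning"
    return "up"


# Hypothetical AppServer farm of 6 servers; the names are placeholders.
app_servers = ["APP01", "APP02", "APP03", "APP04", "APP05", "APP06"]

# Stand-in for a real poll: pretend the service is stopped on two servers.
stopped_on = {"APP05", "APP06"}


def fake_poll(server: str, service: str) -> bool:
    return server not in stopped_on


print(assess_service(app_servers, "Document Service", 1, fake_poll))
# prints "warning": 4 of 6 instances are running, which is above the minimum of 1
```

With a minimum critical value of 1, the case described above (only one AppServer still running the Document Service) comes back as critical, even though users can still work, because there is no redundancy left.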