Hello,
I have six applications on six nodes monitoring some Windows services. If SolarWinds fails to restart a service that is down, I want SolarWinds to send an email/page. Thanks for your help.
You can use alert escalation to trigger an email under trigger actions if the alert condition remains true for longer than "X" minutes.
The tricky part is how frequently this alert runs (the Evaluation Frequency of Alert, under the Properties tab). If it is every 1 minute, then we have a problem, so set the evaluation frequency to 10 minutes. The alert then validates the trigger condition every 10 minutes, and, as you mentioned, the trigger action restarts the service if it is down. Now add an alert escalation and set its wait period to 5 minutes.
Order:
1. The alert for service up/down is validated once every X minutes. This is the Evaluation Frequency of Alert that you set (for example, 10 or 6 minutes).
2. Trigger Action: try to restart the service.
3. Add an alert escalation to check the status of the service again after a wait period. If the service is still down, fire an email. (For example, wait 5 or 3 minutes, half of what is defined under Evaluation Frequency of Alert.)
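To make the timing in the three steps above concrete, here is a rough Python sketch of the schedule. It is only an illustration of the arithmetic, not anything SolarWinds runs; the function name `plan_alert` and its parameters are made up for this example, and it assumes the escalation wait is half of the evaluation frequency as suggested above.

```python
def plan_alert(eval_freq_min, detect_at_min=0):
    """Return (minute, event) pairs for one alert cycle.

    Hypothetical helper for illustration only. Assumes the escalation
    wait is half of the evaluation frequency.
    """
    escalation_wait_min = eval_freq_min / 2
    return [
        (detect_at_min, "alert evaluates trigger condition: service is down"),
        (detect_at_min, "trigger action: try to restart the service"),
        (detect_at_min + escalation_wait_min,
         "escalation: if the service is still down, send email"),
        (detect_at_min + eval_freq_min, "next scheduled evaluation"),
    ]


if __name__ == "__main__":
    for minute, event in plan_alert(10):
        print(f"t+{minute:g} min: {event}")
```

With a 10 minute evaluation frequency the escalation check lands at the 5 minute mark, before the alert is evaluated again.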
I am assuming you are restarting the service through the defined alert in SolarWinds. If so, check the post below; that is what you are looking for:
Restarting Services via An Alert (I have Questions)
You will have to make a minor modification to the alert mentioned in that post. Instead of alerting first and then restarting the service after a wait, reverse the order: when your defined alert detects a service down, try to restart the service first, then wait 5 minutes and check whether the service is up or down. If the service is still down, fire an email to yourself stating that the service restart failed.
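The reversed order described above (restart, wait, check, then email) can be sketched in a few lines of Python. This is only a model of the logic; in SolarWinds the alert engine and the escalation settings do this for you. The function `restart_then_escalate` and the callables passed into it are hypothetical placeholders, not SolarWinds APIs.

```python
import time


def restart_then_escalate(restart, is_up, send_email,
                          wait_seconds=300, sleep=time.sleep):
    """Restart first, wait, re-check, and email only if still down.

    All arguments are caller-supplied placeholders (assumptions):
      restart    -- callable that attempts the service restart
      is_up      -- callable returning True if the service is running
      send_email -- callable taking the message text
    """
    restart()                # step 1: try to restart the service
    sleep(wait_seconds)      # step 2: wait (5 minutes by default)
    if is_up():              # step 3: check the service again
        return "recovered"
    send_email("Service restart failed")  # step 4: escalate
    return "escalated"
```

The wait is a parameter so the same flow works with a 3 minute wait for a 6 minute evaluation frequency.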
Correct, Vinay, I am restarting the service through the defined alert in SolarWinds. So, if the following trigger action, APM\APMServiceControl.exe ${N=SwisEntity;M=ComponentAlert.ComponentID}, tries to restart the service and the service is still down, where do I set the 5 minutes to check again whether the service is up or down? Thanks for your help,