We are setting up automated fixes for some Linux services and want to use alert escalations to notify someone by email when an automated fix fails.
Currently I have an alert action that executes a PowerShell script, which connects to the node and restarts the Linux service if it's down.
What I want is this: if the service is still down after the attempted restart, escalate to a human via email.
Basic procedure would be:
- Alert fires for component down
- Automated restart/fix deployed
- If still down after 10 minutes, email a human
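Is there any way to achieve this via alert escalation? I know escalations are based on acknowledgement, but can that behavior be changed? I am thinking of modifying the script to wait, re-check the service, and then auto-acknowledge the alert if the restart succeeded, so that the escalation email only goes out when the alert is still unacknowledged.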
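
To make the idea concrete, here is a rough sketch of the wait-then-acknowledge logic I have in mind. It is written in Python purely for illustration (the real action is a PowerShell script), and `check_service`, `restart_service`, and `acknowledge_alert` are hypothetical placeholders that I would have to implement myself. They are not calls from any actual product API.

```python
import time
from typing import Callable


def remediate(
    check_service: Callable[[], bool],      # hypothetical: True if the service is up
    restart_service: Callable[[], None],    # hypothetical: attempts the restart
    acknowledge_alert: Callable[[], None],  # hypothetical: acknowledges the alert
    wait_seconds: float = 600,              # 10 minutes, as in the procedure above
    poll_seconds: float = 30,               # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Restart the service, then poll until it is up or the wait expires.

    Returns True and acknowledges the alert if the service recovered.
    Returns False without acknowledging if it is still down, so the
    acknowledgement-based escalation can email a human.
    """
    restart_service()
    waited = 0.0
    while True:
        if check_service():
            acknowledge_alert()
            return True
        if waited >= wait_seconds:
            # Still down after the full wait: leave the alert unacknowledged.
            return False
        sleep(poll_seconds)
        waited += poll_seconds
```

The escalation level with the email action would then be delayed by slightly more than 10 minutes, so the script has time to acknowledge the alert first when the restart works.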