We want to keep the granularity of our synthetic uptime monitors, but have more control over the state of an alert once it has triggered, so that we are not spammed with notifications if the monitored endpoint starts to flap. Just as there are conditions that determine when an alert becomes active, we would also like conditions that determine when an alert should be considered resolved.
For example: "Alert is considered resolved after X successful checks in Y [seconds/minutes]."
Scenario
An Uptime alert monitors an HTTPS endpoint with:
- Check interval: 1 minute
- Consider down after: 30-second timeout
- When down, alert after: 5 minutes
- Resend alert: Never
- Alert when back up: Yes
The endpoint goes up and down every 2-3 minutes, which sends a notification each time it changes state. We do not want to adjust our check interval, down condition, or alert delay, because that would delay the initial alert and notification. Instead, we want to define an additional parameter:
- Consider resolved after: 10 successful checks in 15 minutes.
If the endpoint were to flap, it would still be considered down until it had been stable for a predetermined amount of time.
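A minimal Python sketch of the requested behavior, for illustration only. It assumes the existing "alert after 5 minutes" condition can be modeled as 5 consecutive failed checks at a 1-minute interval, and it reads "10 successful checks in 15 minutes" literally as at least 10 successes (not necessarily consecutive) among the checks in a trailing 15-minute window. `FlapTolerantAlert` and `simulate` are hypothetical names, not part of any monitoring product's API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlapTolerantAlert:
    """Hypothetical alert state: fires after `fail_checks` consecutive failed
    checks; resolves only once `resolve_successes` successful checks fall
    inside the trailing `resolve_window_s` seconds."""

    fail_checks: int = 5               # assumed: 5 failures at 1-minute checks ~ "alert after 5 minutes"
    resolve_successes: int = 10        # "10 successful checks ..."
    resolve_window_s: int = 15 * 60    # "... in 15 minutes"
    alerting: bool = False
    consecutive_failures: int = 0
    successes: deque = field(default_factory=deque)  # timestamps of recent successful checks

    def record(self, ts: int, ok: bool):
        """Feed one check result at time `ts` (seconds); return "ALERT", "RESOLVED", or None."""
        if ok:
            self.consecutive_failures = 0
            self.successes.append(ts)
        else:
            self.consecutive_failures += 1

        # Forget successes that have aged out of the trailing window.
        while self.successes and self.successes[0] <= ts - self.resolve_window_s:
            self.successes.popleft()

        if not self.alerting and self.consecutive_failures >= self.fail_checks:
            self.alerting = True
            self.successes.clear()  # only successes seen after the alert fired count toward resolving
            return "ALERT"
        if self.alerting and len(self.successes) >= self.resolve_successes:
            self.alerting = False
            return "RESOLVED"
        return None  # no state change, so no notification


def simulate(pattern, interval_s=60):
    """Run a sequence of check results (True = up) and collect (check index, event) pairs."""
    alert = FlapTolerantAlert()
    events = []
    for i, ok in enumerate(pattern):
        event = alert.record(i * interval_s, ok)
        if event:
            events.append((i, event))
    return events


# 5 minutes down, 16 minutes of flapping (2 up / 2 down), then 12 minutes stable.
pattern = [False] * 5 + [True, True, False, False] * 4 + [True] * 12
print(simulate(pattern))  # [(4, 'ALERT'), (26, 'RESOLVED')]
```

In this simulated run, the flapping after the initial alert produces no further notifications, and a single "resolved" notification is sent once the trailing 15-minute window holds 10 successes. Here that happens 6 checks into the stable period, because 4 successes from the end of the flapping still fall inside the window. Requiring the successes to be consecutive would be a stricter variant of the same rule.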