I'm having some weird alerting issues, and I'm wondering if anyone has seen them or has ideas about what could cause these hiccups with alert triggers.
There are two kinds of miss, and each happens a couple of times a month.
The first miss is:
Alert triggers fire normally most of the time, but occasionally don't fire at all. This is most noticeable when it causes a missed page, or when an acknowledgement or reset isn't sent to the PagerDuty API, which results in alerts being escalated unintentionally.
The other miss is:
Let's say we have an alert that has time-restricted actions. On occasion, these alerts will trigger after hours and run through all of the escalation triggers simultaneously. This sends up to 7 pages to on-call people in the middle of the night for alerts that aren't supposed to send any notifications outside business hours.
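To make the expected behavior concrete, here is a rough sketch of the gating I would expect for a time-restricted alert. This is not the alert service's actual code: the function names, the 09:00 to 17:00 weekday window, and the "only the first escalation step fires at trigger time" rule are all assumptions for illustration.

```python
from datetime import datetime, time

# Hypothetical business-hours window; the real schedule lives in the alert config.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def in_business_hours(now: datetime) -> bool:
    """True on weekdays from BUSINESS_START (inclusive) to BUSINESS_END (exclusive)."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and BUSINESS_START <= now.time() < BUSINESS_END


def actions_to_run(escalation_actions: list[str], now: datetime) -> list[str]:
    """Expected behavior (assumed): nothing runs outside the window,
    and only the first escalation step runs when the alert first triggers."""
    if not in_business_hours(now):
        return []
    return escalation_actions[:1]
```

What we actually see in the second case is the opposite: at something like 3 AM, every escalation step runs at once instead of none.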
Any thoughts as to what would cause these types of misfires with the alert service?