I have some remote sites and devices that are flaky: they drop out and come back up on their own, reboot independently on a schedule, etc.
My current monitoring app can alert only after x consecutive polling failures. That way, if it's a self-correcting transient issue, I don't get alerted for every little glitch in the system. For example, if a remote admin reboots a router, the system sees the router down on pass 1 and does nothing but log it. If the router is still down 90 seconds later during the next polling pass, it then takes action and pages me. It still emails the individual alerts to a mailbox so we can spot potential problems such as a bouncing circuit, but it won't alert us at 2am just because a device rebooted during a polling cycle.
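To make the behavior concrete, here is a rough sketch of the logic I'm describing. It is not code from either product; `FailureGate`, `record`, and the device name `router1` are made-up names for illustration, and paging only once per outage (on the poll that reaches the threshold) is my assumption about how it should behave.

```python
from collections import defaultdict


class FailureGate:
    """Escalate only after `threshold` consecutive failed polls per device."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.failures = defaultdict(int)  # device name -> consecutive failed polls

    def record(self, device, is_up):
        """Return 'ok', 'log', or 'page' for a single polling result."""
        if is_up:
            self.failures[device] = 0  # any successful poll resets the streak
            return "ok"
        self.failures[device] += 1
        # Assumption: page exactly once, on the poll that reaches the threshold.
        # Every other failed poll is only logged (emailed to the mailbox).
        if self.failures[device] == self.threshold:
            return "page"
        return "log"


gate = FailureGate(threshold=2)
# A router that reboots and is back by the next polling pass: never pages.
reboot = [gate.record("router1", up) for up in (True, False, True)]
# A router that stays down across passes: pages on the second failed poll.
outage = [gate.record("router1", up) for up in (False, False, False)]
print(reboot)  # ['ok', 'log', 'ok']
print(outage)  # ['log', 'page', 'log']
```

With a 90 second polling interval, a threshold of 2 means a device has to be seen down on two passes in a row, roughly 90 seconds apart, before anyone gets paged.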
Is there a way to alert the recipients (in our case, send a text message) only if the outage lasts longer than x minutes, or after more than x polling failures, more than x SNMP traps, etc.?