I have a set of VPN appliances that fall back to a cellular EVDO backup connection when their primary Internet connection (DSL, T1, whatever) fails. I've been trying to tweak Orion Advanced Alerts to inform me when a device has failed over and is using EVDO (discussed in another thread). I've had some success with this logic:
Trigger:
InterfaceStatus = Down
NodeStatus = Up
Reset:
InterfaceStatus = Up
NodeStatus = Up
"Do not reset this action until condition exists for more than 4 minutes".
My problem is that I get some false reset alerts because the "more than 4 minutes" condition isn't a continual check; it is a single check, 4 minutes later. When my devices fail over to EVDO, their main eth1 interface cycles through Up, Administratively Shutdown, Down, Administratively Enabled, and then repeats. It does this because the device keeps checking whether the primary Internet connection has "returned". With the "Do not reset this action until...4 minutes" setting, every once in a while the advanced alert does its 4-minute check and happens to land on a point in the cycle where the interface is Up at both checks. Below is an event log from one of these devices so you can see how the interface continually cycles while it is failed over to EVDO.
11/16/2007 08:21 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Up
11/16/2007 08:21 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Administratively Enabled
11/16/2007 08:20 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Down
11/16/2007 08:19 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Administratively Shutdown
11/16/2007 08:17 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Up
11/16/2007 08:17 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Administratively Enabled
11/16/2007 08:16 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Down
11/16/2007 08:16 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Administratively Shutdown
11/16/2007 08:13 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Up
11/16/2007 08:13 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Administratively Enabled
11/16/2007 08:12 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Down
11/16/2007 08:12 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Administratively Shutdown
11/16/2007 08:10 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Up
11/16/2007 08:10 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Administratively Enabled
11/16/2007 08:08 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Down
11/16/2007 08:08 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Administratively Shutdown
11/16/2007 08:06 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Up
11/16/2007 08:06 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Administratively Enabled
11/16/2007 08:05 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Down
11/16/2007 08:05 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Administratively Shutdown
11/16/2007 08:03 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Up
11/16/2007 08:03 AM CORP : NetServices Lab (ANIRA)-eth1-Outside (12.135.205.229) Administratively Enabled
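To make the false reset concrete, here is a rough Python sketch that replays the Up/Down transitions from the log above. It is only a model of the behavior as I understand it, not how Orion is actually implemented: the function names (`status_at`, `point_in_time_reset`, `sustained_reset`) are made up for illustration, and the log timestamps only have one-minute resolution, so the timing is approximate.

```python
from datetime import datetime, timedelta

# Up/Down transitions for eth1, transcribed from the event log (oldest first).
# The Administratively Enabled/Shutdown events are left out on purpose.
EVENTS = [
    ("08:03", "Up"), ("08:05", "Down"), ("08:06", "Up"), ("08:08", "Down"),
    ("08:10", "Up"), ("08:12", "Down"), ("08:13", "Up"), ("08:16", "Down"),
    ("08:17", "Up"), ("08:20", "Down"), ("08:21", "Up"),
]

def t(hhmm):
    """Turn an HH:MM string from the log into a datetime on 11/16/2007."""
    return datetime.strptime("2007-11-16 " + hhmm, "%Y-%m-%d %H:%M")

def status_at(when):
    """Return the most recent logged status at or before `when`."""
    status = "Unknown"
    for stamp, state in EVENTS:
        if t(stamp) <= when:
            status = state
    return status

def point_in_time_reset(start, delay=timedelta(minutes=4)):
    """Model of the observed behavior: one check now, one check 4 minutes later."""
    return status_at(start) == "Up" and status_at(start + delay) == "Up"

def sustained_reset(start, delay=timedelta(minutes=4), step=timedelta(seconds=15)):
    """Model of the desired behavior: Up at every 15-second sample for 4 minutes."""
    when = start
    while when <= start + delay:
        if status_at(when) != "Up":
            return False
        when += step
    return True

if __name__ == "__main__":
    start = t("08:13")
    print("single check 4 minutes later resets:", point_in_time_reset(start))
    print("sustained 15-second checks reset:   ", sustained_reset(start))
```

Starting at 08:13, the interface is Up, and it is Up again at 08:17, so the two-point check would reset the alert even though the interface went Down at 08:16 in between. The sampled version catches that Down and does not reset.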
What I think would solve my problem is being able to define RESET logic that considers events OVER time, e.g. reset only if the interface has NOT been Administratively Shutdown at least (2) times in (5) minutes. I also don't think I'd have a problem if the Advanced Alerts actually checked 'continually' during that 4-minute interval, or checked at the alert evaluation frequency (15 seconds on this alert) for the whole 4 minutes. Ideally, I'd like Orion to see that the interface hasn't actually BEEN Up for 4 minutes. It is just cycling, and MIGHT be Up at the "zero minute" and have cycled back to Up AGAIN at the 4th minute.
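Here is roughly what I mean by the "events over time" rule, written as a standalone Python sketch. This is not an Orion feature or Orion configuration as far as I know; `may_reset`, the 5-minute window, and the threshold of 2 are just the hypothetical numbers from my example, and the timestamps are the Administratively Shutdown entries from the log.

```python
from datetime import datetime, timedelta

def t(hhmm):
    """Turn an HH:MM string from the log into a datetime on 11/16/2007."""
    return datetime.strptime("2007-11-16 " + hhmm, "%Y-%m-%d %H:%M")

# Administratively Shutdown events for eth1, taken from the event log.
SHUTDOWNS = [t("08:05"), t("08:08"), t("08:12"), t("08:16"), t("08:19")]

def may_reset(now, current_status, shutdowns=SHUTDOWNS,
              window=timedelta(minutes=5), max_shutdowns=2):
    """Allow a reset only if the interface is Up now AND fewer than
    `max_shutdowns` admin shutdowns were seen in the last `window`."""
    if current_status != "Up":
        return False
    recent = [ts for ts in shutdowns if now - window <= ts <= now]
    return len(recent) < max_shutdowns

if __name__ == "__main__":
    # Still cycling: shutdowns at 08:16 and 08:19 fall inside the window.
    print("reset allowed at 08:21:", may_reset(t("08:21"), "Up"))
    # Hypothetical later time with no shutdowns in the previous 5 minutes.
    print("reset allowed at 08:30:", may_reset(t("08:30"), "Up"))
```

With the log above, the shutdowns are 3 to 4 minutes apart, so a 5-minute window always contains two of them while the device is cycling, and the reset stays blocked until the cycling actually stops.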
Thoughts on how I can overcome my cycling/resetting issue?