Gurus,
I have configured the Node Status alerts to notify us when a node is not up. I added a wait time of 2 minutes to suppress false alerts for Warning/Down states caused by high WAN utilization and other factors. All was good... until this weekend, when a node became corrupted and kept bouncing.
The initial notifications alerted us to the bouncing. Is there a way to suppress, limit, or dampen the alerts to reduce the number generated by the flaps? Since I am not a programmer, my hands are a little tied by my limited capabilities. I was thinking something along the lines of the below should help. Please let me know if there is a way to address the issue rather than getting bombarded with about 400 emails over the weekend.
Generate an alert if the node is down for 2 minutes, and generate a reset if the node has been back up for 2 minutes. This part was straightforward and is in place. Now, if the node is flapping: identify it, send an alert that the node is unstable, mark it as unstable, and suppress all further alerts until it is stable again. This is the battle.
I was considering something like this, but I am not quite sure how to accomplish it:
- Every time the node goes down and comes back up, increase the value of a custom property (Node Flap Count) by 1.
- If the node stays stable for over 10 minutes, reset the Node Flap Count to 0.
- Now if the node goes down again within that window, check the Node Flap Count to see if it is > 1.
- If it is greater than 1, suppress the alerts/resets and increase the Node Flap Count by 1.
- If it is 0, trigger the alert.
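To make the idea concrete, here is a rough sketch of the counter logic above in Python. It is only an illustration of the logic, not something the alerting tool provides: the names FlapDampener, node_down, node_up and the message strings are all made up for this example. It assumes the 2-minute wait has already been applied before these functions are called, and it reads the "count of 1" case (which the list above leaves open) as the moment to send the single "node is unstable" alert.

```python
from dataclasses import dataclass
from typing import Dict, Optional

STABLE_SECONDS = 10 * 60  # node must stay up this long before the count resets


@dataclass
class NodeFlapState:
    flap_count: int = 0               # stands in for the "Node Flap Count" custom property
    last_up: Optional[float] = None   # time (seconds) the node last came back up


class FlapDampener:
    """Hypothetical helper: decides which notifications to send for a flapping node."""

    def __init__(self, stable_seconds: float = STABLE_SECONDS) -> None:
        self.stable_seconds = stable_seconds
        self.nodes: Dict[str, NodeFlapState] = {}

    def node_down(self, node: str, now: float) -> Optional[str]:
        """Call when a node has been confirmed down. Returns a message or None (suppressed)."""
        state = self.nodes.setdefault(node, NodeFlapState())
        # If the node had been up for the full stable period, forget earlier flaps.
        if state.last_up is not None and now - state.last_up >= self.stable_seconds:
            state.flap_count = 0
        previous = state.flap_count
        state.flap_count += 1
        if previous == 0:
            return f"ALERT: {node} is down"
        if previous == 1:
            # Assumption: the second outage in a row is what marks the node as unstable.
            return f"ALERT: {node} is unstable (flapping)"
        return None  # already reported as unstable, stay quiet

    def node_up(self, node: str, now: float) -> Optional[str]:
        """Call when a node has been confirmed up. Returns a reset message or None."""
        state = self.nodes.setdefault(node, NodeFlapState())
        state.last_up = now
        if state.flap_count <= 1:
            return f"RESET: {node} is up"
        return None  # suppress resets while the node is marked unstable
```

With this sketch, a node that bounces all weekend would produce one down alert, one reset, and one "unstable" alert, and then nothing more until it has stayed up for 10 minutes and goes down again. It does not send a "node is stable again" message on its own; that would need a separate timed check.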
Any help is appreciated.
Thanks
KV