We are working on alert cleanup in our system and want every configured alert to integrate with WHD and create a ticket.
I'm currently having issues with the node down alerts and the high packet loss alerts. From what I have read online, the system determines a node's status by pinging it every 120 seconds (we left the default here). Once the first ping is missed, the node status is set to warning and the node goes into "fast ping", where it is pinged every 10 seconds for 120 seconds. If all of those pings are missed, the status is set to down.
Now for % packet loss: this is calculated from the last 10 pings held in memory and, from what I have read, that includes the "fast ping" responses.
How is % Packet Loss calculated?
This means that a % packet loss alert is triggered just before the node is considered down, so I now have two separate tickets for what really should be just one down node. I am trying to figure out how to fix this logic and still get notified of issues as soon as possible. Our site uses NPM and SAM; any recommendations would be appreciated.
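To illustrate the timing I am describing, here is a rough Python sketch of the behavior as I understand it from what I have read. It is not taken from the product: the function name `simulate_outage`, the 50% loss threshold, and the assumption that the alert is evaluated right after every ping are all mine, and the intervals are the values mentioned above.

```python
from collections import deque

# Values from the post; treat them as assumptions about the polling behavior.
FAST_POLL_INTERVAL_S = 10   # "fast ping" spacing after the first missed ping
FAST_POLL_WINDOW_S = 120    # how long fast ping runs before the node is marked down
LOSS_WINDOW = 10            # number of recent pings used for % packet loss
LOSS_THRESHOLD_PCT = 50     # hypothetical alert threshold, not a product default


def simulate_outage(loss_threshold_pct=LOSS_THRESHOLD_PCT):
    """Simulate a node that stops answering pings.

    Returns (loss_alert_at, node_down_at) in seconds, measured from the
    first missed ping. loss_alert_at is None if the threshold is never hit.
    """
    # Start with a full window of successful pings (True = reply received).
    history = deque([True] * LOSS_WINDOW, maxlen=LOSS_WINDOW)

    # First missed regular ping at t=0, then fast pings every 10 s for 120 s.
    ping_times = [0] + list(
        range(FAST_POLL_INTERVAL_S, FAST_POLL_WINDOW_S + 1, FAST_POLL_INTERVAL_S)
    )

    loss_alert_at = None
    for t in ping_times:
        history.append(False)  # every ping during the outage is missed
        loss_pct = 100 * history.count(False) / len(history)
        if loss_alert_at is None and loss_pct >= loss_threshold_pct:
            loss_alert_at = t

    # The node is only marked down once the whole fast-ping window has failed.
    node_down_at = ping_times[-1]
    return loss_alert_at, node_down_at


if __name__ == "__main__":
    loss_at, down_at = simulate_outage()
    print(f"packet loss alert at {loss_at}s, node down at {down_at}s")
```

Under these assumptions the packet loss condition is met well before the node is marked down, which matches the two tickets I am seeing. Even with the threshold raised to 100% it would still be met before the end of the fast-ping window.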