
SLIGHT PACKET LOSS CAUSING NODE DOWN ALERTS

We're seeing slight packet loss, and whenever it happens a barrage of Node Down alerts comes in.

Where is packet loss configured that would affect our basic Node Down triggers? I'm stumped.

Thanks

  • A node is marked as down when it reaches the configured default threshold.

    Settings > Polling Settings >

    pastedImage_1.png

    By default, node status is polled every 120 seconds.

    Try polling the affected nodes more often (for example every 10 to 20, or about 30, seconds instead of 120), so that even if a packet is dropped or the node is briefly unreachable, the next poll gets a response and the threshold is not triggered.

    Edit Node >

    pastedImage_0.png

    You can also try switching the status poll to SNMP and see whether that helps. See the post below:

    Difference between ICMP & SNMP response time

  • Hi orioncrack,

    Changing polling frequency will not help you if you are getting alerts due to frequent packet loss.

    The better method is to set a hold time on the alerts.

    For setting hold time:

    1. Go to the Alert Manager, where the Node Down alert is configured.

    2. On the conditions tab there is an option to not trigger the alert until the condition has existed for xx sec/min.

    3. Set that time to xx min. The alert then waits for the given interval, and triggers only if the device is still not responding at the end of it.
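The hold-time idea in the steps above can be sketched in a few lines of Python. This is only an illustration of the logic, not how Orion implements it; the class name `HoldTimeAlert`, the 300-second hold time, and the sample poll data are all made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HoldTimeAlert:
    """Hypothetical model: fire only after the node has been down
    continuously for at least hold_seconds."""
    hold_seconds: float
    _down_since: Optional[float] = None

    def observe(self, timestamp: float, node_is_down: bool) -> bool:
        """Record one status poll; return True if the alert should trigger."""
        if not node_is_down:
            # Any successful poll resets the clock.
            self._down_since = None
            return False
        if self._down_since is None:
            # First failed poll of this outage: start the clock.
            self._down_since = timestamp
        return timestamp - self._down_since >= self.hold_seconds


# Assumed data: polls every 120 s, one isolated dropped poll, then a real outage.
polls = [(0, False), (120, True), (240, False),
         (360, True), (480, True), (600, True), (720, True)]

alert = HoldTimeAlert(hold_seconds=300)
results = [alert.observe(t, down) for t, down in polls]
print(results)
```

With these made-up numbers, the single dropped poll at 120 s never triggers anything, and the alert only fires once the second outage has lasted 300 seconds or more.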

  • Thanks. I'm aware of these basic steps. I just think there should be more control over this, as the polling warning settings don't seem very fine-grained and, to me, are not exact. An alert should only be generated if there is 100% packet loss. I suppose I can add that statistic as a condition. But I feel more control is needed here.

  • Changing the polling frequency is quite normal for nodes such as firewalls, since they often have intrusion prevention configured on their interfaces.

    In other cases network latency can cause the same issue. So instead of going up to the global alert, it is better to fix the root issue by adjusting the polling frequency, so that the node is only marked as down when it really is down. In that way IT DOES HELP.

    In a large environment, changing the whole alert condition because of one or two problem nodes is not something I would recommend. In this case he only had an issue with a single node, which is marked as down because it is polled every 120 seconds and the packet is dropped, either due to network latency or at the interface level.

  • Go to All Settings --> Orion Thresholds (in the Thresholds and Polling section). From there you can change the critical and warning percentages for packet loss.

    packetloss1.png

    OR

    Set up an alert with the condition below:

    packetloss.png

    Hope it will help you...
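To make the warning/critical percentages concrete, here is a rough Python sketch of how a packet loss figure could be compared against two thresholds. The function names, the 30% and 50% values, and the use of `>=` are assumptions for illustration only; check the Orion Thresholds page for the values and comparison your install actually uses.

```python
def packet_loss_percent(sent: int, received: int) -> float:
    """Percentage of probes that got no reply."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


def classify_loss(loss_percent: float, warning: float, critical: float) -> str:
    """Map a loss percentage to 'ok', 'warning' or 'critical'.

    Assumption: a level is reached when loss is greater than or
    equal to its threshold.
    """
    if loss_percent >= critical:
        return "critical"
    if loss_percent >= warning:
        return "warning"
    return "ok"


# Hypothetical thresholds: warning at 30%, critical at 50%.
slight = packet_loss_percent(sent=10, received=9)   # one dropped probe
total = packet_loss_percent(sent=10, received=0)    # nothing came back
print(slight, classify_loss(slight, warning=30, critical=50))
print(total, classify_loss(total, warning=30, critical=50))
```

Under these assumed thresholds, one dropped probe out of ten stays "ok", while only sustained or total loss reaches "critical". That is the kind of separation the original poster was after when asking for alerts only at 100% packet loss.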

  • NPM 12.5 includes support for sustained thresholds, allowing you to alert when a node threshold has been exceeded for more than 'X' consecutive polls, or for 'X' out of 'Y' polls. For more information, see the following post.

    Orion Platform 2019.2 - Enhanced Node Status
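As a rough illustration of the "'X' consecutive, or 'X' out of 'Y' polls" idea, the sketch below keeps a sliding window of recent poll results. It is not the NPM implementation; `SustainedThreshold` and the sample data are hypothetical, and "X consecutive" is modelled here as the special case X out of X.

```python
from collections import deque
from typing import Deque


class SustainedThreshold:
    """Hypothetical model: trigger when at least x of the last y polls
    exceeded the threshold. Use y == x for 'x consecutive polls'."""

    def __init__(self, x: int, y: int) -> None:
        if not 1 <= x <= y:
            raise ValueError("need 1 <= x <= y")
        self.x = x
        # Only the most recent y poll results are kept.
        self.window: Deque[bool] = deque(maxlen=y)

    def observe(self, exceeded: bool) -> bool:
        """Record one poll result; return True if the condition is sustained."""
        self.window.append(exceeded)
        return sum(self.window) >= self.x


# Assumed poll history: one isolated blip, then a real problem.
exceeded_flags = [False, True, False, True, True, True]

consecutive = SustainedThreshold(x=3, y=3)   # 3 consecutive polls
x_of_y = SustainedThreshold(x=2, y=3)        # 2 out of the last 3 polls

consecutive_results = [consecutive.observe(f) for f in exceeded_flags]
x_of_y_results = [x_of_y.observe(f) for f in exceeded_flags]
print(consecutive_results)
print(x_of_y_results)
```

In this made-up history the isolated blip on the second poll triggers neither rule. The 2-out-of-3 rule fires one poll after the problem returns, and the 3-consecutive rule fires only on the last poll.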