2 Replies Latest reply on Oct 6, 2017 10:03 AM by mesverrum

    Urgent Request Please. SW is triggering a Node DOWN

    bootneck1111

      Hi Guys

       

      We have a Node DOWN alert that seems to trigger when there is packet loss. It generates a P1 call and wakes my engineers up at 03:00 in the morning. The node isn't down in the sense of having rebooted or powered off; it has only experienced some packet loss on the interface and/or network.

       

      How do I filter out Packet Loss on the Node DOWN Alert please?

       

      In advance thank you for your support.

        • Re: Urgent Request Please. SW is triggering a Node DOWN
          dodo123

          Sounds like your node down alert is too sensitive. Change it to alert only after 2 or more consecutive failed polls, and experiment with the value.

           

          Is the node over a WAN link, maybe? Is it polled via agent, SNMP, or WMI?

           

          Also, node down is a P1? Wow, you must have a great team.

           

          Are your apps not load balanced across multiple servers? If they are, I would change it to P2, as a single node going down would just be a dip in performance.

           

          I know that's not a quick fix, with SLAs and all that.

           

          Hope you manage to sort it. There's nothing worse than getting engineers up so early for false readings.

          • Re: Urgent Request Please. SW is triggering a Node DOWN
            mesverrum

            The way nodes get marked down already leaves a lot of room for minor packet loss. It is important to understand exactly how that process works if you aren't familiar with it.

            SolarWinds pings a node. If it gets a response, the node is up, and it waits until the next polling cycle to ping again. If it gets no answer to the ping, the node is marked as in warning and goes into fast polling mode, where it sends a ping every 10 seconds for the duration of the warning period (120 seconds by default; you set this in your polling settings). Packet loss is measured as the percentage of the last 10 pings that don't get answered, so it will always be a multiple of 10, and it can miss small, intermittent loss. If Orion gets any responses during the warning period, the node stays in warning status until the packet loss stops. If the entire warning period passes with no responses, the node is marked down and could potentially trigger your alert.
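
To make the process above concrete, here is a rough Python sketch of the logic as described in this post. It is not SolarWinds code; the function names, constants, and the list-of-booleans representation of ping results are assumptions made purely for illustration. Only the 10-second fast-poll interval, the 120-second default warning period, and the "last 10 pings" window come from the description above.

```python
FAST_POLL_INTERVAL_S = 10   # from the post: one ping every 10 seconds while in warning
WARNING_PERIOD_S = 120      # from the post: default warning period


def packet_loss_percent(ping_results):
    """Percent of the last 10 pings that went unanswered.

    ping_results is a hypothetical list of booleans, True = answered.
    With a full window of 10 pings the result is always a multiple of 10.
    """
    window = ping_results[-10:]
    if not window:
        return 0
    lost = sum(1 for answered in window if not answered)
    return 100 * lost // len(window)


def status_after_warning_period(fast_poll_results):
    """Status once the warning period has elapsed (illustrative only).

    The node is 'Down' only if every fast-poll ping in the warning
    period went unanswered; any response keeps it in 'Warning'.
    """
    pings_in_period = WARNING_PERIOD_S // FAST_POLL_INTERVAL_S
    period = fast_poll_results[:pings_in_period]
    if len(period) == pings_in_period and not any(period):
        return "Down"
    return "Warning"
```

With these defaults, 12 consecutive fast-poll pings must all fail before the node is marked down, which is why a single reply during heavy packet loss keeps the node in warning instead.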

             

            So if you have an unusual amount of packet loss and want to cut back on the alerting, you can do one of two things: extend the warning period, or set your Node Down alerts to trigger only if the device stays down for a few more minutes.
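
The second option amounts to a hold-down timer on the alert. The Python sketch below shows the idea only; it is not how Orion implements alert conditions, and `should_alert`, `down_since`, and the 5-minute default are hypothetical names and values chosen for illustration.

```python
from datetime import datetime, timedelta


def should_alert(down_since, now, hold=timedelta(minutes=5)):
    """Fire only if the node has been continuously down for at least `hold`.

    down_since is the time the node was marked down, or None if it is
    not currently down (hypothetical representation).
    """
    if down_since is None:
        return False
    return now - down_since >= hold


# Usage: a node marked down at 03:00 does not page anyone at 03:02,
# but does at 03:05 if it is still down.
marked_down = datetime(2017, 10, 6, 3, 0)
print(should_alert(marked_down, datetime(2017, 10, 6, 3, 2)))
print(should_alert(marked_down, datetime(2017, 10, 6, 3, 5)))
```

A node that recovers within the hold window never pages anyone, at the cost of delaying genuine outage alerts by the same amount.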
