12 Replies Latest reply on Oct 6, 2010 6:33 PM by jspanitz

    Roughly 5 minute delay before hosts are marked "down" - Normal ?

    EllisD

      Hi All, new to NPM, so bear with me if I've made a noob error.

      My Cisco devices are monitored using SNMP (and I presume ICMP?).  If I take down my test switch, it sometimes takes 4 - 6 minutes before NPM marks the node as "down". It alerts me due to "packet loss" on the affected node very quickly.

      I've changed the default polling interval from 120 to 60 seconds, with not much luck.

      Am I being too impatient, or have I done something wrong?

      Thanks in advance,

      Ellis

        • Re: Roughly 5 minute delay before hosts are marked "down" - Normal ?
          Questionario

          Hi,

          On the Settings page you can select Polling Settings. At the bottom of that page you will find "Node Warning Level"; change the value there to set how many seconds pass before a node is considered down.

            • Re: Roughly 5 minute delay before hosts are marked "down" - Normal ?
              EllisD

              Hi - thanks for your answer.

              As a test I changed the "Node Polling Interval" to 30 seconds and changed the "Node Warning Level" to 10 seconds. It still took around 3 minutes for NPM to mark it as down.

              Am I still being too impatient?!

                • Re: Roughly 5 minute delay before hosts are marked "down" - Normal ?
                  kweise

                  EllisD

                  The other thing you can check is under Web Console Settings on the Admin page.  By default, the Orion page only refreshes every 5 minutes.

                  • Re: Roughly 5 minute delay before hosts are marked "down" - Normal ?
                    Questionario

                    With those settings (assuming you are refreshing constantly, either manually or by the means kweise mentioned), it should take less than a minute for the node to appear as down. Maybe you should open a support case to investigate this issue.

                    PS: if you changed the polling interval for all nodes and you have a lot of them, this might overload your database which in turn might cause this delayed notification.

                    By "appear" as down, do you mean the icon staying green instead of turning red, or do you mean an alert being sent out? You also need to adjust the alerts if you want them to trigger sooner or later.

                      • Re: Roughly 5 minute delay before hosts are marked "down" - Normal ?
                        EllisD

                        I was manually refreshing the web page. Think it's time for a support query to find out what's going on.

                        I only changed the polling interval to 30 seconds as a test; it was still sluggish when set to 1 minute, 2 minutes, etc.

                        By "appear", I meant turn red :-(

                        Thanks for your help.

                        Ellis

                          • Re: Roughly 5 minute delay before hosts are marked "down" - Normal ?
                            njoylif

                            Isn't there also a setting somewhere that requires 3 missed polls before it marks the node down?

                              • Re: Roughly 5 minute delay before hosts are marked "down" - Normal ?
                                kweise

                                That sounds vaguely familiar but I can't find that setting anymore.  However, there is a setting under Orion Polling Settings for Node Warning Level.  It sets the number of seconds before Orion marks devices as down.  I'm guessing the default is 120 seconds.  It's at the bottom of the page under Calculations & Thresholds.

                                  • Re: Roughly 5 minute delay before hosts are marked "down" - Normal ?
                                    njoylif


                                    So SW, does that mean you went to a FAST POLL if a poll is missed or what is the scenario?
                                    thx

                                      • Re: Roughly 5 minute delay before hosts are marked "down" - Normal ?
                                        EllisD

                                        Realised that I never thanked you all for your responses.

                                         

                                        Transpires the issue was down to the fact that I had ramped up the SNMP polling to every 1 minute.  Poor NPM was getting swamped.

                                        I have knocked it back to 10 minutes, and nodes now get marked down within the timeframe specified by the "ICMP polling interval" and the "node warning threshold".

                                        Support also sent me this useful info:

                                         A device may drop packets or fail to respond to a poll for many reasons. Should the device fail to respond, the device status is changed from Up to Warning. On the Node Warning Interval tab, you specify how long it will remain in the Warning status before it is marked as Down. During the interval specified, the service performs "fast polling" to continually check the node status.


                                        Please see below on changing the fast poll interval in Network Performance Monitor


                                        1. Click on File.
                                        2. Click on Advanced Settings.
                                        3. Select the Node Warning Interval tab.
                                        4. Adjust the scrollbar to suit your network's needs.


                                        The "Fast Poll" only occurs from the time NPM detects a problem through the "Node Warning Period". Once NPM has determined the interface or node is DOWN, it marks it as down and reverts to the normal polling interval.
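                                        As a rough illustration of the Up -> Warning -> Down sequence described above, here is a minimal Python sketch. It is a simplified model, not NPM's actual implementation: the function name and parameters are hypothetical, regular polls are assumed to run at fixed multiples of the polling interval, and the node is assumed never to answer again once it fails.

```python
def seconds_until_marked_down(failure_time_s, poll_interval_s, warning_interval_s):
    """Seconds from a node failing until it is marked Down (simplified model).

    Assumptions (not taken from NPM itself):
    - regular polls happen at t = 0, poll_interval_s, 2 * poll_interval_s, ...
    - the node stops answering at failure_time_s and never recovers
    - fast polling during the Warning period never gets a reply
    """
    # The first regular poll at or after the failure is the one that goes unanswered.
    polls_elapsed = -(-failure_time_s // poll_interval_s)  # ceiling division
    first_missed_poll_s = polls_elapsed * poll_interval_s
    # Status changes Up -> Warning here; after the warning interval it becomes Down.
    marked_down_s = first_missed_poll_s + warning_interval_s
    return marked_down_s - failure_time_s


# Node fails 1 second after a poll, with a 30 s polling interval and a 10 s warning interval.
print(seconds_until_marked_down(1, 30, 10))  # 39
```

                                        In this model the delay never exceeds the polling interval plus the warning interval, so a 30 second poll with a 10 second warning level should mark the node down in well under a minute. A delay of several minutes points to something else, such as an overloaded poller.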


                                        Notes:
                                        - To reduce the amount of packet loss reported by Orion NPM, configure the polling engine to retry ICMP pings a specific number of times before reporting packet loss. To do this, add the string value: “Response Time Retry Count” to the Windows Registry in the Settings folder of: HKEY_LOCAL_MACHINE\SOFTWARE\SolarWinds.Net\SWNetPerfMon\. Set the value data to the number of retries you prefer.


                                        - You may see events or receive alerts for down nodes that are not actually down. This can be caused by intermittent packet loss on the network. Set the Node Warning Interval to a higher value to avoid these false notifications.
                                        Please let me know if you require any further information.
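                                        The retry behaviour in the first note above can be pictured with the short Python sketch below. It only illustrates the general idea of retrying a ping before counting it as lost; `ping_with_retries` and `send_ping` are hypothetical names, and this is not how the Orion polling engine is actually written.

```python
def ping_with_retries(send_ping, retry_count):
    """Return True if any attempt gets a reply; report loss only after all retries fail.

    send_ping is a hypothetical zero-argument callable that returns True on a reply.
    retry_count plays the role of the "Response Time Retry Count" value.
    """
    attempts = 1 + retry_count  # the initial ping plus the configured retries
    for _ in range(attempts):
        if send_ping():
            return True
    return False


# Simulated link that drops the first two pings and answers the third.
replies = iter([False, False, True])
print(ping_with_retries(lambda: next(replies), retry_count=2))  # True
```

                                        With a retry count of 1, the same simulated link would be reported as packet loss, which is why a higher value smooths over intermittent drops at the cost of slower detection.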

                                         

                                         

                                        Thanks,

                                        Ellis

                                  • Re: Roughly 5 minute delay before hosts are marked "down" - Normal ?
                                    byrona

                                    In the advanced alert for Node Down there are two different settings that you can tune. One sets how often the Alert Engine checks whether the alert criteria are true. The other, as was previously mentioned by njoylif, tells the alert that it must be in an alert state for x number of minutes before it is triggered.

                                    So consider all of the different systems in play...

                                    1. Polling Interval
                                    2. How often alert is checked
                                    3. How long the alert must be tripped before an alert is generated

                                    If each of these is set to 5 minutes, then your node could be down for up to 15 minutes before an alert is tripped.
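                                    The arithmetic behind that worst case can be written out as a small Python sketch. The function and parameter names are made up for illustration, and the model simply adds the three delays together, which is the assumption behind the 15 minute figure.

```python
def worst_case_alert_delay_s(polling_interval_s, alert_check_interval_s, alert_hold_time_s):
    """Upper bound, in seconds, between a node failing and the alert firing.

    Assumes the three stages run back to back and each one takes its full interval:
    the poller notices the failure, the alert engine evaluates the condition,
    and the condition must then persist for the hold time.
    """
    return polling_interval_s + alert_check_interval_s + alert_hold_time_s


five_minutes_s = 5 * 60
delay_s = worst_case_alert_delay_s(five_minutes_s, five_minutes_s, five_minutes_s)
print(delay_s / 60)  # 15.0
```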

                                     

                                    Check these settings on your system; they may be the cause of your problem.

                                    Hope this helps!