5 Replies Latest reply on Mar 6, 2015 8:38 AM by zackm

    Global default polling interval

    sk3l3t0r

      I've changed what I thought was the global default polling interval from 120 seconds to 30, but nodes still show 120 seconds. Does this change only apply to newly added devices, so that I have to edit the polling interval of every existing node?

       

      120 s seems a bit long. We found that our old system (SNMPc, which defaults to 30 s) was actually alerting us to things before SW knew about them, so we want to change to get the same level of awareness...

       


        • Re: Global default polling interval
          blackh0leson

          I just changed the default interface statistics polling interval from 9 minutes to 1 minute to get more granular reports. I noticed that it only applies to interfaces added after I made the change, or to interfaces that I remove and re-add. Hope this helps.

          • Re: Global default polling interval
            zackm

            Just to add some insight to this thread, there are a couple of things to remember when thinking about ICMP polling intervals:


            1. By default, every alert definition queries the database only once every 60 seconds to look for new alerts/resets.
            2. The default warning interval for all node down events is 120 seconds.


            So, for reference, if you use a 30-second polling interval and leave the other two items above at their defaults, you get a scenario like this:


            • A single ping is sent every 30 seconds. If it returns, the node is marked UP and the next ping goes out in 30 seconds.
            • If the packet does not return, the node is marked WARNING and moves into a fast poll for 120 seconds (1 ping every 10 seconds).
            • If any of these return, the node returns to an UP status and the next ping is in 30 seconds.
            • If all of these fail, the node is marked DOWN and the next ping is in 30 seconds.
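The polling cycle in the bullets above can be sketched as a small state machine. This is a minimal Python sketch, not SolarWinds code: the function name simulate_node_status, the is_up(t) callback (does the node answer a ping at second t?), and the exact fast-poll schedule (pings at +10 s through +110 s, DOWN declared at +120 s) are assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def simulate_node_status(
    is_up: Callable[[int], bool],
    poll_interval: int = 30,   # normal ICMP polling interval (seconds)
    warning: int = 120,        # node warning level (seconds)
    fast_poll: int = 10,       # ping spacing while in WARNING (seconds)
    horizon: int = 300,        # stop simulating after this many seconds
) -> List[Tuple[int, str]]:
    """Return the (time, new_status) transitions for one node (hypothetical model)."""
    status = "UP"
    transitions: List[Tuple[int, str]] = []

    def set_status(t: int, new: str) -> None:
        nonlocal status
        if new != status:
            status = new
            transitions.append((t, new))

    t = 0
    while t <= horizon:
        if is_up(t):
            # Ping returned: node is UP, next ping after the normal interval.
            set_status(t, "UP")
            t += poll_interval
        elif status == "DOWN":
            # Already DOWN: keep pinging at the normal interval.
            t += poll_interval
        else:
            # First failed ping: WARNING, then fast-poll through the warning window.
            set_status(t, "WARNING")
            start = t
            for ft in range(start + fast_poll, start + warning, fast_poll):
                if is_up(ft):
                    set_status(ft, "UP")
                    t = ft + poll_interval
                    break
            else:
                # Every fast ping failed: DOWN once the warning window ends.
                set_status(start + warning, "DOWN")
                t = start + warning + poll_interval
    return transitions


print(simulate_node_status(lambda t: t < 1))  # node dies at t1
```

For a node that dies at t1, as in the "More likely" timeline, this returns [(30, 'WARNING'), (150, 'DOWN')]; a blip that clears during the fast poll produces a WARNING followed by UP and never reaches DOWN.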


            So, how does that measure out in regards to a Node Down alert?


            It could be anywhere from 120 to 209 seconds before you are alerted on a Node Down event, assuming your alert definition does not have a time threshold that holds down the trigger (e.g., "alert me when a node has been down for XX minutes").
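The 120-to-209-second window follows from simple arithmetic. In the best case the failed ping coincides with the outage and an alert query lands exactly when the warning period ends, so the delay is just the warning level. In the worst case the node dies just after a successful ping (almost a full polling interval goes by unnoticed), and the DOWN status then waits almost a full alert-query interval. A rough Python sketch with a hypothetical function name; the upper figure is a limit the delay approaches rather than reaches, which is why a worked timeline lands a second or so short of it.

```python
def node_down_alert_delay_bounds(poll_interval=30, warning=120, alert_query=60):
    """Approximate (best, worst) seconds from a real outage to a Node Down alert.

    Assumes no hold-down time in the alert definition. The worst case is an
    upper limit that is approached, not reached exactly.
    """
    best = warning                                  # ping fails at once; query coincides with DOWN
    worst = poll_interval + warning + alert_query   # outage just after a good ping; query just missed
    return best, worst


print(node_down_alert_delay_bounds())            # (120, 210)
print(node_down_alert_delay_bounds(warning=60))  # (60, 150)
```

The second call shows why the warning level matters: it appears in both bounds, so lowering it shortens the best and worst cases alike without adding any polling traffic.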


            Optimal Conditions:

            t0 = node goes down in real life and ping fails (warning starts)

            t60 = alert definition queries the database - node is in warning (not down) - no alert is triggered

            t120 = warning is over and node is marked DOWN

            t120 = alert definition query just happens to coincide with down event and alert is triggered.


            More likely:

            t0 = ping replies, node is up

            t1 = node goes down in real life

            t30 = ping fails (warning starts)

            t29 = alert definition queries the database - node is still marked UP (the ping has not failed yet) - no alert is triggered

            t89 = alert definition queries the database - node is in warning (not down) - no alert is triggered

            t149 = alert definition queries the database - node is in warning (not down) - no alert is triggered

            t150 = warning is over and node is marked DOWN

            t180 = ping fails (node is still down)

            t209 = alert definition queries the database - node is down and alert fires
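The alert-query side of these timelines can be modelled on its own: an alert fires at the first 60-second query that finds the node marked DOWN. A minimal Python sketch, assuming queries run on a fixed schedule starting at first_query; first_alert_time and its parameters are hypothetical names, and up_at=None means the node does not recover within the horizon.

```python
from typing import Optional


def first_alert_time(
    down_at: int,
    up_at: Optional[int],
    first_query: int,
    query_interval: int = 60,
    horizon: int = 600,
) -> Optional[int]:
    """Return the first query time that sees the node DOWN, or None if none does."""
    t = first_query
    while t <= horizon:
        is_down = down_at <= t and (up_at is None or t < up_at)
        if is_down:
            return t
        t += query_interval
    return None


print(first_alert_time(down_at=120, up_at=None, first_query=60))  # optimal: 120
print(first_alert_time(down_at=150, up_at=None, first_query=89))  # more likely: 209
```

With DOWN at t150 and queries at t89, t149, t209, the t149 query misses the DOWN status by one second, which is where most of the extra delay in the "More likely" case comes from.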


            Worst case (missed event):

            t0 = ping passes, node is up

            t1 = node goes down in real life

            t30 = ping fails (warning starts)

            t29 = alert definition queries the database - node is still marked UP (the ping has not failed yet) - no alert is triggered

            t89 = alert definition queries the database - node is in warning (not down) - no alert is triggered

            t149 = alert definition queries the database - node is in warning (not down) - no alert is triggered

            t150 = warning is over and node is marked DOWN

            t180 = ping returns and node is marked UP

            t209 = alert definition queries the database - node is up and no alert fires (missed event)
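Whether an outage is caught at all depends only on whether some alert query lands inside the window during which the node is marked DOWN. A closed-form Python check under the same assumed fixed query schedule (hypothetical function name and parameters):

```python
def outage_is_missed(down_at: int, up_at: int, query_phase: int, query_interval: int = 60) -> bool:
    """True if no alert query falls in [down_at, up_at), the span the node is marked DOWN.

    Alert queries are assumed to run at query_phase + k * query_interval (k = 0, 1, 2, ...).
    """
    # Ceiling division: intervals from query_phase to the first query at or after down_at.
    k = max(0, -(-(down_at - query_phase) // query_interval))
    first_query_while_down = query_phase + k * query_interval
    return first_query_while_down >= up_at


print(outage_is_missed(down_at=150, up_at=180, query_phase=89))  # worst case above: True
print(outage_is_missed(down_at=150, up_at=210, query_phase=89))  # DOWN for 60 s: False
```

Any DOWN window at least one query interval long (60 s by default) is guaranteed to contain a query; shorter ones can slip through, as in the worst case above.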


            The takeaway from this? You are usually better off reducing the node warning level from its default of 120 seconds; that is what will actually "speed up" your alerting. Shortening your polling interval and alert query interval is possible, but it can put a very heavy load on your polling engine and should be done cautiously. Generally speaking, it is better to identify your highly critical devices, edit the intervals on those devices only, and leave the global defaults alone. But that is really a decision for you to make in your own environment; I just wanted to lay out the full alerting picture so you can make an informed decision.
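To put a rough number on the polling-engine load: in steady state each node gets one ping per polling interval, so ping volume scales with the inverse of the interval, and cutting it from 120 s to 30 s quadruples it. A back-of-the-envelope Python sketch; the function name and the 1,000-node figure are only examples, and the model counts nothing but ICMP status pings (plus the 10-second fast polls for nodes in WARNING), ignoring statistics polling and everything else the engine does.

```python
def icmp_pings_per_second(node_count: int, poll_interval: float,
                          nodes_in_warning: int = 0, fast_poll: float = 10) -> float:
    """Approximate ICMP status pings per second for one polling engine (rough model)."""
    steady = (node_count - nodes_in_warning) / poll_interval  # normal polling
    fast = nodes_in_warning / fast_poll                       # fast polling while in WARNING
    return steady + fast


print(icmp_pings_per_second(1000, 120))  # about 8.3 pings/s
print(icmp_pings_per_second(1000, 30))   # about 33.3 pings/s, four times as many
```

Per-device overrides keep this growth confined: changing 20 critical nodes to 30 s adds far less load than changing all 1,000.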


            Happy polling!





            -ZackM

            Loop1 Systems: SolarWinds Training and Professional Services

              • Re: Global default polling interval
                jodelgado

                Thanks for the detailed info, ZackM. You mentioned "a couple of things to remember when thinking about the ICMP polling intervals".

                 

                I'm assuming the same goes for SNMP?

                  • Re: Global default polling interval
                    zackm

                    Absolutely correct. Faster SNMP/WMI statistics polling will significantly increase the load on your polling engine as well.

                     

                    However, there is no equivalent of the Node Warning Level for statistics polling. The defaults are 9 minutes for interfaces, 10 minutes for nodes, and 15 minutes for volumes. If you need to change any of these for faster responses, I would make the change directly on the device(s) rather than changing the global defaults. That said, I work with different clients almost weekly, and it is VERY rare to see a real need to change these intervals, even at the device level. For reference, I don't think I have had a client with non-default intervals in at least 6 months. I'm not saying it isn't possible, or even necessary in the right circumstances; just be aware of what the costs of monitoring are.
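The same arithmetic applies to statistics polling. Using the default intervals listed above, here is a rough Python count of statistics collection cycles per hour for one device. The function name and the 48-interface, 2-volume device are hypothetical, and each cycle can involve several SNMP/WMI requests, so this understates the packet count.

```python
DEFAULT_STAT_INTERVALS_MIN = {"node": 10, "interface": 9, "volume": 15}


def stat_polls_per_hour(interfaces: int, volumes: int,
                        intervals: dict = DEFAULT_STAT_INTERVALS_MIN) -> float:
    """Statistics collection cycles per hour for one device (one node, N interfaces, M volumes)."""
    return (60 / intervals["node"]
            + interfaces * 60 / intervals["interface"]
            + volumes * 60 / intervals["volume"])


print(stat_polls_per_hour(48, 2))  # 334.0 with the defaults
print(stat_polls_per_hour(48, 2, {"node": 10, "interface": 1, "volume": 15}))  # 2894.0
```

Dropping the interface interval to 1 minute, as in the earlier reply in this thread, takes this single device from 334 to 2,894 collection cycles per hour.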