Level 10

Global default polling interval

I've changed what I thought was the global default polling interval from 120 seconds to 30, but nodes still show 120 seconds. Does this change only apply to newly added devices, so that I have to edit the polling interval on every existing node?

120s seems a bit long. We found that our old system (SNMPC, which defaults to 30s) was actually alerting us to events before SW knew about them, so we want to change the interval to get the same level of awareness.


5 Replies
Level 15

Just to add some insight to this thread, there are a couple of things to remember when thinking about ICMP polling intervals.


  1. The default for all alert definitions is to only query the database every 60 seconds to look for new alerts/resets.
  2. The default warning interval for all node down events is 120 seconds.


So, for reference, if you use a 30 second polling interval and leave the other 2 items above at their defaults, you would get a scenario like this:


  • A single ping is sent every 30 seconds. If it returns, the node is marked UP and the next ping goes out in 30 seconds.
  • If the packet does not return, the node is marked WARNING and moves into a fast poll for 120 seconds (1 ping every 10 seconds).
  • If any of these fast-poll pings returns, the node goes back to UP status and the next ping is in 30 seconds.
  • If all of them fail, the node is marked DOWN and the next ping is in 30 seconds.
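
The sequence above can be sketched as a small state machine. This is only an illustration of the behavior described in this post, not SolarWinds code: the `simulate` function and the `reachable` callback are hypothetical names, and the exact moment the node is stamped DOWN (at the first failed ping that is at least 120 seconds after the warning started) is an assumption.

```python
POLL_INTERVAL = 30        # seconds between pings while UP or DOWN (example value)
FAST_POLL_INTERVAL = 10   # seconds between pings while in WARNING
WARNING_WINDOW = 120      # seconds in WARNING before the node is marked DOWN


def simulate(reachable, duration, first_poll=0):
    """Return the (time, new_status) transitions seen over `duration` seconds.

    `reachable(t)` is a hypothetical callback: True if a ping sent at t returns.
    """
    status = "UP"
    warning_started = None
    transitions = []
    t = first_poll
    while t <= duration:
        if reachable(t):
            new_status = "UP"
        elif status == "UP":
            new_status = "WARNING"
            warning_started = t
        elif status == "WARNING" and t - warning_started >= WARNING_WINDOW:
            new_status = "DOWN"  # assumption: stamped DOWN on this failed ping
        else:
            new_status = status  # still WARNING or still DOWN
        if new_status != status:
            transitions.append((t, new_status))
            status = new_status
        # Fast poll only while the node is in WARNING.
        t += FAST_POLL_INTERVAL if status == "WARNING" else POLL_INTERVAL
    return transitions


# Node fails at t=1 and never recovers.
print(simulate(lambda t: t < 1, duration=240))
# [(30, 'WARNING'), (150, 'DOWN')]

# Node fails at t=1 and recovers at t=170 (hypothetical recovery time).
print(simulate(lambda t: t < 1 or t >= 170, duration=240))
# [(30, 'WARNING'), (150, 'DOWN'), (180, 'UP')]
```

Note that in the second run the recovery at t=170 is not seen until the next regular ping at t=180, because fast polling stops once the node is marked DOWN.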


So, how does that play out for a Node Down alert?


It could take between 120 and 209 seconds before you are alerted to a Node Down event (assuming your alert definition has no time threshold holding down the trigger, e.g. "alert me when a node has been down for XX minutes").


Optimal Conditions:

t0 = node goes down in real life and ping fails (warning starts)

t60 = alert definition queries the database - node is in warning (not down) - no alert is triggered

t120 = warning is over and node is marked DOWN

t120 = alert definition query just happens to coincide with down event and alert is triggered.


More likely:

t0 = ping replies, node is up

t1 = node goes down in real life

t29 = alert definition queries the database - node still shows up (the failed ping has not happened yet) - no alert is triggered

t30 = ping fails (warning starts)

t89 = alert definition queries the database - node is in warning (not down) - no alert is triggered

t149 = alert definition queries the database - node is in warning (not down) - no alert is triggered

t150 = warning is over and node is marked DOWN

t180 = ping fails (node is still down)

t209 = alert definition queries the database - node is down and alert fires


Worst case (missed event):

t0 = Ping Passes

t1 = node goes down in real life

t29 = alert definition queries the database - node still shows up (the failed ping has not happened yet) - no alert is triggered

t30 = ping fails (warning starts)

t89 = alert definition queries the database - node is in warning (not down) - no alert is triggered

t149 = alert definition queries the database - node is in warning (not down) - no alert is triggered

t150 = warning is over and node is marked DOWN

t180 = ping returns and node is marked UP

t209 = alert definition queries the database - node is up and no alert fires (missed event)
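
The three timelines above can be reproduced with a few lines of arithmetic. This is a minimal sketch under the assumptions stated in this post (120 second warning window, alert query every 60 seconds); `node_down_alert_time` and its parameter names are made up for illustration and are not part of any SolarWinds API.

```python
def node_down_alert_time(first_failed_ping, alert_check_at, recovered_at=None,
                         warning_window=120, alert_interval=60):
    """Return the time the Node Down alert fires, or None if the event is missed.

    first_failed_ping: time of the ping that puts the node into WARNING.
    alert_check_at:    any one time at which the alert query is known to run.
    recovered_at:      time a ping marks the node UP again, if it recovers.
    """
    down_at = first_failed_ping + warning_window
    # First alert query at or after the moment the node is marked DOWN.
    elapsed = down_at - alert_check_at
    cycles = -(-elapsed // alert_interval)  # ceiling division
    next_check = alert_check_at + cycles * alert_interval
    if recovered_at is not None and recovered_at <= next_check:
        return None  # node is UP again before the query sees it DOWN
    return next_check


print(node_down_alert_time(first_failed_ping=0, alert_check_at=60))    # 120
print(node_down_alert_time(first_failed_ping=30, alert_check_at=89))   # 209
print(node_down_alert_time(first_failed_ping=30, alert_check_at=89,
                           recovered_at=180))                          # None
```

The three calls correspond to the optimal, more likely, and worst case scenarios, in that order.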


The takeaway from this? You are usually better off editing your warning level time from the default of 120 seconds. This is what will actually "speed up" your alerting. While editing your polling interval and alert query intervals is possible, it can put a very heavy load on your polling engine and should be done cautiously. Generally speaking, it is better to identify your highly critical devices, edit the intervals on those devices only, and leave the global defaults alone. But that is really a decision for you to make in your own environment. I just wanted to point out the full alerting picture so you can make an informed decision.
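
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes the simple model from this post: the worst-case delay is bounded by one polling interval (waiting for the first failed ping), plus the warning window, plus one alert query interval. The function names and the 1200-node count are hypothetical, and the ping rate ignores the extra fast-poll pings sent while a node is in WARNING.

```python
def worst_case_alert_delay(poll_interval, warning_window, alert_interval):
    """Upper bound (seconds) from real-life failure to the alert firing."""
    return poll_interval + warning_window + alert_interval


def steady_state_pings_per_second(node_count, poll_interval):
    """Average ping rate the polling engine sends when every node is UP."""
    return node_count / poll_interval


NODES = 1200  # hypothetical environment size

for poll, warning in [(120, 120), (30, 120), (120, 30)]:
    delay = worst_case_alert_delay(poll, warning, alert_interval=60)
    rate = steady_state_pings_per_second(NODES, poll)
    print(f"poll={poll}s warning={warning}s -> "
          f"delay under {delay}s, {rate:.0f} pings/s")
```

Under these assumptions, shortening the warning window to 30 seconds gives the same bound as shortening the polling interval to 30 seconds, without the fourfold increase in steady-state pings.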


Happy polling!





-ZackM

Loop1 Systems: SolarWinds Training and Professional Services

Level 10

Thanks for the detailed info, ZackM. You mentioned "a couple of things to remember when thinking about the ICMP polling intervals".

I'm assuming the same goes for SNMP?

Level 15

Absolutely correct. Shortening SNMP/WMI statistics polling intervals will significantly increase the load on your polling engine as well.

However, there is no equivalent to the Node Warning Level for statistics polling. The defaults are 9 minutes for interfaces, 10 minutes for nodes, and 15 minutes for volumes. If you need to change any of these for faster responses, I would make that change directly on the device(s) rather than in the global default settings. That said, I work with different clients on an almost weekly basis, and it is VERY rare to see a real need to change these intervals, even at the device level. For reference, I don't think I have had a client with non-default intervals in at least 6 months. I'm not saying it isn't possible, or even necessary in the right circumstances; just be aware of what the costs of monitoring are.
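
As a rough illustration of that cost, the default intervals translate into the following number of statistics polls per monitored element per day. The 1-minute interval in the last step is a hypothetical example, and the helper name is made up.

```python
MINUTES_PER_DAY = 24 * 60

# Default statistics polling intervals quoted above, in minutes.
DEFAULT_INTERVALS_MIN = {"interface": 9, "node": 10, "volume": 15}


def polls_per_day(interval_minutes):
    """Number of statistics polls per element per day at a given interval."""
    return MINUTES_PER_DAY / interval_minutes


for kind, minutes in DEFAULT_INTERVALS_MIN.items():
    print(f"{kind}: {polls_per_day(minutes):.0f} polls per day per element")

# Hypothetical change: interface statistics every 1 minute instead of every 9.
multiplier = polls_per_day(1) / polls_per_day(DEFAULT_INTERVALS_MIN["interface"])
print(f"1-minute interface polling is {multiplier:.0f}x the default poll count")
```

The poll count, and with it the stored data volume, scales inversely with the interval, which is why per-device changes are usually cheaper than changing the global default.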

Level 9

I just changed the default interface statistics polling interval from 9 minutes to 1 minute to get more granular reports. I noticed that it only takes effect for interfaces I added after making the change, or for interfaces that I remove and re-add. Hope this helps.

Level 9

I just clicked on the "Re-apply polling statistics interval" and this seems to have added my new polling interval to existing interfaces.  Woo hoo!