Summary: We want one of our alerts to fire only after 4 consecutive node status checks report the node as "down" (we don't want false positives, hence the desire for 4 consecutive status checks).
In order to do this, we need to reliably know how often SolarWinds checks the node status, and at this time that isn't clear to me. Is the node status check done at the frequency shown on the node details page (the "Polling Interval")? Or is it done at the frequency defined on the Polling Settings page?
Details:
We're using SolarWinds to validate that our nodes (servers) are up and working. We are also using Server and Application Monitor and have configured alerts to page us when a node is in a "down" status.
But it isn't clear to me how to configure the frequency at which SolarWinds checks the node status (nor what the default is).
On the node details page I can see that the "Polling Details" section says the "Polling Interval" is 300 seconds.
Is that the frequency at which the polling engine will check the node status?
If I go to Polling settings I see this:
Does this mean that the poll interval is 120 seconds for every node? If so, why does it differ from the polling interval on the node details page? Or maybe this is a global default, meaning that the poll interval will be 120 seconds only for newly created nodes?
Note: I've tried clicking "re-apply polling intervals" on the above admin page, thinking it would change the poll interval for each node to 120 seconds, but when I go to the node details page the poll interval remains at 300 seconds. (I've tried to follow the instructions here: https://documentation.solarwinds.com/en/success_center/orionplatform/content/core-polling-intervals-sw1826.htm , but as mentioned, after clicking the re-apply polling intervals button, I see no change to the polling interval on the node details page.)
Thus, it isn't clear to me how often my nodes are being checked for their status. It feels like 300 seconds because that is what the node details page says?
Thank you~!
Note: I'm asking because we are getting paged when a node goes down and last night we got paged at 2am. But when we inspected the node, everything was fine.
Message Center is telling us the following:
9:08pm - node has stopped responding
9:11pm - node is responding again
10:48pm - node has stopped responding
10:51pm - node is responding again
1:43am - node is not responding
1:46am - node is responding again
1:53am - node is not responding
2:04am - alert fires
2:06am - node is responding again
2:09am - alert recovers
Our alert is configured to evaluate the trigger condition every 2 minutes, and the condition must exist for more than 7 minutes. That explains why the alert didn't fire until 2:04am: the node had been down for 11 minutes at that point, but if the condition is only checked every 2 minutes and must exist for more than 7 minutes, some delay beyond the 7 minutes is expected.
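To make the timeline easier to reason about, here is a rough sketch that computes how long each outage in the Message Center log lasted and which ones would outlast a given "condition must exist for" value. The calendar dates and the function names (outage_minutes, would_page) are placeholders of mine, not anything from SolarWinds, and it deliberately ignores the 2-minute evaluation cycle, so treat it as an approximation only.

```python
from datetime import datetime

# Message Center events from this post. The post only gives times of day,
# so the calendar dates below are placeholders to handle the midnight rollover.
DAY1, DAY2 = "2000-01-01", "2000-01-02"
outages = [
    (f"{DAY1} 21:08", f"{DAY1} 21:11"),
    (f"{DAY1} 22:48", f"{DAY1} 22:51"),
    (f"{DAY2} 01:43", f"{DAY2} 01:46"),
    (f"{DAY2} 01:53", f"{DAY2} 02:06"),
]


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM' timestamp."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def outage_minutes(events):
    """Return the length of each (down, up) pair in whole minutes."""
    return [int((parse(up) - parse(down)).total_seconds() // 60)
            for down, up in events]


def would_page(durations, must_exist_minutes):
    """True for each outage that lasted longer than the hold time."""
    return [d > must_exist_minutes for d in durations]


durations = outage_minutes(outages)
print(durations)                  # [3, 3, 3, 13]
print(would_page(durations, 7))   # [False, False, False, True]
print(would_page(durations, 20))  # [False, False, False, False]
```

With the current 7-minute setting only the last outage is long enough to page, which matches what we saw. With a 20-minute setting none of last night's outages would have paged.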
Obviously we have a communication issue. However, we don't want our alerts to be overly sensitive and page us when the node is fine. Thus we are adjusting our trigger condition, and we wanted to adjust the frequency at which SolarWinds checks the node status.
We're thinking we want our alert to fire only if 4 consecutive checks of the node status indicate a down node. In order to do that, though, we need to reliably know how often SolarWinds is checking the node status. If it is checking every 300 seconds, we'll adjust the "condition must exist for" setting to 20 minutes (4 consecutive polls). If it is checking every 120 seconds, then we'll probably adjust the "condition must exist for" setting to 8 minutes (or perhaps a little more to avoid false positives).
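For reference, this is the arithmetic I'm using to go from a polling interval to a "condition must exist for" value. It's only a sketch: the function name and the optional margin parameter are my own, and it assumes that a condition lasting N times the polling interval corresponds to roughly N consecutive polls, which is exactly the part I'm unsure about.

```python
def must_exist_minutes(poll_interval_seconds, consecutive_polls, margin_seconds=0):
    """Minutes needed to span the given number of polls, rounded up.

    margin_seconds is an optional safety buffer (an assumption of mine,
    not a SolarWinds setting).
    """
    total_seconds = poll_interval_seconds * consecutive_polls + margin_seconds
    return -(-total_seconds // 60)  # ceiling division to whole minutes


print(must_exist_minutes(300, 4))                     # 20
print(must_exist_minutes(120, 4))                     # 8
print(must_exist_minutes(120, 4, margin_seconds=60))  # 9
```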
Thanks again