Working on an alert for response time using NPM v10.7. If I am correct, the warning and critical response time thresholds that I can set on the Node edit properties page are related to "current" response time. If my trigger condition is pointed to the variable Response Time and the Trigger must be sustained for 4 minutes (2 polls), why am I not seeing any alerts when I have a node that has been above the critical threshold for several polls in a row?
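For anyone following along, here is a rough sketch of how a "sustained for 2 polls" condition would be expected to behave. It is only an illustration, not SolarWinds code: the `SustainedTrigger` class, its parameter names, and the 500 ms threshold are all made up for the example.

```python
class SustainedTrigger:
    """Fires only after the condition has held for `required_polls` polls in a row.

    Hypothetical helper for illustration; not part of NPM.
    """

    def __init__(self, threshold_ms, required_polls):
        self.threshold_ms = threshold_ms
        self.required_polls = required_polls
        self.consecutive = 0  # polls in a row above the threshold

    def observe(self, response_time_ms):
        """Feed one polled value; return True if the alert should fire."""
        if response_time_ms > self.threshold_ms:
            self.consecutive += 1
        else:
            # Any poll at or below the threshold resets the run.
            self.consecutive = 0
        return self.consecutive >= self.required_polls


# Assumed example values: 500 ms critical threshold, 2 polls = 4 minutes.
trigger = SustainedTrigger(threshold_ms=500, required_polls=2)
results = [trigger.observe(v) for v in [620, 410, 530, 560, 700]]
print(results)  # [False, False, False, True, True]
```

Under this model, a node that stays above the threshold for two or more polls in a row should fire, which is why the missing alert is surprising.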
aLTeReGo, what's the difference between the average response time and the current response time? What time frame is the average taken over?
Also, I can see the two column names below in the alert trigger configuration. What's the difference between these two?
It depends upon the context and where it's used. If you select this metric in Alerting, the last polled value is used, so no average is applied. If you select this same metric in reporting and apply a timeframe like last 30 days, then the value is averaged over the last 30 days.
So if I have to create an alert on "If node response time is higher than 20 ms for last 30 minutes", what do I use? Current response time? Does it store the current response time value in the database for each poll, or does it keep just the last poll and then start averaging it?
If there is only one value for current response time, then my alert condition of "higher than 20 for X minutes" will never fire; it can only be good for a last-poll condition.
If it is based on average response time, then I am afraid the averaging would never capture spikes that persist for a consistent period of time.
Correction to my statement above. A quick peek at the source code reveals that Min, Max, and Average Response Times are calculated based upon the last 10 polled values.
So as long as my wait time spans fewer than the last 10 polls, I can use current response time? Which of the below should I be using if I am polling
Also confused between:
It is a critical alert, and I don't want to have to explain to my management why the alert did not trigger when needed. Your help will be appreciated.
'Response Time' is the last polled value. The Average, Minimum, and Maximum are based upon the last 10 values. Depending upon what you're most interested in, you will need to choose the value that's best for you. 'Average' is the most common as current 'Response Time' and Min/Max values can be heavily influenced by a single value.
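To make the "last 10 values" behavior concrete, here is a minimal sketch of a rolling window. This is my own illustration of the description above, not the actual NPM source; the `ResponseTimeStats` class and the sample values are assumptions.

```python
from collections import deque


class ResponseTimeStats:
    """Keeps the last `window` polled values (hypothetical helper, not NPM code)."""

    def __init__(self, window=10):
        # deque with maxlen drops the oldest value automatically
        self.values = deque(maxlen=window)

    def add(self, ms):
        self.values.append(ms)

    def summary(self):
        vals = list(self.values)
        return {
            "current": vals[-1],  # last polled value
            "min": min(vals),
            "max": max(vals),
            "average": sum(vals) / len(vals),
        }


stats = ResponseTimeStats(window=10)
# Nine quiet polls followed by a single 300 ms spike (made-up data).
for ms in [12, 14, 11, 13, 12, 15, 13, 12, 14, 300]:
    stats.add(ms)
summary = stats.summary()
print(summary)
```

With this sample data, the single spike sets both the current value and the max to 300 ms, while the average only rises to 41.6 ms.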
So if I need to wait for 30 minutes (and I poll every 2 minutes), current response time is out of the question, since it only stores the last polled value and is meaningless for my alerting.
Using Average is not ideal either, because averaging can pull down a consistently high value.
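A quick way to reason about this is to compare "every poll above the threshold" with "average above the threshold" over the same window. The sketch below assumes a 30-minute window at a 2-minute poll interval (15 polls, which is more than the 10 values mentioned earlier in the thread); the function names and sample data are made up and this is not how NPM evaluates alerts.

```python
def polls_needed(window_minutes, poll_interval_minutes):
    """Number of polls that fit in the alert window."""
    return window_minutes // poll_interval_minutes


def all_above(values, threshold):
    """True only if every polled value exceeds the threshold."""
    return all(v > threshold for v in values)


def average_above(values, threshold):
    """True if the mean of the polled values exceeds the threshold."""
    return sum(values) / len(values) > threshold


n = polls_needed(30, 2)  # 15 polls in 30 minutes

steady_high = [25] * n              # consistently above 20 ms
one_spike = [10] * (n - 1) + [400]  # quiet, then a single large spike

print(all_above(steady_high, 20), average_above(steady_high, 20))
print(all_above(one_spike, 20), average_above(one_spike, 20))
```

If every value in the window is above 20 ms, the average of that window is above 20 ms as well, so a consistently high run still passes an average-based check. The mismatch goes the other way: a single large spike can push the average over the threshold even though the node was not high for the whole window.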
Also, in my screenshot above I see two Average Response Time values. Which one is more suitable for the alert condition I have already explained above: node response time polled value consistently higher than 20 ms for the last 30 minutes?
yes, that's right!
Since both the 'average response time' and 'current response time' variables are available for setting up the trigger condition, it's really confusing what 'average response time' implies here.
Thanks for the quick help!
Well I finally got an alert but I should not have. The current response time in the alert message was 349 ms, which is NOT greater than either the warning or critical thresholds assigned on the Edit Properties page.
So why did I get an alert?