For reporting purposes I've implemented CPU monitors but I've set their max value threshold to 100% so I never get alerts. Now that I've got some historical data on CPU averages I want to set my thresholds so I get alerts when I go above 10% average. I am having some trouble figuring out how the Last-Value and Short-Term thresholds work with one another. Would someone please provide a quick synopsis of how the two work with each other along with an example of how the accumulated failures per alert play into it? To put it into context, let's say I want to alert if my CPU hits 60% but only if it's been at higher than 40% for the last 5 minutes.
The CPU Usage Monitor utilizes two separate threshold values when analyzing test results.
1. Last-Value Threshold - This is the primary threshold rate, which is compared to the value acquired by the Monitor during the most recently performed test.
2. Short-Term Threshold - This is the secondary threshold rate. It uses the test results accumulated over the period of time set within the Sample Size field to detect a slow CPU load creep versus a spike in processor usage.
If the Last-Value Threshold rate is exceeded, the CPU Usage Monitor will re-analyze the data by comparing it against the Short-Term Threshold value specified. During this comparison process, the most recent test results are measured against the data obtained from tests that occurred within the length of time specified in the Sample Size field.
This dual threshold method prevents false Alerts from occurring each time a spike in CPU consumption is detected. Alerts will only be triggered if processor load does not return to normal within the required period of time, indicating a steady CPU load increase that may affect system performance.
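To make the two-stage check concrete, here is a minimal Python sketch of one way to read the description above. It is not ipMonitor's actual implementation: the function name `should_fail`, the `(timestamp, cpu_percent)` sample format, and the choice to compare the window average against the Short-Term Threshold are all assumptions made for illustration.

```python
from statistics import mean


def should_fail(samples, last_value_threshold, short_term_threshold, sample_size):
    """Hypothetical two-stage check.

    samples: list of (timestamp_seconds, cpu_percent) tuples, oldest first.
    sample_size: length of the short-term window in seconds.
    """
    if not samples:
        return False

    latest_time, latest_value = samples[-1]

    # Stage 1: the most recent test result must exceed the Last-Value Threshold.
    # If it does not, no further analysis happens.
    if latest_value <= last_value_threshold:
        return False

    # Stage 2: look at the results collected within the Sample Size window and
    # compare their average (an assumption) against the Short-Term Threshold.
    window = [value for t, value in samples if latest_time - t <= sample_size]
    return mean(window) > short_term_threshold


# Scenario from the question: alert at 60%, but only if the load has been
# above 40% over the last 5 minutes (300 seconds).
spike = [(0, 20), (60, 20), (120, 20), (180, 20), (240, 20), (300, 70)]
sustained = [(0, 50), (60, 55), (120, 50), (180, 55), (240, 50), (300, 70)]

print(should_fail(spike, 60, 40, 300))
print(should_fail(sustained, 60, 40, 300))
```

Under these assumptions the isolated spike does not fail the test, because the window average stays well under 40%, while the sustained load does.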
Thanks for the reply. I was playing around with this for a while and I think I'm missing something still. I set my Last-Value Threshold to 99% and my Short-Term Threshold to 5% for 120 seconds to see if it would cause the monitor to fail. The machine in question runs at 60-80% all the time so I thought I'd get an alert for certain, but I never saw the monitor go into a failed state. If the monitor works as explained I should have seen it fail because it exceeded the Short-Term Threshold even though it never hit the Last-Value Threshold, right?
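The Last-Value Threshold is the key. It is what tells IPM that it needs to start further analysis, so the Short-Term Threshold is never checked unless the Last-Value Threshold is exceeded first. Since your machine runs at 60-80%, the Last-Value Threshold would need to be set below 60% in order for the alert to trigger. As you have it set now, the CPU would have to spike to 99% or above for 120 seconds.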
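As a rough illustration of why the 99% setting never fails, the sketch below checks only the first gate, using made-up readings in the 60-80% range mentioned earlier in the thread. The readings and the helper name `first_gate_opens` are hypothetical, and this models the explanation given here rather than IPM's internals.

```python
def first_gate_opens(readings, last_value_threshold):
    """Return True if any single reading exceeds the Last-Value Threshold."""
    return any(r > last_value_threshold for r in readings)


# Hypothetical CPU readings for a machine that runs at 60-80% all the time.
readings = [62, 75, 80, 68, 71]

# With the Last-Value Threshold at 99%, no reading gets past the first gate,
# so the 5% Short-Term Threshold is never evaluated.
print(first_gate_opens(readings, 99))

# With the Last-Value Threshold below 60% (55% here), every reading exceeds it
# and the short-term analysis can begin.
print(first_gate_opens(readings, 55))
```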