Hi all,
I'm hoping some wise gurus on Thwack can help me with this.
I'm having a massive headache configuring alerts for my servers. I'm trying to configure CPU and memory alerts that send me an email whenever utilization crosses a certain threshold. One special condition must be met before the alert fires: the utilization must stay high for a period of time.
So my general setup is:
Evaluate the trigger condition every: 1min
The actual trigger condition: Node | CPU Load | Is Greater than | 85%
Condition must exist for more than: 2min
Looking at it, I think this basically does what I need.
However, the alerts don't behave like that at all! This is what frustrates me. I'm working with the support guys and trying to explain it to them, but I doubt they even understand me.
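To make the behaviour I expect concrete, here is a rough Python sketch. It is only my reading of the three settings above, not how SolarWinds actually evaluates alerts. The function name, the sample lists, and the CPU percentages are all made up for illustration, and I'm assuming one CPU reading per 1-minute evaluation.

```python
from datetime import datetime, timedelta

THRESHOLD = 85.0                # "CPU Load Is Greater than 85%"
SUSTAIN = timedelta(minutes=2)  # "Condition must exist for more than: 2min"


def expected_alerts(samples):
    """Return the timestamps at which I would expect an alert to fire.

    samples: list of (timestamp, cpu_percent) in time order,
    one entry per evaluation (hypothetical data, one per minute).
    """
    fired = []
    breach_start = None  # when the current run of high readings began
    alerted = False      # fire only once per continuous breach
    for ts, cpu in samples:
        if cpu > THRESHOLD:
            if breach_start is None:
                breach_start = ts
            # Fire only when the breach has lasted MORE than SUSTAIN.
            if not alerted and ts - breach_start > SUSTAIN:
                fired.append(ts)
                alerted = True
        else:
            # Any reading at or below the threshold resets the timer.
            breach_start = None
            alerted = False
    return fired


def at(hour, minute):
    # Helper to build timestamps on an arbitrary, made-up date.
    return datetime(2000, 1, 1, hour, minute)


# A short spike: high at 1:15PM and 1:16PM, back to normal by 1:17PM.
short_spike = [(at(13, 14), 40.0), (at(13, 15), 95.0), (at(13, 16), 92.0),
               (at(13, 17), 50.0), (at(13, 18), 45.0)]

# A sustained load: high from 1:15PM through 1:18PM.
sustained = [(at(13, 14), 40.0), (at(13, 15), 95.0), (at(13, 16), 92.0),
             (at(13, 17), 90.0), (at(13, 18), 91.0)]

print(expected_alerts(short_spike))
print(expected_alerts(sustained))
```

With this logic the short spike should never alert, because the high readings span less than 2 minutes, while the sustained case should alert once, at the first evaluation where the breach has lasted more than 2 minutes.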
Here's what I'm getting.
- I receive an email alert at 1:18PM stating that I have high CPU.
- I log on to the SolarWinds console to check the server: the server is normal, no high CPU!
- I open Perf Analyzer and zoom in to between 1PM and 2PM. The avg CPU load chart confirms there was indeed a spike at 1:15PM.
- However, the spike didn't last long. By around 1:17PM utilization had come back down, yet at 1:18PM I receive an email for an alert that happened at 1:15PM!!
What has gone wrong here? I've been explaining my situation to the support guy, but he just didn't understand me. He stated that this behaviour is normal and that I should increase "Condition must exist for more than" to a higher value. However, in a running production environment, any server that has high CPU / memory for longer than 2 mins, or even 5 mins, will suffer terrible performance issues.
Please, dear Thwack community, is there any way I can adjust this to fulfill my requirement? I'm pulling my hair out every time I try to explain it to the support guys.