Hi all, I'm trying to configure some alerts on my servers, but I'm having difficulty configuring them to meet my company's requirements. E.g.:
Send an email alert if server CPU goes above 70%. Alert the engineers every 4 hours; the server CPU must stay above 70% for more than 1 minute.
Based on that information, I set it up as below:
Evaluation Frequency of alert - 4hrs
Select option "Condition must exist for more than" - 1 min
By rights it should work, correct?
However, after several days I noticed a problem: none of my engineers received any alerts even though the server CPU went above 70%. I checked the historical utilization, and there were indeed several times where the CPU hit 80%-90% for more than 10 mins, but no alerts were triggered. Does anyone know why? I suspect this is due to the option "Evaluation Frequency of alert - 4hrs", because SW could be scanning for this alert only every 4 hours, so if the issue occurs and ends in less than 4 hours, SW will not capture it.
Is this true? If yes, is there another way I could configure the alert to meet my requirements, given that SW must be actively scanning the servers for all these alerts?
Like @stuartd mentioned, the issue is the evaluation frequency of the alert. This setting defines how often the alert trigger conditions are checked against the SolarWinds database.
So if you set it to 10 minutes, SolarWinds will check every 10 minutes whether the alert trigger conditions are met.
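To illustrate why a long evaluation frequency can miss a short spike, here is a rough Python sketch. It is only a toy model, not how SolarWinds is implemented: it assumes evaluations run at fixed intervals starting at midnight, and the spike times are made up.

```python
def evaluation_times(every_min, horizon_min=24 * 60):
    """Minutes (from midnight) at which the alert condition is evaluated."""
    return range(0, horizon_min, every_min)


def spike_is_seen(spike_start, spike_end, every_min):
    """True if at least one evaluation falls inside [spike_start, spike_end)."""
    return any(spike_start <= t < spike_end for t in evaluation_times(every_min))


# Hypothetical 10-minute CPU spike from 01:00 to 01:10 (minutes 60 to 70).
seen_with_4h = spike_is_seen(60, 70, every_min=240)   # evaluate every 4 hours
seen_with_10m = spike_is_seen(60, 70, every_min=10)   # evaluate every 10 minutes
print(seen_with_4h, seen_with_10m)  # False True
```

With a 4-hour evaluation the checks land at 00:00, 04:00, 08:00 and so on, so a spike that starts and ends between two checks is never looked at.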
However, if you set the option "condition must exist for x", keep in mind it's only checked against the SolarWinds database.
By default SolarWinds polls CPU every 10 minutes, so the value in the database won't change for 10 minutes, which makes "condition must exist for 1 minute" largely meaningless.
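A small Python sketch of that effect, under simplified assumptions: polls happen exactly every 10 minutes, the database holds the last polled sample until the next poll, and the load curve (`actual_cpu`) is invented for the example.

```python
POLL_EVERY = 10  # minutes; the default CPU polling interval mentioned above


def actual_cpu(minute):
    """Hypothetical real CPU load: a short spike during minutes 10 to 12."""
    return 95 if 10 <= minute <= 12 else 30


def stored_cpu(minute):
    """Value in the database at `minute`: the sample taken at the latest poll."""
    last_poll = (minute // POLL_EVERY) * POLL_EVERY
    return actual_cpu(last_poll)


# Count minutes above the 70% threshold in reality vs. in the database.
real_minutes_over = sum(actual_cpu(m) > 70 for m in range(30))
stored_minutes_over = sum(stored_cpu(m) > 70 for m in range(30))
print(real_minutes_over, stored_minutes_over)  # 3 10
```

In this model a 3-minute spike that happens to coincide with a poll looks like 10 minutes above threshold in the database, so a 1-minute "condition must exist" check would pass on a single sample.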
You could lower the polling interval for CPU/memory, but I wouldn't recommend that, because it will put a much higher load on your SolarWinds installation!
Or you need to adjust the requirements. If we stick to the 10-minute polling interval, I would recommend setting "condition must exist for x"
to 22 minutes. That way you get at least 2 polling intervals, and if the CPU is still above your trigger condition of >70%, it will trigger the alert.
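As a quick sanity check of the "at least 2 polls" reasoning, here is a Python sketch that counts how many polls fall inside a window of a given length, in the worst case. It assumes polls land exactly on multiples of the polling interval, which real polling will only approximate.

```python
def polls_in_window(window_start, window_len, poll_every=10):
    """Count polls (at multiples of poll_every) in [window_start, window_start + window_len)."""
    end = window_start + window_len
    return sum(1 for t in range(window_start, end) if t % poll_every == 0)


def min_polls(window_len, poll_every=10):
    """Worst case over every possible alignment of the window against the polls."""
    return min(polls_in_window(s, window_len, poll_every) for s in range(poll_every))


for window in (1, 10, 22):
    print(window, min_polls(window))
# 1 0
# 10 1
# 22 2
```

A 1-minute window may contain no poll at all, a 10-minute window contains exactly one, and a 22-minute window always contains at least two, with a little slack for polls that run late.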
I would set up your alert like this:
Evaluation Frequency of Alert - 1 minute
Trigger conditions however you need them (CPU > 70%)
"condition must exist for x" - 22 minutes
And to keep your engineers informed every 4 hours as long as the alert is active, look at the "execution options" on the trigger action tab, where you configure alert actions like sending an email.
Here you can set the action to be executed every 4 hours.
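To picture what a 4-hour repeat looks like, here is a small Python sketch of the resulting email schedule. It assumes the repeats stop once the alert is no longer active (reset or acknowledged); that is an assumption of this sketch, so check the execution settings in your own SolarWinds version. The function name and timestamps are made up.

```python
from datetime import datetime, timedelta


def email_times(triggered_at, inactive_at, repeat_every=timedelta(hours=4)):
    """Times an email would go out: at trigger, then every repeat_every while active."""
    times = []
    t = triggered_at
    while t < inactive_at:
        times.append(t)
        t += repeat_every
    return times


# Hypothetical alert: triggers at 08:00 and stops being active at 17:30 the same day.
sent = email_times(datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 17, 30))
print([t.strftime("%H:%M") for t in sent])  # ['08:00', '12:00', '16:00']
```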
Currently my alert is able to trigger, however it's causing quite a bit of havoc, as a lot of the alerts that were triggered are false positives. Meaning, by the time my users log in to SolarWinds, the node is no longer showing the issue.
I wanted to try setting "condition must exist" to 22 mins, but I have a concern here. Usually on a production system we can't afford to wait 22 mins before receiving the alerts. Yes, it will get rid of false positive alerts, but if it's a legit issue, having the node sustain the issue for 22 mins before alerting doesn't seem right.
Any idea how I can improve it?
What are your settings on the trigger condition screen?
Typically we set the following to 2m. As in, the node, interface, etc. must be down for 2m before it triggers.
For WAPs we don't really care about, we set that figure to 60m down before alerting. It seems to me that this is where you need to do your 22m setting.
>> Here you can set the action to be executed every 4 hours.
Ah yes, I'd forgotten that you can do escalations further into the alert, and it's probably the best place to do so.
Here are the changes I've made: the alert evaluation frequency is changed to 5 mins, and the trigger condition must exist for 1 min.
For email alerts, I made changes to the execution settings as below. This should ensure my email will be sent every 4 hrs as long as the condition is true, right?
If the engineers didn't acknowledge this alert but the condition is no longer true, will this action still send email since the engineers didn't acknowledge it?
I don't specifically know the answer you need and I'm not the best at understanding the logic behind building alerts, but I suspect your "Evaluation Frequency of Alert" being set to 4hrs is the issue.
As it stands, what I understand your alert is saying is:
"check every 4hrs", and if the alert trigger condition (whatever you've configured) is matched, then trigger.
Which is not what you want. I suspect what you need to do is investigate the trigger "reset condition" and have it reset after x hours, where x = 4. We have never used this section, so I don't know if this is what you need. But for me the logic works like this:
Others may know a better way.