
Configuring Alert

Hi all, I'm trying to configure some alerts on my servers, but I'm having difficulty configuring them to match my company's requirements. E.g.

Send an email alert if server CPU goes above 70%. Alert the engineers every 4 hours; server CPU must stay above 70% for more than 1 minute.

Based on that information, I set it up as below:

Evaluation Frequency of alert - 4hrs

Selected option "Condition must exist for more than" - 1 min

By rights this should work, correct?

However, after several days I noticed a problem: none of my engineers received any alerts even though server CPU went above 70%. I checked the historical utilization, and there were indeed several times when CPU hit 80%-90% for more than 10 mins, but no alerts were triggered. Does anyone know why? I suspect it's due to the "Evaluation Frequency of alert - 4hrs" option: SW may only be checking for this alert every 4 hours, so if the issue occurs and ends between two checks, SW will not capture it.

Is this true? If so, is there another way to configure the alert to meet my requirements, given that SW must be actively scanning the servers for these alerts?
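The suspicion above can be checked with a small sketch. This is only a simplified model, not how the SolarWinds alert engine is actually implemented: it assumes the engine looks at the data once per evaluation interval, starting at minute 0, and it ignores the "condition must exist" setting. The function names are made up for illustration.

```python
def evaluation_times(eval_every_min, horizon_min):
    """Minutes at which the (hypothetical) alert engine looks at the data."""
    return list(range(0, horizon_min, eval_every_min))


def spike_is_seen(spike_start, spike_len, eval_every_min, horizon_min):
    """True if any evaluation falls inside the spike window."""
    spike_end = spike_start + spike_len
    return any(spike_start <= t < spike_end
               for t in evaluation_times(eval_every_min, horizon_min))


DAY = 24 * 60

# A 10-minute CPU spike starting at minute 90 (01:30).
seen_4h = spike_is_seen(90, 10, 240, DAY)  # evaluated every 4 hours
seen_1m = spike_is_seen(90, 10, 1, DAY)    # evaluated every minute
print(seen_4h, seen_1m)  # False True
```

Under these assumptions, a 10-minute spike is only caught by a 4-hour evaluation if it happens to overlap one of the six daily checks.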

  • I don't specifically know the answer you need and I'm not the best at understanding the logic behind building alerts, but I suspect your "Evaluation Frequency of alert" set to 4hrs is the issue.

    As it stands, what I understand your alert is saying is:

    "check every 4hrs", and if the alert trigger condition (whatever you've configured) is matched, then trigger.

    Which is not what you want. I suspect what you need to do is investigate the trigger "reset condition" and have it reset after x hours, where x = 4. We have never used this section, so I don't know if this is what you need. But for me the logic works like this:

    • Evaluate every minute
    • If condition(s) are met then trigger alert
    • The alert is reset after 4hrs
    • At which point, if condition still exists then when it is next evaluated (1m) it will re-trigger

    Others may know a better way.
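The four steps in the reply above can be sketched as a tiny state machine. This is a simplified model of the poster's suggested logic, not the actual SolarWinds reset behaviour: it assumes one evaluation per minute, a purely time-based reset, and re-evaluation in the same minute as the reset. `simulate` and its parameters are hypothetical names.

```python
def simulate(cpu_by_minute, threshold=70, reset_after_min=240):
    """Return the minutes at which the alert fires.

    Assumed model: evaluate once per minute, fire when CPU is above the
    threshold and the alert is not active, and force a reset a fixed time
    after each firing so that a persisting condition fires again.
    """
    fired_at = []
    active_since = None
    for minute, cpu in enumerate(cpu_by_minute):
        # Timed reset: clear the alert 4 hours after it last fired.
        if active_since is not None and minute - active_since >= reset_after_min:
            active_since = None
        # Trigger: condition met and no alert currently active.
        if active_since is None and cpu > threshold:
            active_since = minute
            fired_at.append(minute)
    return fired_at


# CPU at 85% for 9 hours, then back to 20% for an hour.
cpu = [85] * (9 * 60) + [20] * 60
fired = simulate(cpu)
print(fired)  # [0, 240, 480]
```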

  • Hi ,

    As mentioned, the issue is the evaluation frequency of the alert. This setting defines how often the alert trigger conditions are checked against the SolarWinds database.

    So if you set it to 10 minutes, SolarWinds will check every 10 minutes whether the alert trigger conditions are met.

    However, if you set the option "condition must exist for x", keep in mind it's only checked against the SolarWinds database.

    By default SolarWinds polls CPU every 10 minutes, so the value in the database won't change for 10 minutes, which makes "condition must exist for 1 minute" more or less meaningless.

    You could lower the polling interval for CPU/memory, but I wouldn't recommend that, because it will put a much higher load on your SolarWinds installation!

    Or you need to adjust the requirements. If we stick to the 10-minute polling interval, I would recommend setting "condition must exist for x" to 22 minutes. That way you cover at least 2 polling intervals, and if the CPU is still above your trigger condition of >70%, it will trigger the alert.
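To illustrate why a 1-minute "condition must exist" adds little with 10-minute polling, here is a rough sketch. It assumes the alert engine evaluates every minute against the most recently polled value and that polls land on minutes 0, 10, 20, and so on; the real engine's timing will differ, and `db_value` and `first_trigger` are hypothetical helpers.

```python
def db_value(actual_by_minute, minute, poll_every=10):
    """Value the alert engine sees: the most recently polled sample."""
    last_poll = (minute // poll_every) * poll_every
    return actual_by_minute[last_poll]


def first_trigger(actual_by_minute, sustain_min, threshold=70, poll_every=10):
    """First minute at which the condition has held for sustain_min, or None."""
    true_since = None
    for minute in range(len(actual_by_minute)):
        if db_value(actual_by_minute, minute, poll_every) > threshold:
            if true_since is None:
                true_since = minute
            if minute - true_since >= sustain_min:
                return minute
        else:
            true_since = None
    return None


# A one-minute blip that happens to coincide with the poll at minute 10.
blip = [30] * 60
blip[10] = 95

# Sustained load from minute 10 through minute 49.
sustained = [30] * 10 + [95] * 40 + [30] * 10

print(first_trigger(blip, 1), first_trigger(blip, 22))            # 11 None
print(first_trigger(sustained, 1), first_trigger(sustained, 22))  # 11 32
```

In this model the stale database value keeps the blip "true" for a full polling interval, so a 1-minute requirement fires on it anyway, while 22 minutes needs three consecutive high polls.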

    I would set up your alert like this:

    Evaluation Frequency of Alert - 1 minute

    trigger conditions how you need them (CPU >70%)

    "condition must exist for x" - 22 minutes

    And to keep your engineers informed every 4 hours as long as the alert is active, on the trigger action tab where you configure your alert actions like sending an email, please look at the "execution options".

    Here you can set the action to be executed every 4 hours.
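As a quick sanity check of the repeat schedule, the sketch below lists when the email action would run. It assumes the action runs once when the alert triggers and then every 4 hours only while the alert stays active; whether that matches the product in every case (for example with unacknowledged alerts) is not confirmed here. `email_minutes` is a made-up helper.

```python
def email_minutes(active_from, active_until, repeat_every_min=240):
    """Minutes at which the email action runs while the alert is active.

    Assumption: first run at trigger time, then one run per repeat interval,
    stopping as soon as the alert is no longer active.
    """
    return list(range(active_from, active_until, repeat_every_min))


# Alert triggers at minute 32 and the condition clears 9 hours later.
sent = email_minutes(32, 32 + 9 * 60)
print(sent)  # [32, 272, 512]
```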

    Best Regards

    Rene

  • >> Here you can set the action to be executed every 4 hours.

    Ah yes, I'd forgotten that you can do escalations further into the alert, and it's probably the best place to do so.

  • Thanks all! 

    Here are the changes I've made: the alert is now evaluated every 5 mins, and the trigger condition must exist for 1 min.

    For email alerts, I changed the execution settings as below. This should ensure my email is sent every 4 hrs as long as the condition is true, right?

    Jason_TF_0-1600672284044.png

    If the engineers haven't acknowledged the alert but the condition is no longer true, will this action still send emails because the alert is unacknowledged?

  • Hi  & ,

    Currently my alert is able to trigger, but it's causing quite a bit of havoc, as a lot of the triggered alerts are false positives. That is, by the time my users log in to SolarWinds, the node is no longer showing the issue.

    I wanted to try setting "condition must exist" to 22 mins, but I have a concern. On a production system we usually can't afford to wait 22 mins before receiving alerts. Yes, it will get rid of the false positives, but if it's a legitimate issue, having the node sustain it for 22 mins before alerting doesn't seem right.

    Any idea how I can improve it?

  • What are your settings on the trigger condition screen?

    Typically we set the following to 2m. As in, the node, interface, etc. must be down for 2m before it triggers.

    stuartd_0-1603098807323.png

    For WAPs we don't really care about, we set that figure to 60m down before alerting. It seems to me that this is where you need to do your 22m setting.