
CPU ALERTS FINE TUNING

I'm seeing a lot of alerts that have cleared by the time the alert email gets fired off.  For example, I just had an alert trigger and by the time the email was sent, the CPU was already below the 90% threshold.  See below:

The CPU on Server is currently running at 74 %. The top 10
processes running at the time of this poll are listed below:

Name            Process ID    CPU
oracle.exe      1684          44.16 %
WmiPrvSE.exe    2820          9.54 %
svchost.exe     924           3.42 %
System          4             1 %
sass.exe        588           1 %
oraagent.exe    4736          0.14 %
smss.exe        268           0 %
csrss.exe       408           0 %
wininit.exe     464           0 %
csrss.exe       476           0 %

If you acknowledge this alert, send an email to sysops with
your reasoning for acknowledging the alert rather than fixing it.

At the same time the following alert also was sent out:

CPU reset Alert

Since yesterday evening, I've seen about half a dozen of these occurrences, and I think I need to tweak my settings to cut down on the noise. Here is how I have the alerts set. Any tips or opinions on these alerts would be appreciated. My environment is running NPM 12.1, and yesterday I migrated it to a new server running Windows Server 2016 Standard.

[Screenshots of the alert settings: pastedImage_9.png, pastedImage_10.png, pastedImage_11.png, pastedImage_12.png]

  • You could configure one or more servers to send traps on CPU utilization to help validate processor utilization.

  • This is the problem with not having CPU queue length built in.

    Yes, you can hack it out with Leo's blog, but it sure would be nice to have an intelligent native CPU alert.

  • Without getting too crazy and going overboard with SQL/SWQL, you could create custom properties for impact, something like n_impact, and give each tier its own delay: for example, 5 minutes for High impact devices, 10 minutes for Medium, and 30 minutes for Low, the less significant devices.

    That means maintaining 3 alerts, which is annoying.

    You could also introduce something like zones: all devices in Zone 1 alert after 15 minutes of high CPU, and all devices in Zone 2, a less significant area, alert after 30 minutes.

    This is just one of a ton of ways to be creative with noise :)

  • Following up on what "i_like_eggs" mentioned, you could set up your alerts to trigger when utilization has been over 90% for more than X minutes, and then have the alert clear once CPU utilization has been below 85% for X minutes.

  • Here is a similar thread that recently started about the same thing. jvb added his two cents there, and so did I.

    Intelligence into alerting

  • How frequently is your SolarWinds instance polling for CPU and memory? My polls are set at 10 minutes, and my alert condition is set to wait 31 minutes before triggering.

    Doing so requires the CPU value to be at the threshold for at least 3 polls, and possibly 4, depending on polling sequences. I also have a reset condition set for half that time, so that we aren't getting multiple alerts and the alerts haven't reset by the time an action takes place.

    [Screenshot: pastedImage_0.png]
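
To make the suggestions in the replies above concrete, here is a rough Python sketch of a "sustained threshold with a lower reset level" alert. It is only a simplified model for reasoning about the settings, not how Orion's alert engine is actually implemented. The 90% trigger, 85% reset, 10 minute polls, and 31 minute wait come from the replies. The class and function names, the 15 minute reset hold (roughly half the trigger wait), the once-a-minute evaluation against the last polled value, and the sample CPU readings are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SustainedAlert:
    """Threshold alert with hold times and a lower reset level (hysteresis)."""
    trigger_pct: float = 90.0       # fire only when CPU stays above this
    reset_pct: float = 85.0         # clear only when CPU stays below this
    trigger_hold_min: float = 31.0  # how long the trigger condition must hold
    reset_hold_min: float = 15.0    # assumed: roughly half the trigger hold
    active: bool = False
    _since: Optional[float] = None  # start of the current qualifying streak

    def update(self, t_min: float, cpu_pct: float) -> Optional[str]:
        """Feed one evaluation; return "TRIGGER", "RESET", or None."""
        if self.active:
            qualifies, hold = cpu_pct < self.reset_pct, self.reset_hold_min
        else:
            qualifies, hold = cpu_pct > self.trigger_pct, self.trigger_hold_min
        if not qualifies:
            self._since = None      # streak broken, start over
            return None
        if self._since is None:
            self._since = t_min
        if t_min - self._since >= hold:
            self.active = not self.active
            self._since = None
            return "TRIGGER" if self.active else "RESET"
        return None


def simulate(polls: List[Tuple[int, float]], alert: SustainedAlert,
             end_min: int) -> List[Tuple[int, str]]:
    """Evaluate the alert once a minute against the most recent polled value.

    polls is a time-sorted list of (minute, cpu_pct) pairs.
    """
    events = []
    idx, latest = 0, None
    for t in range(end_min + 1):
        while idx < len(polls) and polls[idx][0] <= t:
            latest = polls[idx][1]
            idx += 1
        if latest is None:
            continue
        event = alert.update(t, latest)
        if event:
            events.append((t, event))
    return events


# A single high poll followed by 74% (like the email in the original post).
spike = [(0, 95), (10, 74), (20, 60)]
print(simulate(spike, SustainedAlert(), 30))

# CPU above 90% for four consecutive 10 minute polls, then falling off.
sustained = [(0, 95), (10, 96), (20, 93), (30, 92),
             (40, 88), (50, 80), (60, 70), (70, 60)]
print(simulate(sustained, SustainedAlert(), 70))
```

In this model the one-poll spike never produces an alert, while the sustained load fires once after four polls above 90% and does not reset at 88%, because that reading is still above the 85% reset level. Lengthening the hold trades noise for slower notification, so the right numbers depend on your polling interval and on how quickly you need to hear about a busy server.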