3 Replies Latest reply on Apr 26, 2018 11:42 AM by raymondakrawi

    CPU Alert with normal peak usage exclusions

    raymondakrawi

      I have recently recreated my CPU and memory alerts using a SWQL query in order to standardize the conditions across my 3 different instances. The new alerts are sent to an operations group, and they aren't happy with the recurring CPU and memory alerts, whether they fire during peak usage hours or during a backup or A/V scan.

       

      I am testing the option of creating new custom properties cpuPeakStart and cpuPeakEnd for CPU load, and memPeakStart and memPeakEnd for memory. In my SWQL query I will add an additional where condition similar to the one listed below.

      where
        hour(DATETIME) > 5
        and hour(DATETIME) < 8
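      The time-window test above can be sketched outside SWQL to check the logic. This is a minimal Python sketch, not the actual alert query: it assumes cpuPeakStart and cpuPeakEnd hold whole hours (0-23), that the alert should stay quiet while the current hour is inside the window, and that a window may wrap past midnight (for example a backup from 22 to 2). The function names and the 90% threshold are hypothetical.

```python
def in_peak_window(hour: int, start: int, end: int) -> bool:
    """Return True if hour is in [start, end), allowing windows that wrap past midnight."""
    if start == end:
        return False  # assumption: equal start and end means "no window configured"
    if start < end:
        return start <= hour < end
    # Window wraps past midnight, e.g. start=22, end=2 covers 22, 23, 0, 1.
    return hour >= start or hour < end


def should_alert(cpu_load: float, hour: int, start: int, end: int,
                 threshold: float = 90.0) -> bool:
    """Fire only when the load is high AND the hour is outside the expected peak window."""
    return cpu_load >= threshold and not in_peak_window(hour, start, end)
```

      Treating the start as inclusive and the end as exclusive is a design choice; the condition in the post (hour > 5 and hour < 8) corresponds to start=6, end=8 in this sketch.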

       

       

      Has anyone tried this approach, or does anyone have alternatives for handling recurring high KPI utilization?

       

      Another approach would be to unmanage the node, but that puts us at risk if a node goes down and no alerts get triggered.

        • Re: CPU Alert with normal peak usage exclusions
          mesverrum

          What is the expected procedure if these devices are using high CPU? 

          Is there a scenario that would actually require an intervention for these boxes?

           

          To me it sounds like no action is considered necessary. If you don't care during peak hours and you don't care during off hours, when would it ever actually matter? You could just come up with a scheme to exclude these devices from CPU alerting in general. Assuming you have SAM or some other form of application monitoring in place, it may be more effective to alert on a synthetic transaction against the application this server supports and rely on that, since CPU load is ultimately just an indirect measure that we normally associate with slow application performance.

          • Re: CPU Alert with normal peak usage exclusions
            nglynn

            What I did was create a 'Highly_Utilized' custom property that I could then add an exclusion condition for in my alert rules. So nodes that tend to run hot, where that is expected behavior, are excluded from my default rules. I then created a rule for the highly utilized nodes that is a bit more lenient. In addition, you could also adjust the scheduled hours for the specific alert rules if the peaks are consistent.
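            The two-rule arrangement can be pictured as picking a threshold per node from the custom property. This is only a rough Python sketch of that logic, not how the alert rules themselves are defined: the threshold values, the dictionary layout, and the function names are all hypothetical.

```python
DEFAULT_CPU_THRESHOLD = 90.0  # hypothetical threshold for the default rule
LENIENT_CPU_THRESHOLD = 98.0  # hypothetical threshold for Highly_Utilized nodes


def cpu_threshold(custom_properties: dict) -> float:
    """Pick the threshold that applies to a node based on its Highly_Utilized flag."""
    if custom_properties.get("Highly_Utilized", False):
        return LENIENT_CPU_THRESHOLD
    return DEFAULT_CPU_THRESHOLD


def exceeds_threshold(cpu_load: float, custom_properties: dict) -> bool:
    """True when the node's CPU load is at or above the threshold for its group."""
    return cpu_load >= cpu_threshold(custom_properties)
```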

            • Re: CPU Alert with normal peak usage exclusions
              raymondakrawi

              I like the option of creating an additional alert for the high CPU offenders, but that can become a nightmare as the number of offenders increases and the times the CPU peaks occur vary between devices.

               

              I was toying with the idea of creating cpuPeakStart and cpuPeakStop custom properties and then adding a condition in my SWQL alert query that makes sure the CPU load time is not between those values. One of many issues is that the date and time are stored together, and if I extract just the hour and minute and concatenate them, 7:08 displays as 7:8. It would be nice if the mute function were more granular and allowed repeats.
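              One way around the 7:8 problem is to avoid comparing concatenated strings at all and compare minutes since midnight instead, zero-padding only when a value has to be displayed. The following is a small Python sketch of that idea, not SWQL; it assumes the custom properties would be stored as "H:MM" text, and the helper names are hypothetical.

```python
from datetime import datetime


def minutes_since_midnight(ts: datetime) -> int:
    """Collapse the time of day into one integer so comparisons need no string padding."""
    return ts.hour * 60 + ts.minute


def parse_hhmm(value: str) -> int:
    """Convert text such as '7:08' (assumed property format) to minutes since midnight."""
    hours, minutes = value.split(":")
    return int(hours) * 60 + int(minutes)


def format_hhmm(ts: datetime) -> str:
    """Display form with the minute zero-padded, so 7:08 does not become 7:8."""
    return f"{ts.hour}:{ts.minute:02d}"


def outside_window(ts: datetime, peak_start: str, peak_stop: str) -> bool:
    """True when ts is outside [peak_start, peak_stop); assumes no wrap past midnight."""
    now = minutes_since_midnight(ts)
    return not (parse_hhmm(peak_start) <= now < parse_hhmm(peak_stop))
```

              Comparing integers sidesteps the padding issue entirely; the formatted string is then only needed for the alert message.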