
    Condition must exist for more than - Question

    happyal27

      Hello,

       

      I'm having an issue with the "Condition must exist for more than xx minutes" setting.

       

      Currently I have several alerts set up that use the "Condition must exist for more than" setting. The problem is that alerts are being logged that don't meet this condition.

       

      For example, I have an alert set up for a service. It evaluates the trigger condition every 30 minutes, and the condition must exist for more than 45 minutes. But it's logging an alert after the problem has existed for only 20 minutes.

       

      Can anyone see what is wrong with my setup? Is there something I can change to make it work?

        • Re: Condition must exist for more than - Question
          thirtyfivefox

          I would be curious what your polling interval on the node is, as well as what sort of variable is being compared in the alert metrics. Is it an average of usage or raw data? CPU/network load/memory utilization? Cheers.

          • Re: Condition must exist for more than - Question
            Geoff Smith

            Change it to evaluate every minute. What's happening is that you're alerting when the issue appears in 2 sample periods 30 minutes apart. For example:

             

            1:00 - high CPU   (check 1 sees high CPU)
            1:10 - low CPU
            1:20 - low CPU
            1:30 - high CPU   (check 2 sees high CPU)
            1:40 - low CPU
            1:45 - low CPU    (alert fires, because the engine has seen high CPU every time it checked over 45 minutes)

             

            So you're going to see some odd conditions with a setup like that.  I use long check times only for stuff that I don't care about knowing right away.  I don't need to check the expiration date of SSL Certs every minute, for example.
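
            A rough sketch of that effect, purely as an illustration: the timestamps mirror the timeline above, and the loop is a simplification of what sparse checking looks like, not how the SolarWinds engine is actually written.

            ```python
            # Why sparse evaluation can mistake intermittent spikes for a sustained problem.
            # Illustrative only; timestamps mirror the timeline above (minutes past 1:00),
            # and this "engine" is a simplification, not SolarWinds internals.

            samples = {0: "high", 10: "low", 20: "low", 30: "high", 40: "low", 45: "low"}

            CHECK_EVERY = 30   # evaluate the trigger condition every 30 minutes
            MUST_EXIST  = 45   # condition must exist for more than (minutes); treated as >= here

            condition_start = None
            for minute in range(0, 46):
                if minute % CHECK_EVERY == 0:                      # the engine only looks now
                    value = samples.get(minute, "low")
                    if value == "high" and condition_start is None:
                        condition_start = minute                   # condition first observed
                    elif value == "low":
                        condition_start = None                     # condition appears cleared

                if condition_start is not None and minute - condition_start >= MUST_EXIST:
                    print(f"alert fires at minute {minute}: condition 'seen' since minute "
                          f"{condition_start}, even though CPU was low most of that time")
                    break
            ```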

            • Re: Condition must exist for more than - Question
              ryan.davis26

              I hate the wording of these settings.  This is what I've done with all my alerts:

              I set the "Evaluation Frequency" to match the polling of the object I want to alert on.

              So as an example, if I'm polling node statistics every 3 minutes, there is absolutely no point in setting this value to anything less than 3 minutes - you're just wasting resources on your poller by evaluating more often than you're actually polling the data.

              I set the "Condition Must Exist For More Than" based on the evaluation frequency.  I consider this setting a "trigger interval"

              So as an example, if I'm evaluating the conditions of the alert (evaluation frequency) every 3 minutes and I want to trigger the alert after, say, 5 polling cycles, then I'll set the "Condition Must Exist For More Than" value to 15 minutes.
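
              To make that arithmetic explicit, here's a minimal sketch of how I derive the two values; it's plain Python with made-up names, not anything SolarWinds-specific.

              ```python
              # Derive the two alert timing settings from the polling interval.
              # Just arithmetic to illustrate the approach above; the function name is made up.

              def alert_timing(polling_interval_min: int, trigger_cycles: int) -> dict:
                  """Evaluation frequency matches the polling interval; the 'condition must
                  exist for more than' value is that interval times the desired cycles."""
                  return {
                      "evaluation_frequency_min": polling_interval_min,
                      "condition_must_exist_min": polling_interval_min * trigger_cycles,
                  }

              # Polling node statistics every 3 minutes, trigger after 5 polling cycles:
              print(alert_timing(3, 5))
              # -> {'evaluation_frequency_min': 3, 'condition_must_exist_min': 15}
              ```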

               

              This doesn't specifically answer your question but it might help.

              I also agree with what gsmith@houseloan.com is saying.

                • Re: Condition must exist for more than - Question
                  Geoff Smith

                  ryan.davis26  wrote:

                  So as an example, if I'm polling node statistics every 3 minutes, there is absolutely no point in setting this value to anything less than 3 minutes - you're just wasting resources on your poller by evaluating more often than you're actually polling the data.

                  I disagree. The alert timing won't necessarily line up with your polling timing. In the above case, it could be nearly 6 minutes between when a condition occurs and when an alert notices it. It could also be that if one or the other isn't EXACTLY 3 minutes, one might run longer than the other (you'd have 2 alert checks for 1 polling sample, so it would think you have 6 minutes of the same data, or the above-mentioned skipping of a value).
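
                  As a back-of-the-envelope illustration of that worst case (rough arithmetic only, not how the alerting engine is actually scheduled):

                  ```python
                  # Rough worst-case detection lag when polling and alert evaluation run on
                  # their own intervals and are not synchronized. Illustrative arithmetic only.

                  def worst_case_lag(polling_min: float, eval_min: float) -> float:
                      # The condition can start just after a poll (miss almost one polling
                      # interval), and the fresh sample can land just after an evaluation
                      # (miss almost one evaluation interval), so the worst case approaches
                      # the sum of the two.
                      return polling_min + eval_min

                  print(worst_case_lag(3, 3))   # ~6 minutes, the case described above
                  print(worst_case_lag(3, 1))   # ~4 minutes with a 1-minute evaluation frequency
                  ```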

                   

                  I would always set the alert frequency to be less than the polling frequency.  I found it annoying to try to keep up with polling frequency for everything, so I just set it for 1 minute so the alerts have the most timely data they can get.  YMMV, though... I just tend to hit every edge condition with my "luck"

                   

                  If you're looking to save cycles, dump some of the alerts to daily or more. You don't need to check SSL cert expiration every minute, for example. Every day should do. (Then your daily alert e-mails will contain a nice, annoying countdown until the cert is replaced.)

                    • Re: Condition must exist for more than - Question
                      ryan.davis26

                      I don't know if anyone fully understands what I'll call the "alert polling cycle." However, I do hear what you're saying, and as you'll see in the example below, I have never found that to be the case (your 6-minute example). What I really wish you could do is just attach the alert's eval frequency to the polling cycle. Programmatically speaking, that wouldn't be hard to do, SolarWinds!

                      But as I've said in other posts before, these settings don't really make a whole lot of sense to me for this very reason. I mean really, what's the point of evaluating the conditions every minute if there isn't any new data for 5 minutes? And if your example were true, then why have an eval frequency setting at all? It really just doesn't make sense.

                       

                      Below is an example of a component alert where we have:

                      Eval freq: 5 min (same as the application polling interval)

                      Condition must exist for: 15 min (3-4 polling cycles, depending on where it's caught)

                       

                      In the example below you can see the critical status started on line 108/107, and then the alert triggered 15 minutes later (coincidentally right before it came back up, but that's a different story).
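
                      As a quick illustration of why 15 minutes works out to 3-4 polling cycles depending on where the condition is first caught (simple arithmetic only; the offsets are hypothetical):

                      ```python
                      # How many 5-minute polls land inside a 15-minute "condition must exist for"
                      # window, depending on where in the polling cycle the condition starts.
                      # Illustrative arithmetic only; the offsets are hypothetical.

                      def polls_in_window(window_min: int, poll_min: int, start_offset_min: int) -> int:
                          """Count polls within the first window_min minutes of a condition that
                          starts start_offset_min minutes after the most recent poll."""
                          first = (poll_min - start_offset_min) % poll_min   # delay until the next poll
                          count, t = 0, first
                          while t <= window_min:
                              count += 1
                              t += poll_min
                          return count

                      print(polls_in_window(15, 5, 0))   # condition starts exactly on a poll -> 4 polls
                      print(polls_in_window(15, 5, 2))   # condition starts mid-cycle         -> 3 polls
                      ```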

                       

                        • Re: Condition must exist for more than - Question
                          mesverrum

                          The eval frequency is simply how often the alerting engine queries the database for any objects that match your conditions. Lining it up with the polling intervals is something I often do as well, where clients have existing performance issues or concerns about them. Technically, even if your stats are on a 5-minute polling interval, each node is staggered throughout those 5 minutes.

                           

                          For example, node1 polls at 0:00, 5:00, 10:00, 15:00, etc., but node2 might poll at 0:30, 5:30, 10:30, 15:30, etc. If the alert engine happens to check for problems at 0:15, 5:15, 10:15, 15:15 and it requires 10 minutes above the threshold to trigger the alert, then it would alert on node1 at 10:15, but node2 wouldn't trigger an alert until the 15:15 eval. Change the eval frequency to 1 minute and it would trigger at the 11:15 alert cycle, potentially notifying you a few minutes sooner. In most cases, being off one way or the other by a couple of minutes isn't the end of the world, so you just balance your desire for efficiency against any requirements for alerting speed. If you had a really brutal set of alert conditions and lots of alerts at high-speed intervals, it could be a problem in terms of DB load, but it usually isn't as hard on the system as loading up an interface details page with 6 charts all loading 30 days of data or whatever.
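
                          To make the staggering concrete, here's a tiny sketch of that timeline (times are minutes:seconds as in the example above; the function is just an illustration, not how the alerting engine is actually written):

                          ```python
                          # Staggered polling vs. alert evaluation, using the minutes:seconds
                          # timeline from the example above. Illustrative only, not SolarWinds internals.

                          THRESHOLD_S = 10 * 60   # "condition must exist for more than" 10 minutes

                          def first_alert(first_seen_s, eval_offset_s, eval_interval_s, horizon_s=30 * 60):
                              """First evaluation time (seconds) at which a condition first recorded
                              at first_seen_s has been in the database for at least THRESHOLD_S."""
                              t = eval_offset_s
                              while t <= horizon_s:
                                  if t - first_seen_s >= THRESHOLD_S:
                                      return t
                                  t += eval_interval_s
                              return None

                          def mmss(s):
                              return f"{s // 60}:{s % 60:02d}"

                          # node1's condition is first recorded by its poll at 0:00, node2's at 0:30.
                          # Evaluations run at 0:15, 5:15, 10:15, ... (5-minute eval frequency).
                          print(mmss(first_alert(0,  15, 300)))   # node1 -> 10:15
                          print(mmss(first_alert(30, 15, 300)))   # node2 -> 15:15
                          # With a 1-minute eval frequency, node2 would alert a few minutes sooner:
                          print(mmss(first_alert(30, 15, 60)))    # node2 -> 11:15
                          ```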

                           

                          In any case, none of that is a problem with your alert. Looking at those timestamps, you were in warning (5) or critical (6) from 1:44:20 until 2:39:20, and the alert triggers just after the 9th poll in either of those states (45 min). I assume your alert logic is either "statistic > warning threshold for 45 min" or "component status is warning or critical for 45 min"; both of those would be true.
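
                          For reference, the elapsed-time arithmetic behind that reading of the timestamps (nothing SolarWinds-specific, just the numbers quoted above):

                          ```python
                          # Elapsed-time arithmetic behind the timestamp reading above.
                          from datetime import datetime

                          fmt = "%H:%M:%S"
                          start = datetime.strptime("1:44:20", fmt)   # first warning/critical sample
                          end   = datetime.strptime("2:39:20", fmt)   # last sample before recovery

                          print((end - start).total_seconds() / 60)   # 55.0 minutes in warning/critical overall

                          poll_interval_min = 5
                          print(9 * poll_interval_min)                # 9 polling cycles -> the 45 minutes
                                                                      # mentioned above
                          ```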

                           

                          I'd be curious to test whether, if you changed the logic to say status = warning, the alert would not fire because at some points it was critical, or whether the system has some kind of implicit understanding that critical is worse than warning and treats it inclusively in some way. I bet it doesn't, as SolarWinds mostly just does straight SQL logic on these things.

                            • Re: Condition must exist for more than - Question
                              ryan.davis26

                              Yup, you're right - the polling engine can't poll every device on the second, every second, or it would choke.

                               

                              So my screenshot was just to illustrate aligning the eval freq with the polling cycle.

                              In the above example, we trigger only when the component goes critical, which is why you see the alert trigger ~15 min after the component went critical for that sustained 15-min period starting at line 107.