Alert Reset Conditions - How do special conditions work?

I've been building out some custom alerts for a client recently and I decided to build some custom reset conditions for some alerts.  Why?  Well, perhaps I am a glutton for punishment, but mostly because I wanted to ensure that when an application component (in this case) changed status away from the trigger condition (down or warn or critical, or whatever) it moved into a valid status and not into Unknown or something like that.

As I was updating the alerts today I thought to myself : "Do I really understand how special reset conditions work?"

Here is the scenario.

I've built an alert that defines the trigger scope as follows:

Application custom property Prod_Alert = TRUE

AND Application custom property Is_24x7 = TRUE

AND Application custom property Trigger_If_Warning = FALSE

I then defined the trigger conditions as:

Component = Down

Simple enough, right?  The alert only checks the status of components where the application monitor custom properties Prod_Alert = TRUE and Is_24x7 = TRUE.

When I set the reset alert conditions I wanted to make sure that the component came back into a valid state, but since I have some other alerts that use the same custom properties with different values (specifically, Trigger_If_Warning = TRUE), I also wanted to make sure that the alert reset condition only applied to these specific alerts.

I defined the reset scope the same as the trigger scope:

Application custom property Prod_Alert = TRUE

AND Application custom property Is_24x7 = TRUE

AND Application custom property Trigger_If_Warning = FALSE

I then defined the reset conditions as a whole bunch of valid reset states:

Component = Up

Component = Critical

Component = Warning

Component = Unmanaged

Makes perfect sense, right?  If my alert triggers when the component is down, and only if the application monitor has those 3 custom properties set, then I only want to reset the alert if those same custom properties are set AND a valid component status is reached.

But what if someone changes a custom property?!?!

Remember, the alert triggers on a specific set of custom properties and the reset is scoped to the SAME custom properties. If someone changes a custom property on an application monitor (say, Is_24x7 = FALSE) while an alert is triggered, then the reset condition will never clear the alert because the reset scope will never be true for that application monitor.
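To make the problem concrete, here is a minimal Python sketch of the scenario. It is a toy model, not SolarWinds code: the functions `in_scope`, `should_trigger`, and `should_reset` are hypothetical names, and it assumes that a scope is a set of custom-property values that must all match and that any one of the listed reset statuses is enough to reset.

```python
# Toy model of the alert described above. All names here are hypothetical;
# this is not how the product is implemented, only how the scoping reads.
TRIGGER_SCOPE = {"Prod_Alert": True, "Is_24x7": True, "Trigger_If_Warning": False}
RESET_SCOPE = dict(TRIGGER_SCOPE)  # reset scoped the same as the trigger
RESET_STATUSES = {"Up", "Critical", "Warning", "Unmanaged"}


def in_scope(app_props, scope):
    """True when every custom property named in the scope has the required value."""
    return all(app_props.get(name) == value for name, value in scope.items())


def should_trigger(app_props, component_status):
    return in_scope(app_props, TRIGGER_SCOPE) and component_status == "Down"


def should_reset(app_props, component_status, reset_scope=RESET_SCOPE):
    # Assumption: matching any one of the reset statuses is sufficient.
    return in_scope(app_props, reset_scope) and component_status in RESET_STATUSES


app = {"Prod_Alert": True, "Is_24x7": True, "Trigger_If_Warning": False}
triggered = should_trigger(app, "Down")

# Someone edits the custom property while the alert is active.
app["Is_24x7"] = False

# The component recovers, but the scoped reset no longer matches this application.
scoped_reset = should_reset(app, "Up")

# With the custom properties dropped from the reset scope, the reset matches.
unscoped_reset = should_reset(app, "Up", reset_scope={})

print(triggered, scoped_reset, unscoped_reset)
```

In this model the alert triggers, the scoped reset never fires after the property change, and a reset with no custom-property scope does fire.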

So, after the long explanation, here is my question...

Can the reset condition for an alert reset a triggered instance of another alert?

Basically, if Alert1 triggers an alert because Is_24x7 = TRUE and someone changes it to Is_24x7 = FALSE (meaning that the special reset condition is no longer scoped to catch this application monitor), can the triggered instance of that alert be reset by another alert with a reset condition that has a scope of Is_24x7 = FALSE even though that 2nd alert did not trigger the alert in the first place?

  • Now, I don't actually know the answer for sure.  (It's true -- MVPs aren't unending sources of knowledge!  Opinions, yes, knowledge, no.)

    That said, if you look in your database you'll find a table called AlertConditionState.  AlertConditionState has a column called AlertID.  That AlertID links that triggered instance of an alert to a specific alert configuration as defined in the AlertConfigurations table.

    For example, these are some of the values from the AlertConditionState table.

    [Screenshot: sample rows from the AlertConditionState table]

    If I query for AlertID 303 in the AlertConfigurations table I get back the details of the alert.  (Note:  I removed the name of the alert as it is from a client. Just pretend that the big blank in that screenshot has an alert name like Alert when component is down or something to that effect).

    [Screenshot: the AlertConfigurations row for AlertID 303, with the alert name removed]

    These details suggest to me that regardless of the reset scope, a triggered instance of an alert can only be cleared by the alert logic that triggered it.

    What does that mean?  It means that if someone does change a custom property to de-scope an application monitor from the trigger condition, the triggered instance of that alert will reset IF I don't include the Is_24x7 custom property in the scope of the reset.

    If I am wrong, please let me know that I am wrong. I've been working off this assumption for a while now and it seems to make sense in practice, but it occurred to me that 1) I might be wrong, and 2) it might not be something that others have thought about when designing their reset conditions.

  • The way I've always interpreted this is that the reset conditions can only clear alerts that were triggered by the same alert.  On my reset conditions I actually clear out all the scope filters and custom properties and everything else and just focus on the specific conditions that would warrant a reset to me.  So the trigger might say, for example, node has to be a Windows node in prod with disk free space < 10%, but my reset conditions would only say disk free space > 10%.  I don't want any other combinations of events to interfere with my alerts clearing.

    Also, if you change the logic after the alert triggers, the new logic does not seem to get applied to already existing instances of the alert; they keep the reset logic they had at the time they were triggered.

    This type of logic has worked successfully for me for over 4 years without problems so I feel like it has been pretty close to accurate.

  • Thanks. I'm definitely crowd-sourcing consensus on my opinion here because the documentation doesn't specify the actual relationship.  The data and experiences gathered so far seem to match.  If it keeps up, I'll mark this as solved and leave it here as an educational post for people wondering the same thing.

    Proper alert design seems so simple up front, but ends up being a larger discussion.  Every time.

  • My understanding of reset conditions has also always been that they go hand in hand with the trigger conditions from the same alert. I suppose a way to test that would be to have two alerts with identical trigger conditions, but different reset conditions. Include the alert ID number in the notification and then see which alertID is referenced when a particular reset condition is met. What do you think?
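The test proposed in the last reply can be sketched as a small Python simulation. This is a toy engine, not the product: the `Alert` and `Engine` classes are hypothetical, and the engine deliberately encodes the thread's working assumption that each triggered instance is tied to the AlertID that created it and is only checked against that alert's own reset condition. It shows what the notifications would look like if that assumption holds; only running the real test would confirm it.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    alert_id: int
    reset_statuses: set  # statuses that reset this alert


@dataclass
class Engine:
    """Toy alert engine; each active instance remembers the alert that triggered it."""
    alerts: dict = field(default_factory=dict)
    active: set = field(default_factory=set)  # (alert_id, component) pairs
    notifications: list = field(default_factory=list)

    def add(self, alert):
        self.alerts[alert.alert_id] = alert

    def evaluate(self, component, status):
        for alert in self.alerts.values():
            key = (alert.alert_id, component)
            if status == "Down" and key not in self.active:
                # Identical trigger condition for every alert: component is Down.
                self.active.add(key)
                self.notifications.append(("trigger", alert.alert_id, component))
            elif key in self.active and status in alert.reset_statuses:
                # Assumption: an instance is only reset by its own alert's condition.
                self.active.discard(key)
                self.notifications.append(("reset", alert.alert_id, component))


engine = Engine()
engine.add(Alert(alert_id=1, reset_statuses={"Up"}))
engine.add(Alert(alert_id=2, reset_statuses={"Warning"}))

engine.evaluate("comp-A", "Down")     # both alerts trigger
engine.evaluate("comp-A", "Warning")  # only alert 2's reset condition is met
engine.evaluate("comp-A", "Up")       # only alert 1's reset condition is met

for event in engine.notifications:
    print(event)
```

Under the assumption, the Warning status resets only the instance owned by alert 2 and the Up status resets only the instance owned by alert 1. If the real system behaved differently, the alert ID in a reset notification would not match the alert whose reset condition was met.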