
Need help with suppressing an alert after another has taken over (timewise)

I have created 3 alerts for Node Status down situations.


First one kicks in once the node is down. Another is delayed until the node has been down for at least 30 minutes. The third one is delayed until the node has been down for at least 60 minutes.


I am having trouble suppressing each previous alert. Once the node is down for 30 minutes, I would like the initial alert to stop sending emails. Once the node is down for 60 minutes, I would like the 30 minute alert and the initial alert to stop sending emails.


 


Can anyone help?


 


Thanks,


George

  • George,

    Stick with me on this one; the answer is a little long and drawn out.

    As you said in your post, you need to suppress a previous alert after a given period. The problem is that NPM does not allow alerts to be aware of each other. To get around this, you have to add a way for one alert to check the status of another. That is a big challenge, but we can cheat and store some information in a custom field we create for the node. The value you store can really be anything: 1, 2, 3 or a, b, c, etc.

    If we create a custom field called alert_value, we have a place to store that information.

    When an alert is triggered for a node going down there are a series of things you want to happen. 

    Alerts at 0, 30, 60, 90 minutes, etc.

    Using the "delay the execution of this action" option you can build in the delay of each subsequent alert.

    So at this point we have four alerts, each with its action delayed by a set time.

    Now we have to deal with how to suppress the previous alert when the next one triggers. We can do this using the custom field and two trigger actions:

    Log the alert to a file - We use the log-to-file action to write a file containing the Node ID and the value we want stored in the custom property.
    and
    Execute an external VB script - We use the execute-VB-script action to parse the file we wrote and update the custom property. (You can use anything to complete this step, as NPM also allows you to execute a program. The basic idea is: parse the text file and update the custom property.)
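
    As a rough illustration of the "parse text file and update custom property" step, here is a minimal sketch in Python rather than VB script. Everything specific in it is an assumption made for the example: the log line format ("NodeID,alert_value"), the table and column names (nodes, node_id, alert_value), and the use of an in-memory SQLite database as a stand-in for the real NPM database. Adapt the connection and the UPDATE statement to wherever your custom property actually lives.

```python
import sqlite3


def parse_log_line(line):
    """Split one assumed 'NodeID,alert_value' log line into two integers."""
    node_id, value = line.strip().split(",")
    return int(node_id), int(value)


def apply_log(conn, lines):
    """Write each logged alert_value to its node; return the number of rows updated."""
    updated = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the log file
        node_id, value = parse_log_line(line)
        # Hypothetical table/column names; parameterized to avoid quoting problems.
        cur = conn.execute(
            "UPDATE nodes SET alert_value = ? WHERE node_id = ?",
            (value, node_id),
        )
        updated += cur.rowcount
    conn.commit()
    return updated


if __name__ == "__main__":
    # Stand-in database with two nodes, both starting at alert_value 0.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, alert_value INTEGER)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", [(17, 0), (42, 0)])
    # Lines as the log-to-file action might have written them (assumed format).
    print(apply_log(conn, ["17,1", "", "42,2"]))
```

    In practice you would read the lines from the file the alert wrote and clear or rotate that file afterwards so old entries are not replayed.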

    Now that we have everything in place, we use the Alert Suppression or Reset Condition of each alert to check the value of our custom property and either suppress or reset from there.


    Here is a walk-through:

    The node goes down, and alerts 0, 30, 60, and 90 fire.
    When alert 0 fires, it sets alert_value to 0.
    30 minutes in, the next alert fires and sets alert_value to 1. This causes the first alert to suppress or reset.
    60 minutes in, the next alert fires and sets alert_value to 2. This causes the second alert to suppress or reset.
    etc.
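
    The walk-through can be checked with a small simulation. This is only a sketch of the logic, not anything NPM runs: the function names are made up for the example, and it assumes each alert stores its own position (0, 1, 2, ...) in alert_value when it fires and is suppressed once alert_value is greater than that position.

```python
def is_suppressed(alert_level, alert_value):
    """An alert goes quiet once a later alert has pushed alert_value past its level."""
    return alert_value > alert_level


def active_alerts(minutes_down, delays=(0, 30, 60, 90)):
    """Return the delays (in minutes) of the alerts still allowed to send mail."""
    # Every alert whose delay has elapsed has fired and written its level.
    fired = [level for level, delay in enumerate(delays) if minutes_down >= delay]
    if not fired:
        return []
    # The most recent alert to fire wrote the current alert_value.
    alert_value = fired[-1]
    return [delays[level] for level in fired if not is_suppressed(level, alert_value)]


if __name__ == "__main__":
    for minutes in (0, 45, 60, 200):
        print(minutes, active_alerts(minutes))
```

    With these assumptions only the most recently fired alert remains active at any point, which is the behaviour George asked for.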

    There are some gotchas with this:
    - Make sure to write to the event log.
    - Use > 0, > 1, etc. in the suppression conditions so the previous alerts do not act again.
    - Make sure the program or script you use to update the custom property is reliable.


    Best of luck
    -Cam