How Do I Set Different Thresholds for Alert and Reset

I have two nodes that require special action when they become unavailable due to high CPU load. I've created an alert for these that says:

2017-12-08 16_04_32-Edit Alert - _(SysInfo).png

So any server with SYSPROD in the name, which are my two special cases, will alert if CPU load goes above 90 for 2 minutes.

How do I create a reset condition to have that alert clear only when CPU load gets down to below 80? I don't want it to clear right away when the alert condition clears, and I don't want it necessarily cleared after a certain time span. I want it to get below a certain lower value before it clears. I want the alert threshold and the reset threshold to be different. I figured it would be as easy as just setting a custom condition on the reset, just like on the alert. But the reset condition page looks just like the alert condition page. You select either "all objects" or "following set of objects". Then the condition. I don't want to specify a specific object or set of objects. What I want is for the "scope of alert" to be this object. The one that alerted, whichever one that is. Based on the alert condition, it could be either node that alerted.
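For reference, the behavior being asked for is a hysteresis rule: trigger above one value, clear only below a lower one. The following is a minimal sketch of that logic for a single node, not the alerting product's implementation. The class name `HysteresisAlert`, its fields, and the sample readings are hypothetical; only the 90 / 80 / 2-minute values come from the post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HysteresisAlert:
    """Hypothetical model of one node's alert with separate trigger and reset thresholds."""
    trigger_above: float = 90.0      # alert when CPU load stays above this...
    sustain_seconds: float = 120.0   # ...for at least this long (2 minutes)
    reset_below: float = 80.0        # clear only once load drops below this
    active: bool = False
    _above_since: Optional[float] = None  # when the load first exceeded the trigger

    def update(self, timestamp: float, cpu_load: float) -> bool:
        """Feed one reading and return whether the alert is active afterwards."""
        if not self.active:
            if cpu_load > self.trigger_above:
                if self._above_since is None:
                    self._above_since = timestamp
                if timestamp - self._above_since >= self.sustain_seconds:
                    self.active = True
            else:
                # Load fell back under the trigger before the 2 minutes elapsed.
                self._above_since = None
        elif cpu_load < self.reset_below:
            # Already alerting: clear only below the lower reset threshold.
            self.active = False
            self._above_since = None
        return self.active


# Made-up readings as (seconds, CPU %) pairs.
samples = [(0, 95), (60, 93), (120, 92), (180, 85), (240, 79)]
alert = HysteresisAlert()
states = [alert.update(t, load) for t, load in samples]
print(states)  # [False, False, True, True, False]
```

Note that the reading of 85 at 180 seconds is below the trigger value but does not clear the alert; only the reading of 79 does.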

I thought maybe I was overthinking it, and that the reset really does apply only to the specific object that alerted, but then why have the two options for "scope of alert"? Why can't I specify this same object?

  • You can copy the same settings from the trigger action to the reset action. Just modify the threshold as needed.

    pastedImage_0.png

  • Thanks for the suggestion, partikmehta003. But doing that won't work. The fact that that won't work is what my post was about. The problem is that the reset condition page looks just like the alert condition page. You select either "all objects" or "following set of objects". Then the condition. If I copy the alert condition I just copy that same scope. I don't want to specify a specific object or set of objects. What I want is for the "scope of alert" to be this object. The one that alerted, whichever one that is. Using the scope options that are available, it could be either server that goes below 80 and thus clears the alert. If I have two servers and one goes above 90, the alert condition is met and it triggers. But the other server is still below 80 and so the reset condition is immediately met as well, thus resetting the alert. It should be the one that caused the alert that the reset is checking. Not either server. Just the one that caused it.

  • The setting I mentioned still holds for your scenario. For example, if you set the alert condition for 10 servers using a group or custom property and apply the same scope to your reset, you will get alerts only for the devices that breach the condition, not the others. The reset will apply as soon as the device no longer meets the alert condition.

    I'm not sure if this answers your query. Another option I can think of is having a separate alert for each threshold, but this makes sense only when you have a small number of devices.

  • Yes, and to add to that: the reset condition is applied only to the element that triggered the alert in the first place. So, for example, if your Server A (e.g. SYSPROD01) triggers an alert for CPU at 95%, it won't reset until that server goes below 80%.

    If, during that time, Server B (e.g. SYSPROD02) hits 92%, it will initiate a new alert trigger action, which in turn will not be cleared until that specific server has met the reset condition.

    If this were not true, we would need thousands of alerts to cover all the different variations of groups.
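The per-object behavior described in this reply can be modeled as alert state tracked separately for each node. This is only a sketch of that description, under the assumption that the reply is accurate; it is not taken from the product, the function name `evaluate` and the poll values are made up, and it leaves out the 2-minute duration for brevity.

```python
TRIGGER_ABOVE = 90.0  # alert threshold from the original post
RESET_BELOW = 80.0    # reset threshold from the original post


def evaluate(active: set, readings: dict) -> set:
    """Return the updated set of nodes that currently have an active alert.

    Each node's reset is checked only against that node's own reading,
    so a quiet node cannot clear another node's alert.
    """
    updated = set(active)
    for node, cpu in readings.items():
        if node not in updated and cpu > TRIGGER_ABOVE:
            updated.add(node)
        elif node in updated and cpu < RESET_BELOW:
            updated.discard(node)
    return updated


# Hypothetical polling results: node name -> CPU %.
polls = [
    {"SYSPROD01": 95, "SYSPROD02": 40},  # 01 triggers; 02 at 40% does not reset it
    {"SYSPROD01": 85, "SYSPROD02": 92},  # 01 stays active (not below 80); 02 triggers
    {"SYSPROD01": 78, "SYSPROD02": 88},  # 01 resets; 02 stays active
]

active = set()
history = []
for readings in polls:
    active = evaluate(active, readings)
    history.append(sorted(active))

print(history)
# [['SYSPROD01'], ['SYSPROD01', 'SYSPROD02'], ['SYSPROD02']]
```

In the first poll SYSPROD02 is well below 80%, yet the alert on SYSPROD01 stays active, which is the case the original question was worried about.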

  • Thanks guys. That would make sense to me, that the reset conditions would only apply to the server that triggered the alert. But that's not what the reset page is saying will happen. The reset page asks for scope for the reset condition, and the only options it gives are:

    • "when the alert condition clears" (which I assume means the alerting node)
    • "all objects"
    • "the following set"

    It doesn't let you specify the reset condition and apply that condition only to the alerting server node. It's one or the other. You can either say "when the alert condition clears" and not be allowed to specify a reset condition. Or you can specify a reset condition but not scope it to the alerting node.

    If it's true that the reset condition is only applied to the element that triggered the alert, then the options I apply on the reset page aren't what it actually looks for.

  • Can you paste a screenshot of your reset condition here?