I have two nodes that require special action when they become unavailable due to high CPU load. I've created an alert for these that says:
So any server with SYSPROD in the name, which are my two special cases, will alert if CPU load goes above 90 for 2 minutes.
How do I create a reset condition to have that alert clear only when CPU load gets down to below 80? I don't want it to clear right away when the alert condition clears, and I don't want it necessarily cleared after a certain time span. I want it to get below a certain lower value before it clears. I want the alert threshold and the reset threshold to be different. I figured it would be as easy as just setting a custom condition on the reset, just like on the alert. But the reset condition page looks just like the alert condition page. You select either "all objects" or "following set of objects". Then the condition. I don't want to specify a specific object or set of objects. What I want is for the "scope of alert" to be this object. The one that alerted, whichever one that is. Based on the alert condition, it could be either node that alerted.
I thought maybe I was overthinking it, and that the reset really does apply only to the specific object that alerted, but then why have the two options for "scope of alert"? Why can't I specify this same object?
Thanks for the suggestion, partikmehta003. But doing that won't work. The fact that that won't work is what my post was about. The problem is that the reset condition page looks just like the alert condition page. You select either "all objects" or "following set of objects". Then the condition. If I copy the alert condition I just copy that same scope. I don't want to specify a specific object or set of objects. What I want is for the "scope of alert" to be this object. The one that alerted, whichever one that is. Using the scope options that are available, it could be either server that goes below 80 and thus clears the alert. If I have two servers and one goes above 90, the alert condition is met and it triggers. But the other server is still below 80 and so the reset condition is immediately met as well, thus resetting the alert. It should be the one that caused the alert that the reset is checking. Not either server. Just the one that caused it.
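To make the concern concrete, here is a minimal sketch of the two possible reset scopes. It is not how the product evaluates anything; the function names and CPU readings are hypothetical, and only the 80 threshold and the SYSPROD names come from the posts above.

```python
# Hypothetical illustration only; not SolarWinds code.
RESET_BELOW = 80  # reset threshold from the post

def reset_fires_any_in_scope(cpu_by_node):
    """Reset evaluated against every node in scope (the behaviour being worried about)."""
    return any(cpu < RESET_BELOW for cpu in cpu_by_node.values())

def reset_fires_for_alerting_node(cpu_by_node, alerting_node):
    """Reset evaluated only against the node that raised the alert (the behaviour wanted)."""
    return cpu_by_node[alerting_node] < RESET_BELOW

# Made-up readings: SYSPROD01 is overloaded, SYSPROD02 is idle.
cpu = {"SYSPROD01": 95, "SYSPROD02": 40}
any_scope = reset_fires_any_in_scope(cpu)
same_node = reset_fires_for_alerting_node(cpu, "SYSPROD01")
```

With these readings, the "any node in scope" check is satisfied by the idle server while the overloaded one is still alerting, which is exactly the premature reset described above.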
The setting I mentioned will still hold true for your explanation. For example, if you have set the alert condition for 10 servers using a group or custom property, and you apply the same scope to your reset, then you will get an alert only for the devices that breach the condition and not for the others. The reset will apply as soon as a device no longer meets the alert condition.
I'm not sure if this answers your query. Another option I can think of is having a separate alert for each threshold, but this makes sense only when you have a small number of devices.
Yes, to add to that: the reset condition is only applied to the element that triggered the alert in the first instance. So, for example, if your Server A (e.g. SYSPROD01) triggers an alert for CPU at 95%, it won't reset until that server goes below 80%.
If in that period of time Server B (eg SYSPROD02) hits 92% it will initiate a new alert trigger action which in turn will not be cleared until that specific server has met the reset condition.
If this were not true, we would need thousands of alerts to cover all the different variations of groups.
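The per-element behaviour described in this reply can be sketched as a small state machine with separate trigger and reset thresholds. This is only an illustration under assumptions, not the product's implementation: the class and field names are hypothetical, and "above 90 for 2 minutes" is approximated as two consecutive samples.

```python
from dataclasses import dataclass, field

@dataclass
class HysteresisAlert:
    """Hypothetical per-node alert with distinct trigger and reset thresholds."""
    trigger_above: float = 90.0   # alert threshold from the thread
    reset_below: float = 80.0     # reset threshold from the thread
    sustain_samples: int = 2      # assumption: one sample per minute
    streak: dict = field(default_factory=dict)  # consecutive high samples per node
    active: set = field(default_factory=set)    # nodes currently alerting

    def observe(self, node, cpu):
        """Feed one CPU sample; return "triggered", "reset", or None."""
        if node in self.active:
            # Only the alerting node's own reading can clear its alert.
            if cpu < self.reset_below:
                self.active.discard(node)
                self.streak[node] = 0
                return "reset"
            return None
        if cpu > self.trigger_above:
            self.streak[node] = self.streak.get(node, 0) + 1
        else:
            self.streak[node] = 0
        if self.streak[node] >= self.sustain_samples:
            self.active.add(node)
            return "triggered"
        return None

alert = HysteresisAlert()
events = [
    alert.observe("SYSPROD01", 95),  # first high sample
    alert.observe("SYSPROD01", 96),  # second high sample, alert triggers
    alert.observe("SYSPROD02", 40),  # idle node does not clear SYSPROD01
    alert.observe("SYSPROD01", 85),  # below 90 but not below 80, stays active
    alert.observe("SYSPROD01", 79),  # below 80, alert resets
]
```

In this sketch a reading of 85 on SYSPROD01 leaves the alert active, and readings from SYSPROD02 never touch SYSPROD01's state, which matches the behaviour claimed in the reply.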
Thanks guys. That would make sense to me, that the reset conditions would only apply to the server that triggered the alert. But that's not what the reset page is saying will happen. The reset page asks for scope for the reset condition, and the only options it gives are:
It doesn't let you specify the reset condition and apply that condition only to the alerting server node. It's one or the other. You can either say "when the alert condition clears" and not be allowed to specify a reset condition. Or you can specify a reset condition but not scope it to the alerting node.
If it's true that the reset condition is only applied to the element that triggered the alert, then the options I apply on the reset page aren't what it actually looks for.
I think I am guessing here, but it sounds like you need a more complex trigger, and should still use your simple reset with the adjusted threshold of below 80% for 2 mins before reset.
Instead of matching on a name affix (prefix, suffix, or otherwise), list your two servers by name within an OR condition and give each its own trigger threshold. If you want different reset thresholds for each server, though, you would need separate alerts.
A screenshot would help us all understand as well.