Check out this new Technical Reference on Understanding Orion Advanced Alerts.
The suppression is tied to the trigger condition, so it can be as specific as the trigger.
This is a wonderful guide, thanks for posting this.
One question I still have regarding suppression:
Is the suppression an open-ended query that, if met by any object or set of objects in the database, will suppress the alert, or is it specific to the object that triggered the alert in the first place?
If the first is true, then the guide's example of a suppression and the same suppression embedded in the Alert Conditions are not equivalent. The first would search the entire database for any interface that meets the condition, whereas the second would look only at the interfaces on the device that triggered the alert.
And then there is the third possibility of me completely not understanding something here. = )
Per this post: and based upon my experience, I was under the impression that the suppression was not related to the triggering object(s) and could be met if the condition is true for any object in the database.
Hmmm. I'll have to check with dev on that one.
You are correct that the entire database is scanned for the suppression condition, but this is done as a combined query with the trigger condition, so the two examples would be equivalent (unless there is a nuance I'm not getting).
So, if I had the following alert and suppression for that alert, would it suppress the alert only for the node named Bob and still notify me for all other nodes where CPU utilization was greater than 90, or would it always suppress the alert as long as a node named Bob existed in the database?
In my experience it would suppress all alerts so long as there was a node in the database named Bob.
Trigger when all of the following apply
CPU Utilization is greater than 90
Suppress Alert when all of the following apply
Node name is equal to Bob
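To make the two readings concrete, here is a minimal Python sketch that uses SQLite as a stand-in for the Orion database. It assumes, as reported in this thread, that the Suppression tab is evaluated as its own query against the whole database. The `Nodes` table and its `Caption`/`CPULoad` columns are hypothetical and are not Orion's actual schema or generated SQL.

```python
import sqlite3

# SQLite stands in for the Orion database. The Nodes table and its
# Caption/CPULoad columns are hypothetical, not Orion's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Nodes (Caption TEXT, CPULoad INTEGER)")
conn.executemany(
    "INSERT INTO Nodes VALUES (?, ?)",
    [("Bob", 95), ("Alice", 97), ("Carol", 40)],
)


def triggered(extra_condition=""):
    """Trigger: CPU Utilization is greater than 90 (plus optional filter)."""
    sql = ("SELECT Caption FROM Nodes WHERE CPULoad > 90"
           + extra_condition + " ORDER BY Caption")
    return [row[0] for row in conn.execute(sql)]


def alerts_with_suppression_tab():
    # Assumed behavior: the suppression is its own query over the whole
    # database, so ANY node named Bob silences the alert for every node.
    bob_exists = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM Nodes WHERE Caption = 'Bob')"
    ).fetchone()[0]
    return [] if bob_exists else triggered()


def alerts_with_trigger_filter():
    # The exclusion is evaluated per row, so only Bob himself is skipped.
    return triggered(" AND Caption <> 'Bob'")


print(alerts_with_suppression_tab())  # []
print(alerts_with_trigger_filter())   # ['Alice']
```

With Bob in the database, the Suppression-tab version returns nothing at all, while moving the exclusion into the trigger still alerts on Alice.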
If I may interject: based on the understanding that suppression is a separate DB query, it may be good general practice to "suppress" using the proper logic qualifiers on the Trigger tab so as to be more specific. It seems all too easy to create suppression conditions that are less specific than anyone intends.
If that sounds like a wise approach, then there is an even greater case than ever before for using well-designed custom properties. If one tries to accomplish "suppression" on the Trigger tab rather than the Suppression tab, the list of exclusions could get quite long, depending on the situation. Therefore, custom properties would come in quite handy for maintaining functionality and sanity!
Thoughts?
Borgan
First, I just have to say that your avatar makes me laugh every time I see it!
As for your points, I completely agree. In most (probably all) environments, smart use of custom properties will make handling alerts much easier and more coherent.
As for how suppressions are handled, I think I can see cases both for doing them via the Suppression tab and for building them into the alert itself; however, in either case, good use of custom properties is going to make all the difference in the world.
Thanks Byron. That is a pic of my grandson!
Back to the subject at hand. So, let's try a simple example to see if we are on the same page.
Say you have a network of one switch with ten attached nodes. You also have 10 other nodes that are not attached to that switch.
Then you construct a node down alert for all 21 nodes but include a suppression condition for the switch-attached nodes based on the status of the switch. With our current understanding of how the suppression SQL query works, whenever the switch is down we will NOT get a node down alert for ANY node, not just for the nodes attached to the switch. In other words, the suppression condition assumes a universal dependency that does not actually exist.
In the real world, there will be many switches or other upstream devices, each with dependent devices. So, the challenge becomes how to configure suppression that is specific, yet all-inclusive.
Are we looking at more alerts, rather than fewer, to get this done?
Regarding your switch example: yes, this is correct.
Now, with good use of custom properties you could turn this rather useless alert/suppression combination into something useful. Let's take your example and say that each of your nodes had a custom property defining which upstream device it is attached to. If your alert then includes only devices with that switch as the upstream device, things should work out.
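Here is a minimal sketch of the switch example, again using SQLite as a stand-in and assuming the suppression is evaluated as its own database-wide query. `Upstream` is a hypothetical custom property, and `Switch1`, `Attached1` to `Attached10`, and `Other1` to `Other10` are made-up node names. It compares one alert covering all 21 nodes with two alerts split by the custom property.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: Upstream is a custom property naming the device a
# node hangs off (NULL when the node is not behind Switch1).
conn.execute("CREATE TABLE Nodes (Caption TEXT, Status TEXT, Upstream TEXT)")
rows = [("Switch1", "Down", None)]
# Ten nodes behind the switch: they look down because the switch is down.
rows += [(f"Attached{i}", "Down", "Switch1") for i in range(1, 11)]
# Ten independent nodes; one of them has genuinely failed.
rows += [(f"Other{i}", "Down" if i == 3 else "Up", None) for i in range(1, 11)]
conn.executemany("INSERT INTO Nodes VALUES (?, ?, ?)", rows)

SWITCH_DOWN = ("SELECT EXISTS (SELECT 1 FROM Nodes "
               "WHERE Caption = 'Switch1' AND Status = 'Down')")


def evaluate(trigger_where, suppression_sql=None):
    # Assumed behavior: the suppression is checked on its own; if it is
    # true, the whole alert is silenced for every object it would cover.
    if suppression_sql and conn.execute(suppression_sql).fetchone()[0]:
        return []
    sql = f"SELECT Caption FROM Nodes WHERE {trigger_where} ORDER BY Caption"
    return [r[0] for r in conn.execute(sql)]


# One alert for all 21 nodes, suppressed when the switch is down.
single_alert = evaluate("Status = 'Down'", SWITCH_DOWN)

# Two alerts split by the Upstream custom property.
behind_switch = evaluate("Status = 'Down' AND Upstream = 'Switch1'", SWITCH_DOWN)
everything_else = evaluate(
    "Status = 'Down' AND (Upstream IS NULL OR Upstream <> 'Switch1')")

print(single_alert)     # []
print(behind_switch)    # []
print(everything_else)  # ['Other3', 'Switch1']
```

The single alert stays silent even though `Other3` really failed. Splitting on the custom property costs one extra alert, but it still reports the switch itself and `Other3`, which is the more-alerts trade-off raised in the question above.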
I think we are probably looking at more alerts the more intelligent you want them to be. There are some cases where you can create a very complicated alert by nesting different tests in it, but that can become a management nightmare in its own right.
I did see a post at one point in the forums where somebody used a custom property to note which tier of his network the gear was on and then built some intelligent suppressions based on that. However, I think it all really comes down to what kind of environment you are working with.
Ok, so we're on the same wavelength it looks like.
To extend this a little more, would you then generally agree that:
(1) If your network is relatively small and simple, you might get away with using the Suppression tab in cases where truly universal alert suppression is needed.
(2) On networks where devices are more intricately meshed and multiple real dependencies exist (true of the majority of networks), "suppression" may be more reliably accomplished using custom properties on the Trigger tab, essentially leaving the Suppression tab unused.
Enough work for this week. Going to get some egg nog.
Hrm, I am not sure these are necessarily true. As I sat and thought about this, I found myself wandering down a lot of different complex situations and use cases. Eventually I came full circle back to a simple thought...
I think you have to look at things from the perspective of the NMS system.
In all cases (both large and small networks) you will have one or a few devices upstream of your NMS and if those devices go down everything else will appear to be down for the NMS. For these devices a suppression makes sense.
Also, as you begin to chunk your infrastructure into different networks, the primary gateway devices for those networks may also be good places for suppressions covering the networks that lie behind them.
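The two placements described above can be sketched in plain Python. This is a hypothetical model, not Orion configuration: `CORE` stands for the device the NMS sits behind, `network` stands in for a custom property, and `GATEWAYS` maps each network to the gateway device in front of it. All device names are invented.

```python
# Hypothetical model, not Orion configuration. CORE is the device the NMS
# sits behind; "network" stands in for a custom property; GATEWAYS maps
# each network to the device that fronts it.
CORE = "CoreRouter"
GATEWAYS = {"Branch-A": "GW-A", "Branch-B": "GW-B"}

inventory = {
    "CoreRouter": {"network": None, "up": True},
    "GW-A": {"network": None, "up": False},
    "GW-B": {"network": None, "up": True},
    "a1": {"network": "Branch-A", "up": False},
    "a2": {"network": "Branch-A", "up": False},
    "b1": {"network": "Branch-B", "up": False},
    "b2": {"network": "Branch-B", "up": True},
}


def down_alerts(nodes):
    """Return the down nodes that should actually raise an alert."""
    # A real universal dependency: if the NMS's own uplink is down,
    # everything looks down, so a database-wide suppression fits here.
    if not nodes[CORE]["up"]:
        return []
    alerts = []
    for name, info in sorted(nodes.items()):
        if info["up"]:
            continue
        gateway = GATEWAYS.get(info["network"])
        # Per-network suppression: skip nodes behind a down gateway.
        if gateway is not None and not nodes[gateway]["up"]:
            continue
        alerts.append(name)
    return alerts


print(down_alerts(inventory))  # ['GW-A', 'b1']
```

Here the down gateway `GW-A` is reported while `a1` and `a2` behind it are skipped, and `b1` is still reported because its gateway is up.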
In general I see more cases where things will be filtered as part of the alert, but there definitely seem to be some use cases for suppressions as well. I don't have a strong sense that the size of your network mandates using one over the other.
DISCLAIMER: The thoughts above are only thoughts, many of which occurred while writing this post, and are not based on any real testing.
Because of the way the Suppression tab works, I try to avoid using it unless the criteria are very straightforward and simple. Otherwise, it is fairly easy to create suppression criteria that don't do what you think they will. I would definitely agree that custom properties are your friend when it comes to creating both alerts and alert suppressions.
The bigger problem I see is that the current interface is not very user-friendly when it comes to creating alert suppressions. I think what's needed is an interface that allows users to create dependencies between devices and leaves the logic required for suppressing alerts to the internal workings of the software. Currently, users have to figure out how to express this logic to the program which can be very cumbersome and confusing.
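The kind of dependency handling proposed above could look roughly like the following sketch: the user declares only parent/child relationships, and the software works out which down devices to report. This is a hypothetical illustration, not an existing Orion feature; `PARENT` and the device names are invented, and the map is assumed to contain no cycles.

```python
# Hypothetical dependency map declared by the user: child -> parent.
# Assumed to be acyclic; not an existing Orion feature.
PARENT = {
    "CoreSwitch": None,
    "DistSwitch1": "CoreSwitch",
    "DistSwitch2": "CoreSwitch",
    "Server1": "DistSwitch1",
    "Server2": "DistSwitch1",
    "Server3": "DistSwitch2",
}


def ancestors(node):
    """Yield every upstream device of node, nearest first."""
    parent = PARENT.get(node)
    while parent is not None:
        yield parent
        parent = PARENT.get(parent)


def root_cause_alerts(down_set):
    """Return only the down devices that have no down ancestor."""
    return sorted(n for n in down_set
                  if not any(a in down_set for a in ancestors(n)))


down = {"DistSwitch1", "Server1", "Server2", "Server3"}
print(root_cause_alerts(down))  # ['DistSwitch1', 'Server3']
```

`Server1` and `Server2` are treated as symptoms of `DistSwitch1`, while `Server3` is reported because nothing above it is down. The user never writes suppression logic directly; it falls out of the declared dependencies.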
Yes, dependencies (which some probably confuse with suppression) are something that Orion does not currently allow for.
Andy, can you shed any light as to whether SolarWinds is working on incorporating the ability to define device dependencies (in node management?) in a future version of Orion?
Some dependency suppression can be done now. Samples are in the paper and the Orion AG, but it is fairly complicated and labor-intensive. I'll have to let the PM address future plans.
Andy
I like the updated guide. It seems that once you start talking about dependencies, some kind of event correlation potentially becomes involved. From what I've seen over the years from HP, SMARTS, and now Cisco with products like ANA, some very complex logic can be involved. You can see how Cisco does it here:
http://www.cisco.com/en/US/docs/net_mgmt/active_network_abstraction/3.6/fault/1fltmn.html
Root-cause event correlation becomes almost an art rather than a science. Cisco sums it up in this line:
Meeting the event management challenge is done by correlating related events into a sequence that represents the alarm lifecycle, and using the network dependency model to determine the causal inter-relationship between alarms.
Hello,
I would like to thank the SolarWinds team for providing the Advanced Alerts Guide. This short reference has helped me a lot in understanding how everything works.
Would it be possible to include all of the Alert Variables, and if possible, a list and description of the Conditions?
I find myself going back and forth between this guide and the NPM Admin Guide while working on alerts, and having everything in one place would go a long way.
Thanks,
John
Hi John,
I can't promise anything now but that would be a great addition. I'll put it on my list.