Check out this new Technical Reference on Understanding Orion Advanced Alerts.
The suppression is tied to the trigger condition, so it can be as specific as the trigger.
This is a wonderful guide, thanks for posting this.
One question I still have regarding suppression:
Is the suppression an open-ended query that any object or set of objects in the database can satisfy, suppressing the alert whenever it is met, or is it specific to the object that triggered the alert in the first place?
If the first is true, then the guide's example of a suppression and its example of a suppression embedded in the Alert Conditions are not the same. The first would look through the entire database for any interface that met the condition, whereas the second would only look at the interfaces on the device that triggered the alert.
And then there is the third possibility of me completely not understanding something here. = )
Per this post: and based upon my experience, I was under the impression that the suppression was not related to the triggering object(s) and could be met if the condition is true for any object in the database.
Hmmm. I'll have to check with dev on that one
You are correct in that the entire db is scanned for the suppression condition but this is done as a combined query with the trigger condition, so the two examples would be equivalent. (Unless there is a nuance I'm not getting)
So, if I had the following alert and suppression for that alert, would it suppress the alert for only the node named Bob and still notify me for all other nodes where the CPU was greater than 90 or would it always suppress the alert so long as a node existed in the database named Bob?
In my experience it would suppress all alerts so long as there was a node in the database named Bob.
Trigger when all of the following apply
CPU Utilization is greater than 90
Suppress Alert when all of the following apply
Node name is equal to Bob
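To make the two readings of this example concrete, here is a minimal Python sketch. It is only a model of the question, not how Orion actually evaluates alerts: the `Node` class, the function names, and the sample nodes are all made up for illustration. One function treats the suppression as a separate, database-wide check; the other ties it to the object that triggered.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu: int  # CPU utilization in percent


def trigger(node):
    # Trigger condition from the example: CPU Utilization is greater than 90
    return node.cpu > 90


def suppress(node):
    # Suppression condition from the example: Node name is equal to Bob
    return node.name == "Bob"


def alerts_database_wide(nodes):
    # Reading A (assumption): suppression is checked against every node,
    # and one match anywhere suppresses the whole alert.
    if any(suppress(n) for n in nodes):
        return []
    return [n.name for n in nodes if trigger(n)]


def alerts_per_object(nodes):
    # Reading B (assumption): suppression only applies to the node that
    # met the trigger condition.
    return [n.name for n in nodes if trigger(n) and not suppress(n)]


nodes = [Node("Bob", 95), Node("Alice", 97), Node("Carol", 40)]
print(alerts_database_wide(nodes))  # []
print(alerts_per_object(nodes))     # ['Alice']
```

Under reading A the mere existence of Bob silences the alert for Alice too, which is the behavior described in the question above.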
If I may interject, please. Based on the understanding that suppression is based on a separate DB query, it appears that it may be good general practice to "suppress" using the proper logic qualifiers on the Trigger tab so as to be more specific. It would seem all too possible to create suppression conditions that are less specific than one desires.
If that sounds like a wise approach, then there is an even greater case for using well designed custom properties than ever before. If one tries to accomplish "suppression" on the Trigger tab rather than the Suppression tab, the list of suppressions could get quite long, depending on the situation. Therefore, custom properties would come in quite handy in maintaining functionality and sanity!
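As a rough illustration of why a custom property keeps this manageable, compare the two trigger conditions below. This is a Python sketch, not Orion syntax; the `AlertExempt` property name and the node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu: int
    custom: dict = field(default_factory=dict)  # stand-in for custom properties


def trigger_with_name_list(node):
    # Exclusions written directly into the trigger: the list grows with
    # every node that needs to be left out.
    excluded = {"Bob", "Build01", "Lab-7"}
    return node.cpu > 90 and node.name not in excluded


def trigger_with_custom_property(node):
    # One condition on a (hypothetical) custom property replaces the list;
    # exempting a node means setting the property, not editing the alert.
    return node.cpu > 90 and not node.custom.get("AlertExempt", False)


nodes = [
    Node("Bob", 95, {"AlertExempt": True}),
    Node("Alice", 97),
    Node("Lab-7", 99, {"AlertExempt": True}),
]
print([n.name for n in nodes if trigger_with_name_list(n)])        # ['Alice']
print([n.name for n in nodes if trigger_with_custom_property(n)])  # ['Alice']
```

Both triggers give the same result here, but only the first one has to be edited whenever another exempt node is added.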
Thoughts?
Borgan
Firstly I just have to say that your avatar makes me laugh every time I see it!
As far as your points are concerned, I completely agree. In most (probably all) environments, smart use of custom properties will make handling alerts much easier and more coherent.
As far as how suppressions are handled, I (think I) can see cases for both doing them via the suppression tab and building them into the alert itself; however, once again, in either case good use of custom properties is going to make all the difference in the world.
Thanks Byron. That is a pic of my grandson!
Back to the subject at hand. So, let's try a simple example to see if we are on the same page.
Say you have a network of one switch with ten attached nodes. You also have 10 other nodes that are not attached to that switch.
Then you construct a node down alert for all 21 nodes but include a suppression condition for the switch-attached nodes based on the status of the switch. With our current proposed understanding of the nature of the suppression SQL query, we will NOT get a node down alert for ANY node, not just for the nodes attached to the switch. In other words, the suppression condition assumes a universal dependency which in fact does not exist.
In the real world, there will be many switches or other upstream devices, each with dependent devices. So, the challenge becomes how to configure suppression that is specific, yet all-inclusive.
Are we looking at more alerts, rather than fewer alerts, to get this done?
Say you have a network of one switch with ten attached nodes. You also have 10 other nodes that are not attached to that switch. Then you construct a node down alert for all 21 nodes but include a suppression condition for the switch-attached nodes based on the status of the switch. With our current proposed understanding of the nature of the suppression SQL query, we will NOT get a node down alert for ANY node, not just for the nodes attached to the switch. In other words, the suppression condition assumes a universal dependency which in fact does not exist.
Yes, this is correct.
Now, with good use of custom properties you could turn this rather useless alert/suppression combination into something useful. Let's take your example and say that each of your nodes had a custom property that defined which upstream device it was attached to. Now if in your alert you include only devices with that switch as the upstream device, things should work out.
I think we are probably looking at more alerts the more intelligent you want them to be. There are some cases where you can create a very complicated alert by nesting different tests in it, but that can also become a management nightmare in its own right.
I did see a post at one point in the forums where somebody used a custom property to note which tier of his network the gear was on and then used some intelligent suppressions based on that. However, I think it all really comes down to what kind of an environment you are working with.
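Here is a small Python model of that 21-node example, assuming the database-wide reading of suppression discussed above. Nothing here is Orion code: the `upstream` field stands in for a hypothetical custom property, and the node names are invented. It shows the single alert going silent for everything, and the same suppression becoming harmless once the alert only covers nodes attached to the switch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    up: bool
    upstream: Optional[str] = None  # hypothetical custom property


def switch_is_down(nodes, switch_name):
    return any(n.name == switch_name and not n.up for n in nodes)


def alert_all_nodes(nodes, switch_name):
    # One node down alert for all 21 nodes; the suppression (switch down)
    # is database-wide, so it silences every node.
    if switch_is_down(nodes, switch_name):
        return []
    return [n.name for n in nodes if not n.up]


def alert_for_switch(nodes, switch_name):
    # Alert limited to nodes whose custom property names this switch;
    # the same suppression now only affects real dependents.
    if switch_is_down(nodes, switch_name):
        return []
    return [n.name for n in nodes if not n.up and n.upstream == switch_name]


def alert_unattached(nodes):
    # Separate alert, with no suppression, for nodes that have no upstream set.
    return [n.name for n in nodes if not n.up and n.upstream is None]


nodes = [Node("switch1", up=False)]
nodes += [Node(f"attached{i}", up=False, upstream="switch1") for i in range(1, 11)]
nodes += [Node(f"other{i}", up=(i != 3)) for i in range(1, 11)]

print(alert_all_nodes(nodes, "switch1"))   # []
print(alert_for_switch(nodes, "switch1"))  # []
print(alert_unattached(nodes))             # ['switch1', 'other3']
```

With the single alert, the outage on other3 is hidden by the switch being down. Splitting it into two alerts keeps the dependents quiet while the switch and other3 still get reported, at the cost of one more alert to maintain.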
Ok, so we're on the same wavelength it looks like.
To extend this a little more, would you then generally agree that:
(1) If your network is relatively small and simple, you might get away with using the Suppression tab in cases where truly universal alert suppression is needed.
(2) On networks where devices are more intricately meshed and multiple real dependencies exist (true of the majority of networks), then "suppression" may be more reliably accomplished using custom properties on the trigger tab, while essentially leaving the Suppression tab unused.
Enough work for this week. Going to get some egg nog.
Hrm, I am not sure these are necessarily true. As I sat and thought about this I found myself wandering down a lot of different complex potential situations and use cases. Eventually I came full circle back to a simple thought...
I think you have to look at things from the perspective of the NMS.
In all cases (both large and small networks) you will have one or a few devices upstream of your NMS and if those devices go down everything else will appear to be down for the NMS. For these devices a suppression makes sense.
Also, as you begin to chunk your infrastructure up into different networks the primary gateway devices for those networks may be good places for suppressions as well for the networks that lie behind them.
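One way to picture the idea of suppressing behind upstream and gateway devices is to walk from each down node toward the NMS and see whether something in its path is already down. The Python sketch below is only a thought model under that assumption; the device names, the `upstream_of` map, and the helper function are hypothetical and do not correspond to any Orion feature.

```python
def first_down_upstream(name, upstream_of, is_up):
    # Walk from a node toward the NMS and return the nearest upstream
    # device that is down, or None if the whole path is up.
    seen = set()
    current = upstream_of.get(name)
    while current is not None and current not in seen:
        if not is_up[current]:
            return current
        seen.add(current)  # guard against accidental loops in the map
        current = upstream_of.get(current)
    return None


# Hypothetical topology: the NMS sits behind core-rtr, which feeds two gateways.
upstream_of = {
    "core-rtr": None,
    "gw-a": "core-rtr",
    "gw-b": "core-rtr",
    "srv-a1": "gw-a",
    "srv-b1": "gw-b",
}
is_up = {"core-rtr": True, "gw-a": False, "gw-b": True, "srv-a1": False, "srv-b1": False}

alerts = [
    name
    for name, up in is_up.items()
    if not up and first_down_upstream(name, upstream_of, is_up) is None
]
print(alerts)  # ['gw-a', 'srv-b1']
```

Here srv-a1 stays quiet because its gateway is down, while srv-b1 still alerts because everything between it and the NMS is up.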
In general I see more cases where things will be getting filtered as part of the alert but there definitely seem to be some use cases for the suppressions as well. I don't necessarily have a good feel one way or the other that the size of your network will mandate using one over the other.
DISCLAIMER: The thoughts above are only thoughts, many of which occurred to me while writing this, and are not in any way based on real testing.