6 Replies Latest reply on May 28, 2015 5:19 AM by akhasheni

    site-wide alert exceptions?

    akhasheni

      Is there a way to tell NPM and SAM, in one place, not to alert on a node or app that is unmanaged, rather than editing every alert one by one?

       

      Say I want alerts on nodes and apps that are not up, and I build a number of different alerts that take different actions based on node/app tier, core application, location, and other parameters. And say I have 23,000 of those alerts. (No, I only have 25, but still.)

       

      Now, I do not want alerts on any node or app that is unmanaged or in maintenance - or perhaps external.

       

      Is there a way to apply these alert exceptions to all existing alerts in bulk, rather than editing them one by one?

       

      Thanks.

        • Re: site-wide alert exceptions?
          LadaVarga

          Hello,

           

           

          By default, alert actions are not executed if the object is unmanaged. You will still see the alert in Orion, but no email, log entry, or other action will fire.

           

          This can be changed in Polling Settings:

          Allow alert actions for unmanaged objects

           

           

          Lada

          • Re: site-wide alert exceptions?
            silverbacksays

            If you give your nodes a custom property for 'Location' (or whatever suits), you can build exceptions into alerts for this property.


            For example, you could have a "Nodes that are Down" alert, but you don't want to see alerts for nodes that are in London. So you assign all London nodes the "Location" custom property value of "London", and then add a suppression rule to your "Nodes that are Down" alert that applies when the node custom property "Location" is equal to "London".

             

            You can use the same logic for almost any requirement, using a combination of specific alerts for specific custom properties, and exceptions from some alerts for others.
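To make the logic concrete, here is a rough sketch in plain Python of how a trigger condition and a suppression condition combine for the "London" example. This is not Orion code or any SolarWinds API; the `Node` class, the `custom` dictionary and the node names are made up purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a monitored node (not an Orion object)."""
    name: str
    status: str                                  # "Up" or "Down"
    custom: dict = field(default_factory=dict)   # custom properties


def nodes_down_alert(node: Node) -> bool:
    """'Nodes that are Down' alert with a per-alert suppression rule."""
    triggered = node.status == "Down"                       # trigger condition
    suppressed = node.custom.get("Location") == "London"    # suppression rule
    return triggered and not suppressed


nodes = [
    Node("lon-sw01", "Down", {"Location": "London"}),
    Node("nyc-sw01", "Down", {"Location": "New York"}),
    Node("nyc-sw02", "Up", {"Location": "New York"}),
]

# Only the down node outside London ends up alerting.
alerting = [n.name for n in nodes if nodes_down_alert(n)]
```

Note that the suppression lives inside the one alert, so every other alert needs its own copy of the rule.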

              • Re: site-wide alert exceptions?
                akhasheni

                That's the point, you have to add that suppression to each alert. There is no way to add one rule that would apply to all alerts, like "do not send any alerts for nodes that are down and that are located in London". (I know this particular condition doesn't make sense - in my case, I want to add an "in maintenance" custom property and silence all alerts for all nodes and apps when that property is set to "yes". So, modify each alert manually? Really?)
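What I am after would look roughly like the sketch below: one exception rule defined once and applied to every alert. This is plain Python to show the idea only, not something Orion provides as far as I can tell; `in_maintenance`, `with_global_exception` and the alert names are all hypothetical.

```python
from typing import Callable, Dict

Obj = Dict[str, str]                 # hypothetical: an object's properties
Condition = Callable[[Obj], bool]    # a trigger or exception condition


def in_maintenance(obj: Obj) -> bool:
    """Site-wide exception: the 'in_maintenance' custom property is 'yes'."""
    return obj.get("in_maintenance", "no").lower() == "yes"


def with_global_exception(alerts: Dict[str, Condition],
                          exception: Condition) -> Dict[str, Condition]:
    """Wrap every alert so it fires only when the exception does not match."""
    def wrap(cond: Condition) -> Condition:
        return lambda obj: cond(obj) and not exception(obj)
    return {name: wrap(cond) for name, cond in alerts.items()}


# Two example alerts; imagine 40 or 200 of these.
alerts: Dict[str, Condition] = {
    "node_down": lambda o: o.get("status") == "Down",
    "app_down": lambda o: o.get("app_status") == "Down",
}

guarded = with_global_exception(alerts, in_maintenance)

obj = {"status": "Down", "app_status": "Down", "in_maintenance": "yes"}
fired = [name for name, cond in guarded.items() if cond(obj)]
```

The point is that the exception is written once, and none of the individual alert definitions have to change.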

                 

                This is compounded by another pet peeve of mine: escalation rules can't be triggered by object properties, only by time elapsed. (See the ALERT ESCALATION: BEST PRACTICES? thread.) E.g., you can't add an escalation such as "if the object custom property 'tier' is 1, email every 15 minutes; if not, just once". This requires a separate alert for each such value of the custom property. As it is, we have 40 active alerts, and they'll keep adding up. Adding suppressions to each one makes it too complex for my liking.
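For illustration, the kind of property-driven escalation I mean could be written as a single rule like the sketch below. Again, this is plain Python and not an Orion feature; `email_times` and its parameters are invented for the example, with the 15-minute interval taken from the tier 1 case above.

```python
from typing import List


def email_times(tier: str, minutes_down: int, interval: int = 15) -> List[int]:
    """Minutes after the trigger at which an email would be sent.

    Tier 1 objects are re-notified every `interval` minutes while the
    alert stays active; everything else is notified once, at trigger time.
    """
    if tier == "1":
        return list(range(0, minutes_down + 1, interval))
    return [0]


tier1_schedule = email_times("1", 45)   # [0, 15, 30, 45]
other_schedule = email_times("2", 45)   # [0]
```

With something like this, the tier would be an input to one alert rather than a reason to clone the alert per tier.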

                 

                In your "London" scenario, let's imagine you have 200 active alerts: interfaces, volumes, nodes, app and component monitors, multiplied by conditional escalations, types of issues, teams that handle them. Now you need to add that location-based suppression mechanism: no alerts on anything in London as we have alerts that are specific to the London team. So, modify each and every one of those 200 alerts? Yikes.