Help! I'm being MAULED BY ALERTS, or How To Manage High Alert Volumes and Keep Your Sanity

Version 1

    In my environment, we are very alert heavy. Alerts are handled differently based on the subject line of the alert, or the component name, and are used both for alerting folks to issues and for reporting on unusual statistical anomalies. Most of the alerts are generated from SAM, and it's impossible to set practical thresholds for many of the applications that Orion monitors. We just don't pay as much attention to an alert unless the subject says, "THIS OBJECT IS FREAKING ON FIRE." If you set up your Outlook rules correctly, this isn't that bad a method of handling things. One example would be an alert that has an automated action, where the trigger continues to update everyone on the DL with the current value of the statistic. If that number continues to fall, everything is OK. If it continues to rise, something is on fire, and there's likely another alert that spells out the event as being catastrophic, critical, or whatever you want to call it. Either way, we get a LOT of alerts, and that isn't going away. At least not anytime soon. However, it is important for me to know and keep an eye on who the problem children are, be they alert conditions or alert objects.

     

    To give you an idea of how alert heavy we are, on average I receive 3000 to 4000 alerts a day, generated from roughly 200 alert rules in Orion. When I first took over this project, I tried auditing those rules to remove any fluff, and found that we're most likely missing events and need more alerts. It's a diverse environment... Last week, we had an event where 15,000 alerts were generated in a single day. Managing this much volume from the alert service is impossible if you stick to the individual events, so sometimes it is helpful to take the 30,000-foot view of things before you can take a step in the right direction.

     

    I use the attached reports to help identify any unusual alert trends, and continually audit and improve our processes.
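
    As a rough illustration of the kind of summary those reports give me, here is a minimal Python sketch. It is not the attached SQL and it does not touch the Orion database. The sample rows, the alert and object names, and the function names are all made up for the example, and the "two standard deviations above the mean" rule is just one simple way to flag an unusual day.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical sample rows: (date, alert name, object name).
# In practice these would come from an export of your alert history.
ALERT_EVENTS = [
    ("2024-01-01", "Disk Queue Length", "sql01"),
    ("2024-01-01", "Disk Queue Length", "sql01"),
    ("2024-01-01", "Service Down", "web02"),
    ("2024-01-02", "Disk Queue Length", "sql02"),
    ("2024-01-02", "Service Down", "sql01"),
]

# Hypothetical daily totals, shaped like a normal week plus one very bad day.
DAILY_TOTALS = {
    "2024-01-01": 3200,
    "2024-01-02": 3500,
    "2024-01-03": 3900,
    "2024-01-04": 3400,
    "2024-01-05": 15000,
    "2024-01-06": 3100,
    "2024-01-07": 3300,
}


def top_offenders(events, n=3):
    """Return the n noisiest alert names and the n noisiest objects."""
    by_alert = Counter(alert for _, alert, _ in events)
    by_object = Counter(obj for _, _, obj in events)
    return by_alert.most_common(n), by_object.most_common(n)


def unusual_days(daily_totals, threshold=2.0):
    """Return days whose total sits more than `threshold` population
    standard deviations above the mean daily total."""
    values = list(daily_totals.values())
    avg = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # every day looks the same, nothing to flag
    return [day for day, total in daily_totals.items()
            if (total - avg) / spread > threshold]


if __name__ == "__main__":
    alerts, objects = top_offenders(ALERT_EVENTS)
    print("Noisiest alerts:", alerts)
    print("Noisiest objects:", objects)
    print("Unusual days:", unusual_days(DAILY_TOTALS))
```

    The point is the same as with the reports: count by alert condition and by alert object first, and only then go digging into individual events.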

     

    Each report can be plugged into the Report Writer as a custom SQL report.

     

    On a side note, I recommend adding the alert's name to the email body to help keep track of alerts. Poorly designed alerts are a headache, and trying to track down noisy alerts can be an exercise in itself, especially if someone has to inherit your configuration, with or without documentation.

    Alert Name: ${AlertName}
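
    Once that line is in the email body, anything that reads the emails can pull the alert name back out. A minimal Python sketch, assuming the body contains a line in exactly the "Alert Name: ..." form shown above; the function name and the sample email text are hypothetical.

```python
import re

# Matches a line of the form "Alert Name: <name>" anywhere in the body.
ALERT_NAME_RE = re.compile(r"^\s*Alert Name:\s*(.+?)\s*$", re.MULTILINE)


def extract_alert_name(body):
    """Return the alert name found in an email body, or None if absent."""
    match = ALERT_NAME_RE.search(body)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical email body for illustration only.
    sample = "Node sql01 is down.\nAlert Name: Node Down\nTime: 12:00\n"
    print(extract_alert_name(sample))
```

    Feeding the extracted names into a simple counter is usually enough to see which alert rules are the noisy ones.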

     

    On the Alert Detail report, it's important for me to tell you two facts:

     

    1) that the report only covers the modules that I have installed. Any additional modules you have will each need their own union / select, so you will need to modify the T-SQL accordingly.

    2) that I use Nodes.Platform, a custom property, to group these alerts by the relevant service platform. You should change or remove it to match your configuration. This is easy to do with a good text editor like Sublime Text (my favorite) or Notepad++.

     

    I hope these are helpful to any of you who have to manage high volume alerts.