Reduce number of alerts for NOC

I use alert suppression with some degree of success, but I have another issue and am wondering what others have done.

Our operations center would like to know when something of interest is going on so they can wake up and look at the Orion screen to get specifics, but I don't want a hundred emails to go to them. So basically they would be alerted in some way that any number of alerts had just been generated, but only once within a specified period of time, say 1/2 hour.

It's funny, this is easy to do in Kiwi, a much simpler program, but I have not hit on the right way to do it in Orion.

Suggestions?

Debbi

  • Firstly, I am assuming that you are using Advanced Alerts in Orion?

    Assuming you are using Advanced Alerts, it's really a matter of tuning two things: how often the alerts are checked, and how long the condition has to exist before the Trigger Action occurs. Do this with your polling intervals in mind, so the alerting engine isn't re-checking the same data points.

    If you could provide more details and a few actual examples, I could try to offer more specific suggestions.

    I have had to do the same thing for our 24x7x365 NOC.  What I found is that you can significantly reduce the number of false positives but you can't eliminate them; that's why you have people there to investigate stuff.

    "So basically they would be alerted in some way that any number of alerts had just been generated, but only once within a specified period of time, say 1/2 hour."

    To the best of my knowledge, doing this specifically is not currently possible.
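For anyone handling this outside Orion, the "only once within a specified period" behavior described above could be approximated by a small script sitting between the alert action and the NOC. This is only a minimal Python sketch, not an Orion feature: the `AlertThrottle` class, the `notify` callback, and the `on_alert` method are all hypothetical names, and it assumes each alert can be made to call `on_alert` somehow.

```python
import time
from typing import Callable, Optional


class AlertThrottle:
    """Forward at most one notification per time window (hypothetical helper)."""

    def __init__(
        self,
        notify: Callable[[str], None],
        window_seconds: float = 1800.0,  # 1/2 hour, as in the question
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._notify = notify
        self._window = window_seconds
        self._clock = clock
        self._last_sent: Optional[float] = None
        self.suppressed = 0  # alerts swallowed since the last notification

    def on_alert(self, message: str) -> bool:
        """Return True if a notification was sent, False if it was suppressed."""
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self._window:
            self._last_sent = now
            self.suppressed = 0
            self._notify(f"New alerts in Orion, check the console (first: {message})")
            return True
        self.suppressed += 1
        return False
```

With this in place, a burst of 100 alerts produces one message to the NOC, and the next message can go out only after the window has elapsed.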

  • I am using advanced alerts. For example, let's say one of our local fiber hubs goes down (the fiber hubs belong to Comcast, so we have no monitoring of them). This would affect approximately 10 of our sites, generating a "down" condition for every switch and wireless access point at those sites, maybe 100 emailed alerts or more. Even if I set up an alert suppression for each of the sites, that is still at least ten alerts, IF the suppression works as hoped (as you know, it depends on timing, even if your main node is polled more frequently than the other nodes at the site).

    Debbi

  • In this case you could create one alert for all 10 sites and have it fire only if all of the sites are down. If there were a way for you to monitor the fiber hub you could use dependencies, but it sounds like that isn't an option here.
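The "fire only if all of the sites are down" condition can be written out as plain logic. The Python sketch below is just an illustration of that rule, not how Orion evaluates alerts; the function names and the input shape (site name mapped to a list of node up/down flags) are assumptions made for the example.

```python
from typing import Mapping, Sequence


def site_is_down(node_up: Sequence[bool]) -> bool:
    """A site counts as down when it has nodes and none of them is up."""
    return len(node_up) > 0 and not any(node_up)


def hub_outage_suspected(sites: Mapping[str, Sequence[bool]]) -> bool:
    """True only when every site behind the hub is down."""
    return len(sites) > 0 and all(site_is_down(nodes) for nodes in sites.values())
```

If even one switch at one site still responds, `hub_outage_suspected` stays false, so the single combined alert does not fire for a partial outage.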

  • Debbi,

    Have you tried creating service groups for these sites, so that you get a single alert when the group is down and can investigate from there?