
Advanced Alert Suppression - Avoid re-alerting when re-enabling Alerts?

I have a requirement to disable certain alerts, or maybe all alerts, during maintenance windows.  That part is easy, but the requirement is to continue monitoring during the window and just suppress alerts, then re-enable alerting once the window is over without flooding everybody’s phone with pages for the devices that went down during the window.
 
So I think I know how I can do most of this.  If I disable the Advanced Alerts I have created (all the alerts here are advanced alerts), then clear the events of any device that went down during the window before re-enabling the alerting, I think this gets me most of the way there.  Here is the problem: what if a device that has been down for 3 days, and is still down, is part of the alert I am about to turn back on?  Unless I hunt down the trigger events for every device that is in alerting status but had nothing to do with my maintenance window, everybody on my team gets woken up by those pages going back out when I turn alerting back on.
 
Do you know any way around this issue?  Many NMS solutions have an option to suspend alerting and then re-enable it without re-sending the alerts that had already triggered before the suspension.  I have case 181794 open, but so far support has been stumped.
 

  • Did you ever receive any further information about how to accomplish this?

    We're in the same boat.

  • you may want to try an email to the man, as this shows on his profile

    Last Logged In: Jan 24, 2013 3:29 PM

    Let me know what you find when you do.... I could use some help on this topic.

  • I thought this was a family channel

  • But it changes when the sun goes down...

  • I have a similar task for my company's maintenance windows.  As you mentioned, disabling and then re-enabling advanced alerts will cause any existing alerts to retrigger.  More importantly, it resets the age of the alert, which may affect SLA requirements.  Instead of disabling alerts, I found that another way to avoid email floods was to simply redirect the email actions to an SMTP server that does not exist.  The actions will still kick off (good for logging assets affected by the maintenance window), but will die in the ether when the SMTP relay cannot be reached.  Below is the SQL query I use...

    -- Redirect email actions from the real SMTP relay to an unreachable address
    DECLARE @oldServerName AS VARCHAR(255)
    DECLARE @newServerName AS VARCHAR(255)

    SET @oldServerName = 'WHATEVER_THE_IP_OF_YOUR_SMTP_RELAY_IS'
    SET @newServerName = '0.0.0.0'

    -- Rewrite the SMTPSERVER: entry inside each email action's Target
    UPDATE [orion].[dbo].[ActionDefinitions]
    SET [Target] = REPLACE(CAST([Target] AS VARCHAR(MAX)), 'SMTPSERVER:' + @oldServerName, 'SMTPSERVER:' + @newServerName)
    WHERE [Target] LIKE '%SMTPSERVER:%'

    -- Review the result
    SELECT * FROM [orion].[dbo].[ActionDefinitions]

    Then when the window is over, swap the values for old and new and run it again.
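
The redirect-and-restore round trip described above can be rehearsed away from production. What follows is only a minimal sketch, assuming an in-memory SQLite table as a stand-in for the Orion database: the relay address 10.0.0.25, the `Target` string layout, and the helper names `make_demo_db`, `targets`, and `swap_smtp_server` are all made up for illustration and are not taken from the product. It shows that swapping the old and new values restores the original targets and that actions without an `SMTPSERVER:` entry are left alone.

```python
import sqlite3

REAL_RELAY = "10.0.0.25"  # hypothetical address of the real SMTP relay
DEAD_RELAY = "0.0.0.0"    # unreachable address used during the window


def make_demo_db():
    """Build a throwaway table; the Target layout here is invented."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ActionDefinitions (Id INTEGER PRIMARY KEY, Target TEXT)"
    )
    conn.executemany(
        "INSERT INTO ActionDefinitions (Target) VALUES (?)",
        [
            ("SMTPSERVER:10.0.0.25;TO:oncall@example.com",),
            ("SMTPSERVER:10.0.0.25;TO:netops@example.com",),
            ("LOGFILE:C:\\alerts.log",),  # not an email action
        ],
    )
    conn.commit()
    return conn


def targets(conn):
    """Return every Target value in Id order."""
    rows = conn.execute("SELECT Target FROM ActionDefinitions ORDER BY Id")
    return [row[0] for row in rows]


def swap_smtp_server(conn, old, new):
    """Replace 'SMTPSERVER:<old>' with 'SMTPSERVER:<new>'; return rows updated."""
    cur = conn.execute(
        "UPDATE ActionDefinitions "
        "SET Target = REPLACE(Target, 'SMTPSERVER:' || ?, 'SMTPSERVER:' || ?) "
        "WHERE Target LIKE '%SMTPSERVER:' || ? || '%'",
        (old, new, old),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = make_demo_db()
    print("start of window:", swap_smtp_server(conn, REAL_RELAY, DEAD_RELAY), "rows redirected")
    print(targets(conn))
    print("end of window:", swap_smtp_server(conn, DEAD_RELAY, REAL_RELAY), "rows restored")
    print(targets(conn))
```

Unlike the query above, this sketch narrows the WHERE clause to rows that actually contain the old server name, so the returned row count reflects only the actions that were rewritten.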

  • Detroiter,

    Do you run that as a query in SQL Management Studio?

  • Has anyone heard of a way to throttle outgoing emails globally on the server when conditions cause a storm of alerts? I have run into situations (okay, caused is a better word) where thousands of emails go out due to the wording of the alert. My Exchange god can block them eventually, but is there a way for the NPM software or the server to catch the situation and stop itself, similar to the way an SNMP trap alert can be told to pause for so many minutes after seeing a certain number of traps?

  • Hi Steve, the above looks like a typical maintenance requirement. Why do you want to approach it through alerting?

    Why not use the Unmanage Schedule Utility? Unmanage all the devices in the list for the defined maintenance period (it takes a start time and the number of hours to stay unmanaged, which gives the end date). Unmanaging and re-managing nodes is well suited to such requirements.

    There are several other ways to achieve this apart from unmanage/remanage or alerting. Let me know whether the above approach suits your scenario.

    Thanks in advance

  • Vinay,

    I cannot speak for Steve, but I think that the Unmanage function has one major limitation.  While it successfully suppresses alerts, it also stops all polling and data collection for the unmanaged nodes.  I would prefer to have an option to choose whether to stop all monitoring and alerting or just stop alerting for specific nodes.  Maybe we want to see how the specific nodes are doing while they're being worked on, but don't want to get bombarded with alerts.  This is my "two cents" contribution; I hope it helps.