Silence Alerts While Still Monitoring

I'm posting this in NPM, but the same would apply to SAM applications.  There should be a way, from the web UI, to put an element into a "maintenance mode" in which the node is still monitored but no alerts go out.  Another user, adatole, developed a way to do this, but I think it could be expanded upon (here is the original thread: http://thwack.solarwinds.com/thread/41779 ).  He referred to the concept as muting.

There are many times when this is necessary.  Take WAN routers, for example: we receive dozens of notices from carriers saying that a circuit will be down for maintenance for 5 minutes sometime between 12 AM and 6 AM.  I don't want to unmanage the device and lose all of that information for what is typically a short outage.  I do, on the other hand, want to silence any alerts that come up during that period.

For this idea to be truly effective, I believe the following components must be included:

1.  The ability to put a Node/Interface/Volume/Application/Component in maintenance mode for a specific duration of time (just like we have for unmanaging)

2.  The ability to schedule a recurring task for this mode (just like the unmanage schedule utility)

3.  There should be a required note field that must be filled in by the user stating a reason for the maintenance mode

4.  Would need more granular rights to allow a user to put a device in this mode without giving full node management rights.

5.  Audit tracking of who made the changes similar to this idea (http://thwack.solarwinds.com/ideas/1034)

6.  The maintenance period should not be taken into account in availability reports.
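
To make the requirements above concrete, here is a minimal sketch of what a maintenance-mode record might hold. It is not how Orion stores anything; the class name `MaintenanceWindow` and all of its fields are hypothetical and only illustrate items 1, 3 and 5 (fixed duration, required note, audit information).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical record for one muted period on a monitored element."""
    element: str          # node, interface, volume, application or component
    start: datetime
    duration: timedelta   # item 1: fixed duration, as with unmanage
    reason: str           # item 3: required note
    created_by: str       # item 5: who made the change, for auditing

    def __post_init__(self):
        # Reject the record outright if the required note is missing.
        if not self.reason.strip():
            raise ValueError("a reason is required for maintenance mode")
        if self.duration <= timedelta(0):
            raise ValueError("duration must be positive")

    @property
    def end(self) -> datetime:
        return self.start + self.duration

    def suppresses_alerts(self, at: datetime) -> bool:
        """Polling is unaffected; only notifications are held back."""
        return self.start <= at < self.end
```

A window created for a carrier notice would then answer "should this alert be held back right now?" without touching polling at all.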

43 Comments
Level 12

Yep, I'm all for this. During maintenance, we just want to avoid notification flooding. We would still like to see whether devices are up or down, especially as changes are performed. Perhaps a separate database table could be used for maintenance mode activity, so that the details excluded from availability reports, etc. would still be available in something like "maintenance reports".

Oh, and muting and maintenance should both be configurable and schedule-able via the web UI (no win32 apps). Pretty please?

Level 15

cmgurley wrote:

Oh, and muting and maintenance should both be configurable and schedule-able via the web UI (no win32 apps). Pretty please?

Great point. Other than backend services, everything should be in the web UI.

Level 15

This is doable today using a workaround of sorts.

Create a Custom Property called Maint. Use this property in all your alert configs.

Under the trigger conditions you would add this as a trigger condition:

Maint = 'no'

Now you can mute the alerts from the web GUI by changing this Custom Property to 'yes' (or to something like the user's ID).

You can run a report the next day showing any nodes where Maint is not equal to no.

Or add a view to your web page using this as an SQL Filter:

Nodes.Maint <> 'no'

This allows you to easily check for nodes that were not "unmuted" after the maintenance work was completed.
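
The logic of this workaround can be tried out in isolation. The sketch below uses an in-memory SQLite table as a stand-in for the Orion Nodes table; the column names `Caption` and `Status` and the sample rows are assumptions made for illustration, and only the `Maint` conditions come from the steps above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the real Nodes table; the schema here is hypothetical.
conn.execute("CREATE TABLE Nodes (Caption TEXT, Status TEXT, Maint TEXT)")
conn.executemany(
    "INSERT INTO Nodes VALUES (?, ?, ?)",
    [
        ("wan-router-1", "Down", "yes"),     # muted for maintenance
        ("core-switch-1", "Down", "no"),     # not muted, should alert
        ("file-server-1", "Up", "jsmith"),   # muted, tagged with a user ID
    ],
)

# Alert trigger: the usual condition plus Maint = 'no'.
alerting = [row[0] for row in conn.execute(
    "SELECT Caption FROM Nodes "
    "WHERE Status = 'Down' AND Maint = 'no' ORDER BY Caption"
)]

# Next-day check: anything that was never set back to 'no'.
still_muted = [row[0] for row in conn.execute(
    "SELECT Caption FROM Nodes WHERE Maint <> 'no' ORDER BY Caption"
)]

print(alerting)      # ['core-switch-1']
print(still_muted)   # ['file-server-1', 'wan-router-1']
```

One thing to watch for: in standard SQL, a row whose `Maint` value is NULL matches neither `Maint = 'no'` nor `Maint <> 'no'`, so a node with the property left unset would never alert and would never show up in the check either. Giving every node an explicit 'no' avoids that.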

Level 15

I agree that it can be done today; however, I'm looking for a built-in way to do it that works like unmanage does today, requiring an end time, so to speak, and offering the ability to schedule it.  Currently we'd have to mute manually when needed; we can't schedule it for, say, 1-3 AM during a maintenance window.
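
As a rough illustration of the scheduling being asked for, the check below decides whether a given time falls inside a recurring daily window such as 1-3 AM. The function names `in_daily_window` and `should_notify` are hypothetical; this is a sketch of the idea, not an existing Orion feature.

```python
from datetime import datetime, time


def in_daily_window(now: datetime, start: time, end: time) -> bool:
    """True if `now` falls inside a window that recurs every day."""
    t = now.time()
    if start <= end:
        return start <= t < end
    # Window wraps past midnight, e.g. 23:00 to 02:00.
    return t >= start or t < end


def should_notify(alert_triggered: bool, now: datetime,
                  start: time = time(1, 0), end: time = time(3, 0)) -> bool:
    """The alert is still evaluated; only the notification is suppressed."""
    return alert_triggered and not in_daily_window(now, start, end)
```

With the default 1-3 AM window, an alert triggered at 2:00 AM would be evaluated but not sent, while the same alert at 3:00 AM would go out as usual.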

Level 16

This would be useful to allow statistics collection to continue until the nodes become unreachable due to maintenance, and it should feed into the node availability calculations, i.e. planned maintenance should not count toward the time a node is down for SLA calculations. A custom property to turn off alerts doesn't help if you're trying to make use of that feature.
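
To show what excluding planned maintenance from an SLA figure could look like, here is a small worked sketch. The function `availability_pct` is hypothetical, and it assumes that maintenance windows do not overlap one another and that outages do not overlap one another.

```python
from datetime import datetime, timedelta

Interval = tuple[datetime, datetime]


def overlap(a: Interval, b: Interval) -> timedelta:
    """Length of the intersection of two intervals (zero if disjoint)."""
    return max(min(a[1], b[1]) - max(a[0], b[0]), timedelta(0))


def availability_pct(period: Interval, outages: list[Interval],
                     maintenance: list[Interval]) -> float:
    """Availability over `period`, ignoring time inside maintenance windows."""
    maint_time = sum((overlap(m, period) for m in maintenance), timedelta(0))
    counted = (period[1] - period[0]) - maint_time
    if counted <= timedelta(0):
        raise ValueError("no time left in the period outside maintenance")

    unplanned = timedelta(0)
    for o in outages:
        # Clip the outage to the reporting period first.
        clipped = (max(o[0], period[0]), min(o[1], period[1]))
        if clipped[0] >= clipped[1]:
            continue
        planned = sum((overlap(clipped, m) for m in maintenance), timedelta(0))
        unplanned += (clipped[1] - clipped[0]) - planned

    return 100.0 * (1 - unplanned / counted)


period = (datetime(2024, 1, 1), datetime(2024, 1, 31))            # 720 hours
maintenance = [(datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 6))]  # 6 hours
outages = [
    (datetime(2024, 1, 1, 1, 0), datetime(2024, 1, 1, 1, 5)),   # during maintenance
    (datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 10, 0)),  # unplanned
]
result = availability_pct(period, outages, maintenance)
```

Here the 5-minute outage inside the window is ignored and the 6 maintenance hours are removed from the denominator, so only the 1 unplanned hour counts against the remaining 714 hours (about 99.86%).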

I have used a feature like this (with another external monitoring vendor) and we loved it. Muting the alerts allowed us to use the console to get assurance that services and servers came back after a forklift upgrade of core switching or a similar event.

Level 20

We have certain maintenance windows for specific types or groups of devices, e.g. a server or network Sunday, when many changes take place during the same regularly scheduled time window every month.

I'm glad we have growing options for dealing with maintenance periods.

Level 8

+1 on a maintenance status feature, and it should be assignable at the group level as well as by individual nodes.  My users have this with their current monitoring solution but I'm moving them to Orion.  This was one of the first things they asked about.

Level 8

I think you can simply use the "acknowledged" property for this.

If I understand you correctly, "acknowledge" will only work or be available after the alert was triggered. The goal, at least mine, would be to avoid alerts in the first place during planned maintenance for core network devices.