Good afternoon everyone!
I am working on an alert to trigger whenever a user mutes a node without specifying an end date. I tested the following SWQL query in SWQL Studio:
SELECT Nodes.Uri, Nodes.DisplayName FROM Orion.Nodes AS Nodes
Join Orion.AlertSuppression n on n.EntityUri=Nodes.Uri
where n.SuppressUntil is NULL
It successfully gives me a list of nodes muted without an end date. When I popped it into the alert I attached to this thread, it even gave me a warning listing the nodes as objects that will be triggered immediately. However, nothing happens when I make it active. I expected it to pop up as an active alert. What have I done wrong? I would appreciate any advice that anyone can provide. Thanks!
Ok everyone, I appreciate all the attention this got, and with some additional time to think about it, I got it figured out. I just needed to target the audit events as opposed to the nodes themselves.
SELECT * FROM [dbo].[AuditingEvents]
Where AuditingEvents.AuditEventMessage like '% muted alerts on %' and TimeLoggedUtc > DATEADD (minute, -1, SYSUTCDATETIME())
Problem solved! Now I have an alert that will trigger every time a user fails to schedule an end time when muting a node. I appreciate everyone's help!
No, I had the same issue. Everything is working as expected. Your alert is perfect, the condition is met, then the next step is to see if the node is muted, which it is, so the alert doesn't fire.
I ended up having to write a script using the API to check for this and create tickets in our helpdesk system. It would be nice to have a checkbox on the alert like "always alert even on muted nodes or applications".
And to just head off the question "Why would you want to alert on muted nodes?"... because people will mute them and forget to unmute them.
So yes, if you mute the node (without a schedule) you will get called. If you mute the node with a scheduled end date more than two weeks out, you will get called. Muting is for production nodes that you want to work on for a reasonable, temporary time and not get called. If it can be down for more than two weeks, then it's not a production server and you need to set the custom properties accordingly. Yeah go ahead and roll those eyes at me...
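For anyone who wants to script the same rule, here is a minimal Python sketch of just the decision logic (no end date, or an end date more than two weeks out). It is not the script described above, only an illustration. It assumes you have already pulled the suppression rows through the API into dicts with `EntityUri` and `SuppressUntil` keys (the column names from the query in the original post), with `SuppressUntil` converted to a timezone-aware `datetime` or `None`. The function name `mutes_needing_followup` and the `MAX_MUTE` constant are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "two weeks" from the post above, expressed as 14 days.
MAX_MUTE = timedelta(days=14)


def mutes_needing_followup(rows, now=None, max_mute=MAX_MUTE):
    """Return (EntityUri, reason) pairs for mutes that should get a ticket or call.

    rows: iterable of dicts with "EntityUri" and "SuppressUntil" keys
          (hypothetical shape; SuppressUntil is an aware datetime or None).
    """
    now = now or datetime.now(timezone.utc)
    flagged = []
    for row in rows:
        until = row.get("SuppressUntil")
        if until is None:
            # Muted with no schedule at all.
            flagged.append((row["EntityUri"], "no end date"))
        elif until - now > max_mute:
            # Muted with an end date further out than the allowed window.
            flagged.append((row["EntityUri"], "end date more than two weeks out"))
    return flagged
```

Whatever comes back from this can then be handed to the helpdesk or paging step. A mute ending exactly two weeks out is not flagged here, since the rule says "more than two weeks".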
That's exactly why I am trying to find a solution. I got called into my boss' office to explain why monitoring was broken. It wasn't broken, but a tech had muted a node forever and thus it never alerted. I'd love to just disable the mute and unmonitor option entirely (not the schedule part, just the forever part). I've thought about doing some custom coding to remove it, but I'd have to reapply it every time I update the software.
Why not just generate a dashboard or a report displaying all these muted objects? I add lists of unmanaged or muted objects to every team dashboard; that way they know they are responsible for their own stuff.
That's a great idea, however I am still in talks with management to change our current dashboard. In particular, upper management gets pretty upset when anything changes. I would prefer that to creating an API script that runs on the server, but I feel that management directives will force my hand and I'll have to do the latter.
At the end of the day the mute function was built explicitly for the purpose of not alerting on an object, so you are working against the designed purpose of the feature. I can think of lots of SQL trickery that would accomplish the goal by misdirecting the "object" of the alert to some other entity, but that's all goofy workarounds that inevitably someone else is going to have to try to decipher when you leave the position, and they will throw their hands up and say "that last guy was a weirdo, let's delete everything and start over." In my consulting career I spent a LOT of time rebuilding environments when they changed hands because nobody could unravel the ball of string that the previous admin had tangled up over the years.
A daily report of muted objects with no end date specified is a reasonably straightforward way to get past the alert issue, but the next time a problem slips through the cracks your mgmt will ask, "who was getting this report?" and that leaves you with the burden of ensuring these reports are going out all over the place and that people are reading them.
I am a fan of allowing my end users to self service as much as possible so I want to make sure that when anyone on their team mutes something then it is front and center that the monitoring team is no longer responsible for that object, so making it as simple as possible to see those things in as many places as possible is just good CYA on my end.
At the end of the day, my job is to do what I have been directed to do. It's not to worry about the intended purpose of the function or what will happen after I move on from a role. Ideally, I wish something was built in to deal with this problem, but there isn't. There is a report provided to the teams, but I cannot make sure everyone is reading it. Honestly, I hate having to go to a report and advise management that so-and-so did this, so yell at them. Ultimately the objective is to avoid this problem, not just to have a CYA solution. This issue will have to be dealt with actively, and since I'm only one person, it will have to be dealt with programmatically. All that being said, I appreciate your thoughts on the subject. Thanks!
I'm more in favor of the report as well. But I can see why majames84 would want to go the alert way. Where my thoughts differ is that as the monitoring engineer, I view my role as being the kid in the back of the classroom that some see as the nark for pointing out bad behavior to the teacher. It is my job to inform the teacher of the bad behavior but it is up to the teacher to correct that behavior. Monitoring is pretty much the same. It is my job to point out the issue with the nodes we monitor but it's up to that server owner to actually fix it.
I do like the programmatic approach to this issue though, and it is something I'll look at doing within my new environment.