
SWQL: Triggering an email notification when someone mutes a node after it goes down

Hi, please bear with me here.

I'm trying to create an alert for this: when a node goes down, it sets the custom property 'Node_Down_Trigger' = 'Node Down'. Then, when someone MUTES the node, IF 'Node_Down_Trigger' = 'Node Down', THEN the trigger action should change 'Node_Down_Trigger' to 'Acknowledge' AND send an email saying that someone is working on the down server.

Also, when a server goes DOWN, I have a trigger that sets 'Node_Down_Trigger' = 'Node Down', and when the alert resets, it changes the value to 'Node Online'. This helps me tell whether a server was muted while 'online' for maintenance, muted while 'down' for support escalation, or went down and came back online without support looking into it.

Now, I've created the SWQL alert and trigger action below, since a custom SWQL query is the only way I know to include a custom property field.

However, I think there's a problem: IF someone MUTES the alert, the trigger action is suppressed and never runs. I only get the trigger once alerts have been resumed.

Q: Is there a way to build this kind of alert and trigger?

[Screenshots: the SWQL alert trigger condition and its trigger action]
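The custom-property workflow described in the question amounts to a small state machine. Here is a minimal Python sketch of it; the event labels "down", "up" and "mute" are hypothetical stand-ins for the node-down alert, its reset, and a mute, and only the property values come from the question.

```python
from typing import Optional, Tuple

# Custom property values described in the question.
NODE_DOWN = "Node Down"
NODE_ONLINE = "Node Online"
ACKNOWLEDGE = "Acknowledge"


def next_state(current: Optional[str], event: str) -> Tuple[Optional[str], bool]:
    """Return (new Node_Down_Trigger value, whether to send the email).

    event is one of the hypothetical labels "down", "up" or "mute".
    """
    if event == "down":
        return NODE_DOWN, False
    if event == "up":
        return NODE_ONLINE, False
    if event == "mute":
        # Only a mute on a node that is currently down counts as an acknowledgement;
        # a mute while online (maintenance) leaves the value unchanged.
        if current == NODE_DOWN:
            return ACKNOWLEDGE, True
        return current, False
    raise ValueError(f"unknown event: {event!r}")


# Walk-through: node goes down, is muted twice, recovers, is muted for maintenance.
state: Optional[str] = None
emails = 0
for event in ["down", "mute", "mute", "up", "mute"]:
    state, send_email = next_state(state, event)
    if send_email:
        emails += 1
```

After the walk-through, `state` is "Node Online" and only one email goes out: the second mute finds the value already at "Acknowledge", and the last mute happens while the node is online.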

  • You can do this by creating an alert on Auditing Events that uses a custom SWQL query, like this:

    [Screenshot: custom SWQL trigger condition on Auditing Events]

    The query text in the white box is:

    INNER JOIN Orion.Nodes AS Nodes on AuditingEvents.NetObjectID = Nodes.NodeID
    WHERE AuditingEvents.NetObjectType = 'N'
    AND AuditingEvents.ActionTypeID = 102
    AND Nodes.CustomProperties.NodeDownTrigger = 'Node Down'

    This trigger will fire if someone mutes a node where the custom property "Node_Down_Trigger" was previously set to "Node Down."
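To check which events the condition would match before wiring up any actions, the same text can be run through the SWIS REST Query endpoint. A minimal Python sketch that only builds the request, assuming the default port 17778 and a hypothetical SELECT/FROM prefix (the alert editor normally supplies that part itself):

```python
import json
import urllib.request

# Assumption: SWIS REST endpoint on the Orion server's default port 17778.
SWIS_BASE = "https://localhost:17778/SolarWinds/InformationService/v3/Json"

# Hypothetical SELECT/FROM prefix: the alert editor supplies its own, so this
# one exists only to run the condition on its own for testing.
PREFIX = (
    "SELECT AuditingEvents.AuditEventID, Nodes.Caption "
    "FROM Orion.AuditingEvents AS AuditingEvents "
)

# The condition text from the trigger editor.
CONDITION = (
    "INNER JOIN Orion.Nodes AS Nodes ON AuditingEvents.NetObjectID = Nodes.NodeID "
    "WHERE AuditingEvents.NetObjectType = 'N' "
    "AND AuditingEvents.ActionTypeID = 102 "
    "AND Nodes.CustomProperties.NodeDownTrigger = 'Node Down'"
)


def build_query_request(swql: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the SWIS Query endpoint."""
    body = json.dumps({"query": swql, "parameters": {}}).encode("utf-8")
    return urllib.request.Request(
        f"{SWIS_BASE}/Query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(PREFIX + CONDITION)
```

Actually sending it needs Orion credentials and a TLS setup that trusts the server's certificate, which are left out here.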

    Once you've set up this trigger, you can add trigger actions. You won't be able to use the "Change Custom Property" action, however, because this is an Audit Event alert and you want to change the custom property of a Node object. This is not insurmountable, though.

    Use the SolarWinds REST API to set the custom property. Add a "Send a GET or POST Request to a Web Server" trigger action:

    [Screenshot: "Send a GET or POST Request to a Web Server" trigger action settings]

    The URL* is:

    https://localhost:17778/SolarWinds/InformationService/v3/Json/${N=SWQL;M=SELECT Uri from Nodes where NodeID = (SELECT NetObjectID from AuditingEvents where AuditEventID = ${N=SwisEntity;M=AuditEventID} )}/CustomProperties

    *(Sorry about the SWQL select in there - the alert manager doesn't expose the NetObjectID of an Audit Event, so you need to query it).

    The body to POST is:

    {"Node_Down_Trigger":"Acknowledge"}

    Alternatively, you could wrap this logic in a PowerShell script. Pass the AuditEventID to the script, let the script do the lookups using the API, and have it submit the update request through the API as well.
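The script route might look roughly like this. The post suggests PowerShell; this is a Python sketch of the same idea, not a tested implementation. It assumes the SWIS Query endpoint returns rows under a "results" key, and the transport function (`fake_send` here) is a hypothetical stand-in so the lookup-then-update logic can run without a server.

```python
from typing import Callable, List, Optional, Tuple

# Assumption: SWIS REST endpoint on the default port.
SWIS_BASE = "https://localhost:17778/SolarWinds/InformationService/v3/Json"

# A transport POSTs a JSON body to a URL and returns the decoded JSON reply.
# A real one would add Orion credentials; it is injected so the logic can be
# exercised without a server.
Transport = Callable[[str, dict], dict]

# Resolve the muted node from the audit event (the lookup the URL macro does).
LOOKUP = (
    "SELECT N.Uri FROM Orion.AuditingEvents AS AE "
    "INNER JOIN Orion.Nodes AS N ON AE.NetObjectID = N.NodeID "
    "WHERE AE.AuditEventID = @id AND AE.NetObjectType = 'N'"
)


def acknowledge(audit_event_id: int, send: Transport) -> Optional[str]:
    """Set Node_Down_Trigger to 'Acknowledge' on the node behind an audit event."""
    reply = send(f"{SWIS_BASE}/Query", {"query": LOOKUP, "parameters": {"id": audit_event_id}})
    rows = reply.get("results", [])
    if not rows:
        return None  # the event does not point at a node
    node_uri = rows[0]["Uri"]
    send(f"{SWIS_BASE}/{node_uri}/CustomProperties", {"Node_Down_Trigger": "Acknowledge"})
    return node_uri


# Hypothetical stand-in transport that records calls and returns a canned lookup.
calls: List[Tuple[str, dict]] = []


def fake_send(url: str, body: dict) -> dict:
    calls.append((url, body))
    if url.endswith("/Query"):
        return {"results": [{"Uri": "swis://orion.example.local/Orion/Orion.Nodes/NodeID=42"}]}
    return {}


node_uri = acknowledge(1234, fake_send)
```

`acknowledge()` issues two calls: the lookup query, then a POST of `{"Node_Down_Trigger": "Acknowledge"}` to `<node Uri>/CustomProperties`, the same URL shape the trigger action uses.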

  • Thanks, m-milligan.

    I do get the trigger action (SMS-to-email), but after testing this a few times I noticed that if I mute the node several times in a row, it fires once for every mute. Is there a way to have it fire only for the latest one?

    [Screenshot: the repeated notifications]

  • Add this line to the WHERE clause:

    AND Nodes.Uri NOT IN (select EntityUri from Orion.AlertSuppression where SuppressUntil is NULL OR MINUTEDIFF(GetDate(), SuppressUntil) > 0)

    This will exclude any node that has already been muted.
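To see what that filter keeps, here is a Python sketch of the same predicate over a few hypothetical suppression rows (the entity URIs are shortened for readability):

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def is_currently_muted(suppress_until: Optional[datetime], now: datetime) -> bool:
    """Approximates: SuppressUntil is NULL OR MINUTEDIFF(GetDate(), SuppressUntil) > 0.

    MINUTEDIFF works in whole minutes; this compares exact timestamps.
    """
    return suppress_until is None or suppress_until > now


now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
# Hypothetical AlertSuppression rows: entity URI -> SuppressUntil.
suppressions: Dict[str, Optional[datetime]] = {
    "NodeID=1": None,                      # muted with no end time
    "NodeID=2": now + timedelta(hours=1),  # muted for another hour
    "NodeID=3": now - timedelta(hours=1),  # mute has already expired
}
muted = {uri for uri, until in suppressions.items() if is_currently_muted(until, now)}

# Nodes that survive "Nodes.Uri NOT IN (...)".
candidates = ["NodeID=1", "NodeID=2", "NodeID=3", "NodeID=4"]
kept = [uri for uri in candidates if uri not in muted]
```

Only NodeID=3 (expired mute) and NodeID=4 (never muted) survive the NOT IN.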

  • The whole thing may be faster if you use SQL instead of SWQL, because you can use an EXISTS() clause to limit the data that needs to be examined. This is the part of the query that goes in the editable query box on the Trigger Condition page:

    INNER JOIN Nodes N on AE.NetObjectID = N.NodeID
    INNER JOIN NodesCustomProperties NCP on NCP.NodeID=N.NodeID
    WHERE AE.NetObjectType = 'N'
    AND AE.ActionTypeID = 102
    AND NCP.NodeDownTrigger = 'Node Down'
    AND EXISTS (select EntityUri from AlertSuppression2 AS2 where AS2.EntityUri LIKE concat('%NodeID=',N.NodeID) AND ( SuppressUntil is NULL OR SuppressUntil > GetDate()) )

  • Thanks m-milligan, this will give us some immediate visibility when someone mutes the alert.

  • I just tried this, and it seems to be triggering on some old mutes and old scheduled mutes. Any ideas on how I can filter these out? Thanks much.

  • How often does your alert definition check for new alerts? Say your alert definition runs every 5 minutes. You could limit it to only events that happened in, say, the last 5 minutes. Add this to the WHERE clause:

    AND MINUTEDIFF(AuditingEvents.TimeLoggedUtc,GETUTCDATE()) < 5
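Matching the window to the check interval means each mute should be picked up by exactly one evaluation. A Python sketch of that reasoning, assuming checks run exactly every 5 minutes and using made-up event times:

```python
from datetime import datetime, timedelta
from typing import Dict


def in_window(logged_utc: datetime, run_utc: datetime, minutes: int) -> bool:
    """Approximates MINUTEDIFF(TimeLoggedUtc, GETUTCDATE()) < minutes using exact times."""
    return run_utc - logged_utc < timedelta(minutes=minutes)


start = datetime(2020, 1, 1, 12, 0)
runs = [start + timedelta(minutes=5 * i) for i in range(4)]           # 12:00 .. 12:15
events = [start + timedelta(minutes=m) for m in (-30, 1, 4, 7, 12)]  # -30 is an old mute

fired: Dict[datetime, int] = {}
for run in runs:
    for logged in events:
        # Events logged after this run do not exist yet when the query executes.
        if logged <= run and in_window(logged, run, 5):
            fired[logged] = fired.get(logged, 0) + 1
```

Each recent mute fires once and the 11:30 mute never does. This only holds while the checks run on a regular schedule; if they drift, an event can fall into two windows or into none.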

  • You are not accounting for SuppressFrom. Not sure if it matters, but what if SuppressFrom is scheduled in the future? Would the above logic still work?

  • I didn't address SuppressFrom in my query because the OP was only concerned about the combination of (Node is down) and (someone took action). The way the query is written, it would trigger even if the SuppressFrom date is in the future.

    It could certainly be rewritten to make sure the SuppressFrom date is within a few minutes of the time the mute was logged. The query below allows 5 minutes; you could widen that (say, to 15 minutes) to allow some time for the alert trigger to run and for someone to respond to it:

    INNER JOIN Nodes N on AE.NetObjectID = N.NodeID
    INNER JOIN NodesCustomProperties NCP on NCP.NodeID=N.NodeID
    WHERE AE.NetObjectType = 'N'
    AND AE.ActionTypeID = (SELECT ActionTypeID FROM AuditingActionTypes WHERE ActionType='Orion.AlertSuppressionAdded')
    AND NCP.NodeDownTrigger = 'Node Down'
    AND EXISTS (
        select EntityUri from AlertSuppression2 AS2
        where AS2.EntityUri LIKE concat('%NodeID=',N.NodeID)
        AND ( SuppressUntil is NULL OR SuppressUntil > GETUTCDATE())
        AND ABS(datediff(mi,AE.TimeLoggedUtc, SuppressFrom)) <= 5
    )

    AlertSuppression2.SuppressFrom and AuditingEvents.TimeLoggedUtc are both UTC times, so neither needs to be converted from local time first.
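The conditions in the EXISTS subquery can be sketched in Python as one predicate, assuming all timestamps are UTC and a 5-minute tolerance as in the query:

```python
from datetime import datetime, timedelta
from typing import Optional


def is_fresh_mute(
    logged_utc: datetime,
    suppress_from_utc: datetime,
    suppress_until_utc: Optional[datetime],
    now_utc: datetime,
    tolerance_minutes: int = 5,
) -> bool:
    """Approximates the EXISTS subquery: the mute is still active and starts
    within tolerance_minutes of the audit event. Uses exact times, whereas
    DATEDIFF(mi, ...) counts minute boundaries."""
    active = suppress_until_utc is None or suppress_until_utc > now_utc
    starts_now = abs(suppress_from_utc - logged_utc) <= timedelta(minutes=tolerance_minutes)
    return active and starts_now


now = datetime(2020, 1, 1, 12, 0)
immediate = is_fresh_mute(now, now, None, now)                      # muted right away
scheduled = is_fresh_mute(now, now + timedelta(days=1), None, now)  # mute scheduled for tomorrow
expired = is_fresh_mute(now, now, now - timedelta(minutes=1), now)  # mute already ended
```

A mute applied on the spot passes, while one scheduled to start tomorrow (the SuppressFrom case raised above) and one that has already ended are both rejected.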

    Note that I've replaced

    AND AE.ActionTypeID = 102

    with

    AND AE.ActionTypeID = (SELECT ActionTypeID FROM AuditingActionTypes WHERE ActionType='Orion.AlertSuppressionAdded')

    because I've found that the ActionTypeID values aren't always consistent between installations.
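To see which ID a particular installation uses, the same lookup can be run through the SWIS REST API. A Python sketch that builds the parameterized query and parses a reply, assuming `Orion.AuditingActionTypes` is the SWIS entity over that table and using a hypothetical reply:

```python
import json
import urllib.request
from typing import Optional

# Assumption: SWIS REST endpoint on the default port.
SWIS_BASE = "https://localhost:17778/SolarWinds/InformationService/v3/Json"

# Assumption: Orion.AuditingActionTypes is the SWIS entity over AuditingActionTypes.
ACTION_TYPE_QUERY = (
    "SELECT ActionTypeID FROM Orion.AuditingActionTypes WHERE ActionType = @actionType"
)


def build_action_type_request(action_type: str) -> urllib.request.Request:
    """Build (but do not send) a parameterized SWIS query for one action type."""
    body = {"query": ACTION_TYPE_QUERY, "parameters": {"actionType": action_type}}
    return urllib.request.Request(
        f"{SWIS_BASE}/Query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_action_type_id(reply: dict) -> Optional[int]:
    """Pull the ActionTypeID out of a SWIS query reply, or None if not found."""
    rows = reply.get("results", [])
    return int(rows[0]["ActionTypeID"]) if rows else None


request = build_action_type_request("Orion.AlertSuppressionAdded")
# Hypothetical reply; the real ID may differ between installations.
action_type_id = parse_action_type_id({"results": [{"ActionTypeID": 102}]})
```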