Email Alert Action: Send to Custom Property

Hey everyone,

I've seen the various conversations on alerts, so wanted to share a bit of experience and preference I have.

In my company's original SolarWinds stand-up, we had a couple of custom properties that flagged a responsible team and an alert tier for a specific node. The tiering affected the alerts received for Node status (Up/Down), CPU, Memory, and volume utilization. When it came to SAM alerts, it was a bit more of a wild west. It seemed like every SAM template had its own customized alert, and the configuration varied in how they were set up. Some were flagged to the app, some to the component, and some were further hardcoded to specific node names. To go a step further, the email alert actions were all over the place. It seemed like each alert object had its own email action, and it varied whether it went to a distribution list, an individual, or several named people. This eventually led to the scenario of "how come I didn't get an alert on this," only to find out there was an alert for it but it went to someone who had left the company months ago. Trying to find those scenarios in the GUI was rather difficult. Additionally, as more teams were brought into SolarWinds, they wanted to be notified of the node health for the systems where their applications lived, so the old responsible-team-with-a-tier model was no longer feasible. I won't even go into team rebranding and changes to the distribution list emails.

Then came the talks of redoing SolarWinds in our environment. We stumbled on some posts that referenced using custom properties to populate the To field of the email action. After some testing, we validated that it was pretty reliable. The only bug we ran into is when someone copies and pastes an email address and it brings hidden ASCII characters across with it. So we moved forward with that approach, using various custom properties to set the alert actions for Up/Down, CPU, Memory, Volumes, Components, Applications, etc. This allowed us to reuse the same alert action for multiple alerts on an object.

So with all that said, the prerequisite for this is creating some custom properties. In our environment we use a Text field with "value must be specified" set. You will need to create one per object type you want to alert on (Node, Volume, Application, etc.). Since you cannot use the same custom property name across all templates, I would recommend having Email or EmailAddress as a suffix or prefix to some kind of identifier. It's really personal preference whether you want all of the Email properties grouped together in Custom Properties Management, or everything for the object grouped.

So for my examples:
Nodes:  N-Email
Application:  A-Email
Volumes:  V-Email
Transactions (WPM):  T-Email

With the custom properties in place, go ahead and create the alerts per the standard in your environment. We tend to use the scope to narrow down the list of systems and then use the trigger condition for the object that changes. Our scopes tend to target custom properties that say what action we want to take (Email, Email/Page, Console, etc.). Once you get to the Trigger Actions, you simply need to put your custom property into the To field. If you want to cheat, you could insert the variable into the body of the email and then cut and paste that into the To field.

So your To field should look like the following:
Nodes:  ${N=SwisEntity;M=CustomProperties.N-Email}
Application:  ${N=SwisEntity;M=CustomProperties.A-Email}
Component:  ${N=SwisEntity;M=Application.CustomProperties.A-Email}
Volume (based on volumes):  ${N=SwisEntity;M=CustomProperties.V-Email}
Volume (based on Node):  ${N=SwisEntity;M=Node.CustomProperties.N-Email}
Transaction:  ${N=SwisEntity;M=CustomProperties.T-Email}
Transaction Step:  ${N=SwisEntity;M=Transaction.CustomProperties.T-Email}

Based on our custom properties, admins are able to set whether they want an application alert to be triggered off of the component status or the application status. On the Node and Volume front, we have a property so that users can trigger volume alerts specifically for the volume itself or treat all volumes on the system the same (from the node information). The volume ones have worked great, so our operations teams get the OS drive alerts and the specific app owners get the alerts for their volumes.

So with this custom property approach, we were able to standardize our basic alerts (CPU, Memory, Status, etc.) to a smaller subset of alerts instead of each team having their own. The text box and To field accept multiple addresses, so if additional teams want to be included for a specific node, app, etc., it's just a matter of adding the DL. I believe some use a comma-separated list; we have had success with semicolons. Whenever a team wants to rebrand, we can do a mass update of the custom property through Manage Custom Properties. The email field also gave us a good field to create dashboards around. We can now create a dynamic group where Email contains TeamXYZ, so teams get all the systems they care about.
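
Since one property value can hold several addresses, it can help to sanity-check a value before a mass update. Here is a minimal Python sketch, assuming semicolon or comma separators as described above. The function names, the sample value, and the loose address pattern are all made up for illustration and are not part of SolarWinds.

```python
import re

# Deliberately loose pattern: local@domain.tld with no whitespace or separators.
# It is a sanity check, not a full RFC 5322 validator.
ADDRESS_PATTERN = re.compile(r"^[^@\s;,]+@[^@\s;,]+\.[^@\s;,]+$")


def split_recipients(value):
    """Split a To-style value on semicolons or commas and drop empty entries."""
    parts = re.split(r"[;,]", value)
    return [part.strip() for part in parts if part.strip()]


def invalid_recipients(value):
    """Return the entries that do not look like an email address."""
    return [addr for addr in split_recipients(value)
            if not ADDRESS_PATTERN.match(addr)]


# Hypothetical custom property value with one entry missing its domain.
value = "ops@example.com; teamxyz@example.com;appowners"
print(split_recipients(value))
print(invalid_recipients(value))
```

Running something like this over a custom property export before re-importing it would catch entries such as a bare team name pasted where a DL address was expected.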

I haven't found a good way to detect the hidden ASCII characters via SWQL yet. I've had the conversation in the past and someone posted a SQL query to find them. In the meantime, here is a SWQL query that gives failed alert actions over the past month.

SELECT ToLocal(aa.[TimeStamp]) AS TriggeredDateTime
    , aa.Message
    , SUBSTRING(aa.Message, CHARINDEX('"ErrorMessage","Value":"', aa.Message), LENGTH(aa.Message)) AS ErrorMessage
    , CASE
        WHEN aa.EventType = 0 THEN 'Triggered'
        WHEN aa.EventType = 1 THEN 'Reset'
        WHEN aa.EventType = 2 THEN 'Acknowledged'
        WHEN aa.EventType = 3 THEN 'Note Added'
        WHEN aa.EventType = 4 THEN 'Added to Incident'
        WHEN aa.EventType = 5 THEN 'Action Failed'
        WHEN aa.EventType = 6 THEN 'Action Succeeded'
        WHEN aa.EventType = 7 THEN 'Unacknowledge'
        WHEN aa.EventType = 8 THEN 'Cleared'
      END AS EventType
    , ac.Name AS Alert
    , '/Orion/NetPerfMon/ActiveAlertDetails.aspx?NetObject=AAT:' + ToString(ao.AlertObjectID) AS Alert_Link
    , ao.EntityCaption AS Entity
    , ao.EntityDetailsUrl AS Entity_DetailsURL
FROM Orion.AlertHistory aa
    JOIN Orion.AlertObjects ao ON ao.AlertObjectID = aa.AlertObjectID
    JOIN Orion.AlertConfigurations ac ON ao.AlertID = ac.AlertID
WHERE aa.EventType = 5
    AND aa.[TimeStamp] > ADDMONTH(-1, GETDATE())
ORDER BY aa.[TimeStamp] DESC
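
For the hidden characters themselves, one option is to check the values outside of SWQL. Here is a minimal Python sketch that flags any character outside printable ASCII in an address string. It assumes the custom property values have already been exported somehow (a report or the API, for example); the function name and sample data are hypothetical, and treating "hidden" as "anything outside 0x20 to 0x7E" is my assumption about what the copy/paste problem looks like.

```python
import unicodedata


def find_hidden_characters(value):
    """Return (index, codepoint, name) for each character outside printable ASCII."""
    findings = []
    for index, char in enumerate(value):
        # Printable ASCII runs from space (0x20) through tilde (0x7E).
        if not (0x20 <= ord(char) <= 0x7E):
            # Control characters have no Unicode name, so fall back to a label.
            name = unicodedata.name(char, "UNNAMED")
            findings.append((index, f"U+{ord(char):04X}", name))
    return findings


# Hypothetical exported values: node caption -> email custom property.
samples = {
    "node-a": "teamxyz@example.com",
    "node-b": "teamxyz@example.com\u200b",  # trailing zero width space
    "node-c": "\u00a0ops@example.com",      # leading no-break space
}

for node, email in samples.items():
    for index, codepoint, name in find_hidden_characters(email):
        print(f"{node}: position {index} is {codepoint} ({name})")
```

The same check could be run on each address right before a bulk re-import, so a bad paste is caught before it ever produces a failed alert action.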

If anyone has questions, comments, or concerns, or has a better way to do it, please feel free to chime in. This has worked great for our environment, but of course mileage may vary. If someone has a better mousetrap then I'm definitely all ears.

Parents
  • We are doing exactly the same thing as you described and it has been working well. We use alert actions to fill in missing information on some of the custom properties, and once a month we run a report listing all of them, audit the report, then re-upload it into custom properties. As for cut/paste, you could always paste into Notepad or Notepad++ first to see if there are any non-visible characters.

    For the most part, when we enter the responsible team and alert tier, the responsible team usually has an email group defined for the whole team. Since these align most of the time, it is pretty easy to match the team with the email address in the custom props via alerting. This will vary, however, based on whether the alert is going to the server team or to an application team that is the actual owner of the server.

    For Disk/CPU/Memory alerts we use custom props for alert thresholds as well. For example, by default our critical volume alert is 95% full (or 5% remaining) or 5GB or less remaining.

    So initially we set all the custom props at those levels. Some teams will contact us and want lower or higher thresholds so we just adjust the custom props for that specific node.    

    You could also consider doing some of the custom properties settings/audit via the API.  

    Another time saver is using variables when setting up your alerts, so you are not typing in the same information over and over for each action.

  • I wanted to do a drop-down list originally for the email addresses, but the number of variations and the length of the text box would have gotten too large to manage. We require the teams to submit a DL to use, as individual names would have been too much to handle as well.

    The application teams have been wanting to get the notifications as well, just from a visibility perspective. So the groups would have had too many branches for just a tier & team approach. I was almost to the point of writing a PowerShell script to handle the email, which would have had some kind of email array in it.

    For Disk/CPU/Memory, how come you are using custom properties for those instead of the built-in thresholds? We've been using the thresholds in our environment with alerting, and it seems like it's accomplishing the same thing.

    We do combine the threshold though with alerting properties for CPU, Memory, and Volumes.  So when the node is added, they can choose each object type individually if they want it to have No Alerts, Console Only (shows up only in the SW GUI), Email, or Email/Page.

  • Not OP, but have used similar CP for thresholds for years. I think mostly it's just legacy; in the old days, managing those thresholds on a one-off basis was just really hard to do in the GUI, especially in bulk. For greenfield deployments I stopped doing the properties a while back, but I could see why people don't want to break what already works and have to rework a bunch of dashboards and alerts and such to stop using the CP.

  • Stopped doing the properties as in "I've got a new method I prefer" or as in "Someone else looks after them"?

  • In a green field I just use native thresholds, now that you can do "x of y" for basically all the major metrics I think they are in a really good place.

  • Good call. I was thinking there was an easy way to set those in bulk from the GUI, but it appears I am wrong on that. Manually adjusting those would be a pain without PowerShell/API.
