Hey everyone,
I've seen the various conversations on alerts, so I wanted to share a bit of my experience and preferences.
In my company's original SolarWinds stand-up, we had a couple of custom properties that flagged a responsible team and an alert tier for each node. The tier affected the alerts received for node status (Up/Down), CPU, memory, and volume utilization. SAM alerts were a bit more wild west. It seemed like every SAM template had its own customized alert, and the configuration varied in how each was set up. Some were scoped to the application, some to the component, and some were hardcoded to specific node names.
The email alert actions were all over the place too. It seemed like each alert object had its own email action, and it varied whether that action went to a distribution list, to individuals, or to several specific people. This eventually led to the "how come I didn't get an alert on this?" scenario, only to find out there was an alert for it but it went to someone who had left the company months ago. Finding those cases in the GUI was rather difficult. Additionally, as more teams were brought into SolarWinds, they wanted to be notified of node health for the systems where their applications lived, so the old model of one responsible team with a tier was no longer feasible. I won't even go into team rebranding and changes to the distribution list emails.
Then came the talks of redoing SolarWinds in our environment. We stumbled on some posts that referenced using custom properties to populate the To field of the email action. After some testing, we validated that it was pretty reliable. The only bug we ran into is when someone copies and pastes an email address and it brings across hidden ASCII characters. So we moved forward with that approach, using various custom properties to set the alert actions for Up/Down, CPU, Memory, Volumes, Components, Applications, etc. This allowed us to reuse the same alert action across multiple alerts for an object type.
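Since pasted addresses were the one thing that bit us, here is a minimal Python sketch for checking a value before it goes into a custom property. It is not part of SolarWinds; the function name and the sample address are made up for illustration, and it assumes "hidden" means anything outside the printable ASCII range (space through tilde).

```python
import unicodedata


def find_hidden_characters(value):
    """Return (index, code point, name) for each character outside printable ASCII."""
    findings = []
    for index, char in enumerate(value):
        # Printable ASCII runs from space (0x20) through tilde (0x7E).
        if not (0x20 <= ord(char) <= 0x7E):
            # Control characters such as tab have no Unicode name, hence the default.
            name = unicodedata.name(char, "UNNAMED")
            findings.append((index, f"U+{ord(char):04X}", name))
    return findings


# Hypothetical example: a zero-width space pasted in front of the address.
pasted = "\u200bteamxyz@example.com"
for finding in find_hidden_characters(pasted):
    print(finding)
```

An empty list means the value is clean. Anything else tells you the position of the offending character so you can retype that part by hand.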
So with all that said, the prerequisite for this is creating some custom properties. In our environment we use a Text field and require that a value be specified. You will need to create one per object type you want to alert on (Node, Volume, Application, etc.). Since you cannot use the same custom property name across object types, I would recommend having Email or EmailAddress as a suffix or prefix to some kind of identifier. It's really personal preference whether you want all of the Email properties grouped together in Custom Properties Management or everything for the object grouped together.
So for my examples:
Nodes: N-Email
Application: A-Email
Volumes: V-Email
Transactions (WPM): T-Email
With the custom properties in place, go ahead and create the alerts per the standard in your environment. We tend to use the scope to narrow down the list of systems and then use the trigger condition for the object state that changes. Our scopes tend to target custom properties that describe what action we want to take (Email, Email/Page, Console, etc.). Once you get to the Trigger Actions, you simply need to put your custom property into the To field. If you want to cheat, you can insert the variable into the body of the email and then cut and paste it into the To field.
So your To field should look like the following:
Nodes: ${N=SwisEntity;M=CustomProperties.N-Email}
Application: ${N=SwisEntity;M=CustomProperties.A-Email}
Component: ${N=SwisEntity;M=Application.CustomProperties.A-Email}
Volume (based on volumes): ${N=SwisEntity;M=CustomProperties.V-Email}
Volume (based on Node): ${N=SwisEntity;M=Node.CustomProperties.N-Email}
Transaction: ${N=SwisEntity;M=CustomProperties.T-Email}
Transaction Step: ${N=SwisEntity;M=Transaction.CustomProperties.T-Email}
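The variables above all follow one pattern: the property is read either from the alerting object itself or from its parent (Application for a component, Node for a volume, Transaction for a step). As a rough sketch, here is that pattern in Python. The helper name is hypothetical and it only builds the text; you still paste the result into the To field yourself.

```python
def to_field_variable(property_name, parent=None):
    """Build the alert variable text for a custom property.

    parent is the related entity to read the property from, for example
    "Application" when the alert is on a component. Leave it out to read
    the property from the alerting object itself.
    """
    path = f"CustomProperties.{property_name}"
    if parent:
        path = f"{parent}.{path}"
    return "${N=SwisEntity;M=" + path + "}"


# Node alert reading its own property.
print(to_field_variable("N-Email"))
# Component alert reading the property from its application.
print(to_field_variable("A-Email", parent="Application"))
```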
Based on our custom properties, admins are able to set whether they want an application alert to trigger off of the component status or the application status. On the Node and Volume front, we have a property so that users can trigger volume alerts specifically for the volume itself or treat all volumes on the system the same (using the node information). The volume ones have worked great: our operations teams get the OS drive alerts, and the specific app owners get the alerts for their own volumes.
So with this custom property approach, we were able to standardize our basic alerts (CPU, Memory, Status, etc.) into a smaller set of alerts instead of each team having their own. The text box and To field accept multiple addresses, so if additional teams want to be included for a specific node, app, etc., it's just a matter of adding the DL. I believe some use a comma-separated list; we have had success with semicolons. Whenever a team wants to rebrand, we can do a mass update of the custom property through Manage Custom Properties. The email field also gave us a good field to build dashboards around. We can now create a dynamic group where Email contains TeamXYZ, so teams get all the systems they care about.
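If you prepare the new values for a rebrand outside of SolarWinds before the mass update, something like the following Python sketch can help. It is just string handling under a few assumptions: the function names and addresses are made up, the input may use semicolons or commas, and the output is rejoined with semicolons because that is what has worked for us.

```python
import re


def split_addresses(value):
    """Split a custom property value on semicolons or commas, dropping blanks."""
    return [part.strip() for part in re.split(r"[;,]", value) if part.strip()]


def rebrand(value, old_address, new_address):
    """Swap one address for another (case-insensitive) and rejoin with semicolons."""
    updated = [
        new_address if address.lower() == old_address.lower() else address
        for address in split_addresses(value)
    ]
    return ";".join(updated)


# Hypothetical example: TeamXYZ is renamed to TeamABC.
current = "ops@example.com; TeamXYZ@example.com"
print(rebrand(current, "teamxyz@example.com", "teamabc@example.com"))
```

Matching whole addresses rather than doing a blind find-and-replace avoids touching a DL that merely contains the old team name as part of a longer address.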
I haven't found a good way to detect the hidden ASCII characters via SWQL yet. I've had the conversation in the past, and someone posted a SQL query to find them. Here is a SWQL query that lists failed alert actions over the past month.
SELECT ToLocal(aa.[timestamp]) AS TriggeredDateTime
    , aa.Message
    , SUBSTRING(aa.Message, CHARINDEX('"ErrorMessage","Value":"', aa.Message), LENGTH(aa.Message)) AS ErrorMessage
    , CASE
        WHEN aa.EventType = 0 THEN 'Triggered'
        WHEN aa.EventType = 1 THEN 'Reset'
        WHEN aa.EventType = 2 THEN 'Acknowledged'
        WHEN aa.EventType = 3 THEN 'Note Added'
        WHEN aa.EventType = 4 THEN 'Added to Incident'
        WHEN aa.EventType = 5 THEN 'Action Failed'
        WHEN aa.EventType = 6 THEN 'Action Succeeded'
        WHEN aa.EventType = 7 THEN 'Unacknowledge'
        WHEN aa.EventType = 8 THEN 'Cleared'
      END AS EventType
    , ac.Name AS Alert
    , '/Orion/NetPerfMon/ActiveALertDetails.aspx?NetObject=AAT:' + ToString(ao.AlertObjectID) AS Alert_Link
    , ao.EntityCaption AS Entity
    , ao.EntityDetailsUrl AS Entity_DetailsURL
FROM Orion.AlertHistory aa
JOIN Orion.AlertObjects ao ON ao.AlertObjectID = aa.AlertObjectID
JOIN Orion.AlertConfigurations ac ON ao.AlertID = ac.AlertID
WHERE aa.EventType = 5
    AND aa.[timestamp] > ADDMONTH(-1, GETDATE())
ORDER BY aa.[timestamp] DESC
If anyone has questions, comments, or concerns, or has a better way to do it, please feel free to chime in. This has worked great for our environment, but of course mileage may vary. If someone has a better mousetrap, then I'm definitely all ears.