I'm currently testing, and looking to implement, some base application templates that cover all processes seen on a given server. Ideally this list would not come from a PowerShell or other script run directly against a node, but from the RealTimeProcessExplorer function within SolarWinds.
I would ultimately also filter out any OS-specific processes that are not critical to monitor, leaving the processes that would normally have to be looked at manually on a server. This should give a good view of how the server is really working and, if CPU or memory is consistently high, make it possible to determine which processes are responsible.
Would something like this also work for services?
One of the reasons is that my environment heavily restricts direct PowerShell connections to our servers, which is normal, but SolarWinds isn't necessarily one of those direct paths. We monitor primarily via WMI, and WinRM is not currently configured.
Not everything has an agent either, so I am curious whether this method will even work for WMI nodes. I have a pending task to meet with my server team and cybersecurity team to discuss WinRM with SolarWinds as a trusted source, but there is no ETA on that.
I have theorized the following process as a somewhat valid framework, but it is untested and I wanted to get some opinions on it.
Step 1. Create an alert with a trigger action to run an external program, approve the action, and override the default count of 10 with whatever threshold is feasible. For my testing purposes I used 1000 with a timeout of 300 seconds.
Step 2. Query active alerts with the SolarWinds API, using the format below.
# Note: this filter assumes the alert's trigger message is set to exactly the alert name.
$AlertName = "insert above created alertname"
$query = @"
SELECT A.AlertActiveID, A.AlertObjectID, A.AlertObjects.AlertNote, A.AlertObjects.EntityUri,
       A.AlertObjects.EntityCaption, A.Acknowledged, A.AcknowledgedBy, A.AcknowledgedDateTime,
       A.AcknowledgedNote, A.TriggeredDateTime, A.TriggeredMessage
FROM Orion.AlertActive AS A
WHERE A.TriggeredMessage = '$AlertName'
"@
# Same query, limited to alerts that have not been acknowledged yet.
$alternativeQuery = "$query AND A.Acknowledged IS NULL"
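For running the query, here is a minimal sketch that assumes the SwisPowerShell module from the Orion SDK is installed. The function name `Get-UnacknowledgedAlert` and the hostname in the usage comment are placeholders of mine, not anything SolarWinds ships. It passes the message as a SWQL parameter instead of building it into the string, which avoids quoting problems if the text contains an apostrophe.

```powershell
# Requires the SwisPowerShell module (Install-Module SwisPowerShell).
# Hypothetical helper: returns unacknowledged active alerts whose
# TriggeredMessage equals the given text.
function Get-UnacknowledgedAlert {
    param(
        [Parameter(Mandatory)]$Swis,                      # connection from Connect-Swis
        [Parameter(Mandatory)][string]$TriggeredMessage
    )

    # Single-quoted here-string: nothing is interpolated, @msg is a SWQL parameter.
    $swql = @'
SELECT A.AlertActiveID, A.AlertObjectID, A.AlertObjects.AlertNote,
       A.AlertObjects.EntityUri, A.AlertObjects.EntityCaption
FROM Orion.AlertActive AS A
WHERE A.TriggeredMessage = @msg AND A.Acknowledged IS NULL
'@

    Get-SwisData $Swis $swql @{ msg = $TriggeredMessage }
}

# Usage (hostname is a placeholder):
#   $swis   = Connect-Swis -Hostname 'orion.example.local' -Credential (Get-Credential)
#   $alerts = Get-UnacknowledgedAlert -Swis $swis -TriggeredMessage $AlertName
```

I have not run this against a live Orion server, so treat it as a starting point.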
Next steps:
Once the alert details have been obtained, could I filter further within the query on whether the alert was acknowledged, to limit the dataset PowerShell needs to work with?
Then verify whether the alert notes are empty. If they are not empty, verify that they contain the header: Name, Process ID, and CPU.
Once the notes are verified to contain the right data, have PowerShell parse them into an object-based data set whose properties match the three header names.
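The header check and parsing could look roughly like the sketch below. I have not confirmed the exact layout RealTimeProcessExplorer writes into the alert note, so this ASSUMES a header line followed by one row per process, with columns separated by tabs or by two or more spaces. The function name `ConvertFrom-ProcessNote`, the `-Exclude` list (for OS processes that are not worth monitoring), and the sample data are all made up for illustration. `Process ID` becomes `ProcessID` so the property is easier to reference.

```powershell
# Hypothetical helper: turns alert-note text into objects.
# ASSUMPTION: header line "Name  Process ID  CPU", then one row per process,
# columns separated by tabs or by two or more spaces.
function ConvertFrom-ProcessNote {
    param(
        [string]$Note,
        [string[]]$Exclude = @()   # process names to drop, e.g. OS housekeeping
    )

    # Keep only non-blank lines.
    $lines = @($Note -split '\r?\n' | Where-Object { $_.Trim() })

    # Return nothing for an empty note or a missing/unexpected header.
    if ($lines.Count -lt 2) { return }
    if ($lines[0] -notmatch '^\s*Name\s+Process ID\s+CPU\s*$') { return }

    $lines | Select-Object -Skip 1 | ForEach-Object {
        $fields = $_.Trim() -split '\s{2,}|\t'
        # Skip malformed rows and excluded process names.
        if ($fields.Count -eq 3 -and $fields[0] -notin $Exclude) {
            [pscustomobject]@{
                Name      = $fields[0]
                ProcessID = $fields[1] -as [int]     # $null if not numeric
                CPU       = $fields[2] -as [double]  # $null if not numeric
            }
        }
    }
}

# Invented sample data, only to show the call.
$sample = "Name  Process ID  CPU`nsqlservr.exe  1234  12.5`nSystem Idle Process  0  80.0"
ConvertFrom-ProcessNote -Note $sample -Exclude 'System Idle Process'
```

If the real note uses a different delimiter or includes units such as `%`, the split pattern and the numeric conversions would need adjusting.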
Once the data is parsed into a workable data set, create an application template with each process as its own component. The application template name will include the node name and the date/time of creation so it can be tracked.
Example: Base Template - insert node name - insert date insert time
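A small sketch of building that name consistently. The function name `New-BaseTemplateName` and the exact timestamp format are my own choices, not a SolarWinds convention. The invariant culture is used so the name looks the same no matter which machine the script runs on.

```powershell
# Hypothetical helper: builds "Base Template - <node> - <date> <time>".
function New-BaseTemplateName {
    param(
        [Parameter(Mandatory)][string]$NodeName,
        [datetime]$Created = (Get-Date)   # defaults to the current time
    )

    # Sortable timestamp, formatted independently of the local culture.
    $stamp = $Created.ToString('yyyy-MM-dd HH:mm', [cultureinfo]::InvariantCulture)
    "Base Template - $NodeName - $stamp"
}

# Usage: the node name would come from EntityCaption in the alert query.
#   New-BaseTemplateName -NodeName $alert.EntityCaption
```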
Once the application template has been created, assign it to the designated node. The node's URI is obtained via the alert query above.
Once the application template has been assigned to the designated node, acknowledge the triggered alert and append a new note stating that the application template (ID and name) was created and assigned to the node.
Once all alerts have been processed and application templates created, the alert definition can be turned off (I still need to figure this part out).
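For turning off the alert definition, one possibility is to look up its URI in Orion.AlertConfigurations and set `Enabled` to false with `Set-SwisObject`. This is an untested sketch: it ASSUMES the `Enabled` property is writable through the API in your Orion version and that the account has alert management rights. The function name `Disable-AlertDefinition` is made up.

```powershell
# Requires the SwisPowerShell module (Install-Module SwisPowerShell).
# Hypothetical helper: disables every alert definition with the given name.
# ASSUMPTION: Orion.AlertConfigurations.Enabled can be updated via SWIS.
function Disable-AlertDefinition {
    param(
        [Parameter(Mandatory)]$Swis,                 # connection from Connect-Swis
        [Parameter(Mandatory)][string]$AlertName
    )

    # A single-column query returns the URI values directly.
    $uris = @(Get-SwisData $Swis 'SELECT Uri FROM Orion.AlertConfigurations WHERE Name = @name' @{ name = $AlertName })
    if ($uris.Count -eq 0) { throw "No alert definition named '$AlertName' was found." }

    foreach ($uri in $uris) {
        Set-SwisObject $Swis -Uri $uri -Properties @{ Enabled = $false }
    }
}

# Usage:
#   Disable-AlertDefinition -Swis $swis -AlertName $AlertName
```

It would be worth trying this on a throwaway alert first to confirm the change shows up in the audit events.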
This should allow alerts to be acknowledged as they are being worked on, while the history remains available during the workflow.
Once the alert definition is turned off via the API, all of its active alerts will clear automatically.
Message Center will then log each step: the alert being triggered for each node, the acknowledgement for each node, the application template creation/assignment, and finally the disabling of the alert definition.
This gives a full audit trail via Message Center and transparency of progress via active alerts until the job completes.
Thoughts?