Good plan for automating process monitoring for servers or just not ideal?

Question

I am currently testing, and looking to implement, some base application templates that cover all processes seen on a given server. Ideally this list would not come from a PowerShell or other script run directly against a node, but from the Real-Time Process Explorer function within SolarWinds.

I would then filter out any OS-specific processes that are not critical to monitor, leaving the processes that would normally need to be checked manually on a server. This should give a good view of how the server is really working and, if CPU or memory is consistently high, let us determine which processes are responsible.

Would something like this also work for services?

One of the reasons is that my environment heavily restricts direct PowerShell connections to our servers, which is normal, but SolarWinds is not necessarily one of those direct paths. We monitor primarily via WMI, and WinRM is not currently configured.

Not everything has an agent either, so I am curious whether this method will even work for WMI nodes. I have a pending task to meet with my server team and cybersecurity team to discuss WinRM with SolarWinds as a trusted source, but there is no ETA on that.

I have theorized the following process as a somewhat valid framework, but it is untested and I wanted to get some opinions on it.

Step 1. Create an alert with a trigger action that runs an external program, approve the action, and override the default count of 10 with whatever threshold is feasible. For my testing purposes I used 1000 with a timeout of 300 seconds.

Step 2. Query active alerts via the SolarWinds API using the format below.

$AlertName = "insert above created alertname"

$query = "SELECT A.AlertActiveID, A.AlertObjectID, A.AlertObjects.AlertNote, A.AlertObjects.EntityUri, A.AlertObjects.EntityCaption, A.Acknowledged, A.AcknowledgedBy, A.AcknowledgedDateTime, A.AcknowledgedNote, A.TriggeredDateTime, A.TriggeredMessage FROM Orion.AlertActive AS A WHERE A.TriggeredMessage = '$($AlertName)'"

$alternativeQuery = "SELECT A.AlertActiveID, A.AlertObjectID, A.AlertObjects.AlertNote, A.AlertObjects.EntityUri, A.AlertObjects.EntityCaption, A.Acknowledged, A.AcknowledgedBy, A.AcknowledgedDateTime, A.AcknowledgedNote, A.TriggeredDateTime, A.TriggeredMessage FROM Orion.AlertActive AS A WHERE A.TriggeredMessage = '$($AlertName)' AND A.Acknowledged IS NULL"
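A hedged sketch of how the two queries above could be built from one function and run with a SWQL parameter instead of string interpolation. The function name Get-ActiveAlertQuery and the host name in the usage comment are hypothetical. Like the original queries, it assumes the alert's trigger message is exactly the alert name. The function itself only builds text; the commented usage lines rely on Connect-Swis and Get-SwisData from the SwisPowerShell module.

```powershell
# Sketch only: builds the SWQL text; nothing here contacts SolarWinds.
function Get-ActiveAlertQuery {
    param(
        # When set, only alerts that have not been acknowledged are returned.
        [switch]$UnacknowledgedOnly
    )

    $columns = @(
        'A.AlertActiveID', 'A.AlertObjectID',
        'A.AlertObjects.AlertNote', 'A.AlertObjects.EntityUri',
        'A.AlertObjects.EntityCaption', 'A.Acknowledged',
        'A.AcknowledgedBy', 'A.AcknowledgedDateTime',
        'A.AcknowledgedNote', 'A.TriggeredDateTime', 'A.TriggeredMessage'
    ) -join ', '

    # @alertName is a SWQL parameter, so the alert name is never spliced into the query text.
    $query = "SELECT $columns FROM Orion.AlertActive AS A WHERE A.TriggeredMessage = @alertName"
    if ($UnacknowledgedOnly) {
        $query += ' AND A.Acknowledged IS NULL'
    }
    return $query
}

# Usage against a live server (host name is a placeholder):
#   $swis   = Connect-Swis -Hostname 'orion.example.local' -Credential (Get-Credential)
#   $alerts = Get-SwisData $swis (Get-ActiveAlertQuery -UnacknowledgedOnly) @{ alertName = $AlertName }
```

Passing the name as a parameter avoids quoting problems if an alert name ever contains an apostrophe.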

Next steps:

Once alert details have been obtained, the query can filter further on whether the alert was acknowledged (as in the alternative query above), which limits the dataset PowerShell needs to work with.

Then verify whether the alert note is empty. If it is not empty, verify that it contains the headers: Name, Process ID, and CPU.

Once verified to contain the right data, have PowerShell parse it into an object-based data set whose properties match the three header names.
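The verify-then-parse steps above could look something like the sketch below. The exact layout of the alert note is not known here, so this assumes a header line followed by one row per process, with columns separated by a tab or by two or more spaces, and numeric Process ID and CPU values. The function name ConvertFrom-ProcessNote is hypothetical.

```powershell
# Sketch under assumptions: header row "Name / Process ID / CPU", columns split by
# a tab or by two or more spaces. Adjust $separator to the real note layout.
function ConvertFrom-ProcessNote {
    param([string]$AlertNote)

    # Split into non-empty lines; an empty note yields no objects.
    $lines = @($AlertNote -split '\r?\n' | Where-Object { $_.Trim() })
    if ($lines.Count -lt 2) { return }

    $separator = '\s*\t\s*|\s{2,}'
    $header    = @($lines[0].Trim() -split $separator)
    $expected  = 'Name', 'Process ID', 'CPU'

    # Bail out if the note does not start with the expected three headers.
    if (($header -join '|') -ne ($expected -join '|')) { return }

    foreach ($line in $lines[1..($lines.Count - 1)]) {
        $fields = @($line.Trim() -split $separator)
        if ($fields.Count -ne 3) { continue }   # skip rows that do not match the header
        [pscustomobject]@{
            Name         = $fields[0]
            'Process ID' = $fields[1] -as [int]      # $null if not numeric
            CPU          = $fields[2] -as [double]   # $null if not numeric
        }
    }
}
```

Wrapping the call in @( ) gives an array even when the note is empty or malformed, so the caller can simply check .Count before creating a template.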

Once parsed into a workable data set, create an application template with each process as its own component. The application template name will include the node name and the date/time of creation so it can be tracked.

Example: Base Template - insert node name - insert date insert time
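A minimal sketch of building that name consistently. The function name New-BaseTemplateName and the timestamp format are assumptions chosen for illustration; any fixed format would do as long as every run uses the same one.

```powershell
# Sketch only: produces "Base Template - <node name> - <date> <time>".
function New-BaseTemplateName {
    param(
        [Parameter(Mandatory)][string]$NodeName,
        [datetime]$Created = (Get-Date)
    )

    # Invariant culture keeps the separators identical regardless of server locale.
    $stamp = $Created.ToString('yyyy-MM-dd HH:mm', [cultureinfo]::InvariantCulture)
    'Base Template - {0} - {1}' -f $NodeName, $stamp
}

# Example: New-BaseTemplateName -NodeName 'SRV01'
```

A sortable year-first timestamp makes it easy to spot the newest template for a node when several runs exist.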

Once the application template has been created, assign it to the designated node. The node's URI is obtained via the alert query above.

Once the application template has been assigned to the designated node, acknowledge the triggered alert and append a new note stating that the application template (ID and name) was created and assigned to the node.

Once all alerts have been processed and application templates created, turn off the alert definition (I still need to figure this part out).

This should allow alerts to be acknowledged as they are worked on, while the history remains available during the workflow.

Once the alert definition is turned off via the API, all of its alerts will clear automatically.

Message Center will have a full audit trail: the alert being triggered for each node, the acknowledgement for each node, the application template creation/assignment, and finally the disabling of the alert definition.

This gives a full audit trail via Message Center and transparency of progress via active alerts until the job completes.

Thoughts?

DGNetops · Answer

That report won't give you the specific services and processes, will it? It's just a list of the installed software, not necessarily the running pieces?

The issue is that we have a ton of apps that need to be monitored, but no one has taken the time to create specific templates. Monitoring is not my daily job, since I also manage several of our network systems. I can dedicate some time to working on this, but if it's not automated in some fashion it will quickly get out of date again.

I would love to not involve the alerting engine, but at this time I cannot run a remote PowerShell script without installing an agent on the servers.

That is also an issue, since the internal agreement was that we would be agentless for most, if not all, servers.

I'm not sure how I can get the running services/processes with just a PowerShell script and without involving the SolarWinds tools: the account I have technically has access, but it is disallowed from running PowerShell scripts remotely against servers.

I'm still exploring the report from the asset inventory, so I might be wrong. If I can use it, that would be awesome; I'm just not sure at this time.

HerrDoktor · Answer

Plus, running the Real-Time Process Explorer on multiple devices programmatically will probably be painfully slow.

mesverrum · Answer

This feels a little harder than it has to be. If you are dealing with Windows servers, then 99.9% of the time you can just leverage the Installed Software inventory and write a custom SWQL alert that looks for the existence of your apps in the software list. If SQL Server is installed, set a custom property you use for tracking templates to "SQL Server", and use the automatic template assignment in SAM to add the SQL template to any server that matches that condition.

You can get really elaborate with that scheme, including supporting multiple templates at once, but I don't want to get too deep in the weeds initially.

If you go down the route of managing this all through a PowerShell script running on the Orion server itself, you really don't need to involve the alert engine at all; you just embed all the relevant logic and activity directly in the script.
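To illustrate just the matching part of that idea (not the SWQL alert itself), here is a rough PowerShell sketch that maps installed-software names to the value you would write into the tracking custom property. The rule table, the function name Get-TemplateTag, and the sample software names are all hypothetical; where the software list comes from (for example the asset inventory data in SWQL) is left to the caller and should be checked in SWQL Studio.

```powershell
# Hypothetical rules: wildcard pattern on the installed-software name -> tracking value.
$TemplateRules = [ordered]@{
    'Microsoft SQL Server*'           = 'SQL Server'
    '*Internet Information Services*' = 'IIS'
}

function Get-TemplateTag {
    param(
        [string[]]$InstalledSoftware,
        [Parameter(Mandatory)][System.Collections.IDictionary]$Rules
    )

    foreach ($pattern in $Rules.Keys) {
        # Emit the tag once if any installed package matches the wildcard pattern.
        $hits = @($InstalledSoftware | Where-Object { $_ -like $pattern })
        if ($hits.Count -gt 0) { $Rules[$pattern] }
    }
}

# Example:
#   Get-TemplateTag -InstalledSoftware 'Microsoft SQL Server 2019 (64-bit)', '7-Zip' -Rules $TemplateRules
```

Because each rule emits its own tag, a server that matches several rules yields several tags, which is one way the multi-template case could be handled.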