Using the Linux Agent communication as an Alert Action to kick off script

This scenario is a little unusual, but I'm happy with how it turned out. First, here are the requirements that led me to do this.

Requirement: Kick off a script on a Linux host as an alert action. We want this so we can attempt to 'self-heal' aspects of an application by running automation scripts to fix common problems.

Constraints: SSH cannot be used as an alert action due to PCI requirements/concerns. All communication must go through the agent.

The major hurdle is that the Alert Engine does not have an Alert Action that lets me use agent communication to execute a script on a remote host. (I've raised this as a feature request.)

Below is an outline of the work-around I created, using a simplified version I mocked up in my lab.

Example: I am monitoring Apache on a CentOS host, and I want to restart the httpd service if it goes down. I need the Orion Alert Engine to automatically kick off a restart script for the httpd daemon. I wrote this article a few years back that leverages SSH, but I can't use that approach in my current scenario because of my constraints.

Prerequisites:

  1. A new SAM Application Template
    1. A single Linux/Unix Script Component Monitor, which contains the following script to restart the httpd daemon. I had to include the 'echo "statistic: 0"' line so that it would meet SAM's requirements for valid script output. This is where I'd place any automation script that I'd like to run.
      1. pastedImage_22.png

    2. Application Template Preferred Polling Method set to 'Agent'
      1. pastedImage_24.png
    3. I need to note the TemplateID since I'll need to reference this later.
      1. You can find the TemplateID by editing the template and looking in the URL. In this example, TemplateID = 854. (pastedImage_4.png)
    4. Last and most important part: this template is NOT assigned to any node. As soon as the template is assigned, it runs the restart script, so it is obviously not something I want to keep assigned.
  2. Update the attached PowerShell script with the following info, starting at around line 35
    1. Orion Hostname
    2. Orion Username
    3. Orion Password
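
The restart script from prerequisite 1 is only visible in the screenshot, so here is a minimal sketch of what such a component script might look like. It is not the script from the screenshot: the `systemctl restart httpd` command, the `RESTART_CMD` variable, and the function name are assumptions for illustration, and the agent's account may need extra privileges to restart the service. The one part taken from the article is the Statistic line that SAM expects in the output.

```shell
#!/bin/sh
# Hypothetical sketch of a SAM Linux/Unix Script Component Monitor body.
# RESTART_CMD is an assumed, overridable default; swap in your own automation.
RESTART_CMD="${RESTART_CMD:-systemctl restart httpd}"

restart_and_report() {
    # Run the remediation command and report the outcome in a Message line.
    if $RESTART_CMD >/dev/null 2>&1; then
        echo "Message: httpd restart issued"
    else
        echo "Message: httpd restart command failed"
    fi
    # SAM script monitors need a Statistic line to treat the output as valid.
    echo "Statistic: 0"
    return 0
}

restart_and_report
```

Wrapping the command in a variable keeps the template reusable: the same skeleton can carry any other automation script by changing one line.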

Executive Summary

  1. The httpd daemon goes down and an associated alert is triggered.
  2. The 'Execute an external program' Alert Action is used to kick off a PowerShell script on the main Orion server.
  3. The PowerShell script calls into the Orion API with the relevant alert info
    1. API call to assign an existing SAM template containing the restart script (shown in the Prerequisites section; note that the script needs to reference the correct TemplateID)
    2. API call to force a Poll Now of the newly assigned application template (I'm impatient and I want this to work as quickly as possible)
    3. API call to unassign the new application template
  4. The httpd service has now been restarted and is back online
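
The three API calls in step 3 could be sketched roughly as follows. The attached script is PowerShell; this sketch swaps in shell with curl against the SolarWinds Information Service REST endpoint, purely for illustration. Treat everything specific here as an assumption to check against the Orion SDK documentation for your version: the verb names on Orion.APM.Application, the argument order, port 17778, the response being a bare application ID, and the wait before unassigning. The host, credentials, and function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: host, credentials, verb names and argument order are assumptions.
ORION_HOST="${ORION_HOST:-orion.example.com}"
ORION_USER="${ORION_USER:-admin}"
ORION_PASS="${ORION_PASS:-changeme}"
CURL="${CURL:-curl}"          # overridable so the flow can be dry-run
POLL_WAIT="${POLL_WAIT:-60}"  # guessed time to let the poll run the script

swis_url() {
    # $1 = entity, $2 = verb
    printf 'https://%s:17778/SolarWinds/InformationService/v3/Json/Invoke/%s/%s' \
        "$ORION_HOST" "$1" "$2"
}

swis_invoke() {
    # $1 = entity, $2 = verb, $3 = JSON array of verb arguments
    "$CURL" -s -f -k -u "$ORION_USER:$ORION_PASS" \
        -H 'Content-Type: application/json' \
        -X POST -d "$3" "$(swis_url "$1" "$2")"
}

self_heal() {
    # $1 = NodeID, $2 = TemplateID, $3 = credential set ID
    # 1. Assign the template to the node; assumed to return the new application ID.
    app_id=$(swis_invoke Orion.APM.Application CreateApplication "[$1, $2, $3, false]") || return 1
    # 2. Force an immediate poll so the restart script runs right away.
    swis_invoke Orion.APM.Application PollNow "[$app_id]" >/dev/null || return 1
    sleep "$POLL_WAIT"
    # 3. Unassign the template so the script does not keep running.
    swis_invoke Orion.APM.Application DeleteApplication "[$app_id]" >/dev/null || return 1
    echo "Template $2 ran on node $1 as application $app_id"
}
```

A call such as `self_heal 12 854 <credentialSetId>` would walk through all three steps; the credential set ID is left as a parameter because the right value depends on how the template's credentials are configured.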

Detail Overview

I have a running httpd service, Yay!

pastedImage_7.png

Something happens and Apache dies, sad panda.

pastedImage_8.png

I built an alert to look for this specific condition.

Here are my Alert Trigger Conditions.

pastedImage_34.png

Here is an Alert Trigger Action of 'Execute an external program' to kick off my PowerShell script (attached to this Thwack article below), which calls into the Orion API and works its magic.

pastedImage_35.png

Here are the details of the 'Execute an external program' action. It's basically the path to where my PowerShell script lives, which is currently on the C: drive of my main Orion server.

pastedImage_36.png

Here is the full command line from the screenshot above. I have to pass in the NodeID of the node whose application is in alarm (the alert macro below resolves Application.Node.NodeID) as well as the TemplateID of the template I need to assign.

  • C:\windows\system32\windowspowershell\v1.0\powershell.exe -ExecutionPolicy unrestricted -command "C:\SolarWindsScripts\Linux_Agent_Service_Restart.ps1 '${N=SwisEntity;M=Application.Node.NodeID}' '854'"

Now I sit back and let Orion do its thing. Here comes the alert!

pastedImage_10.png

Automation begins! The new Application Template is assigned, which kicks off the restart script. When it's done, the script unassigns the application template.

pastedImage_9.png

Huzzah! Apache is running again automagically!

pastedImage_11.png

Attachments
Comments

I like this, but I've lately figured that if I'm sitting down to write custom scripts I may as well do the whole thing in the script. 

What I mean is I have been writing the scripted SAM monitors with the self-healing already built in. If I need to check a list of processes, I loop through the list with an if statement: if the thing is not working, do these things to fix it, flag the monitor as being in warning, and modify the message string to let me know that we already tried restarting services x, y, and z.

Depending on the use case we might get more elaborate with it, but that's the general idea. Just trying to streamline my resolutions.
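
A rough sketch of that pattern, for illustration only: the service list, the `systemctl` check and restart commands, and the function name are all hypothetical, and the exit code of 2 for Warning reflects my understanding of SAM's Linux/Unix script monitor conventions, so verify it against the SAM documentation.

```shell
#!/bin/sh
# Hypothetical self-healing monitor: SERVICES, CHECK_CMD and FIX_CMD are assumed names.
SERVICES="${SERVICES:-httpd}"
CHECK_CMD="${CHECK_CMD:-systemctl is-active --quiet}"
FIX_CMD="${FIX_CMD:-systemctl restart}"

check_and_heal() {
    restarted=""
    for svc in $SERVICES; do
        # If the service is not healthy, attempt the fix and remember that we tried.
        if ! $CHECK_CMD "$svc" >/dev/null 2>&1; then
            $FIX_CMD "$svc" >/dev/null 2>&1
            restarted="$restarted $svc"
        fi
    done
    if [ -n "$restarted" ]; then
        # Surface the remediation attempt in the message and flag a warning.
        echo "Message: attempted restart of:$restarted"
        echo "Statistic: 1"
        return 2
    fi
    echo "Message: all services running"
    echo "Statistic: 0"
    return 0
}
```

In a real component the script would end with something like `check_and_heal; exit $?` so the return value becomes the monitor's status.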

mesverrum, I completely agree with you. If you have a custom script already, then you might as well build the remediation into it. The scenario above had the requirement to use existing non-script-based component monitors, so this was my work-around.

Last update: 01-20-2020 11:10 AM