SAM Script Component Monitors - Everything you need to know


Overview

Since I enjoy scripting, I figured I'd gather all of my thoughts on what is required to create a custom script component and put them here. Hopefully it'll help as you build your own.

There are two main things that the SAM script component monitor is looking for:

1. The Exit Code

OK, I'll be honest. You may have seen an exit code table in the documentation, but trust me: don't worry about following it exactly. Exit with 0 nearly all the time. Only use exit 1 when something terrible happens in the script, and by terrible I mean something in the script itself breaks. Let the statistic value and its thresholds determine the status of the component (see #2 below). You'll have a much better experience writing your own scripts this way.

2. Script Output

At a high level, SAM is really just looking for the word "Statistic:" followed by a numeric value, e.g. "Statistic: 123". How you print that to the console varies based on the scripting language you're using: PowerShell uses "Write-Host", Bash/Shell uses "echo", Python uses "print". OK, you get the idea.
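A minimal Python sketch of the output SAM looks for. The helper name `statistic_line` is hypothetical, and 123 is just the example value from the text.

```python
def statistic_line(value):
    # SAM looks for "Statistic:" followed by a number on standard output.
    # int() enforces the integer requirement (it truncates any decimals).
    return "Statistic: {}".format(int(value))


print(statistic_line(123))  # prints: Statistic: 123
```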

You can give your output a custom label using Statistic.<name>, e.g. Statistic.Name, Statistic.FileCount, Statistic.Label. Add whatever name helps after "Statistic.", just make sure it contains no spaces. The .<name> suffix is not required, but it is helpful if you return multiple values from the same script. You can report a maximum of 10 statistic values per script.
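When a script returns several values, a small helper can enforce the rules above (no spaces in the name, at most 10 statistics). This is a sketch: `statistic_lines` and the names FileCount and FolderCount are made up for illustration.

```python
MAX_STATISTICS = 10  # per-script limit stated in the text above


def statistic_lines(values):
    """Build "Statistic.<name>: <value>" lines from a dict of name -> number."""
    if len(values) > MAX_STATISTICS:
        raise ValueError("at most {} statistics per script".format(MAX_STATISTICS))
    lines = []
    for name, value in values.items():
        # Names must be non-empty and must not contain spaces (or other whitespace).
        if not name or any(ch.isspace() for ch in name):
            raise ValueError("invalid statistic name: {!r}".format(name))
        lines.append("Statistic.{}: {}".format(name, int(value)))
    return lines


for line in statistic_lines({"FileCount": 17, "FolderCount": 3}):
    print(line)
```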

You can also return an optional message (text) with the statistic value. If used, the Message name must match the name used for the corresponding Statistic, e.g. Statistic.value1 and Message.value1, Statistic.value2 and Message.value2. If your Message.<name> does not match your Statistic.<name>, SAM will throw an error. The Message is optional; use it only if you want to return additional info on top of your statistic value. It can return plain informational text or error information. Depending on the needs of your script, you can use if/then/else statements based on the statistic value to emit different Message formats that help explain what your script is doing.
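Here is a sketch of pairing each Statistic.<name> with a Message.<name> of the same name, choosing the message text with a simple if/else on the value. The function `report` and the message wording are hypothetical; the status itself still comes from the thresholds you set in SAM.

```python
def report(name, value):
    """Return a Statistic.<name> line plus a Message.<name> line with the same <name>."""
    if value == 0:
        message = "No items found."
    else:
        message = "{} item(s) found.".format(value)
    # The suffix after "Statistic." and "Message." must be identical,
    # otherwise SAM throws an error.
    return [
        "Statistic.{}: {}".format(name, value),
        "Message.{}: {}".format(name, message),
    ]


for line in report("value1", 0) + report("value2", 5):
    print(line)
```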

Detail Type: Statistic
Required: Yes
Meaning: A numeric value used to determine how the monitor compares to its set thresholds. This must be an integer value (negative numbers are supported).
Example:
Statistic.Name1: 123
Statistic.Name2: 456

Detail Type: Message
Required: No
Meaning: An error or information message to be displayed in the monitor status details. Note: Multi-line messages are supported. To use this functionality, print each line using a separate command.
Example:
Message.Name1: abc
Message.Name2: def

Example: If I create a script that outputs "Statistic.filecount", SAM uses filecount as the name under which I set my warning and critical thresholds. You can also see that I'm including a Message.filecount in the output to help me better understand the issue if it goes into alarm.

[Screenshot: Edit Application Template - SE Forum Backups]
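A Python version of the filecount example might look like the sketch below. The folder path (`TARGET_DIR`, set to the placeholder ".") and the helper names are assumptions; only the Statistic.filecount / Message.filecount output format comes from the example above. A missing or unreadable folder counts as the script breaking, so that is the only path that exits with 1.

```python
import os
import sys

# Hypothetical folder to watch; "." is only a placeholder, point it at your
# own backup directory.
TARGET_DIR = "."


def count_files(path):
    """Count files directly inside path (subfolders are not counted)."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_file())


def main(path=TARGET_DIR):
    try:
        filecount = count_files(path)
    except OSError as exc:
        # Folder missing or unreadable: the script itself failed, so exit 1.
        print("Message.filecount: Cannot read {}: {}".format(path, exc))
        sys.exit(1)
    print("Statistic.filecount: {}".format(filecount))
    print("Message.filecount: {} file(s) in {}".format(filecount, path))


main()
```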

Python on Windows

See this article: Using a Python script as a SAM component

Linux Script Monitor

There are a few more settings on the Linux Script Monitor that need to be set for it to function properly.

Script Working Directory: /tmp <-- You can change this directory if you prefer, but I always set it to /tmp.
Command Line: <scripting language path> ${SCRIPT} <-- You don't have to include the entire path, but I like to just in case. The check transfers everything in the Script Body section to the Linux server and then executes it. ${SCRIPT} tells SAM to reference that script body.
- perl ${SCRIPT}
- python ${SCRIPT}
- bash ${SCRIPT}

[Screenshot: Edit Application Template - SE Forum Backups]
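As an illustration of a Script Body for the Linux Script Monitor, here is a hypothetical disk-usage check to run with the Command Line `python ${SCRIPT}`. It assumes `python` on the target resolves to Python 3.3 or later (for `shutil.disk_usage`); on some distributions that binary is named `python3`, so adjust the Command Line accordingly. The mount point and the statistic name are made up.

```python
import shutil

# Hypothetical mount point; change it to the filesystem you want to watch.
MOUNT_POINT = "/"


def percent_used(path):
    """Whole-number percent of the filesystem at path that is in use."""
    usage = shutil.disk_usage(path)
    # Integer division keeps the statistic an integer, as SAM requires.
    # This is used/total, so on filesystems with reserved blocks it can read
    # slightly lower than the Use% column from df.
    return usage.used * 100 // usage.total


pct = percent_used(MOUNT_POINT)
print("Statistic.percentused: {}".format(pct))
print("Message.percentused: {}% of {} is in use".format(pct, MOUNT_POINT))
```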

Nagios Script Monitor

This is the one that seems to confuse people the most, mostly because there isn't a lot of documentation around it and the answers are scattered around Thwack.

The Nagios Script Monitor works the following way:

1. Connects to the target machine (the node the application is assigned to) using SSH.
2. Executes the script.
3. Reads the text from standard output and returns it to Orion.
4. Closes the SSH connection.
5. Parses the text output and saves it to the database.

This means that ALL checks are executed on the endpoint node, not on the SolarWinds server. This is important because most Nagios plugins require you to specify the hostname with -H. In these cases, you'll want to change the check to use -H localhost, as pictured below.

I've seen instances where the Nagios script doesn't like localhost. If localhost doesn't work but testing with a hard-coded IP/hostname does, you'll need to modify the Nagios script directly to use an Orion variable (see the Misc section below). In the Nagios script, where it stores the -H parameter in a variable, force that variable to equal ${IP}. ${IP} is an Orion variable that holds the IP address of the node the check runs against in Orion. This is only needed on rare occasions.

If you can view the script's code, copy and paste it into the Script Body. The Command Line is where you place your Nagios parameters. Replace perl with bash or python, depending on what the script uses.

[Screenshot: Edit Application Template - Nagios Plugins]

If the Nagios script is a compiled binary (ELF), there is more work involved. Since we can't transfer it to the server during the check, it has to already be on the server. Your Script Working Directory can be left blank. The Script Body just needs at least one character (it can't be empty).

The Command Line is where you'll place the full path to the script along with all of the Nagios parameters.

[Screenshot: Edit Application Template - Nagios Plugins]

Nagios Troubleshooting

Script output values are not defined or improperly defined
- For the Nagios Script Monitor, the values in the output must be delimited by spaces.
  - Original output: Memory OK | total=7981;used=7643;free=337;shared=0;buffers=186;cached=2587
  - Expected output: Memory OK | total=7981 used=7643 free=337 shared=0 buffers=186 cached=2587
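If you can edit the plugin (or wrap it), a small helper can convert semicolon-delimited values into the space-delimited form shown above before the text is printed. This `fix_perfdata` function is a hypothetical sketch, not part of SAM or the Nagios plugins; it keeps a ';' when it separates the standard warn/crit fields of a single value.

```python
def fix_perfdata(output):
    """Rewrite 'label=value;label=value' perfdata as 'label=value label=value'.

    Within each space-separated chunk, a ';' followed by a token containing '='
    is treated as a separator between values; any other ';' is kept, so standard
    'label=value;warn;crit' fields pass through unchanged. Quoted labels that
    contain spaces are not handled.
    """
    if "|" not in output:
        return output
    text, perfdata = output.split("|", 1)
    metrics = []
    for chunk in perfdata.split():
        for token in chunk.split(";"):
            if "=" in token or not metrics:
                metrics.append(token)
            else:
                metrics[-1] += ";" + token
    return "{} | {}".format(text.rstrip(), " ".join(metrics))


print(fix_perfdata(
    "Memory OK | total=7981;used=7643;free=337;shared=0;buffers=186;cached=2587"
))
# prints: Memory OK | total=7981 used=7643 free=337 shared=0 buffers=186 cached=2587
```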

Misc

Orion allows you to use Orion-specific variables in your script, which can make it more dynamic as you assign it to multiple nodes. You reference these variables directly in your script; you can't pass them in the Script Arguments section.

Here is a list of variables that you can use.

${IP}
${USER}
${PASSWORD}
${creds}
${PORT}
${Node.SysName}
${Node.Caption}
${Node.DNS}
${Node.ID}
${Component.ID}
${Component.Name}
${Application.Id}
${Application.Name}
${Application.TemplateId}
${Threshold.Warning}
${Threshold.Critical}
${Node.Custom.CustomPropertyName}
${Application.Custom.CustomPropertyName}

I pulled the list of available script variables from the admin guide and this Thwack post.
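A sketch of referencing these variables directly in a Python Script Body. It assumes SAM pastes each value verbatim into the script text before it runs, so a value containing a double quote would presumably break the string literal. The `was_substituted` check and the `substituted` statistic are hypothetical debugging aids: when you run the file by hand, the placeholders stay as literal text and the check reports 0.

```python
# SAM is expected to substitute these placeholders in the script body before
# the script runs; if you run the file by hand they stay as literal text.
NODE_IP = "${IP}"
NODE_CAPTION = "${Node.Caption}"


def was_substituted(value):
    """Heuristic: the placeholder was replaced if the text no longer starts with '${'."""
    return not value.startswith("${")


if was_substituted(NODE_IP):
    print("Statistic.substituted: 1")
    print("Message.substituted: Running against {} ({})".format(NODE_CAPTION, NODE_IP))
else:
    print("Statistic.substituted: 0")
    print("Message.substituted: Orion variables were not substituted")
```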

This turned into a novel, but I hope it helps. I've also attached some sample scripts.