
SAM Script Component Monitors - Everything you need to know

Overview

Since I enjoy scripting, I figured that I'd gather all of my thoughts around what is required when creating a custom script component and place it here. Hopefully it'll help as you build your own.

There are two main things that the SAM script component monitor is looking for:

1. The Exit Code.

OK, I'll be honest. You may have seen an exit code table in the documentation, but trust me when I say you don't need to follow it exactly. Exit with 0 nearly all the time. Only when something terrible happens in the script should you use exit 1. And by terrible, I mean something in the script breaks. Let the statistic value and its thresholds determine the status of the component (see #2 below). You'll have a much better experience writing your own scripts this way.
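Here is a minimal Python sketch of that exit-code approach. The names `collect_value` and `main` are made up for illustration, the hard-coded 42 stands in for whatever your script really measures, and writing the failure text to stderr is my own assumption rather than something SAM requires.

```python
import sys


def collect_value():
    """Hypothetical placeholder for whatever the script really measures."""
    return 42


def main(collect=collect_value):
    """Print the output for one run and return the exit code to use."""
    try:
        value = int(collect())
    except Exception as exc:
        # Something in the script itself broke, so there is no statistic to report.
        print(f"Script failed: {exc}", file=sys.stderr)
        return 1
    # Normal path: report the value and let the SAM thresholds decide the status.
    print(f"Statistic: {value}")
    return 0


# In a real script body the last line would be:
#     sys.exit(main())
```

The point of the structure is that every "bad but measurable" result still leaves through the exit 0 path, and only a genuine script failure reaches exit 1.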

2. Script Output

At a high level, SAM is really just looking for the word 'Statistic:' followed by a numerical value, e.g. "Statistic: 123". How you print that to the console depends on the scripting language you're using: PowerShell uses "Write-Host", Bash/Shell uses "echo", Python uses "print". You get the idea.

You can give your output a custom label by using Statistic.<name>, e.g. Statistic.Name, Statistic.FileCount, Statistic.Label. Add whatever helps after "Statistic.", just make sure there are no spaces in the name. The .<name> is not required, but it is helpful if you return multiple values from the same script. You can report a maximum of 10 statistic values per script.
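As a rough sketch, labeled statistics could be printed from Python like this. `format_statistics` and the two sample labels are hypothetical; the checks simply mirror the rules above (no spaces in the name, at most 10 values), and the values are forced to integers.

```python
MAX_STATISTICS = 10  # the per-script limit mentioned above


def format_statistics(values):
    """Turn {"Label": number} into 'Statistic.Label: number' lines."""
    if len(values) > MAX_STATISTICS:
        raise ValueError(f"at most {MAX_STATISTICS} statistics per script")
    lines = []
    for label, value in values.items():
        if not label or any(ch.isspace() for ch in label):
            raise ValueError(f"label must not be empty or contain spaces: {label!r}")
        # int() drops any fractional part so the value is always an integer.
        lines.append(f"Statistic.{label}: {int(value)}")
    return lines


for line in format_statistics({"FileCount": 12, "TotalSizeMB": 340}):
    print(line)
```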

You can also return an optional message (text) with the statistic value. If used, the Message name has to match the name used with your Statistic, e.g. Statistic.value1 and Message.value1, Statistic.value2 and Message.value2. If your Message.<name> does not match your Statistic.<name>, then SAM will throw an error. The Message is only needed if you want to return additional info on top of your statistic value; it can carry informational text or error information. Depending on the needs of your script, you could use if/then/else statements based on the statistic value to return different Message text, which helps explain what your script is doing.

Detail Type | Required | Meaning

Statistic | Yes | A numeric value used to determine how the monitor compares to its set thresholds. This must be an integer value (negative numbers are supported). For example:
    Statistic.Name1: 123
    Statistic.Name2: 456

Message | No | An error or information message to be displayed in the monitor status details. Note: Multi-line messages are supported. To use this functionality, print each line using a separate command. For example:
    Message.Name1: abc
    Message.Name2: def

Example: If I create a script that outputs "Statistic.filecount", then SAM uses filecount as the variable name where I can set my warning and critical thresholds. You can also see that I'm including a Message.filecount in the output to help me better understand the issue if it goes into alarm.

[Screenshot: Edit Application Template - SE Forum Backups]
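A file-count check along those lines might look like the sketch below. This is not the script from the screenshot: `filecount_report`, the `expected_minimum` parameter, and the message wording are all assumptions for illustration, and the demo runs against a throwaway temp directory rather than a real backup folder.

```python
import os
import tempfile


def filecount_report(directory, expected_minimum=1):
    """Return Statistic/Message lines for the number of files in a directory."""
    count = sum(1 for entry in os.scandir(directory) if entry.is_file())
    # Pick the message text based on the statistic value.
    if count == 0:
        message = f"No files found in {directory}"
    elif count < expected_minimum:
        message = f"Only {count} file(s) in {directory}, expected at least {expected_minimum}"
    else:
        message = f"{count} file(s) found in {directory}"
    # The Message label matches the Statistic label.
    return [f"Statistic.filecount: {count}", f"Message.filecount: {message}"]


with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.bak", "b.bak"):
        open(os.path.join(demo_dir, name), "w").close()
    for line in filecount_report(demo_dir, expected_minimum=3):
        print(line)
```

The warning and critical thresholds still live in SAM against filecount; the message only adds context for whoever looks at the alarm.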

Python on Windows

See this article: Using a Python script as a SAM component

Linux Script Monitor

There are a few more settings on the Linux Script Monitor that need to be set for it to function properly.

  • Script Working Directory: /tmp  <-- You can change this directory if you prefer, but I always set it to this
  • Command Line: <scripting language path> ${SCRIPT} <-- You don't have to include the entire path, but I like to just in case. The check transfers everything in the Script Body section to the Linux server and then executes it. The ${SCRIPT} tells SAM to reference that script body.
    • perl ${SCRIPT}
    • python ${SCRIPT}
    • bash ${SCRIPT}
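If you want to try a script body locally before pasting it into SAM, the settings above can be imitated with a small harness. To be clear, this is only a stand-in I'm assuming is close enough for testing, not how SAM actually implements the transfer: `run_like_linux_monitor` is a made-up name, and it just saves the body to a file in the working directory, swaps that file's path in for ${SCRIPT}, and runs the command line.

```python
import os
import shlex
import subprocess
import sys
import tempfile


def run_like_linux_monitor(script_body, command_line, working_dir=None):
    """Save the body to a file, substitute it for ${SCRIPT}, and run the command."""
    working_dir = working_dir or tempfile.gettempdir()
    fd, script_path = tempfile.mkstemp(suffix=".tmp", dir=working_dir)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(script_body)
        # Replace the ${SCRIPT} placeholder with the path of the saved body.
        command = [script_path if part == "${SCRIPT}" else part
                   for part in shlex.split(command_line)]
        result = subprocess.run(command, cwd=working_dir,
                                capture_output=True, text=True)
        return result.returncode, result.stdout
    finally:
        os.remove(script_path)


demo_body = 'print("Statistic.Demo: 7")\n'
# sys.executable stands in for an interpreter path such as /usr/bin/python.
demo_command = shlex.quote(sys.executable) + " ${SCRIPT}"
exit_code, stdout = run_like_linux_monitor(demo_body, demo_command)
print(exit_code, stdout, end="")
```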

[Screenshot: Edit Application Template - SE Forum Backups]

Nagios Script Monitor

This is the one that seems to confuse the most people, mostly because there isn't a lot documented around it and answers are scattered around Thwack.

The Nagios Script Monitor works in the following way:

  1. Connects to the target machine using SSH (the machine the application is assigned to).
  2. Executes script.
  3. Reads the text from standard output and returns it to Orion.
  4. Closes the SSH connection.
  5. Text output is parsed and saved to database.

**This means that ALL checks are executed on the endpoint node, not on the SolarWinds server. This is important because most Nagios plugins require you to specify the hostname with -H. In these cases you'll want to change the check to use -H localhost, as pictured below.

***I've seen instances where the Nagios script doesn't like localhost. If localhost doesn't work but testing with a hard-coded IP/hostname does, then you'll need to modify the Nagios script directly to use an Orion variable (see the Misc section below). In the Nagios script, where it stores the -H parameter in a variable, force that variable to equal ${IP}. ${IP} is an Orion variable that resolves to the IP of the node the check is run against in Orion. This is only needed on rare occasions.

If you have access to the script's source code, copy/paste it into the Script Body. The Command Line is where you'll place your Nagios parameters. Replace perl with bash or python based on what the script is using.

[Screenshot: Edit Application Template - Nagios Plugins]

If the Nagios script is a compiled binary file (ELF), then there is more work involved. Since SAM can't transfer the script to the server during the check, it has to already be on the server. Your Script Working Directory can be blank. The Script Body just needs any character (it can't be empty).

The Command Line is where you'll place the full path to the script along with all of the Nagios parameters.

[Screenshot: Edit Application Template - Nagios Plugins]

Nagios Troubleshooting

  • Script output values are not defined or improperly defined
    • For Nagios Script Monitor, output must be delimited by spaces.
      • Original output: Memory OK | total=7981;used=7643;free=337;shared=0;buffers=186;cached=2587
      • Expected output: Memory OK | total=7981 used=7643 free=337 shared=0 buffers=186 cached=2587
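If you can edit the plugin or wrap its output, the conversion from the original line to the expected one can be sketched like this. `space_delimit_perfdata` is a hypothetical helper, and it assumes the only problem is a ';' sitting between name=value pairs; a ';' that is followed by a bare number is left alone.

```python
import re

# A ';' counts as a pair separator only when a new name=value pair follows it.
_PAIR_SEPARATOR = re.compile(r";\s*(?=[^\s;=]+=)")


def space_delimit_perfdata(line):
    """Replace ';' between name=value pairs after the '|' with spaces."""
    if "|" not in line:
        return line
    text, perfdata = line.split("|", 1)
    fixed = _PAIR_SEPARATOR.sub(" ", perfdata.strip())
    return f"{text.rstrip()} | {fixed}"


original = "Memory OK | total=7981;used=7643;free=337;shared=0;buffers=186;cached=2587"
print(space_delimit_perfdata(original))
```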

Misc

Orion allows you to use Orion-specific variables in your script, which can make it more dynamic as you assign it out to multiple nodes. You reference these variables directly in your script. You're not able to pass these variables in the Script Arguments section.

Here is a list of variables that you can use.

  • ${IP}
  • ${USER}
  • ${PASSWORD}
  • ${creds}
  • ${PORT}
  • ${Node.SysName}
  • ${Node.Caption}
  • ${Node.DNS}
  • ${Node.ID}
  • ${Component.ID}
  • ${Component.Name}
  • ${Application.Id}
  • ${Application.Name}
  • ${Application.TemplateId}
  • ${Threshold.Warning}
  • ${Threshold.Critical}
  • ${Node.Custom.CustomPropertyName}
  • ${Application.Custom.CustomPropertyName}

I pulled the list of available script variables from the admin guide and this thwack post.
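Here is a small sketch of referencing these variables directly in a Python script body. It assumes the variable text is replaced with the real value before the script runs, which is how I read the description above; the `resolve` helper and its fallback values are my own addition so the same script can still be run by hand outside Orion.

```python
MACRO_IP = "${IP}"  # expected to be replaced with the node's IP
MACRO_NODE = "${Node.Caption}"  # expected to be replaced with the node's caption


def resolve(macro_text, fallback):
    """Use the substituted value, or the fallback if the macro was left untouched."""
    if macro_text.startswith("${") and macro_text.endswith("}"):
        return fallback
    return macro_text


target_ip = resolve(MACRO_IP, "127.0.0.1")
node_name = resolve(MACRO_NODE, "local-test")

print("Statistic.target: 0")
print(f"Message.target: checking {node_name} at {target_ip}")
```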

This turned into a novel, but I hope it helps. I've also attached some sample scripts.

SampleScriptMonitors.zip
  • Love this! I frequently have folks ask for a quick primer on how to use the custom script monitors in SAM...

    Now I can send them this article!

  • Regarding exit codes, you state:

    Only when something terrible happens in the script should you use exit 1. And by terrible, i mean something in the script breaks.

    Is that the correct philosophy to take? If something in the script breaks then I would think that a more appropriate exit status to return would be 4+ (i.e. "unknown"). Failure of the script isn't necessarily indicative of failure of the component.

    I think the message/statistic relationship needs a little further clarifying too: i.e. you can't have a message without a statistic with the same label. If you return a labelled message without a matching labelled statistic it just gets ignored. If you don't return a statistic at all (just a message) you'll get an error "Can't identify dynamic column definitions from script output."

  • Good points. From what I've seen, there is a bug in how SAM handles exit codes, and really anything other than exit 0 fails. Any exit code of 1 or greater produces an error like the one in the screenshot below. Once that red error occurs, SAM throws the statistic and message data out the window. In the scripts that I build, I try to catch/handle exceptions within the script. However, if a major exception occurs where the script isn't going to produce any useful data anyway, then I exit with 1 so I can investigate what happened. I've raised this issue with the SolarWinds engineering team and they were able to reproduce the same result. In the end, you're welcome to use any exit code that you like; I just stated above what works well for me.

    [Screenshot: the error produced by a non-zero exit code]

    I'll take a look at the wording around statistic and message and see where I can improve it.

  • I believe that in this case the "Exit 2 - Warning" and "Exit 3 - Critical" components were not fully configured: either "Get script output" on the script definition dialog was not pressed during template creation, or possibly the script did not return the expected "Statistic" fields when it was executed (but I think that would lead to a different error message).

    Exit codes for WARNING or CRITICAL can be useful when you want script logic to determine these statuses (e.g. based on something other than the numeric output value). Otherwise you can exit with the return code for UP and let the standard Orion data processing logic change the status based on thresholds (if they are defined and exceeded).

  • Is there any way to write back to a custom property from a script monitor and/or with an alert that handles the result of a script monitor?

    chad.every

  • So there is a possibility, but there are some caveats that come with it. The only relevant custom properties are for the node or application (there are others, but none that pertain to monitoring applications). There are no custom properties for components. So the best you could do would be to have a script monitor update an Application custom property and then reference that in an alert. The water gets muddy if there are multiple component monitors updating the same application custom property. When multiple components are at play, you'd have to append to the application custom property and then parse it for the specific component in an alert.

    This is something I put together a while ago to update an application custom property via the Orion API.

    PowerShell script to modify an APPLICATION Custom Property via Rest API/JSON

  • You can also use the macro ${IP} to get the node's IP address into the script.

  • Thanks for the post.

    One more question about multiple output: is it possible to hide N/A strings?