
Statistic Threshold issues on a Linux custom script on SAM

I’m having issues getting a custom Linux script to report correctly as an application monitor. This is the script in question. I’ve got others working fine.

#!/bin/bash

stat=`grep current /var/www/lab-temps.html |awk '{print $4}'|sort -r|head -1|sed -e s/"\."/" "/g|awk '{print $1}'`;

echo "Statistic: $stat"

date >> /tmp/temptest

echo "Statistic: $stat" >> /tmp/temptest

It’s very rudimentary; search a given page for values and output the highest one as $Statistic only *IF* that value exceeds 70. That works fine:

./temp.sh

Statistic: 75


So my script is absolutely working fine and executing fine, as seen here:


[root@host tmp]# tail -f temptest

Fri Jun 19 12:31:37 PDT 2015

Statistic: 74

Fri Jun 19 12:32:11 PDT 2015

Statistic: 74


However, my Application monitor refuses to fail. Here is my Script Output #1 section:


[Image: Screen Shot 2015-06-19 at 12.35.39 PM.png]



As near as I can tell from all documentation, this -- with the Statistic of "74" returning -- should be in a Critical state here, yet it's not.


What am I doing wrong? Why isn’t this triggering in the Dashboard? The ultimate goal is to simply have it toss us an email and SMS if the temps exceed x value based upon Statistic. This will be ultra helpful for me as I've got a queue of other number-result based custom linux monitors I want to build out along these lines.

  • Based on the script you posted above, it doesn't look like you're passing an exit code, i.e. the error level returned to SAM when the script executes, so SAM knows whether the script executed properly or not. Thresholds can be used to override the status from the script if so desired. In that case I would recommend exiting with a status of "0" when the script ran successfully and "1" or "5" if it fails (your choice), e.g. "exit 0".

    Scripts Must Report Status Through Exit Codes

    Scripts must report their status by exiting with the appropriate exit code:

    Exit Code         Meaning
    0                 Up
    1                 Down
    2                 Warning
    3                 Critical
    Any other value   Unknown
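
As a rough illustration of the exit-code advice above, here is a minimal sketch of the posted script reworked so it hands a status back to SAM. The function names are hypothetical, the "Message:" line is an assumption about what is useful to print on failure, and the switch to a numeric sort (sort -rn) is a change from the original, which sorted lexically.

```shell
#!/bin/bash
# Sketch only: same parsing idea as the posted script, plus an explicit status.

# Print the integer part of the highest reading on "current" lines of file $1.
# Assumption: the reading is the 4th whitespace-separated field, as in the
# original script. sort -rn (numeric) replaces the original's lexical sort -r.
highest_reading() {
    grep current "$1" 2>/dev/null | awk '{print $4}' | sort -rn | head -1 | cut -d. -f1
}

# Print the Statistic line and return the code the script should exit with:
# 0 (Up) when a numeric reading was found, 1 (Down) otherwise.
report_statistic() {
    stat=$(highest_reading "$1")
    case "$stat" in
        ''|*[!0-9]*)
            # Nothing numeric was parsed (missing file, no matching lines, ...).
            echo "Message: no temperature reading found in $1"
            return 1
            ;;
    esac
    echo "Statistic: $stat"
    return 0
}
```

The monitor script itself would then end with something like `report_statistic /var/www/lab-temps.html; exit $?`, so SAM sees 0 when a value was parsed and 1 when it was not, and any thresholds defined on the monitor can decide Warning or Critical from the Statistic.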

  • Interesting, so what's the point of letting us define the Statistic Threshold with any integer then? e.g. "warning if greater than x value"?

  • If thresholds are defined then they will control the warning/critical status. If they are not, you can pass the warning/critical state as part of the script's exit code. The choice is yours. Most people use thresholds to define status, but we still need to know if the script ran successfully or not, which is where the exit code comes into play.
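
For the second option (no thresholds on the monitor, status carried by the exit code), a sketch along these lines may help. The function name and argument order are hypothetical, and the 50/60 limits in the usage note are only the example values mentioned in this thread.

```shell
#!/bin/bash
# Sketch only: map an integer reading to one of the exit codes from the table
# above (0 Up, 2 Warning, 3 Critical). Non-numeric input maps to 1 (Down).
# Arguments: $1 reading, $2 warning limit, $3 critical limit (all integers).
status_for_reading() {
    value=$1
    warn=$2
    crit=$3
    case "$value" in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain integer
    esac
    if [ "$value" -ge "$crit" ]; then
        return 3
    elif [ "$value" -ge "$warn" ]; then
        return 2
    fi
    return 0
}
```

A script would still print its "Statistic: $stat" line first and then finish with something like `status_for_reading "$stat" 50 60; exit $?`. With a reading of 74 and those limits, the exit code would be 3.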

  • Yeah, I just quadruple checked and it's absolutely running right.

    We know SAM is successfully executing it -- that "temptest" file is created and updated BY the SAM system logging into the server in question and executing my custom script.

    When I test the script's execution inside of SAM -- using SAM tools -- it reports the exactly correct statistical value for the "$statistic", completely reliably. So as near as I can tell, SAM is always getting back the right data from my rudimentary script.

    However, my Application monitor refuses to fail.

    From all the documentation and my understanding, this means that if the temperature output -- the $statistic value -- is 50 or higher I should get a WARNING state, and if it's 60 or higher I should get a CRITICAL state.

    SAM is perfectly executing the script and receiving back a value 100% of the time between 71 and 75, yet I never get a WARNING or CRITICAL to tie later into my email and SMS alerting.

    What am I doing wrong? I guess it's off to tech support...