Memcached monitoring script problems - Illegal division by zero?

Hi all,

I'm attempting to use SAM 6 to monitor a few instances of memcached, with a nearly stock configuration, running on RHEL 6.4, and I'm hitting some befuddling issues with the application monitoring template scripts. Here's an example.


I've set up credentials, discovered a node, and configured the memcached application template for that node with appropriate credentials and port. When the monitor runs, an error is produced with the following output:

Testing on node '172.30.44.211' failed with 'Unknown' status ('Unknown' might be different if script exits with a different exit code).

Output: =====================================================

Illegal division by zero at /tmp/APM_98127686.pl line 59.


I then went to the SAM Template Editor and ran the memcached "General statistic" component monitor script there, using the same credentials and test node. I received the same error.


Next, I extracted that component's Script Body, saved it to /tmp/test.pl, and ran it from the command line as the same user whose credentials SAM uses, passing the port configured in the component. The output looks correct and parseable, with no Perl "Illegal division by zero" error at all:


[solarwin@solarwindstest tmp]$ perl test.pl 11211
Message.Threads: Used threads: 4
Statistic.Threads: 4
Message.Current_connections: Current connections: 10
Statistic.Current_connections: 10
Message.Total_connections: Total connections: 20
Statistic.Total_connections: 20
Message.Connection_structures: Connection structures: 11
Statistic.Connection_structures: 11
Message.Current_items: Current items: 0
Statistic.Current_items: 0
Message.Total_items: Total items: 0
Statistic.Total_items: 0
Message.Used_for_caching: Used for caching: 0.00 MB
Statistic.Used_for_caching: 0.00
Message.Cache_usage_ratio: Memcache cache usage ratio: 0.00 %
Statistic.Cache_usage_ratio: 0.00
Message.Evictions: Evictions: 0
Statistic.Evictions: 0
Message.Denied_connections: Denied connections: 0
Statistic.Denied_connections: 0


I'm stumped! When I copy, paste, and run the script myself, everything is fine. When I click "Test" and let SAM log in and run the script, I get "Illegal division by zero".

  • Yep! Here's the script that it's using:

    $port=$ARGV[0];
    if ($port eq '') {
        print "Message: Can't find \"port\" argument.\n";
        exit 1; }
    $arg1="echo \"stats\" \| nc 127.0.0.1";
    $cmd1=${arg1}.' '.${port};
    $arg2="echo \"stats\" \| netcat 127.0.0.1";
    $cmd2=${arg2}.' '.${port};
    $cmd=${cmd1}.' || '.${cmd2};
    @out=`$cmd`;
    $exit=`echo $?`;
    if ( $exit != 0 )
    {
        print "Message: Wrong port or memcached stopped or some problems with executing \"nc\" or \"netcat\" command. \n";
        exit 1;
    }
    for ($i=0; $i<@out; $i++) {
        if ($out[$i] =~ /threads/) {
            @t=split(" ",$out[$i]);
            $stat1=$t[2];
        }
        if ($out[$i] =~ /curr_connections/) {
            @t=split(" ",$out[$i]);
            $stat2=$t[2];
        }
        if ($out[$i] =~ /total_connections/) {
            @t=split(" ",$out[$i]);
            $stat3=$t[2];
        }
        if ($out[$i] =~ /connection_structures/) {
            @t=split(" ",$out[$i]);
            $stat4=$t[2];
        }
        if ($out[$i] =~ /curr_items/) {
            @t=split(" ",$out[$i]);
            $stat5=$t[2];
        }
        if ($out[$i] =~ /total_items/) {
            @t=split(" ",$out[$i]);
            $stat6=$t[2];
        }
        if ($out[$i] =~ /bytes/) {
            @t=split(" ",$out[$i]);
            $stat7=$t[2];
        }
        if ($out[$i] =~ /limit_maxbytes/) {
            @t=split(" ",$out[$i]);
            $stat8=$t[2];
        }
        if ($out[$i] =~ /evictions/) {
            @t=split(" ",$out[$i]);
            $stat9=$t[2];
        }
        if ($out[$i] =~ /listen_disabled_num/) {
            @t=split(" ",$out[$i]);
            $stat10=$t[2];
        }
    }
    $stat8=sprintf "%.2f", $stat7/($stat8/100.0);
    $stat7=sprintf "%.2f", $stat7/1024/1024.0;
    print "Message.Threads: Used threads: $stat1\n";
    print "Statistic.Threads: $stat1\n";
    print "Message.Current_connections: Current connections: $stat2\n";
    print "Statistic.Current_connections: $stat2\n";
    print "Message.Total_connections: Total connections: $stat3\n";
    print "Statistic.Total_connections: $stat3\n";
    print "Message.Connection_structures: Connection structures: $stat4\n";
    print "Statistic.Connection_structures: $stat4\n";
    print "Message.Current_items: Current items: $stat5\n";
    print "Statistic.Current_items: $stat5\n";
    print "Message.Total_items: Total items: $stat6\n";
    print "Statistic.Total_items: $stat6\n";
    print "Message.Used_for_caching: Used for caching: $stat7 MB\n";
    print "Statistic.Used_for_caching: $stat7\n";
    print "Message.Cache_usage_ratio: Memcache cache usage ratio: $stat8 %\n";
    print "Statistic.Cache_usage_ratio: $stat8\n";
    print "Message.Evictions: Evictions: $stat9\n";
    print "Statistic.Evictions: $stat9\n";
    print "Message.Denied_connections: Denied connections: $stat10\n";
    print "Statistic.Denied_connections: $stat10\n";
    exit 0;
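
Counting only the non-blank lines of the script, line 59 is `$stat8=sprintf "%.2f", $stat7/($stat8/100.0);`, the only place it divides by a variable. A minimal sketch of one way to reach that error, assuming (not confirmed in this thread) that under SAM `nc`/`netcat` exits with status 0 but returns no `stats` output: no `limit_maxbytes` line is parsed, `$stat8` stays undefined (treated as 0), and the division dies. The helper name `template_ratio` is hypothetical; its loop and final expression mirror the template's.

```perl
use strict;
use warnings;

# Hypothetical helper: the template's parsing of the two values used in the
# cache-ratio calculation, followed by the same expression as line 59.
sub template_ratio {
    my @out = @_;
    my ($stat7, $stat8);
    for my $line (@out) {
        if ($line =~ /bytes/) {
            my @t = split(" ", $line);
            $stat7 = $t[2];
        }
        if ($line =~ /limit_maxbytes/) {
            my @t = split(" ", $line);
            $stat8 = $t[2];
        }
    }
    # The template runs without warnings enabled; silence them here too.
    # Undefined values are treated as 0 in the arithmetic below.
    no warnings 'uninitialized';
    return sprintf "%.2f", $stat7 / ($stat8 / 100.0);
}

# With a reply like the one seen on the command line, the ratio is computed.
print template_ratio("STAT limit_maxbytes 851443712\r\n", "STAT bytes 0\r\n"), "\n";

# With an empty reply, $stat8 is undefined and Perl dies with the same error.
my $ratio = eval { template_ratio() };
print defined $ratio ? "ratio: $ratio\n" : "error: $@";
```

If this is what happens under SAM, checking that `@out` is non-empty right after the `nc`/`netcat` call would turn the crash into a readable message.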

  • Have you verified that the credentials you are using in SAM for that monitor have adequate permissions to execute that script? Also, have you tried assigning any other application templates to this node that use the Linux/Unix script monitor? Can you verify those are working as intended?


  • Nothing obvious in the script. I think aLTeReGo is on the right track.

  • $stat8 (and possibly $stat7) is probably 0 or empty some of the time, and Perl objects when the script later divides by it; only $stat8 is ever used as a divisor. If it runs fine from the command line, try adding some debugging: write out $stat7 and $stat8 in a message, write out '0' as the statistic, and exit.
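
A minimal sketch of that debugging step, assuming the values have already been parsed into `$stat7` (`bytes`) and `$stat8` (`limit_maxbytes`) as in the template. The function name `cache_usage_lines` is hypothetical: when the divisor is missing or zero, it reports both raw values in the message and outputs 0 as the statistic instead of dividing.

```perl
use strict;
use warnings;

# Hypothetical helper: build the two Cache_usage_ratio output lines,
# reporting the raw inputs instead of dividing when the divisor is unusable.
sub cache_usage_lines {
    my ($stat7, $stat8) = @_;    # bytes, limit_maxbytes
    if (!defined $stat7 || !defined $stat8 || $stat8 == 0) {
        my $shown7 = defined $stat7 ? $stat7 : 'undef';
        my $shown8 = defined $stat8 ? $stat8 : 'undef';
        return "Message.Cache_usage_ratio: DEBUG bytes=$shown7 limit_maxbytes=$shown8\n"
             . "Statistic.Cache_usage_ratio: 0\n";
    }
    my $ratio = sprintf "%.2f", $stat7 / ($stat8 / 100.0);
    return "Message.Cache_usage_ratio: Memcache cache usage ratio: $ratio %\n"
         . "Statistic.Cache_usage_ratio: $ratio\n";
}

print cache_usage_lines(0, 851443712);   # normal path
print cache_usage_lines(undef, undef);   # debugging path, no division
```

In the full script, the debugging branch would be followed by printing the output and exiting, so SAM records a value and the message shows what was actually parsed.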

  • Thanks to everyone who has swarmed on this topic; I'm extremely impressed with this community! Those are good suggestions.

    I have verified the account permissions: I logged in as the `solarwin` user via SSH, saved the script to /tmp, and ran `perl /tmp/test.pl 11211`. The script runs successfully this way.

    I've also confirmed that the credentials work with other application templates. Apache also runs on this machine, and its monitoring template components execute successfully with the same credentials.

  • What mystifies me is that zero values in the "used for caching" and cache usage ratio calculations appear to work fine from the command line. I've created a test script to validate this:

    $port=$ARGV[0];
    if ($port eq '') {
        print "Message: Can't find \"port\" argument.\n";
        exit 1; }
    $arg1="echo \"stats\" \| nc 127.0.0.1";
    $cmd1=${arg1}.' '.${port};
    $arg2="echo \"stats\" \| netcat 127.0.0.1";
    $cmd2=${arg2}.' '.${port};
    $cmd=${cmd1}.' || '.${cmd2};
    @out=`$cmd`;
    $exit=`echo $?`;
    if ( $exit != 0 )
    {
        print "Message: Wrong port or memcached stopped or some problems with executing \"nc\" or \"netcat\" command. \n";
        exit 1;
    }
    for ($i=0; $i<@out; $i++) {
        if ($out[$i] =~ /bytes/) {
            print "DEBUGGING: $out[$i]";
            @t=split(" ",$out[$i]);
            $stat7=$t[2];
        }
        if ($out[$i] =~ /limit_maxbytes/) {
            @t=split(" ",$out[$i]);
            $stat8=$t[2];
        }
    }
    $stat8=sprintf "%.2f", $stat7/($stat8/100.0);
    $stat7=sprintf "%.2f", $stat7/1024/1024.0;
    print "Message.Used_for_caching: Used for caching: $stat7 MB\n";
    print "Statistic.Used_for_caching: $stat7\n";
    print "Message.Cache_usage_ratio: Memcache cache usage ratio: $stat8 %\n";
    print "Statistic.Cache_usage_ratio: $stat8\n";
    exit 0;

    Output from that script:

    DEBUGGING: STAT bytes_read 96
    DEBUGGING: STAT bytes_written 11627
    DEBUGGING: STAT limit_maxbytes 851443712
    DEBUGGING: STAT bytes 0
    Message.Used_for_caching: Used for caching: 0.00 MB
    Statistic.Used_for_caching: 0.00
    Message.Cache_usage_ratio: Memcache cache usage ratio: 0.00 %
    Statistic.Cache_usage_ratio: 0.00

    That's what mystifies me. The same script, in the same directory, with the same permissions, run by the same user on the same system, produces different results depending on who runs it: when I run it manually from the shell, it produces correct output; when SAM runs it, it fails with 'Illegal division by zero'.
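
The debugging output in that post also shows that `/bytes/` matches four different lines (`bytes_read`, `bytes_written`, `limit_maxbytes`, and `bytes`), so `$stat7` keeps whichever of them comes last; it is correct here only because `STAT bytes 0` happens to come after the others. A minimal sketch of a stricter parser keyed on the exact statistic name. The name `parse_stats` is hypothetical, and the sample lines are taken from that output, with the `\r\n` line endings of memcached's text protocol.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "STAT <name> <value>" lines from a memcached
# "stats" reply into a hash keyed by the exact statistic name.
sub parse_stats {
    my %stats;
    for my $line (@_) {
        # \S+ stops at whitespace, so the trailing "\r\n" is not captured.
        $stats{$1} = $2 if $line =~ /^STAT\s+(\S+)\s+(\S+)/;
    }
    return \%stats;
}

my @sample = (
    "STAT bytes_read 96\r\n",
    "STAT bytes_written 11627\r\n",
    "STAT limit_maxbytes 851443712\r\n",
    "STAT bytes 0\r\n",
    "END\r\n",
);

my $stats = parse_stats(@sample);
print "bytes=$stats->{bytes} limit_maxbytes=$stats->{limit_maxbytes}\n";
# prints "bytes=0 limit_maxbytes=851443712"

# Reordering the reply no longer changes which value ends up in "bytes".
my $reversed = parse_stats(reverse @sample);
print "bytes=$reversed->{bytes}\n";
```

This does not explain the SAM-only failure by itself, but it removes the dependence on line order when diagnosing it.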

  • Do you have a couple of different perl binaries on your system? You could put ${SCRIPT} in the command box and add a shebang line to your script with the path to the one you intend to use (e.g., #!/usr/bin/perl).
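
One quick way to check which interpreter SAM actually runs is a temporary script body that reports Perl's `$^X` (the name used to execute the running perl), `$]` (its version), and the `PATH` the script inherited. This is a sketch: the shebang path and the `Diagnostics` statistic name are assumptions, not part of the template.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: describe the interpreter and environment this
# script runs under, in SAM's Message/Statistic output format.
sub diagnostics {
    my $path = defined $ENV{PATH} ? $ENV{PATH} : '(unset)';
    return "Message.Diagnostics: perl=" . $^X . " version=" . $] . " PATH=$path\n"
         . "Statistic.Diagnostics: 0\n";
}

print diagnostics();
```

Running the same body from the shell and through SAM's Test button, then comparing the two messages, should show whether the interpreter or the PATH differs.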

  • Only one perl binary exists on the system, confirmed via `find / -type f -name perl -executable`.

  • I have a couple of suggestions. The first comes from experience scripting for Nagios; the second may not apply at all.

    First suggestion: use full paths for everything. For example, reference nc and netcat as /bin/nc and /bin/netcat, adjusting for their real locations. Remote execution sometimes doesn't set up PATH, so the binaries aren't found and you end up with strange issues.

    The second suggestion may stem from my not understanding how SAM executes the remote command. I've always assumed SAM is a Windows service, which suggests to me that it uses scp/ssh to run the command remotely. If that's the case, I'd recommend copying the script to /tmp (as you've done) and then running it over ssh from another server as the same user SAM uses, i.e.:

    ssh solarwin@solarwindstest /usr/bin/perl /tmp/yourcommand 11211

    You might need to enclose the command at the end in quotes; it's been a while since I've done remote SSH execution that way.
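
A minimal sketch of the full-path suggestion applied to the template's `stats` call. The locations in `@candidates` and the helper names `find_executable`, `stats_command`, and `run_stats` are hypothetical; check where nc/netcat actually live on the host (for example with `which nc` in an interactive shell). The sketch also reads the exit status from Perl's `$?` directly instead of the template's `` `echo $?` `` round trip through a second shell.

```perl
use strict;
use warnings;

# Hypothetical candidate locations; adjust to the host's real paths.
my @candidates = qw(/usr/bin/nc /bin/nc /usr/bin/netcat /bin/netcat);

# Return the first candidate that exists and is executable, or undef.
sub find_executable {
    for my $path (@_) {
        return $path if -f $path && -x _;
    }
    return;
}

# Build the "stats" command with an absolute nc/netcat path, so it no
# longer depends on the PATH of the session SAM opens.
sub stats_command {
    my ($nc, $port) = @_;
    die "invalid port\n" unless defined $port && $port =~ /^\d+\z/;
    return "echo stats | $nc 127.0.0.1 $port";
}

# Run a command; $? >> 8 is its exit status (for a pipeline, that of
# the last command, i.e. nc).
sub run_stats {
    my ($cmd) = @_;
    my @out = `$cmd`;
    return ($? >> 8, @out);
}

my $nc = find_executable(@candidates);
if (defined $nc) {
    print stats_command($nc, 11211), "\n";
} else {
    print "Message: no nc/netcat found in: @candidates\n";
}
```

In the template, `run_stats(stats_command($nc, $port))` would take the place of building `$cmd` and calling `` `echo $?` ``, and the returned status can feed the existing `$exit != 0` check.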