
Where I am on the learning curve: true/false

I started using SolarWinds (as my British colleagues might say, tongue-in-cheek: "in anger") not long ago.  It's been deployed at work for at least a couple of years, from what I can gather.  One of my current goals is to integrate AutoSys (our enterprise scheduler) with the monitoring/alerting/fixing.

First I wanted to pull in a parameter value (known as a global variable) from AutoSys; see: Monitoring via custom Java code. After fiddling with a PowerShell "shell within a shell" command, I could pull in the values, which we typically set to the strings UP or DOWN, as a custom property with a Boolean (Yes/No, True/False) setting. There's a third state, like "not a number", when the interface fails for some reason; I suppose I'd need to default that to whichever state makes sense.

We use these globals to hold automation from running, such as when an application or system is down for maintenance.  That makes more sense than the overly simplistic "midnight to 2AM" blackout periods that are often a built-in feature.  This way the hold can be set on a scheduled basis, or via dependencies, or someone could take the variable "down" to resolve bottlenecks, etc.

The 0/1 state of yes-or-no-ness:

Fotor_152729112915236.jpg

As an aside, it would be nice to have left-axis values of just 1 or 0 instead of decimals and negative numbers.

The custom property is defined in a parent:child relationship so that we don't alert on, say, disk space when an install might use the entire disk while it runs.

Fotor_152729103982668.jpg

It worked like I wanted, to a limited extent, though I hit a few bumps.  I've read that the group needs a single parent, or alarms might skirt this gate.  Also, the custom property does not look like it scales well, as I've seen this example show up in other places.  A couple hundred settings might turn into noise if variables cover many apps.  Last, when I set the variable back up from the AutoSys side, the child disk volumes remained in a grey/unknown, un-monitored state.  Of course, that could be for other undiscovered reasons.

I'm supposed to ask a question here, since that box above is ticked.  Am I doing this right?

  • The parent needs to be "Down" to trigger suppression events; it looks like yours is going "Critical" when it is at 1.  I'm assuming you are using a custom script monitor, in which case you should just exit with a 1 to force the monitor down, which would then cause the dependency to kick in.

    The Basics of PowerShell (part 3)

  • So I have the output correct (ones and zeroes) but the exit codes are not right.

    Outer shell:

    [

    E:\bin\GVSW.ps1 $global

    ]

    Inner shell:

    [

    cd e:\bin

    E:\"Program Files (x86)"\CA\"Workload Automation AE"\JRE_WA\bin\java GVSW $global

    ]

    I need to pass the inner exit code to the outer shell, and then out to the monitor call.  Right wavelength?

  • I'm not familiar with this inner and outer shell trickery you are describing, but that sounds about right.  Something like: if $value = 1 then exit 1, else blah blah.

  • Inner/outer is like one subroutine that calls another.  I needed to pass the innermost return code back to the outermost calling routine.  As mentioned in the Java app thread, the inner code was easier (for me) to get working since paths with spaces can end up being parsed wrong.

    The exit routine I found that worked is (no if/then needed since that is done in Java):

    Exit $LASTEXITCODE

    Too many years with UNIX shells, where the exit code of the last statement in the shell is passed as the shell exit code. 

    Almost there...

    SW-20180601-UP-DOWN-0-1.png

    The exit code now sets the status as Down (red light/green light).

    There is a warning saying "The return code is different than expected. Testing on node '127.0.0.1' failed with 'Down' status ('Down' might be different if script exits with a different exit code)."

    Can I ignore that, or no?

  • To my eyes it seems to be working. Assuming you still get an exit 0 with a statistic when the stat is good, I would feel reasonably comfortable ignoring that.  It would mean that your 0-1 chart won't show up anymore the way it did, but that's just a design limitation: when SolarWinds gets an exit code 1 it won't even collect any statistics, it just downs the object and you are done with it.

    With that app being down, it should now trigger the dependency relationships for the children, where they get marked as unreachable and SolarWinds stops polling them.  Is that happening for you?

  • > ... Is that happening for you?

    It looks like the disk volumes I was testing with are in an Unknown state, for reasons other than my up/down check.  I'll find other test cases.

    Thanks again!
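
For anyone landing here later, the up/down decision that the thread says "is done in Java" might look roughly like the sketch below. This is not the actual GVSW code. The class and method names are made up, the AutoSys lookup is replaced by a second command-line argument as a stand-in, and treating DOWN as statistic 1 and falling back to UP when the value cannot be read are both assumptions. The only parts taken from the thread are the UP/DOWN strings, the "third state", and the idea that exit code 1 marks the script monitor Down while exit code 0 keeps it Up.

```java
import java.util.Locale;

/**
 * Sketch only: map an AutoSys global value ("UP"/"DOWN") to script-monitor
 * output and an exit code. Names and the fallback policy are assumptions.
 */
final class GlobalGate {

    static final int EXIT_UP = 0;   // monitor reports Up
    static final int EXIT_DOWN = 1; // monitor reports Down, so dependencies can kick in

    /** Anything other than UP/DOWN (including null) is the "third state" and uses the fallback. */
    static boolean isUp(String raw, boolean fallbackUp) {
        if (raw == null) {
            return fallbackUp;
        }
        switch (raw.trim().toUpperCase(Locale.ROOT)) {
            case "UP":
                return true;
            case "DOWN":
                return false;
            default:
                return fallbackUp;
        }
    }

    static int exitCodeFor(String raw, boolean fallbackUp) {
        return isUp(raw, fallbackUp) ? EXIT_UP : EXIT_DOWN;
    }

    /** Message/Statistic lines for the monitor; assumed convention: 0 = UP, 1 = DOWN. */
    static String report(String name, String raw, boolean fallbackUp) {
        boolean up = isUp(raw, fallbackUp);
        return "Message: " + name + " = " + raw + " (treated as " + (up ? "UP" : "DOWN") + ")\n"
             + "Statistic: " + (up ? 0 : 1);
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "EXAMPLE_GLOBAL"; // hypothetical global name
        String raw = args.length > 1 ? args[1] : null;              // stand-in for the AutoSys lookup
        boolean fallbackUp = true;                                  // assumed default for the third state
        System.out.println(report(name, raw, fallbackUp));
        System.exit(exitCodeFor(raw, fallbackUp));
    }
}
```

With something like this as the inner program, the calling PowerShell script still has to hand the code onward with Exit $LASTEXITCODE, as found above; otherwise the monitor never sees the 1.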