Odd Issue with PowerShell SAM Monitor on One Node

Fun one-off issue, and I am not sure whether it is the node or the monitor.

I have a PowerShell (PS) monitor set up to check a folder for the existence of an XLS file. I have a Prod and a Stage instance of this monitor, set up on the Prod and Stage servers respectively. The monitors are set up identically; the only difference is the monitor name.
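For reference, the folder check described above could look roughly like the sketch below. It is written in Python and is not Dave's actual PowerShell monitor. It assumes SAM's script-monitor output convention: a `Statistic:` line, a `Message:` line, exit code 0 for a normal result, and any exit code outside 0-3 reported as Unknown. Confirm that convention against the SAM documentation for your version. `check_for_xls` and `main` are illustrative names.

```python
import glob
import os
import sys


def check_for_xls(folder: str) -> tuple[int, str]:
    """Count .xls files in `folder` and build a SAM-style message."""
    # Note: "*.xls" does not match .xlsx files; add a second pattern if needed.
    matches = glob.glob(os.path.join(folder, "*.xls"))
    count = len(matches)
    if count > 0:
        message = f"Found {count} XLS file(s) in {folder}"
    else:
        message = f"No XLS file found in {folder}"
    return count, message


def main(argv: list[str]) -> int:
    folder = argv[1] if len(argv) > 1 else "."
    if not os.path.isdir(folder):
        # Assumption: an exit code outside 0-3 is reported by SAM as Unknown.
        print(f"Message: Folder not reachable: {folder}")
        return 4
    count, message = check_for_xls(folder)
    # Assumed SAM convention: one numeric statistic plus a text message.
    print(f"Statistic: {count}")
    print(f"Message: {message}")
    # Exit 0 and let the monitor's thresholds on the statistic set Up/Down.
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

You would run it as `python check_xls.py <folder>` (hypothetical file name). Returning the count as the statistic and always exiting 0 on success leaves the choice of whether a present or missing file counts as Up or Down to SAM thresholds, rather than hard-coding it in the script.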

Both monitors are set to Inherit account from node.

Both nodes have the same service account set up on them.

The Stage Monitor is currently responding with an Unknown status.

If I go into SAM, find the monitor, and test it against the Stage node, it returns "unknown user name or password".

The account SAM should be using is the same account the node uses for WMI polling, and that WMI data is showing up.

If I add a different application monitor to the Stage node, it works as it should, so I know the SolarWinds account works.

I tried adding the Prod monitor to the Stage server, and it also failed with "unknown user name or password".

I removed the monitor and added it back, and it still shows as Unknown.


It seems like I am missing something, or maybe the node is just having a Monday. Thanks, Dave

  • I worked with the Systems Engineering team to figure out the issue, and it looks like the problem is with SolarWinds trying to run the PS script. The issue cropped up roughly two days after we updated to 2023.3 last month.

    We have an MPE (main polling engine) and an APE (additional polling engine).

    We tested the monitor through Edit Template in SAM. The test failed about 90% of the time with an Unknown status. While watching the target node, we could see that Orion connected to it and launched the PS monitor, but the wsmprovhost.exe process was left open. If I leave the monitor running from Orion, it keeps opening new wsmprovhost.exe processes until the node's resources are exhausted.

    Next, we removed SolarWinds from the test completely. On both the MPE and the APE, we ran the script directly from PowerShell and it worked fine: no timeouts, and the results came back right away.

    From what we can tell, when SolarWinds runs the PS script via a SAM monitor, it times out about 90% of the time and only occasionally connects.

    If we take SolarWinds out of the equation and run the script from PowerShell on each polling engine, it works.

    Maybe there is a bug in 2023.3 that was missed?

  • If it's set to run on the remote host and you're testing it locally on the polling engines, that test doesn't sound right. The equivalent would be invoking PS remoting against the remote server from the SolarWinds server in a script block. If so, you probably also want to pass in credentials and possibly rework the variable output a bit.

  • That is what we are doing. We set up the script to run via PowerShell from both polling engines, and it worked. We tried to run the same script with the same account in SolarWinds, and it timed out. WMI is connecting just fine; the issue is something between SolarWinds and the PowerShell monitor.

  • The same thing is happening in our environment. Fortunately, this is a server deployed just for running monitoring scripts, so I gave up on troubleshooting and instead restart it every day. Similar to your case, we run several PowerShell monitor components on it. They spawn dozens of processes that eventually consume all CPU resources and crash the server, and the components time out randomly.

  • Odd fun thing: everything is working fine this morning. BUT I just discovered that while my MPE was updated to 2023.3 correctly when I ran the update on 9/12, the update never made it to the APE. It is still on 2023.2.

  • Looks like I may have found a bug in 2023.3. I updated my APE to 2023.3, and the PowerShell monitors are dead again. I really miss 2020.2.6.
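The replies above describe wsmprovhost.exe processes (the host process for WinRM/PowerShell remoting sessions) piling up on the target node until it runs out of resources. One possible diagnostic aid, not something from the thread, is to count those processes so the build-up can be graphed or alerted on before the node falls over. The sketch below assumes it runs on the Windows node itself and uses the built-in `tasklist` command with CSV output. `count_processes` and `count_on_this_host` are illustrative names.

```python
import csv
import io
import subprocess
import sys


def count_processes(tasklist_csv: str, image_name: str = "wsmprovhost.exe") -> int:
    """Count rows in `tasklist /FO CSV /NH` output whose image name matches."""
    reader = csv.reader(io.StringIO(tasklist_csv))
    # Column 0 is the image name; compare case-insensitively, as Windows does.
    # Non-CSV lines such as tasklist's "INFO: No tasks..." notice never match.
    return sum(1 for row in reader if row and row[0].lower() == image_name.lower())


def count_on_this_host(image_name: str = "wsmprovhost.exe") -> int:
    """Run tasklist locally (Windows only) and count matching processes."""
    # /FI filters by image name, /FO CSV gives parseable output, /NH drops the header.
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_processes(result.stdout, image_name)


if __name__ == "__main__" and sys.platform == "win32":
    print(f"wsmprovhost.exe instances: {count_on_this_host()}")
```

Parsing with the `csv` module, rather than splitting on commas, matters here because tasklist's memory column contains commas inside quoted fields (for example `"45,000 K"`).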
