Just setting up monitoring on my UNIX servers. In this case, it's an HP-UX 11.23 box as part of an MC/Serviceguard cluster.
Now, here's the question. I wrote a script (ksh) that runs fine manually on the box, and it also runs fine on other boxes. All the places where it can be tested from within Orion work fine too. However, in automated mode, that is, when run automatically from the software on this one box, I get this error.
I've done a bit of research to find the cause, and all indications point to increasing the SSH timeout. Essentially, you modify a config file on the Orion server (Windows) and restart the JobEngine V2 services, which is supposed to increase the timeout value for SSH connections. What the error boils down to is that it is taking too long (more than the default 2 seconds) to log in to the box and run the script.
Now here's where it gets troublesome. As I am not an admin on the Orion box (I'm a UNIX guy, not a Windows admin), I am very limited in what I can do there. My thought was that if I could go into the registry and manually change the value of this key to, say, 10 instead of 2 and then test the script again, I would have something to take to the admin to convince him of the problem. However, when I went to the registry, this key was nowhere to be found.
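In the meantime, one thing I can do from the UNIX side is time a login from another host, to have some numbers to show the admin. Below is a rough sketch of that idea, not anything from the SolarWinds documentation. The function name measure_login and the host name hpux-node1 are made up for illustration, and it assumes a client whose date supports +%s (GNU date does; the stock HP-UX date may not, so I would run this from a Linux box). It measures the whole connect, login and command round trip, so it is only a rough indicator of whether the prompt takes longer than 2 seconds to appear.

```shell
#!/bin/sh
# measure_login (hypothetical helper): run a command and print how many
# whole seconds it took. Output of the command itself is discarded.
measure_login() {
    ml_start=$(date +%s)          # assumes date supports %s (e.g. GNU date)
    "$@" >/dev/null 2>&1
    ml_end=$(date +%s)
    echo $((ml_end - ml_start))
}

# Example usage (hpux-node1 is a placeholder host name):
#   elapsed=$(measure_login ssh -o BatchMode=yes root@hpux-node1 true)
#   echo "login took about ${elapsed}s"
```

If this consistently reports more than 2 seconds for the problem box and less for the boxes that work, that would at least support the timeout theory.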
I read a suggestion that said to create another user on the box and run the script as that user (creating the necessary credentials as well). That won't work in this case: the script must run as root, because the command it relies on is a cluster command that is only executable as root.
Suggestions?
The "fix" said to edit the following line in the Solarwinds.APM.Probes.dll.config file:
<appSettings>
<add key="SSH.Monitor.PromptWait" value="2" />
changing the "2" to a "10" to increase the timeout from 2 to 10 seconds.