Oh, the title didn't scare you off yet, eh? Perfect. Welcome to my latest challenge!
We developed an HA monitoring strategy for our on-premise applications where we check for the status of a service using a VIP. This allows us to monitor an HA-protected service without caring whether or not the back-end cluster is an active/active or active/passive cluster or whether it is a cluster of 50 servers. As long as one of the services responds (or the web page responds, etc. etc.) then we know that the business service is available. Yes, I know, it doesn't tell us if the service is *degraded* but that is another ball of wax.
Enter AWS.
We were asked to monitor an application cluster in AWS, where load balancing is done via ELBs. We are also testing the SolarWinds agent, based on some rather direct advice from aLTeReGo and the team of product managers at SolarWinds. I know the security groups are set up correctly because I can query the status of the service directly on the servers, and the same SG is applied to the ELB. I have configured listeners on the ELB for 5985 and 5986, but when I try to test across the ELB I get an RPC Not Listening error.
Here is the simple PowerShell script I am using. Remember, these servers are monitored in NPM via the agent, not via WMI or SNMP.
# Query the state of the agent service via WMI.
$service = Get-WmiObject -Query "SELECT State FROM Win32_Service WHERE Name='SolarWindsAgent64'"
$status = $service.State
Write-Host $status
switch ($status)
{
    # If the service is 'Running', return a statistic matching the exit code (0) and exit as Up.
    'Running' { Write-Host "Message: Service is $status"; Write-Host 'Statistic: 0'; exit 0 }
    # If the service is not running, return a statistic of 1 (matching the exit code) and exit as Down.
    default { Write-Host "Message: Service is $status"; Write-Host 'Statistic: 1'; exit 1 }
}
Execution mode is remote host. If I wanted to run as local host I would need to modify the query to include the target and credential.
$service = get-wmiobject -query "SELECT State FROM Win32_Service WHERE Name='SolarWindsAgent64'" -ComputerName '${IP}' -Credential '${CREDENTIAL}';
I know that WinRM is working directly to the servers, but across the LB nothing seems to work, whether I run this in local or remote mode, and whether I use just -ComputerName or both -ComputerName and -Credential.
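One angle that might be worth testing here, offered as a sketch under assumptions rather than a confirmed fix: Get-WmiObject with -ComputerName connects over DCOM/RPC (TCP 135 plus dynamic ports), not over WinRM, so listeners on 5985 and 5986 would not carry that traffic. The CIM cmdlets use WS-Man (WinRM) by default when given a computer name. The host name and the interactive credential prompt below are hypothetical placeholders, not the SAM macros, and this has not been verified across an ELB, where authentication against the load balancer's name may still be a problem.

```powershell
# Hypothetical values for illustration only: substitute the ELB DNS name
# and a credential that is valid on the back-end servers.
$target = 'my-elb.example.com'
$cred   = Get-Credential

# New-CimSession connects over WS-Man (WinRM, TCP 5985 by default) instead of DCOM/RPC.
$session = New-CimSession -ComputerName $target -Credential $cred
try {
    # Same WQL query as the original script, sent through the CIM session.
    $service = Get-CimInstance -CimSession $session -Query "SELECT State FROM Win32_Service WHERE Name='SolarWindsAgent64'"
    Write-Host "Message: Service is $($service.State)"
}
finally {
    # Always release the remote session, even if the query fails.
    Remove-CimSession -CimSession $session
}
```

If this connects directly to a server but not through the ELB, that would at least narrow the problem down to the load balancer path rather than the WMI transport.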
I get that this is really an AWS question and not a SAM monitor question, but if you've managed to figure this one out I'd be glad to hear what you did to make it work.
And, yes, I did try a wide-open AWS security group for both the ELB and the target nodes. It doesn't help.