SAM Windows Agent Restart and Windows Firewall fix Script

Version 1

    Our field support staff are often given a ticket for a machine showing down in SolarWinds, but they close the ticket with "I can RDP to it." This adds overhead for me: I then have to dig through the ticketing system, find their responses, and figure out why the machine is showing down in SolarWinds.  To add to the issue, we have a self-cleanup script that warns when a machine has been offline for 30 days and deletes the node 3 days later if the issue has not been resolved.  Often a node is only "down" because the Windows Agent needs to be restarted, or because Windows Firewall was turned on and the exception for File and Printer Sharing was not enabled.  This script tests for those two causes and applies the fix if it finds one.

     

    What you'll need:

    The Windows account used to access the machines and the password

    Basic knowledge of PowerShell

    Access to the primary SolarWinds poller

     

    SolarWinds Orion Module Engine Windows Service Fix

    In order to run PowerShell scripts from executed alerts, you must modify the Windows service on your primary poller. Change the "SolarWinds Orion Module Engine" service from running as Local System to the Windows account used for polling.  When that is done, cycle the SolarWinds services (a reboot may be required for this to take effect; I do not recall).


     

     

    Creating an Encrypted Password File:

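    I followed the guide linked below, but essentially you only need to run the PowerShell command that follows it.  It's important to note that the only way I got this to work was to run the encryption command while logged in with the same Windows account I used for the service in the previous step.  (Without a -Key, ConvertFrom-SecureString uses the Windows Data Protection API, which ties the encrypted text to the user account and machine that created it.)  Save the file wherever your script will read it from; the example command and the script in this post use different paths.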

    http://www.adminarsenal.com/admin-arsenal-blog/secure-password-with-powershell-encrypting-credentials-part-1/ 

    "P@ssword1" | ConvertTo-SecureString -AsPlainText -Force | ConvertFrom-SecureString | Out-File "C:\Temp 2\Password.txt"

     
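    To confirm the file decrypts under the service account, a quick round-trip check along these lines may help.  This is only a sketch: the temp file path and the "Domain\Username" value are placeholders, and because it relies on the Windows Data Protection API it is expected to work only on Windows, under the same account and machine that wrote the file.

```powershell
# Hypothetical scratch path; substitute the path you used for Password.txt
$path = Join-Path ([System.IO.Path]::GetTempPath()) "PasswordRoundTrip.txt"

# Encrypt: same pipeline as the command above
"P@ssword1" | ConvertTo-SecureString -AsPlainText -Force |
    ConvertFrom-SecureString | Out-File $path

# Decrypt: read the file back and build a credential object from it
$pass = Get-Content $path | ConvertTo-SecureString
$cred = New-Object -TypeName System.Management.Automation.PSCredential `
            -ArgumentList "Domain\Username", $pass

# Recover the plain text for comparison (for this check only)
$roundTrip = $cred.GetNetworkCredential().Password

# Clean up the scratch file
Remove-Item $path
```

    If $roundTrip does not match what you typed, or ConvertTo-SecureString throws, the file was most likely created under a different account.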

    PowerShell Script for Testing Down Node:

    The code below is what I'm using.  I'm restricting the list of servers with an Alert I created in SolarWinds to only monitor "MachineType contains Windows", "Status is Down", "Status is not Unmonitored", and exclude any subnets that require a different Windows account for monitoring.

     

    The SolarWinds Alert will execute an external program.  I point it to:

    C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe "PathOfYourScript.ps1" ${N=SwisEntity;M=DNS}

    The ${N=SwisEntity;M=DNS} variable arrives in the script as $args[0].  Any other variables, separated by spaces, arrive as $args[1], $args[2], and so on.
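    As a small illustration of how the positional arguments line up, the sketch below uses a script block as a stand-in for the script file.  The host name and the second value are made up for the example.

```powershell
# Stand-in for the script body: report each positional argument by index
$script = {
    for ($i = 0; $i -lt $args.Count; $i++) {
        "args[$i] = $($args[$i])"
    }
}

# Simulate the alert passing a DNS name plus one extra value
$output = & $script "server01.example.com" "extra"
$output
```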

     

     

    This is the actual code in the PowerShell script.  The first check is a simple ICMP ping, which comes back True if it succeeds.  If it is True, the script restarts the SolarWinds Agent service.  If it is False, we go to step 2, which runs a WinRM command against the machine.  Since we use WinRM here, no response means the node is actually down.  If we do get a response with "*Microsoft*" as the ProductVendor, the script enables the "File and Printer Sharing" firewall exceptions so that ICMP and status polling work in SolarWinds again.

    $username = "Domain\Username"
    # Encrypted password file created under the same Windows account
    $pass = Get-Content "C:\scripts\Password.txt" | ConvertTo-SecureString
    $cred = New-Object -TypeName System.Management.Automation.PSCredential `
                -ArgumentList $username, $pass
    $Computer = $args[0]

    # Check 1: ICMP.  -Quiet returns $true or $false.
    $Test = Test-Connection -Quiet -ComputerName $Computer
    # Check 2: WinRM.  Returns nothing (rather than an error) if the machine does not answer.
    $Test2 = Test-WSMan -ComputerName $Computer -ErrorAction SilentlyContinue

    if ($Test)
    {
        # Restart the SolarWinds Agent on the remote machine
        Invoke-Command -ComputerName $Computer -Credential $cred -ScriptBlock {
            Get-Service -Name "SolarWindsAgent*" | Restart-Service
        }
    }
    elseif ($Test2.ProductVendor -like "*Microsoft*")
    {
        # Enable File and Printer Sharing exceptions for Windows Firewall
        Invoke-Command -ComputerName $Computer -Credential $cred -ScriptBlock {
            netsh advfirewall firewall set rule group="File and Printer Sharing" new enable=Yes
        }
    }
    # Otherwise the node really is down and there is nothing to fix
    exit
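    The decision in the script can also be pulled out into a small function, which makes it easy to check without touching a real machine.  This is only a sketch; the function name and the action labels are my own and are not part of SolarWinds or the script above.

```powershell
# Hypothetical helper: maps the two test results to the action the script takes
function Get-RemediationAction {
    param(
        [bool]$PingOk,            # result of Test-Connection -Quiet
        [string]$ProductVendor    # ProductVendor from Test-WSMan, empty if no reply
    )
    if ($PingOk) {
        return "RestartAgent"          # ping works, so restart the agent
    }
    elseif ($ProductVendor -like "*Microsoft*") {
        return "EnableFirewallRule"    # WinRM answers but ping does not
    }
    else {
        return "None"                  # no answer at all: the node is down
    }
}

# Example calls with made-up inputs
$a = Get-RemediationAction -PingOk $true  -ProductVendor ""
$b = Get-RemediationAction -PingOk $false -ProductVendor "Microsoft Corporation"
$c = Get-RemediationAction -PingOk $false -ProductVendor $null
```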

     

    Auto-Acknowledge SolarWinds Alert:

    I love the alerting system in SolarWinds, but I wish some items would only flag on the machine and not stay as an active alert.  Minor issues are something I'd like to know about when researching, without being overwhelmed by them on our summary page (VM snapshot age alerts are an example).  Also, since I had already established all of my regional rules, I didn't want to modify every single alert.  Instead I created a global alert that only verifies whether a machine is truly down and, if it isn't, fixes the agent or firewall.  The side effect was an active alert showing for every machine that really was down.  To get around this I use my Auto-Acknowledge script.  (It is also useful for my application owner alerts; they never seem to acknowledge anything in SolarWinds.)

     

    First create a user with no admin rights, but the right to acknowledge alerts.  Then simply create an "Execute an external program" action, and set the path as follows:

    "C:\Program Files (x86)\Internet Explorer\iexplore.exe" "${N=Alerting;M=AcknowledgeURL}&AccountID=AcknowledgeUser&Password=AcknowledgePass"

    It is possible to pass a message into the acknowledgement but I don't bother.

     

    Self-Delete Down Nodes:

    Simply because I mentioned I use this, here it is.  We have close to 5k nodes in SolarWinds.  We have delegated control out to the facilities, but they are fairly bad at keeping a current list of machines.  For adding nodes automatically, I'm sure we've all created subnet (and soon AD) scans.  Cleaning up old machines is another issue.

     

    First I create an alert with the following "Custom SQL Alert (advanced)" fields.  Set it to alert on Node; the gray box area then reads SELECT Nodes.NodeID, Nodes.Caption FROM Nodes.  My code in the white box is:

    INNER JOIN NetObjectDowntime
        ON Nodes.NodeID = NetObjectDowntime.NodeID
    WHERE NetObjectDowntime.EntityType = 'Orion.Nodes'
        AND NetObjectDowntime.State = 2
        AND NetObjectDowntime.DateTimeUntil IS NULL
        AND convert(varchar(8), NetObjectDowntime.DateTimeFrom, 112) <= convert(varchar(8), getdate()-30, 112)
        AND Nodes.UnManaged = 'false'

    This finds machines that have been down for at least 30 days and are not unmanaged.  My first alert action is an email telling the server groups that their machine is slated for deletion in 3 days.  It also dumps all of the custom field properties into the email in case I ever need to "undo" a deletion.
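    The date test in the query compares dates as yyyymmdd strings (that is what style 112 produces), so time of day is ignored.  The sketch below mirrors that comparison in PowerShell so the cutoff can be checked by hand.  The function name and parameters are my own, for illustration only.

```powershell
# Hypothetical helper: mirrors the style-112 (yyyyMMdd) comparison in the query
function Test-DownLongEnough {
    param(
        [datetime]$DownSince,          # NetObjectDowntime.DateTimeFrom
        [datetime]$Now = (Get-Date),   # stands in for getdate()
        [int]$Days = 30
    )
    $culture   = [cultureinfo]::InvariantCulture
    $downKey   = $DownSince.ToString("yyyyMMdd", $culture)
    $cutoffKey = $Now.AddDays(-$Days).ToString("yyyyMMdd", $culture)
    # Fixed-width digit strings sort the same way as the dates they represent
    return ($downKey -le $cutoffKey)
}

# Example with made-up dates: "now" is 31 March 2024, so the cutoff day is 1 March
$now    = [datetime]"2024-03-31T10:00:00"
$onDay  = Test-DownLongEnough -DownSince ([datetime]"2024-03-01T23:59:00") -Now $now
$dayOff = Test-DownLongEnough -DownSince ([datetime]"2024-03-02T00:01:00") -Now $now
```

    A node that went down late on the cutoff day still qualifies, while one that went down just after midnight the next day does not.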

     

    Once the 3 days have passed and the node has not been resolved or set to unmanaged, the deletion script executes.  My escalation level 2 executes an external program:

    C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe "DeletionScriptPath.ps1" ${N=SwisEntity;M=NodeID}

    This executes the following PowerShell script.  Please note that the SolarWinds SDK (which provides the SWIS snap-in) must be installed on the primary poller.

     

    # Load the SWIS snap-in from the SolarWinds SDK
    Add-PSSnapin SwisSnapin | Out-File PathofOptionalLog.txt

    $username = "Domain\User"
    $pass = Get-Content Passwordfilepath.txt | ConvertTo-SecureString
    $cred = New-Object -TypeName System.Management.Automation.PSCredential `
                -ArgumentList $username, $pass
    $swis = Connect-Swis -Credential $cred # create a SWIS connection object
    $NodeID = $args[0] # NodeID passed in by the alert
    Remove-SwisObject $swis -Uri "swis://localhost/Orion/Orion.Nodes/NodeID=$NodeID"
    exit
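    Since this script deletes nodes, it may be worth checking the argument before building the URI.  The sketch below is one way to do that; the function name is hypothetical and not part of the SolarWinds SDK.  It rejects anything that is not a plain number, including an alert variable that was never expanded.

```powershell
# Hypothetical helper: validates the NodeID and builds the SWIS URI used above
function Get-NodeSwisUri {
    param([string]$NodeId)
    # Accept digits only; anything else is treated as a bad argument
    if ($NodeId -notmatch '^\d+$') {
        throw "Unexpected NodeID value: '$NodeId'"
    }
    return "swis://localhost/Orion/Orion.Nodes/NodeID=$NodeId"
}

# Example calls with made-up values
$uri = Get-NodeSwisUri "42"

$rejected = $false
try {
    # Single quotes keep the unexpanded alert variable as literal text
    Get-NodeSwisUri '${N=SwisEntity;M=NodeID}' | Out-Null
}
catch {
    $rejected = $true
}
```

    The returned string could then be passed to Remove-SwisObject in place of the inline URI.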