7 Replies Latest reply on May 3, 2019 12:09 AM by naik_snigdha777

    Reboot Event Review on Server Reboot

    adbs98

      All, I thought I would share a script that a colleague and I wrote, which I have automated with SAM so that it runs when a reboot alert is triggered.

       

      Breakdown

       

      1. A person, robot, alien, etc. reboots a server that we monitor.

       

      2. We get an alert stating that the server has been rebooted and we get specifics on the reboot.

       

      3. If we know about the reboot, we acknowledge the alert within 2 minutes and that is the end.

       

      4. If we don't acknowledge the alert, a PowerShell script goes out, pulls the 1074 events from the System event log on that server, and emails the output to our team. We then know who rebooted the server and, if they were kind enough to say why, the reason. This makes it much easier.

       

      Here is an actual email. Names changed of course to protect the innocent.

       

      **********************************************

      From: SysAdmin@domain.org
      Sent: Dec 20, 2015 2:24 PM
      To: DataCenter Team
      Subject: Reboot Event Review - serverblah.domain.local

       

        Server Event Log Restart Parsed Results

      EventID       : 1074
      MachineName   : serverblah.domain.local
      Message       : The process C:\Windows\system32\winlogon.exe (serverblah.domain.local) has
        initiated the restart of computer serverblah.domain.local on behalf of
        user Domain\user blah for the following reason: Software upgrade
        reason could be found
       
        Reason Code: 0x500ff
       
        Shutdown Type: restart

      **************************************

       

      Now the script and how I got this to work (a nasty SolarWinds bug didn't make this easy).

       

      1. Set up the script - Paste the following into Notepad and save it with a .ps1 extension, for example autoreboot.ps1. Set the PowerShell execution policy so that local scripts are allowed to run (for example RemoteSigned or Bypass). Then try running the script from PowerShell - ./autoreboot.ps1 serverbla

      *** Remember to change the parameters in the script to match your settings ***

       

      *******************************************

      #Reboot Check by Santez K / Aaron B - All credit goes to Santez K for laying the foundation of this script.

      #IMPORTANT !!! If this stops working while using SAM alerting - Check the SolarWinds Orion Module Engine service - It should be running as a valid AD admin account.

       

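      param(
          # Hostname of the server to check (SAM passes the node name here)
          [string]$computer
      )

      Write-Host $computer

      # Only look at events written in the last 4000 minutes
      $after = (Get-Date).AddMinutes(-4000)

      # Pull the 1074 (restart/shutdown initiated) events from the remote System log and render them as text
      $Log = Get-EventLog -LogName System -After $after -ComputerName $computer | Where-Object {$_.EventID -eq 1074} | Format-List -Property EventID, MachineName, Message, TimeGenerated, TimeWritten, UserName | Out-String

      # Email the parsed results to the team
      Send-MailMessage -To "email@blah.org" -Subject "Reboot Event Review - $computer" -Body "Server Event Log Restart Parsed Results $Log" -SmtpServer email.blah.local -From "email@blah.org"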

       

      ***********************************

      2. This is the most important part. SolarWinds alerting has a very nasty bug: it won't do Windows authentication when you set up the alert action. No matter which credentials you enter, it refuses them. I explain how to get around this in step 3.

       

      This is how I have the command set to run the script. Node name puts the server name in for the server it is alerting on.

       

      Capture.PNG

      3. For authentication, go to Services on the SAM server and change the "SolarWinds Orion Module Engine" service to log on as a user who has the keys to the world. We have a SolarWinds service account set up in AD for this, so that is what we use. Oddly, that is the account the script runs as when it queries the remote server. If you leave it as the local user, it will never authenticate with any servers and you will never get any results. I spent weeks on and off trying to figure this out. I was told it was the alerting service, but support and others were wrong.
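To confirm which account the service in step 3 actually logs on as, something like the following could be run on the SAM server. It is only a sketch: the display-name filter is an assumption and may need adjusting to match the service names on your installation.

```powershell
# List SolarWinds services together with the account each one logs on as.
# The 'SolarWinds%' filter is an assumption; change it if your services are named differently.
Get-CimInstance -ClassName Win32_Service -Filter "DisplayName LIKE 'SolarWinds%'" |
    Sort-Object DisplayName |
    Select-Object DisplayName, State, StartName
```

If StartName for the Orion Module Engine shows LocalSystem rather than the AD service account, that would match the "never authenticates" symptom described above.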

       

      Once that is set up, you should be all set. Here is our reboot breakdown in triggers.

       

      Capture.PNG

       

       

      Hope this helps someone

        • Re: Reboot Event Review on Server Reboot
          ignitephoenix

          So this is very cool, and my lead wants something like this. I am just not getting anything in the email alerts. I saved a ps1 file with the script and followed everything else I could. It does seem I'm getting an alert, but the subject just says Reboot Event Review- and the body only has Server Event Log Restart Parsed Results. I did change the script from -4000 minutes to -2000 minutes, but I don't think that's the issue. I'm not familiar with PowerShell. Other than email@blah.org and email.blah.local, was there anything I needed to change in order to have this work correctly?

            • Re: Reboot Event Review on Server Reboot
              mesverrum

              From what you describe of the message you are getting, it sounds to me like the $Log variable is empty when your email goes out. If I were troubleshooting this, I would launch powershell.exe under the same account you use for SolarWinds, then try to execute these parts of the script:

               

              $computer = 'caption of the computer i want to test this against'

              $after = (get-date).addminutes(-4000)

              $Log = Get-EventLog -LogName system -after $after -ComputerName $computer | Where-Object {$_.EventID -eq 1074} | format-list -Property EventID, MachineName, Message, TimeGenerated, TimeWritten, UserName

               

              Then just enter $Log at the next prompt in PowerShell and see what it dumps out.

               

              Some of the places I imagine this could fall apart would be:

               

              First of all, is the account that your Orion Module Engine service is running as actually able to execute PowerShell commands like this?

                   That's why I would be using the PoSH cli under that account to confirm nothing weird comes up

               

              Is the caption you are passing actually a hostname that the orion server can resolve?

                   Sometimes people will put things in the caption that are not actually hostnames, confirm that is not happening to you

               

              Check the event logs of your test machine, do they actually have event 1074's in their history, and is that event recent enough that it's not getting cut off by the time filter you set up?

               

              If all those things check out then I would expect the script to work, but walking through each step on the CLI is a decent way to pick out where exactly the process is getting stuck.
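The checks listed above could be walked through at the prompt roughly as follows. This is a sketch, not the original script: it swaps in Get-WinEvent (instead of the Get-EventLog call used in the thread) so the event ID can be filtered on the remote side, and the hostname is a placeholder.

```powershell
# Placeholder target; replace with the caption SAM would pass for the node.
$computer = 'serverblah.domain.local'

# 1. Which account is this session running as?
whoami

# 2. Does the caption resolve to an address from this server?
[System.Net.Dns]::GetHostAddresses($computer)

# 3. Are there any 1074 events at all, ignoring the time filter?
Get-WinEvent -ComputerName $computer -FilterHashtable @{ LogName = 'System'; Id = 1074 } -MaxEvents 5 |
    Select-Object TimeCreated, Id, MachineName

# 4. Are any of them newer than the cutoff the script uses?
$after = (Get-Date).AddMinutes(-4000)
Get-WinEvent -ComputerName $computer -FilterHashtable @{ LogName = 'System'; Id = 1074; StartTime = $after } -ErrorAction SilentlyContinue |
    Format-List TimeCreated, Id, MachineName, Message
```

If step 3 returns events but step 4 returns nothing, the time window is the likely culprit; if step 2 or 3 fails outright, the caption or the account's rights are more likely the problem.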

                • Re: Reboot Event Review on Server Reboot
                  ignitephoenix

                  Is the code you posted not able to grab the hostname from the affected node? What I was hoping to do was apply this universally, so that no matter what node this occurred on, I would receive the Event Log. Manually checking the logs does provide a 1074, so I know that's not the issue.

                   

                  I can check with my team to see if I modified the code correctly, but the Orion server is domain admin, so I can't see any reason it wouldn't be able to perform powershell commands.

                   

                  The Event Log does seem to be empty, so my guess is that the part pulling the hostname is what's not working correctly. I'll see if I can take a look with the members of my team who know PowerShell, along with the information you've provided.

                    • Re: Reboot Event Review on Server Reboot
                      mesverrum

                      The code segment I posted isn't able to grab anything because it is intended to be typed into the PowerShell command line on its own, just to spot check that nothing weird is happening when you execute the commands that display the log output. I'm not suggesting it as a replacement for the example; as far as I can see, that example should work perfectly as it is. But privileges and PoSH execution tend to be fussy from one environment to another.

                      • Re: Reboot Event Review on Server Reboot
                        adbs98

                        The first thing you should do is ensure the script works and gives you results before moving on to getting SAM to run it.

                         

                        That being said... The script should just work. There are two main reasons it may not be giving results

                         

                        1. The minutes value (-4000) sets how far back the script searches the event log for 1074 events. So if you have not rebooted the server you are checking within that window (about 2.8 days at -4000 minutes), you're not going to get any results. Use a test server and reboot it to ensure you have a fresh 1074 for the script to find. I set it this way so it doesn't return every 1074 event in the log, as I only want to know about the most recent reboot.

                         

                        2. You have to start your PowerShell window with Run as and use a user that has the keys to the kingdom, meaning it has admin rights on the server you want to check. You can't just fire up PowerShell and expect it to be able to authenticate with a remote server. I would also make sure you are running and testing the script directly on the SolarWinds server, as this is where it will run from. Once you can run the script multiple times and consistently get results, move on to the next task of setting it up in SAM.
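One way to open such a window on the SolarWinds server is sketched below. The account name and the script path are hypothetical placeholders, not values from this thread.

```powershell
# Prompt for the service account's password (hypothetical account name).
$cred = Get-Credential -UserName 'DOMAIN\svc-solarwinds' -Message 'Account used by the Orion Module Engine'

# Open a new PowerShell window running as that account.
# -WorkingDirectory is set to a folder the other account can read.
Start-Process -FilePath powershell.exe -Credential $cred -WorkingDirectory 'C:\'

# In the new window, confirm the identity and execution policy, then run the
# script against a recently rebooted test server (hypothetical path):
#   whoami
#   Get-ExecutionPolicy -List
#   C:\Scripts\autoreboot.ps1 serverblah.domain.local
```

Testing this way exercises the same account and the same machine that SAM will use, which is the point of both items above.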

                         

                         

                        If you look at the alert setup, the ${NodeName} is the hostname of the server that SAM is alerting about.
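If the node name never reaches the script, the email goes out with a blank subject suffix and an empty body, which is hard to tell apart from "no events found". A small guard could make the two cases visible. This is a sketch only; Format-RebootReport is a hypothetical helper, not part of the original script.

```powershell
# Hypothetical helper: builds the email body and says why it is empty, if it is.
function Format-RebootReport {
    param(
        [string]$ComputerName,
        [string]$EventText
    )
    if ([string]::IsNullOrWhiteSpace($ComputerName)) {
        return 'No node name was passed to the script; check the alert action command line.'
    }
    if ([string]::IsNullOrWhiteSpace($EventText)) {
        return "No 1074 events were returned for $ComputerName in the search window."
    }
    return "Server Event Log Restart Parsed Results $EventText"
}
```

In the script, the result could then be used as the message body, for example $body = Format-RebootReport -ComputerName $computer -EventText $Log, and passed to Send-MailMessage with -Body $body.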

                         

                         

                         

                        Hope this helps.

                          • Re: Reboot Event Review on Server Reboot
                            ignitephoenix

                            So I had someone I work with who is familiar with PowerShell take a look at it. I guess the first time it was missing a set of double quotes; however, when he tries to run the script remotely, it pulls different values from the event log. If SolarWinds runs this, might that change? I am not sure how he's trying to run the script remotely, but according to him it's not returning the same values, as if the values are changing. When he keeps the Where clause, no data gets returned at all. When he uses a gethostname, the values change. Any thoughts?