GIVING BACK TO THWACK - 2 (Automated Snapshots and Unmanaging)

    After Patch Manager was set up with a daily schedule covering blocks of servers, I was tasked with automating both the unmanaging and the snapshotting of those servers before Patch Manager kicks off each night. I figured others might find it useful to know how I accomplished this:



    Unmanaging was the straightforward part. I used the SolarWinds Unmanage Scheduler installed on the poller to create a script for each day containing that day's list of nodes. I then created a Windows scheduled task template and used it to create one schedule per day, kicking off unmanaging of the servers at 1am each morning and remanaging at 6am.
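
    For anyone who would rather script the unmanage window than use the Unmanage Scheduler, here is a minimal sketch of an alternative. It is not what I used. It assumes the Orion SDK's Orion.Nodes "Unmanage" verb called through a client object with an invoke() method (such as SwisClient from the orionsdk Python package). The node IDs, the helper names and the 1am to 6am defaults are illustrative, and time zone handling is not addressed here.

```python
from datetime import date, datetime


def unmanage_window(patch_date, start_hour=1, end_hour=6):
    """Return (unmanage_time, remanage_time) for the given patch date."""
    start = datetime(patch_date.year, patch_date.month, patch_date.day, start_hour)
    end = datetime(patch_date.year, patch_date.month, patch_date.day, end_hour)
    return start, end


def unmanage_nodes(swis, node_ids, start, end):
    """Ask Orion to unmanage each node between start and end.

    `swis` is assumed to be an Orion SDK client exposing
    invoke(entity, verb, *args), e.g. orionsdk.SwisClient.
    """
    for node_id in node_ids:
        # Arguments: net object ID, unmanage time, remanage time, isRelative.
        swis.invoke("Orion.Nodes", "Unmanage", f"N:{node_id}", start, end, False)


if __name__ == "__main__":
    # Hypothetical patch date; prints the window without contacting Orion.
    start, end = unmanage_window(date(2024, 1, 15))
    print("Unmanage from", start, "until", end)
```

    Passing the client in as a parameter keeps the sketch testable without a live poller.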



    The snapshot request was more of a head-scratcher, as our VM team stated that they were unable to script such a thing. In any case, I wanted a method that any of our sysadmins could change if necessary, so as we are running VMAN I settled on doing it through SolarWinds. The VM team gave me one additional instruction: only one snapshot may be requested at any one time, otherwise the ESX hosts would get overloaded and cause the entire VM infrastructure to go into meltdown.


    I accomplished this in the following way:


    1. I created a new alert and gave it a generic name followed by the day of the month.
    2. Properties Tab: I set it to evaluate the trigger every 2 minutes and made the severity ‘informational’.
    3. Trigger Condition Tab: For the trigger condition I just wanted something that would always be true, so I set the trigger to: Node->Node Name->is equal to and selected the SolarWinds poller.
    4. Reset Condition Tab: I'd calculated that taking all the snapshots for any one day would need less than 90 minutes, so I set the alert to clear itself automatically after 90 minutes.
    5. Time of Day Tab: I chose to specify a time-of-day schedule and gave the schedule a name specific to the task. Patch Manager is set to begin at 1am each day, and 90 minutes before that is 11:30pm on the previous day. So I ticked all months in the schedule, selected the day before the patching day, and set the alert to run between 11:30pm and 1am of the following day.
    6. Trigger Actions Tab: I entered a display message of “Pre-Patching Snapshots have begun”. For the first action I added a NetPerfMon event log entry stating the same. For the second action I selected ‘Manage VM - Take Snapshot’ and selected the first server on my list.
    7. As I could only take one snapshot at a time, I calculated that we could safely run one every 4 minutes (my testing showed that snapshots taken this way are not instant and can take a couple of minutes to begin). So I added an escalation level with a 4 minute wait. I then copied the previously created action into escalation level 2, edited it to point at the second server on my list and (this is VERY important) changed the action title (see Snags below). I repeated this until I had as many escalation levels as I had servers.
    8. Reset Action Tab: Added a log entry of ‘Snapshots Complete’
    9. Once saved I duplicated this alert for each of the remaining days and edited as appropriate.
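
    The timing in steps 4 to 7 can be checked with a few lines of Python. This is only a sketch of the arithmetic, not anything SolarWinds runs. It assumes each escalation level's 4 minute wait is counted from the previous level, that every snapshot should get its full 4 minute slot before patching starts, and the server names and date are made up.

```python
from datetime import datetime, timedelta


def snapshot_schedule(servers, patch_start, window_minutes=90, spacing_minutes=4):
    """Return a list of (server, start_time), one snapshot per escalation level.

    Level 1 fires when the window opens; each later level fires
    spacing_minutes after the one before it.
    """
    window_start = patch_start - timedelta(minutes=window_minutes)
    schedule = []
    for level, server in enumerate(servers):
        start = window_start + timedelta(minutes=level * spacing_minutes)
        # Require the whole slot to finish before patching begins.
        if start + timedelta(minutes=spacing_minutes) > patch_start:
            raise ValueError(
                f"{server} does not fit in the {window_minutes} minute window"
            )
        schedule.append((server, start))
    return schedule


if __name__ == "__main__":
    # Hypothetical patch night: Patch Manager starts at 1am on the 15th.
    patch_start = datetime(2024, 1, 15, 1, 0)
    for server, start in snapshot_schedule(["srv-a", "srv-b", "srv-c"], patch_start):
        print(f"{start:%d %b %H:%M}  {server}")
```

    Under these assumptions a 90 minute window with 4 minute spacing has room for 22 servers per day; a 23rd raises an error rather than silently overlapping with patching.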


    Snags: I initially titled each snapshot action with the time of day it would begin. After configuring several days I discovered that actions with the same title are shared between alerts, so I'd been changing the actions in ALL of the alerts rather than just the one I was creating. To make the titles unique I added the date as well as the time. You could also use the server name, but I wanted these actions to be reusable when a server is replaced in future, hence the use of time/date.
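
    To illustrate the snag, here is a small sketch of the naming idea. The title format is hypothetical (use whatever reads well in your alert manager); the point is only that a time-only title repeats on every day's alert, while day plus time does not.

```python
from collections import Counter


def action_title(day_of_month, hour, minute):
    """Build an action title from the day of the month and the start time."""
    return f"Snapshot day {day_of_month:02d} {hour:02d}:{minute:02d}"


def duplicate_titles(titles):
    """Return the titles that occur more than once, sorted."""
    counts = Counter(titles)
    return sorted(title for title, count in counts.items() if count > 1)


if __name__ == "__main__":
    # Two days, two snapshot slots each (23:30 and 23:34).
    slots = [(14, 23, 30), (14, 23, 34), (15, 23, 30), (15, 23, 34)]
    time_only = [f"Snapshot {h:02d}:{m:02d}" for _, h, m in slots]
    with_day = [action_title(d, h, m) for d, h, m in slots]
    print("Clashing time-only titles:", duplicate_titles(time_only))
    print("Clashing day+time titles:", duplicate_titles(with_day))
```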