Let's get to the point: you need to upgrade to Orion 2020.2.1, and I want to help.
I'm sure you've read all the articles in the Success Center and even contemplated that one method you read about doing this with "Zero Downtime". Well instead, I'm just going to tell you what I did and hope it at least helps a couple people who might be currently pressured into upgrading. *cough*
This method will even give you the ability to fall back should you have any issues.
My previous environment:
Let's begin by prepping:
Since I am on Windows Server 2012R2 for everything, I will need to upgrade and migrate both my SQL server and OS.
Start by powering down that NTA server of yours. You won't need it. Orion version 2020.2.1 now creates a new NTA database on the same SQL server that hosts the Orion database. This happens during the configuration wizard for the 2020.2.1 install.
The next thing I did was create (2) brand new Windows 2019 servers.
Allocate resources to your new servers based on the Orion platform requirements (my environment was classified as "Large SLX")
At this point, you should have (2) Windows 2019 servers: one with SQL 2019 set up and ready to go, and another that will be your new SolarWinds Orion server, which is just sitting there waiting in anticipation.
Download SolarWinds Orion Platform v.2020.2.1 from your customer portal and place it on the desktop of your new 2019 Orion Server.
Log in as the Local Administrator on the new Orion server and run the installer. DO NOT PROCEED WITH THE CONFIGURATION WIZARD. We will do this when it's time to cut over. Just cancel when you get to this step.
Your environment is now prepped and we can start the migration.
Start by logging into vCenter or Hyper-V and creating a snapshot of your current Orion server. Just for fun.
Now, log into your current Orion web portal and deactivate your licenses.
Migrate any legacy reports you have over to the new server:
Once your licenses have been deactivated and your reports have been moved, go ahead and power down your old Orion server.
Now, log into the SQL server holding your current Orion database and create a FULL backup. Don't wait on your DBA, you can do it yourself. I believe in you.
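If you would rather script the backup than click through Management Studio, a minimal T-SQL sketch could look like the one below. The database name `SolarWindsOrion` and the backup path are placeholders I made up for illustration; substitute whatever your environment actually uses.

```sql
-- Assumption: the Orion database is named SolarWindsOrion and
-- D:\Backups exists on the OLD SQL server. Adjust both to your setup.

-- COPY_ONLY takes a full backup without disturbing any existing
-- backup chain your DBA may rely on; CHECKSUM validates pages as
-- they are written to the backup file.
BACKUP DATABASE [SolarWindsOrion]
TO DISK = N'D:\Backups\SolarWindsOrion_full.bak'
WITH COPY_ONLY, CHECKSUM, STATS = 10;

-- Confirm the backup file is readable before you depend on it.
RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\SolarWindsOrion_full.bak'
WITH CHECKSUM;
```

Verifying the file before copying it across is cheap insurance, since the old server stays powered off for the rest of the migration.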
Once the backup is done, go ahead and copy it to your new SQL 2019 server and restore it using the same link in the last step. You will no longer need that old SQL Server of yours. (For SolarWinds at least...)
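The restore can be scripted the same way. This is only a sketch: the database name, the logical file names in the MOVE clauses, and every path are hypothetical, so read the real logical names from the FILELISTONLY output first.

```sql
-- Assumption: the .bak file was copied to D:\Backups on the NEW
-- SQL 2019 server. All names and paths below are placeholders.

-- Step 1: list the logical file names stored in the backup.
RESTORE FILELISTONLY
FROM DISK = N'D:\Backups\SolarWindsOrion_full.bak';

-- Step 2: restore, relocating the data and log files to wherever
-- the new server keeps them. Replace the logical names with the
-- LogicalName values returned by step 1.
RESTORE DATABASE [SolarWindsOrion]
FROM DISK = N'D:\Backups\SolarWindsOrion_full.bak'
WITH MOVE N'SolarWindsOrion'     TO N'D:\Data\SolarWindsOrion.mdf',
     MOVE N'SolarWindsOrion_log' TO N'L:\Logs\SolarWindsOrion_log.ldf',
     CHECKSUM, STATS = 10;
```

The MOVE clauses matter when the drive layout on the new server differs from the old one; if the paths are identical, they can be dropped.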
Once the database has been restored, select the newly restored database on your new SQL 2019 server and execute the following query, replacing 'Server1' with the NetBIOS name of your OLD Orion server and 'Server2' with the NetBIOS name of your NEW Orion server:
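-- NetBIOS name of the OLD Orion server
DECLARE @oldHostname nvarchar(max)
SET @oldHostname = 'Server1'
-- NetBIOS name of the NEW Orion server
DECLARE @newHostname nvarchar(max)
SET @newHostname = 'Server2'
-- Repoint every stored reference from the old server to the new one
UPDATE Engines SET ServerName = @newHostname WHERE ServerName = @oldHostname
-- Note: the job scheduler host is overwritten regardless of its current value
UPDATE WebSettings SET SettingValue = @newHostname WHERE SettingName = 'JobSchedulerHost'
UPDATE Websites SET ServerName = @newHostname WHERE ServerName = @oldHostname
UPDATE OrionServers SET HostName = @newHostname WHERE HostName = @oldHostname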
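Before moving on, it may be worth confirming that nothing still points at the old name. The check below is a sketch that reuses only the tables and columns from the query above; 'Server1' again stands in for your old server's NetBIOS name.

```sql
-- Same placeholder as in the update script: the OLD server's NetBIOS name.
DECLARE @oldHostname nvarchar(max) = 'Server1';

-- Count leftover references to the old hostname in each table
-- touched by the update script. Every row should report 0.
SELECT 'Engines' AS TableName, COUNT(*) AS RemainingRows
FROM Engines WHERE ServerName = @oldHostname
UNION ALL
SELECT 'Websites', COUNT(*)
FROM Websites WHERE ServerName = @oldHostname
UNION ALL
SELECT 'OrionServers', COUNT(*)
FROM OrionServers WHERE HostName = @oldHostname
UNION ALL
SELECT 'WebSettings', COUNT(*)
FROM WebSettings
WHERE SettingName = 'JobSchedulerHost' AND SettingValue = @oldHostname;
```

A non-zero count usually means the old name was typed differently from how it is stored, for example as an FQDN rather than the NetBIOS name.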
Just to recap, we've deactivated our licenses, backed up our database on the old server, restored it on the new server, updated the database to reflect the NetBIOS name of our new server, and powered down the old Orion server.
Now on your new Orion server, go ahead and start the configuration wizard I told you to skip earlier. Follow the steps on the configuration wizard and select the option to use an existing database. Point it to your new SQL 2019 Server and use the SA account to create the initial connection. The configuration wizard will then ask if you want to create a new account for SolarWinds to use instead. Please do this and do not actually use your SQL SA account.
The configuration wizard will now prompt for the same thing, but now for the new NTA database. Choose to create a new database for it on the new 2019 SQL server. Use that new account you created in the last step for accessing the Orion database.
Once the configuration wizard is finished you should now be able to access your new Orion server via the web portal to reactivate the licenses.
When I went to reactivate my licenses, I experienced an error about my license store being corrupted. You can fix that here.
At this point you should be good to go, but I will include a few bonus steps I did once I confirmed I was able to log in to the new portal and my data was showing:
I can't guarantee the process will go as smoothly for everyone else, especially if you have multiple polling engines (please see @ahbrook's comment below for help with APEs). For me, it took only (1.5) hours to complete once I had pre-staged everything alongside my old environment.
Should for whatever reason you need to fall back, all you will have to do is power on the old Orion server and reactivate your licenses. (Assuming you haven't changed the hostname and IP of your new Orion server to match. This is why I created a CNAME and tested my access to the portal first.)
Hope this helps!
EDIT: Updated the article to include the step of updating the restored database with the name of your new Orion server. Kudos to @ahbrook for pointing this out, as this was a step I had to do as well.
On Friday, we got the call from our security office that we were safe to proceed with a migration scenario. I used this thread as part of my work to develop steps, and learned a few things in the process that may be helpful for others.
Main poller running 2020.2.0 on physical hardware
Additional polling engine running as a VM
Production database running on a physical server
Test main poller running as a VM.
Test database running as a VM.
After making sure we had solid backups of all our servers and a network firewall in place to prevent outbound communications based on the URL thread, we proceeded to:
Wipe and rebuild the physical machine, giving it a new name (app02) but the same IP as the previous server
Build a new APE, giving it a new name (ape02) but the same IP as the previous server
Build a new test main poller, giving it a new name (app02-t) but the same IP as the previous server.
Change the passwords that Orion accesses the databases with, but leave the database servers more or less alone.
Starting with Test and working with prod in parallel, we got through the installer steps without issue and moved on to configuration. And this is where we ran into considerable difficulty.
- The Test environment would not see the Test database. This turned out to be because firewall rules that allowed inbound communications were lost on the DB. We don't know why at the moment; I suspect they were pulled out due to an abundance of caution.
- The Test and prod environments both hung trying to validate the website. Test would tend to hang trying to validate the SWIS service, and Prod would just infinitely spin, waiting for the website.
- Eventually we moved app02 to a unique IP address, and this allowed the configuration manager to run and the website to work, just extremely slowly. It sometimes hung trying to access the Orion agent to download it.
Digging around in the logs showed me quite a few errors, mostly related to not finding resources or timing out. Eventually, though, I saw the following article:
Task 2, Step 8 pointed out that more needed to be done in this scenario. Specifically, we needed to go in and change a few tables. Examining these tables, I found the issue:
When installed and configured, the system added the new app instance as a main poller with a different NetBIOS name but the same IP address as the previous poller.
In essence, the database thought there were 2 main pollers with 2 names and a shared IP, and this was causing ALL SORTS of confusion.
- I followed the documentation to go in and modify any NetBIOS references from app01 to app02. I then ran the configuration wizard again, and finally things worked without a hitch.
- To finalize cleanup on prod, I needed to remove the now-useless extra poller (the extra app02 that was created by my mistake), repair the license using the steps outlined above, and then re-apply the licenses we deactivated before starting. We also reset the IP back to its original, and ran the configuration wizard again.
After taking these steps, production looked good: the instance was running, I could see all of our nodes in the maintenance mode we put them in, and security was working as expected. The agents were not checking in, however; we are waiting on networking to put in the CNAME to see if that fixes it.
As far as test goes, the process was simpler because we had tested in prod... as you do. 🙂 (Given that prod was physical and was not having the database access issues, I focused on it first). For test, I was also able to go in to edit the database and rename the references. In its case, the license seemed healthy, but it did not work and allow us to manage nodes until we deactivated and re-applied all our licenses. I also needed to clean up the extra polling engine that it thought existed.
If you have a multi-server or HA setup, it is extremely important that you go in and check the database references in between installing the new software and running the configuration manager. There can only be one main poller, and if you are reusing the same database servers, the system likely still thinks the old one exists. It doesn't take much to check this, so better to be safe than sorry!
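As a rough illustration of that check, the queries below simply list what the database currently believes about its servers. They rely only on the table and column names used in the rename script earlier in this thread; I am deliberately selecting every column rather than guessing at the rest of the schema.

```sql
-- Run against the Orion database AFTER installing the new software
-- but BEFORE running the configuration wizard.

-- Every polling engine the database knows about. Look for the old
-- main poller's name, or two entries sharing one IP address.
SELECT * FROM Engines ORDER BY ServerName;

-- Every Orion server registered in the database. A stale entry for
-- the old hostname here is the same warning sign.
SELECT * FROM OrionServers ORDER BY HostName;
```

If the old name still shows up in either result, rename the references before running the configuration wizard rather than cleaning up a duplicate poller afterwards.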
We're still not 100% up, but we are looking better for the holiday break. I am super grateful to this thread and other places that have been sharing tips and tricks for getting things secured and back to full operational status, and that's why I wanted to post this: to help others who may follow in our footsteps.
Oh, hello there misconfigured firewall rules!
For some reason, completely beyond any of us, the "Solarwinds Agent Management" Windows Firewall rule (port 17778 inbound) was set to allow only "Public" traffic, not "Domain" or "Private."
As soon as we opened that up, we were able to get our agents to check in, and now we're about 95% recovered.
So that's another important troubleshooting note: Check your firewall rules! Ideally document them before you start.
Is it possible to upgrade from 2016.2.100? I want to upgrade to fix the security issue, but my version is very old, and I'm worried about an upgrade path that spans so many documents.
This is super helpful!
In this scenario, are you leaving the agents installed on your managed nodes? I've not seen much guidance on whether or not they would be compromised as well. We're estimating that we will have to completely uninstall and reinstall our agents, and I do not know how our database would like that, assuming we decide to keep our DB and it isn't infected.
When I performed this upgrade / migration, I completely left our SolarWinds agents as is on the servers they were installed on. Once I created the CNAME record of the old Orion server that points to my new one, the agents checked in and updated their version of the agent software automatically.
The only thing you would still need to do afterward is modify the agent settings from Control Panel on each server that has the agent installed, so the hostname or IP points to the new server name. That's assuming you're not eventually changing the name of your new Orion server to match the old one; if you are, you can skip this part.
This is good in case you're not on an affected version and someone still thinks it's smart to upgrade to 2020.2.1 HF2 right now! I'm not sure why anyone would think this is a good idea right away but... that's just me.
Understood and appreciate the comments.
My intention with this post was specifically to help anyone who might be under extreme pressure to get to the latest version. I'm sure we can all agree this was not on our Holiday wish list, especially mine, but I felt I should at least offer what I can for anyone who has not had to do this previously.