All,
My organization is re-evaluating whether to use the centralized or offline upgrade method. Until now we've used the offline upgrade because it is consistently successful. A cursory search of Thwack turns up occasional issues with the centralized upgrade method, and I'm curious whether these issues affect the majority of the community:
(+) centralized upgrade from 2020.2.6 to 2023.1 stuck at 6% Please wait while SolarWinds Platform is being installed and taking long time. - Forum - Network Performance Monitor (NPM) - THWACK
(+) 2023.3.0 upgrade fun. - Forum - The Orion Platform - THWACK (solarwinds.com)
(+) 2022.3 Platform upgrade - how? - Forum - The Orion Platform - THWACK (solarwinds.com)
(+) My Upgrade to Orion 2020.2: A Customer Perspective - Product Blog - Resources - THWACK (solarwinds.com)
Your feedback is appreciated.
Thank you.
I keep our environment pretty up to date, so I'm never making large jumps like this. I normally use the centralized update, as we have an Orion HA pair, 10 additional polling engines, and 3 standalone web servers. During a typical update I need to run the Configuration Wizard manually on one of the polling engines. The downside of the centralized update is that it takes down your whole environment at the same time. If you do it manually, you are down while you update your central Orion server, but then you can do a rolling-blackout approach with the other components.
I take care of three SolarWinds instances and used the offline method for years. Since version 2023 we have been using centralized upgrades, and it has worked well. We upgrade each time there is a new release, so we are always on the most current version rather than falling several versions behind.
Centralized upgrades have certainly become more reliable over the years. However, I've hit issues nearly every time I use it, sometimes major, sometimes minor, like any upgrade. For some I know what the issue is, e.g. the SolarWinds Administration Service needs a restart or the server needs a reboot; others have required engaging support.
Now I typically use a hybrid approach: I use the offline method to upgrade the primary engine, then use the centralized upgrade to push to the rest from there. It's simply about convenience. The centralized upgrade also lets you stage the installer files ahead of time, so you don't have to wait for the files to download to each and every server in the environment.
Test it in a dev environment first if you have one.
I've been using the centralized upgrade for a couple of versions now and haven't had a problem. We do have one poller on a slower network, so we traditionally kick off the download a day or two before our scheduled outage. This also helps reduce the downtime during the upgrade itself, since the servers already have everything they need.
I also normally do a full reboot of the environment before and after the upgrade. I have had a few errors in the past due to stuck processes (this occurred during offline upgrades, and I want to say it has happened with the centralized upgrade as well), and the reboots seem to help.
Just did another upgrade to 2023.4.2 on two systems and it went well. My only comment is that the SolarWinds Administration Service did not restart on any of the systems and had to be started manually afterwards on every server. Other than that, no issues.
I have never actually seen the centralized upgrade work in any environment; there has always been some sort of issue. I'm currently doing another batch of upgrades, and the first environment I'm upgrading looks like another failure.
Performing an upgrade to 2024.1:
The main poller upgraded, and after the Configuration Wizard finished on the primary poller I was brought to the Centralized Upgrade screen.
After clicking the Update button, it starts to flash like a heartbeat and nothing happens after that, with no reference to the other 11 servers in this environment. At the very least I would have expected a list of the servers and some sort of status for each of them. It looks like it has failed again and I'll need to do it manually, as nothing has changed in over 30 minutes, with no errors or any indication of status.
Hi. If you opened a support case, can you please post the case # here?
Ultimately it's going to come down to your risk tolerance. If you can accept the risk of going with the most current version, then go with the online centralized upgrade path; it will save you a lot of time (until the day it costs you a lot of time).
Me personally, I always do the offline download and make sure it's one version back from current. I also test the install at least once on a copy of the production system (on a disconnected network). This last time I had to do the test upgrade 5+ times from 2022.x until I found the right combination to make it work. It would crash and leave things in a partially upgraded state, and I could not get it fixed; I finally found that I had to uninstall the TFTP software before starting the upgrade, and then it worked just fine. If I had been upgrading the live system, I could not have reverted snapshots and tried again over and over for a week; the end users would not put up with that. Nor could I leave the live system down for days while tech support helped me figure it out.
I've had experiences similar to those others have mentioned here, and I agree with @brscott . I recently attempted the centralized upgrade route, as it was a new option. Well, let's just say it did not go well. I'm sure this option is a great idea on paper, but the more complex the environment you manage, the more likely the upgrade is to fail. In my case the upgrade failed on the MPE at approximately 2/3 complete. This resulted in a long night with support, and we were unable to resolve the issue; only a full restore from backup brought it all back. I then went with the tried-and-true manual route of queuing up RDP sessions to each poller, and it was all successful without a single issue. To each their own, but I would avoid the centralized upgrade for now, because the convenience does not outweigh the risk and potential downtime.