My Upgrade to Orion 2020.2: A Customer Perspective

With any upgrade to production software, I’m always a little hesitant to be the very first person out of the gate. Like many of you, I’ve been burned in the past when chasing the newest promised features. This isn’t indicative of any specific vendor’s software (SolarWinds included); it’s just that being burned is a great way to learn to never do that thing again (the “hot stove” approach to learning).

The Orion Platform 2020.2 updates were made generally available (GA) on June 4, and I was anxious to try out some of the new features. I used the Orion Demo to review some of the features and see how the products integrated. However, I took my time and did my research before embarking on this last upgrade.

For me, research includes reading release notes, reviewing any hotfix details, cataloging my existing system, reading up on other experiences on THWACK, speaking with friends in the community, and otherwise doing my due diligence. Pretty much the same thing I do with any production system upgrade.

Background

Let me tell you a little bit about my environment, so we’re all on the same page. My SolarWinds infrastructure isn’t the largest or the smallest build. I like to think of it as the Goldilocks build: just the right size for me and my company. My Orion Platform footprint consists of a Main Polling Engine, two Additional Polling Engines (APEs), and two Additional Web Servers (AWSs). All these servers (and the Microsoft SQL Server hosting the Orion database) are virtualized.

Upgrade Summary

Monitoring Infrastructure

  • Main Polling Engine (MPE)
  • 2 × Additional Polling Engines (APEs)
  • 2 × Additional Web Servers (AWSs)
  • O/S: Windows Server 2016
  • Database: SQL Server 2016 SP2
  • 5 products on 5 Orion servers

Monitored Elements

  • 1,645 Nodes
  • 12,800 Interfaces
  • 650 Applications
  • 200 Virtual Hosts

Upgrade Plan

  1. Backups and snapshots
  2. Install any prerequisites
  3. Upgrade MPE
  4. Upgrade APEs and AWSs via web
  5. Planned Change Window: 6 hours

Upgrade From/To

  • NPM 12.4 --> skipping 2019.4 --> 2020.2
  • SAM 6.8 --> skipping 2019.4 --> 2020.2
  • NCM 7.9 --> skipping 2019.4 --> 2020.2
  • VMAN 8.4 --> skipping 2019.4 --> 2020.2
  • SCM 1.1 --> skipping 2019.4 --> 2020.2

Total Upgrade Time: 127 minutes
And that includes opening a support case

This environment currently monitors 1,645 nodes, 12,800 interfaces, 650 applications, 200 virtual hosts, and many other things I’m glossing over for brevity. Our database is currently clocking in at between 70 and 75 GB. All told, we have 16K network elements, with roughly 550K events spanning our three polling engines.

Planning

Since I was two major revisions behind on most of my products, I decided to add a little extra time in planning. I started by looking at what SolarWinds recommended for my upgrade process by using the Upgrade Advisor.

Upgrade Advisor

I jumped into my customer portal and opened the Upgrade Advisor. I put in the necessary information about my existing system and the target upgrade.

After submitting, I was given a step-by-step list of what needed to be done to bring me up to the selected version numbers.

For anyone interested, I’ve attached the PDF export of my upgrade advisor, so you can see what I was using as my guide. This made writing up my change management request super easy. I could just attach the PDF.

The summary of the upgrade plan is this:

  1. Upgrade NPM from 12.4 to 2020.2
  2. Upgrade SAM from 6.8 to 2020.2
  3. Upgrade NCM from 7.9 to 2020.2
  4. Upgrade VMAN from 8.4 to 2020.2
  5. Upgrade SCM from 1.1 to 2020.2
  6. Upgrade DPAIM from 11.1.1 to 2020.2
  7. Upgrade the scalability engines using the Centralized Upgrade

Prerequisite

In the release notes for Orion Platform 2019.4 (the version I’m skipping over), it’s mentioned the products now rely on the .NET 4.8 Framework and will check for its presence before upgrading. Knowing none of my servers had this installed, I elected to do an offline installation of the new framework. Thankfully, all of my servers are running the same operating system (Windows Server 2016), so it was only the one download, which I then copied to each of the Orion servers.
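For anyone who wants to confirm the prerequisite on each server before the change window, here is a minimal sketch in Python. It is not part of the SolarWinds installer and not what I ran; it assumes the registry location and the minimum Release value (528040) that Microsoft documents for .NET Framework 4.8. The function names are hypothetical.

```python
# Minimum "Release" value Microsoft documents for .NET Framework 4.8.
NET_48_MIN_RELEASE = 528040


def has_net48(release):
    """Return True if a registry Release value indicates .NET 4.8 or later."""
    return release is not None and release >= NET_48_MIN_RELEASE


def read_release_from_registry():
    """Read the .NET Framework 4.x Release value (Windows only).

    Returns None if the key or value is missing.
    """
    import winreg  # imported here so the module still loads on non-Windows hosts

    key_path = r"SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, "Release")
            return int(value)
    except OSError:
        return None


if __name__ == "__main__":
    import sys

    if sys.platform == "win32":
        release = read_release_from_registry()
        print("Release value:", release)
        print(".NET 4.8 or later installed:", has_net48(release))
    else:
        print("This check only makes sense on a Windows server.")
```

Running it on each Orion server ahead of time shows which machines still need the offline .NET installer.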

Execution

Now it was time for the actual installation. We kicked off right at the beginning of our scheduled change window—midnight.

12:00 a.m.

The first step was to power down all the servers (including the database server) and take snapshots. At my organization, our preference is powered-off snapshots. Database servers are not fans of snapshots while handling data processing, so we err on the side of caution and turn them all off before taking a snapshot.

After the snapshots were captured, we powered up the database server first. Then we powered up the Main Polling Engine, the Additional Polling Engines, and the Additional Web Servers, waiting for each to come online before moving on to the next.

12:15 a.m.

I installed the aforementioned .NET Framework update and rebooted all the Orion Platform machines. After the reboot, I validated they were all recognizing each other, polling was taking place, and the web interface was fully operational. After this confirmation, it was time to start on the Orion Platform products upgrade.

I began the install using the Orion installer I got from my customer portal, and we were off to the races.

The complete install of all six products (not just NPM) took approximately 40 minutes. Considering I was two releases behind for a couple of the products, I was pleasantly surprised with the speed. Could it be faster? Of course, it could. I wish every installation I touch would be faster. Needless to say, I was very pleased with only taking 40 minutes to upgrade all products, as there was a time when a single product would take two or three times as long to upgrade. (More on that below.)

12:55 a.m.

After the product installation, the Configuration Wizard kicked off automatically to build the website, modify the database, and adjust the services. I confirmed the settings from before were in place and we were ready to go.

12:56 a.m.

The Configuration Wizard reported an error and stopped. I’ve been using Orion Platform products for many years, so I looked into some likely places for problems. I looked at the Configuration Wizard log, saw a thing or two I could tweak, and ran it again. It failed again.

1:18 a.m.

Realizing this was an issue outside my knowledge, I did the only correct thing: I contacted Support. From my Customer Portal, I logged in and opened a ticket online, so I could provide all my information in advance. I then called in to the support telephone number and referenced that ticket.

We did a screenshare and a modification or two needed to be made directly to my database. Honestly, I didn’t notice the specifics (I’m only an accidental DBA), but it was definitely something I would not have tried myself.

1:44 a.m.

Together with Alexis, the support rep, I re-ran the Configuration Wizard, and this time it completed. I thanked Alexis and was able to log in to the web console on the Main Polling Engine.

I went to My Orion Deployment, then the Upgrades and Evaluations tab and ran the updates to all four of my other servers. Not one at a time—in parallel! They completed in under 30 minutes for four servers—including the Configuration Wizard.

2:07 a.m.

My entire infrastructure is now upgraded, and I can mark the change as complete.
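As a quick sanity check on the numbers, the timeline above can be turned into per-phase durations. This is only an illustrative sketch in Python; the phase labels are my own shorthand for the steps described, and the helper names are hypothetical.

```python
# Timestamps from the timeline above. All are a.m., so "12" maps to hour 0.
TIMELINE = [
    ("12:00", "Power down, snapshots, power up"),
    ("12:15", ".NET install, reboot, product upgrade on the main poller"),
    ("12:55", "Configuration Wizard starts"),
    ("12:56", "Configuration Wizard fails, local troubleshooting"),
    ("1:18", "Support case opened and worked"),
    ("1:44", "Wizard re-run, centralized upgrade of the other four servers"),
    ("2:07", "Change complete"),
]


def to_minutes(stamp):
    """Convert an a.m. 'H:MM' stamp to minutes after midnight."""
    hours, minutes = stamp.split(":")
    return (int(hours) % 12) * 60 + int(minutes)


def phase_durations(timeline):
    """Return (label, minutes) for each phase between consecutive timestamps."""
    durations = []
    for (start, label), (end, _next_label) in zip(timeline, timeline[1:]):
        durations.append((label, to_minutes(end) - to_minutes(start)))
    return durations


if __name__ == "__main__":
    phases = phase_durations(TIMELINE)
    for label, minutes in phases:
        print(f"{minutes:>3} min  {label}")
    print(f"{sum(m for _, m in phases):>3} min  total")
```

The phases add up to the 127 minutes quoted in the upgrade summary, with the support case and local troubleshooting accounting for a bit under 50 of them.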

My Previous Upgrade Experiences

Before I go on about this new build, I wanted to talk a quick minute about upgrades of the past. It seems like just a couple of releases ago when the new installer was released. I thought things couldn’t get much better. After the release of this consolidated installer, we were able to run all product upgrades in a single maintenance window from day one. Prior to that, well, YIKES! I can remember doing upgrades in the long-ago times, and those were not only unforgiving, but they took multiple maintenance windows to complete.

My Upgrade Experience

This may have very well been the fastest, easiest, and most trouble-free upgrade (and that’s with a Support call) I have ever experienced. The work SolarWinds has done over the last few years really shows with how easy my upgrade process was. I noticed something else: any available hotfixes were already slipstreamed into my install. It couldn’t get much simpler.

First Impressions of 2020.2

One word: fast. The web interface feels so much faster than my previous version. I don’t have raw data to prove this, but I’ve been using Orion Platform products for years, and this feels like it’s trying to win a race.

I haven’t had time to explore all the new features in all six of my products, but I’m sure I’ll get to them in due course. There are specific features important to me (modern dashboards) and important to the company (Orion Maps).

Work Still to Do

In the time we’ve been using SolarWinds products, which has probably been over 15 years now, we’ve introduced some level of customization. To their credit, SolarWinds has implemented enough features over the years that we’ve been able to do away with a large portion of “custom mods.” We still have a few in-house files/mods we keep, but most of our modifications are just JavaScript additions/tweaks. And, thanks to the wonderful folks sharing their knowledge and experience via THWACK, I’ve been able to put together a few PowerShell scripts to help me manage those mods. Additionally, by monitoring my Orion servers with Server Configuration Monitor, I can fill in the gaps quickly. The product keeps track of all the files we need to touch in the website files (since running the config wizard reverts those changes). I just have to look at the SCM tracking of those files and I know which files need to be changed to re-implement our customizations.
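If you don’t have SCM watching your web files, the same idea can be approximated by hand. The sketch below is an assumption-laden illustration in Python, not how SCM works internally and not one of my PowerShell scripts: it hashes every file under a folder before and after the upgrade and lists what changed. The function names are hypothetical, and the folder you point it at is up to you.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            hashes[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def changed_files(before, after):
    """Return the sorted paths that were added, removed, or modified."""
    names = before.keys() | after.keys()
    return sorted(name for name in names if before.get(name) != after.get(name))
```

Take one snapshot before the upgrade and one after the Configuration Wizard finishes; any customized file that shows up in `changed_files(before, after)` is a candidate for re-applying your mods.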

Summary

In under 2½ hours (including opening and working a support case) I was able to upgrade to the Orion Platform 2020.2 product releases for five servers monitoring over 16,000 network elements. SolarWinds has made it simpler, easier, and quicker to do upgrades. I’m anxious to start playing with some of the new features. If you’ve upgraded to 2020.2, what new feature is your favorite?

22 Comments
Level 16

Hi @wluther 

Not yet, I'm on 2019.4. Thanks for that fine update.

I just found myself "complaining" about 11.5. Time flies.

😂🤣

https://thwack.solarwinds.com/t5/NPM-Discussions/11-5-1-on-the-way-Just-not-brave-enough-to-that-11-...

@wluther you da man! Awesome write up! Including those hyperlinks so others can follow through for their own environment was perfection 😎

I echo my fellow MVP's comments. The addition of Centralised Upgrades to the Orion Platform in 2018.4 has made upgrades SO much simpler, and much quicker to perform.

I tend to run N-1, until the latest release has its first batch of hotfixes, but sometimes those new features are SO good, or bridge a gap in your current monitoring, that you want to upgrade as soon as they hit General Availability. If you do this, my advice is to review the known issues for the modules you have, and the Orion platform release, to ensure you won't run into any known gotchas.

MVP

Yeah, @sja, I got bit hard by that 11.5 upgrade too, like a bunch of others. That was one of the worst for us, as we lost several weeks' worth of data due to downtime, and eventually ended up reinstalling previous versions and reloading a backed-up db. This upgrade, in contrast, was its nuclear opposite, and was one of the smoothest I've ever had the pleasure of doing. They have definitely come a long, long way since 11.5, and things are so much better now... No doubt about it!

Level 9

Thanks for the write up @wluther, that makes me feel more confident moving forward with my upgrade next week.  But I'm still worried I'll have your same experience and need a support case to actually get through it.  I've had the same experience multiple times, where opening a case and having them tweak the database is the only way to get through the configuration wizard.  It would be a major improvement if these inconsistencies / irregularities that need tweaking could be detected before the upgrade begins.

Level 13

I wish our upgrade was that easy, but we are building out a whole new instance from scratch because of the large number of changes we are being required to go through.  These are internal company things mostly, not SolarWinds specific other than a new Windows OS.  We have been building the new instance since December and it isn't live yet.  

MVP

@nerengm I think there is definitely something to having a healthy dose of worry when doing any upgrade. If anything, at least that helps you to verify you are actually thinking about it. I always feel if I'm not worried about an upgrade, then I probably shouldn't be doing the upgrade.

Over the years I have found that having a good plan, especially when it's upgrade time, is usually the most important part of it all. With each Orion ecosystem being different than the next person's, there are always those unseen gremlins hiding about. But, if nothing else, a solid plan can help identify outstanding tasks required, or sometimes even overlooked, as well as skirt around some unpleasant issues before they ever happen. I have also learned to always have support on speed dial, and a browser tab opened to the sw webex link, just in case. When support answers my call, I greet them, immediately followed with "Okay, what is the meeting ID?"

MVP

@jhandberg Building out a whole new instance from scratch is actually one of the best ways to do it. I've been allowed to do that 1 time, which was a few years back, and it was great. I mean, yes, there was definitely, what felt like, way more work to be done. But, once again, going back to planning, I probably spent more time planning it than I did working on it. While I don't remember every exact metric of the project, I think I had done 2-ish months (possibly more) of prep work, planning, and testing. I was able to keep our existing servers up, online, and working during this process. (This is where I made our account rep work for his monies... having him run all over for license extensions every couple of weeks without me having to stop to bug him, which worked great.) I had broken down everything in our previous system into smaller groups, then used the Orion SDK to basically export each subset of nodes/settings from the old system, and import them into the new system (SDK via PowerShell).

It might seem like a bunch of additional work but, in my opinion, a clean install usually runs well enough to justify the work. Best of luck to you with your upgrade/rebuild. Let us know if you run into any issues. And when you're all done, take some time to tell your story and celebrate with the community.

Level 13

@wluther I do think building this whole new instance is the best thing in our situation, and support has been helpful with the temp licensing.  It has just dragged out.  I have to maintain changes in both systems until we can switch over.

Maybe some might find the path for this amusing, cringe, or be happy they can just upgrade.  But here we go, this has been our experience where the worst part of it has been and still is internal silos and politics.

Our SolarWinds Upgrade Experience - Orion Platform 2018.2 to 2020.2  (planning started a little over a year ago, work started about November 2019)

Issues:
Upgrading requires a Windows OS upgrade.
We are aware of certain database issues/inconsistencies. There are enough of these that we decided the best path is to start from scratch rather than transfer a corrupt database.

Example:

We have had support calls where they try to run a script against the database but the tables are not there or are different than expected. This is believed to be due to this database being carried forward since the system was built several years/versions ago, not to mention previous admins simply promoted a lab instance to production, and the Configuration Wizard didn't fully normalize table structure.

We also have several orphaned database items be it maps, interfaces or whatever that we can't clean up or remove with the scripts provided by SolarWinds. The scripts error out with Table not found.

Issues due to infrastructure and team procedures we were not able to be excluded from:
The VM team WILL NOT do an in place OS upgrade. Scrap and build a new VM is the only thing they allow.

Oh, by the way, that domain you are using is going away, have to put it in this new domain x.

Oh, by the way, we have a new Software Defined Data Center so these can't go in the old VM infrastructure.

Oh, by the way, the new SDDC can't use the old IP space either, so here is your new IP subnet.

Issues caused by all of that:

New firewall requests need to be made to allow the new Orion instance, main engine, 11 APEs, 4 WPM Player servers, 4 AWS to get to any of the gear it used to be able to reach. (partially complete)

Oh, by the way, most of the gear is NOT in this new domain, so good luck figuring out the cross domain credentials when there are no trusts built. (big issues here for WMI/WinRM)

Oh, by the way, the network teams need to change all the ACLs on their gear to allow the SNMP monitoring of them. This may be a priority project for you, but not them. (partially complete)

FYI, there are no less than 5 different network teams to deal with, and that is only with our Agency. (have 3 of these teams still not getting their part done)

Yes, sadly, we are a multi-agency service that isn't fully centralized yet. (some other agencies are still pending as well for SNMP/WMI/WinRM access, not to mention they are in different, unconnected AD domains)

The only good thing about this new domain is it is supposed to be a master domain in the AD restructuring across the multiple agencies. But I have been hearing that for over 5 years and it isn't complete, even though we have been pushed there.

MVP

@jhandberg Wow, you weren't kidding there. Sounds like you got the short end of the stick. And it seems you are having to work with an insane amount of moving parts, chefs in the kitchen, and hamsters on the wheel.

In my opinion, and with my limited knowledge and experience, I'd say you're doing a great job, having made any progress with all of those obstacles being thrown at you. While I'm sorry you have to go through all that, it is rather impressive to hear you are not only still standing, but still fighting your way through it. You're like Batman, with all of Arkham against him, fighting to save Gotham. Best of luck to you on your adventure.

May the silos be with you!

Level 9

The thing that struck me about your upgrade was that you called support and got a Tech on the line within 30 minutes. I had an issue during our upgrade, waited on hold for an hour and got bumped to voicemail. I got a call back about 6 hours later.

MVP

@equator Sorry you had such a horrible experience. Unfortunately, I've certainly been in similar situations before too, as have many others. Waiting for 6 hours to get a call back is completely unacceptable with a downed system, at least in my opinion. Was this upgrade/support experience recent? Not that it could help with the trouble at the time, but did you submit feedback on your support experience?

I'm not sure if it's just my imagination, or if it actually works better, but I've gotten into the habit of creating my support ticket online first. Then, once it exists, I call into support and choose the option for an existing case. It might be chance, or just a lack of actual data, but it always feels like that method gets a person on the phone more quickly than choosing the option to open a new case over the phone.

Also, I tend to be a bit more technical (in my favor, of course) when it comes to the priority of the ticket. If I'm doing an upgrade, and there is an issue, then my company considers our system to be down, so I select the highest priority. Now, if it's just a broken widget, or something trivial, I certainly don't misuse the priority/severity selection. I've also learned not to hesitate to request an escalation. However, I suppose, in your case, you'd need them to contact you more quickly so you could then request an escalation.

I've also been on support calls with folks who are (maybe) new, or simply not as experienced as others. On the other hand, I've been fortunate enough to have had some rocket surgeons, possibly working directly from within The Matrix, having my issue(s) resolved within minutes. With all the support tickets I've opened over the years, and all the calls I've been on, I'd say support, while not perfect, has improved, considering how large the SolarWinds library of products has grown.

Level 20

Wow Will, good to hear it went ok, but needing to edit the database to fix something is kinda scary.  I will say generally the upgrade experience is much better than what we dealt with back in 10.x NPM days by a large margin.  We've come a long way.

Bill

Level 9

I upgraded my small dev environment yesterday and it was pretty smooth.  The only issue was having to rebuild the website and re-run the config wizard about 4 times.  This seems to be the case with every upgrade.  Is that just me?

Level 16

We just upgraded to 2020.2.1 and, although there were some issues we ran across, it was one of the easiest in years. What also helped was that Tech Support already had solutions for the issues we ran into, so we were able to keep the upgrade on track and complete it within our change window.

Hi @wluther ,

I have upgraded my environment to 2020.2 and am facing multiple problems. Post upgrade, I found my SAM details were not showing on the License Details page and applications stopped polling. I opened a case with the support team, but before they reached me I decided to reboot my main server once, and the problem was resolved after the reboot.

Now my ServiceNow integration has stopped working and always shows status unknown. The case is still going on with the Support team, but so far no luck, and my auto-ticketing is not working. I even tried to add a ServiceNow dev instance, and it is added with status unknown. Any help appreciated.

Level 11

Hi @wluther,

great write up about your experience with the upgrade process!

I did more than 10 upgrades so far for several customers, from small deployments to bigger ones with 15+ engines. Most of them went quite smooth, some went.... well, ending in a support case. But overall the new upgrade system is quite solid! I'm especially a huge fan of the centralized upgrade on offline installations. Do the offline installation on the main poller, let it finish, switch to the web interface and run the centralized upgrade for every other Orion server. That's really neat and a huge timesaver!

One issue I ran into quite frequently is the configuration wizard failing because of some "Access denied" issues, mostly related to processes not shut down properly or some files that are still in use. Nothing too complicated to fix, but quite annoying if it happens on 8 out of 10 engines and you have to re-run the config wizard several times. So I can highly recommend a clean reboot of the servers before the upgrade, so you can avoid those problems.

All in all, a nice improvement from SolarWinds.

Best Regards

Rene

Hello @rfroembgen ,

I also faced a config failure on my main poller during the upgrade, and I involved SolarWinds support. After a few failed tries, he changed the services' startup type from Automatic to Manual, and then the config completed smoothly.

Then we updated the remote poller from the web, which was quite easy and simple. I like that.

Now I'm facing a SNOW integration issue. I am able to add my instance, but its status shows unknown, so we are unable to create a SNOW action in alerts because it doesn't see the instance as up. Any help appreciated. 🙂

Level 11

Hi @alankar.srivastava,

my SNOW instance is working fine after the upgrade.

Did you already check the logfiles for the integration? You can find them at this location: C:\ProgramData\SolarWinds\Logs\ESI

There is more information included than the web interface will provide you (error codes, descriptions etc.), so it's worth a look.

Best Regards

Rene

Hi @rfroembgen ,

No, Support asked me to share the logs below.

Please collect the log file using the information below

Debug logs are located in the following folder: C:\ProgramData\SolarWinds\Logs\Orion\
Logs from Solarwinds Orion:

  1. Open SolarWinds Log Adjuster and go to the Incident Services Integration / Business Layer section. 
  2. Change the control field to DEBUG, and then click Apply.

Replicate the issue, collect the diagnostics and send them to SolarWinds support.

Thanks,

Alankar

Level 9

Good write up, but sometimes I get jealous reading these because y'all have the ability to run updates from the network. I work on closed networks and the process is nowhere near as easy as this. 😥

Level 14

Awesome write up on this.  I am planning on upgrading our Orion 2018.2 (NPM 12.3) early next year, so this was a great read.  I'm assuming I will need to use the online installer on the Main Poller and APEs, as centralized upgrades were introduced in 2019.2, so it looks like I have one last upgrade using the installers on the APEs.

You had mentioned that as of 2019.4, .NET 4.8 is required.  Can that be installed and work with 2018.2?  I wasn't sure if I could install that ahead of time and run our current version just fine.

Level 8

Hi @bharris1 

It's been a while for me but it seems like the .NET 4.8 was included in the installer package.  Check for a reference on that.  Anyone please correct me on that if I'm off.  Best of luck either way!
-KB

Late edit: that may have been a Windows update package, sorry.

About the Author
If it's not broke, then fix it until it is...