Why Upgrading Is So Important
TL;DR: I talk a lot about how upgrades have gotten better from a customer perspective. But the gist is you should upgrade because it’s free, there are fewer technical barriers, you get bug fixes, performance improvements, new features, and higher scalability. Go to your Customer Portal and get the new versions now.
For those who don’t know me, I was a customer for several years before working for SolarWinds. In that time, I probably upgraded my Orion Platform products between ten and twenty times. New releases would come out, I’d read through the release notes, plan the upgrade, prep for the upgrade, block eight to twelve hours during a change window, and then set to work. I’m a fairly technical guy and loved the meticulous planning and execution of an upgrade. For me, an upgrade (of any system, not just the monitoring solution) was a personal test. You know what I didn’t like? Staring at progress bars.
That was then. Progress bars still exist, and I can tell you with near 100% certainty that they’re not going away. Why? Because they provide feedback during times when things look static. What I can promise is that SolarWinds aims to have you staring at them far less.
Let’s start this discussion with the question I get asked most often: what does an upgrade cost?
Let me get this out of the way: upgrades are free while you’re on active maintenance. Access to upgrades and features is your right as our customer. You deserve the latest and greatest, so let’s talk about why you should plan an upgrade.
Let’s examine your current setup. Yes, your environment is probably working “just fine.” But “fine” is not “well.” Our goal is for everyone to have the best experience possible. You can read all the release notes and product blogs about the features added to the products (and I wholeheartedly encourage you to do so), but there are some other benefits.
For me, the most exciting improvements aren’t necessarily the features (although Azure Network monitoring is pretty cool). For me, it’s always been about improving the system’s overall performance and scalability. Long story short, page load times are down with each version and now a single Orion instance scales to 1 million elements.
Some Background on My Experiences
The organization where I came from had the Orion main polling engine, an Additional Polling Engine (APE) in our second data center, and two Additional Web Servers (AWS) behind a load balancer. In that environment, we ran seven products: Network Performance Monitor (NPM), Server & Application Monitor (SAM), NetFlow Traffic Analyzer (NTA), Network Configuration Manager (NCM), IP Address Manager (IPAM), User Device Tracker (UDT), and VoIP & Network Quality Manager (VNQM).
My organization wasn’t alone in having an installation of this size. We benefited from having a monitoring installation of this size because the larger your monitoring footprint, the more details you have about your organization’s infrastructure and the systems with which your users interact. Basically, you’re better informed and can take data-driven actions.
Using multiple products will give you the best view of your IT landscape, but in “olden times” this meant you had to run an upgrade for each. What did this mean for us? Seven sets of downloaded bits, seven sets of release notes, seven worries about the order of operations, and (quite probably) multiple change windows. Oh, and it all had to be done for all three types of servers (main polling engine, additional polling engine, and additional web server). So, just the download of software bits came to 7 products × 3 server types = 21 individual downloads, which were then distributed to the servers. And that was assuming I could do a direct upgrade. If we had skipped a previous upgrade, things got even more complex.
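To make that arithmetic concrete, here is a minimal Python sketch that enumerates the download matrix. The product and server-type names come from this post; the function name `download_matrix` is just a hypothetical helper for illustration, not part of any SolarWinds tooling.

```python
from itertools import product

# Product and server-type names are taken from the post above.
PRODUCTS = ["NPM", "SAM", "NTA", "NCM", "IPAM", "UDT", "VNQM"]
SERVER_TYPES = [
    "main polling engine",
    "additional polling engine",
    "additional web server",
]


def download_matrix(products, server_types):
    """Return every (product, server type) pair that needed its own download."""
    return list(product(products, server_types))


downloads = download_matrix(PRODUCTS, SERVER_TYPES)
print(len(downloads))  # 21
```

Every pair in that list was a separate set of bits to fetch and distribute, which is why the count grows so quickly as you add products or scalability engines.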
Just for scale here: the “olden times” was 2016. If you did upgrades in those days, you know the pain of a large distributed upgrade. Any customer who had four products would be forced to run four upgrades (one for each product) and then wait for the database schema changes. If you had scalability engines (additional polling engines or additional web servers), this just reinforced how non-ideal this upgrade scenario was. This led to customers putting off upgrades completely. As a former customer, I empathize. At SolarWinds, we had to do better, and we did.
About three years ago, SolarWinds began to really take a hard look at how we handled upgrades to products on the Orion Platform. For simplicity, we boiled an upgrade down to three major steps: compatibility checking, bits installation, and database schema updates. In September 2017, we released Network Performance Monitor 12.2 (which included Orion Platform 2017.3), and with it came our first release of a unified Orion Installer.
Since those first steps, we’ve accelerated improvements to your upgrade experience. A major part was doing away with traditional software versioning numbers. If you follow along on the Upgrade Resource Center, you’ll see we made this change in November 2019. Honestly, it’s easier seeing “2019.4” and knowing your last major upgrade was in the fourth quarter of 2019. So, if you’re running that (as of today) your software version is roughly six months old. Easy, right?
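If you want to turn that version label into a rough age, here is a minimal Python sketch. It assumes, as described above, that the label reads as year and quarter; the function names are hypothetical, and any extra components after the quarter (such as a service release number) are simply ignored. Because it counts from the first month of the quarter, treat the result as an upper bound rather than an exact age.

```python
from datetime import date
from typing import Tuple


def parse_release(version: str) -> Tuple[int, int]:
    """Split a 'YYYY.Q' release label into (year, quarter)."""
    parts = version.split(".")
    year, quarter = int(parts[0]), int(parts[1])
    if not 1 <= quarter <= 4:
        raise ValueError(f"quarter must be between 1 and 4, got {quarter}")
    return year, quarter


def months_since_quarter_start(version: str, today: date) -> int:
    """Whole months between the first month of the release quarter and today."""
    year, quarter = parse_release(version)
    start_month = 3 * (quarter - 1) + 1  # Q1 -> January, Q4 -> October
    return (today.year - year) * 12 + (today.month - start_month)


print(months_since_quarter_start("2019.4", date(2020, 6, 1)))  # 8
```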
With the NPM release in 2017, we started to really invest in a unified installer for the Orion Platform products. In the successive releases, our customers have seen dramatic improvements in the process. Each new revision has brought a new improvement: a pre-flight check before upgrading, slipstreaming the bits to the main server, running the configuration wizard silently, running the installer on additional engines, and running the configuration wizard on the additional engines. Each of these has been added since the first release. Hopefully you agree that this is an impressive list, considering it works for thirteen distinct products.
This made near-zero touch upgrades possible and if you’re interested, you can also investigate near-zero downtime upgrades. Neither of those would be possible without the improvements of the past few years.
We get asked about upgrading versus migrating frequently enough to have devoted a THWACKcamp session to it last year. But first, a tale from my past. When I was a customer, I was forced to both upgrade and migrate within the first year I ran Orion Platform products. We were so behind on our server images that the default was still Windows Server 2003 and SQL Server 2005. It got to the point where upgrades alone weren’t going to fix the problem. We needed to do a migration. Thankfully, my Support people and I figured out a game plan. If you’re struggling with your next steps, I encourage you to work with Support to devise a plan. If you know it needs to be done, but you can’t commit your own cycles to learning all the ins and outs, you can always leverage Smart Start Assisted Upgrade. In fact, I would have jumped at this offer in my previous role (where monitoring was just one of my many duties).
Planning Your Upgrade
Any upgrade takes good planning, and part of planning is getting your steps outlined. Depending on your environment, this can vary greatly. However, your goal should be to boil it down to a list of checkboxes. Let’s get started.
Read the Release Notes
Reading the Release Notes for the products sounds boring, but it’s an essential part of the process. Knowing about any “gotchas” is important. The one that tripped people up most recently was the upgrade to .NET Framework 4.8. This may read like a small thing, but depending on the software you’re running, it may require an additional reboot. This is just one example of why reading the release notes is important to a good upgrade.
And it might be one of your favorite parts of an upgrade, like it is mine. I’m one of those people who really enjoys knowing the ins and outs of the systems I use, and it’s another way I can be as informed as possible.
The .NET Framework is one prerequisite, but there are others, specifically regarding operating system and database versions. With the current release, SolarWinds added back support for Windows Server 2012 and SQL Server 2012. In addition, we’ve introduced official support for a number of SQL Express versions (with specific limitations). This was a conscious inclusion for customers bound by organizational requirements. If you want to move to a new version of Windows and SQL, the Orion Platform is also completely supported up to and including the 2019 versions, including the SQL Server 2017 and SQL Server 2019 releases for Linux.
If you don’t want to manage your own database server anymore (because maybe you don’t need “Accidental DBA” on your resume), then there’s also support for Amazon RDS and Microsoft Azure SQL DB databases. In fact, you could move your entire deployment to AWS or Azure, but that’s a topic for another post.
Capture a Diagnostic Before
When you’re ready to get your upgrade going, capture a diagnostic log before you start. This gives you a record of how your environment is operating before the upgrade. I like to do this on any scalability engine. If you have any problem, this is also a nice record of your system before the installation started.
Back Up Your Database(s)
If you’re the de facto DBA for the Orion Platform (read “Accidental DBA”), then you’ll want to make sure you get your backups in order. In my current lab, I’m running SolarWinds Backup for all the servers, including the SQL servers, so I manually kicked off another backup from the Backup Management web interface.
It’s probably obvious, but you should back up any and all databases you’re using as part of your Orion installation. Because I run all the Orion Platform products in my lab, my backup list includes the Orion database, the log database, and the NetFlow database.
One thing not explicitly stated is that you should also test your database restores. You know the saying: your backup solution is only as good as your last restore. @sqlrockstar has spoken to this many times, and gave pointers in Six Ways to Protect your Database Backups. You should review this, so you can prevent a “resume-generating event.”
Snapshot the Orion Servers
If your Orion instance is virtualized, then snapshot these servers just before the upgrade. This gives you a quick way to revert if things go sideways. Together with a good database backup/restore, you have a great fallback plan. Just be aware that a virtual machine snapshot is not the same as a backup.
After the Upgrade
Once your upgrade is complete, there are a few other things you should do. For me, the first thing is taking another diagnostic of all the Orion servers. This way, I have a before and after record of what’s happening.
The final thing you should do is remove any virtual machine snapshots if you created them. Keeping a snapshot is just wasted disk I/O as the hypervisor needs to deal with the delta disks. Save those IOPS for monitoring instead of unnecessary housekeeping.
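Earlier I said the goal is to boil your plan down to a list of checkboxes. Here is a minimal Python sketch of what that could look like for the steps covered in this section. The class and function names are hypothetical, the "Run the upgrade" step is implied rather than spelled out above, and your own list will depend on your environment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    name: str
    done: bool = False


@dataclass
class UpgradeChecklist:
    steps: List[Step] = field(default_factory=list)

    def next_pending(self) -> Optional[Step]:
        """Return the first step that has not been completed yet."""
        return next((s for s in self.steps if not s.done), None)

    def complete(self, name: str) -> None:
        """Mark a step done, refusing to skip ahead of earlier pending steps."""
        pending = self.next_pending()
        expected = pending.name if pending else "nothing"
        if pending is None or pending.name != name:
            raise ValueError(f"expected {expected!r}, got {name!r}")
        pending.done = True


def default_checklist() -> UpgradeChecklist:
    # Step names paraphrase the headings in this post.
    names = [
        "Read the release notes",
        "Capture a diagnostic before",
        "Back up the database(s) and test a restore",
        "Snapshot the Orion servers (if virtualized)",
        "Run the upgrade",
        "Capture a diagnostic after",
        "Remove the virtual machine snapshots",
    ]
    return UpgradeChecklist([Step(n) for n in names])


checklist = default_checklist()
checklist.complete("Read the release notes")
print(checklist.next_pending().name)  # Capture a diagnostic before
```

Enforcing the order is a design choice: it keeps you from running the upgrade before the backups and snapshots exist, and it keeps the snapshot cleanup from being forgotten at the end.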
My Favorite Feature
Most people probably don’t think about the MIB database too often, right? When I was a customer, we were always behind on our MIB database version. It was so much of a concern that back in 2014 (when I was still a customer) I wrote a SAM template to check the age of the MIB database.
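This is not that SAM template, but the idea behind an age check like it can be sketched in a few lines of Python: compare a file’s last-modified time to the current time and flag it past a threshold. The function names, the file path, and the threshold below are all assumptions for illustration.

```python
import os
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def age_in_days(modified_at: float, now: float) -> float:
    """Convert two epoch timestamps into an age measured in days."""
    return (now - modified_at) / SECONDS_PER_DAY


def is_stale(path: str, max_age_days: float, now: Optional[float] = None) -> bool:
    """Return True when the file at `path` was last modified too long ago."""
    now = time.time() if now is None else now
    return age_in_days(os.path.getmtime(path), now) > max_age_days
```

A call such as `is_stale(mib_file_path, max_age_days=90)` would then report whether the file is more than 90 days old, where `mib_file_path` is wherever the MIB database file lives in your installation and 90 days is an arbitrary example threshold.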
Now, the platform itself will let you know if your MIB database is old. Just go to All Settings, scroll down to the Details area, and click on MIBs Management.
Yes, this isn’t as cool as scaling to one million elements (insert Doctor Evil finger to lip GIF here), monitoring Azure Network Gateways and site-to-site connections, improvements to Orion Maps, or the modern dashboards, but the MIB database feature addresses an extra maintenance step for all customers that’s no longer necessary.
When Good Upgrades Go Bad(ly)
I won’t lie: I’ve had an upgrade or two go sideways. I’ve tried in-place upgrades of things that have gone terribly wrong. I’m not talking specifically about SolarWinds software; I’m talking about any software. Even upgrading my home desktop occasionally went poorly.
If you’ve ever had an upgrade go bad, you know the best thing you can do is have a fallback plan. We recently spoke about ITIL principles with regard to change management in a SolarWinds Lab episode, and it reminded me of why change management is a good thing. Whether your organization has a full change control board (CCB) or just a handful of stakeholders who want to be kept informed of the various goings on, documenting changes is key. That documentation can be pretty much anything as long as you have a record with a completely documented fallback plan.
Database backups and virtual machine snapshots are worth the time, every time. Just remember to remove the virtual machine snapshot when you are done. Leaving snapshots in play is just wasted I/O dealing with delta disks.
It’s anecdotal, but let me give you my own personal numbers on how long an upgrade takes. Remember, your mileage will most certainly vary. In the olden days (2017 and earlier), I could automate an Orion Platform install for all 13 products on one machine and get it to run cleanly in about four hours. This assumed I already had all the installers on the machine in question. In those days, I thought I was hot stuff.
This last round, I upgraded an existing system with all the Orion Platform products, under load, in an over-provisioned environment in 40 minutes. And 20 of those minutes were the download. Think what your future upgrades will be like now that there’s an option to pre-stage the software download.
Over the past several releases, SolarWinds has made it even easier to perform upgrades to products on the Orion Platform. So, when is your upgrade scheduled?