Level 14

NPM 12.2 Upgrade Experience Feedback?


Anyone have any upgrade experiences for the general release?

We are looking to upgrade our entire SW suite in addition to NPM and would like to hear any feedback.

Also, is the RC code the same as the GR code?

Thanks in advance

1 Solution
Product Manager

This thread has been super helpful and has given us many conversation points and a lot of feedback. However, the thread has now become unwieldy, and the feedback is hard to parse. I'm locking this thread. Those of you participating in our RCs are invited to give additional feedback on my post "Orion Platform: Preparing for the Upgrade to 2018.2" in the NPM release candidate forum. Once we are out of RC, I'll start a new thread for upgrade feedback on NPM 12.3.

Thank you. For those of you with whom I have ongoing conversations, we'll find other avenues to move those topics forward.


224 Replies

I have seen MANY improvements with the latest code versions, and very few caveats or items left to tweak and tune. I really like the installer that was just updated on 1/16/2018. It fixed every complaint I had about the installation and the APE (Additional Polling Engine) install.

Hopefully that'll make some of my upgrades easier, then. For everyone, it's really all about getting off Windows 2008 R2 and older SQL Server VMs now that the product doesn't support them anymore.


Upgrades are DEFINITELY easier now.  Most of the thinking is done in the background by SW, and they've done a NICE job of it with this week's release/update.

I'm not sure who has access to it yet; my experience seems to have been a beta test, and serena supervised it with me. She indicated that it would soon be published and made available to all, but didn't say when.

It's a keeper!

Level 9

To be perfectly honest, it is going in the wrong direction. I shut off the new "Manage Nodes" interface and reverted to the old one. I had OpenGL issues when I tried to open the new Manage Nodes section; those stopped after a patch, but the interface still doesn't let me do any admin work faster. It is all more clicking and more GUI navigation, without the easy access to custom properties for sorting. A total downgrade in the GUI.

The new "Edit Views" is awful; it hangs and times out 7/8ths of the time.


I can second your observation in regards to the new Manage Entities screen.  Once it was corrected so it actually worked, I used it for a while, just to see if there were efficiencies gained by it. 

And like you, I shut it off.  It is significantly less efficient than the previous solution.  It does not display all the nodes on one screen, and it wastes a lot of screen real estate.  It'll stay off as long as possible, and I hope the original solution is not removed in future releases in favor of the new Manage Entities screen.  Unless, of course, the new Manage Entities screen can show all entities in one screen, and not waste so much space.  Which would make it very much like the original solution.

Just for clarity, are you saying that the actual product is going in the wrong direction, rather than the upgrade experience itself? Both are interesting points of feedback.

For the new manage entities, there's still a fair amount of work to be done on that page which is why it's not replacing the default views. Until it has feature parity we of course would not replace the default.

As for Edit Views, it sounds like you're running into something unexpected. Have you created a support ticket for this?

The upgrade went well. The new installer is brilliant; it was nice to have only one file, instead of four, on each server.

I feel the new Manage Entities page wastes a ton of screen real estate on large icons, thick rows, and lots of white space, plus that annoying square loading box. The new filter section and display make it hard to sort the data. Before, I could sort the list by polling engine, polling method, or any of the other fields that make sorting easy. Yes, you can add those via filter results, but you could already do that, and then sort again with the columns.

I haven't spoken with support about this issue, only with a few others; I don't have a lot of time to sit on the phone. I installed a second instance of SolarWinds last week for testing, and it has the same problems: Edit Views not populating consistently, and the Manage Entities page being a downgrade in functionality.
I would like to see some mobile-friendly management pages. I would be a lot more supportive of the change if it rendered nicely on mobile. I am often in meetings or on the other side of campus, and being able to pop in and adjust something from my phone would be great. Currently there is a lot of zooming involved.

I would like to know what functionality it gained, if any. Keep up the great work on the installer.

Product Manager

So how did your own upgrade go?

Level 13

Our Upgrade Experience. We have a semi-large environment with 9 additional polling engines and one additional web server. We went from NPM 12.1, UDT 3.2.4, IPAM 4.3.2, NTA 4.2.2, SRM 6.4.0, NCM 7.6 to the latest versions using the new installer that included the latest hotfixes.

I like to wait until the first batch of hotfixes are released before upgrading, which is the reason for our delay.

We ran this upgrade in conjunction with a replacement of the primary polling engine, swapping in a new 2012 R2 server. All the other APEs, the FastBit DB server, and the AWS (additional web server) had been upgraded to 2012 R2 over the last few months. The migration method we followed kept it simple by reusing the same hostname and IP.

Total outage time was 8.5 hours.

First 3 hours

We upgraded the NTA FastBit server to the latest version. Next, we removed the licenses for our modules from the primary server and shut down the services on all the APEs and the primary polling engine. We exported the primary server's SSL certificate from IIS Manager and copied the legacy reports folder, as outlined in the migration guide.

The rest of the first few hours went fairly well: the primary server's hostname and IP were swapped to a new 2012 R2 node, followed by the initial install. We loved the "check what you want to install" method, which automatically included the latest hotfixes. There was one warning during the install: NCM wanted 3.0 GHz processors instead of the 2.7 GHz ones presented to the node. We present 20 cores to the primary engine and have never had a CPU issue, so we kept moving forward. One thing that did pop up was that the NTA install wanted to use a local drive for the initial install instead of the FastBit server; we did not have the option to select a separate FastBit database server.

We fixed this first issue by downloading and running the separate NTA 4.2.3 install module on the primary engine and selecting the separate server/database option.

We ran the config wizard three times on the primary server, which extended the install time: first with the initial install, second for the NTA installation, and third because the NTA installation caused a website runtime error that kept us from generating the website. This was the first of our two calls to support. I opened a case online, then called in and was able to get someone right away. It turned out the fix was to run the config wizard a third time.

Once the primary engine's website was up, we started the upgrade of the 9 additional polling engines. Eight of these installs went great; the last one turned out to be a problem. We used the new Scalability Installer retrieved from our local Orion server's website: Settings > Polling Engines > "Download Installer Now".

Eight of our polling engines are local to the primary engine, and the last one is remote, in a separate datacenter. The local APE upgrades all executed without incident; we ran them all at the same time and they all completed successfully.

The remote polling engine kept timing out, so we opened another support case, again by creating the ticket online and then calling support directly. Hold time was about 25 minutes before we could start working on the issue. We spent the next 4 hours working with support, trying to get the files copied over by circumventing the installer's copy process. We had to use a new technique, because the one outlined on thwack and in the current KB didn't resolve the issue: Failed to download or run the Scalability Engine installer from the main server - SolarWinds Worldwi...

Our suspicion was that the KB only dealt with the largest MSI, the Core one. The eventual solution was to follow the KB, except that whenever the timeout occurred we copied the subinstallers folder out of %temp%\SWOrionSetup. Copying the files before exiting the installer gave us a head start on the next attempt. We then copied the subinstallers folder back to %temp%\SWOrionSetup, updated the timestamp folder name, and ran the installer again, repeating as necessary until all the installation files were copied.
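For anyone repeating this workaround, the copy-out / copy-back loop can be scripted. This is a minimal Python sketch, not a SolarWinds tool: it assumes each installer run stages its files in a timestamp-named subfolder of %temp%\SWOrionSetup that contains a `subinstallers` folder (layout inferred from the post, not documented), and the helper names are hypothetical.

```python
import shutil
from pathlib import Path


def newest_stage_dir(setup_root: Path) -> Path:
    """Return the newest timestamp folder under setup_root.

    Assumption: each installer run stages its files in its own
    timestamp-named subfolder of %TEMP%\\SWOrionSetup.
    """
    candidates = [p for p in setup_root.iterdir() if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no staging folders under {setup_root}")
    # Newest by modification time; ties are broken by folder name.
    return max(candidates, key=lambda p: (p.stat().st_mtime, p.name))


def backup_subinstallers(setup_root: Path, backup_dir: Path) -> Path:
    """Copy subinstallers out of the current staging folder (run after a timeout)."""
    src = newest_stage_dir(setup_root) / "subinstallers"
    # dirs_exist_ok=True (Python 3.8+) merges into an existing backup,
    # so files fetched on earlier attempts are kept.
    shutil.copytree(src, backup_dir, dirs_exist_ok=True)
    return backup_dir


def restore_subinstallers(backup_dir: Path, setup_root: Path) -> Path:
    """Copy the saved files into the newest staging folder before retrying."""
    dst = newest_stage_dir(setup_root) / "subinstallers"
    shutil.copytree(backup_dir, dst, dirs_exist_ok=True)
    return dst
```

The restore step copies into whichever staging folder is newest, which stands in for the manual "update the timestamp folder name" step; whether the installer then picks up the pre-copied files is what the post reports, not something this sketch verifies.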

Working with Brian from support was a great experience. Kudos to him!

The last issue that we had was a minor one with the additional web server installation. The initial installation hung on the NCM Integration Module uninstall. A restart of the installer solved that issue.

All in all, we ran a half hour over my planned 8-hour outage window, and I am pleased with the responsiveness of the website while exploring the new features. I submitted an enhancement request to increase the timeout or allow an offline upgrade for APEs.

A few further issues that I'd like to share since our upgrade:

To recap, our upgrade included replacing the primary server with a new 2012 R2 node and a clean install. It turns out the installation included some hotfixes, but not all. SAM 6.4 HF 2 was not installed, and consequently, to our surprise, we were unable to gather Windows mount volume metrics after the install. The hotfix had been installed on the APEs, since they were upgraded in place rather than replaced with a new node. Seeing the HF installed on the APEs and not on the primary added to the confusion.

SAM 6.4 Hotfix 2 resolved the Windows mount issue. A few days after installing it, we realized that RabbitMQ was not responding and the backup MSMQ was overloading and filling up the HDD. This leak also caused SAM monitors to stop intermittently as ephemeral ports were exhausted.

Ephemeral Port Exhaustion - SolarWinds Worldwide, LLC. Help and Support
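One way to watch for this kind of exhaustion is to count TCP connections whose local port falls in the dynamic range, using `netstat -ano` output. A minimal Python sketch, assuming the Windows default dynamic range of 49152-65535 (verify yours with `netsh int ipv4 show dynamicport tcp`) and the usual five-column TCP row format; the function name is hypothetical.

```python
from collections import Counter

# Assumed Windows default dynamic (ephemeral) TCP port range.
DYNAMIC_LOW, DYNAMIC_HIGH = 49152, 65535


def ephemeral_usage(netstat_text: str) -> tuple[int, Counter]:
    """Count TCP rows whose local port is in the dynamic range.

    Returns (total, per_pid), where per_pid maps the owning PID to its count.
    Expects rows shaped like: TCP  10.0.0.5:50123  10.0.0.9:5671  TIME_WAIT  4321
    """
    total = 0
    per_pid: Counter = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        # Skip headers, blank lines, and UDP rows (which have no state column).
        if len(parts) != 5 or parts[0] != "TCP":
            continue
        # rsplit handles IPv6 addresses such as [::1]:49160.
        local_port = int(parts[1].rsplit(":", 1)[1])
        if DYNAMIC_LOW <= local_port <= DYNAMIC_HIGH:
            total += 1
            per_pid[parts[4]] += 1
    return total, per_pid
```

Feed it the text from `subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout` and compare the total against the 16,384 ports in that default range; a single PID holding most of them points at the leaking service.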

A complete uninstall and reinstall of RabbitMQ, along with running the configuration wizard on the primary engine with all the APEs shut down, resolved our issues. It was important for all the APEs to be shut down.
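Because the fix depended on every APE being down while the wizard ran on the primary, the stop/start ordering can be scripted instead of done by hand. A minimal Python sketch, assuming remote service control through Windows `sc.exe` in its `sc \\server stop <service>` form; the host names and the service name below are hypothetical placeholders, not the actual Orion service names.

```python
import subprocess


def sc_commands(hosts: list[str], services: list[str], action: str) -> list[list[str]]:
    """Build one sc.exe command (remote-host form) per host/service pair."""
    if action not in ("stop", "start"):
        raise ValueError("action must be 'stop' or 'start'")
    return [["sc", "\\\\" + host, action, svc] for host in hosts for svc in services]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical inventory: replace with your own APE hosts and service names.
APES = ["ape01", "ape02"]
SERVICES = ["ExampleOrionService"]

# 1. Stop every APE before touching the primary engine.
stop_plan = sc_commands(APES, SERVICES, "stop")
# 2. (Manual) Reinstall RabbitMQ and run the configuration wizard on the primary.
# 3. Start the APEs again only after the wizard has finished.
start_plan = sc_commands(APES, SERVICES, "start")
```

Note that `sc stop` returns once the stop request is sent, not when the service has actually stopped, so confirm with `sc \\host query <service>` before moving on to the primary.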

We have been running stable now for a week.

One note I'd like to pass along is I would love to see better release notes that specifically call out which hotfixes are included in the Core roll-up Hotfix bundle.

How unfortunate that parts or all of the APEs must be shut down to correctly perform an update or a fix. I'm fighting that same challenge right now.


Thank you for this in-depth feedback. We are working on improving the timeout architecture for remote APEs, so I've moved that feature request to what we're working on. I hope to hear your feedback the next time you upgrade, to see how well we've improved that scenario for you.

Our upgrade from previous versions went extremely well. The web-based, online, all-in-one experience started with a check of the environment. We have an additional poller and a VM running NTA; the installer detected them and told me about the FastBit upgrade, which I needed to do afterward along with the additional poller.

It did the complete upgrade and then ran through the configuration wizard. We have: [screenshot not preserved]

All in all, it took about 30-40 minutes (I wasn't really timing it).

Pulling the installer down from the main poller to the additional poller went fine, and then the config wizard ran.

Upgrading the NTA server also went fine.

We had warnings about our server:

CPU requirements

DISCOVERED: 15/11/2017 21:33:36

DESCRIPTION: Your environment does not meet the minimum CPU requirements for the following products: Web Performance Monitor 2.2.1. Performance may be slower than expected.

RESOLUTION: For better performance ensure that you have more than 4 CPU cores that are faster than 3000MHz.
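Read literally, that resolution sets two thresholds at once, core count and clock speed, so plenty of cores does not make up for a slower clock. A small Python check of that literal reading (the function name is hypothetical, and the installer's real test may differ) shows why a box like the 20-core, 2.7 GHz primary engine in the earlier reply still draws a CPU warning.

```python
def meets_cpu_recommendation(cores: int, clock_mhz: float,
                             min_cores: int = 4, min_mhz: float = 3000) -> bool:
    """Literal reading of the warning: more than min_cores cores AND faster than min_mhz."""
    return cores > min_cores and clock_mhz > min_mhz
```

With these defaults, 20 cores at 2700 MHz fails on clock speed alone, while 8 cores at 3200 MHz passes.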

We installed everything anyway. After the upgrade we had issues with a service repeatedly stopping, but tech support helped by pointing us to the article below, and applying it helped a lot!
https://support.solarwinds.com/Success_Center/Server_Application_Monitor_(SAM)/Tweaking_performance_...


Wow, this is interesting extra information. I'm now wondering how these settings might help some of my other busy servers that do a lot of network activity!


After going through the same upgrade, I found several items that improve performance by adjusting them:

  • Ensure failing NetPath monitors are removed.
  • Reduce logging to NPM and offload it to a SIEM. I can easily send seven or eight million syslog messages per second (yes, PER SECOND!) to NPM, and NPM's onboard SLX syslog environment only scales up to two million per second. Overloading NPM with syslog messages results in some pretty quirky behavior; save yourself the trouble and send big syslog outputs elsewhere. Our security admins want our 80+ ASAs syslogging in debug mode, and our WLCs do the same. Both kinds of appliances generate far more syslog data than NPM's syslog can handle.
  • Ensure hotfixes and patches are applied. I wish SW had a one-touch button that automatically discovered my products and their patches and hotfixes, showed me what to install, and even offered a way to automate the process with a single click if I wanted. Applying the right updates, patches, and fixes in the correct order, rebooting, and distributing the updates to all APEs in the right order, too: that would be heaven.
  • Over the last fourteen years I've retired a couple of main NPM instances and converted them to APEs. But the upgrades and installs over that time didn't reveal that NTA still thought those two former main instances should be primary NTA servers. Only yesterday did I discover that and learn the process to remove them. That improved performance, too.
  • Cleaning up nodes that send syslog is also very useful. We'd had an environment built on resilience and duplication, so switches, routers, and firewalls were sending syslog to two different APEs AND to two different SIEMs. Reducing that duplication at the nodes has improved NPM's performance.
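The "send big syslog outputs elsewhere" advice can be applied by severity. Every syslog message starts with a `<PRI>` value equal to facility × 8 + severity (RFC 5424; RFC 3164 uses the same encoding), and severity 7 is Debug. The Python sketch below decodes PRI and picks destinations under a hypothetical policy: everything goes to the SIEM, and Debug traffic such as the ASA and WLC output mentioned above is kept away from NPM. The destination names are placeholders.

```python
import re

PRI_RE = re.compile(r"<(\d{1,3})>")
DEBUG = 7  # syslog severity 7 = Debug


def decode_pri(message: str) -> tuple[int, int]:
    """Return (facility, severity) from the leading <PRI> field."""
    match = PRI_RE.match(message)
    if match is None:
        raise ValueError("message has no <PRI> header")
    pri = int(match.group(1))
    if pri > 191:  # highest valid PRI is facility 23 * 8 + severity 7
        raise ValueError("PRI out of range")
    return pri // 8, pri % 8  # PRI = facility * 8 + severity


def destinations(message: str, debug_to_npm: bool = False) -> list[str]:
    """Hypothetical policy: the SIEM gets everything; NPM skips Debug by default."""
    _, severity = decode_pri(message)
    targets = ["siem"]
    if severity < DEBUG or debug_to_npm:
        targets.append("npm")
    return targets
```

In practice the cleaner fix is on the devices themselves (logging level and destinations per host), as the last bullet suggests; the sketch only shows how the severity is carried in each message.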

The list of tweaks goes on and on, and every little modification shows a corresponding improvement in NPM's performance. But I'd be lying if I said I wouldn't want a few SolarWinds second-level engineers to come and go through my environment with a fine-toothed comb and make recommendations, or actually do the tweaks and clean-ups for me.

I feel the same way about the upgrading... it's getting better but really isn't there yet.


Agreed. Particularly when patching, upgrading, creating, or changing APEs.

Each time I patch / install / upgrade, the Config Wizard generates two windows for upgrades.  That seems like a bug to me, but I run them each sequentially.  The polling engine updates have the same issue.


The two window thing doesn't sound good.


As Serena noted, that problem is corrected.  She demonstrated that with me two days ago, and I must say that the ENTIRE upgrade/patching process is faster, has fewer questions raised, is more complete . . . in fact, it is better in every way!

Kudos go to SW and serena for their caring, efficiency, and improvements!

We've corrected that in releases since the time that Richard posted that comment.

Level 12

I went from Orion Platform 2017.1.1 SP1, VIM 7.1.0, VNQM 4.4.0, DPAIM 11.0.0, NPM 12.1, QoE 2.3, NTA 4.2.2, NCM 7.6, NetPath 1.1.0, CloudMonitoring 1.0.0, SAM 6.4.0

to .... wait, I can no longer scrape that from the bottom of the web window. Must be a graphic element now. Reverting to Soviet mode copy and paste (eyeballs, fingers, and keystrokes):

to Orion Platform 2017.3, VIM 8.0.0, VNQM 4.4.1, DPAIM 11.0.0, NPM 12.2, QoE 2.4, NTA 4.2.3, NCM 7.7, NetPath 1.1.2, CloudMonitoring 1.0.0, SAM 6.4.0
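Since the versions can no longer be copied from the page footer, comparing the before and after lists by hand is tedious. A small Python sketch, assuming each list is comma-separated "Name Version" entries where the version starts at the first token beginning with a digit (true for the two lists above), reports what changed; the function names are hypothetical.

```python
def parse_versions(text: str) -> dict[str, str]:
    """Map product name -> version from 'Name Version, Name Version, ...'."""
    result = {}
    for entry in text.split(","):
        tokens = entry.split()
        if not tokens:
            continue
        # The version starts at the first token beginning with a digit.
        idx = next((i for i, tok in enumerate(tokens) if tok[0].isdigit()), len(tokens))
        result[" ".join(tokens[:idx])] = " ".join(tokens[idx:])
    return result


def changed(before: str, after: str) -> dict[str, tuple[str, str]]:
    """Products whose version differs (new products show '-' as the old version)."""
    old, new = parse_versions(before), parse_versions(after)
    return {name: (old.get(name, "-"), new[name])
            for name in new if old.get(name) != new[name]}


BEFORE = ("Orion Platform 2017.1.1 SP1, VIM 7.1.0, VNQM 4.4.0, DPAIM 11.0.0, NPM 12.1, "
          "QoE 2.3, NTA 4.2.2, NCM 7.6, NetPath 1.1.0, CloudMonitoring 1.0.0, SAM 6.4.0")
AFTER = ("Orion Platform 2017.3, VIM 8.0.0, VNQM 4.4.1, DPAIM 11.0.0, NPM 12.2, "
         "QoE 2.4, NTA 4.2.3, NCM 7.7, NetPath 1.1.2, CloudMonitoring 1.0.0, SAM 6.4.0")

for name, (old, new) in changed(BEFORE, AFTER).items():
    print(f"{name}: {old} -> {new}")
```

For these two lists, everything moved except DPAIM, CloudMonitoring, and SAM.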

The new downloads are somewhat confusing. I normally go through the Upgrade Advisor to determine the exact products I need to download, and then download the full packages, which I guess are now called the offline packages. I saw the note that said:

IMPORTANT! The latest releases - NPM 12.2, NCM 7.7, SRM 6.5, UDT 3.3, IPAM 4.5.2, VNQM 4.4.1 and NTA 4.2.3– utilize a new SolarWinds Orion installer that handles product compatibility and upgrades for you.

I tried the Online package, which is marked as recommended. I cannot recommend it after my experience. Originally, I was just going to upgrade IPAM to 4.5.2. The Online package made it clear that it was going to upgrade all upgradable products, and that I would have to wait for each download. So I canceled and tried the Offline package. When I went to apply hotfixes, I realized that it wanted Orion Platform 2017.3 Hot Fix 1, which I think implies NPM 12.2.

So I checked the Upgrade Advisor, and it does not list IPAM 4.5.2, only 4.5.1, because 4.5.2 is the new style.

So I proceeded to download the offline package for NPM 12.2, ran into various severe errors during the install, and concluded that I needed to uninstall and then reinstall various existing, old modules.

But the download page no longer lists the old modules. There's a link for Archived HotFixes, but even it is not complete.

The HotFixes listed for current versions are sometimes odd. For IPAM 4.5.2, it lists IP Address Manager v4.5.1 Hot Fix 1. Do I really need that?

I finally reinstalled all previous versions (because I'm a packrat, and maintain on-disk copies for just this eventuality), and decided to do the online install. I got various Config Wizard errors for almost every product. I pressed on, and finally got a run of the Config-Wizard that came up clean. I don't feel at all comfortable with the result.

Problems I would like to see you address:

  • Provide downloads of older versions of products and their hotfixes
  • Provide Release Notes downloads for all products on the download page instead of just at the documentation page
  • Make sure Release Notes list product prerequisites. IPAM does not - it says "IPAM 4.5.2 is an Orion Platform product, and consumes Orion Platform 2017.3."  It did not say that Orion Platform 2017.3 requires NPM 12.2 - I sorta surmised that. I cannot download Orion Platform 2017.3 separately. Odd use of the verb, "consumes", BTW.
  • Make it clear which versions on Orion/Admin/Details/OrionCoreDetails.aspx (the Orion Platform Details page) go with which product and which version. Mark shared components as such. List all of this in the Release Notes.
  • Make sure the Online packages give an estimate of the amount of data to download and can recover from HTTPS failures without a complete abort (we have an HTTPS cache that messes up sometimes).
  • Make sure the Offline packages check prerequisites properly, and issue helpful messages.
  • Make it clear how to do re-installs when using Online install method.
  • Perhaps consider making Offline install packages for the components, like Orion Platform 2017.3.
  • Explain your install logic better so I can make decisions better. When do shared components get re-installed? What are the ramifications of re-installing each component - e.g. which ones require the Config Wizard to run? Which products can I install back to back without running the Config-Wizard after each?
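The prerequisite check requested above is simple to express once the dependencies are written down. The Python sketch below is hypothetical: the table holds only the one dependency the IPAM release notes state (IPAM 4.5.2 consumes Orion Platform 2017.3), and the table and function names are invented for illustration, not taken from any SolarWinds installer.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """'2017.3' -> (2017, 3). Assumes purely numeric dotted versions (no 'SP1' suffix)."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical table: (product, version) -> {required component: minimum version}.
REQUIREMENTS = {
    ("IPAM", "4.5.2"): {"Orion Platform": "2017.3"},
}


def missing_prerequisites(product: str, version: str,
                          installed: dict[str, str]) -> list[str]:
    """Return a readable message for each unmet prerequisite."""
    problems = []
    for component, minimum in REQUIREMENTS.get((product, version), {}).items():
        have = installed.get(component)
        if have is None:
            problems.append(f"{product} {version} needs {component} {minimum}; not installed")
        elif parse_version(have) < parse_version(minimum):
            problems.append(f"{product} {version} needs {component} {minimum}; found {have}")
    return problems
```

Comparing tuples of integers rather than strings keeps, for example, 4.10 correctly above 4.9; a message like "needs Orion Platform 2017.3; found 2017.1.1" is the kind of helpful output the list asks for.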

Thanks!