Thank you for this in-depth feedback. We are working on improving the timeout architecture for the remote APEs, so I've moved that feature request to "what we're working on." I hope to hear your feedback the next time you upgrade, so we can see how well we've improved that scenario for you.
I think I owe my problem-free upgrade to Serena's advice about uninstalling IPAM first. I followed this advice on my primary Orion server and all additional pollers. I did not see it in any documentation, and because of my installed versions, the Product Update Advisor tool wasn't very helpful this time around. Thank you, Serena!
I upgraded from Orion Platform 2016.2.100, WPM 2.2.1, IPAM 4.3.2, NPM 12.0.1, DPA 10.2.0, QoE 2.2.0, VIM 7.0.0, SAM 6.3.0, and NetPath 1.0.1 without any problems at all. I have two additional web servers and pollers, plus a VMAN appliance, and I think I was completely done in less than four hours. Great experience with the new installer, and it definitely pays to read Thwack!
Our upgrade from previous versions went extremely well. The web-based, online, all-in-one experience started with a check of the environment. We have an additional poller and a VM with NTA, which it detected, and it told me about the FastBit upgrade, which I needed to do afterwards along with the additional poller.
It did the complete upgrade and then ran through the Configuration Wizard. We have
So all in all it took about 30-40 minutes (roughly, as I wasn't really timing it).
Pulling the main poller install down to the additional poller went fine, and then the Configuration Wizard ran.
Upgrading the NTA server also went fine.
We had warnings about our server:
DISCOVERED: 15/11/2017 21:33:36
DESCRIPTION: Your environment does not meet the minimum CPU requirements for the following products: Web Performance Monitor 2.2.1. Performance may be slower than expected.
RESOLUTION: For better performance ensure that you have more than 4 CPU cores that are faster than 3000MHz.
but we installed everything anyway. After the upgrade we had issues with a service that kept stopping, but tech support helped with this, and doing this helped a lot!
Wow, this is interesting extra information. Now I'm wondering how these settings might help some of my other busy servers that handle a lot of network activity!
After going through the same upgrade, I found several adjustments that improve performance:
- Ensure failing NetPath monitors are removed.
- Reduce the syslog traffic sent to NPM and offload it to a SIEM. I can easily send seven or eight million syslog messages per second (yes, PER SECOND!) to NPM, and NPM's onboard SLX syslog environment only scales up to two million per second. Overloading NPM with syslog messages results in some pretty quirky behavior; save yourself the trouble and send big syslog outputs elsewhere. Our Security Admins want our 80+ ASAs syslogging in Debug Mode, and our WLCs do the same. Both kinds of appliances generate far more syslog traffic than NPM's syslog can handle.
- Ensure hotfixes and patches are applied. I wish SW had a one-touch button that automatically discovered my products and their patches and hotfixes, showed me what to install, and even offered to automate the process with a single click if I wanted. Applying the right updates, patches, and fixes in the correct order, then rebooting and distributing the updates to all APEs in the right order too: that would be heaven.
- Over the last fourteen years I'd retired a couple of main NPM instances and converted them to APEs. But the upgrades and installs over that time never revealed that NTA still thought those two former main instances should be primary NTA servers. I only discovered that yesterday, and learned the process for removing them. That improved performance, too.
- Cleaning up nodes that send syslog is also very useful. We'd had an environment built around resilience and duplication, so switches, routers, and firewalls were sending syslog to two different APEs AND to two different SIEMs. Reducing that duplication at the nodes has improved NPM's performance.
The list of tweaks goes on and on, and every little modification shows a corresponding improvement in NPM's performance. But I'd be lying if I said I wouldn't want a few SolarWinds second-level engineers to come and go through my environment with a fine-toothed comb and make recommendations, or actually do the tweaks and clean-ups for me.
I feel the same way about upgrading: it's getting better, but it really isn't there yet.
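The syslog figures in the list above lend themselves to a quick back-of-the-envelope capacity check. This is a minimal sketch, assuming only the numbers quoted in this post (seven to eight million messages per second offered, roughly two million per second handled by NPM's syslog); the function names and the 1.5 million msg/s duplication example are hypothetical illustrations, not part of any SolarWinds tool.

```python
# Sketch: back-of-the-envelope syslog capacity check for NPM.
# NPM_CAPACITY is the ~2 million msg/s figure quoted in this thread;
# every other number below is a hypothetical example.

NPM_CAPACITY = 2_000_000  # messages per second


def load_at_npm(unique_rate: float, npm_destinations: int) -> float:
    """Messages/s reaching NPM when every device message is sent to
    `npm_destinations` pollers (APEs) instead of just one."""
    return unique_rate * npm_destinations


def fraction_to_offload(offered: float, capacity: float = NPM_CAPACITY) -> float:
    """Smallest share of the offered volume that must be diverted
    (for example to a SIEM) so the remainder fits within `capacity`."""
    if offered <= capacity:
        return 0.0
    return (offered - capacity) / offered


if __name__ == "__main__":
    # The 7-8 million msg/s range mentioned in the post.
    for offered in (7_000_000, 8_000_000):
        share = fraction_to_offload(offered)
        print(f"{offered:,} msg/s offered -> divert at least {share:.1%}")

    # Hypothetical: 1.5 million unique msg/s, reported to two APEs vs one.
    for destinations in (2, 1):
        load = load_at_npm(1_500_000, destinations)
        status = "over" if load > NPM_CAPACITY else "within"
        print(f"{destinations} APE(s): {load:,.0f} msg/s ({status} capacity)")
```

With the quoted figures, roughly 71% to 75% of the offered volume would have to go elsewhere, and the hypothetical second loop shows why removing duplicate destinations at the nodes can matter as much as offloading.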
Here are a few further issues since our upgrade that I'd like to share.
To recap, our upgrade included replacing the primary server with a new Windows Server 2012 R2 node and a clean install. It turns out the installation included some hotfixes, but not all of them. SAM 6.4 HF 2 was not installed, so, to our surprise, we were unable to gather Windows mount volume metrics after the install. The hotfix was installed on the APEs because they were upgraded in place rather than replaced with new nodes. Seeing the HF installed on the APEs but not on the primary added to the confusion.
SAM 6.4 Hotfix 2 resolved the Windows mount volume issue. A few days after that installation, we realized RabbitMQ was not responding and the backup MSMQ was overloading and filling up the HDD. This leak also caused SAM monitors to stop intermittently as ephemeral ports were exhausted.
A complete uninstall and reinstall of RabbitMQ, along with running the Configuration Wizard on the primary engine with all the APEs shut down, resolved our issues. It was important for all the APEs to be shut down.
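The confusion described above, a hotfix present on the APEs but missing from the rebuilt primary, boils down to a set difference between per-server hotfix inventories. This is a minimal sketch, assuming you have already collected each server's installed hotfixes somehow (gathering them is not covered here); the server names and hotfix labels are hypothetical.

```python
from typing import Dict, Set


def missing_hotfixes(inventory: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each server, return the hotfixes that at least one other server
    has but this one lacks (compared against the union of all servers).
    Servers that are missing nothing are left out of the result."""
    all_fixes: Set[str] = set().union(*inventory.values())
    return {
        server: all_fixes - installed
        for server, installed in inventory.items()
        if all_fixes - installed
    }


if __name__ == "__main__":
    # Hypothetical inventory mirroring the scenario in the post:
    # the rebuilt primary lacks the SAM hotfix the upgraded APEs have.
    inventory = {
        "PRIMARY": {"Core-HF1"},
        "APE-01": {"Core-HF1", "SAM-6.4-HF2"},
        "APE-02": {"Core-HF1", "SAM-6.4-HF2"},
    }
    for server, fixes in sorted(missing_hotfixes(inventory).items()):
        print(f"{server} is missing: {', '.join(sorted(fixes))}")
```

Comparing against the union of all servers, rather than against the primary alone, also flags the opposite case where an APE falls behind the primary.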
We have been running stable now for a week.
One note I'd like to pass along: I would love to see better release notes that specifically call out which hotfixes are included in the Core roll-up hotfix bundle.
Agreed, particularly when patching, upgrading, creating, or changing APEs.
Each time I patch, install, or upgrade, the Config Wizard generates two windows for upgrades. That seems like a bug to me, but I run them sequentially. The polling engine updates have the same issue.
It's unfortunate that some or all of the APEs must be shut down to perform an update or fix correctly. I'm fighting that same challenge right now.
So how did your own upgrade go?
The two-window thing doesn't sound good.
We've corrected that in releases since the time that Richard posted that comment.
To be perfectly honest, it is going in the wrong direction. I turned off the new "manage nodes" interface and reverted to the old one. I had OpenGL issues when I tried to open the "new" manage nodes section; that stopped after a patch, but the interface still doesn't let me do any admin work faster. It's all more clicking and more GUI navigation, without the easy access to custom properties for sorting. It's a total downgrade in the GUI.
The new "edit views" is awful: it hangs and times out 7/8ths of the time.
Just for clarity, are you saying that the actual product is going in the wrong direction, rather than the upgrade experience itself? Both are interesting points of feedback.
For the new Manage Entities page, there's still a fair amount of work to be done, which is why it isn't replacing the default views. We would not replace the default until it reaches feature parity.
As for edit views, it sounds like you're running into something unexpected. Have you created a support ticket for this?
As Serena noted, that problem is corrected. She demonstrated it to me two days ago, and I must say that the ENTIRE upgrade/patching process is faster, raises fewer questions, and is more complete. In fact, it is better in every way!
Kudos to SW and Serena for their care, efficiency, and improvements!