
Let's talk about Solarwinds High Availability. Do you want HA? Why? Do you have it? Why'd you get it? Does it work well?

If you don't have Solarwinds HA:

  • Why not?
    • Cost?
    • Your company doesn't believe monitoring is important enough to support (with licenses, employee setup/support hours, hardware environment, etc.)?
    • You've never thought about it?
    • You just don't need it?
    • You don't have time to set it up or to maintain it?

If you DO have Solarwinds HA:

  • Why did you get it?
  • How did you convince your company it is necessary?
  • How satisfied are you with it?
  • Have you set up HA for ALL your polling engines?
  • What would you change about it?
  • Have you seen the new HA administration view that shows all your standby / HA pollers, and that highlights any differences between them?

You can check out some of the HA views on your own Main Poller here:

  • https://<YOUR SOLARWINDS MAIN POLLER ADDRESS OR DNS>/ui/ha/settings
  • https://<YOUR SOLARWINDS MAIN POLLER ADDRESS OR DNS>/ui/ha/summary
55 Replies
Level 10

I found this post doing some research on HA.  We implemented HA in QA and PR.  Here is my feedback:

  • Why did you get it? Our customer wanted high availability.
    • We were skeptical, but the customer wanted to be able to check the box saying their services are redundant.
  • How did you convince your company it is necessary? 
    • Same as above, customer demand.
  • How satisfied are you with it?
    • So far the HA service works pretty well using BIND DNS.  We also have SQL HA.  We had an issue where both servers went into standby mode, and we had to open a support call to get the main poller back into active mode.
    • The best use of HA is for Microsoft/operating system patching.  We can move servers to standby while we patch.
    • The other use case I can think of for HA is if one of the main pollers has some type of OS corruption.  Having a standby can make recovery seamless.  We had one case with NPM 12.0 where the main poller became corrupt, but with Windows Server 2016 and NPM 12.2 we have seen a lot more stability.  Even so, it's an advantage.
    • Make sure you implement this in a lower life cycle that closely resembles your production environment.  We had to purchase an SL100 license for QA; otherwise you only get a 30 day evaluation.  It's important for us to do our upgrades in QA first prior to PR so we need a persistent QA environment.
  • Have you set up HA for ALL your polling engines?
    • Yes
  • What would you change about it?
    • This product does not offer HA during Solarwinds upgrades or patches.  This is a huge drawback.
    • We would like a better DR recovery process.  We would like to have the ability to bring a 3rd server online and replace this as a standby or active primary poller; then remove the original server.
  • Have you seen the new HA administration view that shows all your standby / HA pollers, and that highlights any differences between them?
    • no

@wlouisharris, I see you use BIND DNS. What is your DNS system (I'm hoping you'd say Infoblox)?

We use Bluecat DNS.  We ended up having to use a 128-bit MD5 key to get it to work.  Of course, the newer security standard is SHA-2 (256-bit) keys.  Basically, we had our DNS team create a new zone for Solarwinds called solarwinds-ha; then they had to delegate root privileges within that zone to us.  From a process standpoint it was easier for us to use BIND DNS, but most of the internal Solarwinds support documentation is written with Active Directory DNS in mind.  We spent quite a bit of time with support, our DNS team, and trial and error in QA to get it to work.
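For anyone trying to picture what a key-signed dynamic update against a BIND-style zone involves, here is a minimal sketch in Python. It only builds the input text for the standard `nsupdate` utility; it is not how Orion HA performs the update internally. The server address, the `example.com` suffix on the zone, the record name, the IP, and the key name and secret are all made-up placeholders.

```python
def build_nsupdate_script(server, zone, hostname, new_ip,
                          key_name, key_secret, ttl=60):
    """Return nsupdate input that repoints hostname.zone at new_ip.

    All argument values are supplied by the caller; nothing here is
    taken from a real Orion HA deployment.
    """
    fqdn = f"{hostname}.{zone}."
    lines = [
        f"server {server}",                        # DNS server accepting updates
        f"zone {zone}",                            # delegated HA zone
        f"key hmac-md5:{key_name} {key_secret}",   # TSIG key (HMAC-MD5, as in the post)
        f"update delete {fqdn} A",                 # drop the old active-server record
        f"update add {fqdn} {ttl} A {new_ip}",     # point the name at the new active server
        "send",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(build_nsupdate_script(
        server="10.0.0.53",
        zone="solarwinds-ha.example.com",
        hostname="orion",
        new_ip="10.0.0.12",
        key_name="orion-ha-key",
        key_secret="c2VjcmV0",
    ))
```

The printed text could be piped to `nsupdate` by hand to check that the key and zone delegation work before involving the HA service. A short TTL is assumed here so that clients pick up the new address quickly after a failover.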

Right now we use HA more like disaster recovery, which makes it quite an expensive DR solution.

Best of luck.

Many customers use Infoblox with Orion HA.

Level 13

Bump for more experience and feedback in the field with HA.

Thanks

Level 9

I'm going to perform another batch of upgrades within a month so hopefully I see the same experience. My last upgrade in October ran just under 4 hours when running the installer and config wizard on my primary server and 3 APEs.

I foresee a positive experience, assuming your versions are not too many generations behind.  The latest installers / hotfixes are pretty good stuff.

rschroeder Sweet! I'm looking forward to it. We're not far behind so my expectations are a bit uplifted from the last experience.

Orion Platform 2017.3, QoE 2.4, CloudMonitoring 1.0.0, SAM 6.4.0, NPM 12.2, IPAM 4.5.2, SRM 6.5.0, NTA 4.2.3, VIM 8.0.0, DPAIM 11.0.0, NCM 7.7, NetPath 1.1.2

aLTeReGo HAHAHA! Definitely took advantage of running them in parallel the last go around. That would be mind numbing to have to run them in serial.

jlhartsock keep in mind the APE installer has changed if you're not on at least NPM 12.0.1. See the NPM 12.0.1 Release Notes. First you upgrade your primary, and then you can download the new APE installer, which is a smart installer and basically handles the rest. If you haven't upgraded in four months, I'm wondering how far back your updates are. That might make things take longer.

Note that you can run the Configuration Wizard on all your Additional Polling Engines in parallel now, so there's no need to perform this function serially. Unless of course you also enjoy watching paint dry.

I was at 12.0.1 in September; when upgrading to 12.2, I found that doing four APEs simultaneously caused problems.  It was better to let the first APE finish downloading its Polling Engine files/updates from the Main Poller before starting a second APE, or a third or fourth.

Now, with much-improved updaters, I think the whole process must be much improved.

I agree on that point.  We have had trouble starting the process on all APEs at once.  But once the files have been downloaded one APE at a time, they all seem to run.

John Handberg

Maybe aLTeReGo has info about multi-threading the Polling Engine downloads from the Main Poller, after it's been upgraded & patched, that would enable simultaneous downloads from it to multiple APEs?

That would be a NICE product upgrade, and a time-saver during hotfixes and updates.

Parallel execution of Scalability Engines Installers should only be done using the latest Scalability Engines Installer released with Hotfix 4. A neat trick is that you can use the latest version of the Product Module Installer to also install/upgrade Scalability Engines. Better still, you can use that installer to install/upgrade your scalability engines while the Configuration Wizard is still running on the main Orion server to cut even more time off your upgrades. This wasn't possible previously for a whole host of reasons, not the least of which is you couldn't yet access the Orion web interface to download the Scalability Engines Installer.

There's all sorts of crazy awesome stuff going on with Install/Upgrade these days, all of which is a direct result of feedback from people like yourself who have toiled their way through multi-hour/day upgrades in previous releases. We sincerely hope you've noticed the improvements we've made, as there has been a tremendous amount of effort placed on making this process easier for everyone, and we ain't done yet!

Does Hotfix 4 cover upgrades for NAM licensed setups?  Or do I need to wait for a NAM-flavored Hotfix 4?

MSP4 should apply equally to NAM/NOM as it does to the individual products.

Thank you.

I did a UX feedback session a few months ago relating to proposed improvements to doing upgrades in Orion.  I was thoroughly impressed at how much logic and use cases they had already built into the example we were working with.  I expect really wonderful things to come from that pipeline sooner or later. 

- Marc Netterfield, Github
Level 9

If you don't have Solarwinds HA:

  • Why not? The lack of support for zero-downtime upgrades turned this effort into a non-starter for the conversation within my company.
    • Cost? The cost could be easily justified if we did not incur outages during upgrades. However, since that wasn't on the table, I couldn't provide a justification for the additional cost.
    • Your company doesn't believe monitoring is important enough to support (with licenses, employee setup/support hours, hardware environment, etc.)? See statement above.
    • You've never thought about it? I was extremely excited when I learned that SolarWinds had an HA option after I took over the platform last year. That excitement was quickly extinguished when I met with the sales team and found that we would still experience the 4+ hour outages for upgrades.

jlhartsock most upgrades now take roughly 30 minutes to an hour, primary or APE. They have a new installer and it's significantly faster. serena can probably vouch for how much of a difference the installer has made, but I don't know if she can divulge average install-time improvements as a percentage.