50 Replies Latest reply on Oct 23, 2018 3:23 PM by ecklerwr1

    Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?

    rschroeder

      If you don't have Solarwinds HA:

      • Why not?
        • Cost?
        • Your company doesn't believe monitoring is important enough to support (with licenses, employee setup/support hours, hardware environment, etc.)?
        • You've never thought about it?
        • You just don't need it?
        • You don't have time to set it up or to maintain it?

       

      If you DO have Solarwinds HA:

      • Why did you get it?
      • How did you convince your company it is necessary?
      • How satisfied are you with it?
      • Have you set up HA for ALL your polling engines?
      • What would you change about it?
      • Have you seen the new HA administration view that shows all your standby / HA pollers, and that highlights any differences between them?

       

      You can check out some of the HA views on your own Main Poller here:

      • https://<YOUR SOLARWINDS MAIN POLLER ADDRESS OR DNS>/ui/ha/settings
      • https://<YOUR SOLARWINDS MAIN POLLER ADDRESS OR DNS>/ui/ha/summary
        • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
          John Handberg

          We have talked about it since its introduction and really want to do it.   Budgeting is the issue here right now.  Our monitoring group doesn't have its own official budget and no one has fully figured out where our budget is ultimately going to come from.

            • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
              rschroeder

              jhandberg I was in that same boat until I discovered NAM licensing.  I pay less now for SW support than I did when I had stand-alone versions of NPM, NTA, NCM, and three APE's.  And I got UDT, IPAM, VNQM, HA, and can install up to 20 APE's at no additional cost.

               

               

              Last year I knew I needed more polling engines.

               

              I also needed UDT for our CMDB.

               

              And I needed VNQM for my Telecom team.

               

              And HA would benefit my team and all our SW customers.

               

              And I wanted to learn more about SW IPAM (I already have Infoblox DNS/DHCP/IPAM, and knew there'd be little budget available to duplicate part of it).

               

              I learned about NAM licensing almost a year ago, and had prepped my IT Director for considering it with the 2018 budget cycle.  I explained how UDT would provide the switch/blade/port info for every device, and that our CMDB could mine the Solarwinds Orion SQL database and extract that information to our LAN Desk solution.  That UDT would also save the Network team from continually having to go through a tedious process to discover the port any device is plugged into (ping the device's address, SSH to the router in that subnet, sho ip arp | in that address, then SSH through a Distribution switch and possibly several Access Switches to the switch).  Instead we just copy & paste the IP address or MAC address of the client, and the switch/blade/port it's attached to shows up.

               

              With that prep in place, last fall I reached out to my SW SE for a quote.  It came back very high, and I said thanks, but I can't bring that to my boss's boss for something unbudgeted.  The SE hemmed & hawed a while, then noted I was Solarwinds Certified, which I verified.  He hemmed & hawed some more, and then discovered I'm a Thwack MVP, and I confirmed that, too.  He trimmed double-digit percentages off the original quote, and we discussed the ramifications it would have on the annual support contract.  After the initial up-front buy-in, the annual support contract went down, and I ended up with the ability to install up to 20 pollers and monitor up to 100,000 elements.  I brought the quote to the IT Director, reminded him of the project and its benefits, mentioned buying this would decrease our annual Support Contract costs, and went home for the day.

               

              The following morning I found the electronic invoice and the Activation Codes in my Inbox!  My Director had found the funds and made it happen.

               

              The great challenge in IT is always finding time to install and learn what you ask for, but I've had a pretty good time of it.  I've installed more pollers, IPAM, UDT, and VNQM, and am still waiting to deploy HA.  It's why I created this post.

               

              It's not hard to see HA would solve some issues of various teams not having access to NPM during its upgrades / reboots / hotfixes.  Duplicating all my Pollers is simple--they already run on VMs.  I just haven't asked for the duplicates yet, since I'm in the middle of some bigger projects (replacing core and distribution hardware in my main campus, and replacing 400 Cisco 2960S's with 3850's for ISE compatibility).

               

              Your goal is to find the budget.  You probably read the paper I wrote a few years ago about that topic.  It still applies:  A Stratagem For Obtaining Funding For Your Projects

               

              Play the game that the decision-makers and budget stakeholders play.  Once you find the right buttons to push, and can couch your request in the proper context, using their favorite buzzwords, you stand a chance at getting the funds.

               

              Good luck!

               

              Rick

            • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
              John Handberg

              When you first posted about your NAM purchase, I forwarded that to my supervisor and he started talking to our sales rep.  None of us are MVPs, so that never came up.  The price quote was significantly higher, no 40% discount.  We also regularly use SAM, WPM, SRM, and VMAN, which were not officially supported on NAM.  We do have 9 APE's and are around 2-3k short of the recommended max elements.  EDIT: just under 20k elements short, but still....

               

              All this monitoring and proactive alerts has become the hot topic, but we have no official budget.  We have to beg and borrow from other silo budgets.  Network is paying part of the maintenance, as is the server group.  We don't have an official charge-back to the departments we monitor devices in.  I think my paycheck even comes from some other agency's budget.  They have talked about sorting all that out since I started here 1 3/4 years ago.  I just shrug my shoulders (state government, what are you going to do) and keep putting out my recommendations.  I will keep recommending HA and maybe NOM or NAM.

               

              Lately my supervisor and I were talking about what we are going to do when we hit that magical 100,000 element max.  Maybe split the networking off into another instance?  Then pull it in under the EOC if that works right?  We are not sure, but we have many more agencies to centralize under our monitoring umbrella yet.  That might be an opportunity to rethink licensing.

                • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                  rschroeder

                  My organization merged with two others a few years ago; each had their own NPM solution in place, so we bought EOC to give visibility into all of them.

                   

                  That was a mistake.

                   

                  It segmented the Network "Team" and let regional members focus solely on "their" sites, and not help out on other regions' sites.

                   

                  I worked a deal with SW to get rid of EOC and convert two Main Polling Instances to APE's, then report everything to one Main Poller.  And just like that, the team was back together, working on every site--simply because they all showed up in the same pane of glass.  Better still, it turns out APE's were less expensive than running three separate main NPM instances, so licensing/support costs dropped when I dropped EOC and converted two of those three servers to APE's.

                   

                  I didn't like EOC because it was much slower at displaying issues than the individual Main Polling instances.   But I can see where it fits in, for organizations that exceed that 100K limit.

                   

                  For my $.02, I'd rather start up a completely separate NPM instance and begin growing it out to its own 100K elements, and have two screens showing both NPM's, than use EOC.  Unless EOC has become more affordable and has much-improved performance.  Then I might look into it when my organization's monitoring needs have grown that much.

                    • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                      John Handberg

                      We have had the EOC for a long time, since before I started here.  Even the newest version needs improvement.  We have tried to create custom views in it for others, but when others log in, they can't see the views.  Something isn't right there, and SolarWinds support hasn't figured out what.  It isn't the ideal solution right now, but we have been told by support that our database tables are very large and may be causing some issues.  We started shortening retention levels to compensate, but they are still large, and we have less historical detail.  

                       

                      If we split off networking, we would kind of silo that, but it might make other module configurations available between instances.  I do wonder about something like site dependencies with mixed network/server/application infrastructure in that scenario.  The other thought was split regional.  Cut the state in half or something.  Time will tell.

                       

                      Budget issues need to be resolved first though.  Ok, that and the state legislature approving my employment contract.  Kind of working without one now.  We might be working under the old one that expired last July 1st.  Yes, contract and budget need to be the priority.

                        • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                          mesverrum

                          Ignore this if this doesn't apply to you, but whenever I see orgs who are maxing out the 100k limit I have to ask if there is actual material value in monitoring bandwidth on all those individual interfaces? 

                           

                          A big UDT license would give you up/down/connected device visibility into all the switches, and is a hell of a lot cheaper than standing up a completely second instance/EOC/etc.  Does every endpoint really need that bandwidth history recorded?  I figure that it makes sense to use NPM licenses/resources to monitor everything that is up on the core and distribution layers, but only infrastructure uplinks at the edges.  Remove unused interfaces from NPM until they show as up in a discovery or via UDT.

                          Are you monitoring physical server uplinks from the server OS and the same interface from the switch perspective?  If things are getting tight wouldn't it make sense to consider dropping the interface monitoring at one side or the other? 

                          Have you ever tried to run a report to see how many interfaces you are monitoring that have not passed traffic in over 24 hours?  I bet you would find a lot of things in there that maybe aren't getting a business ROI from monitoring if you really dug into it.
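The "no traffic in over 24 hours" report described above boils down to a simple filter. Here is a minimal Python sketch of that logic, not an Orion report or API call: the record layout (`node`, `name`, `last_traffic`) and the sample data are made up for illustration, and in practice the rows would come from whatever export or query your own environment provides.

```python
from datetime import datetime, timedelta


def idle_interfaces(interfaces, now, max_idle=timedelta(hours=24)):
    """Return the interfaces whose last observed traffic is older than max_idle.

    Each interface is a dict with a hypothetical 'last_traffic' key holding a
    datetime, or None if no traffic has ever been recorded.
    """
    idle = []
    for iface in interfaces:
        last = iface["last_traffic"]
        # Treat "never saw traffic" the same as "idle for too long".
        if last is None or now - last > max_idle:
            idle.append(iface)
    return idle


# Made-up sample data, purely for illustration.
now = datetime(2018, 10, 1, 12, 0)
sample = [
    {"node": "sw-access-01", "name": "Gi1/0/1", "last_traffic": now - timedelta(hours=2)},
    {"node": "sw-access-01", "name": "Gi1/0/2", "last_traffic": now - timedelta(days=90)},
    {"node": "sw-access-01", "name": "Gi1/0/3", "last_traffic": None},
]

for iface in idle_interfaces(sample, now):
    print(iface["node"], iface["name"])
```

Widening `max_idle` to weeks or months turns the same filter into a list of candidates for removal from monitoring.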

                          I've had to make reports for clients where we itemized the licensing and ongoing maintenance costs associated with monitoring every object in the environment so they could use that as part of the bill back model they were setting up.  Sometimes when you see it in black and red it raises the question of "is this node really worth $345 of ongoing maintenance support"
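As a rough illustration of that kind of itemized bill-back, the sketch below splits an annual maintenance cost across departments in proportion to the elements each one has monitored. The dollar amounts, department names, and the proportional-split rule are all assumptions for the example, not figures or a method from any real contract.

```python
def bill_back(annual_maintenance, elements_by_dept):
    """Split annual_maintenance across departments by monitored element count.

    Returns a dict of department -> share of the cost, rounded to cents.
    """
    total_elements = sum(elements_by_dept.values())
    if total_elements == 0:
        raise ValueError("no monitored elements to allocate cost against")
    cost_per_element = annual_maintenance / total_elements
    return {
        dept: round(count * cost_per_element, 2)
        for dept, count in elements_by_dept.items()
    }


# Invented numbers: 50,000 per year of maintenance, 10,000 monitored elements.
shares = bill_back(50_000, {"network": 6_000, "servers": 4_000})
for dept, cost in shares.items():
    print(dept, cost)
```

Dividing a department's share by its node count gives the per-node figure that prompts the "is this node really worth it" question.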

                           

                          I have never found an instance that I couldn't trim a huge percent of their interface count away without having any negative impact on their actual visibility into problems.  Usually increases admin overhead to keep track of all these things instead of just blanket monitoring everything, but with good use of custom reports and the API you can cut down the admin burden by a mountain.  Even if you need to book training and pro services to get you over that hump the ROI on training tends to be a lot better than standing up a second environment if you could have avoided it.

                           

                           

                           

                          This is also somewhat off track from the HA topic though.  I've installed it many places and like aLTeReGo said it is pretty simple and just works where I've used it.  One thing I wanted to add though, you mentioned seeing HA as a solution during upgrades.  Unfortunately as it exists right now you still don't have the capability to do an upgrade without taking the system down, I didn't want anyone to get their hopes up.  Seems like that would be tricky since some upgrades make structural changes to the database and I expect it would be pretty hard to keep the system hot while you did those.

                           

                          Along the lines of some of the points I made above, HA also needs to be looked at in terms of ROI and impact.  Many orgs don't even allocate a proper budget to do monitoring, let alone setting aside additional money to offset a few hours a year of downtime.  Downtime of your monitoring tool is not downtime of your environment (unless maybe you run an MSP or similar) so the value for HA is in the potential it offers you to be notified of a problem during those windows when your system would otherwise have been down.  If you add up the dollars and cents that is often such an edge case scenario that many people just figure they can accept the risk and move on without it.  I am sure the business impact within a hospital like your environment rschroeder is a lot more dire than it would be for some other types of business so I can definitely appreciate the use case there. 
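To make the "add up the dollars and cents" step concrete, here is a back-of-the-envelope Python sketch. It assumes a deliberately simple model: the expected loss is the hours per year the monitoring tool is down, times how often a real incident occurs per hour, times what a missed or late notification costs. Every number below is invented; plug in your own estimates.

```python
def expected_annual_loss(monitor_downtime_hours, incidents_per_hour, cost_per_missed_incident):
    """Expected yearly cost of incidents that go unnoticed while monitoring is down."""
    return monitor_downtime_hours * incidents_per_hour * cost_per_missed_incident


def ha_pays_off(expected_loss, ha_annual_cost):
    """True if the expected loss avoided exceeds what HA costs per year."""
    return expected_loss > ha_annual_cost


# Invented figures: 8 hours of monitoring downtime a year, one incident every
# 4 hours on average, and 5,000 lost per incident that nobody was alerted to.
loss = expected_annual_loss(8, 0.25, 5_000)
print(loss, ha_pays_off(loss, ha_annual_cost=15_000))
```

With these made-up inputs HA would not pay for itself, which is the edge-case conclusion many shops reach. In an environment where a missed alert is far more costly, such as a hospital, the same arithmetic can tip the other way.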

                           

                          -Marc Netterfield

                              Loop1 Systems: SolarWinds Training and Professional Services

                            • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                              jeilers
                              One thing I wanted to add though, you mentioned seeing HA as a solution during upgrades.  Unfortunately as it exists right now you still don't have the capability to do an upgrade without taking the system down, I didn't want anyone to get their hopes up.  Seems like that would be tricky since some upgrades make structural changes to the database and I expect it would be pretty hard to keep the system hot while you did those.

                               

                              That right there is why we paused pursuing HA. I know it would be tricky but it is the one thing that stopped me looking into it. It feels like a rock and hard place needing monitoring 24/7 but also needing to stay up to date on upgrades. My hope is that a solution will be possible in the future without breaking the bank.

                               

                              I know there are ways around the issue that include migration, or another system handling the minimum but absolutely necessary monitoring. It would just be ideal for HA to have that capability.

                              • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                John Handberg

                                I don't disagree with anything you are saying about interfaces.  The network group here is very specific about what interfaces it wants monitored.  Most of it is up/down, and most of those interfaces are trunk links or links like critical sub-interfaces or VLANs routed across the state.  We don't have UDT.  Server group on the other hand do have unnecessary interfaces monitored.  I have managed to trim those down some, but I am sure they need more trimming. Another thing we might be able to do is pull back on historical data on non-critical interfaces.  The network team is interested in bandwidth capacity monitoring on the various WAN links across the state, but not necessarily on other interfaces.  Most network admins here don't dig too deeply into SolarWinds at all and rely on SolarWinds generating an alert and ticket. 

                                  • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                    designerfx

                                    To mesverrum's point and something I noticed as well - if you don't need to monitor a specific interface for traffic and errors, don't. There are options under individual interfaces in NPM beyond "monitor/not".  Unfortunately NPM is a bit of a fail in that regard, in that by default it will select every active interface when you're doing a discovery or list resource on a node. By default it should be off so you can select things. It's funny considering how well a lot of things in solarwinds are done OOTB.

                                     

                                    UDT is much better at (and designed for) tracking interface up/down, what's connected, etc. Unfortunately, I'd tend to agree about the assessment of network admins (a group most core to solarwinds) not really fully utilizing solarwinds and/or that UDT isn't quite as utilized as it should be.

                                  • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                    rschroeder

                                    Here's my rationale for monitoring all ports:

                                    • We're a health care system; any device that attaches to the network needs to work without errors or discards.  We monitor all ports to ensure we can provide the immediate help to end users or their devices that can make the difference in a patient's life.  Whether it's a flaky built-in mini-switch in an IP phone, or a faulty jack or patch or drop cable, or incompatible settings on the end user's NIC, I like discovering the problem before the user calls the Help Desk to complain.  Sometimes they don't even call, figuring "well, I guess I have to live with this poor performance."
                                    • Watching devices attach and move around, through UDT, is useful for security and administrative purposes.  Managers in various departments are interested in the reports of this activity, and request them from us.
                                    • Seeing which ports are the bandwidth hogs really helps focus on process and flow and budgetary issues.  For example, if a clinic starts using a new voice transcription service that's very bandwidth intensive, or starts sending multi-gigabit radiological diagnostic images, often times we see reports of problems from other users at the sites.  Sometimes we can isolate the cause by comparing the schedule of complaints with a new doctor's visits to the sites.  It's amazing how correlating a new service to new complaints can light up a problem where management assumed all regional and neighborhood clinics and hospitals have the same 40 Gb/s throughput that C-Level people enjoy within the main hospitals' campus.  Once we have the data, we can use it to help Management NOT implement a new cloud-based app or process that requires more bandwidth than the site's WAN can provide.
                                    • My network team is never notified when devices are moved to new rooms, or new devices added to the network--until the site's run out of switch ports.  Monitoring all ports enables us to quickly and easily display which ones have not been used in three months or more, and then assign a tech to unpatch all the unused ones.  Or, we learn that NO ports are available, and we can install another switch, or discuss with management why their unbudgeted growth and assumptions have affected their plans.
                                    • Many of our access ports are compatible with Netflow, and rather than simply knowing what's going in/out of a Layer-3 switch, we can easily see what's going in/out of any of that switch's access ports.  It makes it simple to discover who's streaming Pandora for personal entertainment and unwittingly impacting everyone else at the site.  I recommend buying a cheap am/fm radio instead of streaming audio via our network, and I can back up my recommendations by showing the costs to the organization when people push back and demand the ability to stream personal entertainment across the LAN and WAN.
                                    • Monitoring all ports enables us to build a report for any switch and discover which of its ports haven't been used in X days/weeks/months/years.  Then we know what's safe to unpatch and repurpose for new devices, and this saves us from having to buy more switches.

                                     

                                    Yes, monitoring all ports is more expensive when your budget only allows buying the minimum amount of Solarwinds elements.  But once we saw the benefits of monitoring every port, they far outweighed the SW licenses.  Imagine putting up with a slow WAN for six months, and listening to providers' and patients' complaints--versus discovering immediately what's changed, what's eating up the WAN, and fixing it quickly--because you're monitoring the port that was down yesterday and is up today and causing problems.   I've found it's a lot cheaper to quickly identify problems--especially problems that are port-based and that impact WAN throughput, than it is to worry about monitoring costs. And now that NAM licensing allows up to 20 APEs and monitors 100,000 elements, I'm set.  Plus, that license extends to the equivalent number of volumes/nodes.  Suddenly I have all the licenses I need to monitor the 8,000 servers we have--at no extra charge to the System Admins' budget.  And plenty more licenses to use as nodes increase in numbers.

                                      • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                        mesverrum

                                        Agreed, my example is primarily focused on that tipping point between maxing out one instance and having to stand up a second one because you have crossed over into 6 digit interface monitoring and your system starts getting weird.  At that point a company would be looking at possibly doubling their licensing/maintenance costs, as well as significantly increasing the admin overhead so I would always want to be certain it was absolutely necessary before making that jump.  In cases where you are running NAM and only using ~60k elements then there's little reason not to go wild with adding them for a while.  But once someone butts up against a limit is when it's worth giving everything a thorough shake down before they spin up that second instance.

                                         

                                        And to be accurate, the 100k limit isn't a hard coded rule either, where it just suddenly stops letting you add things.  I recently worked with a big ISP on some upgrades and as part of the process we monitored over 160k interfaces for a week to feel out if there were going to be any bugs in the system.  Their system is extremely beefy and they are seasoned Solarwinds admins so they aren't afraid to get out on their own a bit.

                              • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                jm_sysadmin

                                I bought HA this morning, I will let you know how it goes.

                                 

                                Basically it boiled down to IS not wanting to lose visibility into its systems in the event of failures. We run much of the stack, NPM, NCM, NTA, SAM, SRM, IPAM, DPA, and VMAN. Losing all of them at once was not OK. We had a small fire in our primary data center recently, Orion was up the whole time, but had it been gone, we would not have understood the scope of the issues and how to prioritize work. HA discussions started shortly after.  If you want to know more about my roll out as I go, I'd be happy to talk about it. Just send me a message and I will keep you up to date.

                                • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                  grizzlyferrett
                                  • Why did you get it?
                                    • Purchased at the time we purchased other SolarWinds products, and FoE was no longer available.
                                  • How did you convince your company it is necessary?
                                    • We needed a platform that was always on so it made sense to have HA. It works well for Windows Updates and we can re-start servers without affecting monitoring or access to the Web portal.
                                  • How satisfied are you with it?
                                    • Very satisfied
                                  • Have you set up HA for ALL your polling engines?
                                    • Yes
                                  • What would you change about it?
                                    • I would like the platform to be available even during upgrades of the application. Having to take SolarWinds off-line to upgrade is not very Highly Available.
                                  • Have you seen the new HA administration view that shows all your standby / HA pollers, and that highlights any differences between them?
                                    • No, what is this?
                                  • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                    yaquaholic

                                    Why did you get it?

                                        We run monitoring as a service and when people are paying for something, they tend to get a little upset when it's not available.

                                     

                                    How did you convince your company it is necessary?

                                        Was within the requirements for the monitoring system and the budget was much less of a concern back then.

                                     

                                    How satisfied are you with it?

                                        Most of the time, very happy.

                                        We did have a few delayed fail-overs (8 minutes) that triggered hundreds of false alerts. The cause was never resolved, though thankfully we never re-encountered it.

                                        And product upgrades mean large lumps of downtime, which is not good.

                                     

                                    Have you set up HA for ALL your polling engines?

                                        No, only the core Orion; the APEs and AWS are not under HA (budget).

                                     

                                    What would you change about it?

                                        HA should allow for patching, hot-fixing and upgrading. Every upgrade leaves me with hours of downtime and missing data.

                                        We run another basic monitoring system just for such times, so we aren't completely blind, but those gaps in the data in the reports are still really obvious to the customers.

                                     

                                    Have you seen the new HA administration view that shows all your standby / HA pollers, and that highlights any differences between them?

                                        If you are referring to http://<orion>/ui/ha/summary, then yes I have. 

                                        I could do with a reminder on controlling HA from the CLI though, for those times the GUI is not available.

                                    • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                      HerrDoktor

                                      I do have 2 clients with HA

                                       

                                      • Why did you get it?

                                      Requirement of 24/7 Monitoring. One of those two clients is reporting SLAs for their customers and needs the system to run all the time

                                      • How did you convince your company it is necessary?

                                      Management set the requirement

                                      • How satisfied are you with it?

                                      Meh....HA broke 3 times during a failover test within the last 6 months

                                      • Have you set up HA for ALL your polling engines?

                                      YES all polling engines

                                      • What would you change about it?

                                      ability to do updates of solarwinds without downtime

                                      make it more robust

                                      • Have you seen the new HA administration view that shows all your standby / HA pollers, and that highlights any differences between them?

                                      yes

                                      • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                        david.botfield

                                        We have it but I'm not involved in the day to day running of the server. I just use the Solarwinds elements. We have lots of our own plus customer devices monitored and we report from Solarwinds on this, so the need for high availability is quite pressing. We're not on the latest versions at the moment (due this month or next) so I haven't seen the latest in this area.

                                        • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                          rschroeder

                                          There have been so MANY great comments and reports about HA in this thread--THANK YOU all!

                                           

                                          The biggest red flag to me is HA doesn't keep the monitoring & reporting going during maintenance windows.  My original intent was to ensure we had Solarwinds network status reporting 7x24, no matter the maintenance.

                                           

                                          I'd very much like this to be addressed.  And until it's corrected, the HA Marketing info should include that caveat up front, and list exactly what HA can NOT do to ensure 7x24 monitoring and reporting goes on.

                                           

                                          Right next to that, I'd like to see what issues HA addresses and how it keeps things working--except for those specific conditions when it DOESN'T keep reporting & monitoring going.

                                          • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                            josephllee247

                                            We don't have HA - A bone of contention for me and something I will address in time.

                                             

                                            Why not?

                                              • Cost - This will be one of the issues, of course. However, the costs of failure are much greater! (In my opinion!)
                                              • You don't have time to set it up or to maintain it? - From what I've read so far, it doesn't actually look particularly labour intensive in this regard!

                                             

                                             

                                             

                                            • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                              jlhartsock

                                              If you don't have Solarwinds HA:

                                              • Why not? The lack of support for zero-downtime upgrades turned this effort into a non-starter for the conversation within my company.
                                                • Cost? The cost could be easily justified if we did not incur outages during upgrades. However, since that wasn't on the table, I couldn't provide a justification for the additional cost.
                                                • Your company doesn't believe monitoring is important enough to support (with licenses, employee setup/support hours, hardware environment, etc.)? See statement above.
                                                • You've never thought about it? I was extremely excited when I learned that SolarWinds had an HA option after I had taken over the platform last year. That excitement was quickly extinguished when I met with the sales team and found that we would still experience the 4+ hour outages for upgrades.
                                              • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                                jlhartsock

                                                I'm going to perform another batch of upgrades within a month so hopefully I see the same experience. My last upgrade in October ran just under 4 hours when running the installer and config wizard on my primary server and 3 APEs.
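For a rough sense of what upgrade windows like these cost in availability terms, here is a minimal Python sketch. The 4-hour outage comes from the figure quoted in this thread; the number of upgrades per year and the 99.9% target are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(outage_hours_per_upgrade: float, upgrades_per_year: int) -> float:
    """Fraction of the year monitoring is up, counting ONLY upgrade outages."""
    downtime = outage_hours_per_upgrade * upgrades_per_year
    return 1.0 - downtime / HOURS_PER_YEAR


def downtime_budget_hours(target: float) -> float:
    """Hours of downtime per year allowed by an availability target (e.g. 0.999)."""
    return (1.0 - target) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Assumed values: 4-hour upgrade outages, 1 to 4 upgrades per year.
    for upgrades in range(1, 5):
        print(f"{upgrades} upgrade(s)/year -> {availability(4, upgrades):.5%} available")
    # Assumed target: 99.9% ("three nines").
    print(f"Budget at 99.9%: {downtime_budget_hours(0.999):.2f} hours/year")
```

Under these assumptions, two 4-hour upgrades a year still fit inside a 99.9% budget, while a third does not, and that is before counting any unplanned outage.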

                                                • Re: Let's talk about Solarwinds High Availability.  Do you want HA?  Why?  Do you have it?  Why'd you get it?  Does it work well?
                                                  wlouisharris

                                                  I found this post doing some research on HA.  We implemented HA in QA and PR.  Here is my feedback:

                                                   

                                                  • Why did you get it? Our customer wanted high availability.
                                                    • We were skeptical, but the customer wanted to be able to check off the box to say their services are redundant.
                                                  • How did you convince your company it is necessary? 
                                                    • Same as above, customer demand.
                                                  • How satisfied are you with it?
                                                    • So far the HA service works pretty well using BIND DNS. We also have SQL HA. We had an issue where both servers went into standby mode, and we had to put in a support call to get the main poller back into active mode.
                                                    • The best use of HA is for Microsoft/operating system patching. We can move servers to standby while we patch.
                                                    • The other use case I can think of for HA is when one of the main pollers has some type of OS corruption. Having a standby can make recovery seamless. We had one case with NPM 12.0 where the main poller became corrupt, but with Windows Server 2016 and NPM 12.2 we have seen a lot more stability. Even so, the standby is an advantage.
                                                    • Make sure you implement this in a lower life cycle environment that closely resembles your production environment. We had to purchase an SL100 license for QA; otherwise you only get a 30-day evaluation. It's important for us to do our upgrades in QA before PR, so we need a persistent QA environment.
                                                  • Have you set up HA for ALL your polling engines?
                                                    • Yes
                                                  • What would you change about it?
                                                    • This product does not offer HA during Solarwinds upgrades or patches. This is a huge drawback.
                                                    • We would like a better DR process. We would like the ability to bring a 3rd server online as a replacement standby or active primary poller, and then remove the original server.
                                                  • Have you seen the new HA administration view that shows all your standby / HA pollers, and that highlights any differences between them?
                                                    • no