Update while production environment keeps running?

Hi all,

At my company we have 100+ customers in our SolarWinds environment.

Whenever we update, we announce downtime, but since some of our customers rely on the data in SolarWinds, we get complaints about it.

Is there a way we can update the environment while staying available for our customers?

Can I use the HA servers while updating the main servers? Or can I pre-stage most of the update so that only the remainder is done on restart, to minimize downtime as much as possible?

Really curious if there is a solution for this.

  • I'm stuck in that same spot. Downtime is extremely hard to get approved. Trying to get a VM environment set up so I can copy my production servers and database over into a temporary location, upgrade them and then copy them back. Hopefully someone will have a solution for this.

  • I've been trying to figure out a solution too. HA would be vastly improved if it worked like this:

    The active primary web server, active APE, and primary SQL Server keep running as normal while HA upgrades the standby web server, standby APE, and secondary SQL Server, then runs a post-install check.

    HA switches over to new patched standby stack.

    Upgrades proceed on primary stack.

    Verify

    Switch back to primary stack and email report to admins.

    Easy, clean, and provides 100% uptime. 
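
The sequence proposed above can be sketched as a small Python model. This is only an illustration of the ordering of steps, not something SolarWinds HA is known to do: `Stack`, `rolling_upgrade`, and the `health_check` callback are hypothetical names, and nothing here calls a real SolarWinds API. The point it shows is that one stack stays active at every step, and that a failed check on the standby leaves the primary serving.

```python
from dataclasses import dataclass


@dataclass
class Stack:
    """Hypothetical stand-in for one side of an HA pair (web server, APE, SQL)."""
    name: str
    version: str
    active: bool = False


def rolling_upgrade(primary, standby, target_version, health_check):
    """Upgrade standby, fail over, upgrade primary, fail back.

    Returns a log of the steps taken. Raises RuntimeError if a
    post-install check fails; the currently active stack is left untouched.
    """
    log = []

    # 1. Upgrade the standby while the primary keeps serving.
    standby.version = target_version
    log.append(f"upgraded {standby.name}")
    if not health_check(standby):
        raise RuntimeError(f"post-install check failed on {standby.name}")

    # 2. Switch over to the freshly upgraded standby.
    primary.active, standby.active = False, True
    log.append(f"failed over to {standby.name}")

    # 3. Upgrade the primary while the standby serves.
    primary.version = target_version
    log.append(f"upgraded {primary.name}")
    if not health_check(primary):
        raise RuntimeError(f"post-install check failed on {primary.name}")

    # 4. Switch back to the primary.
    primary.active, standby.active = True, False
    log.append(f"failed back to {primary.name}")
    return log


if __name__ == "__main__":
    primary = Stack("primary", "1.0", active=True)
    standby = Stack("standby", "1.0")
    for step in rolling_upgrade(primary, standby, "2.0", lambda s: True):
        print(step)
```

In this model a failed check in step 1 aborts before any failover, so the worst case is an un-upgraded but still running primary.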

  • If you've opened a Support Ticket for this question, you're on your way to the right answer.  

    It's excellent to ask this forum--folks who have history will be able to answer your questions.  Or people with the same question will benefit from your discussion.

    But I'd open a ticket with SW Support online first so it gets into a queue.  Once you have the ticket ID from the online case, then you can call in directly to Support, reference the ticket, and talk with someone (perhaps) sooner than just waiting for them to respond to your online case.

    Remember to include your complete environment / SolarWinds product descriptions, HA configuration overview, modules (NPM, NTA, DPA, SAM, etc.) and any information about APEs.

    Make special note that you're looking for a hitless upgrade to your HA environment in order to best serve your customers who rely on it.

    Finally, even if the answer is "We're sorry--we can't do a hitless upgrade in an HA environment (yet)", ask the question:  If a Hitless Upgrade in an HA environment is not yet possible, what is the expected downtime duration I should communicate to my customers?

    You'll want a clear process defined, with outage impact for each product and APE, for every stage of the upgrade.

    Or, you might open discussions with your team and with your clients to talk about what information is most critical to have during a maintenance window.  If a customer's critical needs boil down to something simple, like knowing that ten specific servers remain online, you could easily set up a temporary workaround for that using Enhanced Ping from the Engineer's Toolset (or any other low-level ping utility that can be configured to send alerts when X number of packets in a row are lost).

    Once you have an answer from SW Support, please post it here so others can learn, comment, and improve the process.

    Swift Packets!

    Rick Schroeder
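
The "alert when X packets in a row are lost" rule from the workaround above is simple enough to sketch in Python. This is a minimal illustration, not how Enhanced Ping is implemented: `consecutive_loss_alerts` is a hypothetical helper, and the actual probing (running ping against each server) is deliberately left out, so the function only takes a sequence of success/failure results.

```python
def consecutive_loss_alerts(results, threshold):
    """Return the indexes at which an alert should fire.

    `results` is an iterable of booleans, True for a reply and False for
    a lost packet. An alert fires once per outage, at the probe where
    `threshold` losses in a row is first reached.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")

    alerts = []
    streak = 0  # current run of consecutive lost packets
    for index, replied in enumerate(results):
        if replied:
            streak = 0  # any reply ends the outage
        else:
            streak += 1
            if streak == threshold:
                alerts.append(index)
    return alerts


if __name__ == "__main__":
    # Two losses (no alert), a reply, then four losses (one alert).
    probes = [True, False, False, True, False, False, False, False]
    print(consecutive_loss_alerts(probes, threshold=3))
```

Firing only when the streak first reaches the threshold keeps a long outage from producing one alert per lost packet.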

  • The HA trick is definitely part of this. Yes - you could spin up new servers for this. It would also require a SQL HA cluster. You break the SQL cluster leaving the temp HA side running, while you upgrade the normal environment. Then clean up HA and SQL cluster when you are done... It is not the prettiest thing, but it should work. 

    Note that this does not accommodate new IP addresses if you have firewalls, VPNs, etc. set up. Downtime would still be needed if you have to re-IP after the upgrade.

  • That's a great question and I would expect HA to fully automate that process. (I don't believe that it currently does, but I would expect that as an end goal)

    For example:

    Palo Alto firewalls (just an example, I know others do the same thing) have an upgrade button that automatically updates the standby device and reboots it. Once the standby is back up, it becomes the active device, and the original active device is then updated, rebooted, and returned to active status. I would think that the same thing could be done in SolarWinds.

  • One of our other monitoring tools works in a similar way: we just force failover to the second datacenter while upgrading the first, then fail back and upgrade the second. No downtime, plus once a side is down we can run the install script concurrently on all the downed servers instead of one at a time.

  • Thanks for the reply,

    I'm going to open a case for it to get a definite answer. As of now I always plan 3 hours of downtime during business hours. We have some alternate monitoring solutions, but those are not available for our customers to use.

    Doing it outside business hours won't be a good solution for me either, since some of our standby duty relies on the alerting.

    At least the good part is that we can already pre-download and distribute the update.

    I was hoping some Thwack members had already run into this and found a way around it.

    I'll give an update on this thread when I get a response on my support case.

  • Support sent a reply:


    As of now, it is not possible to update SolarWinds without downtime. We have guides for upgrade with minimal downtime but unfortunately, there is no option right now for zero downtime.

    It's unfortunate that there is no option for this right now. Nevertheless, maybe we can bring it up as an idea?