Had a network blip last night that caused split-brain in our HA (again)... working with TAC at the moment, but I was poking around with SWQL to confirm something and found this...
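For reference, the query was along these lines. It's just a rough sketch, and it assumes your version exposes the Orion.HA.PoolMembers entity with PoolId/HostName/Status columns (names may differ on your build), but a healthy pool should only ever show one active member:

    SELECT PoolId, HostName, Status
    FROM Orion.HA.PoolMembers
    ORDER BY PoolId, HostName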
Beginning to take this personally.
I always tell my clients that I've seen more downtime from fixing split-brain scenarios than from actual single-server failures. Worst case, I can fix a broken single instance pretty quickly, often before any users realize anything happened. Fixing the problem and then sorting out all the reg keys and db entries for split brain to get it running again almost always adds another 15-20 minutes to the recovery time. If I had to live with an HA environment full time, I'd probably keep a saved set of scripts that set all the keys and db entries for HA back to my "default" position, and I'd fire that off immediately upon any failed HA handoff.
@danbert If you can share the case number, we can help validate the issue. Several customers have run into this issue in 2024.2, and we uncovered a potential fix for these scenarios in 2024.2.1. This was added to the release notes as well.
Hi @Kita, here is the case number: #01734665. The engineer fixed the HA, but it went split-brain again shortly after. We're aware of the 2024.2.1 fix, but need a stable platform first. Plus we're dealing with approximately 60 nodes showing as down with false-positive alerts, potentially related to the first issue, that we're still waiting for an engineer on.
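For the node side of it, this is roughly how we're pulling the list of affected nodes to hand over. Just a quick sketch against Orion.Nodes, where Status = 2 is the standard Down value:

    SELECT NodeID, Caption, IPAddress, Status
    FROM Orion.Nodes
    WHERE Status = 2
    ORDER BY Caption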
Ha... we'll be building out those scripts. SolarWinds does not do HA well...
@danbert The fix in 2024.2.1 is to address the split-brain issue. We will dig into why the false positives are occurring.
Awesome, thanks @Kita. Don't hesitate to have the engineer reach out to us, as we're all on a call trying to repair these issues without an engineer at the moment.
15-20?! Argh, you're a machine!