
Guest blog: Failover thoughts (A post from our Chief Architect)

Level 17

Hi, I’m Joel Dolisy, chief architect for SolarWinds. Over the next few months I’ll try to provide insight into what’s happening on the engineering side of the house.


As Josh mentioned a few weeks ago, I’m researching the types of failover scenarios we need to support in Orion and how those relate to the existing hot-standby polling engine strategy. As usual, the complexity of the solution depends on the level of failover support that the product needs to provide. Being able to claim that a solution is fail-safe is a complicated problem to solve: it involves the interaction of different players such as the OS, the hardware (computer and network infrastructure), the storage subsystem and the application.

As part of the research, I’m considering whether leveraging the built-in clustering services offered by the OS (such as Windows Server 2003) would be acceptable for our customers. Leveraging those services has lots of advantages from an engineering standpoint, as we can concentrate on building features that are actually useful for you instead of having to write more plumbing. On the other hand, such a solution comes with its own set of constraints. Every time I encounter additional constraints, I have to weigh the benefits of the proposed solution against its additional configuration burden.
One of the big assets of our products is that they are simple to install and configure and don’t require days of planning before a rollout, unlike those of a lot of our larger competitors. Every time something has the potential to disturb this ease of use and configuration, I need to make sure that I understand under which circumstances those additional configuration steps will impact our customers.
I feel that customers asking for failover capabilities are ready to take the additional configuration steps that such setups require. In fact, most of those users will already have SQL Server clustered, so requiring the OS clustering service in this case should not be an issue.

At this point I would really like to hear your thoughts, comments, requirements and experience on failover scenarios, and how you expect us to support them.



Level 15


It's amazing how this takes me back so many years, to when our SAs thought Microsoft could provide a reliable dual-NIC HA solution, or believed that LACP was the be-all and end-all.

Now, with UCS and VM and port-channels to those technologies, life's become so much less unpredictable.

Thanks for keeping these old posts--there's much to be learned from the past.  Keeps us from re-inventing the wheel when we remember that those who fail to study history are doomed to repeat it.  ;^)


In some shops VM appliances won't go over well...

Agreed--VM is NOT the panacea some folks may wish for.  Depending on hardware budget and application design/compatibility, I've seen application performance take a big nosedive when moved to VM.  And some suites of applications, depending on how busy and demanding they may be, have VM specs that require dedicating an entire host to one application.  Perhaps that might be OK if you've money to spare and hardware slots to devote to dedicated VM blades, but we try to be good stewards of the budget.  If there are specific apps that aren't compatible with VM, or that require more resources (RAM and CPU) than are available in a VM environment, standalone servers won't all be moved into VM.  That shows the fallacy in server manufacturers' advertisements that depict entire data centers being compressed down into one server.

Remember this one?

Of course, none of us would put all our eggs into just one basket . . .


It's not just that: you may not be able to patch the OS of a VM appliance, which then opens up a hole in your overall security strategy and leads to audit findings.

About the Author
I've been in IT for almost 30 years, beginning in the stockroom and working my way up through operations to help build and develop the Automated Operations Team at RadioShack before Enterprise Management was a cool thing. Working in several different shops over the years has exposed me to a number of different challenges regarding monitoring and alerting. I am an amateur radio operator, a Skywarn spotter for the National Weather Service, and a volunteer firefighter in a rural county just west of Fort Worth.