Hi, I’m Joel Dolisy, chief architect for SolarWinds. Over the next few months I’ll try to provide some insight into what’s happening on the engineering side of the house.


As Josh mentioned a few weeks ago, I’m researching the types of failover scenarios we need to support in Orion and how they relate to the existing hot-standby polling engine strategy. As usual, the complexity of the solution depends on the level of failover support the product needs to provide. Being able to claim that a solution is fail-safe is actually a complicated problem to solve, because it involves the interaction of several players: the OS, the hardware (computer and network infrastructure), the storage subsystem and the application.

As part of the research, I’m considering whether leveraging the built-in clustering services offered by the OS (such as those in Windows Server 2003) would be acceptable to our customers. Leveraging those services has a lot of advantages from an engineering standpoint, since we can concentrate on building features that are actually useful to you instead of writing more plumbing. On the other hand, such a solution comes with its own set of constraints. Every time I encounter additional constraints, I have to weigh the benefits of the proposed solution against the additional configuration burden it brings.
One of the big assets of our products is that they are simple to install and configure and don’t require days of planning before a rollout, unlike the products of many of our larger competitors. Every time something has the potential to disturb this ease of use and configuration, I need to make sure I understand under which circumstances the additional configuration steps will affect our customers.
I feel that customers asking for failover capabilities are ready to take the additional configuration steps that such setups require. In fact, most of those users will already have SQL Server clustered, so requiring the OS clustering service in this case should not be an issue.

At this point I would really like to hear your thoughts, comments, requirements and experiences regarding failover scenarios, and how you expect us to support them.