A Disaster Recovery Story
Once upon a time there was a product called Failover Engine (FoE)... It was meant to give companies a real disaster recovery option. The community that used it was less than pleased. Even though the product functioned (debatably) well enough to let IT monitoring architects check off the box that said a DR plan was implemented, there was always an asterisk involved. (This made the SolarWinds Subject Matter Expert (SME) a little uneasy, or at least it made me uneasy.)
What could be done?! There had to be a better solution... Well, folks, I am here today to tell you a little story about a SolarWinds evangelist who never lost faith in the company (SolarWinds) and the software (Orion and all of its modules) he championed for the last 4 years of his 19-year IT career. OK, enough of all that.
Here's the deal. SolarWinds came out with High Availability (HA) for Orion, and it is a marvel. I absolutely love* it, and here is why. (Notice the asterisk? I can't escape it yet, but hot damn, it is a great product. I'll explain a little later.)
These are some of the reasons:
1.) It installs ridiculously easily and configures just as easily. Creating pools, sending HA alerts, and removing pools are all a cinch.
2.) It means I never have to install FoE again. (It was OK for its time, but there were a bunch of configuration gotchas, and it seemed like every time I or any of my coworkers installed it, it was a unique first time.)
3.) Clean and very cool UI (I mean really cool; there is a Rubik's Cube-looking thing that turns while the page loads).
4.) Failover happens very fast. In every test I have run, it has taken 4 minutes for the other server in the pool to become fully active. NOTE: This applies to both the Orion Core and the APEs.
I am attaching a Visio diagram which shows my environment. NOTE: all of the IP Addresses and names have been changed to protect the innocent.
Environment Summary (12 Servers Total)
| Site 1 - Server 1 | Site 1 HA IP | Site 2 - Server 2 | Site 2 HA IP | URL / DNS Alias (Hosted on NetScalers) |
|---|---|---|---|---|
| Additional Web Server (Active) | N/A | Additional Web Server (Active) | N/A | |
| Orion Core-01HA01 (Active) | 10.70.70.110 | Orion Core-01HA02 (Passive for HA) | 10.70.70.210 | Orion Core 01HA Alias (for traps, auto-DNS failover) |
| APE-01HA01 (Active) | 10.70.70.111 | APE-01HA02 (Passive for HA) | 10.70.70.211 | SolarWinds APE 01HA Alias (for traps, auto-DNS failover) |
| APE-02HA01 (Active) | 10.70.70.112 | APE-02HA02 (Passive for HA) | 10.70.70.212 | SolarWinds APE 02HA Alias (for traps, auto-DNS failover) |
| APE-03HA01 (Active) | 10.70.70.113 | APE-03HA02 (Passive for HA) | 10.70.70.213 | SolarWinds APE 03HA Alias (for traps, auto-DNS failover) |
| SQL Server (Active) | N/A | SQL Server (Passive for Log Shipping Fail Over) | N/A | SQL Server Alias (for auto-DNS failover and OrionDB connection) |
Up to this point the environment looks standard, with one exception: the HA IP addresses are non-routable. I asked my network engineers for a network that wasn't routable from the rest of our network, and they gave me 10.70.70.0/24. I chose the IP addresses so that they line up on the last two digits, for example x.x.x.10 (virtual IP), x.x.x.110 (Site 1), and x.x.x.210 (Site 2). These addresses are not in DNS, and on each node I created a secondary interface named SolarWinds_HA to carry them. I also tested putting the HA address on the primary interface as an additional IP address, and that works too. Here is the configuration for the Orion Core, assuming Site 1 is active:
| Setting | Site 1 - Server 1 (Active) | Site 2 - Server 2 (Passive) |
|---|---|---|
| Interface 0 primary IP address | 10.12.15.72 | |
| Interface 1 primary IP address | 10.70.70.110 | 10.70.70.210 |
| Interface 1 additional IP address (VIP, assigned automatically by SolarWinds HA) | 10.70.70.10 | Not assigned until this server becomes active |
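The "line up on the last two digits" convention is simple arithmetic on the last octet, so it is easy to script when planning a new pool. Below is a minimal sketch using Python's standard `ipaddress` module, assuming the 10.70.70.0/24 HA network and the .10 / .110 / .210 pattern described above. The pool-to-suffix mapping is hypothetical: only the Orion Core VIP (10.70.70.10) is stated for this environment, and the APE VIPs (.11, .12, .13) are extrapolated from the convention. The script only plans addresses; it does not configure SolarWinds HA or check routing.

```python
import ipaddress

# Dedicated HA network from this environment. Routing is handled by the
# network team; this script only does address arithmetic.
HA_NETWORK = ipaddress.ip_network("10.70.70.0/24")

# Hypothetical pool-to-suffix mapping. Only the Orion Core VIP (.10) is
# documented; the APE VIPs (.11, .12, .13) are extrapolated.
POOLS = {
    "Orion Core-01HA": 10,
    "APE-01HA": 11,
    "APE-02HA": 12,
    "APE-03HA": 13,
}


def pool_addresses(suffix: int) -> dict[str, ipaddress.IPv4Address]:
    """Derive one pool's addresses from its two-digit suffix.

    VIP = .<suffix>, Site 1 member = .1<suffix>, Site 2 member = .2<suffix>,
    so all three addresses end in the same two digits.
    """
    # Two-digit suffixes only, and .2xx must stay at or below .254.
    if not 10 <= suffix <= 54:
        raise ValueError("suffix must be between 10 and 54")
    base = HA_NETWORK.network_address
    return {
        "vip": base + suffix,
        "site1": base + 100 + suffix,
        "site2": base + 200 + suffix,
    }


def build_plan(pools: dict[str, int]) -> dict[str, dict[str, ipaddress.IPv4Address]]:
    """Compute addresses for every pool and reject duplicates across pools."""
    plan = {name: pool_addresses(suffix) for name, suffix in pools.items()}
    seen = set()
    for addrs in plan.values():
        for ip in addrs.values():
            if ip in seen:
                raise ValueError(f"duplicate address {ip}")
            seen.add(ip)
    return plan


if __name__ == "__main__":
    for name, addrs in build_plan(POOLS).items():
        print(f"{name}: VIP {addrs['vip']}, Site 1 {addrs['site1']}, Site 2 {addrs['site2']}")
```

Running it prints one line per pool, for example `Orion Core-01HA: VIP 10.70.70.10, Site 1 10.70.70.110, Site 2 10.70.70.210`, which matches the Orion Core rows above. Restricting the suffix to 10 through 54 keeps every address two-digit aligned and inside the /24.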
To sum up: we have been running this configuration for a few months, and it works exceptionally well. We have placed a tick in the [HA across the WAN for Monitoring] box. The only reservation management has is that they would like this to become a supported configuration. I trust SolarWinds is testing an additional HA-over-the-WAN solution, but until it is available, this works, and works well!
If you have any questions please feel free to reach out and I will help if I can.