Alright, so I don’t mean afraid of the dark as in no lights on in the room. What I do mean is being in the dark when the server Orion runs on fails (due to a hard drive or OS failure, for example) and you lose all visibility into your network.
What do you do if this happens? Sleep soundly, my dear friend, we’ve got you covered with the new SolarWinds Orion Failover Engine. The Failover Engine, or FoE for short, will monitor and protect Orion, including all of the installed modules, additional pollers and even EOC if you have it. As you can see in the marketecture graphic below, you can monitor and protect all of your Orion machines.
Let me take you on a feature tour of FoE. The FoE client, which can be run from the Orion server or loaded on your desktop, allows you to configure failover settings and monitor the current status of all your FoE installations.
The FoE monitors all of the SolarWinds Orion services, including the IIS web server, on both the primary and secondary Orion servers. For each service, you can define what happens when it stops or fails. This lets you build some self-healing behavior into the FoE instead of just doing a flat-out failover. For example, say the Orion Information Service stops. Since this is the first time, let’s go ahead and restart the service. Hmm, nope, that didn’t work; it started and then stopped again. Maybe there is a dependency on one of the other Orion services, so let’s go ahead and restart the entire Orion application. OK, the service didn’t start that time either, so something must be wrong. Let’s go ahead and initiate a failover to the secondary server.
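To make that escalation concrete, here is a minimal Python sketch of the idea. It is not the FoE's actual configuration or API: the `Action` names, `DEFAULT_POLICY`, and the `try_action` callback are all hypothetical, and simply model "first failure, second failure, third failure" as an ordered list of recovery actions.

```python
from enum import Enum
from typing import Callable, List, Sequence


class Action(Enum):
    """Hypothetical recovery actions, named after the example in the text."""
    RESTART_SERVICE = "restart service"
    RESTART_APPLICATION = "restart application"
    FAILOVER = "fail over to secondary"


# One action per consecutive failure, in order (illustrative policy only).
DEFAULT_POLICY = (
    Action.RESTART_SERVICE,
    Action.RESTART_APPLICATION,
    Action.FAILOVER,
)


def action_for_failure(failure_count: int,
                       policy: Sequence[Action] = DEFAULT_POLICY) -> Action:
    """Return the action for the Nth consecutive failure (1-based).

    Failures beyond the end of the policy repeat its last action.
    """
    if failure_count < 1:
        raise ValueError("failure_count must be >= 1")
    return policy[min(failure_count, len(policy)) - 1]


def recover(try_action: Callable[[Action], bool],
            policy: Sequence[Action] = DEFAULT_POLICY) -> List[Action]:
    """Try each action in order until one leaves the service healthy.

    try_action is a hypothetical callback that performs the action and
    returns True if the service is running afterwards.
    """
    attempted = []
    for action in policy:
        attempted.append(action)
        if try_action(action):
            break
    return attempted
```

In the scenario above, neither restart brings the service back, so `recover` would walk through all three actions and end with the failover.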
The Orion FoE doesn’t just monitor the services; it also monitors key OS, server and web server statistics and, just like above, you can define the behavior on the first, second and third event. For example, if free hard drive space on the server drops to 15%, send an email to user A. If it drops to 10%, email user B. If it gets down to 5%, initiate a failover to the secondary server.
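Here is a small Python sketch of that kind of tiered threshold rule. It assumes the percentages refer to free disk space remaining, and the `Rule` type, `DISK_RULES` list, and action strings are made up for illustration rather than taken from the FoE itself.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    free_percent_at_or_below: float  # rule fires at or below this value
    action: str                      # illustrative label, not a real FoE action


# Thresholds from the example in the text.
DISK_RULES = [
    Rule(15.0, "email user A"),
    Rule(10.0, "email user B"),
    Rule(5.0, "failover to secondary"),
]


def action_for_free_space(free_percent: float,
                          rules: List[Rule] = DISK_RULES) -> Optional[str]:
    """Return the action of the most severe rule that has been crossed.

    Returns None while free space is still above every threshold.
    """
    crossed = [r for r in rules if free_percent <= r.free_percent_at_or_below]
    if not crossed:
        return None
    # The lowest crossed threshold is the most severe one.
    return min(crossed, key=lambda r: r.free_percent_at_or_below).action
```

With these rules, 12% free space maps to the first rule only, while 4% maps to the failover rule.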
The Orion FoE also has a built-in alerting engine, independent of the Orion alerting infrastructure, which will notify you of any key statistic or event occurring within the FoE or the items it is monitoring.
Let’s get into some of the specifics of what is going on in the back end. The FoE supports multiple hardware deployment configurations:
- Physical to Physical
- Physical to Virtual
- Virtual to Virtual
Once you get Orion installed and set up, the FoE will create a clone of the critical configuration, registry and file system parameters, which gets restored to the secondary server. Once that initial setup is complete, a set of real-time file and registry filters, as you can see below, replicates any file system and registry changes to the secondary. This way your secondary Orion server is always up to date with what is going on within the primary Orion server.
Let’s walk through the specifics of how it works in a high availability scenario, using the diagram below as our guide.
The Orion primary and secondary servers are located within the same subnet and share the same identity, including IP address. Since two identical IPs cannot be on the network at the same time, a packet filter is installed on the secondary server’s public NIC so that it is not broadcasting or receiving traffic. A second NIC connects the two servers and handles the heartbeat and real-time data replication between the primary and secondary.
When a failover condition occurs, the following sequence of events takes place:
- any Orion services still running on the primary Orion server are shut down
- the packet filter is removed from the secondary server’s public NIC, the Orion services are started on the secondary server, and the secondary becomes the active server
- the primary Orion server, which is now down, becomes the passive server and, if it is still online, a packet filter is placed on its public NIC
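The sequence above can be modeled with a tiny Python sketch. This is only a toy model of the role swap, not how the FoE is implemented: the `Server` class, its fields, and the `fail_over` function are hypothetical, and the steps are applied in the same order as the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    """Toy model of one Orion server in the pair (hypothetical fields)."""
    name: str
    online: bool = True
    services_running: bool = False
    packet_filter_on: bool = False  # True means the public NIC is blocked


def fail_over(active: Server, passive: Server) -> List[str]:
    """Swap roles following the steps listed above; returns a log of steps."""
    log = []

    # Step 1: shut down any Orion services still running on the active server.
    if active.services_running:
        active.services_running = False
        log.append(f"stopped remaining services on {active.name}")

    # Step 2: unblock the passive server's public NIC and start its services.
    passive.packet_filter_on = False
    passive.services_running = True
    log.append(f"{passive.name} is now active")

    # Step 3: if the old active server is still online, block its public NIC
    # so the shared IP address is only visible from one machine.
    if active.online:
        active.packet_filter_on = True
        log.append(f"packet filter placed on {active.name}")

    return log
```

For example, starting from `Server("primary", services_running=True)` and `Server("secondary", packet_filter_on=True)`, calling `fail_over(primary, secondary)` leaves the secondary unfiltered with services running and the primary filtered with services stopped.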
Your downtime is limited to the time it takes for the server failover to initiate and the services to start on the secondary Orion server. The other beautiful thing here is that, since both servers have the same IP address, you do not have to reconfigure your network to send Syslog, SNMP trap or NetFlow traffic to a new IP address.
This is just one specific use case that the Orion Failover Engine can handle. In another post I will walk through additional use cases and scenarios.