Are you afraid of the dark?

Alright, so I don’t mean afraid of the dark as in no lights on in the room.  What I do mean is being left in the dark when the server Orion runs on fails (due to a hard drive or OS failure, for example) and you have no visibility into your network.

What do you do if this happens?  Sleep soundly, my dear friend, we’ve got you covered with the new SolarWinds Orion Failover Engine.  The Failover Engine (FoE for short) will monitor and protect Orion, including all of the installed modules, additional pollers and even EOC if you have it.  As you can see in the marketecture graphic below, you can monitor and protect all of your Orion machines.

image

Let me take you on a feature tour of FoE.  The FoE client, which can be run from the Orion server or loaded on your desktop, allows you to configure failover settings and monitor the current status of all your FoE installations.

 FoE Summary

The FoE monitors all of the SolarWinds Orion services, including the IIS web server, on both the primary and secondary Orion servers.  For each service, you can define what happens if the service stops or fails.  This lets you build some self-healing behavior into the FoE instead of just doing a flat-out failover.  For example, say the Orion Information Service stops.  Since this is the first time, let’s go ahead and restart the service.  Hmm, nope, that didn’t work; it started and then stopped again.  Maybe there is a dependency on one of the other Orion services, so let’s go ahead and restart the entire Orion application.  OK, the service didn’t start that time either, so something must be wrong.  Let’s go ahead and initiate a failover to the secondary server.

 FoE App Service Actions
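
To make the escalation idea concrete, here is a minimal Python sketch of the first/second/third failure behavior described above.  It is not FoE code or an FoE API; the names `Action`, `ESCALATION` and `action_for_failure` are made up for illustration, and the policy simply mirrors the example in the text.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical action names, mirroring the example in the post.
    RESTART_SERVICE = "restart the failed service"
    RESTART_APPLICATION = "restart the entire Orion application"
    FAILOVER = "fail over to the secondary server"


# Assumed policy: one action per consecutive failure of the same service.
ESCALATION = [Action.RESTART_SERVICE, Action.RESTART_APPLICATION, Action.FAILOVER]


def action_for_failure(failure_count: int) -> Action:
    """Return the action for the nth consecutive failure (1-based).

    Failures beyond the last configured step keep returning the last step.
    """
    if failure_count < 1:
        raise ValueError("failure_count must be at least 1")
    index = min(failure_count, len(ESCALATION)) - 1
    return ESCALATION[index]


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"failure {n}: {action_for_failure(n).value}")
```

In the real product you pick these actions per service in the FoE client; the sketch only shows how an ordered list of actions maps onto repeated failures.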

The Orion FoE doesn’t just monitor the services; it also monitors key OS, server and web server statistics and, just like above, you can define the behavior on the first, second and third event.  For example, if free hard drive space on the server drops to 15%, send an email to user A.  If it drops to 10%, email user B.  If it gets down to 5%, initiate a failover to the secondary server.

 FoE App Rules
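
The disk space example boils down to a small table of thresholds.  Below is a rough Python sketch of that idea, assuming the percentages refer to free space and that the most severe matching rule wins; `DISK_RULES` and `disk_action` are hypothetical names, not part of the FoE.

```python
from typing import Optional

# (free-space threshold in percent, action), ordered from most to least severe.
# The thresholds and actions come from the example in the post.
DISK_RULES = [
    (5.0, "initiate failover to the secondary server"),
    (10.0, "email user B"),
    (15.0, "email user A"),
]


def disk_action(free_percent: float) -> Optional[str]:
    """Return the action for the most severe threshold reached, or None."""
    for threshold, action in DISK_RULES:
        if free_percent <= threshold:
            return action
    return None


if __name__ == "__main__":
    for free in (20.0, 15.0, 10.0, 5.0):
        print(f"{free}% free: {disk_action(free)}")
```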

The Orion FoE also has a built-in alerting engine, independent of the Orion alerting infrastructure, which notifies you of any key statistic or event occurring within the FoE or the systems it is monitoring.

FoE Alerts

Let’s get into more of the specifics of what is going on in the back end.  The FoE supports multiple hardware deployment configurations:

  • Physical to Physical
  • Physical to Virtual
  • Virtual to Virtual

Once you get Orion installed and set up, the FoE creates a clone of the critical configuration, registry and file system parameters, which is restored to the secondary server.  Once that initial setup is complete, a set of real-time file and registry filters, as you can see below, replicates any file system and registry changes to the secondary.  This way your secondary Orion server is always up to date with what is going on within the primary Orion server.

 FoE Data Replication Filters

Let’s walk through the specifics of how it works in a high availability scenario using the below diagram as our guide.

The Orion primary and secondary servers are located within the same subnet and share the same identity, including the IP address.  Since two identical IP addresses cannot be live on the network at the same time, a packet filter is installed on the secondary server’s public NIC so it is not broadcasting or receiving traffic.  A second NIC connects the two servers and handles the heartbeat and real-time data replication between the primary and secondary.

When a failover condition occurs, the following sequence of events takes place:

  • any Orion services still running on the primary Orion server are shut down
  • the packet filter is removed from the secondary NIC, the Orion services are started on the secondary server, and the secondary server becomes the active server
  • the primary Orion server, which is now down, becomes the passive server and, if it is still online, a packet filter is placed on its primary NIC
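
For readers who think in code, the sequence above can be sketched as a small state change on two server records.  This is only an illustration under stated assumptions: `Server` and `fail_over` are invented names, the fields are a simplification, and none of this reflects how the FoE is actually implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    # Hypothetical model of one Orion server in a failover pair.
    name: str
    active: bool
    packet_filter_on: bool  # True means the network-facing NIC is filtered
    services_running: bool
    online: bool = True


def fail_over(primary: Server, secondary: Server) -> List[str]:
    """Apply the three steps from the list above and return a log of them."""
    log = []

    # Step 1: shut down whatever Orion services are still running on the primary.
    primary.services_running = False
    log.append(f"stopped Orion services on {primary.name}")

    # Step 2: remove the filter, start services, and make the secondary active.
    secondary.packet_filter_on = False
    secondary.services_running = True
    secondary.active = True
    log.append(f"{secondary.name} is now the active server")

    # Step 3: the old primary becomes passive; filter its NIC if it is still online.
    primary.active = False
    if primary.online:
        primary.packet_filter_on = True
        log.append(f"packet filter placed on {primary.name}")

    return log


if __name__ == "__main__":
    a = Server("primary", active=True, packet_filter_on=False, services_running=True)
    b = Server("secondary", active=False, packet_filter_on=True, services_running=False)
    for line in fail_over(a, b):
        print(line)
```

The point of the ordering is that the shared IP address is never unfiltered on both servers at once while both are online.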

Your downtime is limited to the time it takes for the failover to initiate and the services to start on the secondary Orion server.  The other beautiful thing is that, since both servers share the same IP address, you do not have to reconfigure your network to send Syslog, SNMP trap or NetFlow traffic to a new IP address.

This is just one specific use case that the Orion Failover Engine can handle.  In another post I will walk through additional use cases and scenarios.

If you need more information, please check out the product page here.  You can also request a demo from one of our SEs by emailing your sales rep or clicking here.


  • I certainly can see a need for this in some organizations.  Frankly, in mine we rely on NPM to tell us where to concentrate our efforts to get the biggest bang for our time.

    But now that VM is a major player in our organization, and our pollers and NPM are all in VM, I see less need for the FOE here.

    If this were a major ISP or online retailer, I'd be on board for ordering it today.  As it is, my environment can live for the time it takes to spin up a new instance and migrate NPM onto it.

    I hope . . .
