7 Replies Latest reply on Jul 24, 2017 7:10 PM by aLTeReGo

    Torture Testing High Availability

    aLTeReGo

      A few of you have asked for failover scenarios you can use to test High Availability in the beta. Below I outline a few that can be tested in Beta 1. Additional testing scenarios will be added once Beta 2 is available.

       

       

      Test #1 - Network Connectivity Failure

       

      What to do: Unplug Network Cable or Disable Network Interface on the 'Active' member in the pool
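      If you would rather script the adapter change than click through Windows, here is a minimal sketch using the standard Windows `netsh interface set interface` command. The adapter name "Ethernet" and the helper names `adapter_command` and `set_adapter` are placeholders of mine, not anything from the HA beta; check the real adapter name with `netsh interface show interface`, and run this from an elevated prompt on the 'Active' member.

```python
import subprocess
from typing import List


def adapter_command(adapter_name: str, enable: bool) -> List[str]:
    """Build the netsh argument list that enables or disables one adapter.

    adapter_name is whatever Windows calls the interface (assumed here to be
    something like "Ethernet"); it is not defined anywhere in the HA beta.
    """
    state = "enabled" if enable else "disabled"
    return [
        "netsh", "interface", "set", "interface",
        f"name={adapter_name}",
        f"admin={state}",
    ]


def set_adapter(adapter_name: str, enable: bool) -> None:
    """Run the netsh command; raises CalledProcessError if netsh fails."""
    subprocess.run(adapter_command(adapter_name, enable), check=True)


# Example (run locally on the 'Active' member, as Administrator):
#   set_adapter("Ethernet", enable=False)   # start the test
#   set_adapter("Ethernet", enable=True)    # restore before the next test
```

      Keeping the command builder separate from the call that runs it lets you print and review the exact command before disconnecting a server you may only be able to reach over that same interface.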

      What to Expect: Failover should occur within a minute or two of disconnecting the server from the network. The server which was previously in 'Standby' mode should now be 'Active'.
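      To put a number on "a minute or two", you could time the failover with something like the sketch below. It is only an illustration: `wait_for_failover` and `standby_is_active` are hypothetical names, and how you decide that the standby has become 'Active' (for example, by watching the deployment summary page) is left to a function you supply, since nothing here calls an actual Orion API.

```python
import time
from typing import Callable, Optional


def wait_for_failover(
    standby_is_active: Callable[[], bool],
    timeout_s: float = 180.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the standby reports active.

    Returns the elapsed seconds at the first successful check, or None if
    timeout_s passes without one. standby_is_active is a caller-supplied
    check (hypothetical); clock and sleep are injectable for testing.
    """
    start = clock()
    while True:
        if standby_is_active():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

      Start the timer at the moment you trigger the failure. The result is only as precise as the polling interval, so a 5 second interval can overstate the failover time by up to 5 seconds.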

       

      Connectivity Failure.png

      Disable Windows Adapter.png

       

      Note: Ensure you re-enable the network interface or reconnect the network cable before moving on to test #2.

       

      Test #2 - Power Failure

       

      What to do: Pull Power Plug or Forcibly Power Off The Virtual Machine of the 'Active' member in the pool.

      Alternative Test Path: Crash Windows with the Blue Screen of Death

      What to Expect: Failover should occur within a minute or two of the server losing power. The server which was previously in 'Standby' mode should now be 'Active'.

       

      Power Failure.png

      Power Off.png

       

      Note: Be sure to power the server you shut down back on before moving on to test #3.

       

      Test #3 - Application Failure

       

      What to do: Forcibly terminate critical Orion processes via Task Manager or Stop Orion Services on the 'Active' member in the pool.

      What to Expect: Failover should occur within a minute or two of stopping Orion services or terminating a critical Orion process. The server which was previously in 'Standby' mode should now be 'Active'.

      Terminate Process.png

      Stop Service.png

       

       

       

       

      Test #4 - Force a Manual Failover

       

      From the 'Orion Deployment Summary' located under [Settings -> All Settings -> High Availability Deployment Summary], select the Pool. From the right panel, click the 'Commands' drop-down and select 'Force Failover'.

      Fore Failover.png

       

      Test #5 - Catastrophic Database Failure

      What to do: Power off, disconnect, or otherwise cause the database server to become inaccessible to both the primary and secondary servers in the HA pool.

      What to Expect: When this occurs both members are in isolation mode, meaning neither can communicate with the other or with the database. In this situation failover does not occur because neither member is better off than the other. Polling remains on the active member, which queues its results until database connectivity is restored. The passive member stays passive, since it can communicate with neither the database nor the active pool member.

      Catastrophic Database Failure.png