2 Replies Latest reply on Jan 19, 2011 1:22 PM by KubaM

    Failover engine - switchover works, but doesn't switch during actual failover?

      So I've got 2 servers installed in LAN failover mode: NPM 10.1 with APM 3.5.

      If I go into Failover Manager and do a switchover by clicking the Passive/secondary server and then "Make Active", there are no problems: it fails over as expected and behaves properly when failing back.

      The failover timeout (Server -> Monitoring, then the Configure Failover... button) is set to 60 secs. Under Network -> Configure Pings... I've got ping destinations set and the ping interval set to 10 secs.

      Under Network -> Configure Auto-switchover, it's set for 10 pings.

       

      If I just shut down the primary/active server, I'd think the Secondary server would see that it lost its connection with the Primary over the channel and take over, but it doesn't. Is there something else I need to do? (PS: I'm gracefully shutting down the server by shutting down FoE on the primary (not the server group, just the engine) and then shutting down via Windows.)

      From a NIC/IP perspective, the primary front-end NICs are both on the same switch and the same VLAN, and the channel connection is a crossover cable run directly between the two servers.

      I do NOT have a secondary management IP address on the primary front-end NICs.

      Am I doing something obviously wrong here?  Or should this be working and I should open up a support case?

      thanks in advance,

      Bill 

        • Re: Failover engine - switchover works, but doesn't switch during actual failover?

          Also, generally after I reboot the primary/active NPM server, it comes up and shows everything is up, and IIS shows the website started, but I get this after logging in:

           

          Orion Website Error

          There was an error communicating with the Orion server.

          Additional Information

              There was no endpoint listening at net.tcp://sv-lc-dc-sowfe1:17777/orion/core/businesslayer that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.

          I have to use the Config Wizard to rebuild the website and then it works fine.
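
The error says nothing was listening on port 17777 of sv-lc-dc-sowfe1 when the website tried to connect. Before re-running the Config Wizard, a plain TCP connect test can show whether that is still the case. Below is a minimal Python sketch; `is_port_open` is a made-up helper name, not part of Orion, and a successful connect only shows that something is listening, not that the business layer behind it is healthy.

```python
import socket

def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `is_port_open("sv-lc-dc-sowfe1", 17777)` from the web server right after the reboot, and again a few minutes later, would at least show whether the listener simply starts late or never comes up at all.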

          • Re: Failover engine - switchover works, but doesn't switch during actual failover?
            KubaM

            Hi Bill,

            The failover to Secondary does not occur because it's a controlled shutdown of Primary (as you said, you are shutting it down gracefully).

            In this case, the Primary server not responding is not viewed as a 'failure'. The other server was informed about the shutdown event, and replication / heartbeat were stopped intentionally.

            The logic behind it is that it was the administrator's decision to bring the server down. If they wanted the other server to be active in the meantime, they would perform a switchover (the way you mentioned, by selecting the server and using Make Active) prior to the shutdown.
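
The distinction can be written down as a small decision rule. This is only a toy model in Python of the behaviour described above, not FoE's actual logic: the function and parameter names are invented for illustration, and the 60-second default is simply the failover timeout from the original post.

```python
def should_fail_over(shutdown_announced, seconds_since_heartbeat, timeout_secs=60):
    """Toy model: should the passive server promote itself to Active?"""
    if shutdown_announced:
        # The active server announced a deliberate shutdown,
        # so its silence is not treated as a failure.
        return False
    # Unannounced silence counts as a failure once it reaches the timeout.
    return seconds_since_heartbeat >= timeout_secs
```

In this model a graceful shutdown never triggers a takeover no matter how long the Primary stays down, while a pulled cable does once the timeout has elapsed.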

             

            If you want to test the failover scenario but would rather avoid ugly things like 'ripping out the power cable' to shut the machine down all of a sudden, I'd suggest this:

            - Suddenly unplug both network cables from the machine. Secondary should register this failure and, after the set period, trigger the failover.

            - To connect the Primary back in, we need it to become Passive so we don't get both servers Active at the same time the moment we plug the cables back in.
            (Because we didn't really stop the Primary, only disconnected it to simulate the failure, the server remained in the Active state.)

            - Simply restarting the Primary (the usual way is fine) should be sufficient (if I remember correctly, the server should not automatically be made Active after startup).
            - After the restart, reconnect the Primary's interfaces and start FoE. It should hook up with the Secondary, and you can resume replication / perform a switchover back to make Primary Active. (Note: the synchronization check needs to finish before you can perform the switchover.)
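
To see why the restart has to happen before the cables go back in, the roles can be traced step by step. The Python sketch below is purely illustrative and does not talk to FoE; `walk_through` and the role constants are invented names, and step 2 relies on the assumption stated above that a restarted Primary does not come back as Active.

```python
ACTIVE, PASSIVE = "Active", "Passive"

def walk_through():
    """Return (step, primary_role, secondary_role) for each step of the test."""
    primary, secondary = ACTIVE, PASSIVE
    steps = [("start", primary, secondary)]

    # 1. Cables pulled: the Secondary times out and takes over, while the
    #    isolated Primary still believes it is Active.
    secondary = ACTIVE
    steps.append(("cables unplugged", primary, secondary))

    # 2. Primary restarted while still disconnected; assumed to come
    #    back as Passive rather than Active.
    primary = PASSIVE
    steps.append(("primary restarted", primary, secondary))

    # 3. Cables reconnected and FoE started: only one server is Active.
    steps.append(("cables reconnected", primary, secondary))
    return steps

for step, primary, secondary in walk_through():
    print(f"{step}: primary={primary}, secondary={secondary}")
```

The second row is the dangerous one: both servers consider themselves Active, which is harmless only while the Primary remains unplugged.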

             

            KubaM.