
I've got two servers installed in LAN failover mode, running NPM 10.1 with APM 3.5.
If I go into Failover Manager and do a switchover by clicking on the passive/secondary server and then "Make Active", there are no problems: it fails over as expected and behaves properly when failing back.
The failover timeout (under Server -> Monitoring, then the Configure Failover... button) is set to 60 secs. Under Network -> Configure Pings... I've got ping destinations set and the ping interval set to 10 secs.
Under Network -> Configure Auto-switchover, it's set to 10 pings.
If I just shut down the primary/active server, I'd think that the secondary server would see it lost connection with the primary over the channel and take over, but it doesn't. Is there something else I need to do? (PS: I'm gracefully shutting down the server by shutting down FoE on the primary (not the server group, just the engine) and then shutting down via Windows.)
From a NIC/IP perspective, the primary front-end NICs are both in the same switch, same VLAN, and the channel connection is a crossover cable directly between both servers.
I do NOT have a secondary management IP address on the primary front-end NICs.
Am I doing something obviously wrong here? Or should this be working and I should open up a support case?
Thanks in advance,
Bill
Also, generally after I reboot the primary/active NPM server, it comes up and shows everything is up, and IIS shows the website started, but I get this after logging in:
There was an error communicating with the Orion server.
There was no endpoint listening at net.tcp://sv-lc-dc-sowfe1:17777/orion/core/businesslayer that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.
I have to use the Config Wizard to rebuild the website and then it works fine.
Hi Bill,
The failover to Secondary does not occur because it's a controlled shutdown of Primary (as you said, you are shutting it down gracefully).
In this case, the Primary server not responding is not viewed as a 'failure'. The other server was informed about the shutdown event, and replication / heartbeat were stopped intentionally.
The logic behind it is that it was the administrator's decision to bring the server down. If he wanted the other server to be active in the meantime, he would just perform a switchover (just the way you mentioned it, by selecting the server and using Make Active) prior to the shutdown.
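To illustrate that logic, here is a toy model in Python. It is only a sketch of the behaviour described above, not how FoE is actually implemented; the names `PeerStatus`, `shutdown_announced` and `should_fail_over` are made up for the example, and the 60-second default is just the failover timeout from your settings.

```python
from dataclasses import dataclass


@dataclass
class PeerStatus:
    """Hypothetical view the passive server has of its active peer."""
    heartbeat_lost: bool       # no heartbeat arriving over the channel
    shutdown_announced: bool   # peer reported a controlled stop beforehand
    seconds_silent: float      # time since the last heartbeat


def should_fail_over(peer: PeerStatus, failover_timeout: float = 60.0) -> bool:
    """Fail over only on an unannounced loss that outlasts the timeout."""
    if not peer.heartbeat_lost:
        return False
    if peer.shutdown_announced:
        # Administrator stopped the server on purpose: not a failure.
        return False
    return peer.seconds_silent >= failover_timeout


# Graceful shutdown of the engine, then Windows: no failover.
print(should_fail_over(PeerStatus(True, True, 300.0)))
# Cables pulled without warning, silent for longer than the timeout: failover.
print(should_fail_over(PeerStatus(True, False, 75.0)))
```

The point is simply that an announced stop short-circuits the check, no matter how long the peer stays silent.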
If you want to test the failover scenario but would rather avoid ugly things like 'ripping out the power cable' to shut down the machine all of a sudden, I'd suggest this:
- Suddenly unplug both network cables from the machine. Secondary should register this failure and after the set period trigger the failover.
- To connect the Primary back in, we need it to become Passive so we don't get both servers Active at the same time the moment we plug the cables back in.
(Since we didn't really stop the Primary, only disconnected it to simulate the failure, the server remained in the Active state.)
- Simply restarting (the usual way is fine) the Primary should be sufficient (if I remember correctly - the server should not be automatically made Active after startup).
- After the restart, reconnect the Primary's interfaces and start FoE. It should hook up with the Secondary, and you can resume replication / perform a switchover back to make Primary Active (note: the synchronization check needs to finish before you can perform the switchover).
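As a rough guide to how long to wait after unplugging the cables in the first step, the settings from your post can be turned into seconds. This is a back-of-envelope sketch only: it assumes the channel failover timeout and the ping-based auto-switchover are independent timers and that missed pings are counted consecutively at the ping interval, which I have not verified against the product. The function name is made up for the example.

```python
def seconds_until_trigger(failover_timeout_s: int,
                          ping_interval_s: int,
                          missed_pings: int) -> dict:
    """Estimate the two detection windows from the configured values."""
    return {
        # Channel heartbeat lost for the whole failover timeout.
        "channel_timeout_s": failover_timeout_s,
        # Assumed: N consecutive missed pings, one per ping interval.
        "ping_based_s": ping_interval_s * missed_pings,
    }


# Values from the original post: 60 s timeout, 10 s interval, 10 pings.
windows = seconds_until_trigger(60, 10, 10)
print(windows)
print("wait at least", max(windows.values()), "seconds before judging the test")
```

With those values the two windows come out at 60 and 100 seconds, so give the test a couple of minutes before concluding that nothing happened.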
KubaM.