Also, after I reboot the primary (active) NPM server, it generally comes back up, shows everything as up, and IIS shows the website as started, but after logging in I get this:
Orion Website Error
There was an error communicating with the Orion server.
There was no endpoint listening at net.tcp://sv-lc-dc-sowfe1:17777/orion/core/businesslayer that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.
I have to run the Config Wizard to rebuild the website, and then it works fine.
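The error says nothing was listening on TCP port 17777 on `sv-lc-dc-sowfe1`. A quick way to confirm that before rerunning the Config Wizard is a plain TCP connection attempt. The sketch below is a generic check and is not a SolarWinds tool. The function name `port_is_listening` and the 3-second timeout are arbitrary choices. It only proves that *something* accepts connections on that port, not that the Orion business layer service behind it is healthy.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Only TCP reachability is tested; a True result does not prove the
    Orion business layer (or any WCF service) is working correctly.
    """
    try:
        # create_connection resolves the host name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

For example, running `port_is_listening("sv-lc-dc-sowfe1", 17777)` on the NPM server right after the reboot would show whether the business layer port came up at all.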
The failover to the Secondary does not occur because this is a controlled shutdown of the Primary (as you said, you are shutting it down gracefully).
In this case, the Primary not responding is not treated as a 'failure': the other server was informed of the shutdown event, and replication and the heartbeat were stopped intentionally.
The reasoning is that bringing the server down was the administrator's decision. If they wanted the other server to be active in the meantime, they would simply perform a switchover before the shutdown (as you mentioned, by selecting the server and using Make Active).
If you want to test the failover scenario but would rather avoid something as ugly as pulling the power cable to shut the machine down abruptly, I'd suggest this:
- Unplug both network cables from the Primary at once. The Secondary should register this as a failure and, after the configured period, trigger the failover.
- Before reconnecting the Primary, it needs to become Passive so that both servers are not Active at the same time the moment the cables are plugged back in.
(Since we didn't actually stop the Primary but only disconnected it to simulate the failure, it remained in the Active state.)
- Simply restarting the Primary (the usual way is fine) should be sufficient: if I remember correctly, the server is not automatically made Active after startup.
- After the restart, reconnect the Primary's network interfaces and start FoE. It should connect to the Secondary, and you can then resume replication and perform a switchover back to make the Primary Active. (Note: the synchronization check must finish before you can perform the switchover.)
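To make the difference between a controlled shutdown and a simulated failure concrete, here is a toy model of the decision logic described above. It is not FoE code and does not reflect how FoE is actually implemented. `PeerMonitor`, `failover_timeout_s`, `safe_to_reconnect`, and the role strings are all hypothetical names, and the 60-second default is an arbitrary placeholder for "the configured period".

```python
from dataclasses import dataclass


@dataclass
class PeerMonitor:
    """Hypothetical model of how a passive node decides whether to fail over.

    failover_timeout_s stands in for the configurable failover period;
    the name and the default value are assumptions for illustration only.
    """
    failover_timeout_s: float = 60.0
    last_heartbeat: float = 0.0
    graceful_shutdown_notified: bool = False

    def on_heartbeat(self, now: float) -> None:
        # A fresh heartbeat means the peer is back; clear any shutdown notice.
        self.last_heartbeat = now
        self.graceful_shutdown_notified = False

    def on_shutdown_notice(self) -> None:
        # The active node announced a controlled shutdown, so the silence
        # that follows is intentional and must not trigger a failover.
        self.graceful_shutdown_notified = True

    def should_fail_over(self, now: float) -> bool:
        if self.graceful_shutdown_notified:
            return False
        # Unannounced silence (e.g. pulled network cables) beyond the period.
        return now - self.last_heartbeat >= self.failover_timeout_s


def safe_to_reconnect(primary_role: str, secondary_role: str) -> bool:
    """Reconnecting is only safe if it will not leave two Active nodes."""
    return not (primary_role == "Active" and secondary_role == "Active")


# Simulated failure: cables pulled, no shutdown notice was sent.
pulled = PeerMonitor(failover_timeout_s=60.0)
pulled.on_heartbeat(now=0.0)
print(pulled.should_fail_over(now=30.0))  # False: still within the period
print(pulled.should_fail_over(now=61.0))  # True: Secondary takes over

# Controlled shutdown: the peer was told, so no failover ever happens.
graceful = PeerMonitor(failover_timeout_s=60.0)
graceful.on_heartbeat(now=0.0)
graceful.on_shutdown_notice()
print(graceful.should_fail_over(now=1000.0))  # False
```

`safe_to_reconnect("Active", "Active")` returns `False`, which mirrors why the disconnected Primary should be restarted into the Passive role before its cables go back in.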