I have to say, the new Orion High Availability (HA) is lovely to use. Controlled by an attractive UI, it is quick, intuitive, and easy to deploy.
However, it doesn't give true HA, because every time we install a hotfix, patch or upgrade, we end up with ~50 minutes of loss of service.
All of the patches/upgrades require HA to be stopped and then the active Orion server to be upgraded first.
Then the standby server is patched/upgraded, whilst being kept in standby throughout the upgrade.
The problem, it seems, is the database, which needs to be upgraded together with the primary Orion server.
So why can't the procedure create a temporary database instance, which would allow the primary node to continue polling, receiving incoming traffic and, most importantly, alerting, whilst the secondary node is patched/upgraded?
The standby server and the database would be upgraded whilst the active server continues polling and collects its data into the temporary database.
This way the original database can be upgraded whilst the primary Orion node does its thing into the temporary database.
Then, when the original database and the secondary Orion server are up to date, make the secondary the primary Orion server and have it take over the monitoring.
The server that is now on standby can then be upgraded without interruption to monitoring.
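To make the order of operations I have in mind a bit more concrete, here is a rough Python sketch. It is only a toy model of the idea, not anything Orion actually exposes: the `Node` class, the `rolling_upgrade` function and the "main"/"temp" database labels are all names I have made up for illustration. The one thing it checks is the property I care about, namely that at least one node is polling after every step.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for an Orion server (not a real Orion API)."""
    name: str
    version: int
    polling: bool    # True while this node is the active poller
    target_db: str   # database the node writes to: "main" or "temp"


def rolling_upgrade(primary: Node, secondary: Node, new_version: int) -> list[str]:
    """Walk through the proposed steps and return a log of them."""
    log: list[str] = []

    def step(message: str) -> None:
        # The whole point of the proposal: monitoring never stops.
        assert primary.polling or secondary.polling, f"outage during: {message}"
        log.append(message)

    # 1. The active node keeps polling, but writes into a temporary database.
    primary.target_db = "temp"
    step("primary redirected to temporary database")

    # 2. The main database and the standby node are upgraded together.
    secondary.version = new_version
    step("main database and secondary upgraded")

    # 3. Fail over: start the secondary before stopping the old primary.
    secondary.polling, secondary.target_db = True, "main"
    primary.polling = False
    step("secondary promoted to active")

    # 4. The old primary is now the standby and can be upgraded at leisure.
    primary.version, primary.target_db = new_version, "main"
    step("former primary upgraded as standby")

    return log


if __name__ == "__main__":
    a = Node("orion-a", version=1, polling=True, target_db="main")
    b = Node("orion-b", version=1, polling=False, target_db="main")
    for line in rolling_upgrade(a, b, new_version=2):
        print(line)
```

In a real deployment each step would of course be far more involved (schema migrations, licensing, additional pollers and so on), so treat this purely as an illustration of the sequencing.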
Then, once the upgrade is finished on the rest of the Orion servers, the collected data from the temporary database can be processed back into the primary database, backfilling the gaps in the data.
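The backfill step could look something like the sketch below. It uses Python's built-in `sqlite3` module with in-memory databases purely so that it runs anywhere; Orion itself sits on SQL Server, and the `samples` table, its columns and the `backfill` function are all assumptions of mine, not the real Orion schema. The idea is simply to copy every sample from the temporary database into the main one and skip anything that is already there.

```python
import sqlite3

# Hypothetical schema: one row per node per polling timestamp.
SCHEMA = """
CREATE TABLE samples (
    node_id INTEGER NOT NULL,
    ts      INTEGER NOT NULL,
    value   REAL    NOT NULL,
    PRIMARY KEY (node_id, ts)
)
"""


def make_db(rows: list[tuple[int, int, float]]) -> sqlite3.Connection:
    """Create an in-memory database preloaded with the given samples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]


def backfill(main: sqlite3.Connection, temp: sqlite3.Connection) -> int:
    """Copy samples from temp into main; return how many rows were added."""
    rows = temp.execute("SELECT node_id, ts, value FROM samples").fetchall()
    before = count(main)
    # OR IGNORE skips rows whose (node_id, ts) already exists in main.
    main.executemany("INSERT OR IGNORE INTO samples VALUES (?, ?, ?)", rows)
    main.commit()
    return count(main) - before


if __name__ == "__main__":
    # Main database stopped receiving data after ts=60 (upgrade started).
    main_db = make_db([(1, 0, 10.0), (1, 60, 11.0)])
    # Temporary database collected data during the upgrade, with one overlap.
    temp_db = make_db([(1, 60, 11.0), (1, 120, 12.0), (1, 180, 13.0)])
    print("rows backfilled:", backfill(main_db, temp_db))
```

Keying on node and timestamp means the backfill can be re-run safely if it is interrupted, since rows that were already copied are simply ignored.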
Now, I know this is a very high-level thought about something that would no doubt involve a considerable amount of work, but the result would be a true HA solution.