High Availability 2.0 provides the first peek into supporting redundancy for Orion across subnets. This was previously referred to as a WAN deployment or Disaster Recovery with the Failover Engine, but under High Availability we refer to this simply as a multi-subnet failover configuration. In other words, this provides the same automated, near-instantaneous failover and recovery mechanisms as High Availability does in its first release, but extends that functionality to support pollers spread across different subnets. Those could be different sites, a dedicated disaster recovery location, or possibly even the cloud.
When installing the Primary Orion server you will follow the normal 'Advanced' installation process that you would for any other Orion product. Be sure not to select the 'Express' install option during installation, as a separate server running Microsoft SQL 2012 or later is required. When the Configuration Wizard runs you will be prompted to provide the Username, Password, and IP address of the SQL server you will be using for the installation.
Once the primary server is up and running using the NPM 12.2 installer, you will need to perform a similar installation on the secondary server using the separate High Availability installer which can be downloaded from within the Orion web interface under [Settings -> All Settings -> High Availability Deployment Summary -> Setup A New HA Server -> Get Started Setting Up a Server -> Download Installer Now].
Download the High Availability Secondary Server Installer
Next, run the installation by double-clicking the "SolarWinds-Orion-Installer.exe" file downloaded or copied to the secondary server. Enter the IP address or fully qualified domain name (FQDN) of your main Orion server, along with the 'Admin' or equivalent credentials used to log into the Orion web interface, and click 'Next'. On the following step of the Wizard, select the additional server role you wish to install. Since this will be a High Availability Backup for the main Orion server, select 'Backup Server for Main Server Protection' and click 'Next'.
Enter IP of Main Orion Server & Provide 'Admin' Credentials
Select Server Role to Install
Once the installation completes, the Configuration Wizard will start. When prompted to provide information regarding the SQL server database, ensure you use the same SQL instance and SQL database that were chosen for the primary Orion server.
The following video, while arguably boring to watch, demonstrates the secondary server installation process.
As soon as both the primary and secondary servers are installed, return to the Orion web interface under [Settings -> All Settings -> High Availability Deployment Summary]. There you will be able to join the two servers into a multi-subnet failover pool.
|Click 'Set up High Availability Pool'|
|Enter a Virtual Hostname and click 'Next'|
|Select your DNS Server Type|
Enter the IP address of your DNS server, the DNS zone (e.g. solarwinds.com), and administrative credentials for the DNS server to create the shared virtual hostname.
If you are running BIND DNS, enter the IP address of your BIND DNS server, the DNS Zone, your TSIG secret key name, and the TSIG shared secret key value.
Once complete, review the summary and click 'Create Pool'.
When done, you will have pooled two Orion servers together across multiple subnets into a redundant, high availability pool.
The following short video walks through this process in under a minute.
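Once the pool exists, one quick sanity check is to see which pool member the shared virtual hostname currently resolves to. The following is only a minimal sketch, assuming the virtual hostname's DNS record points at whichever member is currently active; the function names, the example hostname, and the member addresses are hypothetical and are not part of any SolarWinds tooling.

```python
import socket
from typing import Dict, List, Optional


def resolve_addresses(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IP addresses that a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def member_behind_hostname(virtual_hostname: str,
                           members: Dict[str, str]) -> Optional[str]:
    """Return the name of the pool member the virtual hostname points at.

    `members` maps IP address -> member name (hypothetical values supplied
    by the caller). Returns None if no resolved address matches a member.
    """
    for address in resolve_addresses(virtual_hostname):
        if address in members:
            return members[address]
    return None
```

For example, calling `member_behind_hostname("orion-ha.example.com", {"10.1.0.10": "primary", "10.2.0.10": "secondary"})` before and after a manual failover should show the name changing, keeping in mind that a cached DNS answer can delay what the client sees.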
pratikmehta003, there's quite a bit going on here, from both pool members reporting a 'down' status to the HA licenses not being assigned. Note that HA licenses will be consumed automatically if there are unused HA licenses available. License assignment is simply a mechanism for moving HA licenses around between pools if needed. It sounds more likely to me that the issue you are having is orphaned entries in your HA database table. I would suggest you ask support to check there first.
Yes, I suspect the same and have been telling the support engineer about it, but he is not listening... I did try rebooting the secondary today and ran the Configuration Wizard, but it didn't help much... I got an error for the license business layer on the secondary... The primary is showing green now; earlier the HA service was not running, which is why it was red. I will dig into the DB from my side. Any other recommendations from your side, or any KBs that I can follow?
We are trying to configure HA between our polling engines. BIND is out. We do have some Microsoft DNS, but our 'DNS people' in the company insist on not using WMI: "WMI queries are expensive and nowadays there are much better ways of achieving this, e.g. AD Web Services".
I'm a networking guy, so I don't know much about this. Does anyone have any ideas about how we could set this up?
We do have F5 GTMs which do DNS. However, they do "health checks" to see which sides are up/down. As both the active and standby nodes return the HTTPS login page (with the same responses), a health check from an external source to decide which server is active/standby doesn't work. I'd need to develop some form of intelligent health check, or otherwise follow this ADWS method that I know nothing about.
Any help much appreciated!
The SolarWinds information service only runs on the 'Active' member and should be used as your health check when front-ending HA with a load balancer. The Information Service port runs on TCP 17777.
Thanks. I just tried a telnet test to both Active and Standby on 17778. They both give no response (but not closed). I checked our firewalls and they just see the connection "Aged out". No response is fine, but they are both giving me the same (no) response, unless there is an HTTP GET or something I could perform to expect something back for a health check.
Interesting, look at what I have in my env:
netstat -ona | find ":17778"
TCP 0.0.0.0:17778 0.0.0.0:0 LISTENING 4
TCP 10.160.198.163:17778 10.160.198.99:49824 ESTABLISHED 4
TCP 10.160.198.163:17778 10.160.198.99:49826 ESTABLISHED 4
TCP [::]:17778 [::]:0 LISTENING 4
netstat -ona | find ":17778"
TCP 10.160.198.99:49824 10.160.198.163:17778 ESTABLISHED 2088
TCP 10.160.198.99:49826 10.160.198.163:17778 ESTABLISHED 2088
For me, only active one would respond to "telnet" on 17778. Also using TCPING (from Eli => tcping.exe - ping over a tcp connection )
c:\TOOLS>tcping 10.160.198.163 17778
Probing 10.160.198.163:17778/tcp - Port is open - time=1.987ms
Probing 10.160.198.163:17778/tcp - Port is open - time=1.186ms
Probing 10.160.198.163:17778/tcp - Port is open - time=1.172ms
Probing 10.160.198.163:17778/tcp - Port is open - time=1.163ms
Ping statistics for 10.160.198.163:17778
4 probes sent.
4 successful, 0 failed.
Approximate trip times in milli-seconds:
Minimum = 1.163ms, Maximum = 1.987ms, Average = 1.377ms
c:\TOOLS>tcping 10.160.198.99 17778
Probing 10.160.198.99:17778/tcp - No response - time=2001.450ms
Probing 10.160.198.99:17778/tcp - No response - time=2000.853ms
Probing 10.160.198.99:17778/tcp - No response - time=2000.448ms
Probing 10.160.198.99:17778/tcp - No response - time=2001.178ms
Ping statistics for 10.160.198.99:17778
4 probes sent.
0 successful, 4 failed.
Was unable to connect, cannot provide trip statistics.
That's brilliant, thanks a lot. After some digging it looks like TCP/17778 was being blocked locally by the Windows Server firewall. No idea why I didn't get other issues. Anyhow, I have successfully opened this, and set up the F5 to monitor tcp/17778 as a health check. It works perfectly.
Thanks. Unfortunately, if I perform a test to that port, both Active and Standby servers appear open on that port.
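For anyone without tcping or an F5 at hand, the same kind of TCP connect probe can be sketched in a few lines of Python. This is only an illustration of the check described in this thread, assuming (as observed above) that only the active member accepts connections on TCP 17778; the function names are hypothetical, and a real load balancer monitor would be configured on the device itself.

```python
import socket
from typing import Iterable, Optional


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def first_responding(hosts: Iterable[str], port: int = 17778,
                     timeout: float = 2.0) -> Optional[str]:
    """Return the first host that accepts a connection on the port, else None."""
    for host in hosts:
        if port_is_open(host, port, timeout):
            return host
    return None
```

With the addresses from the tcping output above, `first_responding(["10.160.198.163", "10.160.198.99"])` would be expected to return the member that is currently active. If both members time out, check for a local firewall rule blocking the port before suspecting HA itself.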
telnet 10.x.x.x 17777
Connected to xxxxxx.abc.com.
Escape character is '^]'.
^CConnection closed by foreign host.
telnet 10.y.y.y 17777
Connected to yyyyyyy.abc.com.
Escape character is '^]'.
Is this not expected behaviour? IPs are masked, but 10.x.x.x is our Active node and 10.y.y.y is our Standby node.
Thanks in advance.
We are planning to set up a SolarWinds environment in Azure, including the HA component. In the current plan, both the primary and secondary polling engines will be in the same availability set.
I have gone through the responses you made and have some questions about how they apply to my environment. Please help me clarify them.
1. If we use a virtual hostname for servers built in the same subnet, are any application-related issues expected?
2. Why can a VIP not be used in a cloud environment? (Sorry, I'm not a cloud expert.)
3. I read that an F5 load balancer is used by some customers. Can we use the Azure load balancer here? If yes, do we need DNS entries to be created for the virtual hostname?
Can you please clarify the licensing when HA is being used - Do I need to buy 2 NPM licenses and one HA license or do I just need one NPM and one HA license? My initial understanding was that I only need the HA license as the topology is Active / Passive and the second node is only used when the primary node is down.
You only need 1 NPM license (or 1 of any other SolarWinds module license you plan on using with HA) and 1 HA license (per pool). The HA license makes the 2nd poller in the HA pool a standby. So in my environment, I have 1 MPE and 2 APEs. I have 1 license for each module we use, and 2 APE licenses. Because I have 3 polling engines, I have 3 HA pools, and therefore 3 HA licenses, but still only 1 license each for NPM, SAM, etc.
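To make the arithmetic in that reply explicit, here is a small worked sketch. It only encodes the counting rule described above (one license per module, one APE license per additional polling engine, one HA license per pool, one pool per protected polling engine); the function name and dictionary keys are hypothetical, and actual licensing terms should be confirmed with SolarWinds.

```python
from typing import Dict, List


def license_summary(additional_polling_engines: int,
                    modules: List[str]) -> Dict[str, object]:
    """Count licenses when every polling engine is protected by its own HA pool."""
    main_polling_engines = 1  # a deployment has a single main polling engine (MPE)
    ha_pools = main_polling_engines + additional_polling_engines
    return {
        "module_licenses": {module: 1 for module in modules},  # one per module, not per node
        "ape_licenses": additional_polling_engines,            # one per additional engine
        "ha_licenses": ha_pools,                                # one per HA pool
    }


# The environment described above: 1 MPE, 2 APEs, running NPM and SAM.
summary = license_summary(additional_polling_engines=2, modules=["NPM", "SAM"])
print(summary["ha_licenses"])   # 3
print(summary["ape_licenses"])  # 2
```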
Hope that helps!
Does an HA license also make that HA server act as an Orion Web front end server?
I was told today that my second HA box is a replica of the Primary polling engine, and will automatically become a web server with IIS on it. This doesn't seem plausible to me.
I'm thinking I need a web server license for HA, correct?
We want to provide redundancy obviously for the web front end. We need more than just an HA license I'm assuming.
The secondary server in a main HA pool will serve the Orion web interface when a failover occurs and it becomes the 'active' member. The secondary member's web interface is only available when it's the 'active' member.
I am planning to deploy the HA pair in Azure using availability sets, and I can't use a VIP in Azure. Is it possible to use Multi-Subnet HA if both nodes are in the same subnet? I want to use the DNS functionality to fail over rather than the VIP method.
You will need to have each Orion server in their own separate subnet if you plan to use a virtual hostname exclusively. Unfortunately, Virtual IPs will not work in a cloud environment and they are required for same-subnet deployments of HA.
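The rule in that reply can be expressed as a small decision helper. This is a minimal sketch that only restates the guidance above, assuming each server's address is given in CIDR notation so its subnet can be derived; the function name and return strings are hypothetical.

```python
import ipaddress


def failover_method(primary_cidr: str, secondary_cidr: str,
                    cloud: bool = False) -> str:
    """Suggest an HA failover method from the two servers' subnets.

    Addresses are given with a prefix length, e.g. "10.0.1.10/24".
    """
    primary_net = ipaddress.ip_interface(primary_cidr).network
    secondary_net = ipaddress.ip_interface(secondary_cidr).network

    if primary_net != secondary_net:
        # Different subnets: multi-subnet failover via the shared virtual hostname.
        return "virtual hostname"
    if cloud:
        # Same subnet needs a virtual IP, which does not work in a cloud environment.
        return "not supported: move the servers to separate subnets"
    return "virtual IP"


print(failover_method("10.0.1.10/24", "10.0.2.10/24", cloud=True))  # virtual hostname
print(failover_method("10.0.1.10/24", "10.0.1.11/24"))              # virtual IP
```

For the Azure case asked about above, placing the two nodes in separate subnets of the virtual network is what makes the virtual hostname option available.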
We are looking at developing the HA Solution and this page is a great place to start.
aLTeReGo - The level of "You Rock!" you have obtained has exceeded all known measures of numeric functioning and accounting!!!