Product Manager

Multi-Subnet Failover (WAN/DR) Deployment

High Availability 2.0 provides a first look at redundancy for Orion across subnets. This was previously referred to as WAN deployment or Disaster Recovery with the Failover Engine, but under High Availability we refer to it simply as a multi-subnet failover configuration. In other words, it provides the same automated, near-instantaneous failover and recovery mechanisms as the first release of High Availability, but extends that functionality to support pollers spread across different subnets. Those could be different sites, a dedicated disaster recovery location, or possibly even the cloud.

HIGH AVAILABILITY REQUIREMENTS

  • High Availability 2.0 Installer (built in and located under [Settings -> All Settings -> High Availability Deployment Summary -> Setup A New HA Server -> Get Started Setting Up a Server -> Download Installer Now])
    • High Availability 2.0 can be used only with product modules running on Orion Core 2017.3
  • Two servers running Windows Server 2012 or later
    • The primary and secondary servers must reside on different subnets for multi-subnet failover
      • Primary and secondary servers that reside on the same subnet can be used for same-subnet failover using a traditional VIP
    • Windows or BIND DNS Server credentials for configuring the virtual hostname
    • Windows Server OS version, edition, or bitness need not match between primary and secondary servers.
    • Primary and secondary servers may be optionally joined to a Windows domain
    • High Availability supports the following configurations of primary and secondary servers:
      • Physical to Physical
      • Physical to Virtual
      • Virtual to Virtual
      • Virtual to Physical
  • A separate server running Microsoft SQL Server 2012 or later.
    • This server does not need to reside on the same subnet as either the primary or secondary Orion server
    • Any Microsoft SQL edition may be used, including SQL Express
    • Bonus points for utilizing a SQL Cluster
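The subnet requirement above decides which failover type applies: servers on different subnets use a virtual hostname in DNS, while servers on the same subnet use a traditional VIP. As a rough pre-flight check, a minimal Python sketch (standard library only) can compare the two servers' networks. It assumes both servers use the same subnet prefix length, which you supply yourself; the function name `failover_mode` and the second example address are hypothetical.

```python
import ipaddress


def failover_mode(primary_ip: str, secondary_ip: str, prefix_len: int) -> str:
    """Classify a primary/secondary pair by subnet membership.

    prefix_len is the subnet mask length (e.g. 24 for 255.255.255.0).
    Assumption: both servers use the same prefix length.
    """
    primary_net = ipaddress.ip_interface(f"{primary_ip}/{prefix_len}").network
    secondary_net = ipaddress.ip_interface(f"{secondary_ip}/{prefix_len}").network
    if primary_net == secondary_net:
        # Same subnet: same-subnet failover using a traditional VIP.
        return "same-subnet (VIP)"
    # Different subnets: multi-subnet failover using a virtual hostname in DNS.
    return "multi-subnet (virtual hostname)"


if __name__ == "__main__":
    print(failover_mode("10.160.198.163", "10.160.198.99", 24))  # same-subnet (VIP)
    print(failover_mode("10.160.198.163", "10.161.20.15", 24))   # multi-subnet (virtual hostname)
```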


PRIMARY SERVER INSTALL

When installing the primary Orion server, follow the normal 'Advanced' installation process that you would for any other Orion product. Do not select the 'Express' install option, because a separate server running Microsoft SQL Server 2012 or later is required. When the Configuration Wizard runs, you will be prompted to provide the username, password, and IP address of the SQL server you will be using for the installation.

SECONDARY SERVER INSTALL

Once the primary server is up and running (installed with the NPM 12.2 installer), perform a similar installation on the secondary server using the separate High Availability installer, which can be downloaded from within the Orion web interface under [Settings -> All Settings -> High Availability Deployment Summary -> Setup A New HA Server -> Get Started Setting Up a Server -> Download Installer Now].

Download the High Availability Secondary Server Installer

[Screenshots: All Settings, High Availability Settings, High Availability Deployment Summary, Evaluate High Availability]

Next, run the installation by double-clicking "SolarWinds-Orion-Installer.exe" after downloading or copying it to the secondary server. Enter the IP address or fully qualified domain name (FQDN) of your main Orion server, along with 'Admin' or equivalent credentials used to log in to the Orion web interface, and click 'Next'. In the following step of the wizard, select the additional server role you wish to install. Since this will be a High Availability backup for the main Orion server, select 'Backup Server for Main Server Protection' and click 'Next'.

Enter IP of Main Orion Server & Provide 'Admin' Credentials

Select Server Role to Install


Once the installation completes, the Configuration Wizard starts. When prompted for the SQL Server database information, make sure you use the same SQL instance and SQL database that were chosen for the primary Orion server.

The following video, while arguably boring to watch, demonstrates the secondary server installation process.

CLUSTER POOL CREATION

As soon as both the primary and secondary servers are installed, return to the Orion web interface under [Settings -> All Settings -> High Availability Deployment Summary]. There you will be able to join the two servers into a multi-subnet failover pool.

Click 'Set up High Availability Pool'
[Screenshot: Setup High Availability Pool]
Enter a Virtual Hostname and click 'Next'
[Screenshot: Pool Properties]
Select your DNS Server Type
[Screenshot: DNS Settings]
Microsoft DNS

Enter the IP address of your DNS server, the DNS zone (e.g. solarwinds.com), and administrative credentials for the DNS server to create the shared virtual hostname.

[Screenshot: Microsoft DNS]

BIND DNS

If you are running BIND DNS, enter the IP address of your BIND DNS server, the DNS Zone, your TSIG secret key name, and the TSIG shared secret key value.

[Screenshot: BIND]

Summary

Once complete, review the summary and click 'Create Pool'.

[Screenshot: Summary]

Success

When done, you will have pooled two Orion servers across multiple subnets into a redundant, high availability pool.

[Screenshot: Setup Complete]

The following short video walks through this process in under a minute.
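Once the pool exists, one way to confirm that the shared virtual hostname was created is to resolve it from a client machine and compare the result with the current active member's IP address. A minimal sketch, assuming Python 3 on a machine that uses the same DNS servers as your Orion users; `resolve_all`, `points_to_active` and the placeholder hostname in the usage note are hypothetical. The result reflects whatever the local resolver returns, including cached records that may lag behind a failover until their TTL expires.

```python
import socket


def resolve_all(hostname: str) -> set[str]:
    """Return every IPv4 address the local resolver returns for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def points_to_active(virtual_hostname: str, active_ip: str) -> bool:
    """True if the virtual hostname currently resolves only to active_ip."""
    return resolve_all(virtual_hostname) == {active_ip}
```

For example, `points_to_active("orion.solarwinds.com", "10.160.198.163")` (both values are placeholders) should return True only while the virtual hostname resolves to that single address.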

110 Replies

pratikmehta003, there's quite a bit going on here, from both pool members reporting a 'down' status to the HA licenses not being assigned. Note that HA licenses are consumed automatically if there are unused HA licenses available. License assignment is simply a mechanism for moving HA licenses between pools if needed. It sounds more likely to me that your issue is orphaned entries in your HA database table. I would suggest you ask support to check there first.


Yes, I suspect the same and have been telling the support engineer about it, but he is not listening...

I did try rebooting the secondary today and ran the Configuration Wizard, but it didn't help much. I got a license business layer error for the secondary.

The primary is showing green now; earlier the HA service was not running, and that's why it was red.

I will dig into the DB from my side. Any other recommendations from your side, or any KBs that I can follow?


What is your case number? I'll do my best to look into it for you.


Here is the case number: 00173972

Level 9

We are trying to configure HA between our polling engines. BIND is out. We do have some Microsoft DNS, but our 'DNS people' in the company insist on not using WMI: "WMI queries are expensive and nowadays there are much better ways of achieving this, e.g. AD Web Services".

I'm a networking guy, so I don't know much about this. Does anyone have any ideas about how we could set this up?

We do have F5 GTMs, which do DNS. However, they do "health checks" to see which sides are up or down. As both the active and standby nodes return the HTTPS login page (with the same responses), a health check from an external source to decide which server is active and which is standby doesn't work. I'd need to develop some form of intelligent health check, or otherwise follow this ADWS method that I know nothing about.

Any help much appreciated!


The SolarWinds Information Service only runs on the 'Active' member and should be used as your health check when front-ending HA with a load balancer. The Information Service listens on TCP port 17777.

I think in an HA environment, TCP/17778 would be a better option.


Thanks. I just tried a telnet test against both the active and standby members on 17778. Neither gives a response (but the port isn't closed either). Our firewalls just show the connection as "aged out". No response would be fine, but both are giving me the same (lack of) response, unless there is an HTTP GET or something I could perform to get something back for a health check.


Interesting. Look at what I have in my environment:

ACTIVE:

netstat -ona | find ":17778"
  TCP    0.0.0.0:17778          0.0.0.0:0              LISTENING       4
  TCP    10.160.198.163:17778   10.160.198.99:49824    ESTABLISHED     4
  TCP    10.160.198.163:17778   10.160.198.99:49826    ESTABLISHED     4
  TCP    [::]:17778             [::]:0                 LISTENING       4

PASSIVE:

netstat -ona | find ":17778"
  TCP    10.160.198.99:49824    10.160.198.163:17778   ESTABLISHED     2088
  TCP    10.160.198.99:49826    10.160.198.163:17778   ESTABLISHED     2088
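The netstat comparison above boils down to one question: is anything LISTENING on TCP/17778? On the active member it is; the passive member only holds outbound ESTABLISHED connections to the active member on that port. A small sketch that parses `netstat -ona` output pasted as text. The column layout (Proto, Local Address, Foreign Address, State, PID) is assumed from the output above, and `listens_on` is a hypothetical helper, not a SolarWinds tool.

```python
def listens_on(netstat_output: str, port: int) -> bool:
    """True if any TCP line in `netstat -ona` output is LISTENING on port."""
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto, Local Address, Foreign Address, State, PID
        if len(fields) >= 4 and fields[0] == "TCP" and fields[3] == "LISTENING":
            if fields[1].endswith(suffix):
                return True
    return False


# Sample lines taken from the ACTIVE and PASSIVE output above.
ACTIVE = """
  TCP    0.0.0.0:17778          0.0.0.0:0              LISTENING       4
  TCP    10.160.198.163:17778   10.160.198.99:49824    ESTABLISHED     4
  TCP    [::]:17778             [::]:0                 LISTENING       4
"""

PASSIVE = """
  TCP    10.160.198.99:49824    10.160.198.163:17778   ESTABLISHED     2088
  TCP    10.160.198.99:49826    10.160.198.163:17778   ESTABLISHED     2088
"""

print(listens_on(ACTIVE, 17778))   # True
print(listens_on(PASSIVE, 17778))  # False
```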

For me, only the active member responds to telnet on 17778. The same is true using TCPING (from Eli: tcping.exe - ping over a tcp connection):

ACTIVE:

c:\TOOLS>tcping 10.160.198.163 17778
Probing 10.160.198.163:17778/tcp - Port is open - time=1.987ms
Probing 10.160.198.163:17778/tcp - Port is open - time=1.186ms
Probing 10.160.198.163:17778/tcp - Port is open - time=1.172ms
Probing 10.160.198.163:17778/tcp - Port is open - time=1.163ms

Ping statistics for 10.160.198.163:17778
     4 probes sent.
     4 successful, 0 failed.
Approximate trip times in milli-seconds:
     Minimum = 1.163ms, Maximum = 1.987ms, Average = 1.377ms

PASSIVE:

c:\TOOLS>tcping 10.160.198.99 17778
Probing 10.160.198.99:17778/tcp - No response - time=2001.450ms
Probing 10.160.198.99:17778/tcp - No response - time=2000.853ms
Probing 10.160.198.99:17778/tcp - No response - time=2000.448ms
Probing 10.160.198.99:17778/tcp - No response - time=2001.178ms

Ping statistics for 10.160.198.99:17778
     4 probes sent.
     0 successful, 4 failed.
Was unable to connect, cannot provide trip statistics.

That's brilliant, thanks a lot. After some digging, it looks like TCP/17778 was being blocked locally by the Windows Server firewall. No idea why I didn't run into other issues. Anyhow, I have opened the port and set up the F5 to monitor TCP/17778 as a health check, and it works perfectly.
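The F5 health check described above is a plain TCP connect test against TCP/17778, the same thing tcping does. The check can also be scripted, for example to see which pool member currently answers before pointing a monitor at it. A minimal sketch using Python's standard library; `port_is_open` and `active_members` are hypothetical helpers, and the 2-second timeout is an arbitrary choice.

```python
import socket


def port_is_open(host: str, port: int = 17778, timeout: float = 2.0) -> bool:
    """Attempt a TCP connect; True if the handshake completes within timeout.

    Like tcping or a plain TCP load-balancer monitor, this only proves that
    something accepted the connection, not that Orion itself is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out (e.g. dropped by a firewall), or unreachable.
        return False


def active_members(hosts: list[str], port: int = 17778) -> list[str]:
    """Return the hosts that accept TCP connections on port."""
    return [h for h in hosts if port_is_open(h, port)]
```

Given both members' addresses, `active_members(...)` should return only the active member, provided (as in this thread) only the active member listens on TCP/17778 and the local Windows firewall allows it. Per the telnet results in the following reply, TCP/17777 accepted connections on both members, so it makes a poor discriminator for this kind of check.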

Thanks. Unfortunately, if I test that port, both the active and standby servers appear open on it.

telnet 10.x.x.x 17777
Trying 10.x.x.x...
Connected to xxxxxx.abc.com.
Escape character is '^]'.
^CConnection closed by foreign host.

telnet 10.y.y.y 17777
Trying 10.y.y.y...
Connected to yyyyyyy.abc.com.
Escape character is '^]'.

Is this not expected behaviour? IPs are masked, but 10.x.x.x is our active node and 10.y.y.y is our standby node.

Thanks in advance.

Level 11

We are planning to set up a SolarWinds environment in Azure, including the HA component. In the current plan, both the primary and secondary polling engines will be in the same availability set.

I have gone through your responses, but I have some questions about how they apply to my environment. Please help me clarify the following.

1. If we use a virtual hostname for servers built in the same subnet, are any application-related issues expected?

2. Why can't a VIP be used in a cloud environment? (Sorry, I'm not a cloud expert.)

3. I read that some customers use an F5 load balancer. Can we use the Azure load balancer here? If so, do we need DNS entries to be created for the virtual hostname?


Can you please clarify the licensing when HA is being used - Do I need to buy 2 NPM licenses and one HA license or do I just need one NPM and one HA license? My initial understanding was that I only need the HA license as the topology is Active / Passive and the second node is only used when the primary node is down.

Regards,

Felix


You only need 1 NPM license (or 1 of any other SolarWinds module license you plan on using with HA) and 1 HA license per pool. The HA license is what covers the 2nd (standby) poller in the HA pool. So in my environment, I have 1 MPE (main polling engine) and 2 APEs (additional polling engines). I have 1 license for each module we use, and 2 APE licenses. Because I have 3 polling engines, I have 3 HA pools and therefore 3 HA licenses, but still only 1 license for NPM, SAM, etc.

Hope that helps!
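The license arithmetic in the reply above can be written down as a tiny function. It simply encodes that reply: one license per product module regardless of HA, one APE license per additional polling engine, and one HA license per pool, assuming every polling engine gets its own HA pool. It is an illustration of the reply, not an official SolarWinds licensing calculator, and `LicenseCount` and `licenses_needed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LicenseCount:
    module_licenses_each: int  # e.g. NPM, SAM: one each, standby servers add none
    ape_licenses: int          # additional polling engines beyond the main one
    ha_licenses: int           # one per HA pool


def licenses_needed(polling_engines: int) -> LicenseCount:
    """Count licenses as described in the reply, assuming every polling
    engine (main or additional) is protected by its own HA pool."""
    if polling_engines < 1:
        raise ValueError("need at least the main polling engine")
    return LicenseCount(
        module_licenses_each=1,
        ape_licenses=polling_engines - 1,
        ha_licenses=polling_engines,  # one HA pool per polling engine
    )


print(licenses_needed(3))
# LicenseCount(module_licenses_each=1, ape_licenses=2, ha_licenses=3)
```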


Awesome! Thanks for the speedy response. I thought that was the case, but it never hurts to check.

Regards,

Felix

On Tue, Aug 7, 2018 at 3:55 PM, pheonixnyte

Level 13

Does an HA license also make that HA server act as an Orion web front-end server?

I was told today that my second HA box is a replica of the primary polling engine and will automatically become a web server with IIS on it. This doesn't seem plausible to me.

I'm thinking I need a web server license for HA, correct?

We obviously want to provide redundancy for the web front end. I'm assuming we need more than just an HA license.

Thanks


The secondary server in a main HA pool serves the Orion web interface when a failover occurs and it becomes the 'active' member. The secondary member's web interface is only available when it's the 'active' member.


Thank you kind sir!

Level 8

I am planning to deploy the HA pair in Azure using availability sets, and I can't use a VIP in Azure. Is it possible to use multi-subnet HA if both nodes are in the same subnet? I want to use the DNS functionality for failover rather than the VIP method.


You will need to have each Orion server in its own separate subnet if you plan to use a virtual hostname exclusively. Unfortunately, virtual IPs will not work in a cloud environment, and they are required for same-subnet deployments of HA.


We are looking at developing an HA solution, and this page is a great place to start.

aLTeReGo - The level of "You Rock!" you have obtained has exceeded all known measures of numeric functioning and accounting!!!