
Multi-Subnet Failover (WAN/DR) Deployment

High Availability 2.0 provides the first peek into supporting redundancy for Orion across subnets. This was previously referred to as a WAN deployment or Disaster Recovery with the Failover Engine, but under High Availability we refer to it simply as a multi-subnet failover configuration. In other words, it provides the same automated, near-instantaneous failover and recovery mechanisms as High Availability did in its first release, but extends that functionality to support pollers spread across different subnets. Those could be different sites, a dedicated disaster recovery location, or possibly even the cloud.

HIGH AVAILABILITY REQUIREMENTS

  • High Availability 2.0 Installer (built in and located under [Settings -> All Settings -> High Availability Deployment Summary -> Setup A New HA Server -> Get Started Setting Up a Server -> Download Installer Now])
    • High Availability 2.0 can be used only with product modules running Orion Core 2017.3
  • Two servers running Windows Server 2012 or later
    • Both primary and secondary servers must reside on different subnets for multi-subnet failover
      • Primary and secondary servers which reside on the same subnet can be used for same-subnet failover using a traditional VIP
    • Windows or BIND DNS Server credentials for configuring the virtual hostname
    • Windows Server OS version, edition, or bitness need not match between primary and secondary servers.
    • Primary and secondary servers may be optionally joined to a Windows domain
    • High Availability supports the following configurations of primary and secondary servers.
      • Physical to Physical
      • Physical to Virtual
      • Virtual to Virtual
      • Virtual to Physical
  • A separate server running SQL 2012 or later.
    • This server does not need to reside on the same subnet as either the primary or secondary Orion server
    • Any Microsoft SQL edition may be used, including SQL Express
    • Bonus points for utilizing a SQL Cluster
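
The subnet requirement above decides which failover style applies: a traditional VIP when both servers share a subnet, or a virtual hostname when they do not. A minimal sketch of that check, using Python's standard `ipaddress` module, might look like the following. The function names and the example addresses are hypothetical and are not part of any SolarWinds tooling.

```python
import ipaddress


def same_subnet(primary_cidr: str, secondary_cidr: str) -> bool:
    """Return True when both interface addresses belong to the same IP network."""
    primary = ipaddress.ip_interface(primary_cidr)
    secondary = ipaddress.ip_interface(secondary_cidr)
    return primary.network == secondary.network


def failover_mode(primary_cidr: str, secondary_cidr: str) -> str:
    """Name the failover style implied by the requirements list (illustrative labels)."""
    if same_subnet(primary_cidr, secondary_cidr):
        return "same-subnet (VIP)"
    return "multi-subnet (virtual hostname)"


if __name__ == "__main__":
    # Example addresses are placeholders; substitute the real interface addresses.
    print(failover_mode("10.1.0.10/24", "10.2.0.10/24"))
    print(failover_mode("10.1.0.10/24", "10.1.0.20/24"))
```

Each address is given with its prefix length (for example `/24`) so the network can be derived from the interface itself rather than guessed.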


PRIMARY SERVER INSTALL

When installing the primary Orion server, follow the normal 'Advanced' installation process that you would for any other Orion product. Be sure not to select the 'Express' install option, as a separate server running Microsoft SQL 2012 or later is required. When the Configuration Wizard runs, you will be prompted to provide the username, password, and IP address of the SQL server you will be using for the installation.

SECONDARY SERVER INSTALL

Once the primary server is up and running using the NPM 12.2 installer, you will need to perform a similar installation on the secondary server using the separate High Availability installer which can be downloaded from within the Orion web interface under [Settings -> All Settings -> High Availability Deployment Summary -> Setup A New HA Server -> Get Started Setting Up a Server -> Download Installer Now].

Download the High Availability Secondary Server Installer

[Screenshots: All Settings, High Availability Settings, High Availability Deployment Summary, Evaluate High Availability]

Next, run the installation by double-clicking the "SolarWinds-Orion-Installer.exe" file downloaded or copied to the secondary server. Enter the IP address or fully qualified domain name (FQDN) of your main Orion server, along with 'Admin' or equivalent credentials used to log into the Orion web interface, and click 'Next'. On the following step of the wizard, select the additional server role you wish to install. Since this will be a High Availability backup for the main Orion server, select 'Backup Server for Main Server Protection' and click 'Next'.

Enter IP of Main Orion Server & Provide 'Admin' Credentials

Select Server Role to Install


Once the installation completes, the Configuration Wizard starts. When prompted for the SQL server database information, make sure you use the same SQL instance and SQL database that were chosen for the primary Orion server.

The following video, while arguably boring to watch, demonstrates the secondary server installation process.

CLUSTER POOL CREATION

As soon as both the primary and secondary servers are installed, return to the Orion web interface under [Settings -> All Settings -> High Availability Deployment Summary]. There you will be able to join the two servers into a multi-subnet failover pool.

Click 'Set up High Availability Pool'
Enter a Virtual Hostname and click 'Next'
Select your DNS Server Type
Microsoft DNS

Enter the IP address of your DNS server, the DNS zone (e.g. solarwinds.com), and administrative credentials for the DNS server to create the shared virtual hostname.


BIND DNS

If you are running BIND DNS, enter the IP address of your BIND DNS server, the DNS Zone, your TSIG secret key name, and the TSIG shared secret key value.
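
For readers unfamiliar with TSIG-signed dynamic updates, the sketch below builds the kind of command script that BIND's `nsupdate` utility accepts to repoint a record at a new target. It only illustrates what those four BIND inputs are used for; it is not a description of how Orion performs the update internally. The function name, hostnames, key name, secret, and TTL are all hypothetical placeholders, and a CNAME record is assumed.

```python
def build_nsupdate_script(dns_server: str, zone: str, virtual_host: str,
                          active_host: str, key_name: str, key_secret: str,
                          ttl: int = 60) -> str:
    """Build nsupdate commands that point the virtual hostname at the active member.

    All arguments are placeholders supplied by the caller; nothing here is
    read from or written to an Orion installation.
    """
    fqdn = f"{virtual_host}.{zone}."
    target = f"{active_host}.{zone}."
    lines = [
        f"server {dns_server}",                     # BIND server that receives the update
        f"zone {zone}",                             # zone holding the virtual hostname
        f"key {key_name} {key_secret}",             # TSIG key name and shared secret
        f"update delete {fqdn} CNAME",              # drop the old pointer
        f"update add {fqdn} {ttl} CNAME {target}",  # point at the active member
        "send",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_nsupdate_script(
        dns_server="192.0.2.53",
        zone="example.com",
        virtual_host="orion",
        active_host="orion-b",
        key_name="ha-key",
        key_secret="c2VjcmV0",  # placeholder, not a real key
    )
    print(script)
```

The resulting text would typically be piped to `nsupdate` on standard input, which keeps the shared secret off the command line.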


Summary

Once complete, review the summary and click 'Create Pool'.


Success

When done, you will have pooled two Orion servers together across multiple subnets into a redundant, high availability pool.


The following short video walks through this process in under a minute.

110 Replies

Thank you!! This is absolutely amazing!

0 Kudos
Level 9

aLTeReGo

We have installed HA across different subnets in our dev environment and everything is working fine. But after every failover we have to enter the active server's name to access the console. We are using Infoblox as our DNS here. We also put the alias name in Infoblox for the primary server, but we can't use the same alias for the secondary (HA) server as well.

So I just want to know how the virtual hostname will work in this scenario. Or what should I do to use the same name for the web console, so that we don't have to manually change the web address every time?

Please help!!

0 Kudos

Each server will have their own unique name, but there should also be a third FQDN CNAME record which is updated upon failover to point to the active member in the HA pool. This is the name that users will use to access the Orion web interface.

0 Kudos

We created the virtual hostname 'solarwindsdev', and under that name we created an A record in Infoblox that targets the primary server. In Infoblox we cannot also point this virtual hostname to the standby server. So when a failover happens, we are not able to reach the console with the URL 'solarwindsdev'. We have to put the secondary server name in the URL to open the console again.

We cannot find any KB article for switching the DNS record with a DNS server other than Microsoft or BIND, apart from one for AWS Route 53.

Can you please provide us some steps or a resolution for this?

0 Kudos

You may want to review the following post. Especially the comments at the bottom.

SolarWinds High Availability update Infoblox DNS Record

Thanks a lot !!

0 Kudos
Level 10

I'm currently in the process of setting up HA using the "other" DNS option. I have a script that does the DNS record switch. However, when I try to set up the alert following the documentation guide, it gets triggered for both the active and standby members in a loop and eventually locks out. Has anyone been able to get those alerts set up?

0 Kudos
Level 9

We have 10 HA pairs set up but don't like to rely on DNS for network devices, as it's outside of our control (and a lot of devices don't support sending traps to a hostname, etc.).

For our setup we have created HA pairs using dummy virtual hostnames that do not actually exist and a DNS server that doesn't exist and is non-routable.

We have all of our HA pairs built and we use external load balancers in front of the setup (with source address translation turned off) to forward syslog, traps, and NetFlow across all of our poller pairs. That way we don't have to rely on DNS to receive incoming traps. As long as you trap to matching pairs it's all good, even with multiple load balancers across different geographical regions, because the standby HA poller just drops any incoming messages, so there are no duplicates.

Probably not the simplest setup, but it means we don't have to use any DNS and we can have a consistent config across our entire estate for syslog, trap, and NetFlow destinations.

Hi, would you be able to provide the process used to create the dummy virtual hostname and DNS server?

0 Kudos

It's an old post, but I have a question about multi-subnet deployment.

If reachability between the two subnets fails, will both the primary and secondary servers start polling, or what happens?

And how does receiving traps and NetFlow on the VIP/DNS name behave in that case?

0 Kudos

If both members are equally distributed, then no failover will occur. Split brain is not something which can happen with HA, as the SQL server acts as quorum.

0 Kudos

Thanks aLTeReGo​.

Since a multi-subnet deployment relies on DNS and a virtual hostname for HA to detect failovers and work, how do we configure NetFlow and SNMP traps?

0 Kudos

There are at least three options available to you. Probably many more depending how creative you get.

  1. Configure your devices to send netflow/syslog/traps to the HA Pool's virtual hostname. Probably the simplest solution.
  2. Configure your devices to send netflow/syslog/traps to both members of the HA Pool. Only the 'active' member will process them.
  3. Send netflow/syslog/traps to a load balancer like an F5, NetScaler, etc. or use Cisco SLB, which will forward to the active pool member.
0 Kudos

I would highly recommend you create a DNS alias that you set up outside of your SolarWinds HA, which in turn points to the HA DNS name. That way you can control the redirect later on if required, such as sending traps to a dedicated log server.

Both NetFlow and SNMP Traps should be configurable to a hostname, but it might depend on your equipment.

- David Smith
0 Kudos
Level 16

aLTeReGo

One question on sync between the primary and secondary servers. We recently patched the OS, and the secondary took over, probably because the HA service on the primary was set to manual startup, so after the reboot the services were in a 'not running' state. Everything was working from the secondary.

Now my question is: when an issue like this happens and the pool status shows 'partially working', or we disable the pool for some time, what happens to the sync from both servers to the DB? Is there anything specific that should be followed so that there is no data loss, or only minimal loss? I have had this issue twice and also had to run the Configuration Wizard because I got an error for the DPA plugin.

0 Kudos

pratikmehta003, I'm not sure I follow exactly. When performing Windows Updates, it's normal for a failover to the other member in an HA pair to occur. This is the result of the server rebooting. If you like, you can have HA fail back to a preferred member when it comes back online. This is an option when editing or creating an HA pool.

HA members are aware of each other's status via direct communication with each other, as well as through the database in the event direct network connectivity is lost for any reason (usually WAN). When a member is lost for any reason, the standby will pick up the responsibility. When a member is in a 'partially working' state, it will not assume responsibility unless the other member is in an even worse state, e.g. down.
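
The takeover rule described above can be modelled as a simple comparison of member health. The sketch below is only an illustrative model of that description, not SolarWinds' actual implementation: the state names and the function are hypothetical, and it assumes a fully 'up' standby would also take over from a 'partially working' active member, which the description does not explicitly state.

```python
from enum import IntEnum


class MemberState(IntEnum):
    """Hypothetical health ranking; a higher value means a healthier member."""
    DOWN = 0
    PARTIALLY_WORKING = 1
    UP = 2


def should_take_over(standby: MemberState, active: MemberState) -> bool:
    """Return True when the standby is strictly healthier than the active member.

    A partially working standby therefore takes over only when the active
    member is down. Treating UP as better than PARTIALLY_WORKING is an
    assumption of this sketch.
    """
    return standby > active


if __name__ == "__main__":
    print(should_take_over(MemberState.PARTIALLY_WORKING, MemberState.DOWN))
    print(should_take_over(MemberState.PARTIALLY_WORKING, MemberState.UP))
```

Because the comparison is strict, two members in the same state never trigger a takeover, so the currently active member keeps its role.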

0 Kudos

Hi aLTeReGo, yes, I have already configured the preferred primary server. I was trying to understand the sync between the two servers and the data written to the DB.

So if I understand correctly, as long as one member of the pool is active while the other is down, there would still be no data loss. When the other member comes back up it is treated as the standby, and no sync with the other member is required. Am I right?

And about your statement on the 'partially working' state: if I have one member in the Up state and the other experiencing a problem, it would still not impact data collection, right? The only thing is that the other member will not be taking part in the pool.

0 Kudos

Whichever server is acting as the Primary will ensure that the second server doesn't take over until it is fully ready to do so (Assuming you have that configured). If your primary server fails and your secondary takes over it will resume polling all your devices and storing the data in the database. In the event that the Primary restores and is ready to take over, the secondary will cease polling and the Primary will once again take over. It is impossible for both servers to be polling/collecting data at the same time so the only data loss would be the small window between switching server roles.

- David Smith

Thanks David for the explanation

I will observe the behavior in next patching and provide feedback if i find something abnormal...

0 Kudos
Level 16

aLTeReGo

I have run into some issues with HA. It was running fine after the initial test, and both members were showing the correct status in the pool.

Later, we had to remove the secondary member from the pool due to a request, so I removed the member from the pool and then deleted the pool. This was done from the console.

After a month or so we had to get HA ready again but found that we were not able to create an HA pool (no option was visible in the console). I worked with support, we rebooted the secondary, and the option became visible again. But we found another issue where the HA license shows as not assigned in the license manager.

Current status: both members are showing down in the console in the High Availability summary, and the license is also showing as not assigned. I still have the support ticket open, but I'm not comfortable with the way the steps are being checked. And today I am being told to reset all the module licenses and re-activate them.

Can you provide some insight into this, and whether there is a better way to troubleshoot and resolve it?

0 Kudos