HA for NPM with Microsoft Active Directory & DNS

Greetings & Salutations, my fellow Thwackians...

We are trying to use our Microsoft domain (Active Directory and DNS) to establish several regional High Availability (HA) pollers. We have put together a conceptual drawing; please see the attached PNG file. Our questions are:

- Has anyone done this yet?

--- If so, how were you able to utilize a "Virtual Host Name" for polling traffic instead of going to the primary poller?

- We have followed the steps in this SolarWinds article.

- We have successfully run WBEMTEST.

When we trigger a failover, SolarWinds successfully fails over to the HA/failover poller, with the appropriate services stopping and starting on each server. However, nothing changes in our DNS to redirect traffic to the HA/failover poller. 

--- The end result is that inbound traffic from agent nodes and web clients does not see the failover and is still pointed at the inactive poller. 

--- Has anyone been able to get this to work? 
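To confirm the symptom from a client machine, one option is to check which pool member the virtual hostname resolves to before and after a failover. This is a minimal Python sketch, not a SolarWinds tool: the helper names `resolve_ipv4` and `active_member`, the member names, and the 10.0.0.x addresses are all hypothetical placeholders for your own values.

```python
from __future__ import annotations

import socket


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver currently gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def active_member(resolved: set[str], members: dict[str, str]) -> str | None:
    """Map resolved addresses back to a (hypothetical) HA pool member name.

    Returns the single matching member, "ambiguous" if several members match,
    or None if DNS points at none of them.
    """
    matches = [name for name, ip in members.items() if ip in resolved]
    if len(matches) == 1:
        return matches[0]
    return "ambiguous" if matches else None


# Hypothetical pool; replace with your members' real addresses.
pool = {"primary": "10.0.0.11", "secondary": "10.0.0.12"}

# With a hard-coded lookup result; in practice pass resolve_ipv4("<your virtual hostname>").
print(active_member({"10.0.0.11"}, pool))  # primary
```

If the answer still names the primary after a failover, the record was not updated. Keep in mind the client may be returning a cached answer until the record's TTL expires; on Windows, `ipconfig /flushdns` clears the local DNS client cache before you re-check.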

[Attached conceptual drawing: jkokozian_0-1604524026034.png]


  • --- If so, how were you able to utilize a "Virtual Host Name" for polling traffic instead of going to the primary poller?

    Which IP address is used for polling is determined by the operating system, but you have some ability to control this behavior based upon the IP addresses you assign to each machine and the VIP. 

    https://documentation.solarwinds.com/en/Success_Center/orionplatform/Content/HA_Which_IP_is_the_Source.htm
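    To see the operating system's choice in practice, you can ask it which local address it would use as the source toward a given node. This is a generic OS-level sketch in Python, not SolarWinds code; the helper name `source_ip_for` is hypothetical, and port 161 (SNMP) is only an assumed example destination port. Run it on the active member while it holds the VIP and compare the result against the member's own address and the VIP:

```python
import socket


def source_ip_for(destination: str, port: int = 161) -> str:
    """Ask the OS which local IPv4 address it would use to reach destination.

    Connecting a UDP socket sends no packets; it only makes the OS pick a
    route and a source address, which getsockname() then reports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((destination, port))
        return s.getsockname()[0]
```

    For example, `source_ip_for("10.20.30.40")` (a hypothetical node address) returns whichever local address the routing table selects for that destination, which is the address the node will see polling traffic come from.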


    --- When triggering a failover, SW successfully fails over to the HA/failover poller with the appropriate server services on each server stopping and starting. However, there is no change in our DNS to redirect traffic to the HA/failover poller. 

    --- The end result is inbound agent node and web traffic do not see the failover and are still pointed to the inactive poller. 


    Agents are not reliant upon the DNS name or a VIP to connect to the Active Member in an HA pair. The Agents themselves are fully HA aware and will follow the active member, even when DNS is not used. It sounds like something else here is amiss.

    I would recommend reviewing the Windows Event Log on the DNS server to see why the DNS update failed. For simplicity, try domain administrator credentials for the DNS update first. If that works, it is definitely a permissions issue that needs to be worked through. If it still fails even with domain admin credentials, the first place I would look is access control lists and firewall policies, to see where and why traffic is being dropped. 
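    For the "where and why is traffic being dropped" step, a quick first check is whether each HA member can open a TCP connection to the DNS server on port 53. Dynamic DNS updates can travel over either UDP or TCP port 53, so a blocked TCP 53 is a useful red flag, although a successful TCP check does not prove UDP is open. A minimal Python sketch; the helper name `tcp_reachable` and the server name in the example are hypothetical:

```python
import socket


def tcp_reachable(host: str, port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: a firewall, an ACL, or a stopped
        # service may be in the way.
        return False
```

    Run it from both members, e.g. `tcp_reachable("dc01.example.com")` with your own DNS server's name; if one member gets False while the other gets True, compare the firewall and ACL rules between them.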

    --- Has anyone been able to get this to work? 


    There are literally thousands of customers running HA successfully today, and a not-insignificant percentage of them are doing so in Azure, AWS, GCP, Oracle Cloud, and others. 
