102 Replies. Latest reply on Mar 21, 2019 10:35 AM by aLTeReGo
      • 75. Re: Multi-Subnet Failover (WAN/DR) Deployment
        felixforbes

        Awesome! Thanks for the speedy response. I thought that was the case, but it never hurts to check.

         

        Regards,

         

        Felix

         

        On Tue, Aug 7, 2018 at 3:55 PM, pheonixnyte

        • 76. Re: Multi-Subnet Failover (WAN/DR) Deployment
          sreenathmp

          We are planning to set up a SolarWinds environment in Azure, including the HA component. In the current plan, both the primary and secondary polling engines will be in the same availability set.

           

          I have gone through your responses and have some doubts about how they apply to my environment. Please help me clarify them.

           

          1. If we use a virtual hostname for servers built in the same subnet, are any application-related issues expected?

          2. Why can a VIP not be used in a cloud environment? (Sorry, I'm not a cloud expert.)

          3. I read that an F5 load balancer is used by some customers. Can we use an Azure load balancer here? If yes, do we need DNS entries to be created for the virtual hostname?

          • 77. Re: Multi-Subnet Failover (WAN/DR) Deployment
            pratikmehta003

            RichardLetts

            What kind of account did you create for this? Did the non-admin credentials work, and were you able to authenticate from SolarWinds?

             

            We are having a tough time with one of our customers, who is not willing to allow a connection to DNS in spite of being informed that no changes will be made apart from the entry created for HA.

            • 78. Re: Multi-Subnet Failover (WAN/DR) Deployment
              aLTeReGo

              BIND does not use account credentials. Instead, a TSIG key is used.
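For readers unfamiliar with TSIG: the key is a shared secret (base64-encoded random bytes) paired with an HMAC algorithm, and BIND reads it from a `key` stanza in its configuration. The Python sketch below only illustrates what such a stanza looks like. The function name and the key name `orion-ha-key` are hypothetical, and how the key is then entered on the SolarWinds side is not covered here.

```python
import base64
import secrets


def make_tsig_key_stanza(name: str, algorithm: str = "hmac-sha256", nbytes: int = 32) -> str:
    """Return a BIND-style `key` stanza holding a freshly generated shared secret."""
    # The secret is random bytes, base64-encoded as BIND expects.
    secret = base64.b64encode(secrets.token_bytes(nbytes)).decode("ascii")
    return (
        f'key "{name}" {{\n'
        f"    algorithm {algorithm};\n"
        f'    secret "{secret}";\n'
        f"}};\n"
    )


if __name__ == "__main__":
    # "orion-ha-key" is a made-up key name used only for illustration.
    print(make_tsig_key_stanza("orion-ha-key"))
```

The same secret has to be known to both the DNS server and whatever client sends the signed updates, so treat the generated stanza like a password.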

              • 79. Re: Multi-Subnet Failover (WAN/DR) Deployment
                pratikmehta003

                Sorry, I missed that this was about BIND DNS. You can ignore my query.

                • 80. Re: Multi-Subnet Failover (WAN/DR) Deployment
                  ashleyh

                  We are trying to configure HA between our polling engines. BIND is out. We do have some Microsoft DNS, but our 'DNS people' in the company insist on not using WMI: "WMI queries are expensive and nowadays there are much better ways of achieving this, e.g. AD Web Services".

                   

                  I'm a networking guy, so I don't know much about this. Does anyone have any ideas about how we could set this up?

                   

                  We do have F5 GTMs which do DNS; however, they do "health checks" to see which sides are up/down. As both the active and standby nodes return the HTTPS login page (with the same responses), a health check from an external source to decide which server is active/standby doesn't work. I'd need to develop some form of intelligent health check, or otherwise follow this ADWS method that I know nothing about.

                   

                  Any help much appreciated!

                  • 81. Re: Multi-Subnet Failover (WAN/DR) Deployment
                    aLTeReGo

                    The SolarWinds Information Service runs only on the 'Active' member and should be used as your health check when front-ending HA with a load balancer. The Information Service listens on TCP 17777.

                    • 82. Re: Multi-Subnet Failover (WAN/DR) Deployment
                      ashleyh

                      Thanks. Unfortunately, if I test that port, both the Active and Standby servers appear open on it.

                       

                      telnet 10.x.x.x 17777

                      Trying 10.x.x.x...

                      Connected to xxxxxx.abc.com.

                      Escape character is '^]'.

                      ^CConnection closed by foreign host.

                       

                      telnet 10.y.y.y 17777

                      Trying 10.y.y.y...

                      Connected to yyyyyyy.abc.com.

                      Escape character is '^]'.

                       

                      Is this not expected behaviour? IPs are masked, but 10.x.x.x is our Active node and 10.y.y.y is our Standby node.

                       

                      Thanks in advance.

                      • 83. Re: Multi-Subnet Failover (WAN/DR) Deployment
                        oiram

                        I think in an HA environment the better option would be TCP/17778 instead.

                        • 84. Re: Multi-Subnet Failover (WAN/DR) Deployment
                          ashleyh

                          Thanks. I just tried a telnet test to both Active and Standby on 17778. Both give no response (but the port is not closed). I checked our firewalls and they just see the connection as "Aged out". No response would be fine, but both are giving me the same (no) response, unless there is an HTTP GET or something I could perform that would return something usable for a health check.

                          • 85. Re: Multi-Subnet Failover (WAN/DR) Deployment
                            oiram

                            Interesting, look at what I have in my env:

                             

                            ACTIVE:

                             

                            netstat -ona | find ":17778"

                              TCP    0.0.0.0:17778          0.0.0.0:0              LISTENING       4

                              TCP    10.160.198.163:17778   10.160.198.99:49824    ESTABLISHED     4

                              TCP    10.160.198.163:17778   10.160.198.99:49826    ESTABLISHED     4

                              TCP    [::]:17778             [::]:0                 LISTENING       4

                             

                             

                            PASSIVE:

                             

                            netstat -ona | find ":17778"

                              TCP    10.160.198.99:49824    10.160.198.163:17778   ESTABLISHED     2088

                              TCP    10.160.198.99:49826    10.160.198.163:17778   ESTABLISHED     2088

                             

                             

                            For me, only the active one responds to "telnet" on 17778. The same shows up using TCPING (tcping.exe, "ping over a tcp connection", from Eli):

                             

                             

                            ACTIVE:

                             

                            c:\TOOLS>tcping 10.160.198.163 17778

                             

                             

                            Probing 10.160.198.163:17778/tcp - Port is open - time=1.987ms

                            Probing 10.160.198.163:17778/tcp - Port is open - time=1.186ms

                            Probing 10.160.198.163:17778/tcp - Port is open - time=1.172ms

                            Probing 10.160.198.163:17778/tcp - Port is open - time=1.163ms

                             

                             

                            Ping statistics for 10.160.198.163:17778

                                 4 probes sent.

                                 4 successful, 0 failed.

                            Approximate trip times in milli-seconds:

                                 Minimum = 1.163ms, Maximum = 1.987ms, Average = 1.377ms

                             

                             

                            PASSIVE:

                             

                            c:\TOOLS>tcping 10.160.198.99 17778

                             

                             

                            Probing 10.160.198.99:17778/tcp - No response - time=2001.450ms

                            Probing 10.160.198.99:17778/tcp - No response - time=2000.853ms

                            Probing 10.160.198.99:17778/tcp - No response - time=2000.448ms

                            Probing 10.160.198.99:17778/tcp - No response - time=2001.178ms

                             

                             

                            Ping statistics for 10.160.198.99:17778

                                 4 probes sent.

                                 0 successful, 4 failed.

                            Was unable to connect, cannot provide trip statistics.

                            • 86. Re: Multi-Subnet Failover (WAN/DR) Deployment
                              ashleyh

                              That's brilliant, thanks a lot. After some digging, it looks like TCP/17778 was being blocked locally by the Windows Server firewall. No idea why I didn't get other issues. Anyhow, I have now opened it and set up the F5 to monitor TCP/17778 as a health check, and it works perfectly.
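For anyone who wants to reproduce the telnet/tcping test without extra tools, here is a minimal Python sketch of the same TCP connect check. It assumes, as reported in this thread, that only the active member accepts connections on TCP 17778. The function names are made up for illustration, and a real F5 monitor would be configured on the load balancer rather than scripted like this.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat the member as not active.
        return False


def responding_members(members, port=17778, timeout=2.0):
    """Return the members that accept a TCP connection on `port`, in input order."""
    return [host for host in members if is_port_open(host, port, timeout)]


# Example call (addresses are the ones from the netstat/tcping output above):
# responding_members(["10.160.198.163", "10.160.198.99"])
```

If both members show up as responding, or neither does, a local firewall on one of the servers is worth checking first, since that is what turned out to be the cause here.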

                              • 87. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                pratikmehta003

                                aLTeReGo

                                 

                                I have run into some issues with HA. I had it running fine after the initial test, and both members were showing the correct status in the pool.

                                Later, we had to remove the secondary member from the pool due to a request. I simply removed the member from the pool and deleted the pool; this was done from the console.

                                 

                                After a month or so we had to get HA ready again but found that we could not create an HA pool (no option was visible in the console). I worked with support, and after a reboot of the secondary the option was visible again. But we found another issue: the HA license shows as not assigned in the license manager.

                                 

                                Current status: both members show as down in the High Availability summary in the console, and the license still shows as not assigned. I still have the support ticket open, but I am not comfortable with the way the steps are being checked. Today I was told to reset all the module licenses and reactivate them.

                                 

                                Can you provide some insight into whether there is a better way to troubleshoot and resolve this?

                                • 88. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                  aLTeReGo

                                  pratikmehta003, there's quite a bit going on here, from both pool members reporting a 'down' status to the HA licenses not being assigned. Note that HA licenses will be consumed automatically if there are unused HA licenses available. License assignment is simply a mechanism for moving HA licenses around between pools if needed. It sounds more likely to me that the issue you are having is orphaned entries in your HA database table. I would suggest you ask support to check there first.

                                  • 89. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                    pratikmehta003

                                    Yes, I suspect the same and have been telling the support engineer about it, but he is not listening.

                                     

                                    I did try rebooting the secondary today and ran the config wizard, but it didn't help much. I got an error for the license business layer on the secondary. The primary is showing green now; earlier the HA service was not running, which is why it was red.

                                     

                                    I will dig into the DB from my side. Do you have any other recommendations, or any KBs that I can follow?

                                    • 90. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                      aLTeReGo

                                      What is your case number? I'll do my best to look into it for you.

                                      • 91. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                        pratikmehta003

                                        Here is the case number: 00173972

                                        • 92. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                          pratikmehta003

                                          aLTeReGo

                                           

                                          One question on sync between the primary and secondary servers. We recently patched the OS, and the secondary took over, probably because the HA service on the primary was set to manual start and after the reboot the services were in a 'not running' state. Everything was working from the secondary.

                                           

                                          Now my question is: when issues like this happen and the pool status shows 'partially working', or we disable the pool for some time, what happens to the sync from both servers to the DB? Is there anything specific that should be followed so that there is no data loss, or only minimal loss? I have had this issue twice and also had to run the config wizard, as I got an error for the DPA plugin.

                                          • 93. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                            aLTeReGo

                                            pratikmehta003, I'm not sure I follow exactly. When performing Windows Updates, it's normal that a failover to the other member in an HA pair occurs. This is the result of the server rebooting. If you like, you can have HA fail back to a preferred member when it comes back online. This is an option when editing or creating an HA pool.

                                             

                                            HA members are aware of each other's status via direct communication, as well as through the database in the event direct network connectivity is lost for any reason (usually WAN). When a member is lost for any reason, the standby will pick up the responsibility. When a member is in a 'partially working' state, it will not assume responsibility unless the other member is in an even worse state, e.g. down.
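To make the takeover rule above concrete, here is a small Python model of it. This is only an illustration of the rule as described in this reply, not SolarWinds code: the state names, their ranking, and the function name are assumptions. The reply does not say what happens when the standby is up and the active member is only partially working; the sketch assumes the healthier member wins in that case too.

```python
from enum import IntEnum


class MemberState(IntEnum):
    """Hypothetical health ranking; a higher value means a healthier member."""
    DOWN = 0
    PARTIALLY_WORKING = 1
    UP = 2


def should_take_over(standby: MemberState, active: MemberState) -> bool:
    """The standby assumes responsibility only if the active member is in a worse state."""
    return standby > active


# A partially working standby stays passive while the active member is up...
print(should_take_over(MemberState.PARTIALLY_WORKING, MemberState.UP))
# ...but takes over if the active member is down.
print(should_take_over(MemberState.PARTIALLY_WORKING, MemberState.DOWN))
```

Two members in the same state never trigger a takeover in this model, which avoids flapping back and forth between equally healthy (or equally unhealthy) servers.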

                                            • 94. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                              pratikmehta003

                                              Hi aLTeReGo. Yes, I have already configured the preferred primary server. I was trying to understand the sync between the two servers and the data written to the DB.

                                               

                                              So if I understand correctly, when one member is active in the pool and the other is down, there would still be no data loss. When the other member is also up, it is treated as the standby, and no sync is required with the other member. Am I right?

                                               

                                              And about your statement on the 'partially working' state: if I have one member in the Up state and the other experiencing a problem, it would still not impact data collection, right? The only thing is that the other member will not be taking part in the pool.

                                              • 95. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                                David Smith

                                                Whichever server is acting as the Primary will ensure that the second server doesn't take over until it is fully ready to do so (Assuming you have that configured). If your primary server fails and your secondary takes over it will resume polling all your devices and storing the data in the database. In the event that the Primary restores and is ready to take over, the secondary will cease polling and the Primary will once again take over. It is impossible for both servers to be polling/collecting data at the same time so the only data loss would be the small window between switching server roles.

                                                • 96. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                                  pratikmehta003

                                                  Thanks David for the explanation

                                                   

                                                  I will observe the behavior during the next patching and provide feedback if I find something abnormal.

                                                  • 97. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                                    abdulraheemsidz

                                                    It's an old post, but I have a question about multi-subnet deployment.

                                                     

                                                    If reachability between the two subnets fails, will both the primary and secondary servers start polling, or how will it behave?

                                                     

                                                    How will receiving traps and NetFlow on the VIP/DNS work?

                                                    • 98. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                                      aLTeReGo

                                                      abdulraheemsidz wrote:

                                                       

                                                      It's an old post, but I have a question about multi-subnet deployment.

                                                       

                                                      If reachability between the two subnets fails, will both the primary and secondary servers start polling, or how will it behave?

                                                       

                                                      How will receiving traps and NetFlow on the VIP/DNS work?

                                                      If both members are equally distributed, then no failover will occur. Split brain is not something which can happen with HA, as the SQL server acts as quorum.
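As a generic illustration of how a shared database can act as the tiebreaker between two members that cannot see each other, here is a small lease sketch in Python using SQLite. This is not SolarWinds' implementation, whose internals are not described in this thread. The table layout, function names, and the 30-second lease are all assumptions made for the example.

```python
import sqlite3


def init_lease(conn: sqlite3.Connection) -> None:
    """Create a single-row lease table; the row records who is currently active."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lease ("
        " id INTEGER PRIMARY KEY CHECK (id = 1),"
        " holder TEXT,"
        " expires REAL NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO lease (id, holder, expires) VALUES (1, NULL, 0)")
    conn.commit()


def try_acquire(conn: sqlite3.Connection, member: str, now: float, ttl: float = 30.0) -> bool:
    """Claim or renew the lease; succeeds only for the current holder or after expiry."""
    cur = conn.execute(
        "UPDATE lease SET holder = ?, expires = ? "
        "WHERE id = 1 AND (holder = ? OR expires <= ?)",
        (member, now + ttl, member, now),
    )
    conn.commit()
    # Exactly one row is updated when the claim succeeds, none otherwise.
    return cur.rowcount == 1
```

Because the database applies each update atomically, at most one member can hold an unexpired lease at any moment, even if the two members have lost contact with each other. A member that cannot reach the database cannot renew, so it would have to stop acting as active once its lease runs out.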

                                                      • 99. Re: Multi-Subnet Failover (WAN/DR) Deployment
                                                        abdulraheemsidz

                                                        Thanks aLTeReGo.

                                                         

                                                        Since a multi-subnet deployment relies on DNS and a virtual hostname for HA to detect failovers and work, how do we configure NetFlow and SNMP traps?