20 Replies Latest reply on May 16, 2017 2:29 PM by foonly

    NetPath last hop high latency

    foonly

      We're just getting started with NPM12, and have installed:

       

      NPM 12.0 with Cumulative Update 4 contains:

          Orion Platform HotFix 4

          NPM HotFix 1

          NetPath HotFix 1

       

      And also installed SolarWinds-NPM-v12.0-HF2.exe.

       

      I have not seen NetPath HotFix 1 listed as a downloadable item on the portal page, so I'm not sure if that is the same as what is listed in SolarWinds-NPM-v12.0-HF1.Readme.txt.

       

      In any case, this problem is not with the 1st hop - it's with the last hop. I'm seeing very high latency on only that hop, for both the canned Google service and the internal services I set up across our WAN. The latency is hundreds of milliseconds.

       

      Anyone else see this?

      We are indeed going through firewalls fairly early in the path, and we do not decrement TTL on the firewall.

       

      =Foon=

        • Re: NetPath last hop high latency
          jamesatloop1

          This can be down to so many things, but I can tell you it is not down to NetPath, as I have tested this numerous times. The most common causes of issues like this are a misconfigured load balancer, a proxy server, or an incorrect VLAN configuration. I would set up a new probe in a different location relative to the destination and see if you get the same issue.

            • Re: NetPath last hop high latency
              foonly

              I kinda agree that it has to be something between Orion and the destination.

               

              No load balancers or proxy servers involved - just an ASA with no NAT between our sites on an internal (not internet) MPLS WAN. We have 3 different Orion installs at different WAN sites with different local topologies.

               

              We do use LACP to connect Orion to a dual core switch at each location, and we see the expected 2 cores on the diagram. But that's the 1st hop. The problem I have is with the last hop.

               

              I finished upgrading the rest of the Orion modules (NCM, NTA, SAM) on one install, and behavior did not change. I also added the following to the ASAs:

               

              policy-map global_policy

              class class-default

                set connection decrement-ttl

               

              so that the hop for the ASA is visible.

               

              I've tried the following ports so far:

               

              443

              445

              22

               

              None of those have inspects on an ASA. So I tried ASA inspect netbios (TCP 139). No joy.

               

              So I added an additional poller that has a single NIC, and the last hop symptom disappeared. So I suspect the problem is with LACP etherchannel on the Orion server. Our LACP is set to hash on source and dest IP. That might be a curve ball for NetPath.

               

              Other than the oddity on the last hop, I really love NetPath. I do think NetPath will reveal other issues like this besides delays.

            • Re: NetPath last hop high latency
              jamesatloop1

              Yes, it is very good all right. I love the ability to see the hops within the ISP. Try from another probe. It is very good for diagnosing drops on the atm path and verifying VLAN configuration as well.

                • Re: NetPath last hop high latency
                  foonly

                  It does appear to be related to LACP on the Orion probe server. Our switch chassis is set:

                   

                  # show port-channel load-balance

                  System config:

                    Non-IP: src-dst mac

                    IP: src-dst ip rotate 0
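
To illustrate why a src-dst IP hash could matter here, below is a minimal sketch of how such a hash might pick a member link. The XOR-and-modulo formula and the addresses are assumptions for illustration only; the real switch algorithm is vendor-specific and the "rotate 0" option is not modeled.

```python
import ipaddress


def pick_link(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Toy src-dst IP hash: XOR the two addresses, then take modulo n_links.

    This is NOT the switch's real algorithm, only a stand-in that shares
    its key property: the same IP pair always lands on the same link.
    """
    s = int(ipaddress.ip_address(src_ip))
    d = int(ipaddress.ip_address(dst_ip))
    return (s ^ d) % n_links


# Hypothetical addresses: a probe server, a far-end target, a mid-path hop.
PROBE, TARGET, MID_HOP = "10.0.0.5", "10.9.9.9", "10.1.1.2"

# Probe traffic to the target and the target's replies hash to one link.
link_to_target = pick_link(PROBE, TARGET, 2)
link_from_target = pick_link(TARGET, PROBE, 2)

# A TTL-expired reply is sourced from the intermediate hop, so it can
# hash to a different member link than the probe flow itself.
link_from_mid_hop = pick_link(MID_HOP, PROBE, 2)
```

With these made-up addresses the target flow and the mid-hop reply land on different links, which is the kind of asymmetry a packet capture on both port-channel members would reveal.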

                   

                  I tried killing one ethernet port of the LACP port channel of a different probe server, and it made no difference.

                   

                  A probe server connected to edge switches with a single NIC works fine to the same destinations.

                   

                  I have opened a medium priority support case # 1048405 on this.

                   

                  I'll post results as time permits.

                   

                  =Foon=

                    • Re: NetPath last hop high latency
                      cobrien

                      Any differences in the transit to the first L3 hop (including the first L2 segment, as you're talking about) should not impact the performance numbers NetPath provides for the last hop.

                       

                      Out of curiosity, are the min, median, and max latency values for the endpoint within 20% of each other, or do they vary wildly?
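
One way to make the 20% question concrete is the small check below. It is only a sketch: "within 20% of each other" is read here as "max is no more than 20% above min", which is an assumption, and the sample values are made up.

```python
def within_spread(min_ms: float, median_ms: float, max_ms: float,
                  tolerance: float = 0.20) -> bool:
    """Return True if max latency is at most `tolerance` above min latency.

    The interpretation of "within 20%" is an assumption made for this sketch.
    """
    if not (0 < min_ms <= median_ms <= max_ms):
        raise ValueError("expected 0 < min <= median <= max")
    return max_ms <= min_ms * (1 + tolerance)


# Made-up samples: a steady endpoint versus one that swings wildly.
steady = within_spread(100.0, 105.0, 115.0)
wild = within_spread(5.0, 200.0, 400.0)
```

A steady but high set of values would point toward a constant delay on the path, while a wild spread would point toward something intermittent such as rate limiting or dropped probes.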

                       

                      I suspect we'll have to look at packet captures to solve this, so Support case as you've done is probably the best bet.

                        • Re: NetPath last hop high latency
                          foonly

                          No difference to 1st hop - under 1msec.

                           

                          I have a case open. We turned on some debugs in Orion, and did a couple of packet captures from the Orion server. I'm awaiting results.

                           

                          I learned some stuff - like how to enable debug level logging in Orion (run LogAdjuster.exe), and how to see the NetPath debugs on the web page (append ?debug to the URL for the particular "service").

                           

                          I don't like calling them "services". I'd rather call them "operations" or "instances".

                    • Re: NetPath last hop high latency
                      jamesatloop1

                      ah, what device is the latency showing on?

                      • Re: NetPath last hop high latency
                        jamesatloop1

                        Oh I know, but is it a firewall, router, or WAN switch?

                        • Re: NetPath last hop high latency
                          foonly

                          I think I found a fix for this particular symptom:

                           

                          Edit C:\ProgramData\Solarwinds\Orion\NetPath\NetPathAgent.cfg and change:

                           

                          "SendResetPerRoundStandard": to true. It is normally false.

                           

                          It's the very last setting in the file. I then stopped and started all Orion services via Orion Service Manager, though I wish I knew exactly which ones need to be restarted for an edit to this file.
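
For anyone who wants to script the edit, here is a minimal sketch that flips the setting in the file's text. The key name comes from this thread; the assumption that the file stores it as a JSON-like `"key": false` pair is mine, and the helper name is hypothetical. It only works on a string, so reading, backing up, and writing NetPathAgent.cfg is left to you.

```python
import re

# Key name as quoted in the thread; the "key": value layout is an assumption.
SETTING = "SendResetPerRoundStandard"


def enable_reset_per_round(cfg_text: str) -> str:
    """Return cfg_text with the setting changed from false to true.

    Leaves the text unchanged if the setting is already true, and raises
    ValueError if the setting cannot be found at all.
    """
    key = re.escape(SETTING)
    new_text, count = re.subn(r'("%s"\s*:\s*)false' % key, r"\g<1>true", cfg_text)
    if count == 0 and not re.search(r'"%s"\s*:\s*true' % key, cfg_text):
        raise ValueError(f"{SETTING} not found")
    return new_text


# Made-up sample text standing in for the real file contents.
sample = '{\n  "SomeOtherSetting": 1,\n  "SendResetPerRoundStandard": false\n}'
updated = enable_reset_per_round(sample)
```

Keep a copy of the original file before writing the result back, and restart the Orion services afterwards as described above.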

                           

                          We had noticed that our ASAs were logging hits from Orion to the NetPath target as TCP SYN attacks - part of the ASA Attack Guard settings. It's not clear why this should happen more with LACP-connected servers than with single-NIC servers. But now the ASA is happy, and NetPath is happy.

                           

                          This might help with other NetPath operations traversing an ASA with Attack Guards enabled. It's worth a try.

                            • Re: NetPath last hop high latency
                              cobrien

                              That's odd.  I believe that setting controls whether we send resets for the TCP sessions we create during probing.  Closing out the sessions with resets is being a good network citizen, but it seems your environment works better with us not closing.  Nice find foonly. Heads up lanli.fsm!

                                • Re: NetPath last hop high latency
                                  lanli.fsm

                                  Yes. I came across a similar case before and used the same workaround.

                                  The cause is as follows:

                                  1. NetPath standard probing sends multiple rounds of SYN packets to the endpoint. The endpoint responds to each SYN with a SYN-ACK.

                                  2. Typically, the OS on the NetPath probe catches the SYN-ACK and responds with a RESET packet, which clears the half-open connection in the ASA. So the ASA treats the next SYN as a new half-open connection.

                                  3. But in some rare cases, the OS doesn't send a RESET. The ASA would then catch an extra SYN for the existing half-open connection and may consider it an anomaly.

                                   

                                  "SendResetPerRoundStandard" flag basically forces NetPath probe to send RESET packet by itself so it clears out the half-open connection in ASA.
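
The sequence above can be sketched as a toy model. This is not how the ASA is implemented; the class, the flow tuple, and the idea that every round reuses the same 4-tuple are assumptions made to show why a missing RESET turns later SYNs into apparent anomalies.

```python
class HalfOpenTable:
    """Toy model of a firewall's half-open (embryonic) connection table."""

    def __init__(self):
        self.half_open = set()
        self.anomalies = 0

    def syn(self, flow):
        if flow in self.half_open:
            # Extra SYN for an entry that was never cleared.
            self.anomalies += 1
        else:
            self.half_open.add(flow)

    def rst(self, flow):
        # A RESET clears the half-open entry.
        self.half_open.discard(flow)


def probe(rounds: int, send_reset: bool) -> int:
    """Send `rounds` SYNs on one flow and return the anomaly count."""
    fw = HalfOpenTable()
    flow = ("10.0.0.5", 40000, "10.9.9.9", 443)  # hypothetical 4-tuple
    for _ in range(rounds):
        fw.syn(flow)
        if send_reset:
            fw.rst(flow)
    return fw.anomalies


with_reset = probe(5, send_reset=True)      # 0 anomalies
without_reset = probe(5, send_reset=False)  # 4 anomalies
```

In this model every round after the first looks suspicious when no RESET is sent, which matches the SYN attack hits described earlier in the thread.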

                                  • Re: NetPath last hop high latency
                                    foonly

                                    Embryonic TCP sessions that traverse an ASA will remain open for minutes, I believe. Since we use TCP intercept to defend against DoS attacks, there's a limit on how many embryonic sessions are allowed. The Orion server has TCP sessions to other things via the ASA as well. That's why some sites may see this problem, and others may not.

                                     

                                    Sessions that have been ACKed stay open longer. So ASA messes with your head in the model of end-to-end TCP session packet crafting.

                                     

                                    Look at embryonic sessions and threat-detection statistics tcp-intercept at:

                                     

                                    http://www.cisco.com/c/en/us/td/docs/security/asa/asa93/configuration/firewall/asa-firewall-cli/conns-connlimits.pdf

                                     

                                    A lot depends on how many packets are sent how often, and how far into the path the ASA is. In our case, it's pretty early for a site that's 10 hops or more away.

                                     

                                    The one puzzling thing is how this differs on LACP from a single NIC probe node on an edge switch. I would have needed to capture packets at both the ASA and the probe node to see. Since our LACP hashes based on src and dest IP, returns from intermediate hops where TTL has expired may traverse a different path in the core switch. The ASA has

                                     

                                    Earlier in the problem, I had also enabled decrement-ttl to see the ASA itself:

                                     

                                    class class-default

                                      user-statistics accounting

                                      set connection decrement-ttl

                                     

                                    That's not the default setting.

                                     

                                    I also tried enabling and disabling inspect icmp and inspect icmp-error. I saw no difference.

                                     

                                    All this also inspires me to see if I can do a UnDP for TCP Intercepts to catch stuff like this, plus real DoS attacks, faster.

                                     

                                    =Foon=

                                      • Re: NetPath last hop high latency
                                        cobrien

                                        Aha, it seems enabling that setting makes us into the good net citizen we should be, rather than turning it off.

                                         

                                        I don't know what version of ASA code you're running, but in later versions, the embryonic limits don't control a hard cap but instead control when TCP cookies are used instead of TCP intercept.  With TCP cookies, the ASA derives the state information of a TCP session from an ack (as the third packet in the TCP 3 way handshake) so that it doesn't have to store state information.  This allows the ASA to protect from syn floods up to the limits of its CPU and bandwidth, rather than the limits of some state table.
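
To show the idea behind TCP cookies (commonly called SYN cookies), here is a minimal sketch of a stateless check. It is not the ASA's algorithm: the secret, the HMAC construction, and the function names are assumptions, and real implementations also encode a time counter and MSS information.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical key; a real device uses its own secret


def make_cookie(src: str, sport: int, dst: str, dport: int, client_isn: int) -> int:
    """Derive a 32-bit sequence number for the SYN-ACK from the flow itself."""
    msg = f"{src}:{sport}>{dst}:{dport}:{client_isn}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(src: str, sport: int, dst: str, dport: int,
                 client_isn: int, ack_number: int) -> bool:
    """Recompute the cookie from the ACK alone; no per-flow state is stored."""
    expected = (make_cookie(src, sport, dst, dport, client_isn) + 1) % 2**32
    return ack_number == expected


# Hypothetical flow: the client's ACK must acknowledge cookie + 1.
flow = ("10.0.0.5", 40000, "10.9.9.9", 443, 123456)
cookie = make_cookie(*flow)
good_ack = ack_is_valid(*flow, (cookie + 1) % 2**32)
bad_ack = ack_is_valid(*flow, (cookie + 2) % 2**32)
```

Because the check needs only the third packet of the handshake, a probe that sends SYNs and never completes the handshake leaves no state behind in this scheme.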

                                         

                                        Lan and I will have to talk some more about how we deal with latency impact of syn flood protection.  Just another example where simple latency numbers can be more than a little bit complex to get right!

                                         

                                        I think you're on the right track about the LACP.  After LACP, as a control protocol, selects the links for the data plane to use, simple hashing to select the egress port is the only way the traffic is functionally affected.  However, when a specific port is selected, it means the traffic hashes in a specific way that likely also determines which links will be used in load-balancing decisions later along the transit path.

                                    • Re: NetPath last hop high latency
                                      foonly

                                      To confirm, I opened a case with SolarWinds and sent them sniffer dumps. I learned from them about C:\ProgramData\Solarwinds\Orion\NetPath\NetPathAgent.cfg, and I experimented and changed "SendResetPerRoundStandard": to true. It is normally false, as I noted in my post above, dated Sep 30, 2016 5:43 PM.

                                       

                                      I cannot say that I see any downside to this setting.

                                       

                                      =seymour=

                                        • Re: NetPath last hop high latency
                                          mark@hi-technetworks.net

                                          I don't see this issue with probes that are configured on our Orion server, but I do see it with the NetPath Agent probe running on an external PC. I set C:\ProgramData\Solarwinds\Orion\NetPath\NetPathAgent.cfg  "SendResetPerRoundStandard": to true on the Orion server, but it didn't make any difference on the external NetPath probe. I don't see this setting on the external PC. Is there a way to force TCP RST on an external PC running the NetPath Agent?

                                           

                                          Thank you,

                                          Mark

                                            • Re: NetPath last hop high latency
                                              foonly

                                              Mark -

                                               

                                              Sorry, I missed your post.

                                               

                                              Is this still acting the same way?

                                               

                                              Sometimes I wonder, when we make a setting change, how it propagates and what services must be restarted. In your case, it sounds like the external PC is an agent. I wonder if the agent needs to restart.

                                               

                                              When in doubt, reboot the whole state .... if I could just find the command ....