NetPath last hop high latency

We're just getting started with NPM12, and have installed:

NPM 12.0 with Cumulative Update 4 contains:

    Orion Platform HotFix 4

    NPM HotFix 1

    NetPath HotFix 1

And also installed SolarWinds-NPM-v12.0-HF2.exe.

I have not seen NetPath HotFix 1 listed as a downloadable item on the portal page, so I'm not sure whether it is the same thing that is described in SolarWinds-NPM-v12.0-HF1.Readme.txt.

In any case, this problem is not with the 1st hop - it's with the last hop. I'm seeing very high latency on only that hop, for both the canned Google service and the internal services I set up across our WAN. The latency is hundreds of milliseconds.

Anyone else see this?

We are indeed going through firewalls fairly early in the path, and we do not decrement TTL on the firewall.

=Foon=

  • This can be down to so many things, but I can tell you it is not down to NetPath, as I have tested this numerous times. The most common causes of issues like this are a misconfigured load balancer, a proxy server, or even an incorrect VLAN configuration. I would set up a new probe in a different location relative to the destination and see if you get the same issue.

  • I kinda agree that it has to be something between Orion and the destination.

    No load balancers or proxy servers involved - just an ASA with no NAT between our sites on an internal (not internet) MPLS WAN. We have 3 different Orion installs at different WAN sites with different local topologies.

    We do use LACP to connect Orion to a dual core switch at each location, and we see the expected 2 cores on the diagram. But that's the 1st hop. The problem I have is with the last hop.

    I finished upgrading the rest of the Orion modules (NCM, NTA, SAM) on one install, and the behavior did not change. I also added the following to the ASAs:

    policy-map global_policy
     class class-default
      set connection decrement-ttl

    so that the hop for the ASA is visible.

    I've tried the following ports so far: 443, 445, and 22.

    None of those have inspects on an ASA. So I tried ASA inspect netbios (TCP 139). No joy.

    So I added an additional poller that has a single NIC, and the last hop symptom disappeared. So I suspect the problem is with LACP etherchannel on the Orion server. Our LACP is set to hash on source and dest IP. That might be a curve ball for NetPath.

    Other than the oddity on the last hop, I really love NetPath. I do think NetPath will reveal other issues like this besides delays.
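One way to cross-check a last-hop number like this outside of NetPath is to time plain TCP handshakes to the same ports from each probe server (LACP-connected and single-NIC) and compare. The following is only a rough sketch: the function names are made up for illustration, and a full TCP connect is not the same measurement NetPath performs, so treat the results as a sanity check rather than a match for NetPath's figures.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=3.0):
    """Return the time in milliseconds to complete a TCP handshake, or None on failure."""
    start = time.perf_counter()
    try:
        # create_connection returns once the three-way handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return None


def sample_ports(host, ports, rounds=5):
    """Collect several connect-time samples per port (None marks a failed attempt)."""
    return {port: [tcp_connect_ms(host, port) for _ in range(rounds)] for port in ports}


# Example call (the host name is a placeholder, not a real target):
# print(sample_ports("target.example.internal", [443, 445, 22]))
```

Running the same call from both kinds of probe server should show whether the extra hundreds of milliseconds exist at the socket level or only in the NetPath view.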

  • Yes, it is very good alright. I love the ability to see the hops within the ISP. Try from another probe. It is very good for diagnosing drops on the atm path and verifying VLAN configuration as well.

  • It does appear to be related to LACP on the Orion probe server. Our switch chassis is set:

    # show port-channel load-balance
    System config:
      Non-IP: src-dst mac
      IP: src-dst ip rotate 0

    I tried killing one ethernet port of the LACP port channel of a different probe server, and it made no difference.

    A probe server connected to edge switches with a single NIC works fine to the same destinations.

    I have opened a medium priority support case # 1048405 on this.

    I'll post results as time permits.

    =Foon=
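For anyone following along, here is a toy illustration of what "src-dst ip" load balancing implies. The real hash is vendor-specific and not documented in this thread, so the XOR-and-modulo below is purely an assumption for illustration, and `pick_member_link` is a made-up name. The point it shows is only that the member link is a function of the IP pair, so every packet between one probe and one destination lands on the same link regardless of TCP port.

```python
import ipaddress


def pick_member_link(src_ip, dst_ip, link_count):
    """Pick a port-channel member index from a source/destination IP pair.

    Illustrative XOR-and-modulo hash only; real switches use their own algorithms.
    """
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    # XOR is symmetric, so both directions of a flow map to the same index.
    return (src ^ dst) % link_count


# The addresses below are placeholders, not hosts from this thread.
link = pick_member_link("10.1.1.10", "10.2.2.20", 2)
```

Under this kind of hashing, probes to ports 443, 445, and 22 on the same destination would all use one member link, which is at least consistent with the port making no difference.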

  • Ah, which device is the latency showing on?

  • Any differences in the transit to the first L3 hop (including the first L2 segment, as you're talking about) should not impact the performance numbers NetPath provides for the last hop.

    Out of curiosity, are the min, median, and max latency values for the endpoint within 20% of each other, or do they vary wildly?

    I suspect we'll have to look at packet captures to solve this, so Support case as you've done is probably the best bet.
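If you have a handful of latency samples for the endpoint, the 20% question above can be checked with a few lines. This is a minimal sketch that assumes "within 20%" means the min and max each sit within 20% of the median; that interpretation and the name `latency_spread` are assumptions, not anything NetPath defines.

```python
import statistics


def latency_spread(samples_ms, tolerance=0.20):
    """Summarize latency samples and flag whether min and max stay near the median."""
    lo = min(samples_ms)
    hi = max(samples_ms)
    med = statistics.median(samples_ms)
    # Compare the distance of each extreme from the median against the tolerance.
    within = med > 0 and (med - lo) <= tolerance * med and (hi - med) <= tolerance * med
    return {"min": lo, "median": med, "max": hi, "within_tolerance": within}


# Placeholder sample values in milliseconds, not measurements from this thread.
steady = latency_spread([100, 105, 110])
spiky = latency_spread([5, 6, 300])
```

A steady but high set of values points toward a constant added delay, while a wide spread points toward something intermittent such as drops and retransmissions.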

  • No difference to 1st hop - under 1msec.

    I have a case open. We turned on some debugs in Orion, and did a couple of packet captures from the Orion server. I'm awaiting results.

    I learned some stuff - like how to enable debug level logging in Orion (run LogAdjuster.exe), and how to see the NetPath debugs on the web page (append ?debug to the URL for the particular "service").

    I don't like calling them "services". I'd rather call them "operations" or "instances".

  • Oh, I know, but is it a firewall, router, or WAN switch?

  • I think I found a fix for this particular symptom:

    Edit C:\ProgramData\Solarwinds\Orion\NetPath\NetPathAgent.cfg and change the "SendResetPerRoundStandard" setting to true. It is normally false.

    It's the very last setting in the file. I then stopped and started all Orion services via Orion Service Manager, though I wish I knew exactly which ones need to be restarted after an edit to this file.

    We had noticed that our ASAs were logging hits from Orion to the NetPath target as TCP SYN attacks - part of the ASA Attack Guard settings. It's not clear why this should happen more with LACP-connected servers than with single-NIC servers. But now the ASA is happy, and NetPath is happy.

    This might help with other NetPath operations traversing an ASA with Attack Guard enabled. It's worth a try.
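If the same edit has to be made on several probe servers, something like the following could script it. This is a hedged sketch, not a supported procedure: it assumes the setting appears in the file as a quoted key followed by a colon and a true/false value, as quoted above, and `enable_send_reset` is a made-up helper name. It works on the file text rather than parsing the whole file, since the overall format of NetPathAgent.cfg is not shown in this thread. Back up the file first.

```python
import re

# Matches the quoted key, the colon, and the current boolean value.
_SETTING = re.compile(r'("SendResetPerRoundStandard"\s*:\s*)(true|false)', re.IGNORECASE)


def enable_send_reset(config_text):
    """Return (new_text, changed) with SendResetPerRoundStandard set to true."""
    new_text, count = _SETTING.subn(r"\g<1>true", config_text)
    if count == 0:
        # Refuse to guess where the setting should go if it is not present.
        raise ValueError("SendResetPerRoundStandard not found in config text")
    return new_text, new_text != config_text


# Usage sketch (path taken from the post above; run with care on a real server):
# path = r"C:\ProgramData\Solarwinds\Orion\NetPath\NetPathAgent.cfg"
# with open(path, encoding="utf-8") as fh:
#     updated, changed = enable_send_reset(fh.read())
# if changed:
#     with open(path, "w", encoding="utf-8") as fh:
#         fh.write(updated)
```

The Orion services still need to be restarted afterwards, as described above.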