
Do you adjust TCP window size to increase WAN throughput as distance and latency increase?

Occasionally I hear complaints from vendors about low WAN throughput at some of my remote sites.  I provide dual resilient 1 Gig uplinks from those sites into a 10 Gig MPLS cloud that also has dual 10 Gig links to my data centers.

NPM proves bandwidth is not an issue--traffic is not bottlenecking or choking at the end sites; there's room for them to send or receive plenty more.  Yet sometimes vendors say they're running into throughput problems.

I ask if their application is very sensitive to latency.  My three regional hub sites are about 300 miles away from each other, and latency across those 1 Gig MPLS circuits is 11 milliseconds.  I warn them that if they need less latency than that, their application / hardware may not be appropriate for our WAN sites.

One of my peers recently was troubleshooting this exact issue and came across an interesting article:  How to Calculate TCP throughput for long distance WAN links

It's brief, simple, and helps you understand how a PC or server using a 1 Gig WAN pipe might only be able to transfer a LOT less than 1 Gb/s.

In my case I learned that TCP window size and 11 ms of WAN latency limit individual transmissions across the 1 Gig WAN pipes to no more than 52 Mb/s.
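
The math behind that figure is just window size divided by round-trip time. Here's a minimal sketch of the arithmetic, assuming a classic 64 KB receive window with no window scaling (the exact ceiling depends on the window your hosts actually negotiate):

    # Rough ceiling on single-stream TCP throughput: window size / round-trip time.
    # Assumes a classic 64 KB receive window with no window scaling; the window
    # your hosts actually negotiate will shift the result.
    window_bytes = 64 * 1024          # 64 KB TCP receive window
    rtt_seconds = 0.011               # 11 ms round trip across the WAN

    throughput_bps = (window_bytes * 8) / rtt_seconds
    print(f"Single-stream ceiling: {throughput_bps / 1e6:.1f} Mb/s")        # ~48 Mb/s

    # Window needed to actually fill a 1 Gb/s pipe at 11 ms (the bandwidth-delay product):
    bdp_bytes = (1_000_000_000 / 8) * rtt_seconds
    print(f"Window needed for 1 Gb/s at 11 ms: {bdp_bytes / 1024:.0f} KB")  # ~1343 KB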

Read the article in the link and let me know:

  1. Does this apply to your environment?
  2. Have you run into the issue and improved throughput by adjusting TCP window size?
  3. Are you using WAN Accelerators to address the problem?
    1. If you're using WAN accelerators, what brands & models have you tested, what did you like, what was the cost per end site and per core site?
    2. What problems did you discover on the way?  Were there things that simply couldn't be addressed & corrected?
  4. How do you adjust TCP window size on a PC and on a server?
  5. Does the TCP window size need to be adjusted on both PC and Server, or only on one side or the other?
  • At the end of last year we had a site with a 100 meg line. We suggested WAN acceleration, but the customer decided that increasing the pipe size was what they wanted. We put in a Gig connection to them (30 miles away) and the speed appeared the same--lots of delays and complaints about performance. Working with our firewall vendor and the ISP, everything seemed to function as expected. Even though latency appeared to be pretty good, the ISP finally agreed to "optimize" the connection, i.e. instead of just tossing the traffic to the "cloud," they set up hard-coded pathing between the two sites. Once that was done, everything performed much better.

    Bottom line, it turned out to be latency.  We tried tinkering with the window size, but since the traffic is also in a VPN tunnel, that didn't ultimately have much effect.

  • 30 miles shouldn't have produced appreciable latency, IMHO.  I'd expect <5 ms delay for that distance, based on my sites that are that far away (see the rough distance-to-latency sketch at the end of this reply).  It makes me wonder what the latency restrictions for the vendor's application or hardware might be, and what latency you were experiencing across that distance.  But I'm using direct fiber--your distance could have been over wireless for all I know.

    Optimizing a connection by selecting more efficient / shorter paths certainly is a factor.  Years ago I opened a new hospital WAN connection with some bundled T1's.  The distance as the crow flies was 75 miles, but latency was in the 250 ms range.  A traceroute revealed my packets to that site went from Duluth, Minnesota to St. Louis, Missouri, then to Nashville, then to Chicago, then to Minneapolis, Minnesota, and finally back up to the new hospital at Hinckley, Minnesota.  That 75-mile path was actually traveling over 2,600 miles to get to the new site via this "un-optimized" path.  That was with ATT/Qwest.  Eventually we got that resolved and the latency dropped to 7 ms.

    Certainly a VPN tunnel adds overhead & doesn't help latency at all.
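
    For a rough sanity check on how much of an observed delay raw distance can explain, here's a back-of-the-envelope sketch.  It assumes signals travel through fiber at roughly 200,000 km/s (about two-thirds the speed of light); switching, queuing, and serialization delay come on top of that:

      # Propagation-only round-trip time for a fiber path of a given length.
      # Assumes ~200,000 km/s signal speed in fiber; equipment and queuing delay are extra.
      def propagation_rtt_ms(path_miles: float) -> float:
          path_km = path_miles * 1.609
          one_way_seconds = path_km / 200_000
          return one_way_seconds * 2 * 1000   # round trip, in milliseconds

      for miles in (30, 75, 2600):
          print(f"{miles:>5} miles -> ~{propagation_rtt_ms(miles):.1f} ms RTT (propagation only)")
      # 30 miles is well under 1 ms, and even the 2,600-mile detour only accounts for
      # roughly 42 ms of the 250 ms observed--the rest was routing and equipment.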

  • Over the years I have come across many applications that are not "WAN friendly".  If an application can't handle running at 52 Mbps, then it probably isn't even LAN friendly.

    Another thing is to verify where your traffic actually goes. I have seen a vendor sell a "virtual Ethernet" circuit, with both ends in one small city. The only switch they had in the state (they were running on another vendor's fiber) was a couple of hundred miles away. It was still pretty quick (~15 ms latency), but if we had bought the fiber provider's circuit, latency would have been <4 ms.

    1. Does this apply to your environment?  This applies to every environment; hold the application vendors to account.
    2. Have you run into the issue and improved throughput by adjusting TCP window size? No, I have never adjusted the window size.
    3. Are you using WAN Accelerators to address the problem? I have, in my previous position, used WAN Accelerators.
      1. If you're using WAN accelerators, what brands & models have you tested, what did you like, what was the cost per end site and per core site? I don't know the cost, but Cisco WAAS is very solid.
      2. What problems did you discover on the way?  Were there things that simply couldn't be addressed & corrected? Some things just can't be accelerated.
    4. How do you adjust TCP window size on a PC and on a server? Not sure....
    5. Does the TCP window size need to be adjusted on both PC and Server, or only on one side or the other? Honestly, most modern operating systems support Selective ACK and Window Scaling, so I can't imagine that this would be necessary (there's a per-socket sketch after this list for the cases where an application does want to override the defaults).
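
    If an application does need to override the defaults rather than rely on OS autotuning, the usual knob is the per-socket buffer size, which backs the window the stack can advertise or keep in flight.  A minimal sketch (the 4 MB figure is purely illustrative, not a recommendation):

      # Minimal sketch: an application requesting larger TCP buffers on its own socket.
      # Modern OSes autotune these, so this only matters when autotuning falls short
      # on a high bandwidth-delay-product path.  4 MB is an illustrative value.
      import socket

      BUF_BYTES = 4 * 1024 * 1024

      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      # Set before connect() so the window scale negotiated in the handshake can cover it.
      sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_BYTES)
      sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_BYTES)
      print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))  # Linux reports about double the request

      # The receiver's buffer caps what it can advertise; the sender's caps what it can
      # keep unacknowledged in flight.  For a one-way bulk transfer the receiving side
      # matters most, but tuning both ends is the safe assumption.
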
  • The physical distance is about 30 miles, but the real distance could vary greatly depending upon the route the traffic was taking. Keep in mind that in an area with far less density, a left turn instead of a right turn could add hundreds of miles to the actual route. It's a bit like driving: in a major city a wrong turn at an intersection might add a mile or so to the overall distance, but out here, if you turn onto the wrong country road, it could be 20 miles before you get a chance to turn back in the right direction, another 20 to the next street, and another 30 to get back on the right destination road. It's the same way with telecom - sort of.

  • You're right on, regarding paths and distance.

    One of my regional hubs is 125 miles from my main site, and the two resilient paths are very different geographically.  One runs about that same 125 miles; the other is closer to 300 miles.  Sadly, the WAN provider insists the longer path be the primary path.  The users love maintenance windows, when the primary path fails over to the shorter path.  They can tell the improvement in latency by noticeably better application performance.  And they know right away when the longer path becomes primary again.

  • 1. We totally run into this issue; in our case the WAN links are 100 Gbps(*) and stretch from Tokyo (through Seattle) to Chicago, so the bandwidth-latency product is definitely a factor.

    2. For large datasets we use bbcp

    $ bbcp -N io -v -s 64 -W 16M -P 10 'tar -c .' username@host:'tar -x -C /gscratch/groupdir'

    If you have smaller data sets and non-bulk transfers, then the RTT will affect performance, since you will pay 2*RTT for the acknowledgement (before the application sending data builds up a buffer).

    If your data chunks are very small, then this will kill performance, as you never get to take advantage of the TCP window (see the sketch at the end of this reply).

    3. no

    4. do not bother -- for bulk transfer we use bbcp

    5. No, it's not the 1990s -- but firewalls and IPS devices can totally destroy the ability to use larger windows or packet sizes.

    For our truly large bulk transfers we have special systems, specifically designed for bulk data transfer, that sit outside our security perimeter.

    If you can't send unfragmented 9000-byte packets to the destination, that will also affect performance.

    (* I hate it when vendors talk about WANs when they mean T1 circuits)
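
    To put numbers on the small-transfer point above: when each chunk fits well inside the window and has to be acknowledged before the next one goes out, per-flow throughput is bounded by chunk size divided by RTT, regardless of link speed.  A rough sketch with illustrative figures (the ~180 ms Tokyo-to-Chicago round trip is an assumption, not a measurement):

      # Throughput of a request/response-bound transfer: each chunk costs at least one
      # RTT, so per-flow throughput ~ chunk_size / RTT no matter how fat the pipe is.
      # The RTT below is an illustrative assumption for a Tokyo-to-Chicago path.
      rtt_seconds = 0.180

      for chunk_kb in (4, 64, 1024):
          chunk_bytes = chunk_kb * 1024
          mbps = (chunk_bytes * 8) / rtt_seconds / 1e6
          print(f"{chunk_kb:>5} KB per round trip -> ~{mbps:6.2f} Mb/s per flow")
      # Even 1 MB per round trip is only ~47 Mb/s, which is why bulk tools like bbcp
      # push many parallel streams and keep a large window of data in flight.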