the shortcomings of TCP file transfer

Transferring large data sets (big files or big collections of files) over inexpensive IP networks, instead of shipping tapes, discs, or film, promises to fundamentally change the economics of content production, distribution, and management. Under ideal conditions, data can be moved quickly and inexpensively using ordinary file transfer methods such as FTP, HTTP, and Windows CIFS copy. On real wide-area and high-speed network paths, however, the throughput of these methods collapses, and they use only a small fraction of the available capacity. This is a consequence of the design of TCP, the underlying protocol they all rely on. New TCP stacks and network acceleration devices are marketed as remedies, but they still fail to fully utilize many typical wide-area network paths. Consequently, neither conventional FTP nor the new "acceleration" solutions can provide the speed and predictability needed for global file transfers.


the TCP bottleneck in file transfer

The transmission control protocol (TCP) that provides reliable data delivery for conventional file transfer protocols has an inherent throughput bottleneck that becomes more severe with increased packet loss and latency. The bar graph shows the maximum throughput achievable under various packet loss and network latency conditions on an OC-3 (155 Mbps) link for file transfer technologies that use TCP (shown in yellow). The throughput has a hard theoretical limit that depends only on the network round-trip time (RTT) and the packet loss. Note that adding more bandwidth does not change the effective throughput. File transfer speeds do not improve and expensive bandwidth is underutilized.
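One widely cited approximation for a limit of this kind is the steady-state model of Mathis et al. The text above does not name the formula behind its figures, so using this model here is an assumption. With MSS the segment size, RTT the round-trip time, and p the packet loss probability:

$$
\text{throughput} \;\lesssim\; \frac{MSS}{RTT} \cdot \frac{C}{\sqrt{p}}, \qquad C = \sqrt{3/2} \approx 1.22
$$

Link bandwidth does not appear anywhere in the expression, which is why a loss-limited TCP stream gains nothing from a faster link: the achievable rate is the smaller of this bound and the link capacity.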


The source of the throughput bottleneck is the mechanism TCP uses to regulate its data flow rate. The TCP sender requires an acknowledgment of every packet from the TCP receiver before it injects more data into the network. When an acknowledgment is missed, TCP assumes that it is overdriving the network and enters an aggressive congestion avoidance algorithm. This algorithm cuts the data flow rate severely and then recovers it too slowly to keep modern pipes full. Even small variations in round-trip latency, or bit errors caused by the network media, can push TCP into congestion avoidance. Standard TCP is not equipped to distinguish these sources of packet loss from true link congestion. The consequences for file transfer are dramatic.
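The cut-then-recover behavior described above can be illustrated with a toy additive-increase/multiplicative-decrease (AIMD) loop. This is a minimal sketch, not a model of any real TCP stack: the function name `simulate_aimd` is hypothetical, each loop iteration stands for one round trip, losses are assumed independent and random, and slow start and timeouts are ignored.

```python
import random

def simulate_aimd(rounds, loss_prob, max_window, seed=0):
    """Return the congestion window (in segments) at the start of each round trip.

    Simplifying assumptions: one round trip per iteration, independent random
    loss per segment, no slow start, no retransmission timeouts.
    """
    rng = random.Random(seed)
    cwnd = 1.0
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        # Probability that at least one of the segments sent this round is lost.
        round_loss = 1.0 - (1.0 - loss_prob) ** int(cwnd)
        if rng.random() < round_loss:
            # Loss is read as congestion: halve the window.
            cwnd = max(1.0, cwnd / 2.0)
        else:
            # No loss: grow by one segment per round trip, up to the link limit.
            cwnd = min(float(max_window), cwnd + 1.0)
    return history

if __name__ == "__main__":
    # max_window=1000 stands in for the window needed to fill a hypothetical link.
    trace = simulate_aimd(rounds=5000, loss_prob=0.01, max_window=1000)
    print("mean window (segments):", sum(trace) / len(trace))
```

In this sketch the window never gets near `max_window` once random loss is present, because each halving is followed by only a one-segment-per-round-trip climb.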


consequences

TCP file transfers are slow and bandwidth utilization of single file transfers is poor. In local or campus area networks, where packet loss and latency are small but non-negligible (0.1% loss, 10 ms latency), the maximum TCP throughput is 50 Mbps. Typical file transfer rates on Gigabit Ethernet are lower still, 20-40 Mbps, even with TCP stack tuning on the endpoints. Because standard TCP halves its throughput in response to a single packet loss event, even a low loss percentage significantly lowers TCP throughput at high speeds. Even with an abundance of bandwidth, transfer times are disappointing and expensive bandwidth is underutilized.
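A rough calculation shows why a single halving is so costly at high speed. The sketch below assumes standard congestion avoidance, in which the window regrows by about one segment per round trip, and a hypothetical 1460-byte segment; the function name `recovery_time_s` is illustrative only.

```python
def recovery_time_s(link_bps, rtt_s, mss_bytes=1460):
    """Approximate seconds needed to regain full rate after one window halving."""
    # Window (in segments) needed to fill the link: bandwidth-delay product / segment size.
    full_window = link_bps * rtt_s / (mss_bytes * 8)
    # After halving, additive increase regains roughly one segment per round trip.
    rtts_to_recover = full_window / 2
    return rtts_to_recover * rtt_s

if __name__ == "__main__":
    # Hypothetical example: a 1 Gbps path with 10 ms round-trip time.
    print(f"{recovery_time_s(1e9, 0.010):.2f} s to recover from one loss")
```

Under these assumptions a 1 Gbps, 10 ms path needs a little over four seconds to climb back after one loss, so losses arriving more often than that keep the stream well below link rate.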

The bandwidth utilization problem compounds on wide area links, where increased network latency combines with packet loss. A typical FTP transfer across the United States has a maximum theoretical limit of 1.7 megabits per second (Mbps), the maximum throughput of a single TCP stream at 90 ms latency and 1% loss, independent of link bandwidth. On typical intercontinental links or satellite networks, the effective file transfer throughput may be as low as 0.1% to 10% of available bandwidth. On a typical global link (3% loss, 150 ms latency), maximum TCP throughput degrades to 500-600 kilobits per second, 5% to 6% of a 10 Mbps link. Network engineers sometimes attempt to improve throughput by "tuning" the operating system parameters used by the TCP networking stack on the file transfer endpoints, or by applying a TCP acceleration device. While these techniques boost throughput on clean networks, the improvement vanishes when real packet loss due to channel characteristics or network congestion increases.
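The limits quoted above can be roughly reproduced with the Mathis et al. steady-state approximation, throughput ≲ (MSS / RTT) × C / √p. The text does not say which formula produced its figures, so the model, the 1460-byte segment size, and the constant C = √(3/2) are all assumptions, and the function names are hypothetical.

```python
import math

def tcp_throughput_bound_bps(rtt_s, loss, mss_bytes=1460, c=math.sqrt(1.5)):
    """Approximate upper bound on single-stream TCP throughput, in bits per second.

    Uses the Mathis et al. approximation; mss_bytes and c are assumed values.
    """
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss)

def utilization(rtt_s, loss, link_bps):
    """Fraction of the link a single loss-limited stream can use (at most 1.0)."""
    return min(tcp_throughput_bound_bps(rtt_s, loss), link_bps) / link_bps

if __name__ == "__main__":
    cases = [
        ("cross-US, 90 ms / 1%", 0.090, 0.01, 155e6),
        ("global, 150 ms / 3%", 0.150, 0.03, 10e6),
    ]
    for name, rtt, loss, link in cases:
        bound = tcp_throughput_bound_bps(rtt, loss)
        print(f"{name}: {bound / 1e6:.2f} Mbps, {utilization(rtt, loss, link):.1%} of link")
```

With these assumed parameters the bound lands near 1.6 Mbps for the 90 ms / 1% case, slightly under the 1.7 Mbps quoted, and inside the 500-600 kbps range for the 150 ms / 3% case. Raising `link_bps` changes only the utilization figure, never the bound.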

TCP file transfers over difficult networks (with high packet loss or variable latency) are extremely slow and unreliable. TCP does not distinguish packet losses due to network congestion from normal latency variations or bit errors on physical channels such as satellite links and wireless LANs, and so it throttles itself severely. FTP throughput under good satellite conditions is 100 kbps and degrades by more than half during high error periods such as rain fade. Large transfers can be extremely slow and may not complete.

TCP file transfer rates and times are unpredictable. As a window-based protocol, TCP can determine its optimal rate only through feedback from the network. TCP overdrives the network until packets are dropped by intervening routers and, in the best case, oscillates around its optimal rate, causing instabilities in the network for file transfer and other applications. Over commodity Internet links where traffic loads vary, file transfer rates may vary widely with network load. File transfers slow down and may exceed the allotted time window, missing business-critical deadlines and requiring costly transfer oversight and redundancy. TCP acceleration devices may improve throughput and smooth the transfer rate when links are clean, but they are also window-based and subject to unpredictable backoff.

Security, monitoring, and logging are deferred to external systems. In addition to suffering from TCP's throughput bottleneck and unpredictable flow rate, conventional file transfer protocols do not meet the security and manageability requirements of most business applications. FTP may require external security mechanisms to prevent content piracy or tampering. Also, network performance details and transfer statistics are not available to the transfer operator for monitoring or billing.