Resolving low TCP data transfer speed issues: a stack-level approach

"Something's wrong with the network. I used to get 4Gbps, but now I'm only seeing 120Mbps. Did you change something recently?"

Does this sound familiar? If you've done any support for production environments, you've probably heard similar complaints. Before we jump to conclusions about the causes of the problem, we need to understand what is happening at the TCP level on both hosts.

What WILL NOT be in this article

Before we begin, let me make it perfectly clear: this is not yet another explanation of TCP fundamentals. I am not going to discuss three-way handshakes, explain what SYN/ACK is, or draw state diagrams of the TCP finite state machine. There are hundreds of articles and RFCs that cover all of this well.

We will focus on the tools and metrics that will truly help you diagnose real problems.

Approach to Troubleshooting

When someone complains about slow network performance, a key point in diagnosing issues is understanding what is happening with TCP on both hosts. Modern TCP implementations exhibit remarkable resilience and can maintain good throughput even under moderate packet loss (1-2%). By examining the internal state of TCP, we can answer these crucial questions:

  • What actual data transfer rate is TCP achieving, and why?

  • Is one side limiting the throughput? Which one?

  • Is TCP correctly adapting to the state of the network?

  • Are there packet losses, and how does TCP correct for them?

Your task is to gather this information using the tools provided by Linux. This means working with two main tools: ss (socket statistics) and nstat (network statistics).

Tool #1: ss — TCP state for each socket

The ss command (which replaces the older netstat) provides detailed information about individual TCP connections. The magic happens when you use the -i (information) flag:

ss -tin dst 3.130.241.210:443        

Here is the internal state of TCP for a specific connection. This is what normal data transfer looks like:

State                Recv-Q                Send-Q                                Local Address:Port                                   Peer Address:Port                Process                
ESTAB                0                     0                                      172.17.1.187:47128                                 3.130.241.210:443                 
	 cubic wscale:12,7 rto:236 rtt:35.047/19.403 ato:40 mss:1288 pmtu:1500 rcvmss:1288 advmss:1448 cwnd:10 bytes_sent:102830 bytes_acked:102831 bytes_received:169536 segs_out:2785 segs_in:2247 data_segs_out:1571 data_segs_in:1214 send 2940052bps lastsnd:7176 lastrcv:7154 lastack:7154 pacing_rate 5879976bps delivery_rate 1401584bps delivered:1572 app_limited busy:54422ms reord_seen:6 rcv_rtt:27 rcv_space:14480 rcv_ssthresh:70646 minrtt:16.109 rcv_ooopack:2 snd_wnd:77824

Are your eyes already wandering? Let's break down the key fields that you need to understand.

Understanding Window Scaling

The field wscale:12,7 shows the window scaling factors negotiated during connection establishment (the first number is the sending scale factor, the second the receiving one). Without scaling, TCP is limited to 64KB windows, which severely constrains throughput on fast or high-latency paths!

Max Throughput = Window Size / RTT        

If you see wscale:0,0, it means that window scaling was not enabled on one or both sides, or it was disabled during connection establishment. Check:

cat /proc/sys/net/ipv4/tcp_window_scaling  # Should be 1        

With wscale:7, multiply the declared window size by 2^7 (128). For 10Gbps at an RTT of 10ms, you need a window of ~12.5MB. To understand if your window size can support the expected throughput, calculate using the formula: Required window = Throughput * RTT.
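To make the formula concrete, here is a quick back-of-the-envelope calculation (the 10Gbps / 10ms figures are the example values from above):

```shell
# Required window (bytes) = Throughput (bits/s) * RTT (s) / 8
# Example from above: 10 Gbps at 10 ms RTT
awk 'BEGIN { tput_bps = 10e9; rtt_s = 0.010;
             printf "required window: %.1f MB\n", tput_bps * rtt_s / 8 / 1e6 }'
# required window: 12.5 MB
```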

RTT: TCP's View on Network Latency

The field rtt:35.047/19.403 shows the smoothed RTT (35.047ms) and its mean deviation (19.403ms). THIS IS NOT ICMP ping: this is a measurement based on actual ACKs, and its results feed into the RTO calculation, congestion control decisions, and pacing.

Each socket maintains its own RTT value!

What this tells us: High RTT (>100ms in local networks) or large deviation indicates latency or jitter. If you see rtt:150.5/75.2 in a network where it should be 1ms, check if the Recv-Q is full — this indicates that the receiving side is not reading data quickly enough, causing TCP to measure a higher RTT as it waits for the window to open.
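You can also plug the sample ss output above into the throughput formula from the window scaling section: with snd_wnd:77824 and rtt:35.047, the window alone caps this connection well below gigabit speeds (a sketch; the values are taken directly from the sample output):

```shell
# Max throughput = window (bytes) * 8 / RTT (s), from the sample ss output
awk 'BEGIN { snd_wnd = 77824; rtt_s = 0.035047;
             printf "window-limited max: %.1f Mbps\n", snd_wnd * 8 / rtt_s / 1e6 }'
# window-limited max: 17.8 Mbps
```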

RTO: Retransmission Timeout

The field rto:236 shows the current retransmission timeout (236ms), dynamically calculated from the RTT: RTO = SRTT + max(G, 4 * RTTVAR). Understanding RTO helps explain retransmission behavior: an RTO that is too low for the actual network causes spurious retransmissions, while an inflated RTO (much higher than the current RTT) indicates TCP is still recovering from previous timeout events.
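Plugging the sample's rtt:35.047/19.403 into the RFC 6298 formula (assuming a 1ms clock granularity G) gives the textbook value. Note that Linux additionally enforces a 200ms floor (TCP_RTO_MIN) and its internal bookkeeping differs slightly, which is why ss reports rto:236 rather than the smaller computed value:

```shell
# RTO = SRTT + max(G, 4 * RTTVAR), all in ms (RFC 6298)
awk 'BEGIN { srtt = 35.047; rttvar = 19.403; g = 1;
             rto = srtt + (4 * rttvar > g ? 4 * rttvar : g);
             printf "computed RTO: %.1f ms\n", rto }'
# computed RTO: 112.7 ms
```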

MSS and Path MTU Discovery

mss:1288 pmtu:1500 rcvmss:1288 advmss:1448        

MSS (1288 bytes) and PMTU (1500 bytes) tell you about the packet sizes along the entire path.

What these values indicate:

  • pmtu:1500, but the connection drops after the initial data transmission: The actual MTU of the path is smaller (tunnels/VPNs create PMTU black holes)

  • Unusually small MSS value on standard Ethernet: An intermediate device limits the MSS value

  • Interface MTU is set to 9000 bytes, but the MSS is small: jumbo frames are not supported somewhere along the path, or PMTU discovery could not determine the correct value
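A quick sanity check for the "unusually small MSS" case: on plain IPv4 Ethernet the expected MSS is the MTU minus 20 bytes of IP header and 20 bytes of TCP header, so anything well below that suggests tunnel/VPN overhead or MSS clamping. A sketch using the sample's values:

```shell
# Expected MSS on plain IPv4, no TCP options: MTU - 20 (IP) - 20 (TCP)
awk 'BEGIN { mtu = 1500; mss_seen = 1288;
             expected = mtu - 20 - 20;
             printf "expected MSS: %d, observed: %d, overhead: %d bytes\n",
                    expected, mss_seen, expected - mss_seen }'
# expected MSS: 1460, observed: 1288, overhead: 172 bytes
```

The 172 missing bytes in the sample are consistent with some form of encapsulation on the path.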

Tool #2: nstat — System-wide TCP Counters

While ss provides information on each socket, nstat displays aggregate TCP counters across all connections. This is invaluable for identifying patterns:

nstat -az | grep -i tcp        

The -a flag shows absolute counter values (instead of increments since the last invocation), and the -z flag includes counters whose value is zero. Run it again after a few seconds to see changes:

nstat -az | grep -i tcp
sleep 5
nstat -az | grep -i tcp        
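To turn two snapshots into a rate, subtract the readings and divide by the interval. A minimal sketch with hypothetical counter values (1200 and 1245 are made-up TcpRetransSegs readings taken 5 seconds apart):

```shell
# Retransmission rate from two TcpRetransSegs snapshots (hypothetical values)
awk 'BEGIN { first = 1200; second = 1245; interval_s = 5;
             printf "retrans rate: %.1f segs/s\n", (second - first) / interval_s }'
# retrans rate: 9.0 segs/s
```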

Retransmissions: SACK vs. Duplicate ACKs

To see whether packet retransmissions are occurring, pay attention to the key counters in the nstat output, which are strong evidence of packet loss.

Key counters:

  • TcpRetransSegs — Total number of retransmitted segments

  • TcpExtTCPSACKRecovery — Fast retransmission via SACK (modern, efficient method)

  • TcpExtTCPSACKReorder — Reordering events detected via SACK

  • TcpExtTCPRenoRecovery — Fast retransmission using duplicate acknowledgments (outdated, less efficient method)

  • TcpExtTCPTimeouts — RTO timer expirations (each forces a return to slow start and severely reduces throughput)

Understanding the difference: SACK informs the sender which specific packets were lost, allowing for selective retransmission. Without SACK, TCP relies on three repeated ACKs and may unnecessarily retransmit data. Check the SACK status: cat /proc/sys/net/ipv4/tcp_sack (should be 1).

Practical example:

TcpRetransSegs           245
TcpExtTCPSACKRecovery     12
TcpExtTCPSACKReorder     243
TcpExtTCPTimeouts          2        

Here we see 245 retransmitted segments with 12 SACK recovery events (indicating that TCP is effectively handling packet loss) and 2 total timeouts. Additionally, there were 243 reordering events recorded.

Although SACK generally recovers lost data quickly, any SACK recovery activity means the network path is not delivering traffic cleanly. A large number of SACK recovery events and retransmitted segments is a key indicator of performance issues and helps determine the next troubleshooting steps.
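A useful derived metric here is the retransmission ratio: retransmitted segments divided by total segments sent. The TcpRetransSegs value below comes from the example above; the TcpOutSegs total is hypothetical, added for illustration. Anything persistently above ~1% usually deserves attention:

```shell
# Retransmission ratio (the TcpOutSegs value is hypothetical)
awk 'BEGIN { retrans = 245; out_segs = 100000;
             printf "retrans ratio: %.3f%%\n", retrans / out_segs * 100 }'
# retrans ratio: 0.245%
```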

Another case that TCP must handle is segment reordering, which can lead to reduced data transmission performance. A high level of such events indicates delivery of a significant number of packets in the wrong order, requiring the receiver to buffer and reorder packets before the data is available to the application. This results in higher latency and lower throughput.

Summary: Diagnostic Algorithm

When someone reports low TCP performance, here is my systematic approach to understanding what is happening:

# While data transfer is ongoing
ss -tin dst         

Let's look at:

  • Is window scaling enabled and is it adequate? (the value of wscale should not be 0,0)

  • What RTT value did TCP measure?

  • Do the MSS/PMTU values correspond to the traffic path?

  • Is there any retransmission happening? (field retrans)

  • Are Send-Q or Recv-Q consistently full? (indicates which side is limiting throughput)

nstat -az | grep -i tcp; sleep 10; nstat -az | grep -i tcp        

Let's look at:

  • TcpRetransSegs rate (shows the frequency of retransmission)

  • TcpExtTCPTimeouts (indicates serious packet loss or delays)

  • TcpExtTCPSACKRecovery vs TcpExtTCPRenoRecovery (which recovery mechanism is active)

  • TcpExtTCPSACKReorder (shows packets received out of order)

  • TcpExtTCPSACKReneging (a non-zero value means the receiver discarded data it had already SACKed, which is inconsistent behavior)

Only after you understand how TCP perceives the connection should you proceed to packet capture. TCP stack statistics usually tell you everything you need to know.

General patterns and what they tell us

Case: Briefly high throughput, then drop to nearly zero

  • Check: constant fullness of Recv-Q

  • Interpretation: The receiving application cannot read data fast enough; TCP flow control limits the sender's speed

Case: Throughput limited regardless of bandwidth

  • Check: wscale:0,0 in the output of ss

  • Interpretation: Window scaling is disabled; the TCP window is limited to 64KB, capping throughput at roughly 64KB per RTT

Case: Many retransmissions, but ping shows minimal loss

  • Check: high value of TcpExtTCPTimeouts, RTO much greater than current RTT value

  • Interpretation: Previous timeout events inflated RTO; TCP operates in a conservative mode during its recovery

Case: Data transfer hangs after sending part of the data

  • Check: MSS/PMTU values, connection stops responding

  • Interpretation: PMTU black hole — packets larger than the actual MTU of the path are silently dropped

Case: retransmission predominantly through TcpExtTCPRenoRecovery

  • Check: SACK is disabled (tcp_sack=0)

  • Interpretation: TCP uses an older recovery mechanism with duplicate ACKs, which is less efficient than SACK in the case of multiple losses

Packet Capture Note

I intentionally did not cover packet capture in this article because it deserves a separate detailed analysis. Tools like tcpdump and Wireshark are incredibly powerful, but they also consume a lot of time and generate huge amounts of data. In my experience, most TCP performance issues can be diagnosed using just ss and nstat.

However, there are cases where packet capture is absolutely necessary: when you suspect interference from an intermediary device, need to verify congestion control behavior, or want to see the exact timing of events during connection establishment. I will discuss practical packet capture analysis for troubleshooting TCP in a future article.

Conclusion

The next time someone reports low TCP throughput, start by understanding what is happening with TCP on both hosts. Use ss to analyze the state of each socket and nstat to observe overall system patterns. Pay attention to window scaling, RTT measurements, RTO values, MSS/PMTU parameters, and retransmission occurrences.

These tools provide direct insight into the TCP decision-making process and help answer important questions:

  • What throughput is TCP achieving?

  • Which side is limiting the bandwidth?

  • How does TCP adapt to network conditions?

  • Is packet loss being handled effectively?

Understanding the internal state of TCP helps systematically diagnose issues and explains what is happening "under the hood." Sometimes the explanation points to the need for changes in certain settings, sometimes it reveals application behavior characteristics that require attention, and sometimes it shows that TCP is working exactly as it should given the network conditions.

The goal is not to find a culprit but to understand what is happening so that you can reasonably agree on the next steps to resolve the issue.

What is your main approach to resolving TCP performance issues? Have you found other useful TCP metrics? Let me know in the comments.

Links

Man Pages

  1. ss(8) - Linux manual page https://man7.org/linux/man-pages/man8/ss.8.html Socket statistics utility - part of the iproute2 package

  2. nstat(8) - Linux manual page https://man7.org/linux/man-pages/man8/nstat.8.html Network statistics tool for monitoring kernel SNMP counters

IETF RFCs

  1. RFC 6298 - Computing TCP's Retransmission Timer https://www.rfc-editor.org/rfc/rfc6298.html V. Paxson, M. Allman, J. Chu, M. Sargent (June 2011) Defines the standard algorithm for calculating TCP's RTO

  2. RFC 7323 - TCP Extensions for High Performance https://www.rfc-editor.org/rfc/rfc7323.html D. Borman, B. Braden, V. Jacobson, R. Scheffenegger (September 2014) Specifies TCP window scaling and timestamp parameters

  3. RFC 2018 - TCP Selective Acknowledgment Options https://www.rfc-editor.org/rfc/rfc2018.html M. Mathis, J. Mahdavi, S. Floyd, A. Romanow (October 1996) Defines the SACK mechanism for TCP

  4. RFC 1191 - Path MTU Discovery https://www.rfc-editor.org/rfc/rfc1191.html J. Mogul, S. Deering (November 1990) Describes a method for dynamically determining path MTU
