Resolving low TCP data transfer speed issues: a stack-level approach
"Something's wrong with the network. I used to have 4 Gbps, but now it only gives 120 Mbps. Did you change something recently?"
Does this sound familiar? If you've done any support for production environments, you've probably heard similar complaints. Before we jump to conclusions about the causes of the problem, we need to understand what is happening at the TCP level on both hosts.
What WILL NOT be in this article
Before we begin, let me make it perfectly clear: this is not yet another explanation of TCP fundamentals. I am not going to discuss three-way handshakes, explain what SYN/ACK is, or draw state diagrams of the TCP finite state machine. There are hundreds of articles and RFCs that cover all of this well.
We will focus on the tools and metrics that will truly help you diagnose real problems.
Approach to Troubleshooting
When someone complains about slow network performance, a key point in diagnosing issues is understanding what is happening with TCP on both hosts. Modern TCP implementations exhibit remarkable resilience and can maintain good throughput even under moderate packet loss (1-2%). By examining the internal state of TCP, we can answer these crucial questions:
What actual data transfer rate is TCP achieving, and why?
Is one side limiting the throughput? Which one?
Is TCP correctly adapting to the state of the network?
Are there packet losses, and how does TCP correct for them?
Your task is to gather this information using the tools provided by Linux. This means working with two main tools: ss (socket statistics) and nstat (network statistics).
Tool #1: ss — TCP state for each socket
The ss command (which replaces the older netstat) provides detailed information about individual TCP connections. The magic happens when you use the -i (information) flag:
ss -tin dst 3.130.241.210:443
Here is the internal state of TCP for a specific connection. This is what normal data transfer looks like:
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 0 172.17.1.187:47128 3.130.241.210:443
cubic wscale:12,7 rto:236 rtt:35.047/19.403 ato:40 mss:1288 pmtu:1500 rcvmss:1288 advmss:1448 cwnd:10 bytes_sent:102830 bytes_acked:102831 bytes_received:169536 segs_out:2785 segs_in:2247 data_segs_out:1571 data_segs_in:1214 send 2940052bps lastsnd:7176 lastrcv:7154 lastack:7154 pacing_rate 5879976bps delivery_rate 1401584bps delivered:1572 app_limited busy:54422ms reord_seen:6 rcv_rtt:27 rcv_space:14480 rcv_ssthresh:70646 minrtt:16.109 rcv_ooopack:2 snd_wnd:77824
Eyes glazing over already? Let's break down the key fields you actually need to understand.
Understanding Window Scaling
The field wscale:12,7 shows the window scaling factors negotiated during connection establishment (the first value is the send scaling factor, the second the receive scaling factor). Without scaling, TCP windows are limited to 64 KB, which significantly constrains bandwidth on anything but very low-latency paths!
Max Throughput = Window Size / RTT
If you see wscale:0,0, it means that window scaling was not enabled on one or both sides, or it was disabled during connection establishment. Check:
cat /proc/sys/net/ipv4/tcp_window_scaling # Should be 1
With wscale:7, multiply the declared window size by 2^7 (128). For 10Gbps at an RTT of 10ms, you need a window of ~12.5MB. To understand if your window size can support the expected throughput, calculate using the formula: Required window = Throughput * RTT.
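As a quick sanity check, you can plug real numbers into this formula and compare the result with the kernel's socket buffer limits. A minimal sketch using the 10 Gbps / 10 ms example from above (the sysctls shown are the standard Linux autotuning limits):
# required window in bytes: throughput (bit/s) * RTT (s) / 8
echo '10 * 10^9 * 0.010 / 8' | bc   # -> 12500000, i.e. ~12.5 MB
# maximum receive/send buffers the kernel will autotune to (min / default / max, bytes)
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
If the required window is larger than the configured maximum, no amount of application tuning will reach the expected throughput.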
RTT: TCP's View on Network Latency
The field rtt:35.047/19.403 shows the smoothed RTT (35.047 ms) and its mean deviation (19.403 ms). THIS IS NOT ICMP ping: it is measured from actual data and ACK exchanges, and the result feeds into the RTO calculation, congestion control decisions, and the choice of transmission (pacing) rate.
Each socket maintains its own RTT value!
What this tells us: an unexpectedly high RTT (say, above 100 ms inside a local network) or a large deviation points to delays or jitter along the path. If you see rtt:150.5/75.2 on a network where it should be around 1 ms, check whether Recv-Q is full: a receiver that is not reading data fast enough keeps the window closed, and TCP's RTT measurements inflate while it waits for the window to open.
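To watch how the measured RTT and its deviation evolve during a transfer, it is usually enough to sample ss periodically. The address is the example peer from earlier; substitute the connection you are debugging:
# refresh the socket statistics every second; rtt/rttvar will drift if there is jitter
watch -n 1 'ss -tin dst 3.130.241.210:443'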
RTO: Retransmission Timeout
The field rto:236 shows the current retransmission timeout (236 ms), dynamically calculated from the RTT: RTO = SRTT + max(G, 4 * RTTVAR). Understanding RTO helps explain retransmission behavior: an RTO that is too low for the actual state of the network causes unnecessary (spurious) retransmissions, while an inflated RTO (much higher than the current RTT) usually means TCP is still backing off after previous timeout events.
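A quick way to put the two values side by side is to pull only the rto and rtt fields out of the ss output (same example destination as before):
# print just rto and rtt/rttvar; GNU grep's \b keeps rcv_rtt and minrtt out of the match
ss -tin dst 3.130.241.210:443 | grep -oE '\b(rto|rtt):[0-9.]+(/[0-9.]+)?'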
MSS and Path MTU Discovery
mss:1288 pmtu:1500 rcvmss:1288 advmss:1448
MSS (1288 bytes) and PMTU (1500 bytes) tell you about the packet sizes along the entire path.
What these values indicate:
pmtu:1500, but the connection stalls right after the initial data exchange: the actual path MTU is smaller (tunnels and VPNs create PMTU black holes)
Unusually small MSS on standard Ethernet: an intermediate device is clamping the MSS
Interface MTU set to 9000 bytes, but a small MSS is negotiated: jumbo frames are not supported somewhere along the path, or PMTU discovery could not determine the correct value (see the probe commands below)
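If you suspect a PMTU black hole, one low-tech way to confirm it is to send non-fragmentable probes yourself; tracepath performs the same search hop by hop. The destination is again the example peer, and the probe size assumes a standard 1500-byte Ethernet MTU:
# 1472 bytes of ICMP payload + 28 bytes of headers = 1500; DF is set, so any hop
# with a smaller MTU must reply "Fragmentation needed" instead of forwarding
ping -M do -s 1472 -c 3 3.130.241.210
# discover the path MTU hop by hop (no root required)
tracepath 3.130.241.210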
Tool #2: nstat — System-wide TCP Counters
While ss provides information on each socket, nstat displays aggregate TCP counters across all connections. This is invaluable for identifying patterns:
nstat -az | grep -i tcp
The -a flag shows absolute counter values (instead of increments since the last usage), and the -z flag shows zero-value counters. Run it again after a few seconds to see changes:
nstat -az | grep -i tcp
sleep 5
nstat -az | grep -i tcp
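nstat can also compute the delta for you: without -a it prints the increase since its previous run (it keeps per-user history between invocations), and counter names can be passed as patterns. A minimal sketch:
# establish a baseline, wait, then show only the growth of the interesting counters
nstat > /dev/null
sleep 5
nstat TcpRetransSegs TcpExtTCPTimeouts TcpExtTCPSACKRecovery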
Retransmissions: SACK vs. Duplicate ACKs
To understand whether packet retransmissions are occurring, pay attention to a few key counters in the nstat output; when they grow, packets are being lost or, at the very least, retransmitted.
Key counters:
TcpRetransSegs — Total number of retransmitted segments
TcpExtTCPSACKRecovery — Fast retransmission via SACK (modern, efficient method)
TcpExtTCPSACKReorder — Reordering detected via SACK (segments arrived out of order rather than being lost)
TcpExtTCPRenoRecovery — Fast retransmission using duplicate acknowledgments (outdated, less efficient method)
TcpExtTCPTimeouts — Total number of RTO expirations (each forces a return to slow start and severely reduces throughput)
Understanding the difference: SACK tells the sender exactly which segments were lost, allowing selective retransmission. Without SACK, TCP relies on three duplicate ACKs and may retransmit data that was actually delivered. Check the SACK status: cat /proc/sys/net/ipv4/tcp_sack (should be 1).
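The same check, plus the fix if you control the host, can be done through sysctl; note that the change only affects connections established afterwards:
# verify that SACK and window scaling are both enabled
sysctl net.ipv4.tcp_sack net.ipv4.tcp_window_scaling
# re-enable SACK if it was switched off (new connections only)
sysctl -w net.ipv4.tcp_sack=1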
Practical example:
TcpRetransSegs 245
TcpExtTCPSACKRecovery 12
TcpExtTCPSACKReorder 243
TcpExtTCPTimeouts 2
Here we see 245 retransmitted segments with 12 SACK recovery events (indicating that TCP is effectively handling packet loss) and 2 total timeouts. Additionally, there were 243 reordering events recorded.
Although SACK generally handles fast recovery of lost data well, any SACK recovery activity means the traffic flow is not ideal. A large number of SACK recovery events and retransmitted segments is a key indicator of performance problems and helps determine the next troubleshooting steps.
Another condition TCP has to cope with is segment reordering, which also degrades performance. A high rate of reordering events means a significant share of packets arrive out of order, forcing the receiver to buffer and re-sequence them before the data can be handed to the application. The result is higher latency and lower throughput.
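Absolute counter values are hard to judge without a baseline; what usually matters is the share of retransmitted segments among everything sent over the same interval. A rough sketch built on two samples of the system-wide counters:
# fraction of segments retransmitted over a 10-second window (system-wide)
r1=$(nstat -az | awk '/^TcpRetransSegs/ {print $2}')
o1=$(nstat -az | awk '/^TcpOutSegs/ {print $2}')
sleep 10
r2=$(nstat -az | awk '/^TcpRetransSegs/ {print $2}')
o2=$(nstat -az | awk '/^TcpOutSegs/ {print $2}')
echo "$((r2 - r1)) retransmits out of $((o2 - o1)) segments sent"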
Summary: Diagnostic Algorithm
When someone reports low TCP performance, here is my systematic approach to understanding what is happening:
# While data transfer is ongoing
ss -tin dst <peer-address>
Let's look at:
Is window scaling enabled and is it adequate? (the value of wscale should not be 0,0)
What RTT value did TCP measure?
Do the MSS/PMTU values correspond to the traffic path?
Is there any retransmission happening? (the retrans field)
Are Send-Q or Recv-Q consistently full? (indicates which side is limiting throughput; see the snippet below)
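A simple way to check whether a queue stays full is to sample it repeatedly: a Recv-Q that never drains points at the receiving application, a Send-Q that stays full points at the sender or the path. The address is again the example peer:
# sample queue occupancy once per second for half a minute (skip the header line)
for i in $(seq 30); do ss -tn dst 3.130.241.210:443 | tail -n +2; sleep 1; done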
nstat -az | grep -i tcp; sleep 10; nstat -az | grep -i tcp
Let's look at:
TcpRetransSegs rate (shows the frequency of retransmission)
TcpExtTCPTimeouts (indicates serious packet loss or delays)
TcpExtTCPSACKRecovery vs TcpExtTCPRenoRecovery (which recovery mechanism is active)
TcpExtTCPSACKReorder (shows packets received out of order)
TcpExtTCPSACKReneging (if the value is not zero, the receiver behaves inconsistently)
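To avoid scrolling through the full dump, you can grep for exactly the counters from this checklist:
nstat -az | grep -E 'TcpRetransSegs|TcpExtTCPTimeouts|TcpExtTCPSACK(Recovery|Reorder|Reneging)|TcpExtTCPRenoRecovery'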
Only after you understand how TCP perceives the connection should you proceed to packet capture. TCP stack statistics usually tell you everything you need to know.
General patterns and what they tell us
Case: Briefly high throughput, then drop to nearly zero
Check: Recv-Q stays persistently full
Interpretation: The receiving application cannot read data fast enough; TCP flow control limits the sender's speed
Case: Throughput limited regardless of the available bandwidth
Check: wscale:0,0 in the output of ss
Interpretation: Window scaling is disabled; the TCP window is limited to 64 KB and throughput is constrained by the RTT
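To see how hard this ceiling is, plug the 64 KB limit into the throughput formula with the RTT from the earlier ss example (35 ms); the arithmetic below is just that calculation:
# 65535-byte window at 35 ms RTT -> at most ~15 Mbit/s, no matter how fast the link is
echo '65535 * 8 / 0.035' | bc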
Case: Many retransmissions, but ping shows minimal loss.
Check: high value of TcpExtTCPTimeouts, RTO much greater than current RTT value
Interpretation: Previous timeout events inflated RTO; TCP operates in a conservative mode during its recovery
Case: Data transfer hangs after sending part of the data
Check: MSS/PMTU values, connection stops responding
Interpretation: PMTU black hole — packets larger than the actual MTU of the path are silently dropped
Case: Retransmissions handled predominantly through TcpExtTCPRenoRecovery
Check: SACK is disabled (tcp_sack=0)
Interpretation: TCP uses the older recovery mechanism based on duplicate ACKs, which is less efficient than SACK when multiple segments are lost
Packet Capture Note
I intentionally did not cover packet capture in this article because it deserves a separate detailed analysis. Tools like tcpdump and Wireshark are incredibly powerful, but they also consume a lot of time and generate huge amounts of data. In my experience, most TCP performance issues can be diagnosed using just ss and nstat.
However, there are cases where packet capture is absolutely necessary: when you suspect interference from an intermediary device, need to verify congestion control behavior, or want to see the exact timing of events during connection establishment. I will cover practical packet capture analysis for TCP troubleshooting in a future article.
Conclusion
The next time someone reports low TCP throughput, start by understanding what is happening with TCP on both hosts. Use ss to analyze the state of each socket and nstat to observe overall system patterns. Pay attention to window scaling, RTT measurements, RTO values, MSS/PMTU parameters, and retransmission occurrences.
These tools provide direct insight into the TCP decision-making process and help answer important questions:
What throughput is TCP achieving?
Which side is limiting the bandwidth?
How does TCP adapt to network conditions?
Is packet loss being handled effectively?
Understanding the internal state of TCP helps systematically diagnose issues and explains what is happening "under the hood." Sometimes the explanation points to the need for changes in certain settings, sometimes it reveals application behavior characteristics that require attention, and sometimes it shows that TCP is working exactly as it should given the network conditions.
The goal is not to find a culprit but to understand what is happening so that you can reasonably agree on the next steps to resolve the issue.
What is your main approach to resolving TCP performance issues? Have you found other useful TCP metrics? Let me know in the comments.
Links
Man Pages
ss(8) - Linux manual page https://man7.org/linux/man-pages/man8/ss.8.html Socket statistics utility - part of the iproute2 package
nstat(8) - Linux manual page https://man7.org/linux/man-pages/man8/nstat.8.html Network statistics tool for monitoring kernel SNMP counters
IETF RFCs
RFC 6298 - Computing TCP's Retransmission Timer https://www.rfc-editor.org/rfc/rfc6298.html V. Paxson, M. Allman, J. Chu, M. Sargent (June 2011) Defines the standard algorithm for calculating TCP's RTO
RFC 7323 - TCP Extensions for High Performance https://www.rfc-editor.org/rfc/rfc7323.html D. Borman, B. Braden, V. Jacobson, R. Scheffenegger (September 2014) Specifies TCP window scaling and timestamp parameters
RFC 2018 - TCP Selective Acknowledgment Options https://www.rfc-editor.org/rfc/rfc2018.html M. Mathis, J. Mahdavi, S. Floyd, A. Romanow (October 1996) Defines the SACK mechanism for TCP
RFC 1191 - Path MTU Discovery https://www.rfc-editor.org/rfc/rfc1191.html J. Mogul, S. Deering (November 1990) Describes a method for dynamically determining path MTU