Most engineers treat the network as a “black box” that just works. Staff engineers treat it as a resource to be tuned. How you handle congestion and connection overhead determines your system’s latency floor.
1. TCP Tuning: The “Slow Start” Problem
Every new TCP connection starts in Slow Start. It begins with a small “Congestion Window” (CWND) and grows it by one segment for every acknowledgement (ACK) received, which roughly doubles the window every round trip.
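To make the cost concrete, here is a minimal sketch that counts how many round trips Slow Start needs to deliver a payload. It assumes an initial window of 10 segments and a 1460-byte MSS (common defaults, but assumptions here), and it ignores loss, delayed ACKs, and the receiver's window. The function name is made up for illustration.

```python
import math


def slow_start_rtts(payload_bytes, init_cwnd_segments=10, mss=1460):
    """Round trips needed to send a payload while the window doubles each RTT."""
    segments_left = math.ceil(payload_bytes / mss)
    cwnd = init_cwnd_segments
    rtts = 0
    while segments_left > 0:
        segments_left -= cwnd  # one full window is sent per round trip
        cwnd *= 2              # one extra segment per ACK doubles the window
        rtts += 1
    return rtts


print(slow_start_rtts(1_024))      # 1
print(slow_start_rtts(100_000))    # 3
print(slow_start_rtts(1_000_000))  # 7
```

A 1KB response fits in the first window, but a 1MB response on a fresh connection spends 7 round trips just ramping up.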
Staff Pitfall: Connection Overhead
If your API returns a 1KB JSON but the TCP handshake + Slow Start takes 3 Round Trips (RTTs), then on a link with a 100ms RTT your “Network Latency” will be 300ms even if your database responds in 1ms.
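The arithmetic behind that pitfall, as a rough sketch. The text above only gives the total of 3 round trips; splitting it into 2 setup round trips plus 1 for the request and response is an assumption, as are the function and parameter names.

```python
def request_latency_ms(rtt_ms, server_ms, setup_rtts, transfer_rtts=1):
    """Wall-clock latency of one request: round trips on the wire plus server time."""
    return (setup_rtts + transfer_rtts) * rtt_ms + server_ms


# Cold connection: 2 setup round trips (assumed) + 1 for request/response.
cold = request_latency_ms(rtt_ms=100, server_ms=1, setup_rtts=2)
# Warm, reused connection: no setup cost.
warm = request_latency_ms(rtt_ms=100, server_ms=1, setup_rtts=0)

print(cold, warm)  # 301 101
```

The 1ms database call is noise in both cases. The round trips set the floor.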
The Fixes:
- Keep-Alives: Reuse existing connections to avoid the handshake and slow-start penalty.
- TCP Fast Open (TFO): Allows data to be sent inside the very first SYN packet on repeat connections, once the client holds a TFO cookie from the server.
- Kernel Tuning: Adjusting tcp_tw_reuse and buffer sizes allows a single machine to handle 100k+ concurrent connections without running out of ports.
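Connection reuse is the easiest of these fixes to verify yourself. The sketch below starts a throwaway HTTP/1.1 server on localhost and counts how many distinct TCP connections the client opens, with and without reuse. It uses only the Python standard library; the handler and helper names are illustrative.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows persistent (keep-alive) connections
    seen_ports = []                # client source port of every request

    def do_GET(self):
        Handler.seen_ports.append(self.client_address[1])
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_tcp_connections(n_requests, reuse):
    """Send n_requests GETs and return how many TCP connections were used."""
    Handler.seen_ports = []
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port)
        for _ in range(n_requests):
            if not reuse:
                # Throw the connection away: pay for a new handshake each time.
                conn.close()
                conn = http.client.HTTPConnection("127.0.0.1", port)
            conn.request("GET", "/")
            conn.getresponse().read()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(set(Handler.seen_ports))


print(count_tcp_connections(3, reuse=True))   # one connection serves all requests
print(count_tcp_connections(3, reuse=False))  # a fresh connection per request
```

On loopback the handshake is nearly free, so this shows the connection count rather than the latency. Across a real network, every extra connection costs round trips.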
2. BBR: Google’s Congestion Control
Historically, TCP used “Loss-based” congestion control (like Reno and CUBIC). It assumed that if a packet was lost, the network was full, so it slashed its window: by 50% in Reno, and by about 30% in CUBIC.
In the modern world (shallow buffers, high-speed fiber), packet loss doesn’t always mean congestion.
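A toy trace shows why that assumption hurts. This sketch models a loss-based sender as plain additive-increase, multiplicative-decrease: the window grows by one segment per round trip and is multiplied by a factor beta on every loss. It is a simplification (real CUBIC uses a cubic growth curve), and the function name and parameters are assumptions for illustration.

```python
def aimd_trace(rounds, loss_rounds, start_cwnd=10.0, beta=0.5):
    """Return the congestion window after each round trip."""
    cwnd = start_cwnd
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            cwnd = max(1.0, cwnd * beta)  # multiplicative decrease on loss
        else:
            cwnd += 1.0                   # additive increase per round trip
        trace.append(cwnd)
    return trace


# A single stray loss in round 3, unrelated to congestion:
print(aimd_trace(6, loss_rounds={3}))
# [11.0, 12.0, 13.0, 6.5, 7.5, 8.5]
```

One random loss costs several round trips of recovery, even though the link was never full.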
BBR (Bottleneck Bandwidth and RTT) is a “model-based” algorithm released by Google in 2016.
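- How it works: It constantly estimates the maximum delivery bandwidth and the minimum RTT, then paces its sending to match that model. It does not treat isolated packet loss as a congestion signal, so it keeps sending at the estimated rate as long as the RTT doesn’t spike.
- Impact: Gains depend heavily on the path. Figures of up to 20% higher throughput and 80% lower tail latency are cited for messy networks (like mobile/cellular), so measure on your own traffic.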
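The model at the heart of BBR can be sketched in a few lines: keep a windowed maximum of observed delivery rate and a windowed minimum of observed RTT, and multiply them to get the bandwidth-delay product (the amount of data that should be in flight). This is a heavily simplified illustration, not the real algorithm: actual BBR uses separate time-based windows, pacing gains, and probing phases. The class and method names are made up.

```python
import math
from collections import deque


class BbrModelSketch:
    """Tracks max delivery rate and min RTT over the last `window` samples."""

    def __init__(self, window=10):
        self.bw_samples = deque(maxlen=window)   # bytes per second
        self.rtt_samples = deque(maxlen=window)  # seconds

    def on_ack(self, delivered_bytes, interval_s, rtt_s):
        # Each ACK yields one delivery-rate sample and one RTT sample.
        self.bw_samples.append(delivered_bytes / interval_s)
        self.rtt_samples.append(rtt_s)

    def bottleneck_bw(self):
        return max(self.bw_samples)

    def min_rtt(self):
        return min(self.rtt_samples)

    def bdp_bytes(self):
        # Bandwidth-delay product: how much data fills the pipe without queuing.
        return self.bottleneck_bw() * self.min_rtt()


model = BbrModelSketch()
model.on_ack(delivered_bytes=125_000, interval_s=0.01, rtt_s=0.040)
model.on_ack(delivered_bytes=100_000, interval_s=0.01, rtt_s=0.050)
print(round(model.bdp_bytes()))  # 500000
```

Note that a lost packet never appears in this model. Only a lower delivery rate or a higher RTT changes what the sender does.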
[!TIP] Enable BBR today. Most modern Linux kernels (4.9+) support it. Adding net.core.default_qdisc=fq and net.ipv4.tcp_congestion_control=bbr to your sysctl.conf is one of the highest-ROI changes you can make.
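If you manage many hosts, a small check like the one below can flag machines whose sysctl.conf is missing the two settings from the tip. It is a sketch: it only parses text you hand it, the function names are illustrative, and it does not confirm that the running kernel has the BBR module available.

```python
WANTED = {
    "net.core.default_qdisc": "fq",
    "net.ipv4.tcp_congestion_control": "bbr",
}


def parse_sysctl(text):
    """Parse sysctl.conf-style 'key = value' lines, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_bbr_settings(text):
    """Return the 'key=value' lines that still need to be added."""
    current = parse_sysctl(text)
    return [f"{k}={v}" for k, v in WANTED.items() if current.get(k) != v]


example = "# tuning\nnet.core.default_qdisc = fq\n"
print(missing_bbr_settings(example))
# ['net.ipv4.tcp_congestion_control=bbr']
```

After editing the file, `sysctl -p` reloads it, and `sysctl net.ipv4.tcp_congestion_control` shows what the kernel is actually using.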
3. QUIC and HTTP/3: The End of TCP?
TCP has a fundamental flaw called Head-of-Line (HoL) Blocking.
Imagine an HTTP/2 connection sending 3 files (A, B, C) multiplexed over a single TCP connection.
- Packet from File A is lost.
- TCP delivers one ordered byte stream, so the OS holds back any File B and C data that arrived after the lost packet until A’s packet is retransmitted.
- The Result: One slow packet kills the performance of the entire “multiplexed” connection.
The QUIC Solution
QUIC runs over UDP. It implements its own reliability logic but treats every “stream” (file) independently.
- HoL Fixed: If a packet for File A is lost, File B and C continue to be processed.
- 0-RTT Handshake: QUIC can resume a previous connection and send data in the first packet.
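The difference in delivery rules fits in a few lines. This sketch takes the same sequence of packets, marks one as lost, and shows what the receiver can hand to the application under each model before the retransmission arrives. It is a toy model of ordering only (no retransmission, no flow control), and the function names are made up.

```python
# Packets in send order: (stream, sequence number within that stream)
PACKETS = [("A", 0), ("B", 0), ("C", 0), ("A", 1), ("B", 1), ("C", 1)]


def delivered_single_stream(packets, lost):
    """TCP-style: one ordered byte stream, so delivery stops at the first gap."""
    out = []
    for i, pkt in enumerate(packets):
        if i in lost:
            break
        out.append(pkt)
    return out


def delivered_per_stream(packets, lost):
    """QUIC-style: a gap only blocks the stream it belongs to."""
    blocked = set()
    out = []
    for i, (stream, seq) in enumerate(packets):
        if i in lost:
            blocked.add(stream)
        elif stream not in blocked:
            out.append((stream, seq))
    return out


lost = {0}  # the first packet of File A is dropped
print(delivered_single_stream(PACKETS, lost))
# []
print(delivered_per_stream(PACKETS, lost))
# [('B', 0), ('C', 0), ('B', 1), ('C', 1)]
```

With a single ordered stream, nothing reaches the application. With independent streams, only File A waits.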
Staff Takeaway
You cannot build a high-performance system on default settings.
- TCP is reliable but “blind” to modern network physics.
- BBR is a mandatory upgrade for any high-traffic service.
- HTTP/3 isn’t just a version bump; it’s a fundamental move to UDP to bypass the architectural limits of the 1970s networking stack.