A junior engineer says “We’re using HTTP/2, so we get connection reuse.” A Staff engineer asks: “What’s your load balancer doing with those multiplexed streams?”
Modern protocols like HTTP/2, gRPC, and WebSockets fundamentally change how traffic behaves under load balancing. Understanding the difference between connections, streams, and messages is critical for building low-latency services.
1. HTTP/2 Multiplexing & Head-of-Line Blocking
The Promise: One Connection, Many Streams
HTTP/1.1 allows only one request in flight per TCP connection (pipelining exists but is effectively unused because of its own head-of-line blocking problems), so clients open many parallel connections. HTTP/2 multiplexes many concurrent streams over a single TCP connection.
Benefit: No connection setup overhead (3-way handshake, TLS negotiation) for every request.
The Hidden Cost: TCP Head-of-Line Blocking
Even though HTTP/2 streams are independent at the application layer, they ALL share one TCP connection. If one packet is lost, TCP stops delivering ALL streams until that packet is retransmitted.
Example:
- Stream 1: Downloading a 10MB video
- Stream 2: Fetching a 1KB JSON API response
- If a packet from the video is lost, the JSON response is blocked until the video packet is recovered.
> [!IMPORTANT]
> This is why QUIC (HTTP/3) moved to UDP: each QUIC stream has independent loss recovery, eliminating head-of-line blocking at the transport layer.
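The video/JSON example can be made concrete with a toy delivery model. This is an illustrative sketch, not a real transport implementation: the per-packet time, RTT, and packet numbers are made-up values, and the `shared_order` flag switches between TCP-like in-order delivery and QUIC-like independent streams.

```python
# Toy model of transport-layer head-of-line blocking (illustration only).
# Assumption: packets arrive at a fixed interval, and a lost packet's
# retransmission arrives one RTT late.

def delivery_times(packets, lost_seq, rtt=100, per_packet=1, shared_order=True):
    """Return {seq: time the application sees that packet}.

    shared_order=True models TCP: data is released in sequence order,
    so everything behind a lost packet waits for its retransmission.
    shared_order=False models QUIC-like independent streams: only the
    lost packet itself is delayed.
    """
    times = {}
    blocked_until = 0
    for i, seq in enumerate(packets):
        arrival = (i + 1) * per_packet
        if seq == lost_seq:
            arrival += rtt  # retransmission delay
        if shared_order:
            arrival = max(arrival, blocked_until)  # wait behind the hole
            blocked_until = arrival
        times[seq] = arrival
    return times

# Packet 1 belongs to the video stream and is lost; packet 2 is the JSON reply.
tcp = delivery_times([1, 2], lost_seq=1)
quic = delivery_times([1, 2], lost_seq=1, shared_order=False)
print(tcp[2])   # JSON stuck behind the video retransmission
print(quic[2])  # JSON unaffected
```

Under these assumptions the 1KB JSON reply is delayed by a full RTT on TCP even though none of its own packets were lost.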
2. gRPC Streaming & Long-Lived Connections
Connection Pinning
gRPC clients typically open a few long-lived connections and reuse them for many RPCs (via HTTP/2 multiplexing).
Problem for Load Balancers:
- L4 load balancers balance connections, not streams.
- If Client A opens 1 connection to Server 1, ALL of Client A’s RPCs go to Server 1, even if Server 2 is idle.
Example: 10 clients, each holding 1 long-lived connection.
- Connection-level balancing only decides where a connection lands when it is opened. After churn (a rolling restart, client reconnects), you can easily end up with 7 connections on Server 1 and 3 on Server 2.
- Result: Server 1 carries 70% of the traffic, and it stays that way until those long-lived connections close.
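The skew above is easy to model. A back-of-the-envelope sketch, using the hypothetical 7/3 connection split and an assumed 100 RPCs per client, compares connection-level (L4) pinning with stream-aware (L7 or client-side) spreading:

```python
# Toy model of connection pinning vs. stream-aware balancing.
# Numbers (7/3 split, 100 RPCs per client) are illustrative assumptions.

clients = [1] * 7 + [2] * 3          # which server each client's connection landed on
RPCS_PER_CLIENT = 100

# L4 (connection-level) balancing: every RPC follows its pinned connection.
l4_load = {1: 0, 2: 0}
for server in clients:
    l4_load[server] += RPCS_PER_CLIENT

# Stream-aware balancing: each RPC is routed individually, round-robin.
l7_load = {1: 0, 2: 0}
for i in range(len(clients) * RPCS_PER_CLIENT):
    l7_load[(i % 2) + 1] += 1

print(l4_load)  # {1: 700, 2: 300} -- 70/30 skew
print(l7_load)  # {1: 500, 2: 500} -- even
```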
The Fix: Client-Side Load Balancing
gRPC clients can use the xDS protocol (or the older, now-deprecated grpclb) to:
- Query a control plane for the list of backend IPs.
- Open connections to multiple backends.
- Balance RPCs (not connections) across those backends.
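The core of step 3 is a per-RPC picker rather than a per-connection one. A minimal sketch, loosely modeled on gRPC's round_robin policy (the class name, backend addresses, and `pick` method are illustrative, not a real gRPC API):

```python
import itertools

class RoundRobinPicker:
    """Chooses a backend for each RPC, cycling across all known backends."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        # Called once per RPC, not once per connection: this is what makes
        # the balancing stream-aware.
        return next(self._cycle)

# Hypothetical backend list, e.g. as returned by an xDS control plane.
picker = RoundRobinPicker(["10.0.0.1:50051", "10.0.0.2:50051"])
calls = [picker.pick() for _ in range(4)]
print(calls)  # alternates between the two backends
```

In a real client each backend address would map to an open HTTP/2 subchannel; the key design point is that the pick happens per RPC, so one chatty client no longer pins all its load to a single server.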
3. WebSockets & Sticky Sessions
WebSockets are full-duplex, long-lived connections that stay open for minutes to hours. Unlike HTTP requests, they can’t be easily “rebalanced” mid-flight.
The Session Affinity Problem
If a WebSocket connection is terminated (server restart, connection draining), the client must:
- Reconnect to the load balancer.
- Hope to land on the same server (if session state is in memory).
Without Session Affinity: Client reconnects and lands on Server B, which has no context. The user’s shopping cart or game state is lost.
With Session Affinity (Sticky Sessions): The load balancer uses a cookie or client IP hash to route the client back to the same server.
Trade-off: Sticky sessions reduce load balancing effectiveness (you can’t evenly distribute new connections if old ones are “stuck”).
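IP-hash affinity can be sketched in a few lines. This is an illustrative model only; the server names and client IP are made up, and real load balancers typically use consistent hashing so that adding a server reshuffles fewer clients:

```python
import hashlib

# Hypothetical server pool.
SERVERS = ["server-a", "server-b", "server-c"]

def pick_server(client_ip):
    """Hash the client IP to a server, so reconnects land on the same one."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return SERVERS[digest[0] % len(SERVERS)]

first = pick_server("203.0.113.7")
again = pick_server("203.0.113.7")   # reconnect after a dropped WebSocket
print(first == again)  # True: the client finds its in-memory session state
```

The trade-off from above is visible here too: the mapping is fixed by the hash, so the balancer cannot steer new connections away from a hot server without breaking affinity.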
4. Interactive: Connection & Stream Load Balancer Visualizer
See how protocol choice affects load distribution.
5. Protocol Selection Trade-offs
| Protocol | Connection Model | Best For | Load Balancing Challenge |
|---|---|---|---|
| HTTP/1.1 | One in-flight request per conn | Simple REST APIs | Many connections (resource overhead) |
| HTTP/2 | Few long-lived conns | Low-latency APIs | L4 LBs can’t balance streams |
| gRPC | Few long-lived conns | Microservices (RPC) | Requires client-side LB or L7 LB |
| WebSockets | Long-lived, bidirectional | Real-time (chat, games) | Sticky sessions required |
| QUIC/HTTP/3 | UDP-based, per-stream recovery | Mobile, lossy networks | LB and infrastructure support still maturing |
Staff Takeaway
Protocol choice isn’t just about “features”—it fundamentally changes your operational model:
- HTTP/2 requires stream-aware load balancing or client-side LB.
- gRPC needs the xDS control plane or manual connection management.
- WebSockets demand sticky sessions and connection-draining strategies.
Understanding these nuances is the difference between theoretical “high performance” and actual p99 latency in production.