A junior engineer says “We’re using HTTP/2, so we get connection reuse.” A Staff engineer asks: “What’s your load balancer doing with those multiplexed streams?”
Modern protocols like HTTP/2, gRPC, and WebSockets fundamentally change how traffic behaves under load balancing. Understanding the difference between connections, streams, and messages is critical for building low-latency services.
1. HTTP/2 Multiplexing & Head-of-Line Blocking
The Promise: One Connection, Many Streams
HTTP/1.1 allows only one request in flight per TCP connection (pipelining exists but is effectively unused because of its own head-of-line blocking problems), so clients open many parallel connections. HTTP/2 multiplexes many concurrent streams over a single TCP connection.
Benefit: No connection setup overhead (3-way handshake, TLS negotiation) for every request.
The Hidden Cost: TCP Head-of-Line Blocking
Even though HTTP/2 streams are independent at the application layer, they ALL share one TCP connection. If one packet is lost, TCP stops delivering ALL streams until that packet is retransmitted.
Example:
- Stream 1: Downloading a 10MB video
- Stream 2: Fetching a 1KB JSON API response
- If a packet from the video is lost, the JSON response is blocked until the video packet is recovered.
> [!IMPORTANT]
> This is why QUIC (HTTP/3) moved to UDP: each QUIC stream has independent loss recovery, eliminating head-of-line blocking at the transport layer.
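The video/JSON example can be made concrete with a toy delivery model. This is an illustrative sketch, not a real transport implementation: the per-packet time, RTT, and packet numbers are made-up values, and the `shared_order` flag switches between TCP-like in-order delivery and QUIC-like independent streams.

```python
# Toy model of transport-layer head-of-line blocking (illustration only).
# Assumption: packets arrive at a fixed interval, and a lost packet's
# retransmission arrives one RTT late.

def delivery_times(packets, lost_seq, rtt=100, per_packet=1, shared_order=True):
    """Return {seq: time the application sees that packet}.

    shared_order=True models TCP: data is released in sequence order,
    so everything behind a lost packet waits for its retransmission.
    shared_order=False models QUIC-like independent streams: only the
    lost packet itself is delayed.
    """
    times = {}
    blocked_until = 0
    for i, seq in enumerate(packets):
        arrival = (i + 1) * per_packet
        if seq == lost_seq:
            arrival += rtt  # retransmission delay
        if shared_order:
            arrival = max(arrival, blocked_until)  # wait behind the hole
            blocked_until = arrival
        times[seq] = arrival
    return times

# Packet 1 belongs to the video stream and is lost; packet 2 is the JSON reply.
tcp = delivery_times([1, 2], lost_seq=1)
quic = delivery_times([1, 2], lost_seq=1, shared_order=False)
print(tcp[2])   # JSON stuck behind the video retransmission
print(quic[2])  # JSON unaffected
```

Under these assumptions the 1KB JSON reply is delayed by a full RTT on TCP even though none of its own packets were lost.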
2. gRPC Streaming & Long-Lived Connections
Connection Pinning
gRPC clients typically open a few long-lived connections and reuse them for many RPCs (via HTTP/2 multiplexing).
Problem for Load Balancers:
- L4 load balancers balance connections, not streams.
- If Client A opens 1 connection to Server 1, ALL of Client A’s RPCs go to Server 1, even if Server 2 is idle.
Example: 10 clients, each holding 1 long-lived connection.
- Connection-level balancing only decides where a connection lands when it is opened. After churn (a rolling restart, client reconnects), you can easily end up with 7 connections on Server 1 and 3 on Server 2.
- Result: Server 1 carries 70% of the traffic, and it stays that way until those long-lived connections close.
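The skew above is easy to model. A back-of-the-envelope sketch, using the hypothetical 7/3 connection split and an assumed 100 RPCs per client, compares connection-level (L4) pinning with stream-aware (L7 or client-side) spreading:

```python
# Toy model of connection pinning vs. stream-aware balancing.
# Numbers (7/3 split, 100 RPCs per client) are illustrative assumptions.

clients = [1] * 7 + [2] * 3          # which server each client's connection landed on
RPCS_PER_CLIENT = 100

# L4 (connection-level) balancing: every RPC follows its pinned connection.
l4_load = {1: 0, 2: 0}
for server in clients:
    l4_load[server] += RPCS_PER_CLIENT

# Stream-aware balancing: each RPC is routed individually, round-robin.
l7_load = {1: 0, 2: 0}
for i in range(len(clients) * RPCS_PER_CLIENT):
    l7_load[(i % 2) + 1] += 1

print(l4_load)  # {1: 700, 2: 300} -- 70/30 skew
print(l7_load)  # {1: 500, 2: 500} -- even
```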
The Fix: Client-Side Load Balancing
gRPC clients can use the xDS protocol (or the older, now-deprecated grpclb) to:
- Query a control plane for the list of backend IPs.
- Open connections to multiple backends.
- Balance RPCs (not connections) across those backends.
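The core of step 3 is a per-RPC picker rather than a per-connection one. A minimal sketch, loosely modeled on gRPC's round_robin policy (the class name, backend addresses, and `pick` method are illustrative, not a real gRPC API):

```python
import itertools

class RoundRobinPicker:
    """Chooses a backend for each RPC, cycling across all known backends."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        # Called once per RPC, not once per connection: this is what makes
        # the balancing stream-aware.
        return next(self._cycle)

# Hypothetical backend list, e.g. as returned by an xDS control plane.
picker = RoundRobinPicker(["10.0.0.1:50051", "10.0.0.2:50051"])
calls = [picker.pick() for _ in range(4)]
print(calls)  # alternates between the two backends
```

In a real client each backend address would map to an open HTTP/2 subchannel; the key design point is that the pick happens per RPC, so one chatty client no longer pins all its load to a single server.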
3. WebSockets & Sticky Sessions
WebSockets are full-duplex, long-lived connections that stay open for minutes to hours. Unlike HTTP requests, they can’t be easily “rebalanced” mid-flight.
The Session Affinity Problem
If a WebSocket connection is terminated (server restart, connection draining), the client must:
- Reconnect to the load balancer.
- Hope to land on the same server (if session state is in memory).
Without Session Affinity: Client reconnects and lands on Server B, which has no context. The user’s shopping cart or game state is lost.
With Session Affinity (Sticky Sessions): The load balancer uses a cookie or client IP hash to route the client back to the same server.
Trade-off: Sticky sessions reduce load balancing effectiveness (you can’t evenly distribute new connections if old ones are “stuck”).
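IP-hash affinity can be sketched in a few lines. This is an illustrative model only; the server names and client IP are made up, and real load balancers typically use consistent hashing so that adding a server reshuffles fewer clients:

```python
import hashlib

# Hypothetical server pool.
SERVERS = ["server-a", "server-b", "server-c"]

def pick_server(client_ip):
    """Hash the client IP to a server, so reconnects land on the same one."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return SERVERS[digest[0] % len(SERVERS)]

first = pick_server("203.0.113.7")
again = pick_server("203.0.113.7")   # reconnect after a dropped WebSocket
print(first == again)  # True: the client finds its in-memory session state
```

The trade-off from above is visible here too: the mapping is fixed by the hash, so the balancer cannot steer new connections away from a hot server without breaking affinity.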
4. Interactive: Connection & Stream Load Balancer Visualizer
See how protocol choice affects load distribution.
5. Protocol Selection Trade-offs
| Protocol | Connection Model | Best For | Load Balancing Challenge |
|---|---|---|---|
| HTTP/1.1 | One in-flight request per conn | Simple REST APIs | Many connections (resource overhead) |
| HTTP/2 | Few long-lived conns | Low-latency APIs | L4 LBs can’t balance streams |
| gRPC | Few long-lived conns | Microservices (RPC) | Requires client-side LB or L7 LB |
| WebSockets | Long-lived, bidirectional | Real-time (chat, games) | Sticky sessions required |
| QUIC/HTTP/3 | UDP-based, per-stream recovery | Mobile, lossy networks | LB and infrastructure support still maturing |
Staff Takeaway
Protocol choice isn’t just about “features”—it fundamentally changes your operational model:
- HTTP/2 requires stream-aware load balancing or client-side LB.
- gRPC needs the xDS control plane or manual connection management.
- WebSockets demand sticky sessions and connection-draining strategies.
Understanding these nuances is the difference between theoretical “high performance” and actual p99 latency in production.