Real-Time Strategies
Here’s an engineering challenge that sounds simple but hides enormous complexity: how do you tell 2 billion users that they have a new message? In 2014, WhatsApp handled this for 450 million users with just 50 engineers. In 2022, Robinhood’s stock ticker needed to push real-time price updates to 10+ million users simultaneously without draining their mobile batteries. The protocol choices these teams made — and the ones they rejected — define how modern real-time systems are built.
Every real-time feature starts with the same question: who initiates the conversation, the client (Pull) or the server (Push)? The answer determines your infrastructure cost, mobile battery consumption, and scalability ceiling.
[!IMPORTANT] In this lesson, you will master:
- Latency vs. Efficiency: Why WebSockets trade server memory for client-side responsiveness — and when that trade is worth it.
- Hybrid Real-Time: Using “Signal vs. Content” patterns (like WhatsApp) to scale to billions without stateful connection overhead.
- Physical Constraints: Managing CPU context switching and interrupt overhead in high-traffic polling systems.
The web was originally built for “Request-Response” (Pull). But modern apps (Uber, WhatsApp, Robinhood) need “Push”. How does the server tell the client that something changed?
1. Short Polling (The “Are we there yet?” approach)
The client repeats the request at a fixed interval (e.g., every 5 seconds).
- Flow:
- Client: “Any new messages?”
- Server: “No.”
- (Wait 5s)
- Client: “Any new messages?”
- Server: “Yes, here is one.”
- Pros: Simple. Stateless. Works everywhere.
- Cons: High Latency (up to interval time). Wasted Resources (empty responses). Battery Drain (radio wakes up constantly).
[!NOTE] Hardware-First Intuition: Polling is an “Interrupt Storm” for your servers. Every request requires the NIC to interrupt the CPU, which then context-switches from your application to the Kernel stack. At scale (100k+ req/sec), the CPU spends more time switching contexts than actually processing logic. Persistent connections (WS) allow the CPU to stay “hot” on the socket data longer.
2. Long Polling (Hanging GET)
The client sends a request, and the server holds it open until data is available or a timeout occurs.
Long Polling Flow
Unlike Short Polling, the connection stays open.
- Pros: Lower latency than short polling.
- Cons: Connection Overhead (headers parsed for every event). Still not truly bi-directional.
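The difference from short polling is visible in code: the client does not sleep between requests, it re-requests the moment a response (data or timeout) comes back. A minimal sketch, with `fetchOrTimeout` injected so the loop can run without a real network:

```javascript
// Minimal long-polling loop (sketch). The server holds each request open
// until data arrives or its timeout fires; the client re-requests instantly.
async function longPoll(fetchOrTimeout, onMessage, { maxRounds = Infinity } = {}) {
  for (let round = 0; round < maxRounds; round++) {
    const result = await fetchOrTimeout(); // resolves with data, or null on timeout
    if (result !== null) onMessage(result); // deliver, then loop back immediately
  }
}
```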
Interactive Visualizer: Bandwidth Overhead
Compare the data transmitted for 100 messages.
- Polling: Sends HTTP Headers (Cookie, User-Agent) ~800 bytes every single time.
- WebSocket: Sends Headers once (Handshake), then tiny frames (2 bytes overhead).
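The comparison works out to roughly an 80× difference for 100 messages. A back-of-envelope sketch, using the illustrative averages above (800 bytes of headers, 2 bytes per frame), not exact wire sizes:

```javascript
// Back-of-envelope overhead for delivering N server-to-client messages.
function pollingOverheadBytes(messages, headerBytes = 800) {
  return messages * headerBytes; // full HTTP headers on every single request
}
function webSocketOverheadBytes(messages, handshakeBytes = 800, frameBytes = 2) {
  return handshakeBytes + messages * frameBytes; // headers once, then tiny frames
}

pollingOverheadBytes(100);   // 80,000 bytes of repeated headers
webSocketOverheadBytes(100); // 1,000 bytes total
```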
3. WebSockets (Full Duplex)
A persistent, bi-directional communication channel over a single TCP connection.
Starts as HTTP, then “Upgrades” to WebSocket protocol (ws:// or wss://).
- Pros: Lowest Latency. Full Duplex (Client and Server can talk anytime). Low overhead (no headers after handshake).
- Cons: Stateful (Server must remember connection). Harder to scale (requires Redis Pub/Sub or similar to broadcast across servers). Intermediary proxies and firewalls sometimes break the Upgrade handshake.
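The "2 bytes of overhead" figure comes from the WebSocket framing rules in RFC 6455: the base header is 2 bytes, and it only grows for larger payloads or for masked (client-to-server) frames. A sketch of that rule:

```javascript
// WebSocket frame header size per RFC 6455. Server-to-client frames are
// unmasked; client-to-server frames carry an extra 4-byte masking key.
function frameHeaderBytes(payloadLength, masked = false) {
  let header = 2;                            // FIN/opcode byte + mask/length byte
  if (payloadLength > 65535) header += 8;    // 64-bit extended payload length
  else if (payloadLength > 125) header += 2; // 16-bit extended payload length
  if (masked) header += 4;                   // masking key (client-to-server only)
  return header;
}

frameHeaderBytes(100);    // 2  — a short chat message
frameHeaderBytes(70000);  // 10 — a large binary blob
```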
3.1 The Liveness Trap: Heartbeats
Why can’t we just rely on TCP to tell us if a connection is dead?
- Ghost Connections: If a user’s phone dies or enters a tunnel, the server might think the connection is “Up” for minutes (until TCP timeout).
- The Fix: Application-Level Heartbeats. The server sends a PING frame every 30 seconds; if the client doesn’t PONG back, the server kills the socket and frees up RAM immediately.
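Server-side, heartbeat bookkeeping can be as simple as a timestamp per socket. A minimal sketch, assuming a two-interval grace period before declaring a connection dead (the class and field names are illustrative):

```javascript
// Track the last PONG per socket; reap any socket silent for too long.
class HeartbeatMonitor {
  constructor(intervalMs = 30000) {
    this.intervalMs = intervalMs;
    this.lastPong = new Map(); // socketId -> timestamp of last PONG
  }
  onPong(socketId, now = Date.now()) {
    this.lastPong.set(socketId, now);
  }
  // Returns socket ids that missed their PONG deadline and should be closed.
  reapDead(now = Date.now()) {
    const dead = [];
    for (const [id, t] of this.lastPong) {
      if (now - t > this.intervalMs * 2) {
        dead.push(id);
        this.lastPong.delete(id); // free the bookkeeping entry immediately
      }
    }
    return dead;
  }
}
```

A background timer would call `reapDead()` periodically and close the returned sockets, freeing their RAM without waiting for TCP's own timeout.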
Interactive Visualizer: Protocol Racer (Chat App Simulation)
See the difference in user experience and device efficiency. Scenario: You are receiving chat messages.
- Polling: Observe the “Lag” and the constant packet requests (battery drain).
- WebSockets: Instant delivery with minimal traffic.
4. Server-Sent Events (SSE)
Standard HTTP connection where the server pushes updates to the client.
- Pros: Simple (Standard HTTP). Built-in reconnection logic. Good for “One-way” feeds (Stock Ticker, News).
- Cons: Unidirectional (Server → Client only). Limit on open connections per browser (HTTP/1.1 caps it at 6 per domain; HTTP/2 removes this bottleneck).
4.1 SSE Deep Dive: The Protocol
SSE isn’t just “Server Push”; it’s a specific MIME type: text/event-stream.
Standardized Headers:

```
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
```
Retrying naturally: If an SSE connection drops, the browser automatically tries to reconnect after a delay (usually 3 seconds). It sends a Last-Event-ID header so the server can resume precisely where it left off.
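On the wire, each event is just a few prefixed text lines terminated by a blank line. A sketch of a server-side formatter (simplified from the full spec, which also allows `retry:` and comment lines):

```javascript
// Format one text/event-stream event: optional "id:" and "event:" fields,
// one "data:" line per line of payload, and a blank line to end the event.
function formatSseEvent({ id, event, data }) {
  let frame = '';
  if (id !== undefined) frame += `id: ${id}\n`;
  if (event !== undefined) frame += `event: ${event}\n`;
  for (const line of String(data).split('\n')) frame += `data: ${line}\n`;
  return frame + '\n'; // blank line marks the end of the event
}
```

Setting an `id:` on each event is what makes the browser's automatic `Last-Event-ID` resumption work: the browser remembers the last `id` it saw and replays it on reconnect.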
5. Modern Protocols: HTTP/3 & WebTransport
The future of real-time is moving beyond TCP.
5.1 The Problem with TCP: Head-of-Line Blocking
TCP treats data as a single ordered stream. If packet #50 is lost, packets #51-100 must wait until #50 is retransmitted, even if they have already arrived. This causes jitter in real-time apps.
5.2 HTTP/3 (QUIC)
Built on top of UDP instead of TCP.
- Solves Head-of-Line Blocking: In HTTP/3, streams are independent. If Stream A loses a packet, Stream B keeps flowing.
5.3 WebTransport
A new API (successor to WebSockets) built on HTTP/3.
- Features: Supports both reliable (streams) and unreliable (datagrams) transmission.
- Use Case: Great for Gaming (where you care about latest state, not all states) and Live Streaming.
Code Example: WebTransport in Action

A sketch, assuming a browser with WebTransport support and a server (the URL is illustrative) that speaks HTTP/3:

```javascript
// Connect (the endpoint must support HTTP/3)
const transport = new WebTransport('https://game.example.com');
await transport.ready;

// Send an unreliable datagram (fast, low latency, may be dropped)
const writer = transport.datagrams.writable.getWriter();
const data = new Uint8Array([10, 20, 30]); // Player coordinates
await writer.write(data);

// Receive a reliable unidirectional stream (e.g. chat messages)
const reader = transport.incomingUnidirectionalStreams.getReader();
const { value: stream, done } = await reader.read();
if (!done) {
  // `stream` is a ReadableStream; read its chunks here
}
```
5.4 WebRTC Data Channels (True P2P)
For sub-100ms latency without server hops, we use WebRTC Data Channels.
- Architecture: Unlike WebSockets (Client-Server), WebRTC is Peer-to-Peer.
- Signaling: Peers still need a server to exchange “Handshakes” (SDP/ICE), but once connected, data flows directly between browsers.
- Use Case: Video conferencing (Zoom/Meet), real-time file sharing, and high-performance gaming.
6. Deep Dive: Connection Draining
What happens when you deploy new code to your WebSocket server?
- Stateless (REST): Easy. Wait for the current request to finish (50ms), then kill the server.
- Stateful (WebSocket): Hard. Users might be connected for hours.
The Strategy: Connection Draining
- Mark as Draining: Tell the Load Balancer “Don’t send new users here”.
- Wait: Allow existing connections to close naturally (or set a timeout, e.g., 1 hour).
- Force Close: If they are still there after the timeout, send a `GOAWAY` frame (or equivalent) telling the client to reconnect elsewhere.
- Terminate: Now it's safe to restart the server.
[!WARNING] War Story: The Thundering Herd at Discord If you don’t drain properly, you will cause a Thundering Herd. When millions of users are connected and a server restarts without draining, they all disconnect at once and immediately try to reconnect to the remaining servers, DDOSing your own infrastructure. Discord famously experienced this when a single backend failover caused a massive wave of WebSocket reconnections that overwhelmed their gateway servers, forcing them to implement aggressive backoff and jitter algorithms to recover.
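The "backoff and jitter" fix amounts to spreading reconnect attempts over a randomized, exponentially growing window, so disconnected clients don't all hammer the remaining servers in the same instant. A sketch of the common "full jitter" variant (the base and cap constants are illustrative):

```javascript
// Full-jitter reconnect backoff: delay is uniform in [0, min(cap, base * 2^attempt)].
function reconnectDelayMs(attempt, { baseMs = 1000, capMs = 60000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return random() * ceiling;
}

reconnectDelayMs(0); // somewhere in [0, 1s)
reconnectDelayMs(6); // somewhere in [0, 60s) — already at the cap
```

The `random` parameter is injectable only for testing; in production the randomness is the whole point, since identical delays would just synchronize the herd again.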
7. Case Study: Hybrid Approach at WhatsApp
[!NOTE] WhatsApp’s architectural secret: the Signal channel is stateful and persistent (XMPP/Erlang), but carries almost no data. The Content pull is stateless HTTP. This separation is what enables extreme efficiency.
7.1 Process Requirements
You are designing a chat app where users expect instant delivery, but they also send heavy media (Images, Videos). Most users are idle 99% of the time, so maintaining active connections for everyone is inefficient.
7.2 Estimate
- WebSockets: Great for instant text, but expensive to maintain statefully on the server (RAM usage for TCP buffers).
- Polling: Too slow and drains battery.
7.3 Data Model
The system distinguishes between “Signal” (a tiny 20-byte packet containing "New Msg: ID 123") and “Content” (the actual message payload or media blob).
7.4 Architecture
WhatsApp uses a custom protocol (based on XMPP, optimized with Erlang) for a lightweight “Signal” channel.
- The Signal: Server sends a tiny packet to the client’s persistent socket.
- The Wake Up: The App (running in background) wakes up.
- The Pull: The App initiates a standard HTTP/2 (or HTTP/3) `GET /messages/123`.
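The signal-then-pull flow can be sketched as two tiny pure functions. The packet format and URL scheme here are illustrative (taken from the example in the Data Model section), not WhatsApp's actual wire format:

```javascript
// Parse a tiny "Signal" packet into a message id, then build the HTTP URL
// the client pulls the heavy "Content" from.
function parseSignal(packet) {
  const match = /^New Msg: ID (\d+)$/.exec(packet);
  if (!match) throw new Error('unrecognized signal packet');
  return Number(match[1]);
}
function contentUrl(messageId) {
  return `/messages/${messageId}`; // fetched via standard HTTP/2 or HTTP/3 GET
}
```

The key property is that the persistent socket only ever carries the tiny signal; everything heavy rides over stateless, cacheable HTTP.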
7.5 Localized Details
- Optimized for Media: HTTP is better for large files (CDNs, Range Requests, Resumability). WebSockets are bad for large binary blobs (Head-of-Line blocking in TCP, though QUIC fixes this).
- Push Notification Architecture (APNs/FCM): When the app is completely closed, neither WebSockets nor SSE work. The system relies on OS-level Push Notification services (APNs for Apple, FCM for Google). The OS wakes up the phone and shows the notification.
7.6 Scale
- Battery Efficiency: The “Signal” channel is extremely quiet. It only pings when necessary.
- Server Load: The signaling servers handle millions of idle connections (low RAM). The content servers handle the heavy lifting but only for active users.
7.7 Result
This architecture allows WhatsApp to run with a surprisingly small number of engineers and servers relative to its user base.
8. Summary
- Short Polling: Good for prototypes. Avoid in production.
- Long Polling: Good fallback when WebSockets are blocked by firewalls.
- SSE: Best for Feeds (Twitter timeline, Sports scores). One-way only. HTTP-native.
- WebSockets: Best for Chat and Gaming. Stateful — requires Redis Pub/Sub.
- WebTransport: The future for low-latency, unreliable streams (Live Video, Gaming).
Mnemonic: “Some Long Snakes Wander West” (Short, Long, SSE, WebSockets, WebTransport) — in order of increasing complexity and capability.
Staff Engineer Tip: Use SSE Before Defaulting to WebSockets. Most real-time features are one-directional (server → client): dashboards, live feeds, progress bars, notifications. For these, SSE is strictly better because: (1) it’s vanilla HTTP — works through any proxy/firewall. (2) Browsers auto-reconnect on dropout. (3) No stateful connection management required. Only upgrade to WebSockets when you genuinely need bidirectional communication (live chat, collaborative editing, multiplayer games). The WebSocket scaling tax — Redis Pub/Sub, sticky sessions, FD limits — is real and expensive.