Design Slack/Discord (Real-Time Messaging)

[!NOTE] This module explores the core principles of Design Slack/Discord (Real-Time Messaging), deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. What is a Real-Time Messaging System?

Building a chat app for 10 users is easy: INSERT into a database and SELECT * every second. Building Slack (Enterprise) or Discord (Gaming Communities) for 10 Million concurrent users is a Distributed Systems masterpiece.

The challenge is not just storing messages; it’s Synchronization.

  • Real-Time: When I type “Hello”, 50,000 people in the #general channel must see it in < 50ms.
  • Presence: Knowing exactly who is “Online”, “Idle”, or “Typing…” among millions of users.
  • Statefulness: Unlike a REST API, the server must maintain a persistent TCP connection (WebSocket) with the client.

[!TIP] Real-World Examples:

  • Slack: Workplace communication (High reliability, structured channels).
  • Discord: Voice/Text for communities (Massive scale, ephemeral voice channels).
  • WhatsApp: Mobile-first, End-to-End Encryption (different architecture: persistent connections using a custom XMPP-derived protocol, plus mobile push).

2. Requirements & Goals

2.1 Functional Requirements

  1. 1-on-1 & Group Chat: Send/Receive messages instantly.
  2. Channels: Support for large channels (e.g., Discord servers with 500k members).
  3. Presence: Show Online/Offline status in real-time.
  4. History: Infinite scroll of message history.
  5. Multi-Device: Sync state between Phone and Laptop.

2.2 Non-Functional Requirements

  1. Low Latency: Message delivery < 50ms (within the same region).
  2. High Availability: 99.99%. Chat is often business-critical.
  3. Scalability: Handle 10 Million concurrent connections.
  4. Consistency: Messages must appear in the correct order (Total Ordering within a channel).

3. Capacity Estimation

3.1 Traffic Analysis

  • DAU: 20 Million.
  • Concurrent Users: 10 Million (Peak).
  • Messages: 50 msg/user/day → 1 Billion msg/day.
  • Write QPS: 10^9 / 86,400 ≈ 11,600 msg/sec.
  • Peak QPS: 5x Average → ~60,000 msg/sec.
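
As a quick sanity check, the traffic arithmetic above can be reproduced in a few lines of Python. This is only a back-of-envelope sketch; the constant names are illustrative and the inputs are the figures stated in this section.

```python
# Back-of-envelope traffic estimate using the figures from this section.
DAU = 20_000_000               # daily active users
MSGS_PER_USER_PER_DAY = 50
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 5                # peak assumed to be 5x the daily average

messages_per_day = DAU * MSGS_PER_USER_PER_DAY
avg_write_qps = messages_per_day / SECONDS_PER_DAY
peak_write_qps = avg_write_qps * PEAK_FACTOR

print(f"messages/day:      {messages_per_day:,}")
print(f"average write QPS: {avg_write_qps:,.0f}")
print(f"peak write QPS:    {peak_write_qps:,.0f}")
```

The exact peak comes out slightly below 60,000 msg/sec; rounding up leaves a little headroom in the later estimates.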

3.2 Bandwidth & Storage

  • Avg Message Size: 100 Bytes.
  • Ingress Bandwidth: 60k × 100 Bytes = 6 MB/s (Trivial).
  • Egress Bandwidth (Fanout):
    • If a user posts to a channel with 10k online users: 100 Bytes × 10,000 = 1 MB for a single message.
    • This Fanout is the bottleneck.
  • Storage: 1 Billion msg/day × 100 Bytes = 100 GB/day.
    • 5 Years: 100 GB × 365 × 5 ≈ 180 TB.
    • Conclusion: We need a sharded NoSQL store (Cassandra/ScyllaDB) for history.
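
The bandwidth and storage figures can be checked the same way. The sketch below uses decimal units (1 MB = 10^6 bytes) and ignores replication, indexes, and per-row overhead, all of which would increase the real footprint.

```python
# Bandwidth and storage estimate, decimal units (1 MB = 1e6 B, 1 TB = 1000 GB).
PEAK_QPS = 60_000
MSG_BYTES = 100
MSGS_PER_DAY = 1_000_000_000
ONLINE_MEMBERS = 10_000        # online users in one large channel

ingress_mb_per_s = PEAK_QPS * MSG_BYTES / 1e6          # inbound message bytes
fanout_mb_per_msg = MSG_BYTES * ONLINE_MEMBERS / 1e6   # one post, many readers
storage_gb_per_day = MSGS_PER_DAY * MSG_BYTES / 1e9
storage_tb_5_years = storage_gb_per_day * 365 * 5 / 1000

print(f"ingress:        {ingress_mb_per_s} MB/s")
print(f"fanout per msg: {fanout_mb_per_msg} MB")
print(f"storage/day:    {storage_gb_per_day} GB")
print(f"storage/5y:     {storage_tb_5_years} TB")
```

Ingress is small, while a single post to a busy channel already costs 1 MB of egress. That asymmetry is why the rest of the design focuses on fanout.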

4. System APIs

We use a hybrid approach: REST for actions (Login, Join Channel, Upload File) and WebSockets for real-time events.

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/login | Authenticates user, returns auth_token and gateway_url. |
| POST | /v1/channels/{id}/messages | Sends a message. Payload: { content: "Hello" } |
| GET | /v1/channels/{id}/history | Fetches old messages. Params: before_id=... |
| WS | /gateway | WebSocket Handshake. Params: token=... |

5. Database Design

5.1 Cassandra (Message History)

We need massive write throughput and range queries (get messages by time).

  • Partition Key: channel_id (Groups all messages for a channel together).
  • Clustering Key: message_id (Snowflake ID, time-sorted).
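
-- channel_id is the partition key; message_id is the clustering key.
-- Sort order is declared with CLUSTERING ORDER BY, not inside PRIMARY KEY.
CREATE TABLE channel_messages (
  channel_id BIGINT,
  message_id BIGINT,
  user_id BIGINT,
  content TEXT,
  created_at TIMESTAMP,
  PRIMARY KEY ((channel_id), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);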
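
To make the access pattern concrete, here is a minimal in-memory sketch of how this schema serves the history endpoint's before_id pagination. The ChannelHistory class is a hypothetical stand-in, not a Cassandra client; against the real table the same page would be a single-partition query of the form SELECT ... WHERE channel_id = ? AND message_id < ? LIMIT ?.

```python
from bisect import bisect_left
from collections import defaultdict


class ChannelHistory:
    """Hypothetical in-memory model of the channel_messages table."""

    def __init__(self):
        # channel_id -> rows of (message_id, content), sorted ascending by id
        self._partitions = defaultdict(list)

    def append(self, channel_id, message_id, content):
        rows = self._partitions[channel_id]
        # (message_id,) sorts just before (message_id, content)
        rows.insert(bisect_left(rows, (message_id,)), (message_id, content))

    def page(self, channel_id, before_id=None, limit=50):
        """Return up to `limit` messages older than before_id, newest first."""
        rows = self._partitions[channel_id]
        end = len(rows) if before_id is None else bisect_left(rows, (before_id,))
        start = max(0, end - limit)
        return list(reversed(rows[start:end]))


history = ChannelHistory()
for i in range(1, 6):
    history.append(channel_id=123, message_id=i, content=f"m{i}")

first_page = history.page(123, limit=2)                # newest two messages
second_page = history.page(123, before_id=4, limit=2)  # the two before id 4
print(first_page, second_page)
```

The client implements infinite scroll by passing the smallest message_id it has seen as before_id on the next request.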

5.2 Redis (State & Presence)

  • User Session: user:{id}:gateway → 10.0.0.5 (Which server holds the TCP connection?)
  • Presence: user:{id}:status → online (TTL of about 40s, refreshed by the Gateway roughly every 30s while heartbeats keep arriving; see 7.2).

6. High-Level Architecture

We move from “Request-Response” to a Stateful Gateway Architecture.

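System Architecture: Real-Time Chat (Stateful Gateway | Redis Pub/Sub Fanout | Cassandra History)

[Diagram] Three paths are shown: the WebSocket Path, the Pub/Sub Fanout, and the Persistence Path.

  • User Devices: User A (Sender); Users B, C, D (Receivers).
  • Gateway Cluster (WS Gateways): Gateway 1 holds User A; Gateway 2 holds Users B, C, D.
  • Backend Services: Chat Service (Orchestrator), Redis Pub/Sub (Channel Fanout), Cassandra (Message Logs), Service Discovery (ZooKeeper / Etcd).

Message flow:

  1. Send (WS): User A sends the message to Gateway 1.
  2. RPC: Gateway 1 forwards it to the Chat Service.
  3. Write: The Chat Service persists it to Cassandra.
  4. PUB channel:123: The Chat Service publishes it to Redis Pub/Sub.
  5. SUB: Redis notifies Gateway 2, which is subscribed to the channel.
  6. Push (WS): Gateway 2 pushes it to Users B, C, D.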

7. Component Design (Deep Dive)

7.1 Gateway Aggregation

A user might belong to 100 channels. If we subscribe the User’s Gateway connection to 100 Redis channels, Redis will be overwhelmed by the number of subscriptions.

  • Naive Approach: 10M Users × 100 Channels = 1 Billion Redis Subscriptions. Far too many for Redis to track and deliver to efficiently.
  • Optimized Approach: The Gateway subscribes to Redis channels, not the user.
    • If User A (on GW-1) and User B (on GW-1) are both in #general, GW-1 subscribes to #general once.
    • When GW-1 receives a message for #general, it looks up its local Channel → [Socket] map and fans out locally in memory.
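
The aggregation idea can be sketched with an in-memory model. PubSubBus stands in for Redis Pub/Sub, and Socket stands in for a WebSocket connection; both are hypothetical helpers for illustration. The point is that the bus sees one subscription per (gateway, channel) pair, regardless of how many users sit behind the gateway.

```python
from collections import defaultdict


class Socket:
    """Hypothetical stand-in for one client WebSocket connection."""
    def __init__(self, user):
        self.user = user
        self.inbox = []


class PubSubBus:
    """Hypothetical stand-in for Redis Pub/Sub."""
    def __init__(self):
        self.subscribers = defaultdict(set)   # channel -> gateways

    def subscribe(self, channel, gateway):
        self.subscribers[channel].add(gateway)

    def unsubscribe(self, channel, gateway):
        self.subscribers[channel].discard(gateway)

    def publish(self, channel, payload):
        for gateway in list(self.subscribers[channel]):
            gateway.on_message(channel, payload)

    def subscription_count(self):
        return sum(len(gws) for gws in self.subscribers.values())


class Gateway:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.channel_sockets = defaultdict(set)   # channel -> local sockets

    def join(self, socket, channel):
        if not self.channel_sockets[channel]:
            self.bus.subscribe(channel, self)     # first local member only
        self.channel_sockets[channel].add(socket)

    def leave(self, socket, channel):
        sockets = self.channel_sockets[channel]
        sockets.discard(socket)
        if not sockets:                           # last local member left
            del self.channel_sockets[channel]
            self.bus.unsubscribe(channel, self)

    def on_message(self, channel, payload):
        # Local, in-memory fanout to every socket in the channel.
        for socket in self.channel_sockets.get(channel, ()):
            socket.inbox.append(payload)


bus = PubSubBus()
gw1, gw2 = Gateway("GW-1", bus), Gateway("GW-2", bus)
a, b, c = Socket("A"), Socket("B"), Socket("C")
gw1.join(a, "#general")
gw1.join(b, "#general")
gw2.join(c, "#general")
bus.publish("#general", "hello")
print(bus.subscription_count())   # 2: one per gateway, not one per user
```

Subscription count now scales with gateways × active channels rather than users × channels.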

7.2 Presence (Heartbeats)

Presence is a “Heavy Write” problem. 10M users sending “I’m alive” every 5 seconds = 2M writes/sec.

  • Optimization: Do not write to DB on every heartbeat.
    1. Client: Sends heartbeat to Gateway (WebSocket Ping).
    2. Gateway: Holds state in memory. Only updates Redis if status changes or TTL is about to expire (e.g., every 30s).
    3. Redis: Keys expire automatically (SETEX user:1:status 40 "online"). If Gateway crashes, key expires, user appears offline.
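
The three steps above can be sketched as follows. TtlStore is a hypothetical in-memory stand-in for Redis SETEX/GET with an explicit clock so the behaviour is easy to test, and the 40s TTL and 30s refresh interval are the example values from the list, not tuned recommendations.

```python
class TtlStore:
    """Hypothetical in-memory stand-in for Redis SETEX / GET."""
    def __init__(self):
        self._data = {}
        self.writes = 0

    def setex(self, key, ttl, value, now):
        self._data[key] = (value, now + ttl)
        self.writes += 1

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            return None            # missing or expired: user appears offline
        return entry[0]


class PresenceTracker:
    """Gateway-side coalescing of heartbeats into occasional store writes."""
    TTL = 40             # seconds a status key lives without refresh
    REFRESH_EVERY = 30   # write again once the last write is this old

    def __init__(self, store):
        self.store = store
        self._last_write = {}      # user_id -> (status, time of last write)

    def heartbeat(self, user_id, status, now):
        last = self._last_write.get(user_id)
        changed = last is None or last[0] != status
        stale = last is not None and now - last[1] >= self.REFRESH_EVERY
        if changed or stale:
            self.store.setex(f"user:{user_id}:status", self.TTL, status, now)
            self._last_write[user_id] = (status, now)


store = TtlStore()
tracker = PresenceTracker(store)
for t in range(0, 61, 5):          # 13 heartbeats, one every 5 seconds
    tracker.heartbeat(1, "online", now=t)
print(store.writes)                # 3 writes (t=0, 30, 60) for 13 heartbeats
```

With these example values, store writes drop from one per heartbeat to roughly one per 30 seconds per user, and a crashed Gateway's users fall offline within 40 seconds.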

8. Data Partitioning & Sharding

8.1 Sharding Messages (Cassandra)

We shard by channel_id.

  • Pros: All messages for a channel live in the same partition (on the same replica set). Reading recent history is a single-partition, mostly sequential read.
  • Cons: The Celebrity Problem. If #general has 1B messages, the partition gets too big.
  • Fix: Bucket the partition by time. Partition Key = (channel_id, month_year).
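
A minimal sketch of the bucketing fix, assuming monthly buckets formatted as YYYY-MM in UTC. The function names are hypothetical. The second helper shows the cost of the fix: history reads now walk buckets from newest to oldest instead of hitting one partition.

```python
from datetime import datetime, timezone


def partition_key(channel_id: int, created_at: datetime) -> tuple:
    """Composite partition key (channel_id, month bucket) for one message.

    created_at must be timezone-aware; it is normalised to UTC first.
    """
    bucket = created_at.astimezone(timezone.utc).strftime("%Y-%m")
    return (channel_id, bucket)


def buckets_newest_first(oldest: datetime, newest: datetime) -> list:
    """Month buckets a history reader would visit, most recent first."""
    oldest, newest = oldest.astimezone(timezone.utc), newest.astimezone(timezone.utc)
    year, month = newest.year, newest.month
    buckets = []
    while (year, month) >= (oldest.year, oldest.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:             # wrap from January to previous December
            year, month = year - 1, 12
    return buckets


key = partition_key(123, datetime(2024, 1, 31, 23, 0, tzinfo=timezone.utc))
scan = buckets_newest_first(
    datetime(2023, 11, 15, tzinfo=timezone.utc),
    datetime(2024, 2, 1, tzinfo=timezone.utc),
)
print(key, scan)
```

Most reads touch only the newest bucket, since users mostly look at recent messages; deep scrollback pays for extra partitions.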

8.2 Service Discovery

How does User A know to connect to Gateway-52?

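  • Static Hashing: hash(user_id) % N_Gateways.
  • Problem: If we add or remove gateways, N changes and almost every user maps to a different gateway, so connections break. (True consistent hashing, i.e. a hash ring, would limit the remapping to roughly 1/N of users.)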
  • Service Discovery (ZooKeeper/Etcd): Gateways register themselves. The Load Balancer asks ZK for an available node and assigns it to the user.
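
To see how disruptive modulo placement is when the gateway count changes, the sketch below compares hash(user_id) % N with a simple hash ring when growing from 10 to 11 gateways. The HashRing class, the virtual-node count, and the gateway names are illustrative assumptions, not part of the design above.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(assign_before, assign_after, keys):
    moved = sum(1 for k in keys if assign_before(k) != assign_after(k))
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
before = [f"gw-{i}" for i in range(10)]
after = before + ["gw-10"]

modulo_moved = moved_fraction(
    lambda k: before[stable_hash(k) % len(before)],
    lambda k: after[stable_hash(k) % len(after)],
    keys,
)
ring_before, ring_after = HashRing(before), HashRing(after)
ring_moved = moved_fraction(ring_before.lookup, ring_after.lookup, keys)

print(f"modulo: {modulo_moved:.0%} of users remapped")
print(f"ring:   {ring_moved:.0%} of users remapped")
```

Even with a ring, placement is blind to gateway load and health, which is one reason to let a discovery service assign gateways instead.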

9. Reliability, Caching, & Load Balancing

9.1 The “Unread Count” badge

Calculating unread counts (SELECT count(*) WHERE id > last_read_id) is expensive.

  • Optimization: Store unread_count in Redis. Increment it when a message arrives. Reset to 0 when user opens the channel.
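
A minimal sketch of the counter logic, with a plain dict standing in for Redis. In Redis the same operations would typically be an INCR on a per-user, per-channel key and a DEL (or SET to 0) on read. The class and method names are hypothetical.

```python
from collections import defaultdict


class UnreadCounters:
    """In-memory stand-in for per-(user, channel) unread counters."""

    def __init__(self):
        self._counts = defaultdict(int)   # (user_id, channel_id) -> unread

    def on_message(self, channel_id, member_ids, sender_id):
        # Increment for every member except the author of the message.
        for user_id in member_ids:
            if user_id != sender_id:
                self._counts[(user_id, channel_id)] += 1

    def mark_read(self, user_id, channel_id):
        # User opened the channel: reset the badge.
        self._counts.pop((user_id, channel_id), None)

    def unread(self, user_id, channel_id):
        return self._counts.get((user_id, channel_id), 0)


counters = UnreadCounters()
members = [1, 2, 3]
counters.on_message(channel_id=10, member_ids=members, sender_id=1)
counters.on_message(channel_id=10, member_ids=members, sender_id=2)
counters.mark_read(user_id=3, channel_id=10)
print(counters.unread(1, 10), counters.unread(2, 10), counters.unread(3, 10))
```

Note that incrementing per member is itself a fanout write, so very large channels may need a different approach, such as deriving the count lazily from last_read_id.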

9.2 Mobile Push Notifications

If the WebSocket is disconnected (App closed), the Gateway cannot push.

  • Fallback: The Notification Service detects that the user has no active WebSocket connection and sends a payload to APNs (iOS) or FCM (Android).

10. Interactive Decision Visualizer: Pub/Sub Propagation

A single message fans out through Redis to multiple Gateways and users. Tracing a message from Alice to Bob and Charlie:

Alice → GW 1 → Redis Pub/Sub → GW 2 → Bob, Charlie

11. Interview Gauntlet

Q1: How do you handle “Typing…” indicators?

  • Answer: Typing indicators are ephemeral. Do not store them in the DB. Use a lightweight Redis Pub/Sub channel. Throttle on the client (often loosely called “debouncing”) so that a signal is sent at most once every 2 seconds while typing, not on every keystroke.
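
A minimal sketch of the client-side rate limiting, written in Python for consistency with the other examples even though a real client would be JavaScript, Swift, or Kotlin. TypingThrottle and its send callback are hypothetical names; the clock is passed in explicitly to keep the logic testable.

```python
class TypingThrottle:
    """Emit at most one 'typing' signal per interval, however fast the user types."""

    def __init__(self, send, interval=2.0):
        self._send = send              # callback that emits the typing event
        self._interval = interval      # minimum seconds between signals
        self._last_sent = None

    def on_keystroke(self, now):
        """Call on every keystroke; returns True if a signal was sent."""
        if self._last_sent is None or now - self._last_sent >= self._interval:
            self._send()
            self._last_sent = now
            return True
        return False


signals = []
throttle = TypingThrottle(send=lambda: signals.append("typing"))
for t in [0.0, 0.5, 1.0, 1.9, 2.0, 3.5, 4.1]:   # seven keystrokes
    throttle.on_keystroke(now=t)
print(len(signals))   # 3 signals, sent at t=0.0, 2.0 and 4.1
```

Receivers can clear the indicator on their own if no signal arrives for a few seconds, so no explicit "stopped typing" event is required.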

Q2: What happens if a user is in 500 channels? Do they keep 500 WebSocket connections?

  • Answer: No. One WebSocket connection per device. The Gateway multiplexes messages from all 500 channels down that single pipe.

Q3: How do you sync messages across devices (Phone + Laptop)?

  • Answer: Each device has a unique device_id. When a message is sent, the server pushes it to all device_ids associated with the user_id (except the sender).
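
A minimal sketch of that device-level fanout, assuming a hypothetical devices_by_user registry. The sender's own user_id is kept in the recipient list so the sender's other devices stay in sync; only the originating device is skipped.

```python
def delivery_targets(devices_by_user, recipient_user_ids, sender_device_id):
    """Return (user_id, device_id) pairs that should receive the message."""
    targets = []
    for user_id in recipient_user_ids:
        for device_id in sorted(devices_by_user.get(user_id, ())):
            if device_id != sender_device_id:    # skip the originating device
                targets.append((user_id, device_id))
    return targets


# Hypothetical registry: Alice (user 1) has two devices, Bob (user 2) has one.
devices_by_user = {
    1: {"alice-phone", "alice-laptop"},
    2: {"bob-phone"},
}
targets = delivery_targets(devices_by_user, [1, 2], sender_device_id="alice-phone")
print(targets)   # [(1, 'alice-laptop'), (2, 'bob-phone')]
```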

Q4: Why not use HTTP Long Polling?

  • Answer: Long polling is inefficient for chat because of the header overhead and latency in re-establishing connections. WebSockets are preferred for bi-directional, low-latency comms.

Q5: How do you sort messages if two people send at the exact same millisecond?

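  • Answer: Use Snowflake IDs (Twitter’s ID generator), which are roughly time-ordered. The timestamp occupies the high bits, so sorting by the ID orders by time first; if timestamps are identical, the worker_id and sequence bits embedded in the Snowflake break the tie deterministically.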
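
A minimal sketch of a Snowflake-style generator, using the commonly described layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits, with Twitter's published epoch. The class name and the injectable clock parameter are assumptions for illustration; a production generator would also need a policy for clock drift between workers.

```python
import threading
import time


class SnowflakeGenerator:
    EPOCH_MS = 1288834974657   # Twitter's custom epoch (Nov 2010)
    WORKER_BITS = 10
    SEQ_BITS = 12

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock            # returns current time in milliseconds
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & ((1 << self.SEQ_BITS) - 1)
                if self._seq == 0:     # 4096 IDs used this ms: wait for the next
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            shifted_ts = (now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQ_BITS)
            return shifted_ts | (self.worker_id << self.SEQ_BITS) | self._seq


def decode(snowflake):
    """Split an ID back into (timestamp_ms, worker_id, sequence)."""
    seq = snowflake & 0xFFF
    worker = (snowflake >> 12) & 0x3FF
    ts = (snowflake >> 22) + SnowflakeGenerator.EPOCH_MS
    return ts, worker, seq


frozen = lambda: 1_700_000_000_000          # simulate one shared millisecond
gen1 = SnowflakeGenerator(worker_id=1, clock=frozen)
gen2 = SnowflakeGenerator(worker_id=2, clock=frozen)
a, b, c = gen1.next_id(), gen1.next_id(), gen2.next_id()
print(sorted([c, b, a]) == [a, b, c])       # True: ties broken by worker, seq
```

IDs from different workers in the same millisecond are ordered by worker_id, which is deterministic but arbitrary. This gives a stable order that every client agrees on, not proof of which message was truly sent first.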

12. Summary: The Whiteboard Strategy

1. Requirements

  • Func: Chat, History, Presence.
  • Scale: 10M Concurrent, < 50ms Latency.

2. Architecture

[Client] <-> [Gateway] <-> [Redis] | [Cassandra] (History)

  • Gateway: Stateful WebSocket holder.
  • Redis: Pub/Sub for routing.

3. Data & API

  • API: WS /gateway → Connect.
  • Cassandra: (channel_id, message_id).

4. Deep Dives

  • Fanout: Gateway subscribes, not User.
  • Presence: Heartbeats to Redis with TTL.