Citus: Turning Postgres into a Cluster
Standard Postgres scales vertically (bigger CPU, more RAM). But eventually, you hit the limits of a single machine (e.g., 128 cores, 4TB RAM). Moreover, vertical scaling becomes disproportionately expensive at the high end.
Citus allows Postgres to scale horizontally by sharding your data across multiple nodes (machines), while still appearing as a single database to your application. This is essential for highly scalable SaaS architectures, multi-tenant databases, and real-time analytics dashboards.
1. The Genesis: Why Shard?
If you have 10TB of data but only 1TB of RAM, most of your queries will hit the disk (slow). If you split that 10TB across 10 machines (1TB each), the entire dataset can fit in aggregate RAM (fast). This is conceptually similar to how a group of people can carry a heavy load much more easily than one person.
The Reality of Distributed Systems
Distributing data introduces complexity. The CAP theorem says that during a network partition, a distributed system must choose between Consistency and Availability. Citus typically prioritizes Consistency and Partition tolerance (CP), ensuring data integrity across the cluster.
The Challenge: Joins
If Table A is on Node 1 and Table B is on Node 2, joining them requires shipping rows over the network (slow).
The Solution: Co-location
Citus ensures that related data (e.g., all orders for user_id: 123) lives on the same node. This means joins can happen locally on that node without network overhead.
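In SQL, co-location is requested when the tables are distributed. A minimal sketch, assuming hypothetical `users` and `orders` tables both keyed by `user_id` (the `colocate_with` option of Citus's `create_distributed_table` function):

```sql
-- Distribute both tables on the same key so that matching rows
-- land on the same worker node (co-location).
SELECT create_distributed_table('users',  'user_id');
SELECT create_distributed_table('orders', 'user_id',
                                colocate_with => 'users');

-- This join now runs shard-by-shard on each worker,
-- with no cross-node data movement:
SELECT u.user_id, count(*)
FROM users u
JOIN orders o ON o.user_id = u.user_id
GROUP BY u.user_id;
```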
2. Interactive: Sharding Simulator
See how Citus distributes rows based on a Shard Key (e.g., tenant_id).
- Coordinator: Receives the query.
- Workers: Store the actual data.
- Router: Decides which worker receives the row, based on `Hash(shard_key) % num_shards`.
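The routing rule above can be sketched in plain SQL, using Postgres's built-in `hashtext()` as a stand-in for Citus's internal hash function (Citus's actual hashing and hash-range assignment differ; this is illustrative only):

```sql
-- Which of 32 shards (the Citus default shard count) would this
-- tenant's rows land on?
SELECT abs(hashtext('tenant-123')) % 32 AS shard_index;

-- The same key always yields the same shard index, which is why
-- all of a tenant's rows co-locate on one worker.
```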
[Interactive demo: simulate inserting rows into a distributed table. The coordinator (top node) routes each row to one of the workers (bottom nodes), producing a roughly uniform distribution.]
3. Architecture Patterns
A. Distributed Tables
Tables sharded across workers by a distribution key. Use these for large tables (Users, Orders, Events).
- Write: Hashed to a specific shard/node.
- Read: Routed to a single node (if filter includes shard key) or scatter-gathered (if it doesn’t).
B. Reference Tables
Small tables that are needed everywhere (Countries, Currencies, Product Categories).
- Replication: These tables are replicated to all nodes.
- Benefit: Local joins! You can join a Distributed Table (Orders) with a Reference Table (Currencies) without any network traffic.
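A minimal sketch, assuming a small `currencies` lookup table and an `orders` table with a `currency_code` column (`create_reference_table` is the Citus function for this):

```sql
-- Replicate a small lookup table to every node in the cluster.
SELECT create_reference_table('currencies');

-- Each worker now holds a full copy of currencies, so this join
-- executes locally next to each orders shard:
SELECT o.order_id, c.symbol
FROM orders o
JOIN currencies c ON c.code = o.currency_code;
```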
4. Performance Comparison
| Operation | Single Node Postgres | Citus (Distributed) | Notes |
|---|---|---|---|
| Point Read | Very Fast | Fast | Citus adds slight routing overhead (Coordinator -> Worker). |
| Local Join | Very Fast | Very Fast | Citus excels when data is co-located on the same shard. |
| Cross-Shard Join | N/A | Slow | Requires pulling data across the network. Avoid if possible. |
| Aggregate Query | Slower | Very Fast | Citus pushes aggregation down to workers and parallelizes. |
| Write Throughput | Limited by disk/CPU | Highly Scalable | Writes are distributed across multiple worker nodes. |
5. Code Implementation
A. Sharding a Table
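A minimal sketch (table and column names are illustrative). `create_distributed_table` is the core Citus function; the table is created as an ordinary Postgres table first:

```sql
-- 1. Create a normal Postgres table. Note that unique constraints
--    (including the primary key) must include the shard key.
CREATE TABLE orders (
    tenant_id bigint  NOT NULL,
    order_id  bigint  NOT NULL,
    total     numeric,
    PRIMARY KEY (tenant_id, order_id)
);

-- 2. Tell Citus to shard it by tenant_id. Rows are hashed on the
--    key and spread across the worker nodes.
SELECT create_distributed_table('orders', 'tenant_id');
```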
B. Multi-Tenant Query
When a query filters by tenant_id (the shard key), Citus routes it to a single worker node. This is the key to scaling SaaS apps to millions of tenants.
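A sketch of such a single-tenant "router" query, assuming an `orders` table distributed on `tenant_id`:

```sql
-- The filter includes the shard key, so the coordinator sends
-- this query to exactly one worker (no scatter-gather):
SELECT order_id, total
FROM orders
WHERE tenant_id = 123
ORDER BY order_id DESC
LIMIT 20;
```

Omit the `tenant_id` filter and the same query becomes a scatter-gather across every shard, which is why multi-tenant schemas should carry the tenant column on (nearly) every table.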