Shards, Replicas, and The Cluster
[!NOTE] This module covers the core principles of Shards, Replicas, and the Cluster, deriving the design from first principles and the hardware constraints underneath it.
1. The Hook: Infinite Scalability?
Relational databases scale vertically (a bigger machine). Elasticsearch scales horizontally (more machines). How? By chopping your data into pieces called Shards.
2. The Hierarchy
- Cluster: The collection of all servers.
- Node: A single server (JVM process).
- Index: A logical namespace (e.g., users).
- Shard: A self-contained search engine (a full Lucene instance).
- Segment: An immutable file on disk holding the Inverted Index.
Key Insight: An “Index” is just a logical grouping. The Shard is where the work happens.
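To make the hierarchy concrete, here is a minimal sketch of the settings you would declare when creating the users index, plus the shard-copy count they imply. The setting names are Elasticsearch's real index settings; the arithmetic is just illustration:

```python
# Hypothetical settings for a "users" index: 5 primaries, 1 replica each.
settings = {
    "number_of_shards": 5,    # primary shards; fixed at index creation
    "number_of_replicas": 1,  # replica copies per primary; can change later
}

primaries = settings["number_of_shards"]
replicas = settings["number_of_replicas"]

# Total shard copies the cluster must place on nodes:
total_shards = primaries * (1 + replicas)
print(total_shards)  # 10
```

Ten Lucene instances in total, which the cluster spreads across nodes.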
3. Shards & Replicas
Primary Shards (The Scale)
- You split users into 5 Primary Shards (P0, P1, P2, P3, P4).
- Each shard holds roughly 20% of the data.
- You can put each shard on a different node.
- Result: 5x Parallelism.
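How does a document end up on a particular primary? Elasticsearch takes a hash of the routing value (the document `_id` by default) modulo the number of primary shards. A minimal sketch, using crc32 as a stand-in for the murmur3 hash Elasticsearch actually uses:

```python
import zlib

NUM_PRIMARIES = 5  # fixed at index creation

def route(doc_id: str) -> int:
    # Stand-in for Elasticsearch routing: hash(_routing) % number_of_primaries.
    # (crc32 here; Elasticsearch uses murmur3 on the _routing value.)
    return zlib.crc32(doc_id.encode()) % NUM_PRIMARIES

for doc_id in ["user-1", "user-2", "user-3"]:
    print(doc_id, "-> shard", route(doc_id))
```

This is also why the primary count cannot be changed after creation: with a different modulus, every existing document would hash to the wrong shard.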
Replica Shards (The Safety)
- Each Primary gets 1 Replica (R0, R1 …).
- R0 is an exact copy of P0.
- Purpose:
- High Availability: If the node holding P0 dies, the node holding R0 promotes R0 to primary. Zero downtime.
- Read Throughput: A search can be served by P0 OR R0, so one replica per primary roughly doubles read capacity.
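A coordinating node can send a search for a given shard to any copy of it, primary or replica. A minimal round-robin sketch (the copy names and the simple rotation are illustrative, not Elasticsearch's actual adaptive selection):

```python
import itertools

# Copies of shard 0: the primary plus its replica. With 1 replica,
# two copies can serve reads, roughly doubling read throughput.
copies = ["P0", "R0"]
rotation = itertools.cycle(copies)

# A coordinating node fans successive searches out across the copies:
served_by = [next(rotation) for _ in range(6)]
print(served_by)  # ['P0', 'R0', 'P0', 'R0', 'P0', 'R0']
```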
4. Scenario: Self-Healing and Rebalancing
Scenario: A 3-Node Cluster with Index logs (3 Primaries, 1 Replica each).
Kill a node and the cluster Self-Heals: surviving replicas are promoted to primary and new replicas are allocated.
Add a node and the cluster Rebalances: shards migrate to the new node to even out the load.
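The self-healing step can be sketched as a promotion of the surviving replica. The node names and shard layout below are hypothetical:

```python
# Hypothetical layout: each node holds a list of (shard, role) copies.
cluster = {
    "node-1": [("P0", "primary")],
    "node-2": [("P0", "replica")],
}

def kill(cluster, dead_node):
    """Remove a node; promote surviving replicas of any lost primaries."""
    lost = cluster.pop(dead_node)
    for shard, role in lost:
        if role == "primary":
            for node, copies in cluster.items():
                cluster[node] = [
                    (s, "primary" if s == shard and r == "replica" else r)
                    for s, r in copies
                ]
    return cluster

kill(cluster, "node-1")
print(cluster)  # {'node-2': [('P0', 'primary')]}
```

After the promotion, the cluster would also try to allocate a fresh replica of P0 on another node (omitted here).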
5. The Hardware Reality
A Shard is not just a logical construct. It is a directory on disk. Inside that directory are Lucene Segments.
- Segment: A mini-inverted index.
- Search: To search a Shard, we search every segment in parallel and merge results.
- Merge: Background processes constantly merge small segments into big ones, keeping the segment count (and thus file handles and per-segment search overhead) low.
Latency Tip: Too many small segments = slow search. Force Merge is your friend for cold, read-only data (never force-merge an index that is still being written to).
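The shard-level search described above can be sketched as: query every segment, then merge the per-segment hits by score. This toy version runs sequentially and uses made-up scores; real Lucene scoring and the per-segment parallelism are omitted:

```python
import heapq

# Each segment is an immutable mini-index. A search on the shard queries
# every segment and merges their hits. Hits are (doc_id, score) pairs.
segments = [
    [("doc1", 3.2), ("doc4", 1.1)],  # hits from segment 0
    [("doc7", 2.8)],                 # hits from segment 1
    [("doc2", 3.9), ("doc9", 0.4)],  # hits from segment 2
]

def search(segments, k=3):
    # Merge all per-segment hits and keep the global top-k by score.
    all_hits = [hit for seg in segments for hit in seg]
    return heapq.nlargest(k, all_hits, key=lambda h: h[1])

print(search(segments))  # [('doc2', 3.9), ('doc1', 3.2), ('doc7', 2.8)]
```

Every extra segment adds a query to this fan-out, which is why merging (and force-merging cold data) keeps search fast.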