Capacity Planning: The Math of Shards

[!NOTE] This module covers the math of shard capacity planning: how shard size, JVM heap, and disk constraints translate into node counts and lifecycle policies.

1. The Goldilocks Zone: 10GB - 50GB

The #1 question: “How many shards should I have?”

The Rule: Keep shard size between 10GB and 50GB.

To understand why, we must look at the physical realities of the underlying hardware and the JVM architecture:

  1. Too Small (< 1GB) (The “Oversharding” Problem):
    • Overhead: Each shard is a full Apache Lucene index. It consumes JVM heap memory for term dictionaries and segment metadata, holds open file descriptors, and competes for the node's fixed pool of search threads.
    • The Math: A node can typically support 20 shards per GB of heap space. If you give Elasticsearch a 30GB heap, a node shouldn’t have more than 600 shards. Having thousands of 50MB shards exhausts the heap quickly.
    • Symptom: “OOM (Out of Memory)”, frequent long Garbage Collection (GC) pauses, and “Cluster State Explosion” leading to slow master operations.
  2. Too Large (> 50GB):
    • Recovery Hell: Moving a shard across the network during a node failure or rebalancing operation takes time. Recovery is throttled by default (indices.recovery.max_bytes_per_sec, historically 40MB/s), so even on a 10Gbps link a 100GB shard takes roughly 45 minutes at best, and recoveries stretch to hours once disk I/O, CPU overhead, and concurrent shard movements pile up.
    • Search Latency: Searching across a massive single shard requires more memory for aggregations and caching, often leading to cache thrashing.
    • Symptom: “Cluster Red” status persists for hours after a node reboot, and p99 search latencies spike during heavy aggregation queries.
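The shards-per-heap rule above reduces to simple arithmetic; a minimal sketch in Go (the 20-shards-per-GB ratio is the rule of thumb from the text, not an API limit, and the function names are illustrative):

```go
package main

import "fmt"

// maxShards applies the rule of thumb: roughly 20 shards per GB of
// JVM heap before heap pressure becomes a problem.
func maxShards(heapGB int) int {
	return heapGB * 20
}

func main() {
	// A 30GB heap supports roughly 600 shards.
	fmt.Println(maxShards(30))

	// Oversharding: 1TB of data split into 50MB shards yields 20,000
	// shards, which this rule prices at ~1,000 GB of heap.
	shards := 1_000_000 / 50 // 1TB in MB / 50MB per shard
	fmt.Printf("%d shards need ~%d GB of heap\n", shards, shards/20)
}
```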

2. Hot-Warm-Cold Architecture

When dealing with time-series data (like logs or metrics), you don’t treat all data equally. Instead, you map hardware constraints to the data lifecycle.

  • Hot Nodes (NVMe SSD, High CPU): Active writes and frequent, latency-sensitive searches. Typically the last 7-14 days of data.
  • Warm Nodes (HDD/Cheap SSD): Read-only data. Queries here are infrequent and can tolerate slightly higher latencies (e.g., Day 15 to 30).
  • Cold Nodes (S3/Snapshots): “Frozen” indices for long-term compliance storage. Very slow searches, but massive cost savings.
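One common way to tag nodes for these tiers is a custom node attribute in elasticsearch.yml; a minimal sketch (node.attr.* is the real mechanism, and the attribute name "data" with values hot/warm matches the allocate step used in the ILM examples later in this module):

```yaml
# elasticsearch.yml on a hot node (NVMe SSD, high CPU)
node.attr.data: hot

# elasticsearch.yml on a warm node (HDD / cheap SSD)
# node.attr.data: warm
```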

Automating the Lifecycle (ILM)

Elasticsearch automates this transition using Index Lifecycle Management (ILM) policies. ILM handles “Rollover” (creating a new hot index when the current one reaches 50GB or 30 days) and moving older indices to cheaper nodes.

Java

// Java: Creating an ILM Policy using the official Java API Client (elasticsearch-java)
// Imports: co.elastic.clients.elasticsearch.ilm.PutLifecycleRequest,
// co.elastic.clients.elasticsearch._types.Time, java.util.Map
PutLifecycleRequest request = PutLifecycleRequest.of(b -> b
  .name("logs_policy")
  .policy(p -> p
    .phases(ph -> ph
      .hot(h -> h
        .actions(a -> a
          .rollover(r -> r
            .maxSize("50gb")
            .maxAge(Time.of(t -> t.time("30d")))
          )
        )
      )
      .warm(w -> w
        .minAge(Time.of(t -> t.time("30d")))
        .actions(a -> a
          .forcemerge(fm -> fm.maxNumSegments(1))
          .shrink(s -> s.numberOfShards(1))
          .allocate(al -> al.require(Map.of("data", "warm")))
        )
      )
    )
  )
);

client.ilm().putLifecycle(request);

Go

// Go: Creating an ILM Policy using the official Go client (github.com/elastic/go-elasticsearch)
// Raw JSON is used for the request body for simplicity in the generic client.
// Imports: context, log, strings, and the client's esapi package.
policyJSON := `{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "shrink": {
            "number_of_shards": 1
          },
          "allocate": {
            "require": {
              "data": "warm"
            }
          }
        }
      }
    }
  }
}`

req := esapi.ILMPutLifecycleRequest{
  Policy: "logs_policy",
  Body:   strings.NewReader(policyJSON),
}

res, err := req.Do(context.Background(), esClient)
if err != nil {
  log.Fatalf("Error creating ILM policy: %s", err)
}
defer res.Body.Close()

if res.IsError() {
  log.Fatalf("ILM policy request failed: %s", res.String())
}

3. Capacity Calculator

How many nodes do you need? Formula: Total Storage = Raw Data * (1 + Replicas) * 1.2 (Overhead)

  • Total Storage Needed: raw data, multiplied per replica copy, plus ~20% headroom for indexing overhead.
  • Total Primary Shards (Daily): aim for ~2-3 primary shards per day (~40GB each).
  • Nodes Required: Total Storage / (Disk per Node * 0.85), rounded up; the 0.85 reserves the free-space buffer discussed in the next section.
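The calculation above can be sketched in a few lines of Go (the function names, the 10TB/1-replica input, and the 2TB-per-node disk size are illustrative assumptions; the 1.2 overhead factor and the 85% usable-disk figure come from the formula and the watermark discussion in this module):

```go
package main

import "fmt"

// storageNeededTB applies the formula:
// storage = raw data * (1 + replicas) * 1.2 overhead.
func storageNeededTB(rawDataTB float64, replicas int) float64 {
	return rawDataTB * float64(1+replicas) * 1.2
}

// nodesRequired divides total storage by the usable capacity per node,
// reserving the 15% disk-watermark buffer, and rounds up.
func nodesRequired(totalTB, diskPerNodeTB float64) int {
	usable := diskPerNodeTB * 0.85
	n := int(totalTB / usable)
	if totalTB > float64(n)*usable {
		n++
	}
	return n
}

func main() {
	total := storageNeededTB(10, 1) // 10TB raw, 1 replica -> 24TB
	fmt.Printf("Total storage: %.1f TB\n", total)
	fmt.Printf("Nodes (2TB disks): %d\n", nodesRequired(total, 2))
}
```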

4. The 85% Watermark

Elasticsearch monitors disk usage to prevent nodes from running out of space.

  • Low Watermark (85%): Elasticsearch stops allocating new shards to the node.
  • High Watermark (90%): Elasticsearch proactively attempts to move existing shards away from this node to others.
  • Flood Stage Watermark (95%): Crisis mode. Elasticsearch enforces a read_only_allow_delete block on every index that has a shard on this node. Writes fail with a cluster_block_exception (HTTP 429 on recent versions; older releases returned 403), and since 7.4 the block is lifted automatically once usage drops back below the high watermark.

Lesson: Always provision a 15-20% free space buffer. The usable space on a 1TB drive is functionally only ~850GB before cluster operations degrade.
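The three thresholds are dynamic cluster settings; a sketch of the JSON body for a PUT to _cluster/settings that states them explicitly (the cluster.routing.allocation.disk.watermark.* keys are the actual setting names; the percentages shown restate the defaults):

```json
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%"
  }
}
```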