Sharding: Horizontal Scaling

Replication solves high availability, but it doesn’t solve capacity. If your dataset grows to 10TB, or your write throughput exceeds what a single primary can handle, you need sharding.

1. Sharding Architecture

Sharding splits data across multiple machines. In MongoDB, a sharded cluster consists of three components:

  1. Shards: The data servers. Each shard is a Replica Set that holds a subset of the data.
  2. Config Servers: Store the cluster’s metadata (which chunk lives on which shard). This is the “brain” of the cluster.
  3. Mongos: The query router. Application clients connect to mongos, not the shards directly. It consults the Config Servers to route queries to the correct shard.
[Diagram: the application connects to the Mongos router; Mongos performs a metadata check against the Config Servers, then routes each query to Shard 1 (a Replica Set holding data A-M) or Shard 2 (a Replica Set holding data N-Z).]
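Conceptually, the routing step is a lookup against the config metadata: find the chunk whose key range contains the shard key value, and forward the query to the shard that owns it. A minimal sketch (the shard names, ranges, and `route` helper are hypothetical, not driver APIs):

```go
package main

import "fmt"

// chunkRange is a simplified piece of config server metadata:
// a contiguous shard key range mapped to the shard that owns it.
type chunkRange struct {
	min, max string // half-open range [min, max)
	shard    string
}

// route mimics what mongos does: find the chunk whose range
// contains the shard key value and return the owning shard.
func route(metadata []chunkRange, key string) string {
	for _, c := range metadata {
		if key >= c.min && key < c.max {
			return c.shard
		}
	}
	return "" // key falls outside all known ranges
}

func main() {
	// Matches the A-M / N-Z split in the architecture diagram.
	metadata := []chunkRange{
		{min: "A", max: "N", shard: "shard1"},
		{min: "N", max: "{", shard: "shard2"}, // '{' sorts just after 'z'
	}
	fmt.Println(route(metadata, "Alice")) // shard1
	fmt.Println(route(metadata, "Nina"))  // shard2
}
```

The application never needs this table itself; mongos hides it entirely, which is why clients connect to the router and not to individual shards.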

2. The Shard Key

The most critical decision in a sharded cluster is selecting the Shard Key. This key determines how data is distributed.

Ranged Sharding

Partitions data based on ranges of shard key values (e.g., user_id 1-1000, 1001-2000).

  • Pros: Efficient for range queries (find({x: {$gt: 10}})).
  • Cons: Can lead to uneven distribution if data is inserted sequentially (e.g., timestamps).
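The sequential-insert problem is easy to see in a simulation. Below, three shards own hypothetical ranges split at 1000 and 2000; monotonically increasing keys (like timestamps or auto-incrementing IDs) all land on the last shard, creating a "hot shard":

```go
package main

import "fmt"

// rangeShard returns which of three shards owns a ranged key,
// using hypothetical split points at 1000 and 2000.
func rangeShard(key int) int {
	switch {
	case key < 1000:
		return 0
	case key < 2000:
		return 1
	default:
		return 2
	}
}

func main() {
	// Simulate 1000 monotonically increasing keys (e.g. timestamps).
	counts := [3]int{}
	for key := 2000; key < 3000; key++ {
		counts[rangeShard(key)]++
	}
	// Every single write lands on the last shard.
	fmt.Println(counts) // [0 0 1000]
}
```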

Hashed Sharding

Computes a hash of the shard key and uses the hash to determine the chunk.

  • Pros: Ensures even distribution of data.
  • Cons: Range queries are inefficient because they must be broadcast to all shards.
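The even-distribution property can be sketched with any hash function. This uses FNV purely for illustration (MongoDB uses its own hash internally); the same sequential keys that overload one shard under ranged sharding now spread roughly evenly:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashShard mimics hashed sharding: hash the shard key value and
// map the hash onto one of n shards. (MongoDB uses a different
// hash function internally; FNV just illustrates the idea.)
func hashShard(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % n
}

// distribute simulates n sequential inserts across `shards` shards
// and returns how many writes each shard received.
func distribute(n int, shards uint32) []int {
	counts := make([]int, shards)
	for i := 0; i < n; i++ {
		counts[hashShard(fmt.Sprintf("user-%d", i), shards)]++
	}
	return counts
}

func main() {
	// Sequential keys spread roughly evenly across three shards.
	fmt.Println(distribute(3000, 3))
}
```

The flip side follows directly: because hashing destroys ordering, a query for `user_id > 1000` can no longer target one contiguous range and must be broadcast to every shard.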


3. Chunks and Balancing

MongoDB partitions each sharded collection into Chunks — contiguous ranges of shard key values.

  • Default chunk size is 64MB (raised to 128MB in MongoDB 6.0).
  • When a chunk grows too large, MongoDB splits it.
  • The Balancer migrates chunks between shards to keep the cluster even.
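The split behavior can be sketched as recursive halving: a chunk that outgrows the size limit is divided until every piece fits. The 64MB constant and `splitChunk` helper below are illustrative, not MongoDB internals:

```go
package main

import "fmt"

// maxChunkMB is the default chunk size; MongoDB splits a chunk
// once it grows past this limit.
const maxChunkMB = 64

// splitChunk recursively halves an oversized chunk until every
// piece fits under the limit, mimicking automatic chunk splits.
func splitChunk(sizeMB int) []int {
	if sizeMB <= maxChunkMB {
		return []int{sizeMB}
	}
	half := sizeMB / 2
	return append(splitChunk(half), splitChunk(sizeMB-half)...)
}

func main() {
	// A 200MB chunk is split into pieces no larger than 64MB.
	fmt.Println(splitChunk(200)) // [50 50 50 50]
	// The Balancer would then migrate these chunks so each shard
	// holds roughly the same number.
}
```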

The Jumbo Chunk Problem

If a single shard key value appears so frequently that its data exceeds the chunk size, MongoDB cannot split the chunk, because a split point must fall between two distinct shard key values.

[!WARNING] Jumbo Chunk: An unmovable, unsplittable chunk that can cause significant imbalance. Fix: Improve Shard Key cardinality (add more randomness or fields).
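Why cardinality fixes it: split points can only be placed between distinct shard key values, so a compound key that makes values distinct gives the splitter somewhere to cut. A small sketch (the `splitPoints` helper and `"USA|user-N"` key format are hypothetical illustrations, not MongoDB behavior):

```go
package main

import "fmt"

// splitPoints counts the boundaries a chunk could be split at:
// splits can only happen between distinct shard key values.
func splitPoints(keys []string) int {
	distinct := map[string]bool{}
	for _, k := range keys {
		distinct[k] = true
	}
	return len(distinct) - 1
}

func main() {
	// Shard key = {country}: every document in the chunk shares
	// one value, so there is nowhere to split -> jumbo chunk.
	low := []string{"USA", "USA", "USA", "USA"}
	fmt.Println(splitPoints(low)) // 0

	// Shard key = {country, user_id}: the compound key makes each
	// value distinct, giving the splitter plenty of boundaries.
	var high []string
	for i := 0; i < 4; i++ {
		high = append(high, fmt.Sprintf("USA|user-%d", i))
	}
	fmt.Println(splitPoints(high)) // 3
}
```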

[Diagram: Chunk A (30MB) and Chunk B (60MB) sit alongside a jumbo chunk for key "USA" holding 250MB, which cannot split because a single key value contains too much data.]

4. Enabling Sharding (Code)

Sharding is configured at the database and collection level.

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class EnableSharding {
    public static void main(String[] args) {
        // Connect to MONGOS, not a specific shard
        try (MongoClient client = MongoClients.create("mongodb://mongos1:27017")) {
            MongoDatabase adminDb = client.getDatabase("admin");

            // 1. Enable Sharding for the Database
            Document enableCmd = new Document("enableSharding", "myAppDb");
            adminDb.runCommand(enableCmd);

            // 2. Shard a Collection using a Hashed Key
            // We shard 'users' collection on 'user_id' field
            Document shardCmd = new Document("shardCollection", "myAppDb.users")
                .append("key", new Document("user_id", "hashed"));

            adminDb.runCommand(shardCmd);

            System.out.println("Collection sharded successfully!");
        }
    }
}
package main

import (
    "context"
    "fmt"
    "log"
    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    // Connect to MONGOS
    client, err := mongo.Connect(context.TODO(), options.Client().ApplyURI("mongodb://mongos1:27017"))
    if err != nil { log.Fatal(err) }
    defer client.Disconnect(context.TODO())

    adminDb := client.Database("admin")

    // 1. Enable Sharding
    res := adminDb.RunCommand(context.TODO(), bson.D{
        {"enableSharding", "myAppDb"},
    })
    if res.Err() != nil { log.Fatal(res.Err()) }

    // 2. Shard Collection with Hashed Key
    res = adminDb.RunCommand(context.TODO(), bson.D{
        {"shardCollection", "myAppDb.users"},
        {"key", bson.D{{"user_id", "hashed"}}},
    })
    if res.Err() != nil { log.Fatal(res.Err()) }

    fmt.Println("Collection sharded successfully!")
}

5. Summary

  • Sharding distributes data across multiple Replica Sets (Shards).
  • Mongos routes queries; Config Servers store the map.
  • Shard Key choice is critical and effectively permanent (MongoDB 5.0+ supports live resharding, but it is an expensive operation).
  • Use Hashed keys for write distribution, Ranged keys for query locality.
  • Watch out for Monotonic Keys (Hot Shards) and Low Cardinality (Jumbo Chunks).