Sharding: Horizontal Scaling
Replication solves high availability, but it doesn’t solve capacity. If your dataset grows to 10TB or your write throughput exceeds what a single CPU can handle, you need Sharding.
1. Sharding Architecture
Sharding splits data across multiple machines. In MongoDB, a sharded cluster consists of three components:
- Shards: The data servers. Each shard is a Replica Set that holds a subset of the data.
- Config Servers: Store the cluster’s metadata (which chunk lives on which shard). This is the “brain” of the cluster.
- Mongos: The query router. Application clients connect to
mongos, not the shards directly. It consults the Config Servers to route queries to the correct shard.
2. The Shard Key
The most critical decision in a sharded cluster is selecting the Shard Key. This key determines how data is distributed.
Ranged Sharding
Partitions data based on ranges of shard key values (e.g., user_id 1-1000, 1001-2000).
- Pros: Efficient for range queries (
find({x: {$gt: 10}})). - Cons: Can lead to uneven distribution if data is inserted sequentially (e.g., timestamps).
Hashed Sharding
Computes a hash of the shard key and uses the hash to determine the chunk.
- Pros: Ensures even distribution of data.
- Cons: Range queries are inefficient because they must be broadcast to all shards.
Interactive: Shard Key Visualizer
See how different keys affect data distribution in real-time.
3. Chunks and Balancing
MongoDB partitions data into Chunks.
- Default chunk size is 64MB.
- When a chunk grows too large, MongoDB splits it.
- The Balancer migrates chunks between shards to keep the cluster even.
The Jumbo Chunk Problem
If a single shard key value appears so frequently that its data exceeds the chunk size (64MB), MongoDB cannot split the chunk because splitting requires distinct shard key values.
[!WARNING] Jumbo Chunk: An unmovable, unsplittable chunk that can cause significant imbalance. Fix: Improve Shard Key cardinality (add more randomness or fields).
4. Enabling Sharding (Code)
Sharding is configured at the database and collection level.
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
public class EnableSharding {
public static void main(String[] args) {
// Connect to MONGOS, not a specific shard
try (MongoClient client = MongoClients.create("mongodb://mongos1:27017")) {
MongoDatabase adminDb = client.getDatabase("admin");
// 1. Enable Sharding for the Database
Document enableCmd = new Document("enableSharding", "myAppDb");
adminDb.runCommand(enableCmd);
// 2. Shard a Collection using a Hashed Key
// We shard 'users' collection on 'user_id' field
Document shardCmd = new Document("shardCollection", "myAppDb.users")
.append("key", new Document("user_id", "hashed"));
adminDb.runCommand(shardCmd);
System.out.println("Collection sharded successfully!");
}
}
}
package main
import (
"context"
"fmt"
"log"
"go.mongodb.org/mongo-driver/bson"
"go.mongodb.org/mongo-driver/mongo"
"go.mongodb.org/mongo-driver/mongo/options"
)
func main() {
// Connect to MONGOS
client, err := mongo.Connect(context.TODO(), options.Client().ApplyURI("mongodb://mongos1:27017"))
if err != nil { log.Fatal(err) }
defer client.Disconnect(context.TODO())
adminDb := client.Database("admin")
// 1. Enable Sharding
res := adminDb.RunCommand(context.TODO(), bson.D{
{"enableSharding", "myAppDb"},
})
if res.Err() != nil { log.Fatal(res.Err()) }
// 2. Shard Collection with Hashed Key
// Note: To use bson.D literals, we wrap in Liquid tags
// to prevent Liquid errors in this documentation.
res = adminDb.RunCommand(context.TODO(), bson.D{
{"shardCollection", "myAppDb.users"},
{"key", bson.D{{"user_id", "hashed"}}},
})
if res.Err() != nil { log.Fatal(res.Err()) }
fmt.Println("Collection sharded successfully!")
}
5. Summary
- Sharding distributes data across multiple Replica Sets (Shards).
- Mongos routes queries; Config Servers store the map.
- Shard Key choice is permanent and critical.
- Use Hashed keys for write distribution, Ranged keys for query locality.
- Watch out for Monotonic Keys (Hot Shards) and Low Cardinality (Jumbo Chunks).