High Availability & Consistency
Imagine you are building the backend for a high-frequency trading platform. A user transfers $10,000 from their savings to their checking account. The system confirms the transfer. Two seconds later, the database server crashes. When it reboots, the $10,000 is gone from savings, but hasn’t appeared in checking.
Welcome to the harsh reality of distributed systems. You are forced to choose between consistency (safety) and latency (speed). MongoDB doesn’t make this choice for you; instead, it gives you granular control over this trade-off via Write Concern and Read Concern.
1. Write Concern: When is a Write “Safe”?
Write Concern determines exactly when MongoDB considers a write operation to be “successful” and acknowledges it to the client application.
The Levels of Acknowledgment
w: 1 (Primary Acknowledgment): MongoDB acknowledges the write as soon as the Primary node has applied it in memory. This was the default before MongoDB 5.0; since 5.0, the implicit default for most replica set configurations is w: "majority".
- Pros: Extremely fast (low latency).
- Cons: Risky. If the Primary crashes before the write is replicated to a Secondary, the write is rolled back after failover and is effectively lost.
w: "majority" (The Gold Standard): MongoDB acknowledges the write only after it has propagated to a majority of nodes in the Replica Set (e.g., 2 out of 3 nodes).
- Pros: Safe against rollbacks. Even if the Primary dies, the data survives on a majority of nodes.
- Cons: Higher latency, as the Primary must wait for replication round-trips to Secondary nodes.
j: true (Disk Journaling): MongoDB waits for the write to be flushed to the on-disk journal before acknowledging. It is combined with a w value rather than replacing it (e.g., w: "majority" with j: true).
- Pros: Ensures durability even if the server loses power completely and immediately.
- Cons: The highest latency of the three, because every acknowledged write also waits on disk I/O.
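The trade-off between these levels can be made concrete with a small model. The following is a minimal sketch, not MongoDB's actual implementation: `majority`, `ackLatencyMs`, and all of the millisecond figures are hypothetical names and numbers chosen for illustration. It assumes the Primary counts as the first acknowledgment and that the client waits for the slowest of the Secondaries it still needs.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// majority returns how many voting members must hold a write before it
// counts as majority-replicated: floor(n/2) + 1.
func majority(votingMembers int) int {
	return votingMembers/2 + 1
}

// ackLatencyMs is a toy model of how long a client waits for an
// acknowledgment, in milliseconds. All inputs are hypothetical:
// w is the number of members that must hold the write (1 = Primary only),
// journaled adds the Primary's journal flush (j: true), applyMs is the
// in-memory apply time, flushMs is the journal flush time, and
// secondaryRTTs holds the replication round-trip time to each Secondary.
func ackLatencyMs(w int, journaled bool, applyMs, flushMs int, secondaryRTTs []int) (int, error) {
	if w < 1 || w-1 > len(secondaryRTTs) {
		return 0, errors.New("w must be between 1 and the number of members")
	}
	latency := applyMs
	if journaled {
		latency += flushMs
	}
	if w > 1 {
		rtts := append([]int(nil), secondaryRTTs...)
		sort.Ints(rtts)
		// The Primary is the first acknowledgment, so the wait ends once
		// the (w-1)-th fastest Secondary has confirmed the write.
		latency += rtts[w-2]
	}
	return latency, nil
}

func main() {
	rtts := []int{20, 80} // one nearby Secondary, one in a remote region
	m := majority(1 + len(rtts))

	fast, _ := ackLatencyMs(1, false, 1, 5, rtts)
	safe, _ := ackLatencyMs(m, false, 1, 5, rtts)
	safest, _ := ackLatencyMs(m, true, 1, 5, rtts)
	fmt.Println("w: 1               ->", fast, "ms")
	fmt.Println("w: majority        ->", safe, "ms")
	fmt.Println("w: majority, j: true ->", safest, "ms")
}
```

In this model a majority write only waits for the nearest Secondary, which is why a three-node set with one nearby member is usually much cheaper than the slowest round-trip would suggest.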
⚔️ War Story: The Phantom Order
An e-commerce company ran its checkout processing with w: 1. During Black Friday, a user placed an order for a $2,000 laptop. The Primary node confirmed the order in memory, and the application sent a "Success" email. Half a second later, the Primary node experienced a kernel panic and died before replicating the order. A Secondary took over, but it didn't have the order data. The user was charged, but the warehouse never received the shipping request. They switched to w: "majority" the very next day.
2. Read Concern: Avoiding the Ghosts of Data Past
While Write Concern dictates how safely data is written, Read Concern dictates how “fresh” and “permanent” the data you read is.
In a distributed database, just because you can read a piece of data doesn’t mean it’s permanent. If you read from a node that has data which hasn’t yet replicated to the majority, that data might be rolled back if a crash occurs.
local: Return the most recent data available on the specific node you are querying. Warning: This data might be rolled back if it hasn’t achieved majority replication yet.
majority: Only return data that has been acknowledged by a majority of the replica set. This guarantees the data you read is durable and will never be rolled back.
linearizable: Guarantees that the read returns the absolute latest data, reflecting all successful writes that completed before the read started. This involves checking with the majority of nodes at read-time and is highly expensive in terms of latency.
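The difference between local and majority comes down to which point in the write history a node is allowed to show. Below is a minimal sketch under simplifying assumptions: operations are numbered with plain integers instead of real oplog timestamps, and `majorityCommitPoint` and `visibleOp` are hypothetical helper names. It models only the local and majority levels, not linearizable.

```go
package main

import (
	"fmt"
	"sort"
)

// majorityCommitPoint returns the newest operation number that a majority
// of members have applied. lastApplied holds one entry per member.
func majorityCommitPoint(lastApplied []int) int {
	if len(lastApplied) == 0 {
		return 0
	}
	sorted := append([]int(nil), lastApplied...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	need := len(sorted)/2 + 1
	// The need-th most advanced member marks the point that `need`
	// members have all reached.
	return sorted[need-1]
}

// visibleOp returns the newest operation a read on one node may observe.
func visibleOp(level string, nodeApplied, commitPoint int) int {
	if level == "majority" && commitPoint < nodeApplied {
		return commitPoint
	}
	return nodeApplied // "local": whatever this node has applied
}

func main() {
	lastApplied := []int{7, 5, 4} // Primary at op 7, Secondaries at 5 and 4
	cp := majorityCommitPoint(lastApplied)
	fmt.Println("majority commit point:", cp)
	fmt.Println("local read on Primary sees op", visibleOp("local", 7, cp))
	fmt.Println("majority read on Primary sees op", visibleOp("majority", 7, cp))
}
```

Operations 6 and 7 exist only on the Primary here, so a local read can observe them while a majority read stops at operation 5.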
Step-by-Step Example: The Dirty Read
- Time T=0: Primary processes a write balance = $500 with w: 1.
- Time T=1: Application reads the Primary with readConcern: local. It sees balance = $500.
- Time T=2: Primary crashes before replicating to Secondaries.
- Time T=3: Secondary is promoted to Primary. It never received the $500 update. The balance reverts to the old value (e.g., $100).
- Result: Your application read data that effectively never existed. Using readConcern: majority would have prevented this: the read at T=1 would have returned the older, majority-committed value ($100) instead of the unreplicated $500.
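The timeline above can be replayed with a toy model. This is a sketch only: `replicaSet` and its methods are hypothetical, each member stores a single balance, and failover simply drops the crashed Primary and promotes the first remaining member.

```go
package main

import "fmt"

// replicaSet is a heavily simplified model: each member stores one
// balance, and members[primary] is the current Primary.
type replicaSet struct {
	members []int
	primary int
}

func newReplicaSet(size, balance int) *replicaSet {
	members := make([]int, size)
	for i := range members {
		members[i] = balance
	}
	return &replicaSet{members: members}
}

// writeW1 models w: 1: the Primary applies the write and acknowledges
// immediately, before any Secondary has copied it.
func (rs *replicaSet) writeW1(balance int) {
	rs.members[rs.primary] = balance
}

// readLocal models readConcern "local" against the Primary.
func (rs *replicaSet) readLocal() int {
	return rs.members[rs.primary]
}

// readMajority models readConcern "majority": it returns the value that a
// majority of members hold, and false if no value has a majority.
func (rs *replicaSet) readMajority() (int, bool) {
	counts := map[int]int{}
	for _, b := range rs.members {
		counts[b]++
	}
	need := len(rs.members)/2 + 1
	for balance, n := range counts {
		if n >= need {
			return balance, true
		}
	}
	return 0, false
}

// failover models a Primary crash: the Primary leaves the set and the
// first remaining member is promoted.
func (rs *replicaSet) failover() {
	rs.members = append(rs.members[:rs.primary], rs.members[rs.primary+1:]...)
	rs.primary = 0
}

func main() {
	rs := newReplicaSet(3, 100)

	// T=0: write with w: 1, applied on the Primary only.
	rs.writeW1(500)

	// T=1: the two read concerns disagree.
	fmt.Println("local read:   ", rs.readLocal())
	safe, _ := rs.readMajority()
	fmt.Println("majority read:", safe)

	// T=2 and T=3: the Primary crashes and a Secondary takes over.
	rs.failover()
	fmt.Println("after failover:", rs.readLocal())
}
```

The local read reports the $500 that later disappears, while the majority read reports the $100 that survives the failover.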
4. Case Study: Global Financial Ledger (PEDALS Framework)
Let’s apply these concepts using the PEDALS framework to design a globally available financial ledger.
- P - Process Requirements: The system must process financial transactions across North America, Europe, and Asia. Absolute data safety is required (zero financial loss). System must remain available even if an entire AWS region goes offline.
- E - Estimate: 10,000 transactions per second (TPS). High read-to-write ratio (users checking balances 10x more than transferring).
- D - Data Model: A transactions collection and an accounts collection.
- A - Architecture:
  - Deploy a multi-region Replica Set spanning us-east, eu-west, and ap-south.
  - Set Write Concern to majority to ensure no transaction is ever lost.
  - Set Read Concern to majority to ensure users never see “ghost” balances.
- L - Localized Details (Zone Sharding):
  - Zone Sharding allows pinning data to specific geographic regions.
  - Tag Shard A as “EU” and define a shard key range (region: "EU") to enforce GDPR compliance and keep local EU traffic low-latency.
- S - Scale: As TPS grows, add more Shards within each region, maintaining the multi-region replica set architecture for high availability.
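The zone sharding step in the case study amounts to a two-level lookup: a shard key value falls into a zone, and a zone is served by the shards tagged with it. The sketch below only illustrates that lookup. `zoneOfRegion`, `shardsInZone`, and every entry except the EU mapping to Shard A are hypothetical, and a real cluster defines zones as shard key ranges (configured with helpers such as `sh.addShardToZone` and `sh.updateZoneKeyRange`) rather than as an in-memory map.

```go
package main

import "fmt"

// zoneOfRegion maps a shard key value (the region field) to a zone name.
// Only the EU entry comes from the case study; the rest are made up.
var zoneOfRegion = map[string]string{
	"EU": "EU",
	"US": "NA",
	"CA": "NA",
	"IN": "APAC",
}

// shardsInZone lists the shards tagged with each zone.
var shardsInZone = map[string][]string{
	"EU":   {"shardA"},
	"NA":   {"shardB"},
	"APAC": {"shardC"},
}

// shardsForRegion returns the shards allowed to store documents for a
// region. The boolean is false when no zone covers the region, which in
// this sketch means the data is not pinned anywhere.
func shardsForRegion(region string) ([]string, bool) {
	zone, ok := zoneOfRegion[region]
	if !ok {
		return nil, false
	}
	shards, ok := shardsInZone[zone]
	return shards, ok
}

func main() {
	for _, region := range []string{"EU", "CA", "BR"} {
		if shards, ok := shardsForRegion(region); ok {
			fmt.Println(region, "is pinned to", shards)
		} else {
			fmt.Println(region, "has no zone")
		}
	}
}
```

Scaling a region then means adding another shard to its zone's list, without touching how keys map to zones.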
5. Coding Consistency
You can strictly enforce these concerns at the Client, Database, Collection, or individual Operation level in your application code.
Java Implementation
import com.mongodb.ReadConcern;
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class ConsistencyLevels {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            // Enforce Majority Rules at the Collection Level
            MongoCollection<Document> safeCol = client.getDatabase("bank")
                    .getCollection("transfers")
                    .withWriteConcern(WriteConcern.MAJORITY)
                    .withReadConcern(ReadConcern.MAJORITY);

            // This insert blocks until it is acknowledged by a majority of nodes
            safeCol.insertOne(new Document("amount", 100));
            System.out.println("Secure write fully replicated and complete.");
        }
    }
}
Go Implementation
package main

import (
	"context"
	"log"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
	"go.mongodb.org/mongo-driver/mongo/readconcern"
	"go.mongodb.org/mongo-driver/mongo/writeconcern"
)

func main() {
	ctx := context.TODO()

	// Connect to the cluster and fail fast instead of ignoring the error
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatalf("Failed to connect: %v", err)
	}
	defer client.Disconnect(ctx)

	// Define strict consistency options.
	// Newer v1 driver releases deprecate this constructor in favor of
	// writeconcern.Majority().
	wc := writeconcern.New(writeconcern.WMajority())
	rc := readconcern.Majority()
	opts := options.Collection().
		SetWriteConcern(wc).
		SetReadConcern(rc)

	// Apply options to the collection
	coll := client.Database("bank").Collection("transfers", opts)

	// InsertOne returns only after a majority of nodes acknowledge the
	// write, so the insert is safe from rollbacks
	_, err = coll.InsertOne(ctx, map[string]interface{}{"amount": 100})
	if err != nil {
		log.Fatalf("Write failed to replicate safely: %v", err)
	}
	log.Println("Secure write fully replicated and complete.")
}
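Both implementations above set the concerns on a single collection. At the client level, the same defaults can travel in the connection string through the standard `w`, `journal`, and `readConcernLevel` URI options. The following is a small sketch that only builds such a string with the standard library; `buildURI`, the host, and the replica set name `rs0` are hypothetical, and it does not open a connection.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildURI returns a MongoDB connection string whose options make
// majority writes, journaled acknowledgment, and majority reads the
// default for every operation issued through the resulting client.
func buildURI(hosts, replicaSet string) string {
	q := url.Values{}
	q.Set("replicaSet", replicaSet)
	q.Set("w", "majority")
	q.Set("journal", "true")
	q.Set("readConcernLevel", "majority")
	// Encode sorts the options by key, so the output is stable.
	return fmt.Sprintf("mongodb://%s/?%s", hosts, q.Encode())
}

func main() {
	fmt.Println(buildURI("localhost:27017", "rs0"))
}
```

Settings applied at a narrower scope, such as the collection options shown earlier, still take precedence over these client-wide defaults.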
6. Summary: The Golden Rules of Consistency
- Write Concern (w: majority) is your shield against data loss during unexpected crashes. It is the gold standard for production systems handling sensitive data.
- Read Concern (majority) ensures your application only reads truth that will not be erased by a rollback.
- Zone Sharding is the ultimate architectural tool for Multi-Region deployments, ensuring both legal compliance (e.g., GDPR) and low-latency access by pinning data to geographically local shards.
- Always deliberately balance Latency (user experience) against Data Safety (business survival). Defaulting blindly to w: 1 is a recipe for disaster in financial or critical path systems.