Architecture & Scaling
DynamoDB is not just “another NoSQL database.” It is a distributed system designed to answer a specific question: how do you store a virtually unlimited amount of data while keeping single-digit millisecond latency?
The answer lies in its architecture: Shared-Nothing Partitioning.
In this chapter, we will derive DynamoDB’s architecture from first principles, visualize how data is physically stored, and write the code to define your table’s schema.
1. The Problem with Vertical Scaling
Traditional relational databases (RDBMS) like PostgreSQL or MySQL scale vertically. To handle more traffic, you buy a bigger server (more CPU, more RAM).
- Limit: There is a physical limit to how big a single server can get.
- Cost: Price grows much faster than capacity as machines get larger.
- Failure: A single massive server is a single point of failure.
DynamoDB scales horizontally. It doesn’t use one super-computer; it uses thousands of commodity servers.
2. Primary Keys: The Routing Logic
To distribute data across thousands of servers, DynamoDB needs a deterministic way to know exactly which server holds your data without querying a central registry (which would be a bottleneck).
It uses your Primary Key to route data.
1. Simple Primary Key (Partition Key Only)
- Structure: One attribute (e.g., UserId, SessionId).
- Behavior: The key is the input to an internal hash function. The output determines the physical partition.
- Use Case: Key-value lookups (e.g., Cache, Session Store).
2. Composite Primary Key (Partition Key + Sort Key)
- Structure: Two attributes (e.g., Author + BookTitle).
- Partition Key: Determines the physical location (which server).
- Sort Key: Determines the order within that partition (B-Tree).
- Use Case: One-to-Many relationships (e.g., “Get all books by Author X, sorted by Title”).
Items with the same Partition Key are stored physically together. This is crucial for performance (Data Locality). You can retrieve all items for a partition key in a single efficient Query operation.
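This layout can be modeled in a few lines of Go. The sketch below is a toy model, not SDK code: `partition` and `insert` are illustrative names. It shows why a Query is cheap: all items for one partition key live in one structure, already ordered by sort key.

```go
package main

import (
	"fmt"
	"sort"
)

// A toy model of one DynamoDB partition: all items share the
// partition key (Author) and are kept ordered by sort key (Title).
type partition struct {
	titles []string // sort-key values, kept sorted
}

// insert places a title at its sorted position (binary search,
// analogous to a B-tree insertion).
func (p *partition) insert(title string) {
	i := sort.SearchStrings(p.titles, title)
	p.titles = append(p.titles, "")
	copy(p.titles[i+1:], p.titles[i:])
	p.titles[i] = title
}

func main() {
	table := map[string]*partition{} // partition key -> physical partition

	for _, item := range []struct{ author, title string }{
		{"Tolkien", "The Two Towers"},
		{"Tolkien", "The Hobbit"},
		{"Herbert", "Dune"},
	} {
		if table[item.author] == nil {
			table[item.author] = &partition{}
		}
		table[item.author].insert(item.title)
	}

	// "Get all books by Tolkien, sorted by Title": one partition,
	// one ordered scan — no cross-server coordination needed.
	fmt.Println(table["Tolkien"].titles) // [The Hobbit The Two Towers]
}
```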
3. Consistent Hashing (The “Secret Sauce”)
DynamoDB uses Consistent Hashing to map your Partition Key to a storage node.
- Input: Your Partition Key (e.g., “user_123”).
- Hash: An internal hash function maps the key to a 128-bit integer (the original Dynamo design used MD5; DynamoDB's exact function is not documented).
- Token Ring: The hash space is treated as a ring (0 to 2^128 − 1).
- Placement: The item belongs to the first partition encountered moving clockwise on the ring.
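The placement rule above can be sketched in a few lines of Go. This is a toy model, not the service's implementation: `hashKey`, `ring`, and `locate` are illustrative names, and MD5 (truncated to 64 bits for simpler arithmetic) stands in for DynamoDB's undocumented internal hash.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"sort"
)

// hashKey maps a partition key onto the ring. MD5 is used here only
// for illustration, truncated to the high 64 bits.
func hashKey(key string) uint64 {
	sum := md5.Sum([]byte(key))
	return binary.BigEndian.Uint64(sum[:8])
}

// ring holds each partition's token (its position on the ring),
// sorted ascending.
type ring struct {
	tokens []uint64
}

// locate returns the first token at or clockwise from hash h,
// wrapping back to the smallest token past the top of the ring.
func (r *ring) locate(h uint64) uint64 {
	i := sort.Search(len(r.tokens), func(i int) bool { return r.tokens[i] >= h })
	if i == len(r.tokens) {
		i = 0 // wrapped past the largest token
	}
	return r.tokens[i]
}

func main() {
	r := &ring{tokens: []uint64{1 << 61, 1 << 62, 1 << 63}}
	fmt.Printf("user_123 lands on the partition with token %d\n",
		r.locate(hashKey("user_123")))
}
```

Because every client computes the same hash, any node can route a request directly to the right partition with no central registry lookup.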
Interactive: Partition Ring Visualizer
See how your data is distributed. Enter different keys to see where they land on the ring.
4. Scaling Mechanics
How does this architecture scale to petabytes?
As your data grows, a single partition might become too big (storage limit) or too hot (throughput limit). DynamoDB handles this automatically by Splitting.
- Storage Split: If a partition exceeds ~10GB, it splits into two child partitions.
- Redistribution: The hash range is divided. Metadata is updated to point new writes to the child partitions.
- Zero Downtime: This happens in the background. Your application is unaware.
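The redistribution step is essentially range arithmetic. A minimal sketch (toy model; `keyRange` and `split` are illustrative names, not service internals):

```go
package main

import "fmt"

// keyRange is the slice of the hash ring owned by one partition:
// inclusive start, exclusive end.
type keyRange struct{ start, end uint64 }

// split divides a partition's hash range at its midpoint, producing
// two child ranges; routing metadata would then send new writes to
// the children.
func split(r keyRange) (keyRange, keyRange) {
	mid := r.start + (r.end-r.start)/2
	return keyRange{r.start, mid}, keyRange{mid, r.end}
}

func main() {
	parent := keyRange{0, 1 << 32}
	left, right := split(parent)
	fmt.Println(left, right) // {0 2147483648} {2147483648 4294967296}
}
```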
If you choose a Partition Key with low cardinality (e.g., Gender, Status), you force all traffic into a small number of partitions. This is a "Hot Partition": each physical partition is capped at roughly 3,000 RCUs and 1,000 WCUs, so those few partitions become your table's effective throughput ceiling no matter how much capacity you provision. Always choose high-cardinality keys like UserId or a UUID.
5. Code Implementation: Creating a Table
Here is the correct way to define a schema in Java and Go.
Java (v2 SDK)
```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;

public class CreateTable {
    public static void main(String[] args) {
        DynamoDbClient ddb = DynamoDbClient.create();

        CreateTableRequest request = CreateTableRequest.builder()
            .tableName("Library")
            // Define Attributes
            .attributeDefinitions(
                AttributeDefinition.builder()
                    .attributeName("Author").attributeType(ScalarAttributeType.S).build(),
                AttributeDefinition.builder()
                    .attributeName("Title").attributeType(ScalarAttributeType.S).build()
            )
            // Define Key Schema
            .keySchema(
                KeySchemaElement.builder()
                    .attributeName("Author").keyType(KeyType.HASH).build(), // Partition Key
                KeySchemaElement.builder()
                    .attributeName("Title").keyType(KeyType.RANGE).build() // Sort Key
            )
            // Provision Throughput (or use PAY_PER_REQUEST)
            .provisionedThroughput(
                ProvisionedThroughput.builder()
                    .readCapacityUnits(5L).writeCapacityUnits(5L).build()
            )
            .build();

        ddb.createTable(request);
    }
}
```
Go (v2 SDK)
```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

func main() {
	// Never discard the config error: a missing region or credential
	// chain failure surfaces here, not at the first API call.
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatalf("unable to load AWS config: %v", err)
	}
	svc := dynamodb.NewFromConfig(cfg)

	input := &dynamodb.CreateTableInput{
		TableName: aws.String("Library"),
		AttributeDefinitions: []types.AttributeDefinition{
			{
				AttributeName: aws.String("Author"),
				AttributeType: types.ScalarAttributeTypeS,
			},
			{
				AttributeName: aws.String("Title"),
				AttributeType: types.ScalarAttributeTypeS,
			},
		},
		KeySchema: []types.KeySchemaElement{
			{
				AttributeName: aws.String("Author"),
				KeyType:       types.KeyTypeHash, // Partition Key
			},
			{
				AttributeName: aws.String("Title"),
				KeyType:       types.KeyTypeRange, // Sort Key
			},
		},
		ProvisionedThroughput: &types.ProvisionedThroughput{
			ReadCapacityUnits:  aws.Int64(5),
			WriteCapacityUnits: aws.Int64(5),
		},
	}

	if _, err := svc.CreateTable(context.TODO(), input); err != nil {
		log.Fatalf("Error: %v", err)
	}
}
```