Module Review: Data Modeling

This module review covers the essential principles of Cassandra data modeling, including query-driven design, keys, and denormalization.

Key Takeaways

Query-First: Always start with your application's queries, then design one table per query (1 Query → 1 Table).
Partition Key: Determines which node stores the data. It needs high cardinality and evenly spread access to avoid hot partitions.
Clustering Key: Determines the sort order of rows within a partition on disk, which enables efficient range queries.
Denormalization: Cassandra has no JOINs, so data is duplicated across tables to keep each read fast.
Write Amplification: Writing the same data to multiple tables is cheaper than joining data from several tables at read time.
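
To illustrate the query-first and denormalization takeaways, here is a minimal sketch. It assumes a hypothetical application with two queries: look up a user by ID, and look up a user by email. The table and column names (users_by_id, users_by_email, user_id, email, name) and the sample values are invented for this example and are not part of the module.

```cql
-- Query 1: find a user by ID (hypothetical table).
CREATE TABLE users_by_id (
    user_id uuid,
    email text,
    name text,
    PRIMARY KEY (user_id)
);

-- Query 2: find a user by email.
-- Same data, duplicated under a different partition key.
CREATE TABLE users_by_email (
    email text,
    user_id uuid,
    name text,
    PRIMARY KEY (email)
);

-- Each read is served by a single partition in a single table; no JOIN is needed.
SELECT name, email FROM users_by_id
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

SELECT user_id, name FROM users_by_email
WHERE email = 'ada@example.com';
```

The trade-off is that every change to a user has to be written to both tables, which is the write amplification mentioned above.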

Cheat Sheet

Primary Key Syntax

Syntax	Partition Key	Clustering Key
`PRIMARY KEY (a)`	`a`	None
`PRIMARY KEY (a, b)`	`a`	`b`
`PRIMARY KEY ((a, b), c)`	`a`, `b`	`c`
`PRIMARY KEY ((a), b, c)`	`a`	`b`, `c`
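
To make the third row of the table concrete, the sketch below shows which queries a composite partition key supports. The table name t, the column types, and the extra column v are placeholders chosen for this example.

```cql
CREATE TABLE t (
    a int,
    b int,
    c int,
    v text,
    PRIMARY KEY ((a, b), c)
);

-- Valid: the full partition key (a AND b) is given.
SELECT * FROM t WHERE a = 1 AND b = 2;

-- Valid: full partition key plus a range on the clustering key c.
SELECT * FROM t WHERE a = 1 AND b = 2 AND c >= 10 AND c < 20;

-- Not an efficient query: only part of the partition key is given,
-- so Cassandra cannot locate a single partition. Expect it to be
-- rejected, or at best to require ALLOW FILTERING and a scan.
-- SELECT * FROM t WHERE a = 1;
```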

Modeling Do’s and Don’ts

Do	Don’t
✅ Start with Queries	❌ Start with Tables
✅ Duplicate Data	❌ Use client-side JOINs
✅ High-Cardinality Partition Key	❌ Low-Cardinality Partition Key (e.g., Boolean)
✅ Use Logged Batches to Keep Tables in Sync	❌ Use Batches for Bulk Load
✅ Sort with the Clustering Key	❌ Sort on the Client Side
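
The batch row can be sketched as follows. The example assumes two hypothetical denormalized tables, users_by_id and users_by_email, that store the same user under different partition keys. The table names, columns, and values are illustrative only.

```cql
-- Assumes both hypothetical tables already exist
-- with the columns (user_id, email, name).
-- BEGIN BATCH creates a logged batch by default:
-- either both writes are eventually applied, or neither is.
BEGIN BATCH
    INSERT INTO users_by_id (user_id, email, name)
    VALUES (123e4567-e89b-12d3-a456-426614174000, 'ada@example.com', 'Ada');

    INSERT INTO users_by_email (email, user_id, name)
    VALUES ('ada@example.com', 123e4567-e89b-12d3-a456-426614174000, 'Ada');
APPLY BATCH;
```

The batch log adds coordination overhead on every write, which is why batches suit keeping a few related rows in sync rather than loading data in bulk.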

Practice Scenario

Task: Design a schema for an “IoT Sensor Network”.

We have thousands of sensors.
We need to see the latest temperature for a specific sensor.
We need to see all temperature readings for a specific sensor for a specific day.

Solution:

CREATE TABLE sensor_readings_by_day (
    sensor_id uuid,
    date date,
    recorded_at timestamp,
    temperature decimal,
    PRIMARY KEY ((sensor_id, date), recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC);

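Partition Key: (sensor_id, date) - Ensures that a single partition doesn’t grow indefinitely. Each sensor starts a new partition every day.
Clustering Key: recorded_at - With CLUSTERING ORDER BY (recorded_at DESC), readings are stored newest first, so the latest reading is the first row in the partition.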
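
The two required queries could then look like the sketch below. The sensor ID and the date are made-up sample values, and the application is assumed to supply the current date itself.

```cql
-- Latest temperature for one sensor: read the first row of today's partition.
-- Rows are clustered by recorded_at DESC, so LIMIT 1 returns the newest reading.
SELECT recorded_at, temperature
FROM sensor_readings_by_day
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
  AND date = '2024-01-15'
LIMIT 1;

-- All readings for one sensor on one day: read the whole partition.
SELECT recorded_at, temperature
FROM sensor_readings_by_day
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
  AND date = '2024-01-15';
```

One consequence of bucketing by day is that the first query returns nothing if the sensor has not reported yet today. In that case the application would need to query the previous day's partition.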

Flashcards

What determines data distribution?

The Partition Key

What is the purpose of the Clustering Key?

Sorting & Range Queries

Why do we Denormalize?

To Optimize Reads

What is a Hot Partition?

A partition that receives a disproportionate share of data or traffic (uneven data distribution)

How do we ensure consistency across tables?

Logged Batches
