Collections & UDTs
In a relational database, if you want to store a user’s phone numbers, you create a separate phone_numbers table and JOIN it. Cassandra's query language (CQL) has no JOIN. Instead, you can use collections to nest the data directly inside the row.
1. Collections: Set, List, Map
Collections allow you to store multiple values in a single column.
| Type | Description | Use Case |
|---|---|---|
| Set | Unique values, returned in sorted order. | Tags ({'java', 'cql'}), unique IDs. |
| List | Ordered list of values. Allows duplicates. | Chronological events, prioritized items. |
| Map | Key-Value pairs. | JSON-like attributes ({'color': 'red', 'size': 'L'}). |
The “Read-Before-Write” Penalty
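[!WARNING] Not every collection update is a blind write. Some list operations (setting or removing an element by index, or removing elements by value) make Cassandra read the existing list internally before writing, and replacing a whole non-frozen collection writes a tombstone over the old contents. A collection is also read as a whole rather than paged, so huge collections (>64KB) are a performance anti-pattern.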
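The difference between adding to a collection and replacing it can be sketched with a toy model of the write path. This is a minimal illustration, not Cassandra's real mutation format: the `Mutation` type and both functions are hypothetical names introduced here.

```go
package main

import "fmt"

// Mutation is a simplified, hypothetical stand-in for what a write sends to
// storage: an optional tombstone plus the cells to write.
type Mutation struct {
	Tombstone bool     // true when the old collection contents are deleted first
	Cells     []string // one entry per element written
}

// AppendToSet models `SET tags = tags + {...}`: only the new elements are
// written, and nothing already stored is touched.
func AppendToSet(added []string) Mutation {
	return Mutation{Tombstone: false, Cells: append([]string(nil), added...)}
}

// ReplaceSet models `SET tags = {...}`: a tombstone covers the old contents,
// then every element of the new value is written.
func ReplaceSet(values []string) Mutation {
	return Mutation{Tombstone: true, Cells: append([]string(nil), values...)}
}

func main() {
	fmt.Printf("%+v\n", AppendToSet([]string{"new_tag"}))
	fmt.Printf("%+v\n", ReplaceSet([]string{"premium", "active"}))
}
```

In this model, frequent whole-collection overwrites accumulate tombstones, which is why incremental updates are usually the cheaper choice for sets and maps.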
Collection Internals
Unlike a blob, each element in a non-frozen collection is stored on disk as a separate cell. The logical view is a single JSON-like value; the physical view is one cell per element.
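The logical and physical views can be approximated in a few lines of Go. This is a rough sketch under simplifying assumptions: the `Cell` type, the `Path` field, and the text encoding of the frozen value are all hypothetical and do not reflect Cassandra's actual on-disk format.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Cell is a simplified, hypothetical model of one stored value.
type Cell struct {
	Column string // column name
	Path   string // element key inside the collection; empty for a frozen value
	Value  string
}

// sortedKeys returns the map keys in sorted order so the sketch is deterministic.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// NonFrozenCells stores each map entry as its own cell, addressed by its key.
func NonFrozenCells(column string, m map[string]string) []Cell {
	cells := make([]Cell, 0, len(m))
	for _, k := range sortedKeys(m) {
		cells = append(cells, Cell{Column: column, Path: k, Value: m[k]})
	}
	return cells
}

// FrozenCell serializes the whole map into the value of a single cell.
func FrozenCell(column string, m map[string]string) Cell {
	parts := make([]string, 0, len(m))
	for _, k := range sortedKeys(m) {
		parts = append(parts, k+":"+m[k])
	}
	return Cell{Column: column, Value: "{" + strings.Join(parts, ",") + "}"}
}

func main() {
	attrs := map[string]string{"color": "red", "size": "L"}
	for _, c := range NonFrozenCells("attrs", attrs) {
		fmt.Printf("%s[%s] = %s\n", c.Column, c.Path, c.Value)
	}
	fmt.Println(FrozenCell("attrs", attrs).Value)
}
```

The non-frozen form yields one addressable cell per element, which is what makes per-element updates possible and also what adds per-element overhead.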
2. User Defined Types (UDTs)
UDTs allow you to create structured data types. They are essentially a mini-table inside a column.
CREATE TYPE address (
    street text,
    city text,
    zip int
);
CREATE TABLE users (
    user_id uuid PRIMARY KEY,
    tags set<text>,
    home_address frozen<address>
);
3. Frozen vs Non-Frozen
The frozen<> keyword appears throughout CQL schemas. It controls how a value is stored and whether it can be partially updated.
Non-Frozen (Default for collections)
- Behavior: Each element is stored as a separate cell.
- Pros: You can add/remove individual items (UPDATE users SET tags = tags + {'new_tag'} WHERE user_id = ?).
- Cons: Higher overhead. Cannot be part of a Primary Key.
Frozen
- Behavior: The entire value is serialized into a single binary blob.
- Pros: Fast to read (one seek). Can be used in Primary Keys.
- Cons: Immutable. To update one field in a frozen UDT, you must overwrite the entire UDT.
[!TIP] Always use frozen for UDTs unless you have a very specific reason not to. It reduces storage overhead and simplifies the read path.
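The update trade-off shows up in the statements an application has to issue. The sketch below is illustrative only: `UpdateCityCQL` and `WithCity` are hypothetical helpers, the table and column names follow the schema in this section, and the field-level form assumes a Cassandra version that supports non-frozen UDTs (3.6 or later, to the best of my knowledge).

```go
package main

import "fmt"

// Address mirrors the fields of the address UDT defined earlier.
type Address struct {
	Street string
	City   string
	Zip    int
}

// WithCity returns a full copy of the address with one field changed.
// For a frozen column, this complete value is what gets written back.
func (a Address) WithCity(city string) Address {
	a.City = city
	return a
}

// UpdateCityCQL returns the statement needed to change only the city.
func UpdateCityCQL(frozen bool) string {
	if frozen {
		// Frozen: the bound value must be the entire address.
		return "UPDATE users SET home_address = ? WHERE user_id = ?"
	}
	// Non-frozen: a single field can be targeted directly.
	return "UPDATE users SET home_address.city = ? WHERE user_id = ?"
}

func main() {
	current := Address{Street: "123 Code Ln", City: "Tech City", Zip: 90210}
	fmt.Println(UpdateCityCQL(true), current.WithCity("New City"))
	fmt.Println(UpdateCityCQL(false), "New City")
}
```

With a frozen column the application needs the current value in hand before it can change one field, so the read-modify-write happens on the client side.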
4. Implementation: Java & Go
Java (DataStax Driver)
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.data.UdtValue;
import com.datastax.oss.driver.api.core.type.UserDefinedType;
import java.util.Set;
import java.util.UUID;

public class ProfileManager {
    public void insertProfile(CqlSession session, UUID userId) {
        // 1. Look up the UDT definition in the schema metadata
        UserDefinedType addressType = session.getMetadata()
            .getKeyspace("ecommerce").get()
            .getUserDefinedType("address").get();
        // 2. Create the UDT value
        UdtValue address = addressType.newValue()
            .setString("street", "123 Code Ln")
            .setString("city", "Tech City")
            .setInt("zip", 90210);
        // 3. Insert the set and the UDT (in real code, prepare once and reuse the statement)
        session.execute(session.prepare(
                "INSERT INTO users (user_id, tags, home_address) VALUES (?, ?, ?)")
            .bind(userId, Set.of("premium", "active"), address));
    }
}
Go (Gocql)
package main

import (
	"github.com/gocql/gocql"
)

// Address maps to the address UDT through the cql struct tags.
type Address struct {
	Street string `cql:"street"`
	City   string `cql:"city"`
	Zip    int    `cql:"zip"`
}

// insertProfile writes one user row and returns any error to the caller.
func insertProfile(session *gocql.Session, id gocql.UUID) error {
	// Go struct maps directly to UDT
	addr := Address{
		Street: "123 Code Ln",
		City:   "Tech City",
		Zip:    90210,
	}
	tags := []string{"premium", "active"}
	// Gocql marshals the slice to set<text> and the struct to the UDT
	return session.Query(`
		INSERT INTO users (user_id, tags, home_address) VALUES (?, ?, ?)`,
		id, tags, addr).Exec()
}