Collections & UDTs

In a relational database, to store a user’s phone numbers you would typically create a separate phone_numbers table and JOIN it to users. CQL has no JOIN, so Cassandra instead uses collections to nest such data directly inside the row.


1. Collections: Set, List, Map

Collections allow you to store multiple values in a single column.

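| Type | Description | Use case |
|------|-------------|----------|
| Set | Unique values, returned in sorted order (not insertion order). | Tags ({'java', 'cql'}), unique IDs. |
| List | Values kept in insertion order; duplicates allowed. | Chronological events, prioritized items. |
| Map | Key-value pairs with unique keys, returned sorted by key. | JSON-like attributes ({'color': 'red', 'size': 'L'}). |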
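The following Go sketch models, with plain in-memory data structures, how the three collection types behave when you write to them. It does not talk to Cassandra; the helpers setAdd, listAppend, and mapKeys are hypothetical and only mimic CQL semantics: sets drop duplicates and come back sorted, lists keep insertion order and duplicates, and maps keep one value per key and come back sorted by key.

```go
package main

import (
	"fmt"
	"sort"
)

// setAdd mimics a CQL set: duplicates collapse and values are returned sorted.
func setAdd(set []string, vals ...string) []string {
	seen := make(map[string]bool, len(set))
	for _, v := range set {
		seen[v] = true
	}
	for _, v := range vals {
		if !seen[v] {
			seen[v] = true
			set = append(set, v)
		}
	}
	sort.Strings(set)
	return set
}

// listAppend mimics a CQL list append: insertion order is kept, duplicates allowed.
func listAppend(list []string, vals ...string) []string {
	return append(list, vals...)
}

// mapKeys mimics CQL map ordering: keys are unique and returned sorted.
func mapKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	tags := setAdd(nil, "java", "cql", "java")
	fmt.Println(tags) // [cql java]

	events := listAppend(nil, "login", "view", "login")
	fmt.Println(events) // [login view login]

	attrs := map[string]string{"size": "L", "color": "red"}
	attrs["color"] = "blue" // writing an existing key replaces its value
	for _, k := range mapKeys(attrs) {
		fmt.Printf("%s=%s\n", k, attrs[k]) // color=blue, then size=L
	}
}
```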

The “Read-Before-Write” Penalty

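[!WARNING] Adding or removing set and map elements, and appending or prepending to a list, are blind writes that need no read. However, setting or deleting a list element by index, or removing list elements by value, forces Cassandra to read the list internally before writing, and overwriting an entire non-frozen collection writes a tombstone to clear the old elements. Collections are meant for small amounts of data and are not paged internally, so huge collections (>64KB) are a performance anti-pattern.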
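To see why list removal by value costs a read while set removal does not, here is a toy Go model. It is an assumption-laden sketch, not Cassandra's storage engine: the toyRow type is hypothetical, and int64 position IDs stand in for the time-based UUIDs Cassandra uses to key list cells. Because a set cell is keyed by the element itself, its tombstone can be written blind; a list value does not identify its cell, so every list cell must be read first.

```go
package main

import "fmt"

// toyRow is a hypothetical in-memory stand-in for one row's collection cells.
// Set cells are keyed by the element itself; list cells are keyed by an opaque
// position ID (Cassandra uses a time-based UUID; int64 stands in here).
type toyRow struct {
	listCells  map[int64]string // position ID -> list element
	tombstones []string         // keys of cells marked as deleted
	cellsRead  int              // cells that had to be read before writing
}

// removeFromSet is a blind write: the element is the cell key, so the
// tombstone can be written without reading anything.
func (r *toyRow) removeFromSet(v string) {
	r.tombstones = append(r.tombstones, "set:"+v)
}

// removeFromList must scan every list cell to learn which position IDs
// hold v, because the value alone does not identify the cell.
func (r *toyRow) removeFromList(v string) {
	for id, elem := range r.listCells {
		r.cellsRead++
		if elem == v {
			r.tombstones = append(r.tombstones, fmt.Sprintf("list:%d", id))
		}
	}
}

func main() {
	r := &toyRow{listCells: map[int64]string{1: "a", 2: "b", 3: "a"}}

	r.removeFromSet("x")
	fmt.Println("cells read for set removal:", r.cellsRead) // 0

	r.removeFromList("a")
	fmt.Println("cells read after list removal:", r.cellsRead) // 3
	fmt.Println("tombstones written:", len(r.tombstones))     // 3
}
```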

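Collection Internals

Unlike a frozen value, which is stored as a single blob, each element of a non-frozen collection is stored on disk as a separate cell. A set element becomes a cell named by the element itself, a map entry becomes a cell named by its key, and a list element becomes a cell named by an internally generated time-based UUID.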

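The Go sketch below maps a logical row (a set and a map) onto the one-cell-per-element layout described above. The cell struct and the explodeSet/explodeMap helpers are hypothetical simplifications: real cells also carry timestamps, TTLs, and the partition and clustering context, and list cells (keyed by time-based UUIDs) are left out for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// cell is a simplified on-disk cell: column name, element path, and value.
type cell struct {
	Column string
	Path   string // the set element or map key that identifies this cell
	Value  string
}

// explodeSet turns a set into one cell per element. The element itself is
// the cell path and the value is empty. elems is assumed to be deduplicated.
func explodeSet(column string, elems []string) []cell {
	sorted := append([]string(nil), elems...)
	sort.Strings(sorted)
	cells := make([]cell, 0, len(sorted))
	for _, e := range sorted {
		cells = append(cells, cell{Column: column, Path: e})
	}
	return cells
}

// explodeMap turns a map into one cell per entry, keyed (and sorted) by map key.
func explodeMap(column string, m map[string]string) []cell {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	cells := make([]cell, 0, len(keys))
	for _, k := range keys {
		cells = append(cells, cell{Column: column, Path: k, Value: m[k]})
	}
	return cells
}

func main() {
	// Logical view: tags = {'java', 'cql'}, attrs = {'color': 'red', 'size': 'L'}
	cells := explodeSet("tags", []string{"java", "cql"})
	cells = append(cells, explodeMap("attrs", map[string]string{"color": "red", "size": "L"})...)

	// Physical view: one cell per element.
	for _, c := range cells {
		fmt.Printf("%s[%s] = %q\n", c.Column, c.Path, c.Value)
	}
}
```

Running it prints four cells: tags[cql], tags[java] (both with empty values), then attrs[color] = "red" and attrs[size] = "L".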


2. User Defined Types (UDTs)

UDTs let you define your own structured types: a named group of typed fields, similar to a struct, stored in a single column.

CREATE TYPE address (
  street text,
  city text,
  zip int
);

CREATE TABLE users (
  user_id uuid PRIMARY KEY,
  tags set<text>,
  address frozen<address>
);

3. Frozen vs Non-Frozen

The frozen<> keyword appears often in Cassandra schemas, and it changes how a value is stored and updated, so it is worth understanding well.

Non-Frozen (Default for collections)

  • Behavior: Each element is stored as a separate cell.
  • Pros: You can add or remove individual items (UPDATE users SET tags = tags + {'new_tag'} WHERE user_id = ?).
  • Cons: Higher storage overhead (one cell per element). Cannot be part of a primary key.

Frozen

  • Behavior: The entire value is serialized into a single binary blob.
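  • Pros: Stored as a single cell, so reads and writes carry no per-element overhead. Can be used in primary keys.
  • Cons: Cannot be updated in place. To change one field of a frozen UDT (or one element of a frozen collection), you must overwrite the entire value.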

[!TIP] Use frozen for UDTs unless you need to update individual fields in place. A frozen value avoids per-field cell overhead and simplifies the read path.


4. Implementation: Java & Go

Java (DataStax Driver)

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.data.UdtValue;
import com.datastax.oss.driver.api.core.type.UserDefinedType;
import java.util.Set;
import java.util.UUID;

public class ProfileManager {
    public void insertProfile(CqlSession session, UUID userId) {
        // 1. Get UDT definition
        UserDefinedType addressType = session.getMetadata()
            .getKeyspace("ecommerce").get()
            .getUserDefinedType("address").get();

        // 2. Create UDT Value
        UdtValue address = addressType.newValue()
            .setString("street", "123 Code Ln")
            .setString("city", "Tech City")
            .setInt("zip", 90210);

        // 3. Insert with Set and UDT (in real code, prepare once and reuse the PreparedStatement)
        session.execute(session.prepare(
            "INSERT INTO users (user_id, tags, address) VALUES (?, ?, ?)")
            .bind(userId, Set.of("premium", "active"), address));
    }
}

Go (Gocql)

package profile

import (
    "github.com/gocql/gocql"
)

// Address mirrors the address UDT; the cql tags map struct fields to UDT fields.
type Address struct {
    Street string `cql:"street"`
    City   string `cql:"city"`
    Zip    int    `cql:"zip"`
}

// insertProfile writes one user row with a set<text> column and a frozen<address> UDT.
func insertProfile(session *gocql.Session, id gocql.UUID) error {
    addr := Address{
        Street: "123 Code Ln",
        City:   "Tech City",
        Zip:    90210,
    }

    // A []string is marshaled into the set<text> column.
    tags := []string{"premium", "active"}

    // gocql marshals the struct into the UDT by matching cql tags to field names.
    return session.Query(`
        INSERT INTO users (user_id, tags, address) VALUES (?, ?, ?)`,
        id, tags, addr).Exec()
}