Collections & UDTs

In a relational database, storing a user’s phone numbers means creating a separate phone_numbers table and joining it at query time. Cassandra does not support joins. Instead, collections let you nest the data directly inside the row.


1. Collections: Set, List, Map

Collections allow you to store multiple values in a single column.

| Type | Description | Use Case |
|------|-------------|----------|
| Set | Unique values, returned in sorted order. | Tags ({'java', 'cql'}), unique IDs. |
| List | Ordered values. Allows duplicates. | Chronological events, prioritized items. |
| Map | Key-value pairs. | JSON-like attributes ({'color': 'red', 'size': 'L'}). |
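
For intuition, the three collection types behave much like familiar in-memory structures. The sketch below is plain Go with no driver involved, and the names `normalizeSet`, `exampleList`, and `exampleMap` are hypothetical. It shows what "unique" means for a set: duplicates collapse into one element. The result is sorted, which is how Cassandra returns set elements.

```go
package main

import "sort"

// normalizeSet mimics set<text> semantics in memory: duplicates are
// dropped and the remaining elements are returned in sorted order.
func normalizeSet(values []string) []string {
	seen := make(map[string]struct{}, len(values))
	out := make([]string, 0, len(values))
	for _, v := range values {
		if _, dup := seen[v]; dup {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

// A list keeps insertion order and duplicates; a map keeps one value per key.
var (
	exampleList = []string{"login", "view", "login"}
	exampleMap  = map[string]string{"color": "red", "size": "L"}
)
```

Calling `normalizeSet([]string{"java", "cql", "java"})` yields two elements, while `exampleList` keeps its repeated "login" entry.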

The “Read-Before-Write” Penalty

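[!WARNING] Not every collection update is a cheap write. Adding elements to a set or map is a plain write, but some list operations (setting or removing an element by index, removing by value) make Cassandra read the existing list internally before writing. Removing elements, or overwriting a whole non-frozen collection, also creates tombstones. Cassandra does not page within a collection, so huge collections (>64KB) are a performance anti-pattern.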
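
The costs can be summarized as a small lookup. This is a simplified model rather than Cassandra code: the type and constant names (`Op`, `Cost`, `costOf`) are hypothetical, and the model ignores timestamps, compaction, and everything else about the real write path. It only records which kinds of update read existing data first and which leave a tombstone behind.

```go
package main

// Op names a kind of update on a non-frozen collection (illustrative only).
type Op int

const (
	AppendElement   Op = iota // tags = tags + {'x'}
	RemoveElement             // tags = tags - {'x'}
	ReplaceAll                // tags = {'x', 'y'}
	SetListIndex              // events[2] = 'x'
	RemoveListIndex           // DELETE events[2] FROM ...
)

// Cost is a simplified description of what an update costs.
type Cost struct {
	ReadsFirst      bool // existing data is read internally before the write
	WritesTombstone bool // a deletion marker is written
}

// costOf returns the modeled cost for each kind of update.
func costOf(op Op) Cost {
	switch op {
	case RemoveElement, ReplaceAll:
		return Cost{WritesTombstone: true}
	case SetListIndex:
		return Cost{ReadsFirst: true}
	case RemoveListIndex:
		return Cost{ReadsFirst: true, WritesTombstone: true}
	default: // AppendElement is a plain write
		return Cost{}
	}
}
```

In this model, only the index-based list operations trigger a read, which is one reason sets and maps are often preferred over lists when element positions do not matter.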

Collection Internals

On disk, a non-frozen collection is not stored as a single blob: each element is a separate cell.
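
The difference between the logical view and the physical cells can be sketched in a few lines of Go. This is a toy model under stated assumptions: `Cell`, `nonFrozenCells`, and `frozenCell` are hypothetical names, the `column[key]` path notation is invented for readability, and real cells also carry timestamps, TTLs, and a binary encoding that is not modeled here.

```go
package main

import (
	"sort"
	"strings"
)

// Cell is a simplified stand-in for one stored value.
type Cell struct {
	Path  string // column name, plus the element key for non-frozen collections
	Value string
}

// sortedKeys returns the map keys in a deterministic order.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// nonFrozenCells flattens a map column into one cell per entry.
func nonFrozenCells(column string, m map[string]string) []Cell {
	cells := make([]Cell, 0, len(m))
	for _, k := range sortedKeys(m) {
		cells = append(cells, Cell{Path: column + "[" + k + "]", Value: m[k]})
	}
	return cells
}

// frozenCell packs the whole map into the value of a single cell.
func frozenCell(column string, m map[string]string) Cell {
	parts := make([]string, 0, len(m))
	for _, k := range sortedKeys(m) {
		parts = append(parts, k+"="+m[k])
	}
	return Cell{Path: column, Value: strings.Join(parts, ",")}
}
```

For a two-entry map, `nonFrozenCells` returns two cells that can be written or deleted independently, while `frozenCell` returns one cell holding everything.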



2. User Defined Types (UDTs)

UDTs allow you to create structured data types. They are essentially a mini-table inside a column.

CREATE TYPE address (
  street text,
  city text,
  zip int
);

CREATE TABLE users (
  user_id uuid PRIMARY KEY,
  tags set<text>,
  home_address frozen<address>
);

3. Frozen vs Non-Frozen

The frozen<> keyword appears often in Cassandra schemas. It determines how a value is stored and how it can be updated.

Non-Frozen (Default for collections)

  • Behavior: Each element is stored as a separate cell.
  • Pros: You can add/remove individual items (UPDATE users SET tags = tags + {'new_tag'} WHERE user_id = ?).
  • Cons: Higher overhead. Cannot be part of a Primary Key.

Frozen

  • Behavior: The entire value is serialized into a single binary blob.
  • Pros: Read and written as a single cell, so there is less per-element overhead. Can be used in Primary Keys.
  • Cons: No partial updates. To change one field in a frozen UDT, you must overwrite the entire UDT.

[!TIP] Always use frozen for UDTs unless you have a very specific reason not to. It reduces storage overhead and simplifies the read path.
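
The frozen trade-off described above can be illustrated without a database. In the sketch below, JSON stands in for Cassandra's binary encoding (an assumption made purely for readability), and `freeze` and `updateCity` are hypothetical helpers. The point is that changing one field means producing the complete value again.

```go
package main

import "encoding/json"

// Address mirrors the fields of the address UDT.
type Address struct {
	Street string `json:"street"`
	City   string `json:"city"`
	Zip    int    `json:"zip"`
}

// freeze serializes the whole value into one blob. JSON is only a
// stand-in here; Cassandra uses its own binary format.
func freeze(a Address) ([]byte, error) {
	return json.Marshal(a)
}

// updateCity shows why a frozen value has no partial update: the caller
// needs the complete value, changes one field, and re-encodes all of it.
func updateCity(blob []byte, city string) ([]byte, error) {
	var a Address
	if err := json.Unmarshal(blob, &a); err != nil {
		return nil, err
	}
	a.City = city
	return freeze(a)
}
```

With a non-frozen UDT, the equivalent change would touch only the cell for the city field, at the cost of storing each field as its own cell.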


4. Implementation: Java & Go

Java

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.data.UdtValue;
import com.datastax.oss.driver.api.core.type.UserDefinedType;
import java.util.Set;
import java.util.UUID;

public class ProfileManager {
  public void insertProfile(CqlSession session, UUID userId) {
    // 1. Look up the UDT definition in the keyspace metadata.
    //    get() throws NoSuchElementException if the keyspace or type is missing.
    UserDefinedType addressType = session.getMetadata()
      .getKeyspace("ecommerce").get()
      .getUserDefinedType("address").get();

    // 2. Build a UDT value; field names must match the CREATE TYPE definition
    UdtValue address = addressType.newValue()
      .setString("street", "123 Code Ln")
      .setString("city", "Tech City")
      .setInt("zip", 90210);

    // 3. Insert the set and the UDT. In real code, prepare once and reuse the statement.
    PreparedStatement insert = session.prepare(
      "INSERT INTO users (user_id, tags, home_address) VALUES (?, ?, ?)");
    session.execute(insert.bind(userId, Set.of("premium", "active"), address));
  }
}

Go

package main

import (
  "fmt"

  "github.com/gocql/gocql"
)

// Address mirrors the address UDT; the cql tags name the UDT fields.
type Address struct {
  Street string `cql:"street"`
  City   string `cql:"city"`
  Zip    int    `cql:"zip"`
}

func insertProfile(session *gocql.Session, id gocql.UUID) error {
  // gocql marshals the struct into the UDT using the cql tags
  addr := Address{
    Street: "123 Code Ln",
    City:   "Tech City",
    Zip:    90210,
  }

  // A Go slice binds to the set<text> column
  tags := []string{"premium", "active"}

  err := session.Query(`
    INSERT INTO users (user_id, tags, home_address) VALUES (?, ?, ?)`,
    id, tags, addr).Exec()
  if err != nil {
    // Return the error to the caller instead of panicking
    return fmt.Errorf("insert profile %s: %w", id, err)
  }
  return nil
}