Sparse Indexes

In a traditional relational database, an index usually contains an entry for every row in the table (unless it’s a filtered index). In DynamoDB, Global Secondary Indexes (GSIs) are sparse by default.

This means: DynamoDB only writes an item to the index if the item contains the index key attributes.

If an item is missing the GSI Partition Key (or Sort Key), it is simply ignored by the index. This behavior is incredibly powerful for creating highly efficient filters for “needle in a haystack” queries.
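To make the rule concrete, here is a minimal in-memory sketch in Go. It does not call DynamoDB; the `Item` type and the `buildSparseIndex` function are hypothetical names used only to model which items an index would contain.

```go
package main

import "fmt"

// Item models a DynamoDB item as attribute name -> string value (illustrative only).
type Item map[string]string

// buildSparseIndex returns the items that would appear in an index keyed on
// keyAttrs: an item is included only if it has every index key attribute.
func buildSparseIndex(table []Item, keyAttrs ...string) []Item {
	var index []Item
	for _, item := range table {
		hasAll := true
		for _, k := range keyAttrs {
			if _, ok := item[k]; !ok {
				hasAll = false
				break
			}
		}
		if hasAll {
			index = append(index, item)
		}
	}
	return index
}

func main() {
	table := []Item{
		{"PK": "USER#1", "SK": "PROFILE"},
		{"PK": "USER#2", "SK": "PROFILE", "IsPremium": "true"},
		{"PK": "USER#3", "SK": "PROFILE"},
	}
	index := buildSparseIndex(table, "IsPremium")
	fmt.Println(len(table), len(index)) // 3 items in the table, 1 in the index
}
```

Note that the check is on the presence of the attribute, not on its value: an item with IsPremium set to any value would be indexed.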

1. The Concept

Imagine you have a table of 10 million Users.

  • 9,990,000 are standard users.
  • 10,000 are Premium Members.

You want to find all Premium Members.

The Bad Way (Scan)

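If you Scan the main table, DynamoDB reads all 10 million items. A Scan is billed for every item it reads, not just the ones it returns, so adding a filter expression does not reduce the cost. The operation consumes a large number of Read Capacity Units (RCUs) and takes a long time.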

The Good Way (Sparse Index)

You create a GSI with a Partition Key named IsPremium.

  • For standard users, you do not set the IsPremium attribute.
  • For premium members, you set IsPremium = "true" (a string, because index key attributes must be of type String, Number, or Binary).

Result: The GSI contains only the 10,000 premium members. Assuming index entries are about the same size as the base items, scanning the GSI consumes roughly 1/1000th of the RCUs of scanning the table.
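The 1/1000th figure can be checked with rough arithmetic. The sketch below assumes eventually consistent reads (0.5 RCU per 4 KB read) and an average item size of 400 bytes in both the table and the index; the item size and the `estimateScanRCUs` helper are illustrative assumptions, not values from a real workload.

```go
package main

import "fmt"

// estimateScanRCUs approximates the read capacity consumed by an eventually
// consistent Scan: 0.5 RCU per 4 KB of data read. Per-request rounding is
// ignored, so treat the result as a rough estimate.
func estimateScanRCUs(itemCount, avgItemBytes int) float64 {
	totalBytes := float64(itemCount) * float64(avgItemBytes)
	return totalBytes / 4096 * 0.5
}

func main() {
	const avgItemBytes = 400 // assumed average item size

	tableRCUs := estimateScanRCUs(10000000, avgItemBytes) // full base table
	indexRCUs := estimateScanRCUs(10000, avgItemBytes)    // sparse index only

	fmt.Printf("table scan: ~%.0f RCUs\n", tableRCUs)
	fmt.Printf("index scan: ~%.0f RCUs\n", indexRCUs)
	fmt.Printf("ratio: ~%.0fx\n", tableRCUs/indexRCUs)
}
```

Because both estimates scale linearly with the item count, the ratio depends only on the item counts as long as the average item size is the same on both sides. If the index projects fewer attributes than the base table, the index scan is cheaper still.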

2. The Filter Funnel

Picture a funnel between two buckets: the base table, which holds all items, and the sparse index, which holds only keyed items. Every write lands in the base table, but only items that carry the index key attribute make it into the index bucket.

3. Use Cases

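  1. Filtering by State: Indexing only failed jobs so you can find them quickly without scanning millions of “SUCCESS” jobs. Because sparseness depends on whether the key attribute exists, not on its value, the index key should be an attribute that is written only on failed jobs (for example, a hypothetical ErrorAt timestamp). A GSI keyed on Status itself would also contain every “SUCCESS” item.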
  2. User Roles: Indexing IsAdmin = "true" to list administrators.
  3. Deleted Items: Using a DeletedAt attribute as a GSI key to implement a “Recycle Bin” pattern. You can query deleted items via the GSI, but they don’t clutter your main access patterns.
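The “Recycle Bin” pattern relies on the index staying in sync with the attribute: setting the key attribute adds the item to the index, and removing it takes the item back out. A minimal in-memory sketch of that lifecycle, where the `Item` type and the `inIndex`, `softDelete`, and `restore` helpers are hypothetical names rather than DynamoDB API calls:

```go
package main

import "fmt"

// Item models a DynamoDB item as attribute name -> string value (illustrative only).
type Item map[string]string

// inIndex reports whether an item would appear in an index keyed on keyAttr.
func inIndex(item Item, keyAttr string) bool {
	_, ok := item[keyAttr]
	return ok
}

// softDelete sets DeletedAt, which makes the item visible in the recycle-bin index.
func softDelete(item Item, when string) {
	item["DeletedAt"] = when
}

// restore removes DeletedAt, which drops the item from the recycle-bin index.
func restore(item Item) {
	delete(item, "DeletedAt")
}

func main() {
	doc := Item{"PK": "DOC#42", "SK": "META"}
	fmt.Println(inIndex(doc, "DeletedAt")) // false: never deleted

	softDelete(doc, "2024-01-15T10:00:00Z")
	fmt.Println(inIndex(doc, "DeletedAt")) // true: now in the recycle bin

	restore(doc)
	fmt.Println(inIndex(doc, "DeletedAt")) // false: restored
}
```

In a real table the same effect would come from an update that sets or removes the attribute; the item stays in the base table throughout.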

4. Code Implementation

Go: Creating Sparse Items

Simply omit the attribute if it doesn’t apply.


package main

import (
	"context"
	"fmt"
	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

// putStandardUser writes a profile item without the "IsPremium" attribute
// and returns any error from the PutItem call.
func putStandardUser(client *dynamodb.Client, table, userID string) error {
	// Standard user: NO "IsPremium" attribute
	item := map[string]types.AttributeValue{
		"PK": &types.AttributeValueMemberS{Value: fmt.Sprintf("USER#%s", userID)},
		"SK": &types.AttributeValueMemberS{Value: "PROFILE"},
		// IsPremium is MISSING. This item will NOT appear in the Sparse Index.
	}
	_, err := client.PutItem(context.TODO(), &dynamodb.PutItemInput{
		TableName: aws.String(table),
		Item:      item,
	})
	return err
}

// putPremiumUser writes a profile item that includes the "IsPremium" attribute
// and returns any error from the PutItem call.
func putPremiumUser(client *dynamodb.Client, table, userID string) error {
	// Premium user: HAS "IsPremium" attribute
	item := map[string]types.AttributeValue{
		"PK":        &types.AttributeValueMemberS{Value: fmt.Sprintf("USER#%s", userID)},
		"SK":        &types.AttributeValueMemberS{Value: "PROFILE"},
		"IsPremium": &types.AttributeValueMemberS{Value: "true"}, // This key puts it in the index
	}
	_, err := client.PutItem(context.TODO(), &dynamodb.PutItemInput{
		TableName: aws.String(table),
		Item:      item,
	})
	return err
}

Java: Scanning the Sparse Index

When you scan the sparse index, you only get the premium users.

import java.util.Map;

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;

public class SparseScan {
    public static void findPremiumUsers(DynamoDbClient ddb, String tableName) {
        // "PremiumIndex" has Partition Key = "IsPremium"
        ScanRequest scanRequest = ScanRequest.builder()
            .tableName(tableName)
            .indexName("PremiumIndex")
            .build();

        // A single Scan call returns at most 1 MB of data, so use the
        // paginator to walk every page of the index.
        int count = 0;
        for (Map<String, AttributeValue> item : ddb.scanPaginator(scanRequest).items()) {
            // Every item returned here has the IsPremium attribute
            System.out.println("Found Premium User: " + item.get("PK").s());
            count++;
        }

        System.out.println("Premium Users Found: " + count);
    }
}

[!TIP] Pro Tip: Sparse Indexes give you very cheap filtering. You pay index storage and write costs only for the items that actually end up in the index, so they are highly cost-effective when the subset you need is a small fraction of the table.