Sparse Indexes
In a traditional relational database, an index usually contains an entry for every row in the table (unless it’s a filtered index). In DynamoDB, Global Secondary Indexes (GSIs) are sparse by default.
This means: DynamoDB only writes an item to the index if the item contains the index key attributes.
If an item is missing the GSI Partition Key (or the Sort Key, when the index defines one), it is simply left out of the index. This behavior lets you build highly efficient filters for “needle in a haystack” queries.
1. The Concept
Imagine you have a table of 10 million Users.
- 9,990,000 are standard users.
- 10,000 are Premium Members.
You want to find all Premium Members.
The Bad Way (Scan)
If you Scan the main table, you read all 10 million items to find 10,000 of them. This consumes a large number of Read Capacity Units (RCUs) and takes a long time.
The Good Way (Sparse Index)
You create a GSI with a Partition Key named IsPremium.
- For standard users, you do not set the IsPremium attribute.
- For premium members, you set IsPremium = "true".
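Result: The GSI only contains the 10,000 premium members. Scanning the GSI consumes roughly 1/1000th of the RCUs of a full table scan, assuming the index entries are about the same size as the table items.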
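The 1/1000th figure can be checked with a little arithmetic. The sketch below is a simplified estimate, not DynamoDB's exact billing logic: it assumes an eventually consistent Scan costs 0.5 RCU per 4 KB of data read, ignores per-page rounding, and uses a hypothetical average item size of 512 bytes for both the table and the index. The function name scanRCUs is made up for this illustration.

```go
package main

import "fmt"

// scanRCUs estimates the read capacity units consumed by an eventually
// consistent Scan over itemCount items of avgItemBytes each.
// Simplification: the total size is rounded up to 4 KB blocks once,
// and each block costs 0.5 RCU.
func scanRCUs(itemCount, avgItemBytes int64) float64 {
	const blockBytes = 4096
	totalBytes := itemCount * avgItemBytes
	blocks := (totalBytes + blockBytes - 1) / blockBytes // round up
	return float64(blocks) * 0.5
}

func main() {
	const avgItemBytes = 512 // assumed average item size, not from a real table

	table := scanRCUs(10_000_000, avgItemBytes) // every user
	index := scanRCUs(10_000, avgItemBytes)     // premium members only

	fmt.Printf("table scan: %.0f RCUs\n", table)
	fmt.Printf("index scan: %.0f RCUs\n", index)
	fmt.Printf("ratio: %.0fx\n", table/index)
}
```

With these assumed numbers the table scan comes to 625,000 RCUs and the index scan to 625 RCUs. If the index projects fewer attributes than the table holds, the index entries are smaller and the gap widens further.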
2. Interactive: The Filter Funnel
Visualize how DynamoDB filters data into a Sparse Index. Only items with the “Key” make it into the index bucket.
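The funnel can also be modelled in a few lines of Go. This is an in-memory sketch only: it does not call DynamoDB, the Item type and the projectToIndex function are hypothetical names for this illustration, and all attribute values are treated as strings for simplicity.

```go
package main

import "fmt"

// Item is a simplified stand-in for a DynamoDB item.
type Item map[string]string

// projectToIndex returns the items that would land in an index whose
// partition key attribute is keyAttr: only items that carry that attribute.
func projectToIndex(table []Item, keyAttr string) []Item {
	var index []Item
	for _, it := range table {
		if _, ok := it[keyAttr]; ok {
			index = append(index, it)
		}
	}
	return index
}

func main() {
	table := []Item{
		{"PK": "USER#1", "SK": "PROFILE"},
		{"PK": "USER#2", "SK": "PROFILE", "IsPremium": "true"},
		{"PK": "USER#3", "SK": "PROFILE"},
		{"PK": "USER#4", "SK": "PROFILE", "IsPremium": "true"},
	}

	index := projectToIndex(table, "IsPremium")
	fmt.Printf("table: %d items, index: %d items\n", len(table), len(index))
	for _, it := range index {
		fmt.Println(it["PK"])
	}
}
```

Running it shows 4 items in the table and 2 in the index (USER#2 and USER#4). The check is on the presence of the attribute, not on its value, which is why standard users must omit IsPremium entirely rather than store a "false" value.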
3. Use Cases
- Filtering by State: Indexing an attribute that is written only on failed jobs (for example Status = "ERROR") to find them quickly without scanning millions of “SUCCESS” jobs. Successful jobs must omit the indexed attribute, otherwise they land in the index too.
- User Roles: Indexing IsAdmin = "true" to list administrators.
- Deleted Items: Using a DeletedAt attribute as a GSI key to implement a “Recycle Bin” pattern. You can query deleted items via the GSI, but they don’t clutter your main access patterns.
4. Code Implementation
Go: Creating Sparse Items
Simply omit the attribute if it doesn’t apply.
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

// putStandardUser writes a user WITHOUT the "IsPremium" attribute.
func putStandardUser(client *dynamodb.Client, table, userId string) error {
	item := map[string]types.AttributeValue{
		"PK": &types.AttributeValueMemberS{Value: fmt.Sprintf("USER#%s", userId)},
		"SK": &types.AttributeValueMemberS{Value: "PROFILE"},
		// IsPremium is MISSING. This item will NOT appear in the Sparse Index.
	}
	// Return the error instead of silently dropping it.
	_, err := client.PutItem(context.TODO(), &dynamodb.PutItemInput{
		TableName: aws.String(table),
		Item:      item,
	})
	return err
}

// putPremiumUser writes a user WITH the "IsPremium" attribute.
func putPremiumUser(client *dynamodb.Client, table, userId string) error {
	item := map[string]types.AttributeValue{
		"PK":        &types.AttributeValueMemberS{Value: fmt.Sprintf("USER#%s", userId)},
		"SK":        &types.AttributeValueMemberS{Value: "PROFILE"},
		"IsPremium": &types.AttributeValueMemberS{Value: "true"}, // This key puts it in the index
	}
	_, err := client.PutItem(context.TODO(), &dynamodb.PutItemInput{
		TableName: aws.String(table),
		Item:      item,
	})
	return err
}
Java: Scanning the Sparse Index
When you scan the sparse index, you only get the premium users.
import java.util.Map;

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ScanRequest;

public class SparseScan {
    public static void findPremiumUsers(DynamoDbClient ddb, String tableName) {
        // "PremiumIndex" has Partition Key = "IsPremium"
        ScanRequest scanRequest = ScanRequest.builder()
                .tableName(tableName)
                .indexName("PremiumIndex")
                .build();

        // A single Scan call returns at most 1 MB of data, so iterate with
        // the paginator to follow LastEvaluatedKey across all pages.
        int count = 0;
        // The index contains ONLY items where IsPremium exists
        for (Map<String, AttributeValue> item : ddb.scanPaginator(scanRequest).items()) {
            System.out.println("Found Premium User: " + item.get("PK").s());
            count++;
        }
        System.out.println("Premium Users Found: " + count);
    }
}
[!TIP] Pro Tip: Sparse Indexes give you very cheap filtering. You pay index storage and write costs only for the items that actually end up in the index, which makes them highly cost-effective for picking a small subset out of a large table.