Kinesis Data Streams Integration

While DynamoDB Streams is excellent for simple triggers (like updating a cache or sending an email), it has hard limits: records are kept for only 24 hours, and each shard supports at most two concurrent consumers. For high-throughput applications, long-term retention, or complex analytics, consider Amazon Kinesis Data Streams.

1. DynamoDB Streams vs. Kinesis Data Streams

Both services provide ordered, sharded streams of data changes, but they serve different purposes.

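| Feature | DynamoDB Streams | Kinesis Data Streams |
|---|---|---|
| Retention | Fixed at 24 hours. | 24 hours by default, extendable up to 365 days. |
| Consumers | Max 2 consumers per shard. | Standard consumers share 5 reads per second and 2 MB/s per shard; enhanced fan-out supports up to 20 registered consumers, each with dedicated throughput. |
| Cost | Charged by streams read request units. | Charged by shard hour + PUT payload units (provisioned mode). |
| Ordering | Strict ordering per item key. | Ordered within a shard (per partition key). |
| Integration | Tightly coupled with the table. | Decoupled; many producers can write to one stream. |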

[!TIP] Use Kinesis when you need to fan out data to multiple teams (Search, Fraud, Analytics) without them competing for read throughput on the DynamoDB stream shards.
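To make the fan-out trade-off concrete, here is a rough sketch of how much read throughput each consumer can expect from a single shard. It assumes the 2 MB/s per-shard read limit and that standard consumers split that limit evenly, which is a simplification; `per_consumer_read_mbps` is a hypothetical helper, not an AWS API.

```python
SHARD_READ_MBPS = 2.0  # per-shard read limit for a Kinesis data stream


def per_consumer_read_mbps(consumers: int, enhanced_fan_out: bool) -> float:
    """Approximate read throughput (MB/s) each consumer gets from ONE shard."""
    if consumers < 1:
        raise ValueError("consumers must be at least 1")
    if enhanced_fan_out:
        # Each registered consumer gets its own dedicated throughput per shard.
        return SHARD_READ_MBPS
    # Standard (polling) consumers share the shard's read limit.
    # Assumption: the share is split evenly between them.
    return SHARD_READ_MBPS / consumers


if __name__ == "__main__":
    for teams in (1, 2, 4):
        shared = per_consumer_read_mbps(teams, enhanced_fan_out=False)
        dedicated = per_consumer_read_mbps(teams, enhanced_fan_out=True)
        print(f"{teams} consumers: shared={shared} MB/s, enhanced fan-out={dedicated} MB/s")
```

With four standard consumers, each one sees roughly a quarter of the shard's read capacity, which is the contention the tip above warns about.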


2. Kinesis Data Streams for DynamoDB

AWS offers a feature called Kinesis Data Streams for DynamoDB. This allows you to replicate item-level changes from your table to a Kinesis stream without writing any code.

  • Zero Impact on Table Performance: The replication happens asynchronously in the background and does not consume your table’s RCU/WCU.
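  • Precision: You can choose the precision (millisecond or microsecond) of the ApproximateCreationDateTime timestamp on each record. The choice between whole-item images (NEW_IMAGE) and keys only is a DynamoDB Streams setting (StreamViewType), not a Kinesis one.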
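Although no application code is needed, the feature still has to be switched on for a table. The sketch below shows one way to do that, assuming a boto3 DynamoDB client is passed in; the wrapper name `enable_kinesis_destination` and the table and stream names in the usage comment are hypothetical.

```python
from typing import Any


def enable_kinesis_destination(dynamodb_client: Any, table_name: str, stream_arn: str) -> str:
    """Ask DynamoDB to replicate item-level changes into an existing Kinesis stream.

    `dynamodb_client` is expected to behave like boto3.client("dynamodb").
    Returns the destination status reported by the service (e.g. "ENABLING").
    """
    response = dynamodb_client.enable_kinesis_streaming_destination(
        TableName=table_name,
        StreamArn=stream_arn,
    )
    return response["DestinationStatus"]


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   status = enable_kinesis_destination(
#       boto3.client("dynamodb"),
#       "Orders",
#       "arn:aws:kinesis:us-east-1:123456789012:stream/orders-changes",
#   )
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.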

3. The Analytics Pipeline Pattern

A common pattern in modern data architectures is to use DynamoDB for online transactions (OLTP) and S3/Athena for analytics (OLAP). Kinesis acts as the bridge.

  1. DynamoDB: Handles user requests (single-digit millisecond latency).
  2. Kinesis Data Stream: Receives change events.
  3. Kinesis Data Firehose (now named Amazon Data Firehose): Buffers records until a size or time threshold is reached (e.g., 128 MB or 5 minutes, whichever comes first) and writes them to S3.
  4. Amazon S3: Stores the raw JSON/Parquet data.
  5. Amazon Athena: Runs SQL queries on the S3 data for reporting.
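Between steps 2 and 4, records usually need reshaping, because change events carry DynamoDB-typed attributes (such as `{"N": "42"}`) that are awkward to query. The following is a minimal sketch of a Firehose transformation Lambda that flattens each change event into one line of plain JSON. It assumes each record contains an `eventName` field and a `dynamodb` object with `Keys` and, for inserts and updates, `NewImage`. The output row layout (`event_name`, `item`) is an illustrative choice, and binary and set types are left out for brevity.

```python
import base64
import json
from typing import Any


def unmarshal(value: dict[str, Any]) -> Any:
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        # DynamoDB sends numbers as strings.
        return float(raw) if ("." in raw or "e" in raw.lower()) else int(raw)
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {key: unmarshal(item) for key, item in raw.items()}
    if type_tag == "L":
        return [unmarshal(item) for item in raw]
    raise ValueError(f"unsupported attribute type: {type_tag}")


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Firehose transformation: one input record in, one newline-delimited JSON row out."""
    output = []
    for record in event["records"]:
        try:
            change = json.loads(base64.b64decode(record["data"]))
            details = change["dynamodb"]
            # Deletes carry no NewImage, so fall back to the item keys.
            image = details.get("NewImage") or details["Keys"]
            row = {
                "event_name": change.get("eventName"),
                "item": {name: unmarshal(attr) for name, attr in image.items()},
            }
            data = (json.dumps(row) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("ascii"),
            })
        except (KeyError, TypeError, ValueError):
            # Hand the record back unchanged and let Firehose treat it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Writing one JSON object per line keeps the S3 objects directly queryable from Athena with a JSON SerDe; converting to Parquet would be a separate Firehose setting.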

4. Considerations & Costs

Shard Management

Unlike DynamoDB Streams (where shards are managed for you), Kinesis requires you to manage shards (or use On-Demand mode).

  • Provisioned Mode: You specify the number of shards. Each shard supports up to 1 MB/s (or 1,000 records/s) of writes and 2 MB/s of reads.
  • On-Demand Mode: Scales automatically but costs more per GB.
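For provisioned mode, the shard count follows from whichever per-shard limit is hit first. The sketch below does that arithmetic using the 1 MB/s write and 2 MB/s read limits, plus the 1,000 records/s per-shard write limit that Kinesis also enforces. `required_shards` is a hypothetical helper and the traffic figures are made up for illustration.

```python
import math


def required_shards(write_mb_per_s: float, records_per_s: float, read_mb_per_s: float) -> int:
    """Smallest shard count that satisfies all three per-shard limits."""
    by_write_bytes = math.ceil(write_mb_per_s / 1.0)   # 1 MB/s write per shard
    by_write_records = math.ceil(records_per_s / 1000)  # 1,000 records/s write per shard
    by_read_bytes = math.ceil(read_mb_per_s / 2.0)      # 2 MB/s read per shard (shared consumers)
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    # Many small change records: the record-count limit dominates, not bytes.
    print(required_shards(write_mb_per_s=0.2, records_per_s=4500, read_mb_per_s=1.0))
    # Fewer, larger records: write bandwidth dominates.
    print(required_shards(write_mb_per_s=3.5, records_per_s=2500, read_mb_per_s=6.0))
```

Tables with many small items tend to be bound by the record-count limit, so sizing on bytes alone can under-provision the stream.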

Ordering Guarantees

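Kinesis guarantees order within a shard, and the partition key used when writing a record determines which shard it lands in. When DynamoDB is the producer, it writes the records for you, but AWS documents that change records can arrive out of order or more than once. Consumers that need strict per-item order should sort and deduplicate using the ApproximateCreationDateTime attribute on each record.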
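The mapping from partition key to shard can be sketched as follows. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash-key range contains that value. The sketch assumes the hash space is split evenly across shards, which holds for a freshly created stream but not necessarily after resharding; `shard_for_key` is a hypothetical helper.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that would receive this partition key.

    Assumption: shards own equal, contiguous slices of the hash space.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # Clamp so the few values left over by integer division fall in the last shard.
    return min(hash_value // range_size, shard_count - 1)


if __name__ == "__main__":
    for key in ("user#1", "user#2", "user#1"):
        print(key, "-> shard", shard_for_key(key, shard_count=4))
```

Because the hash is deterministic, every record written with the same partition key lands on the same shard, which is what makes per-key ordering possible.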


5. Summary

  • Kinesis Data Streams is the enterprise-grade sibling of DynamoDB Streams.
  • Use it for long retention (replaying history) or high fan-out (many consumers).
  • The Analytics Pipeline (DynamoDB → Kinesis → Firehose → S3) is the standard pattern for getting data out of DynamoDB for complex querying.

Next, review your knowledge with the Module Review.