Kinesis Data Streams Integration
While DynamoDB Streams is excellent for simple triggers (like updating a cache or sending an email), its 24-hour retention and small number of readers per shard become constraints at scale. For high-throughput applications, long-term retention, or complex analytics, consider Amazon Kinesis Data Streams.
1. DynamoDB Streams vs. Kinesis Data Streams
Both services provide ordered, sharded streams of data changes, but they serve different purposes.
| Feature | DynamoDB Streams | Kinesis Data Streams |
|---|---|---|
| Retention | Fixed at 24 hours. | 24 hours to 365 days. |
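| Consumers | At most 2 simultaneous readers per shard. | Standard consumers share 2 MB/s and 5 read calls per second per shard; enhanced fan-out gives each registered consumer (20 per stream by default) its own 2 MB/s per shard. |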
| Cost | Charged by Read Request Units (RRU). | Charged by Shard Hour + Payload Units. |
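| Ordering | Strict ordering per Item Key. | Strict ordering per Partition Key within a shard; change records replicated from DynamoDB can occasionally arrive out of order or duplicated. |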
| Integration | Tightly coupled with the table. | Decoupled; many producers can write to one stream. |
[!TIP] Use Kinesis When: You need to fan out data to multiple teams (Search Team, Fraud Team, Analytics Team) without them competing for read throughput on the DynamoDB stream shards.
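As a rough sketch of the fan-out setup described in the tip, each team could be registered as its own enhanced fan-out consumer. The function below takes the Kinesis client as a parameter so it stays testable; `register_stream_consumer` is the boto3 name for the RegisterStreamConsumer API, while the function name, team names, and stream ARN are illustrative assumptions.

```python
from typing import Any, Dict, Iterable


def register_team_consumers(kinesis_client: Any, stream_arn: str,
                            team_names: Iterable[str]) -> Dict[str, str]:
    """Register one enhanced fan-out consumer per team.

    Returns a mapping of consumer name -> consumer ARN.
    """
    consumer_arns = {}
    for name in team_names:
        # RegisterStreamConsumer creates a consumer with dedicated read
        # throughput, so teams do not compete with each other.
        response = kinesis_client.register_stream_consumer(
            StreamARN=stream_arn,
            ConsumerName=name,
        )
        consumer_arns[name] = response["Consumer"]["ConsumerARN"]
    return consumer_arns
```

With boto3 installed and credentials configured, a call might look like `register_team_consumers(boto3.client("kinesis"), stream_arn, ["search-team", "fraud-team", "analytics-team"])`.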
2. Kinesis Data Streams for DynamoDB
AWS offers a feature called Kinesis Data Streams for DynamoDB. This allows you to replicate item-level changes from your table to a Kinesis stream without writing any code.
- Zero Impact on Table Performance: The replication happens asynchronously in the background and does not consume your table’s RCU/WCU.
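- Record Contents: Each change record carries the item’s keys plus its new and old images where they exist. Choosing a view type such as NEW_IMAGE or KEYS_ONLY is a DynamoDB Streams setting; it is not part of the Kinesis destination configuration.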
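The feature can be switched on from the console, but the same step can also be scripted. The sketch below assumes a boto3 DynamoDB client is passed in; `enable_kinesis_streaming_destination` is the boto3 call for this feature, while the wrapper name and its arguments are placeholders.

```python
from typing import Any


def stream_table_to_kinesis(dynamodb_client: Any, table_name: str,
                            stream_arn: str) -> str:
    """Turn on Kinesis streaming for a table and return the reported status."""
    response = dynamodb_client.enable_kinesis_streaming_destination(
        TableName=table_name,
        StreamArn=stream_arn,
    )
    # Enabling is asynchronous, so the status usually starts as ENABLING
    # and becomes ACTIVE once replication is running.
    return response.get("DestinationStatus", "UNKNOWN")
```

A call might look like `stream_table_to_kinesis(boto3.client("dynamodb"), "Orders", stream_arn)`, where the table name and ARN are hypothetical. The Kinesis stream has to exist before the call is made.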
3. The Analytics Pipeline Pattern
A common pattern in modern data architectures is to use DynamoDB for online transactions (OLTP) and S3/Athena for analytics (OLAP). Kinesis acts as the bridge.
- DynamoDB: Handles user requests (single-digit millisecond latency).
- Kinesis Data Stream: Receives change events.
- Kinesis Data Firehose (now Amazon Data Firehose): Buffers records until a size or time threshold is reached (e.g., 128 MB or 5 minutes, whichever comes first) and writes them to S3.
- Amazon S3: Stores the raw JSON/Parquet data.
- Amazon Athena: Runs SQL queries on the S3 data for reporting.
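Somewhere along this pipeline the DynamoDB-typed change events usually need flattening before SQL queries over them become pleasant. The sketch below shows one way a consumer or a transformation step might do that. The field names (`eventName`, `tableName`, `dynamodb.Keys`, `dynamodb.NewImage`, `dynamodb.ApproximateCreationDateTime`) follow the change record layout as we understand it and should be checked against real records; the output column names are made up for illustration, and binary and number-set attribute types are deliberately left unsupported.

```python
import json
from typing import Any, Dict


def plain_value(attr: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-typed value such as {"S": "abc"} to a plain value."""
    (type_tag, value), = attr.items()
    if type_tag in ("S", "BOOL"):
        return value
    if type_tag == "N":
        # Numbers arrive as strings; fall back to float for non-integers.
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [plain_value(v) for v in value]
    if type_tag == "M":
        return {k: plain_value(v) for k, v in value.items()}
    if type_tag == "SS":
        return sorted(value)
    raise ValueError(f"unsupported attribute type: {type_tag}")


def to_json_line(kinesis_data: bytes) -> str:
    """Flatten one change event (raw Kinesis record payload) into a JSON line."""
    event = json.loads(kinesis_data)
    change = event["dynamodb"]
    row = {
        "event_name": event["eventName"],
        "table_name": event.get("tableName"),
        "changed_at": change.get("ApproximateCreationDateTime"),
        "keys": {k: plain_value(v) for k, v in change["Keys"].items()},
        # REMOVE events have no new image, so default to an empty mapping.
        "new_image": {k: plain_value(v)
                      for k, v in change.get("NewImage", {}).items()},
    }
    return json.dumps(row, sort_keys=True)
```

One JSON object per line is a layout Athena can query directly; converting to Parquet would be a separate step.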
4. Considerations & Costs
Shard Management
Unlike DynamoDB Streams (where shards are managed for you), Kinesis requires you to manage shards (or use On-Demand mode).
- Provisioned Mode: You specify the number of shards. Each shard supports up to 1 MB/s (or 1,000 records/s) of writes and 2 MB/s of reads.
- On-Demand Mode: Scales automatically but costs more per GB.
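For Provisioned Mode, a back-of-the-envelope shard count follows from the per-shard limits: take whichever limit is hit first. The sketch below assumes 1 MB/s of writes and 2 MB/s of reads per shard as quoted above, plus the 1,000 records/s per-shard write limit from the Kinesis quotas; the function name and the traffic figures are illustrative.

```python
import math

WRITE_MB_PER_SHARD = 1.0         # write throughput per shard
READ_MB_PER_SHARD = 2.0          # shared read throughput per shard
WRITE_RECORDS_PER_SHARD = 1000   # assumption: per-shard record-rate limit


def shards_needed(write_mb_per_s: float, records_per_s: float,
                  read_mb_per_s: float) -> int:
    """Return the smallest shard count that satisfies every per-shard limit."""
    return max(
        1,
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )


# 4.5 MB/s in, 3,000 records/s, 12 MB/s of total standard-consumer reads:
# the read side is the bottleneck here.
print(shards_needed(4.5, 3000, 12))  # 6
```

The estimate ignores traffic skew: a hot partition key can saturate one shard while the others sit idle, so real streams usually need some headroom.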
Ordering Guarantees
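Kinesis guarantees order within a shard, and the Partition Key you use when writing to Kinesis determines the shard. DynamoDB uses the item’s Partition Key automatically, so changes to one item land in the same shard. Even so, AWS documents that change records replicated from DynamoDB can occasionally arrive out of order or more than once, so consumers that need strict per-item order should sort by the ApproximateCreationDateTime field in each record and tolerate duplicates.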
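To make the key-to-shard relationship concrete: Kinesis hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash range contains that integer. The sketch below assumes the digest is read as a big-endian unsigned integer and that the shards split the hash space into equal ranges, as in a freshly created stream; after resharding the ranges can be uneven, so treat this as an illustration rather than a routing implementation.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def hash_key(partition_key: str) -> int:
    """Map a partition key to its 128-bit hash value."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Pick the shard for a key, assuming equal-sized hash ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE
```

Because the mapping is deterministic, every change to an item with the same partition key goes to the same shard, which is what makes per-key ordering possible in the first place.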
5. Summary
- Kinesis Data Streams is the higher-capacity, longer-retention counterpart to DynamoDB Streams.
- Use it for long retention (replaying history) or high fan-out (many consumers).
- The Analytics Pipeline (DynamoDB → Kinesis → Firehose → S3) is a common pattern for getting data out of DynamoDB for complex querying.
Next, review your knowledge with the Module Review.