Module Review
In this module, we covered how to scale and manage data consumption in Kafka:
- Consumer Groups: Using the `group.id` setting to automatically load-balance data across multiple instances of an application.
- Offset Management: Tracking processing progress by committing offsets to `__consumer_offsets`, and choosing between auto and manual commits.
- Rebalance Protocols: Understanding how Kafka reassigns partitions and why the modern Cooperative Sticky rebalancing is critical for low-latency systems.
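As a concrete reference, the settings above map onto a handful of consumer properties. Below is a minimal sketch of such a configuration; the broker address, group name, and deserializer choices are placeholder assumptions, not values from this course:

```java
import java.util.Properties;

// Minimal consumer configuration sketch for the concepts summarized above.
// Broker address and group name are placeholder assumptions.
public class ConsumerConfigSketch {
    public static Properties baseConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        // group.id: consumers sharing this id split the topic's partitions among themselves
        props.put("group.id", "order-processing-app");      // placeholder group name
        // Disable auto-commit so offsets are committed manually after processing
        props.put("enable.auto.commit", "false");
        // Where to start when the group has no committed offset yet
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

These same properties are what a `KafkaConsumer` would be constructed with; the quiz below revisits each of them.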
1. Flash Quiz
1. What happens if you have 10 partitions and 15 consumers in the same group?
- 5 consumers will sit idle (unassigned), because each partition can only be assigned to a single consumer in a group at a time.
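The arithmetic behind this answer can be checked with a toy round-robin assignor (illustrative only; Kafka's real assignors are more elaborate):

```java
import java.util.*;

// Simulates round-robin partition assignment within one consumer group.
// Any consumer left with an empty list has no partition to read from.
public class AssignmentDemo {
    // Returns the partitions owned by each consumer; extras get empty lists.
    public static Map<String, List<Integer>> assign(int partitions, int consumers) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (int c = 0; c < consumers; c++) result.put("consumer-" + c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            result.get("consumer-" + (p % consumers)).add(p);
        }
        return result;
    }

    // Number of consumers that end up with no partitions at all.
    public static long idleCount(int partitions, int consumers) {
        return assign(partitions, consumers).values().stream().filter(List::isEmpty).count();
    }
}
```

With 10 partitions and 15 consumers, `idleCount(10, 15)` comes out to 5, matching the answer above.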
2. Which commit strategy is the most reliable for preventing data loss in a high-stakes application?
- Manual Synchronous Commits (`commitSync()`) performed after the business logic has successfully completed.
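Why the ordering matters can be shown with a toy crash-and-restart model (pure Java, no Kafka involved): committing after processing means an in-flight record is redelivered on restart, while committing first can silently drop it.

```java
import java.util.*;

// Toy model of commit ordering. A consumer crashes before processing the
// record at offset `crashAt`, then restarts from the last committed offset.
// Commit-after-processing redelivers the record (at-least-once); commit-first
// skips it (at-most-once, i.e. possible data loss).
public class CommitOrderDemo {
    // Returns every offset actually processed across the crash and restart.
    public static List<Integer> run(boolean commitBeforeProcessing, int crashAt) {
        List<Integer> log = Arrays.asList(0, 1, 2, 3, 4); // offsets in the partition
        List<Integer> processed = new ArrayList<>();
        int committed = 0;
        // First run: crashes before processing the record at `crashAt`.
        for (int offset : log) {
            if (commitBeforeProcessing) committed = offset + 1; // commit up front
            if (offset == crashAt) break;                       // simulated crash
            processed.add(offset);
            if (!commitBeforeProcessing) committed = offset + 1; // commit after the work
        }
        // Restart: resume from the last committed offset.
        for (int offset = committed; offset < log.size(); offset++) processed.add(offset);
        return processed;
    }
}
```

With a crash at offset 2, the commit-after strategy processes every offset, while the commit-first strategy never processes offset 2 at all.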
3. What is the difference between `auto.offset.reset=earliest` and `latest`?
- `earliest`: Starts from the beginning of the topic (replays history).
- `latest`: Starts from the end (only new messages).
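A detail worth remembering is that this setting only kicks in when the group has no committed offset for a partition. A tiny sketch of the decision (pure logic, not the Kafka API):

```java
// Models where a consumer starts reading a partition. A committed offset,
// if one exists, always wins; auto.offset.reset only decides the fallback.
public class OffsetResetDemo {
    public static long startingOffset(Long committedOffset, String reset, long logEndOffset) {
        if (committedOffset != null) return committedOffset; // committed offset wins
        // earliest -> replay from offset 0; latest -> only messages after log end
        return reset.equals("earliest") ? 0L : logEndOffset;
    }
}
```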
4. Why is "Incremental Cooperative Rebalancing" better than the old "Eager" rebalancing?
- Because it doesn’t stop all consumers in the group. It only pauses the consumers that are actually losing or gaining a partition, keeping the rest of the application running.
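The difference can be made concrete by counting how many partitions get revoked when one consumer joins an evenly balanced group. This is a simplified model of the idea, not the actual assignor protocol:

```java
// Toy comparison of partition revocations when one new consumer joins a
// group of `consumers` members sharing `partitions` partitions evenly.
public class RebalanceDemo {
    // Eager: every member revokes everything before reassignment.
    public static int eagerRevocations(int partitions, int consumers) {
        return partitions; // all owned partitions stop being consumed
    }

    // Cooperative/sticky: only partitions that actually move are revoked.
    public static int cooperativeRevocations(int partitions, int consumers) {
        int[] before = split(partitions, consumers);
        int[] after = split(partitions, consumers + 1); // new, smaller targets
        int moved = 0;
        for (int i = 0; i < consumers; i++) {
            moved += Math.max(0, before[i] - after[i]); // only the surplus moves
        }
        return moved;
    }

    // Even-ish split: the first (partitions % consumers) members own one extra.
    private static int[] split(int partitions, int consumers) {
        int[] counts = new int[consumers];
        for (int p = 0; p < partitions; p++) counts[p % consumers]++;
        return counts;
    }
}
```

For 6 partitions and 3 consumers, an eager rebalance revokes all 6 partitions, while the cooperative model moves only the single partition the new member needs; everything else keeps flowing.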
5. How does Kafka track which message was the last one read?
- By storing an Offset (an integer) in the internal `__consumer_offsets` topic.
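Conceptually, `__consumer_offsets` behaves like a compacted key-value store keyed by (group, topic, partition). The class below is a toy model of that bookkeeping, not the real implementation:

```java
import java.util.*;

// Toy model of the __consumer_offsets topic: a compacted key-value store
// mapping (group, topic, partition) -> last committed offset.
public class OffsetStoreDemo {
    private final Map<String, Long> store = new HashMap<>();

    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;
    }

    public void commit(String group, String topic, int partition, long offset) {
        store.put(key(group, topic, partition), offset); // compaction keeps only the latest
    }

    // Returns -1 when no offset was ever committed (auto.offset.reset applies).
    public long fetch(String group, String topic, int partition) {
        return store.getOrDefault(key(group, topic, partition), -1L);
    }
}
```

Because the key includes the group id, two different groups can track independent positions on the same topic.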
2. What’s Next?
Now that we can produce and consume data at scale, we need to learn how to transform it in real time. In the next module, we explore Kafka Streams, where we learn to write stream-processing applications for filtering, joining, and aggregating data without needing an external cluster like Spark or Flink.