Module Review

In this module, we covered how to scale and manage data consumption in Kafka:

  1. Consumer Groups: Using the group.id to automatically load-balance data across multiple instances of an application.
  2. Offset Management: Tracking processing progress by committing offsets to __consumer_offsets, and choosing between auto and manual commits.
  3. Rebalance Protocols: Understanding how Kafka reassigns partitions and why modern Cooperative Sticky rebalancing is critical for low-latency systems.
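The three topics above all surface as consumer configuration. A minimal sketch (the group name `order-processors` is a hypothetical example; the config keys are standard Kafka consumer settings):

```java
import java.util.Properties;

public class ConsumerConfigSketch {
    // Builds the consumer settings discussed in this module.
    static Properties reviewConfig() {
        Properties props = new Properties();
        // 1. Consumer Groups: instances sharing this id split the partitions.
        props.put("group.id", "order-processors"); // hypothetical group name
        // 2. Offset Management: disable auto-commit so the application can
        //    commit manually after the business logic succeeds.
        props.put("enable.auto.commit", "false");
        // 3. Rebalance Protocol: opt in to Cooperative Sticky rebalancing.
        props.put("partition.assignment.strategy",
                  "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(reviewConfig());
    }
}
```

These properties would normally be passed to a `KafkaConsumer` constructor along with `bootstrap.servers` and the deserializer settings.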

1. Flash Quiz

1. What happens if you have 10 partitions and 15 consumers in the same group?

  • 5 consumers will sit idle (unassigned), because each partition can be assigned to only one consumer in a group at a time.
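The arithmetic can be checked with a toy assignment model (a round-robin sketch, not the Kafka assignor itself):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AssignmentSketch {
    // Gives each partition exactly one owner, round-robin, then counts
    // the consumers that ended up with nothing to read.
    static int idleConsumers(int partitions, int consumers) {
        Map<Integer, List<Integer>> assignment = new HashMap<>();
        for (int c = 0; c < consumers; c++) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            assignment.get(p % consumers).add(p); // one owner per partition
        }
        int idle = 0;
        for (List<Integer> owned : assignment.values()) {
            if (owned.isEmpty()) idle++;
        }
        return idle;
    }

    public static void main(String[] args) {
        System.out.println(idleConsumers(10, 15)); // 5 consumers sit idle
    }
}
```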

2. Which commit strategy is the most reliable for preventing data loss in a high-stakes application?

  • Manual Synchronous Commits (commitSync()) performed after the business logic has successfully completed.
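The reason "commit after the business logic" prevents data loss can be shown with a toy model of the commit ordering (this simulates the offset bookkeeping; it is not the Kafka client):

```java
import java.util.List;
import java.util.function.Consumer;

public class AtLeastOnceSketch {
    long committedOffset = 0; // next offset to read, as the broker would store it

    // Processes a batch and advances the committed offset only after every
    // record succeeded, mirroring commitSync() placed after the logic.
    void pollAndProcess(List<String> records, Consumer<String> businessLogic) {
        long next = committedOffset;
        try {
            for (String record : records) {
                businessLogic.accept(record); // may throw
                next++;
            }
            committedOffset = next; // the commitSync() step: only on success
        } catch (RuntimeException e) {
            // Failure mid-batch: offset not advanced, so the whole batch
            // is re-read on restart (at-least-once, no loss).
        }
    }
}
```

If the commit happened before the processing instead, a crash mid-batch would skip the unprocessed records: at-most-once, with possible loss.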

3. What is the difference between auto.offset.reset=earliest and latest?

  • earliest: Starts from the beginning of the topic (replays history). latest: Starts from the end (only new messages).
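Note that `auto.offset.reset` only applies when the group has no committed offset for a partition. A sketch of the decision (a toy model of the broker-side rule, not client code):

```java
public class OffsetResetSketch {
    // Where a consumer starts when no committed offset exists for the
    // partition: "earliest" replays from the log start, "latest" from the end.
    static long startingOffset(String reset, long logStart, long logEnd) {
        switch (reset) {
            case "earliest": return logStart; // replay history
            case "latest":   return logEnd;   // only new messages
            default: throw new IllegalArgumentException("unknown reset policy");
        }
    }
}
```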

4. Why is “Incremental Cooperative Rebalancing” better than the old “Eager” rebalancing?

  • Because it doesn’t stop all consumers in the group. It only pauses the consumers that are actually losing or gaining a partition, keeping the rest of the application running.
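The difference comes down to which partitions a consumer must revoke during the rebalance. A set-arithmetic sketch (illustrative only, not the assignor implementation):

```java
import java.util.HashSet;
import java.util.Set;

public class RebalanceSketch {
    // Eager: the consumer revokes everything it owns before reassignment,
    // so processing stops on all its partitions.
    static Set<Integer> eagerRevoked(Set<Integer> owned, Set<Integer> target) {
        return new HashSet<>(owned); // stop-the-world: give up all partitions
    }

    // Cooperative: the consumer revokes only the partitions that are
    // actually moving away, and keeps consuming the rest.
    static Set<Integer> cooperativeRevoked(Set<Integer> owned, Set<Integer> target) {
        Set<Integer> revoked = new HashSet<>(owned);
        revoked.removeAll(target); // keep what we still own; pause only the rest
        return revoked;
    }
}
```

For a consumer that owns partitions {0, 1, 2} and is reassigned {0, 1}, eager rebalancing pauses all three while cooperative rebalancing pauses only partition 2.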

5. How does Kafka track which message was the last one read?

  • By committing an offset (an integer position within each partition) to the internal __consumer_offsets topic.

2. What’s Next?

Now that we can produce and consume data at scale, we need to learn how to transform it in real time. In the next module, we explore Kafka Streams, where we write stream-processing applications for filtering, joining, and aggregating data without needing an external cluster like Spark or Flink.