Module Review

In this module, we covered how to scale and manage data consumption in Kafka:

  1. Consumer Groups: Using the group.id to automatically load-balance data across multiple instances of an application.
  2. Offset Management: Tracking processing progress by committing offsets to __consumer_offsets, and choosing between auto and manual commits.
  3. Rebalance Protocols: Understanding how Kafka reassigns partitions and why modern Cooperative Sticky rebalancing is critical for low-latency systems.
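The three topics above all surface as consumer configuration. A minimal sketch (the group name `order-processors` is a hypothetical example; the config keys are standard Kafka consumer settings):

```java
import java.util.Properties;

public class ConsumerConfigSketch {
    // Builds the consumer settings discussed in this module.
    static Properties reviewConfig() {
        Properties props = new Properties();
        // 1. Consumer Groups: instances sharing this id split the partitions.
        props.put("group.id", "order-processors"); // hypothetical group name
        // 2. Offset Management: disable auto-commit so the application can
        //    commit manually after the business logic succeeds.
        props.put("enable.auto.commit", "false");
        // 3. Rebalance Protocol: opt in to Cooperative Sticky rebalancing.
        props.put("partition.assignment.strategy",
                  "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(reviewConfig());
    }
}
```

These properties would normally be passed to a `KafkaConsumer` constructor along with `bootstrap.servers` and the deserializer settings.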

1. Flash Quiz

1. What happens if you have 10 partitions and 15 consumers in the same group?

  • 5 consumers will sit idle (unassigned), because each partition can be assigned to only one consumer in a group at a time.
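The arithmetic can be checked with a toy assignment model (a round-robin sketch, not the Kafka assignor itself):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AssignmentSketch {
    // Gives each partition exactly one owner, round-robin, then counts
    // the consumers that ended up with nothing to read.
    static int idleConsumers(int partitions, int consumers) {
        Map<Integer, List<Integer>> assignment = new HashMap<>();
        for (int c = 0; c < consumers; c++) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            assignment.get(p % consumers).add(p); // one owner per partition
        }
        int idle = 0;
        for (List<Integer> owned : assignment.values()) {
            if (owned.isEmpty()) idle++;
        }
        return idle;
    }

    public static void main(String[] args) {
        System.out.println(idleConsumers(10, 15)); // 5 consumers sit idle
    }
}
```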

2. Which commit strategy is the most reliable for preventing data loss in a high-stakes application?

  • Manual Synchronous Commits (commitSync()) performed after the business logic has successfully completed.
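The reason "commit after the business logic" prevents data loss can be shown with a toy model of the commit ordering (this simulates the offset bookkeeping; it is not the Kafka client):

```java
import java.util.List;
import java.util.function.Consumer;

public class AtLeastOnceSketch {
    long committedOffset = 0; // next offset to read, as the broker would store it

    // Processes a batch and advances the committed offset only after every
    // record succeeded, mirroring commitSync() placed after the logic.
    void pollAndProcess(List<String> records, Consumer<String> businessLogic) {
        long next = committedOffset;
        try {
            for (String record : records) {
                businessLogic.accept(record); // may throw
                next++;
            }
            committedOffset = next; // the commitSync() step: only on success
        } catch (RuntimeException e) {
            // Failure mid-batch: offset not advanced, so the whole batch
            // is re-read on restart (at-least-once, no loss).
        }
    }
}
```

If the commit happened before the processing instead, a crash mid-batch would skip the unprocessed records: at-most-once, with possible loss.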

3. What is the difference between auto.offset.reset=earliest and latest?

  • earliest: Starts from the beginning of the topic (replays history). latest: Starts from the end (only new messages).
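Note that `auto.offset.reset` only applies when the group has no committed offset for a partition. A sketch of the decision (a toy model of the broker-side rule, not client code):

```java
public class OffsetResetSketch {
    // Where a consumer starts when no committed offset exists for the
    // partition: "earliest" replays from the log start, "latest" from the end.
    static long startingOffset(String reset, long logStart, long logEnd) {
        switch (reset) {
            case "earliest": return logStart; // replay history
            case "latest":   return logEnd;   // only new messages
            default: throw new IllegalArgumentException("unknown reset policy");
        }
    }
}
```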

4. Why is “Incremental Cooperative Rebalancing” better than the old “Eager” rebalancing?

  • Because it doesn’t stop all consumers in the group. It only pauses the consumers that are actually losing or gaining a partition, keeping the rest of the application running.
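The difference comes down to which partitions a consumer must revoke during the rebalance. A set-arithmetic sketch (illustrative only, not the assignor implementation):

```java
import java.util.HashSet;
import java.util.Set;

public class RebalanceSketch {
    // Eager: the consumer revokes everything it owns before reassignment,
    // so processing stops on all its partitions.
    static Set<Integer> eagerRevoked(Set<Integer> owned, Set<Integer> target) {
        return new HashSet<>(owned); // stop-the-world: give up all partitions
    }

    // Cooperative: the consumer revokes only the partitions that are
    // actually moving away, and keeps consuming the rest.
    static Set<Integer> cooperativeRevoked(Set<Integer> owned, Set<Integer> target) {
        Set<Integer> revoked = new HashSet<>(owned);
        revoked.removeAll(target); // keep what we still own; pause only the rest
        return revoked;
    }
}
```

For a consumer that owns partitions {0, 1, 2} and is reassigned {0, 1}, eager rebalancing pauses all three while cooperative rebalancing pauses only partition 2.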

5. How does Kafka track which message was the last one read?

  • By committing an offset (an integer position within each partition) to the internal __consumer_offsets topic.

2. What’s Next?

Now that we can produce and consume data at scale, we need to learn how to transform it in real time. In the next module, we explore Kafka Streams, where we write stream-processing applications for filtering, joining, and aggregating data without needing an external cluster like Spark or Flink.