Module Review: Consumers
This module review consolidates your understanding of Kafka’s consumer architecture. By mastering consumer groups, offset tracking, and rebalancing protocols, you are now equipped to build highly scalable, fault-tolerant data pipelines that process messages reliably under diverse failure scenarios.
In this module, we covered how to scale and manage data consumption in Kafka:
- Consumer Groups: Using the `group.id` to automatically load-balance data across multiple instances of an application.
- Offset Management: Tracking processing progress by committing offsets to `__consumer_offsets`, and choosing between auto and manual commits.
- Rebalance Protocols: Understanding how Kafka reassigns partitions and why modern Cooperative Sticky rebalancing is critical for low-latency systems.
1. Key Takeaways
- Consumer Groups: Provide horizontal scaling. Partitions are split among consumers in a group.
- Group Coordinator: Broker managing consumer group membership and rebalancing.
- Offset Management: Tracking processing progress in `__consumer_offsets`.
- Commit Strategies: `enable.auto.commit` vs `commitSync()`/`commitAsync()`.
- Rebalance Protocols: Eager (stop-the-world) vs Cooperative Sticky (incremental).
2. Flashcards
What happens if you have 10 partitions and 15 consumers in the same group?
5 consumers will sit idle (unassigned) because each partition can only be assigned to a single consumer in a group at a time.
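The arithmetic behind this answer can be sketched with a toy round-robin assignor (a simplification of Kafka's real assignment strategies):

```python
def assign_round_robin(partitions, consumers):
    """Toy assignor: each partition goes to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = list(range(10))                       # 10 partitions
consumers = [f"consumer-{i}" for i in range(15)]   # 15 consumers in one group
assignment = assign_round_robin(partitions, consumers)

idle = [c for c, parts in assignment.items() if not parts]
print(len(idle))  # 5 — partitions are the upper bound on useful consumers
```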
Which commit strategy is the most reliable for preventing data loss in a high-stakes application?
Manual synchronous commits (`commitSync()`) performed after the business logic has successfully completed.
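The key point is the ordering: process first, commit second. A minimal sketch of that loop, using a stand-in `FakeConsumer` (real clients expose equivalent poll/commit methods, but need a live broker):

```python
class FakeConsumer:
    """Stand-in for a Kafka consumer, purely to illustrate commit ordering."""
    def __init__(self, messages):
        self._messages = list(messages)
        self.committed = []  # offsets acknowledged to the "broker"

    def poll(self):
        return self._messages.pop(0) if self._messages else None

    def commit_sync(self, offset):
        self.committed.append(offset)

def run(consumer, handle):
    while (msg := consumer.poll()) is not None:
        handle(msg)                           # business logic runs first...
        consumer.commit_sync(msg["offset"])   # ...then the synchronous commit

processed = []
consumer = FakeConsumer([{"offset": 0, "value": "a"}, {"offset": 1, "value": "b"}])
run(consumer, lambda m: processed.append(m["value"]))
print(consumer.committed)  # [0, 1]
```

If `handle` throws, no commit happens for that message, so the record is re-delivered after a restart: at-least-once processing instead of data loss.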
What is the difference between `auto.offset.reset=earliest` and `latest`?
The setting only applies when the group has no committed offset for a partition:
- `earliest`: starts from the beginning of the topic (replays history).
- `latest`: starts from the end (only new messages).
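A toy model of the decision (simplified; real brokers resolve this per partition):

```python
def starting_offset(committed, log_end, reset_policy):
    """Where a consumer begins reading, given an optional committed offset."""
    if committed is not None:
        return committed        # resume from the last committed position
    if reset_policy == "earliest":
        return 0                # replay the topic from the beginning
    if reset_policy == "latest":
        return log_end          # only consume new messages
    raise ValueError(f"unknown policy: {reset_policy}")

print(starting_offset(None, 100, "earliest"))  # 0
print(starting_offset(None, 100, "latest"))    # 100
print(starting_offset(42, 100, "latest"))      # 42 — reset policy is ignored
```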
Why is "Incremental Cooperative Rebalancing" better than "Eager" rebalancing?
It doesn't stop all consumers in the group. It only pauses the consumers that are actually losing or gaining a partition, keeping the rest of the application running.
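The difference can be made concrete by comparing which partitions each protocol revokes when a new consumer joins (a toy model of the two protocols, not the actual broker logic):

```python
def eager_revoked(old, new):
    """Eager: every consumer gives up all of its partitions before reassignment."""
    return {c: parts for c, parts in old.items() if parts}

def cooperative_revoked(old, new):
    """Cooperative sticky: only partitions that actually move are revoked."""
    return {c: sorted(set(old[c]) - set(new.get(c, [])))
            for c in old if set(old[c]) - set(new.get(c, []))}

old = {"A": [0, 1, 2], "B": [3, 4, 5]}          # before consumer C joins
new = {"A": [0, 1], "B": [3, 4], "C": [2, 5]}   # after the rebalance

print(eager_revoked(old, new))        # {'A': [0, 1, 2], 'B': [3, 4, 5]}
print(cooperative_revoked(old, new))  # {'A': [2], 'B': [5]}
```

Under the eager protocol, A and B stop consuming everything during the rebalance; under cooperative sticky, they each pause only the single partition that migrates to C.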
How does Kafka track which message was the last one read?
By storing an offset (an integer) in the internal `__consumer_offsets` topic.
3. Cheat Sheet
| Concept | Description | Key Mechanism |
|---|---|---|
| Consumer Group | Logical grouping of consumers | Automatically balances partitions |
| Offset Management | Tracking progress | Commits to `__consumer_offsets` |
| Rebalance Protocol | Partition reassignment logic | Eager vs Cooperative Sticky |
| Auto Commit | Background commit every interval | Fast, high risk of data loss |
| Manual Commit | Synchronous/Asynchronous user commit | Precise control |
4. Quick Revision
- Consumer Groups: Provide horizontal scaling. Partitions are split among consumers in a group.
- Group Coordinator: Broker managing consumer group membership and rebalancing.
- Offset Management: Tracking processing progress in
__consumer_offsets. - Commit Strategies:
enable.auto.commitvscommitSync()/commitAsync(). - Rebalance Protocols: Eager (stop-the-world) vs Cooperative Sticky (incremental).
5. Next Steps
Now that we can produce and consume data at scale, we need to learn how to transform it in real-time. In the next module, we explore Kafka Streams, where we learn to write stream-processing applications for filtering, joining, and aggregating data without needing an external cluster like Spark or Flink.