Consumer Groups
In a production environment, you rarely have just one instance of your application. You want multiple instances working together to process data faster. In Kafka, this is handled through Consumer Groups.
1. How They Work
When multiple consumers share the same group.id, Kafka automatically divides the partitions of a topic among them.
- Exclusive Ownership: Within a group, each partition is assigned to exactly one consumer, although a single consumer may own several partitions.
- Load Balancing: If you add a new consumer to the group, Kafka triggers a Rebalance, moving some partitions from the existing consumers to the new one.
- Failover: If a consumer crashes, its partitions are reassigned to the remaining healthy consumers.
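The join and failover behaviour above can be sketched with a toy model. This is not Kafka client code: the `assign` function and the consumer names are hypothetical, and the round-robin split is a simplification rather than one of Kafka's real assignors.

```python
def assign(partitions, consumers):
    """Toy round-robin split: each partition goes to exactly one consumer."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    if not members:
        return assignment
    for index, partition in enumerate(sorted(partitions)):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Two consumers share the same (hypothetical) group.
before = assign(partitions, {"consumer-a", "consumer-b"})

# A third consumer joins: the rebalance moves some partitions to it.
after_join = assign(partitions, {"consumer-a", "consumer-b", "consumer-c"})

# consumer-b crashes: its partitions are handed to the survivors.
after_crash = assign(partitions, {"consumer-a", "consumer-c"})

print(before)
print(after_join)
print(after_crash)
```

In every snapshot each partition has exactly one owner; only the owner changes when membership changes.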
2. Partition Assignment
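The rule is simple: within a group, each partition is assigned to exactly one consumer at a time. A consumer may own several partitions, but a partition is never shared between two consumers of the same group.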
- If you have 4 partitions and 2 consumers: Each consumer gets 2 partitions.
- If you have 4 partitions and 4 consumers: Each consumer gets 1 partition.
- If you have 4 partitions and 5 consumers: One consumer will sit IDLE, as there are no more partitions to assign.
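The arithmetic behind these three cases can be checked with a short sketch. The helper `partitions_per_consumer` is a hypothetical name, and it assumes an even split, which is what the examples above describe; real assignors may distribute partitions differently across multiple topics.

```python
def partitions_per_consumer(num_partitions, num_consumers):
    """Return how many partitions each consumer owns under an even split."""
    if num_consumers <= 0:
        raise ValueError("a group needs at least one consumer")
    base, extra = divmod(num_partitions, num_consumers)
    # The first `extra` consumers take one additional partition.
    return [base + 1 if i < extra else base for i in range(num_consumers)]


for consumers in (2, 4, 5):
    counts = partitions_per_consumer(4, consumers)
    idle = counts.count(0)
    print(f"4 partitions, {consumers} consumers -> {counts}, idle: {idle}")

# 4 partitions, 2 consumers -> [2, 2], idle: 0
# 4 partitions, 4 consumers -> [1, 1, 1, 1], idle: 0
# 4 partitions, 5 consumers -> [1, 1, 1, 1, 0], idle: 1
```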
3. The Group Coordinator
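For each group, one broker acts as the Group Coordinator and manages the group’s state. It is not a dedicated broker: the coordinator for a group is the broker that leads the partition of the internal __consumer_offsets topic to which the group.id hashes.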
- JoinGroup: Consumers send a request to join a group.
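- SyncGroup: In the classic rebalance protocol, one consumer (the group leader) computes the partition assignment and sends it to the coordinator, which then distributes it to every member in its SyncGroup response.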
- Heartbeats: Consumers send periodic heartbeats to prove they are still alive. If the coordinator receives no heartbeat from a consumer within session.timeout.ms (e.g., due to a crash or a long GC pause), it removes that consumer from the group and triggers a rebalance.
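The heartbeat check can be sketched as a toy model with a simulated clock. `ToyCoordinator` and its methods are hypothetical names for illustration, not Kafka's broker implementation; only the idea of a session timeout (configured in Kafka via session.timeout.ms) is taken from the real system.

```python
class ToyCoordinator:
    """Tracks the last heartbeat per member and expires silent ones."""

    def __init__(self, session_timeout_ms):
        self.session_timeout_ms = session_timeout_ms
        self.last_heartbeat_ms = {}

    def heartbeat(self, member_id, now_ms):
        # Record the time at which this member was last heard from.
        self.last_heartbeat_ms[member_id] = now_ms

    def expire_members(self, now_ms):
        """Remove members whose session timed out and return their ids."""
        expired = sorted(
            member
            for member, seen in self.last_heartbeat_ms.items()
            if now_ms - seen > self.session_timeout_ms
        )
        for member in expired:
            del self.last_heartbeat_ms[member]
        return expired


coordinator = ToyCoordinator(session_timeout_ms=10_000)
coordinator.heartbeat("consumer-a", now_ms=0)
coordinator.heartbeat("consumer-b", now_ms=0)

# consumer-a keeps sending heartbeats; consumer-b stalls (crash or long GC pause).
coordinator.heartbeat("consumer-a", now_ms=9_000)

expired = coordinator.expire_members(now_ms=12_000)
needs_rebalance = bool(expired)
print(expired, needs_rebalance)  # ['consumer-b'] True
```

Any non-empty `expired` list means the group membership changed, which is the condition under which the remaining members would rebalance.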
4. Summary
Consumer groups are the backbone of Kafka’s scalability. By adding more consumer instances with the same group.id, you can increase your processing capacity up to the number of partitions in your topic; any consumers beyond that count sit idle.