Offset Management
Kafka doesn’t track which individual messages each consumer has read. Instead, consumers are responsible for telling Kafka where they are in the stream by committing offsets: for each partition, a consumer group records a single position, which Kafka stores in a special internal topic called __consumer_offsets.
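To make the "one number per partition, not per-message read flags" idea concrete, here is a minimal in-memory sketch. ToyOffsetStore is a hypothetical stand-in, not a Kafka API. It is loosely modeled on the fact that __consumer_offsets is a log-compacted topic keyed by group, topic and partition, so only the latest commit per key matters.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key shape: (consumer group, topic, partition).
OffsetKey = Tuple[str, str, int]


class ToyOffsetStore:
    """Hypothetical stand-in for __consumer_offsets (not a Kafka API)."""

    def __init__(self) -> None:
        self._offsets: Dict[OffsetKey, int] = {}

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        # A later commit for the same key simply replaces the earlier one.
        self._offsets[(group, topic, partition)] = offset

    def fetch(self, group: str, topic: str, partition: int) -> Optional[int]:
        # None means this group has never committed for this partition.
        return self._offsets.get((group, topic, partition))


store = ToyOffsetStore()
store.commit("billing", "orders", 0, 42)
store.commit("analytics", "orders", 0, 7)   # a second group tracks its own position
store.commit("billing", "orders", 0, 50)    # overwrites billing's earlier commit
print(store.fetch("billing", "orders", 0))    # 50
print(store.fetch("analytics", "orders", 0))  # 7
print(store.fetch("analytics", "orders", 1))  # None
```

Two groups reading the same topic never interfere, because their commits live under different keys.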
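1. The Offset Pointer
An offset is a sequential integer that identifies a message’s position within a partition. Two offsets matter to a consumer:
- Current Offset (position): The offset of the next message the consumer will read.
- Committed Offset: The offset the consumer will resume from after a restart or rebalance. By convention it is the offset of the last processed message plus one, i.e. the next message still to be processed.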
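The difference between the two pointers shows up as soon as a consumer fetches more than it has finished processing. The sketch below is a hypothetical single-partition model, not the Kafka client. It follows the convention described in the Java client documentation that the committed offset should be the last processed offset plus one.

```python
from typing import List, Optional, Tuple


class ToyPartitionConsumer:
    """Hypothetical model of one consumer reading one partition (not the Kafka client)."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.position = 0                      # current offset: next record poll() returns
        self.committed: Optional[int] = None   # where a restarted consumer would resume

    def poll(self, max_records: int) -> List[Tuple[int, str]]:
        start = self.position
        batch = self.log[start:start + max_records]
        self.position += len(batch)
        # Pair each value with its offset so the caller knows what it processed.
        return [(start + i, value) for i, value in enumerate(batch)]

    def commit_processed(self, last_processed_offset: int) -> None:
        # Convention: commit the offset of the NEXT record to read (last + 1).
        self.committed = last_processed_offset + 1


consumer = ToyPartitionConsumer(["a", "b", "c", "d", "e"])
records = consumer.poll(max_records=3)   # [(0, 'a'), (1, 'b'), (2, 'c')]
consumer.commit_processed(1)             # only 'a' and 'b' are done so far
print(consumer.position, consumer.committed)  # 3 2
```

After a crash at this point, the consumer would resume at offset 2 and re-read 'c', even though it had already been fetched.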
2. Commit Strategies
Automatic Commits
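By default (enable.auto.commit=true), the consumer commits offsets automatically every 5 seconds (auto.commit.interval.ms=5000). It commits the offsets of messages returned by earlier poll() calls, not the offsets your code has actually finished processing.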
- Pros: Easy to use.
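- Cons: Risky. Commits follow a timer, not your processing. If messages are handed off for processing elsewhere (for example, to another thread) and the application crashes after their offsets were auto-committed but before they were processed, those messages are lost. Conversely, a crash after processing but before the next commit means they are read again.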
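The loss scenario can be reproduced with a toy timer. This is a hypothetical model, not the Kafka client. It assumes, as in the Java client, that auto-commit is triggered from inside poll() once the interval has elapsed, and that it commits the position reached by earlier polls.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyAutoCommitConsumer:
    """Hypothetical in-memory model of timer-based auto-commit (not the Kafka client)."""

    log: List[str]
    interval_ms: int = 5000  # mirrors the default auto.commit.interval.ms
    position: int = 0        # next offset poll() will return
    committed: int = 0       # offset a restarted consumer resumes from
    last_commit_ms: int = 0

    def poll(self, now_ms: int, max_records: int = 2) -> List[str]:
        # Auto-commit fires inside poll() once the interval has elapsed and
        # commits the position reached by earlier polls, whether or not those
        # records have actually been processed yet.
        if now_ms - self.last_commit_ms >= self.interval_ms:
            self.committed = self.position
            self.last_commit_ms = now_ms
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch


consumer = ToyAutoCommitConsumer(log=["a", "b", "c", "d", "e"])
handed_off = consumer.poll(now_ms=0)       # ['a', 'b'] queued for a worker thread
handed_off += consumer.poll(now_ms=6000)   # auto-commit fires: committed = 2
# Crash here: the worker never processed 'a' or 'b', but the group resumes at 2.
print(consumer.log[consumer.committed:])   # ['c', 'd', 'e']
```

Records 'c' and 'd' were also still queued, but they are re-read after the restart; only 'a' and 'b' are lost.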
Manual Commits
For critical data, you should set enable.auto.commit=false and call .commitSync() or .commitAsync() yourself.
- Synchronous: Blocks your application until the broker acknowledges the commit. Safest but slowest.
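- Asynchronous: Doesn’t block; the result is reported to an optional callback. Faster, but commitAsync() does not retry failed commits on its own (a late retry could overwrite a newer offset), so failures are harder to handle.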
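Retrying async commits naively is what makes them tricky. If the commit of offset 10 fails and is retried after the commit of offset 20 has already succeeded, the group moves backwards. The sketch below uses hypothetical ToyCoordinator and SequencedAsyncCommitter classes, not Kafka APIs. It shows the problem and one common workaround: tag each commit with an increasing sequence number and retry only if no newer commit has been issued since.

```python
class ToyCoordinator:
    """Hypothetical stand-in for the group coordinator: keeps the last offset it receives."""

    def __init__(self) -> None:
        self.committed = 0

    def commit(self, offset: int) -> None:
        self.committed = offset


class SequencedAsyncCommitter:
    """Hypothetical helper: tags each async commit with an increasing sequence number."""

    def __init__(self, coordinator: ToyCoordinator) -> None:
        self.coordinator = coordinator
        self.latest_seq = 0

    def send(self, offset: int, succeeds: bool) -> int:
        # Simulates issuing an async commit; `succeeds` decides its fate.
        self.latest_seq += 1
        if succeeds:
            self.coordinator.commit(offset)
        return self.latest_seq

    def retry_if_latest(self, seq: int, offset: int) -> bool:
        # Retrying is only safe if no newer commit was issued after this one;
        # otherwise the retry would overwrite a newer offset with an older one.
        if seq != self.latest_seq:
            return False
        self.coordinator.commit(offset)
        return True


# Naive retry: the failed commit of 10 is retried after 20 already succeeded.
naive = ToyCoordinator()
naive.commit(20)
naive.commit(10)          # late retry rewinds the group
print(naive.committed)    # 10

# Guarded retry: the stale commit is dropped.
guarded = SequencedAsyncCommitter(ToyCoordinator())
seq_10 = guarded.send(10, succeeds=False)
guarded.send(20, succeeds=True)
print(guarded.retry_if_latest(seq_10, 10))  # False
print(guarded.coordinator.committed)        # 20
```

A common companion pattern is to use async commits in the poll loop and a final synchronous commit on shutdown, when blocking no longer matters.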
3. Offset Reset Policy
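What happens when a consumer group has no committed offset for a partition (for example, a brand-new group), or when the committed offset is out of range because retention has already deleted those messages? The auto.offset.reset setting decides:
- latest (default): Start reading from the end of the partition and receive only new messages (ignore history).
- earliest: Start reading from the oldest message still retained (replay all available data).
- none: Throw an exception to the consumer instead of picking a position.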
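The decision can be written as a small pure function for one partition. This is a toy sketch under stated assumptions. resolve_start_offset and ResetPolicyError are hypothetical names, and log_start / log_end stand for the oldest retained offset and the offset the next produced message will receive. The real client raises its own exception types for the none policy.

```python
from typing import Optional


class ResetPolicyError(Exception):
    """Hypothetical exception standing in for the client's error under 'none'."""


def resolve_start_offset(committed: Optional[int], log_start: int,
                         log_end: int, policy: str) -> int:
    """Return where a consumer would begin reading one partition (toy model)."""
    # A committed offset inside [log_start, log_end] is used as-is.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Otherwise there is no usable offset: either none was committed, or it
    # points at messages that retention has already deleted (out of range).
    if policy == "earliest":
        return log_start
    if policy == "latest":
        return log_end
    if policy == "none":
        raise ResetPolicyError(f"no valid committed offset (committed={committed})")
    raise ValueError(f"unknown policy: {policy!r}")


# Partition where retention has removed offsets 0..99 and 250 messages were produced.
print(resolve_start_offset(None, 100, 250, "earliest"))  # 100
print(resolve_start_offset(None, 100, 250, "latest"))    # 250
print(resolve_start_offset(40, 100, 250, "earliest"))    # 100 (offset 40 is gone)
print(resolve_start_offset(180, 100, 250, "latest"))     # 180 (valid commit wins)
```

Note that the policy is consulted only when there is no usable committed offset; a valid commit always takes precedence.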
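4. The Offset Gap
The gap between the latest offset in a partition (the log-end offset, where the next produced message will land) and the group’s committed offset is the consumer lag: messages that have been produced but not yet committed as processed. A lag that keeps growing during processing means the consumer is falling behind.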
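The gap between a partition's log-end offset and the group's committed offset is usually called consumer lag, and computing it is simple arithmetic. The sketch below is hypothetical (partition_lag is not a Kafka API) and assumes both offsets use the "next message" convention. Under that convention the difference is the number of messages still to be processed, ignoring transaction markers and compaction gaps.

```python
from typing import Dict, Optional


def partition_lag(log_end: Dict[int, int],
                  committed: Dict[int, Optional[int]]) -> Dict[int, Optional[int]]:
    """Per-partition lag = log-end offset - committed offset (None if nothing committed)."""
    lag: Dict[int, Optional[int]] = {}
    for partition, end in log_end.items():
        c = committed.get(partition)
        # With no committed offset the lag is undefined until the group commits.
        lag[partition] = None if c is None else end - c
    return lag


lags = partition_lag({0: 1200, 1: 980, 2: 40}, {0: 1150, 1: 980})
print(lags)  # {0: 50, 1: 0, 2: None}
print(sum(v for v in lags.values() if v is not None))  # 50
```

Tracking this number over time, rather than a single snapshot, is what reveals whether the consumer keeps up.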
5. Why it Matters
Choosing the right commit strategy determines your delivery guarantee:
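- At-Least-Once: Commit after processing. If you crash, you’ll re-read and reprocess everything since the last commit, so duplicates are possible (most common).
- At-Most-Once: Commit before processing. If you crash, messages that were committed but not yet processed are skipped.
- Exactly-Once: Use Kafka transactions to commit the consumed offsets and the processing results (written back to Kafka) atomically; downstream consumers read with isolation.level=read_committed.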
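The difference between the first two guarantees comes down to the order of "process" and "commit". The toy simulation below is a hypothetical model, not the Kafka client. It processes a log in batches, crashes once between the two steps of a chosen batch, and then restarts from the committed offset. Exactly-once is not modeled here, since it depends on Kafka's transactional producer.

```python
from typing import List, Optional


def run_with_crash(log: List[str], strategy: str, crash_in_batch: int,
                   batch_size: int = 2) -> List[str]:
    """Toy model: crash once inside batch `crash_in_batch` (between the process
    and commit steps), restart from the committed offset, return all processed records."""
    processed: List[str] = []
    committed = 0

    def consume(crash_at: Optional[int]) -> None:
        nonlocal committed
        pos, batch_no = committed, 0
        while pos < len(log):
            batch = log[pos:pos + batch_size]
            next_offset = pos + len(batch)       # committed offset = next record to read
            if strategy == "at-most-once":
                committed = next_offset          # commit first...
                if batch_no == crash_at:
                    return                       # ...crash before processing: batch skipped
                processed.extend(batch)
            elif strategy == "at-least-once":
                processed.extend(batch)          # process first...
                if batch_no == crash_at:
                    return                       # ...crash before committing: batch replayed
                committed = next_offset
            else:
                raise ValueError(f"unknown strategy: {strategy!r}")
            pos, batch_no = next_offset, batch_no + 1

    consume(crash_in_batch)   # first run, crashes partway through
    consume(None)             # restart from the committed offset
    return processed


log = ["a", "b", "c", "d", "e", "f"]
print(run_with_crash(log, "at-least-once", crash_in_batch=1))  # c and d appear twice
print(run_with_crash(log, "at-most-once", crash_in_batch=1))   # c and d never appear
```

At-least-once is usually preferred because duplicates can be made harmless with idempotent processing, while skipped records cannot be recovered.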