Kafka: The Infinite Tape Recorder

1. Queue vs Stream

Developers often conflate RabbitMQ (a queue) and Kafka (a stream). They solve different problems.

  • RabbitMQ (The Mailbox):
    • Model: Smart Broker, Dumb Consumer.
    • Behavior: You take a letter out, and it’s gone (deleted).
    • Use Case: Task processing, Job Queues (e.g., “Resize this image”).
  • Kafka (The Tape Recorder):
    • Model: Dumb Broker, Smart Consumer.
    • Behavior: Messages are appended to a log. You can listen, rewind, and listen again. Messages stick around for days (retention).
    • Use Case: Event Sourcing, Analytics, Metrics, Change Data Capture (CDC).

[!TIP] Kafka is essentially a distributed Write-Ahead Log (WAL). If you understand how a database persists data (see WAL), you understand Kafka.


2. Architecture: The Log

Kafka’s core abstraction is the Log—an append-only sequence of records.

A. Topic & Partition

A Topic is a category (e.g., user_clicks). Since one server can’t hold all data, the Topic is split into Partitions.

  • Partition: An ordered, immutable sequence of messages.
  • Ordering: Guaranteed only within a partition. Global ordering across partitions is NOT guaranteed.
  • Hash Key: Partition = hash(Key) % NumPartitions. All messages for User:123 go to the same partition (e.g., P0).
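
The routing rule above can be sketched in a few lines of Python. (A stable MD5 digest stands in for Kafka's actual murmur2 partitioner, since Python's built-in `hash()` is salted per process; the modulo logic is the same.)

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Route a key to a partition: hash(key) % num_partitions.

    Kafka's default partitioner uses murmur2; a stable MD5 digest
    stands in for it here so results are reproducible."""
    digest = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    return digest % num_partitions

# Every message for the same key lands on the same partition,
# which is what preserves per-key ordering.
p1 = partition_for("User:123", 3)
p2 = partition_for("User:123", 3)
assert p1 == p2
```

This is also why adding partitions to a live topic breaks key affinity: the modulus changes, so existing keys start hashing to different partitions.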

B. The Offset

Each message in a partition has a unique ID called the Offset.

  • The Consumer tracks what it has read.
  • “I’ve read up to Offset 500 in Partition 0.”

C. Consumer Groups (The Secret Sauce)

How do we scale processing? We add consumers to a Consumer Group.

  • Rule: A partition is consumed by exactly one consumer in the group.
  • Implication: If you have 10 partitions, you can have AT MOST 10 active consumers. The 11th consumer will be idle.
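
The one-consumer-per-partition invariant can be simulated with a simplified round-robin assignor. (Real Kafka ships range, round-robin, and cooperative-sticky assignment strategies; this toy only demonstrates the invariant and the idle 11th consumer.)

```python
from collections import defaultdict

def assign(partitions, consumers):
    """Simplified round-robin partition assignment within one group.
    Each partition goes to exactly one consumer; consumers beyond
    len(partitions) receive nothing and sit idle."""
    assignment = defaultdict(list)
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return dict(assignment)

parts = [f"P{i}" for i in range(10)]
group = [f"C{i}" for i in range(11)]   # 11 consumers, only 10 partitions
plan = assign(parts, group)
idle = [c for c in group if c not in plan]
# Exactly one consumer ends up with no partitions assigned.
```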

3. Interactive Demo: The Kafka Cluster Simulator

Visualize how Partitions distribute load and how Consumer Groups scale.

  • Scenario: High-volume clickstream data.
  • Components:
    • Producers: Sending events (colored by Key/Partition).
    • Brokers: Hosting 3 Partitions (P0, P1, P2).
    • Consumer Group: Reading from partitions.
  • Action: Add Consumers to see Rebalancing. Watch the “Lag” build up if consumption is slow.

4. Under the Hood: Why is Kafka Fast?

Kafka can handle millions of messages/sec on spinning hard disks. How?

A. Sequential I/O

Random disk access (seeking) is slow (roughly 100 seeks/sec on a spinning disk); sequential access is fast (hundreds of MB/sec). Kafka only appends to the end of the active segment file, so the write path never seeks.

  • Result: Disk speeds approach Network speeds.
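
A minimal append-only segment sketch, assuming a length-prefixed record format (real Kafka segments also carry timestamps, CRCs, and a separate sparse index file):

```python
import os
import tempfile

class LogSegment:
    """Toy append-only log segment: writes only ever go to the end
    of the file, so the write path is purely sequential I/O."""

    def __init__(self, path):
        self.f = open(path, "ab+")
        self.index = []            # logical offset -> byte position

    def append(self, record: bytes) -> int:
        self.f.seek(0, os.SEEK_END)
        self.index.append(self.f.tell())
        # Length-prefixed framing: 4-byte size, then the payload.
        self.f.write(len(record).to_bytes(4, "big") + record)
        self.f.flush()
        return len(self.index) - 1   # the new record's offset

    def read(self, offset: int) -> bytes:
        self.f.seek(self.index[offset])
        size = int.from_bytes(self.f.read(4), "big")
        return self.f.read(size)

path = tempfile.NamedTemporaryFile(delete=False).name
log = LogSegment(path)
log.append(b"click:home")
log.append(b"click:cart")
# log.read(1) -> b"click:cart"
```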

B. Zero Copy

Standard File Send:

  1. Disk -> OS Cache (Kernel)
  2. OS Cache -> App Buffer (User Space)
  3. App Buffer -> Socket Buffer (Kernel)
  4. Socket Buffer -> NIC

Kafka uses the sendfile() system call (Zero Copy):

  1. Disk -> OS Cache
  2. OS Cache -> NIC
    • Result: No Context Switches, no redundant copying.
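
Python happens to expose the same syscall as `os.sendfile`, so the kernel-only path can be demonstrated directly. This sketch pushes a file through a socket pair without the bytes ever entering a user-space buffer (Linux/macOS only):

```python
import os
import socket
import tempfile

# os.sendfile() wraps the sendfile(2) syscall Kafka relies on:
# bytes move kernel-side from the page cache to the socket,
# never entering a user-space buffer.
payload = os.urandom(16384)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

left, right = socket.socketpair()
with open(path, "rb") as src:
    sent = 0
    while sent < len(payload):
        sent += os.sendfile(left.fileno(), src.fileno(),
                            sent, len(payload) - sent)
left.close()

received = b""
while len(received) < len(payload):
    chunk = right.recv(65536)
    if not chunk:
        break
    received += chunk
right.close()
os.unlink(path)
# received now equals payload, delivered without a user-space copy.
```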

5. Durability & Reliability

How do we ensure we never lose data?

A. Replication (ISR)

Each partition has one Leader and multiple Followers.

  • Leader: Handles all Reads and Writes.
  • Follower: Passively replicates data from the Leader.
  • ISR (In-Sync Replicas): The set of followers that are “caught up” with the leader. If a leader dies, only an ISR member can become the new leader.

B. Producer Acks

The producer decides how safe the write must be.

  • acks=0 (Fire & Forget): Send and don’t wait. Fastest, but data loss possible if broker crashes immediately.
  • acks=1 (Leader Ack): Wait for the Leader to append the record to its own log. Data is lost if the Leader crashes before followers replicate.
  • acks=all (Strongest): Wait for the Leader AND every in-sync replica to acknowledge. No data loss as long as at least one in-sync replica survives.
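
A toy model of what each setting guarantees at the moment the producer receives its ack (replicas here are plain in-memory lists, not real brokers):

```python
def produce(record, leader, followers, acks):
    """Simulate when a write is acknowledged under each acks level."""
    if acks == 0:
        return "acked"            # fire & forget: no guarantee yet;
                                  # the write may never land at all
    leader.append(record)
    if acks == 1:
        return "acked"            # leader has it; followers still lag
    for f in followers:           # acks='all': replicate to every ISR
        f.append(record)          # member before acknowledging
    return "acked"

leader, f1, f2 = [], [], []
produce("payment-42", leader, [f1, f2], acks="all")
# With acks='all', the record is on the leader and both followers
# before the producer considers the send successful.
```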

[!TIP] Trade-off: acks=all is slower (higher latency) but safest. Use it for Payments. Use acks=1 for Metrics.

6. Exactly-Once Semantics (EOS)

This is the “Holy Grail” of messaging. How do we ensure a message is processed exactly once, even if the producer retries or the consumer crashes?

1. Idempotent Producer

Kafka assigns each producer a unique Producer ID (PID) and attaches a monotonically increasing Sequence Number to every message it sends to a partition.

  • If the producer sends Msg 5 twice (due to network timeout), the Broker sees Seq: 5 again and discards the duplicate.
  • This solves duplication during sending.
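
The broker-side dedup can be sketched as a table of the highest sequence number accepted per (producer, partition). (A simplification: the real protocol tracks sequence numbers per batch and tolerates a bounded window of in-flight requests.)

```python
class Broker:
    """Toy broker that discards replays from idempotent producers."""

    def __init__(self):
        self.log = []
        self.last_seq = {}   # (pid, partition) -> highest seq accepted

    def append(self, pid, partition, seq, record):
        key = (pid, partition)
        if seq <= self.last_seq.get(key, -1):
            return "duplicate-dropped"   # already saw this sequence
        self.last_seq[key] = seq
        self.log.append(record)
        return "appended"

b = Broker()
b.append(pid=7, partition=0, seq=5, record="msg-5")
b.append(pid=7, partition=0, seq=5, record="msg-5")   # network retry
# The second send is discarded; the log holds exactly one copy.
```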

2. Transactional Messaging (Read-Process-Write)

What if you read from Topic A, process it, and write to Topic B? If you crash in the middle, you might re-process the message. Kafka supports Atomic Transactions across multiple partitions.

  • Begin Transaction.
  • Consume from A (Offset X).
  • Produce to B.
  • Commit Transaction.
  • Only then is Offset X marked as “Consumed” and the message in B becomes visible to consumers (read_committed).
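
The visibility rule in that last step can be modeled with a toy topic whose records stay hidden from read_committed consumers until the transaction commits. (Real Kafka achieves this with transaction markers written into the log, not a mutable flag.)

```python
class Topic:
    """Toy topic with transactional visibility."""

    def __init__(self):
        self.records = []     # [record, committed?] pairs

    def produce(self, record):
        self.records.append([record, False])   # in an open transaction

    def commit(self):
        for r in self.records:                 # commit marker: everything
            r[1] = True                        # pending becomes visible

    def read_committed(self):
        return [rec for rec, ok in self.records if ok]

b_topic = Topic()
b_topic.produce("processed:offset-42")        # inside the transaction
visible_before = b_topic.read_committed()     # nothing visible yet
b_topic.commit()
visible_after = b_topic.read_committed()      # now the record appears
```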

7. Advanced Concepts

A. Rebalancing

When a consumer joins or leaves, the group “rebalances”.

  • Stop-the-World: Traditionally, all consumption stopped during rebalance.
  • Incremental (Cooperative): Newer Kafka versions reassign only the partitions that actually change hands, so the rest of the group keeps consuming.

B. Log Compaction

Kafka can act as a database. With Log Compaction, Kafka keeps only the latest value for each key.

  • Input: (Key: User1, Value: Alice), (Key: User1, Value: Bob)
  • Compacted: (Key: User1, Value: Bob)
  • Use Case: Restoring application state (e.g., KTable) after a crash without replaying the entire history.
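
Compaction reduces to "last write wins per key", which a dict captures directly. (A sketch: real compaction runs in the background, segment by segment, and the surviving records keep their original offsets.)

```python
def compact(log):
    """Keep only the latest value for each key.
    Input is a list of (key, value) records in append order."""
    latest = {}
    for key, value in log:
        latest[key] = value       # later writes overwrite earlier ones
    return list(latest.items())

log = [("User1", "Alice"), ("User2", "Carol"), ("User1", "Bob")]
compacted = compact(log)
# -> [("User1", "Bob"), ("User2", "Carol")]
```

Replaying a compacted topic therefore rebuilds the latest state per key, without the cost of replaying every intermediate update.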

8. Summary

  • Kafka is a Log: Think of it as a distributed file system, not a queue.
  • Partitions: The unit of scalability.
  • Zero Copy: The secret sauce behind its speed.
  • Consumer Groups: Enable horizontal scaling of processing.
  • Exactly-Once: Possible, but requires configuration (acks=all, idempotent producer, transactions).