Kafka: The Infinite Tape Recorder

1. Queue vs Stream

Developers often conflate RabbitMQ (a queue) and Kafka (a stream). They solve different problems.

  • RabbitMQ (The Mailbox):
    • Model: Smart Broker, Dumb Consumer.
    • Behavior: You take a letter out, and it’s gone (deleted).
    • Use Case: Task processing, Job Queues (e.g., “Resize this image”).
  • Kafka (The Tape Recorder):
    • Model: Dumb Broker, Smart Consumer.
    • Behavior: Messages are appended to a log. You can listen, rewind, and listen again. Messages stick around for days (retention).
    • Use Case: Event Sourcing, Analytics, Metrics, Change Data Capture (CDC).

[!TIP] Kafka is essentially a distributed Write-Ahead Log (WAL). If you understand how a database persists data (see WAL), you understand Kafka.


2. Architecture: The Log

Kafka’s core abstraction is the Log—an append-only sequence of records.

A. Topic & Partition

A Topic is a category (e.g., user_clicks). Since one server can’t hold all data, the Topic is split into Partitions.

  • Partition: An ordered, immutable sequence of messages.
  • Ordering: Guaranteed only within a partition. Global ordering across partitions is NOT guaranteed.
  • Hash Key: Partition = hash(Key) % NumPartitions. All messages for User:123 go to the same partition (e.g., P0).
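
The routing rule above can be sketched in a few lines of Python. (A stable MD5 digest stands in for Kafka's actual murmur2 partitioner, since Python's built-in `hash()` is salted per process; the modulo logic is the same.)

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Route a key to a partition: hash(key) % num_partitions.

    Kafka's default partitioner uses murmur2; a stable MD5 digest
    stands in for it here so results are reproducible."""
    digest = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    return digest % num_partitions

# Every message for the same key lands on the same partition,
# which is what preserves per-key ordering.
p1 = partition_for("User:123", 3)
p2 = partition_for("User:123", 3)
assert p1 == p2
```

This is also why adding partitions to a live topic breaks key affinity: the modulus changes, so existing keys start hashing to different partitions.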

B. The Offset

Each message in a partition has a unique ID called the Offset.

  • The Consumer tracks what it has read.
  • “I’ve read up to Offset 500 in Partition 0.”

C. Consumer Groups (The Secret Sauce)

How do we scale processing? We add consumers to a Consumer Group.

  • Rule: A partition is consumed by exactly one consumer in the group.
  • Implication: If you have 10 partitions, you can have AT MOST 10 active consumers. The 11th consumer will be idle.
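
The one-consumer-per-partition invariant can be simulated with a simplified round-robin assignor. (Real Kafka ships range, round-robin, and cooperative-sticky assignment strategies; this toy only demonstrates the invariant and the idle 11th consumer.)

```python
from collections import defaultdict

def assign(partitions, consumers):
    """Simplified round-robin partition assignment within one group.
    Each partition goes to exactly one consumer; consumers beyond
    len(partitions) receive nothing and sit idle."""
    assignment = defaultdict(list)
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return dict(assignment)

parts = [f"P{i}" for i in range(10)]
group = [f"C{i}" for i in range(11)]   # 11 consumers, only 10 partitions
plan = assign(parts, group)
idle = [c for c in group if c not in plan]
# Exactly one consumer ends up with no partitions assigned.
```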

3. Interactive Demo: The Kafka Cluster Simulator

Visualize how Partitions distribute load and how Consumer Groups scale.

  • Scenario: High-volume clickstream data.
  • Components:
    • Producers: Sending events (colored by Key/Partition).
    • Brokers: Hosting 3 Partitions (P0, P1, P2).
    • Consumer Group: Reading from partitions.
  • Action: Add Consumers to see Rebalancing. Watch the “Lag” build up if consumption is slow.

4. Under the Hood: Why is Kafka Fast?

Kafka can handle millions of messages/sec on spinning hard disks. How?

A. Sequential I/O

Random disk access (seeking) is slow (roughly 100 seeks/sec on a spinning disk); sequential access is fast (hundreds of MB/sec). Kafka only appends to the end of the active segment file, so the write path never seeks.

  • Result: Disk speeds approach Network speeds.
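
A minimal append-only segment sketch, assuming a length-prefixed record format (real Kafka segments also carry timestamps, CRCs, and a separate sparse index file):

```python
import os
import tempfile

class LogSegment:
    """Toy append-only log segment: writes only ever go to the end
    of the file, so the write path is purely sequential I/O."""

    def __init__(self, path):
        self.f = open(path, "ab+")
        self.index = []            # logical offset -> byte position

    def append(self, record: bytes) -> int:
        self.f.seek(0, os.SEEK_END)
        self.index.append(self.f.tell())
        # Length-prefixed framing: 4-byte size, then the payload.
        self.f.write(len(record).to_bytes(4, "big") + record)
        self.f.flush()
        return len(self.index) - 1   # the new record's offset

    def read(self, offset: int) -> bytes:
        self.f.seek(self.index[offset])
        size = int.from_bytes(self.f.read(4), "big")
        return self.f.read(size)

path = tempfile.NamedTemporaryFile(delete=False).name
log = LogSegment(path)
log.append(b"click:home")
log.append(b"click:cart")
# log.read(1) -> b"click:cart"
```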

B. Zero Copy

Standard File Send:

  1. Disk -> OS Cache (Kernel)
  2. OS Cache -> App Buffer (User Space)
  3. App Buffer -> Socket Buffer (Kernel)
  4. Socket Buffer -> NIC

Kafka uses the sendfile() system call (Zero Copy):

  1. Disk -> OS Cache
  2. OS Cache -> NIC
    • Result: No Context Switches, no redundant copying.
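
Python happens to expose the same syscall as `os.sendfile`, so the kernel-only path can be demonstrated directly. This sketch pushes a file through a socket pair without the bytes ever entering a user-space buffer (Linux/macOS only):

```python
import os
import socket
import tempfile

# os.sendfile() wraps the sendfile(2) syscall Kafka relies on:
# bytes move kernel-side from the page cache to the socket,
# never entering a user-space buffer.
payload = os.urandom(16384)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

left, right = socket.socketpair()
with open(path, "rb") as src:
    sent = 0
    while sent < len(payload):
        sent += os.sendfile(left.fileno(), src.fileno(),
                            sent, len(payload) - sent)
left.close()

received = b""
while len(received) < len(payload):
    chunk = right.recv(65536)
    if not chunk:
        break
    received += chunk
right.close()
os.unlink(path)
# received now equals payload, delivered without a user-space copy.
```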

5. Durability & Reliability

How do we ensure we never lose data?

A. Replication (ISR)

Each partition has one Leader and multiple Followers.

  • Leader: Handles all Reads and Writes.
  • Follower: Passively replicates data from the Leader.
  • ISR (In-Sync Replicas): The set of followers that are “caught up” with the leader. If a leader dies, only an ISR member can become the new leader.

B. Producer Acks

The producer decides how safe the write must be.

  • acks=0 (Fire & Forget): Send and don’t wait. Fastest, but data loss possible if broker crashes immediately.
  • acks=1 (Leader Ack): Wait for the Leader to append the record to its own log. Data is lost if the Leader crashes before followers replicate.
  • acks=all (Strongest): Wait for the Leader AND every in-sync replica to acknowledge. No data loss as long as at least one in-sync replica survives.
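
A toy model of what each setting guarantees at the moment the producer receives its ack (replicas here are plain in-memory lists, not real brokers):

```python
def produce(record, leader, followers, acks):
    """Simulate when a write is acknowledged under each acks level."""
    if acks == 0:
        return "acked"            # fire & forget: no guarantee yet;
                                  # the write may never land at all
    leader.append(record)
    if acks == 1:
        return "acked"            # leader has it; followers still lag
    for f in followers:           # acks='all': replicate to every ISR
        f.append(record)          # member before acknowledging
    return "acked"

leader, f1, f2 = [], [], []
produce("payment-42", leader, [f1, f2], acks="all")
# With acks='all', the record is on the leader and both followers
# before the producer considers the send successful.
```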

[!TIP] Trade-off: acks=all is slower (higher latency) but safest. Use it for Payments. Use acks=1 for Metrics.

6. Exactly-Once Semantics (EOS)

This is the “Holy Grail” of messaging. How do we ensure a message is processed exactly once, even if the producer retries or the consumer crashes?

1. Idempotent Producer

Kafka assigns each producer a unique Producer ID (PID) and attaches a monotonically increasing Sequence Number to every message it sends to a partition.

  • If the producer sends Msg 5 twice (due to network timeout), the Broker sees Seq: 5 again and discards the duplicate.
  • This solves duplication during sending.
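
The broker-side dedup can be sketched as a table of the highest sequence number accepted per (producer, partition). (A simplification: the real protocol tracks sequence numbers per batch and tolerates a bounded window of in-flight requests.)

```python
class Broker:
    """Toy broker that discards replays from idempotent producers."""

    def __init__(self):
        self.log = []
        self.last_seq = {}   # (pid, partition) -> highest seq accepted

    def append(self, pid, partition, seq, record):
        key = (pid, partition)
        if seq <= self.last_seq.get(key, -1):
            return "duplicate-dropped"   # already saw this sequence
        self.last_seq[key] = seq
        self.log.append(record)
        return "appended"

b = Broker()
b.append(pid=7, partition=0, seq=5, record="msg-5")
b.append(pid=7, partition=0, seq=5, record="msg-5")   # network retry
# The second send is discarded; the log holds exactly one copy.
```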

2. Transactional Messaging (Read-Process-Write)

What if you read from Topic A, process it, and write to Topic B? If you crash in the middle, you might re-process the message. Kafka supports Atomic Transactions across multiple partitions.

  • Begin Transaction.
  • Consume from A (Offset X).
  • Produce to B.
  • Commit Transaction.
  • Only then is Offset X marked as “Consumed” and the message in B becomes visible to consumers (read_committed).
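
The visibility rule in that last step can be modeled with a toy topic whose records stay hidden from read_committed consumers until the transaction commits. (Real Kafka achieves this with transaction markers written into the log, not a mutable flag.)

```python
class Topic:
    """Toy topic with transactional visibility."""

    def __init__(self):
        self.records = []     # [record, committed?] pairs

    def produce(self, record):
        self.records.append([record, False])   # in an open transaction

    def commit(self):
        for r in self.records:                 # commit marker: everything
            r[1] = True                        # pending becomes visible

    def read_committed(self):
        return [rec for rec, ok in self.records if ok]

b_topic = Topic()
b_topic.produce("processed:offset-42")        # inside the transaction
visible_before = b_topic.read_committed()     # nothing visible yet
b_topic.commit()
visible_after = b_topic.read_committed()      # now the record appears
```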

7. Advanced Concepts

A. Rebalancing

When a consumer joins or leaves, the group “rebalances”.

  • Stop-the-World: Traditionally, all consumption stopped during rebalance.
  • Incremental (Cooperative): Newer Kafka versions reassign only the partitions that actually change hands, so the rest of the group keeps consuming.

B. Log Compaction

Kafka can act as a database. With Log Compaction, Kafka keeps only the latest value for each key.

  • Input: (Key: User1, Value: Alice), (Key: User1, Value: Bob)
  • Compacted: (Key: User1, Value: Bob)
  • Use Case: Restoring application state (e.g., KTable) after a crash without replaying the entire history.
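
Compaction reduces to "last write wins per key", which a dict captures directly. (A sketch: real compaction runs in the background, segment by segment, and the surviving records keep their original offsets.)

```python
def compact(log):
    """Keep only the latest value for each key.
    Input is a list of (key, value) records in append order."""
    latest = {}
    for key, value in log:
        latest[key] = value       # later writes overwrite earlier ones
    return list(latest.items())

log = [("User1", "Alice"), ("User2", "Carol"), ("User1", "Bob")]
compacted = compact(log)
# -> [("User1", "Bob"), ("User2", "Carol")]
```

Replaying a compacted topic therefore rebuilds the latest state per key, without the cost of replaying every intermediate update.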

8. Summary

  • Kafka is a Log: Think of it as a distributed file system, not a queue.
  • Partitions: The unit of scalability.
  • Zero Copy: The secret sauce behind its speed.
  • Consumer Groups: Enable horizontal scaling of processing.
  • Exactly-Once: Possible, but requires configuration (acks=all, idempotent producer, transactions).