Module Review: Architecture

[!NOTE] This module explores the core principles of Kafka Architecture, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

You’ve mastered the core building blocks of Kafka. Here is a summary of what really matters for system design interviews and production engineering.

🔑 Key Takeaways

  • Kafka is a Log, not a Queue 🪵
  • Messages persist on disk. Consumers are just readers with a bookmark (Offset).
  • Benefit: Decouples producers and consumers completely.

  • Partitions = Scalability 🚀
  • A topic is split into P partitions. This allows up to P consumers in the same consumer group to read in parallel.
  • Hardware Reality: Sequential I/O makes disk writes nearly as fast as network transfer.

  • Keys = Ordering 🔑
  • Messages with the same key (e.g., user_id) go to the same partition, as long as the topic's partition count does not change.
  • Guarantee: Strict ordering is only guaranteed within a partition, not across the topic.

  • ISR = Availability + Consistency 🛡️
  • Only In-Sync Replicas (ISR) can become leaders, unless unclean leader election is explicitly enabled.
  • min.insync.replicas protects against data loss when acks=all.

  • Zero Copy = Efficiency
  • Kafka uses sendfile to transfer data from Disk → NIC without copying to JVM heap.
  • This reduces CPU usage and GC pauses.
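
The "log, not a queue" takeaway can be illustrated with a toy model. This is a minimal in-memory sketch, not Kafka's implementation: the class names `Log` and `Consumer` are hypothetical, and the point is only that reading never removes records, so each consumer advances its own offset independently.

```python
class Log:
    """Append-only log: reads never remove records."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records starting at offset (non-destructive)."""
        return self._records[offset:offset + max_records]


class Consumer:
    """A reader that only tracks its own position (the 'bookmark')."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)  # advance the bookmark, leave the log untouched
        return batch


log = Log()
for event in ["a", "b", "c"]:
    log.append(event)

fast, slow = Consumer(log), Consumer(log)
first_fast = fast.poll()                # reads all three records
first_slow = slow.poll(max_records=1)   # reads one; nothing was consumed away
```

Because the slow consumer's progress lives in its own offset, the fast consumer having read everything has no effect on what the slow one sees next.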

1. 🧠 Interactive Flashcards


Unit of Scalability

What is the fundamental unit of parallelism in Kafka?

The Partition

Topics are split into partitions. This allows multiple consumers to read simultaneously.
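
As a rough illustration of why the partition is the unit of parallelism, the sketch below spreads partitions over the members of one consumer group. The function name `assign_round_robin` is hypothetical and the logic is a simplification; Kafka's real assignors are more elaborate, but the constraint that each partition has exactly one owner within a group is the same.

```python
def assign_round_robin(num_partitions, consumers):
    """Spread partition ids over consumers; each partition gets exactly one owner."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        owner = consumers[p % len(consumers)]
        assignment[owner].append(p)
    return assignment


# 4 partitions, 3 consumers: everyone has work, one consumer owns two partitions.
three = assign_round_robin(4, ["c1", "c2", "c3"])

# 4 partitions, 6 consumers: the two extra consumers sit idle.
six = assign_round_robin(4, ["c1", "c2", "c3", "c4", "c5", "c6"])
idle = [c for c, parts in six.items() if not parts]
```

Adding consumers beyond the partition count adds no read parallelism in this model, which is why partition count is usually chosen with the target consumer count in mind.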

Ordering Guarantee

How do you ensure strict ordering for a specific user's events?

Message Keys

Use the User ID as the key. Kafka hashes the key to ensure all events land in the same partition.
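
The hash-then-modulo idea can be sketched as follows. Note the assumption: `zlib.crc32` is used here only as a stand-in hash so the example runs with the standard library; Kafka's default partitioner uses a murmur2 hash of the key bytes, so the actual partition numbers will differ. The function name `partition_for` is hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition: hash(key bytes) mod partition count.

    crc32 is a stand-in for illustration; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# The same key maps to the same partition on every call...
p1 = partition_for("user-42", 6)
p2 = partition_for("user-42", 6)

# ...but the mapping depends on the partition count, so adding partitions
# can send a key's new events to a different partition than its old ones.
before = [partition_for(f"user-{i}", 6) for i in range(100)]
after = [partition_for(f"user-{i}", 8) for i in range(100)]
moved = sum(1 for b, a in zip(before, after) if b != a)
```

This is why per-key ordering only holds while the partition count stays fixed: `moved` counts the keys whose target partition changes when going from 6 to 8 partitions.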

ISR vs Quorum

Why does Kafka use ISR instead of Quorum (Majority)?

Availability

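With N replicas, the ISR model can survive up to N-1 replica failures without losing committed data, because every in-sync replica holds all committed messages. A majority quorum needs ⌊N/2⌋ + 1 replicas alive, so it tolerates fewer failures for the same replica count.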
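
The arithmetic behind this comparison is small enough to write down. The two helper functions below are hypothetical names for illustration; the ISR figure assumes writes use acks=all and counts how many replicas can fail before the ISR drops below min.insync.replicas.

```python
def tolerated_failures_quorum(n: int) -> int:
    """A majority quorum needs floor(n/2) + 1 replicas alive."""
    return n - (n // 2 + 1)


def tolerated_failures_isr(n: int, min_insync: int = 1) -> int:
    """With ISR, writes continue while at least min_insync replicas stay in sync."""
    return n - min_insync


comparison = {
    n: (tolerated_failures_quorum(n), tolerated_failures_isr(n))
    for n in (3, 5)
}
# With min.insync.replicas=2 on 3 replicas, ISR tolerates a single failure,
# the same as a majority quorum of 3.
isr_3_min2 = tolerated_failures_isr(3, min_insync=2)
```

Raising min.insync.replicas trades some of that availability back for stronger durability, which is the knob the cheat sheet refers to.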

Zero Copy

What system call does Kafka use to avoid JVM copying?

sendfile()

It transfers data directly from the Page Cache to the NIC buffer.
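
To get a feel for the call, here is a small sketch using Python's `socket.sendfile`, which uses `os.sendfile` where the platform supports it and otherwise falls back to ordinary `send()`. It is an illustration of the system call's shape, not Kafka's code (Kafka reaches sendfile through Java's `FileChannel.transferTo`). The helper name `send_file_zero_copy` and the payload are hypothetical.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path, sock):
    """Send a whole file over a connected stream socket; returns bytes sent."""
    with open(path, "rb") as f:
        # The kernel moves bytes from the page cache to the socket when
        # os.sendfile is available; no user-space buffer holds the payload.
        return sock.sendfile(f)


payload = b"kafka-segment-bytes" * 100
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

left, right = socket.socketpair()
try:
    sent = send_file_zero_copy(path, left)
    left.close()  # signal end-of-stream to the reader
    received = b""
    while chunk := right.recv(4096):
        received += chunk
finally:
    left.close()
    right.close()
    os.unlink(path)
```

The application code never reads the file contents into its own memory, which is the property that keeps the payload off the JVM heap in Kafka's case.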

Log Compaction

What does cleanup.policy=compact do?

State Restoration

It retains at least the latest value for every key and discards older values for that key. Useful for restoring state (e.g., KTable).
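
A toy version of compaction makes the "latest value per key" rule concrete. This is a sketch over an in-memory list of hypothetical `(offset, key, value)` tuples; it does not model delete markers (null values) or the fact that the active segment is left uncompacted.

```python
def compact(records):
    """Keep only the last record seen for each key, preserving offset order.

    records: iterable of (offset, key, value) with unique, increasing offsets.
    """
    latest = {}
    for offset, key, value in records:
        latest[key] = (offset, key, value)  # later records overwrite earlier ones
    return sorted(latest.values())          # offsets are unique, so this sorts by offset


log = [
    (0, "u1", "a"),
    (1, "u2", "b"),
    (2, "u1", "c"),
    (3, "u3", "d"),
    (4, "u2", "e"),
]
compacted = compact(log)

# Replaying the compacted log rebuilds the same key -> value state as the full log.
state_full = {k: v for _, k, v in log}
state_compacted = {k: v for _, k, v in compacted}
```

Surviving records keep their original offsets, so the compacted log has gaps in its offset sequence but the same final state.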

Page Cache

Where does Kafka write data first?

OS Page Cache

Kafka writes to RAM (Page Cache). The OS flushes to disk in the background.


2. 📝 Cheat Sheet

| Concept | Description | Trade-off |
| --- | --- | --- |
| Topic | Logical stream of data. | N/A |
| Partition | Physical shard. Unit of parallelism. | More partitions = higher throughput, but more open file handles. |
| Replication Factor | Number of copies of each partition (3 is the common production choice; the broker default is 1). | Higher = better durability, but storage cost scales with the factor (3x at a factor of 3). |
| acks=0 | Fire and forget. | Max speed, high data loss risk. |
| acks=1 | Leader confirmed. | Fast, medium risk (if the leader fails before followers sync). |
| acks=all | Leader + all in-sync replicas confirmed. | Slowest; lowest data loss risk when paired with min.insync.replicas ≥ 2. |
| min.insync.replicas | Min in-sync replicas required for an acks=all write to succeed. | Sets availability vs consistency. |
| Segment | File slice of a partition log. | Smaller segments = faster deletion, more file handles. |
| Index | Maps Offset → Position. | Sparse index fits in RAM. |
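
The sparse index row can be illustrated with a binary search. The sketch assumes a hypothetical in-memory list of `(offset, byte_position)` entries with made-up numbers; the idea is to find the last indexed offset at or below the target, then scan the segment file forward from that byte position.

```python
import bisect

# Hypothetical sparse index: only some offsets are indexed.
index = [(0, 0), (40, 4096), (85, 8192), (130, 12288)]


def locate(index, target_offset):
    """Return the (offset, position) entry to start scanning from."""
    offsets = [entry[0] for entry in index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    if i < 0:
        raise KeyError(f"offset {target_offset} is before the first index entry")
    return index[i]


start = locate(index, 100)   # offset 100 is not indexed; nearest entry below is 85
exact = locate(index, 40)    # an indexed offset resolves to its own entry
```

Because only a fraction of offsets are stored, the index stays small, at the cost of a short forward scan from the returned position.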

3. 🚀 Quick Revision

  • Kafka Core: A Distributed Commit Log, not a traditional message queue.
  • Partitions: The key to horizontal scalability and parallel consumption.
  • Ordering: Guaranteed only within a specific partition using message keys.
  • Replication Model: Leader-Follower architecture with In-Sync Replicas (ISR); a message acknowledged under acks=all survives as long as one in-sync replica remains.
  • Data Integrity: Controlled by producer ACKs (acks=all with min.insync.replicas for maximum safety).
  • Storage Efficiency: Utilizes sequential I/O, OS Page Cache, and Zero Copy (sendfile) for fast disk-to-network transfer.
  • Log Compaction: Keeps the latest state for keys instead of just deleting old data based on time or size.
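
The interplay between acks and min.insync.replicas summarized above can be modeled in a few lines. This is a simplified sketch, not broker code: the function `replicas_waited_for` and the exception class `NotEnoughReplicas` are hypothetical names, chosen to mirror the error a producer sees when an acks=all write arrives while the ISR is smaller than min.insync.replicas.

```python
class NotEnoughReplicas(Exception):
    """Hypothetical stand-in for the broker rejecting an acks=all write."""


def replicas_waited_for(acks: str, isr_size: int, min_insync_replicas: int) -> int:
    """Return how many replicas must hold the record before the producer is acked."""
    if acks == "0":
        return 0                      # fire and forget
    if acks == "1":
        return 1                      # leader only; min.insync.replicas is not enforced
    if acks == "all":
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicas(
                f"ISR has {isr_size}, need {min_insync_replicas}"
            )
        return isr_size               # every current in-sync replica
    raise ValueError(f"unknown acks setting: {acks!r}")


healthy = replicas_waited_for("all", isr_size=3, min_insync_replicas=2)
leader_only = replicas_waited_for("1", isr_size=1, min_insync_replicas=2)
```

With a shrunken ISR, the acks=all producer is refused rather than silently acknowledged, which is how the setting trades availability for durability.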

4. 🔗 Next Steps

Now that you understand the architecture, let’s learn how to ingest data.