Module Review: Architecture
Youβve mastered the core building blocks of Kafka. Here is a summary of what really matters for system design interviews and production engineering.
π Key Takeaways
- Kafka is a Log, not a Queue πͺ΅
- Messages persist on disk. Consumers are just readers with a bookmark (Offset).
-
Benefit: Decouples producers and consumers completely.
- Partitions = Scalability π
- A topic is split into P partitions. This allows P consumers to read in parallel.
-
Hardware Reality: Sequential I/O makes disk writes nearly as fast as network transfer.
- Keys = Ordering π
- Messages with the same key (e.g.,
user_id) always go to the same partition. -
Guarantee: Strict ordering is only guaranteed within a partition, not across the topic.
- ISR = Availability + Consistency π‘οΈ
- Only In-Sync Replicas (ISR) can become leaders.
-
min.insync.replicasprotects against data loss whenacks=all. - Zero Copy = Efficiency β‘
- Kafka uses
sendfileto transfer data from Disk β NIC without copying to JVM heap. -
This reduces CPU usage and GC pauses.
Module Review: Kafka Architecture
[!NOTE] This module explores the core principles of Module Review: Kafka Architecture, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. π§ Interactive Flashcards
Click a card to reveal the answer.
Unit of Scalability
What is the fundamental unit of parallelism in Kafka?
The Partition
Topics are split into partitions. This allows multiple consumers to read simultaneously.
Ordering Guarantee
How do you ensure strict ordering for a specific user's events?
Message Keys
Use the User ID as the key. Kafka hashes the key to ensure all events land in the same partition.
ISR vs Quorum
Why does Kafka use ISR instead of Quorum (Majority)?
Availability
ISR allows a cluster of N nodes to survive N-1 failures. Quorum requires N/2 + 1 nodes to be alive.
Zero Copy
What system call does Kafka use to avoid JVM copying?
sendfile()
It transfers data directly from the Page Cache to the NIC buffer.
Log Compaction
What does cleanup.policy=compact do?
State Restoration
It keeps only the latest value for every key. Useful for restoring state (e.g., KTable).
Page Cache
Where does Kafka write data first?
OS Page Cache
Kafka writes to RAM (Page Cache). The OS flushes to disk in the background.
2. π Cheat Sheet
| Concept | Description | Trade-off |
|---|---|---|
| Topic | Logical stream of data. | N/A |
| Partition | Physical shard. Unit of parallelism. | More partitions = Higher throughput, but higher open file limit. |
| Replication Factor | Number of copies (default 3). | Higher = Better durability, but 3x storage cost. |
| acks=0 | Fire and forget. | Max speed, High data loss risk. |
| acks=1 | Leader confirmed. | Fast, Medium risk (if Leader fails before sync). |
| acks=all | Leader + ISR confirmed. | Slowest, Zero data loss risk. |
| min.insync.replicas | Min replicas required for acks=all. |
Sets availability vs consistency. |
| Segment | File slice of a partition log. | Smaller segments = faster deletion, more file handles. |
| Index | Maps Offset β Position. | Sparse index fits in RAM. |
3. π Next Steps
Now that you understand the architecture, letβs learn how to ingest data.
- Next Module: Producers & Ingestion
- Reference: Kafka Glossary