The Disk Structure
Kafka is famous for being able to saturate the network while writing to disk. How? By relying on sequential I/O and the OS Page Cache.
When you create a topic my-topic with 3 partitions, Kafka creates 3 directories on disk:
```
my-topic-0/
my-topic-1/
my-topic-2/
```
Inside each directory, data is split into Segments.
1. Log Segments
Kafka doesn’t store all messages in one massive file (which would be hard to purge). Instead, it “rolls” a new segment file once the current one reaches a size limit (`segment.bytes`, default 1GB) or an age limit (`segment.ms`, default 7 days).
A segment consists of:
- `00000.log`: The actual messages.
- `00000.index`: Maps Offsets → Physical Byte Position.
- `00000.timeindex`: Maps Timestamp → Offset.
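Segment files are named after the offset of the first message they contain, zero-padded to 20 digits. A minimal sketch of that naming scheme:

```python
def segment_name(base_offset, suffix="log"):
    """Build a Kafka-style segment file name: the base offset of the
    segment's first message, zero-padded to 20 digits."""
    return f"{base_offset:020d}.{suffix}"

print(segment_name(0))              # 00000000000000000000.log
print(segment_name(4000, "index"))  # 00000000000000004000.index
```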
[!TIP] Why Sequential I/O? Hard disks (and even SSDs) are fastest when writing linearly. Kafka is an “Append Only” log, which means it avoids slow random writes.
2. Hardware Reality: Zero Copy & Page Cache
How does Kafka send data so fast? It avoids copying bytes into the JVM (“User Space”).
The sendfile System Call
Traditional data transfer involves 4 copies:
- Disk → Kernel Buffer
- Kernel Buffer → Application Buffer (User Space)
- Application Buffer → Socket Buffer (Kernel Space)
- Socket Buffer → NIC Buffer
Kafka uses the sendfile (Zero Copy) system call:
- Disk → Kernel Buffer (Page Cache)
- Kernel Buffer → NIC Buffer
Result: Data never enters the JVM heap. This reduces CPU usage and Garbage Collection overhead significantly.
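Kafka's brokers do this via Java's `FileChannel.transferTo`, but you can observe the same `sendfile(2)` path from Python, whose `socket.sendfile` delegates to `os.sendfile` where the OS supports it. A minimal sketch (the file contents and the socket pair standing in for a broker → consumer connection are illustrative):

```python
import os
import socket
import tempfile

# Write sample data to a file -- a stand-in for a Kafka log segment.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"kafka-segment-bytes" * 100)
    path = f.name

# A connected socket pair stands in for the broker -> consumer connection.
server, client = socket.socketpair()

with open(path, "rb") as segment:
    # socket.sendfile uses the sendfile(2) syscall where available:
    # bytes flow kernel buffer -> socket without entering user space.
    sent = server.sendfile(segment)
server.close()

# Drain the receiving end to confirm the bytes arrived intact.
received = b""
while chunk := client.recv(65536):
    received += chunk
client.close()
os.unlink(path)

print(sent)  # 1900 bytes transferred, none of them copied through user space
```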
The Page Cache
Kafka relies heavily on the OS Page Cache (RAM).
- Write: Kafka writes to the filesystem cache. The OS flushes to disk in the background.
- Read: If consumers are caught up, they read directly from RAM (Page Cache).
- Impact: You don’t need a massive JVM Heap. Give memory to the OS instead!
3. The Index: Fast Lookups
If a consumer asks for “Offset 5000”, Kafka doesn’t scan the whole .log file.
- It checks the Offset Index (`.index`).
- The index is a Sparse Index. It does not have an entry for every message; it might have entries for offsets 4000, 4100, 4200, and so on.
- Kafka performs a Binary Search on the index to find the nearest position ≤ 5000 (e.g., 4900).
- It jumps to that byte position in the `.log` file and scans forward to find 5000.
Why Sparse? It keeps the index small enough to fit entirely in RAM (Page Cache).
4. Log Compaction
By default, Kafka deletes old segments (cleanup.policy=delete). But what if you want to keep the latest state for every key forever?
Log Compaction (cleanup.policy=compact) ensures that Kafka retains at least the last known value for each message key.
- Use Case: Restoring state (e.g., a “User Profile” table). You don’t care about the user’s address 5 years ago, only the current one.
- Mechanism: A background thread (Cleaner) scans the log and removes records where a newer record with the same key exists.
[!WARNING] Tombstones: To delete a key in a compacted topic, you write a message with a `null` value (a tombstone). Kafka eventually removes the key entirely.
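The cleaner's effect can be modeled in a few lines. This is a simplified sketch (the real cleaner works segment by segment and retains tombstones for a configurable period before dropping them); the keys and values are made up:

```python
def compact(records):
    """Model of log compaction: keep only the newest value per key.
    A value of None models a tombstone; after its retention period,
    the cleaner drops the key entirely."""
    latest = {}
    for key, value in records:  # records arrive in offset order, oldest first
        latest[key] = value     # a newer record shadows the older one
    return {k: v for k, v in latest.items() if v is not None}

log = [
    ("user-1", "addr=Old St"),
    ("user-2", "addr=Elm St"),
    ("user-1", "addr=New St"),  # shadows user-1's old address
    ("user-2", None),           # tombstone: delete user-2
]
print(compact(log))  # {'user-1': 'addr=New St'}
```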
5. Interactive: Index Lookup Simulator
Simulate how Kafka finds a message. It performs a Binary Search on the sparse index to find the closest offset, then scans the log.
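The simulator's logic can be sketched in a few lines of Python. The index entries below are illustrative values, not real Kafka internals:

```python
import bisect

# A sparse offset index: (offset, byte position in the .log file).
# Only some offsets have entries -- that's what keeps the index small.
sparse_index = [(4000, 0), (4100, 4096), (4200, 8192),
                (4400, 16384), (4900, 36864)]

def lookup(target_offset):
    """Binary-search the sparse index for the nearest entry <= target.
    Kafka would then seek to that byte position in the .log file and
    scan forward until it reaches the target offset."""
    offsets = [entry[0] for entry in sparse_index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    if i < 0:
        return None  # target precedes the first indexed offset
    return sparse_index[i]

print(lookup(5000))  # (4900, 36864): start the forward scan here
```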
6. Code: Configuring Retention
Here is how you configure log compaction and retention policies.
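A sketch using Kafka's bundled CLI tools (the topic name `user-profiles` and the broker address are placeholders):

```shell
# Create a compacted topic -- Kafka keeps at least the latest value per key.
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic user-profiles \
  --partitions 3 --replication-factor 1 \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.5

# Or change retention on an existing topic:
# keep data 7 days (retention.ms) and roll segments at 1GB (segment.bytes).
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name my-topic \
  --add-config retention.ms=604800000,segment.bytes=1073741824
```

Note that compaction only runs on closed segments, so a smaller `segment.bytes` makes the cleaner kick in sooner at the cost of more files.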