Kafka & Streaming
In 2011, LinkedIn’s infrastructure team was drowning in data. Their Hadoop jobs processed user activity (profile views, connection clicks, skill endorsements) 24 hours after it happened, using batch ETL pipelines. By the time the “People You May Know” recommendation updated, the underlying data was a day old. The team (Jay Kreps, Neha Narkhede, and Jun Rao) invented Kafka to solve this. Named after the author of “The Metamorphosis” (they liked naming a “write-heavy” system after a writer), Kafka was a fundamentally different idea: instead of a queue that deletes messages after consumption, treat every event as a durable, replayable log. LinkedIn open-sourced it in 2011. By 2019, more than one-third of all Fortune 500 companies ran Kafka. In 2021, Confluent (founded by the Kafka inventors) went public at a $9B valuation. Kafka is now used by over 80% of the companies doing large-scale stream processing, all from a problem LinkedIn had with stale recommendation data.
[!IMPORTANT] In this lesson, you will master:
- The Log Abstraction: Why a simple append-only file is the most powerful weapon in high-throughput distributed systems.
- Hardware Acceleration: Exploiting Sequential Disk I/O, Page Cache, and Zero-Copy to reach 1Gbps+ network saturation.
- Distributed Coordination: Mastering Partitions, Consumer Group Rebalancing, and In-Sync Replicas (ISR) for high availability.
1. The Beginner’s Guide: What is Kafka?
If you’ve never used Kafka, the easiest way to understand it is to think of a Diary (or a Tape Recorder).
When you write in a diary:
- You append new entries to the end of the book.
- You never erase or modify old entries (they are historical facts).
- Anyone can read your diary from page 1 to the end, or skip to chapter 5 and start reading from there.
This is exactly how Kafka works. It is not a traditional “queue” where messages are deleted after being read. It is a Distributed Append-Only Log.
Queue vs Stream
Most developers confuse traditional Message Queues (like RabbitMQ) with Event Streams (like Kafka). They solve fundamentally different problems.
- RabbitMQ (The Mailbox/Queue):
- Model: Smart Broker, Dumb Consumer.
- Behavior: You take a letter out, and it’s gone (deleted).
- Use Case: Task processing, Job Queues (e.g., “Resize this image”). The goal is to get a task done once.
- Kafka (The Diary/Stream):
- Model: Dumb Broker, Smart Consumer.
- Behavior: Messages are appended to a log. You can listen, rewind, and listen again. Messages stick around for days (retention).
- Use Case: Event Sourcing, Analytics, Metrics, Change Data Capture (CDC). The goal is to record a history of facts so multiple different systems can react to them.
[!TIP] Kafka is essentially a distributed Write-Ahead Log (WAL). If you understand how a database persists data (see WAL), you understand Kafka.
2. Architecture: The Log
Kafka’s core abstraction is the Log—an append-only sequence of records.
A. Topic & Partition
A Topic is a named stream of events. For scale, each topic is split into multiple Partitions, and each partition is an independent append-only log.
- Hash Key: `Partition = hash(Key) % NumPartitions`. All messages for `User:123` go to the same partition (e.g., P0).
- Ordering Guarantee: Kafka only guarantees strict ordering within a single partition. There is no global order across the entire topic. If order matters (e.g., sequentially processing `DEBIT` and `CREDIT` events for `user_id=123`), you MUST use a Partition Key to ensure all events for that entity land in the same partition and are processed sequentially by the same consumer.
[!NOTE] Hardware-First Intuition: The “Partition Skew” Hotspot. While Kafka scales horizontally, each Partition is physically stored on one server. If you shard by `Country` and 90% of your users are from the US, one single hardware node (Disk + NIC) will handle 90% of the traffic, while the other 9 nodes sit idle. This is a Hardware Bottleneck caused by bad software logic. Always choose a high-cardinality key (like `user_id`) to ensure your hardware load is evenly distributed across the cluster’s physical backplane.
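To make the formula concrete, here is a minimal, self-contained sketch of key-based partitioning. The `Partitioner` class and its polynomial hash are illustrative inventions; Kafka's default partitioner actually hashes the key bytes with murmur2. The property that matters is the same either way: a stable hash means the same key always lands in the same partition.

```java
import java.nio.charset.StandardCharsets;

public class Partitioner {
    // Illustrative sketch only: a stable hash of the key, modulo the
    // partition count. Kafka's real default partitioner uses murmur2.
    static int partitionFor(String key, int numPartitions) {
        int hash = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff); // stable polynomial hash
        }
        return (hash & 0x7fffffff) % numPartitions; // drop sign bit, then mod
    }

    public static void main(String[] args) {
        // Every event for user:123 resolves to the same partition,
        // so per-key ordering is preserved.
        int p1 = partitionFor("user:123", 3);
        int p2 = partitionFor("user:123", 3);
        System.out.println(p1 == p2); // true
    }
}
```

Note the sign-bit mask: Java's `%` can return negative values for negative hashes, which would be an invalid partition index.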
B. The Offset
Each message in a partition has a unique ID called the Offset.
- The Consumer tracks what it has read.
- “I’ve read up to Offset 500 in Partition 0.”
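The bookkeeping can be sketched in a few lines. The `OffsetTracker` class below is hypothetical; real consumers commit offsets to Kafka's internal `__consumer_offsets` topic, but the model is the same: the consumer, not the broker, remembers its read position.

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetTracker {
    // The consumer tracks "read up to offset X" per partition;
    // the broker keeps no per-consumer read state (dumb broker).
    private final Map<Integer, Long> committed = new HashMap<>();

    void commit(int partition, long offset) {
        committed.put(partition, offset);
    }

    long resumeFrom(int partition) {
        // A brand-new consumer has no committed offset: start at 0.
        return committed.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker();
        tracker.commit(0, 500L);                    // "I've read up to Offset 500 in P0"
        System.out.println(tracker.resumeFrom(0));  // 500
    }
}
```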
C. Consumer Groups (The Secret Sauce)
How do we scale processing? We add consumers to a Consumer Group.
- Rule: A partition is consumed by exactly one consumer in the group.
- Implication: If you have 10 partitions, you can have AT MOST 10 active consumers. The 11th consumer will be idle.
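The assignment rule can be simulated without a broker. This `GroupAssignment` helper is an illustrative round-robin sketch, not Kafka's actual assignor, but it demonstrates why an 11th consumer on a 10-partition topic sits idle:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupAssignment {
    // Round-robin sketch of the rule: each partition goes to exactly
    // one consumer in the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        for (String c : consumers) plan.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            plan.get(consumers.get(p % consumers.size())).add(p);
        }
        return plan;
    }

    public static void main(String[] args) {
        // 10 partitions, 11 consumers: the 11th gets nothing.
        List<String> group = new ArrayList<>();
        for (int i = 1; i <= 11; i++) group.add("C" + i);
        Map<String, List<Integer>> plan = assign(group, 10);
        System.out.println(plan.get("C11").isEmpty()); // true: idle consumer
    }
}
```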
3. Interactive Demo: The Kafka Cluster Simulator
Visualize how Partitions distribute load and how Consumer Groups scale.
[!TIP] Try it yourself: Add Consumers until you have 3 (one for each Partition). Then kill one and watch the Rebalancing process!
4. Under the Hood: Why is Kafka Fast?
Kafka can handle millions of messages/sec on spinning hard disks. How?
A. Sequential I/O
Random Disk Access (seeking) is slow (~100 seeks/sec). Sequential Access is fast (~100s MB/sec). Kafka only appends to the end of the file. It never seeks.
- Result: Disk speeds approach Network speeds.
B. Zero Copy
Standard File Send involves 4 copies and 4 context switches.
- Disk → Kernel Buffer (Read)
- Kernel → User Space (App Buffer)
- User Space → Kernel Socket Buffer (Write)
- Socket Buffer → NIC (Network Card)
Kafka uses the sendfile() system call (Zero Copy):
- Disk → Kernel Buffer
- Kernel Buffer → NIC (via Descriptor)
Visualizing the Savings

| Path | Data Copies | Context Switches |
|---|---|---|
| Standard `read()` + `write()` | 4 | 4 |
| Zero Copy (`sendfile()`) | 2 (DMA copies only) | 2 |
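Java exposes `sendfile()` through `FileChannel.transferTo()`, which is what Kafka's broker uses when serving consumers. The sketch below transfers between two temp files so it stays runnable; in the broker, the destination would be the consumer's `SocketChannel`. The `ZeroCopyDemo` class and its helper names are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    // transferTo() maps to sendfile() on Linux: the kernel moves bytes
    // from the page cache straight to the target channel, never copying
    // them into a user-space buffer.
    static long transfer(Path src, Path dst) {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                                                StandardOpenOption.CREATE)) {
            return in.transferTo(0, in.size(), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Self-contained demo: write a tiny "log segment", then zero-copy it.
    static long demoTransfer() {
        try {
            Path segment = Files.createTempFile("segment", ".log");
            Files.writeString(segment, "offset-0\noffset-1\n"); // 18 bytes
            Path sink = Files.createTempFile("sink", ".log");
            return transfer(segment, sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoTransfer() + " bytes sent"); // 18 bytes sent
    }
}
```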
[!NOTE] Hardware-First Intuition: The “Page Cache” vs. “Heap” Strategy. Most Java apps store data in the JVM Heap, causing massive Garbage Collection (GC) pauses. Kafka is smarter: it uses the JVM only for control logic. The actual data (messages) is stored in the OS Page Cache (off-heap memory). When a consumer requests data, the OS sends it directly from the RAM to the NIC via DMA (Direct Memory Access). This means Kafka can handle 100GB of data with only a 4GB JVM heap, effectively bypassing the “Stop-the-world” hardware penalty.
5. Distributed Coordination: KRaft vs Zookeeper
For a decade, Kafka relied on Zookeeper to store cluster metadata (broker lists, partitions, offsets).
- The Zookeeper Bottleneck: Zookeeper is a separate system with its own hardware requirements. When a large cluster rebalanced, the Zookeeper write throughput would become the bottleneck, leading to “split-brain” scenarios or indefinitely long rebalances.
- KRaft (Kafka Raft): KRaft became production-ready in Kafka 3.3, and Kafka 4.0 removes Zookeeper entirely. Metadata is now stored directly inside Kafka in a special metadata partition.
- Consensus: It uses the Raft algorithm (similar to Paxos) to elect a “Metadata Leader” among the brokers.
- Scale: KRaft allows clusters to scale to millions of partitions, whereas Zookeeper capped out at around 200,000.
[!TIP] Staff Engineer Tip: The KRaft Advantage Upgrading to KRaft isn’t just about operational simplicity. It significantly reduces the Recovery Time Objective (RTO). In a Zookeeper-based cluster, a controller crash could take several minutes to recover metadata. In KRaft, the standby controllers are “hot,” meaning failover happens in milliseconds, keeping your hardware utilized even during catastrophic node failures.
6. Durability & Reliability
How do we ensure we never lose data?
A. Replication (ISR)
Each partition has one Leader and multiple Followers.
- Leader: Handles all Reads and Writes.
- Follower: Passively replicates data from the Leader.
- ISR (In-Sync Replicas): The set of followers that are “caught up” with the leader. If a leader dies, only an ISR member can become the new leader.
B. Producer Acks
The producer decides how safe the write must be.
- acks=0 (Fire & Forget): Send and don’t wait. Fastest, but data loss possible if broker crashes immediately.
- acks=1 (Leader Ack): Wait for the Leader to append the record to its log. Safe if the Leader stays alive.
- acks=all (Strongest): Wait for Leader AND all ISRs to acknowledge. Zero data loss.
[!TIP] Trade-off: `acks=all` is slower (higher latency) but safest. Use it for Payments. Use `acks=1` for Metrics.
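In client code the choice is just a producer property. The two profiles below sketch the trade-off; the config keys are real Kafka producer settings, while the `ProducerConfigs` class and profile names are illustrative:

```java
import java.util.Properties;

public class ProducerConfigs {
    // "Payments" profile: maximum durability.
    static Properties durable() {
        Properties p = new Properties();
        p.put("acks", "all");                // wait for the full ISR to ack
        p.put("enable.idempotence", "true"); // retries cannot create duplicates
        return p;
    }

    // "Metrics" profile: lower latency, small loss window on leader failure.
    static Properties fast() {
        Properties p = new Properties();
        p.put("acks", "1");      // leader-only acknowledgement
        p.put("linger.ms", "0"); // send immediately, no batching delay
        return p;
    }

    public static void main(String[] args) {
        System.out.println(durable().getProperty("acks")); // all
        System.out.println(fast().getProperty("acks"));    // 1
    }
}
```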
7. Log Compaction: Keeping the Latest State
Kafka can act as a database (KTable). With Log Compaction, Kafka runs a background process that keeps only the latest value for each key.
Example: User Profiles
- Input Stream:
  - `Key: u_101, Value: {name: "Alice"}`
  - `Key: u_101, Value: {name: "Alice", city: "NYC"}` (Update)
  - `Key: u_102, Value: {name: "Bob"}`
- Compacted Log (What is saved):
  - `Key: u_101, Value: {name: "Alice", city: "NYC"}`
  - `Key: u_102, Value: {name: "Bob"}`
- Use Case: Restoring application state after a crash without replaying years of history.
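Compaction is easy to reason about as "replay the log into a map, keeping the last value per key." The `CompactionSketch` class below is an illustrative model of that semantics; the real log cleaner works segment by segment in the background rather than replaying anything.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompactionSketch {
    // Log compaction keeps only the newest record per key. Replaying a
    // key/value log into a map yields exactly the compacted view.
    static Map<String, String> compact(List<String[]> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] kv : log) latest.put(kv[0], kv[1]); // newer overwrites older
        return latest;
    }

    public static void main(String[] args) {
        List<String[]> log = List.of(
            new String[]{"u_101", "{name: \"Alice\"}"},
            new String[]{"u_101", "{name: \"Alice\", city: \"NYC\"}"}, // update
            new String[]{"u_102", "{name: \"Bob\"}"});
        // Only the latest value per key survives.
        System.out.println(compact(log).size()); // 2
    }
}
```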
[!NOTE] War Story: The New York Times and Log Compaction When The New York Times re-architected their CMS, they didn’t just use Kafka as a temporary message queue. They used Kafka as the Source of Truth for every article published since 1851. By configuring a topic with Log Compaction, they created a permanent, replayable history. If they spin up a new search indexing microservice tomorrow, they simply point it to offset 0 of the compacted topic. It reads chronologically from 1851 to today, building its database without ever querying the primary monolithic database. Kafka isn’t just moving their data; it is their database.
8. Deep Dive: Exactly-Once Semantics (Transactions)
Kafka supports Exactly-Once Semantics (EOS) for critical use cases like payments, order processing, and financial transactions.
The Problem: Duplicates from Retries
Scenario: Producer sends OrderCreated event. Network hiccups. Producer retries. Kafka gets it twice.
Without EOS:
Producer → [OrderCreated] → Kafka (Ack Lost)
Producer → [OrderCreated] → Kafka (Retry, Duplicate!)
Consumer processes order TWICE → Double charge 💸
Solution 1: Idempotent Producer
Config: enable.idempotence=true (default in Kafka 3.0+)
How it Works:
- Producer assigns each message a unique sequence number
- Broker tracks `(ProducerId, Partition, SeqNum)` in memory
- If a duplicate arrives, the broker silently discards it
Guarantee: Retries no longer create duplicates; at-least-once delivery becomes effectively exactly-once within a single partition.
Limitation: Only works for single partition writes. If you write to multiple partitions, they’re not atomic.
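The broker-side dedup above can be sketched as a toy in-memory broker. The `DedupSketch` class and its single sequence counter per producer/partition are simplifications of the real protocol (which tracks a window of recent batches), but the discard rule is the same:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {
    // The broker remembers the highest sequence number it has accepted
    // per (producerId, partition) and silently drops anything older.
    private final Map<String, Integer> lastSeq = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    boolean append(String producerId, int partition, int seq, String msg) {
        String key = producerId + "/" + partition;
        int last = lastSeq.getOrDefault(key, -1);
        if (seq <= last) return false; // duplicate retry: discard
        lastSeq.put(key, seq);
        log.add(msg);
        return true;
    }

    public static void main(String[] args) {
        DedupSketch broker = new DedupSketch();
        broker.append("p1", 0, 0, "OrderCreated");                    // accepted
        boolean retried = broker.append("p1", 0, 0, "OrderCreated");  // retry!
        System.out.println(retried);          // false: duplicate dropped
        System.out.println(broker.log.size()); // 1: no double charge
    }
}
```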
Solution 2: Transactional API (Multi-Partition Atomicity)
Use Case: Transfer $100 from Account A to Account B. You need to:
- Write `DebitEvent` to the `account-A` topic
- Write `CreditEvent` to the `account-B` topic
- Atomic: Both succeed or both fail
Code Example (Java):
```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "broker:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("transactional.id", "payment-service-1"); // Unique per instance
props.put("enable.idempotence", "true");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions(); // Register with coordinator

try {
    producer.beginTransaction();
    // All-or-nothing batch
    producer.send(new ProducerRecord<>("account-A", "user:123", debitJson));
    producer.send(new ProducerRecord<>("account-B", "user:456", creditJson));
    producer.send(new ProducerRecord<>("orders", "order:803", orderJson));
    producer.commitTransaction(); // Atomic commit
} catch (Exception e) {
    producer.abortTransaction(); // Rollback all
}
```
What Happens Under the Hood:
- Producer writes to all partitions with a transaction marker
- Broker holds messages in pending state (not visible to consumers yet)
- On `commitTransaction()`, the transaction coordinator records the commit in the internal `__transaction_state` topic and writes COMMIT markers to the data partitions
- Messages become visible atomically
Isolation Levels (Consumer Side)
Config: isolation.level=read_committed (default: read_uncommitted)
| Isolation Level | Behavior | Use Case |
|---|---|---|
| read_uncommitted | Reads all messages (even pending transactions) | High throughput, eventual consistency OK |
| read_committed | Only reads committed messages | Financial systems, strong consistency |
Trade-off: read_committed adds ~10ms latency (waits for transaction markers)
Production Pattern: E-Commerce Order Flow
Without Transactions (Risky):
1. Deduct inventory (Kafka write)
2. Network fails
3. Charge payment (Kafka write succeeds)
4. Customer charged, but item still in stock → Oversold!
With Transactions (Safe):
```java
producer.beginTransaction();
// Step 1: Reserve inventory
producer.send(new ProducerRecord<>(inventoryTopic, "item-123", "RESERVE:qty-1"));
// Step 2: Charge payment
producer.send(new ProducerRecord<>(paymentTopic, "user-789", "CHARGE:$99"));
// Step 3: Create order
producer.send(new ProducerRecord<>(ordersTopic, "order-456", "CONFIRMED"));
producer.commitTransaction(); // All 3 succeed or all 3 fail
```
Result: No partial state. Either order fully succeeds or fully fails.
[!IMPORTANT] Interview Insight: Kafka transactions are NOT like database ACID transactions. They guarantee atomicity within Kafka, but if your consumer crashes after reading, you still need idempotent processing downstream.
9. Deep Dive: Consumer Group Rebalancing Strategies
When a consumer joins or leaves a group, Kafka rebalances partition ownership. This can cause downtime.
Rebalancing Protocol Comparison
| Strategy | Downtime | How It Works | Config | Use Case |
|---|---|---|---|---|
| Eager (Range/RoundRobin) | ~30s stop-the-world | All consumers stop, reassign all partitions, resume | `partition.assignment.strategy=range` | Small groups (<10 consumers) |
| Cooperative (Incremental) | 0s (gradual) | Only affected partitions are reassigned | `partition.assignment.strategy=cooperative-sticky` | Large groups (>100 consumers) |
| Static Group Membership | 0s (on planned restart) | Consumer keeps same ID across restarts | `group.instance.id=pod-name` | Kubernetes rolling deploys |
Eager Rebalancing (Default, Legacy)
Flow:
- Consumer C3 crashes
- All consumers stop processing (revoke all partitions)
- Coordinator reassigns partitions
- All consumers resume
Downtime: ~10-30 seconds (depends on session.timeout.ms)
Problem: If you have 100 consumers and 1 crashes, all 100 stop. Overkill.
Cooperative Rebalancing (Kafka 2.4+, Recommended)
Flow:
- Consumer C3 crashes (owned P5, P8)
- Only P5 and P8 are revoked
- Coordinator reassigns P5→C1, P8→C4
- Other 97 consumers keep processing
Downtime: 0 seconds for unaffected partitions
Config:
partition.assignment.strategy=cooperative-sticky
[!TIP] Production Default: Always use `cooperative-sticky` for zero-downtime deployments.
Static Group Membership (Session Pinning)
Problem: Kubernetes rolling deploy restarts pods. Each restart triggers rebalance.
Solution: Assign a static group instance ID:
group.instance.id=payment-consumer-pod-7
Behavior:
- Consumer restarts with same ID → No rebalance (partitions retained)
- Coordinator waits `session.timeout.ms` (default: 45s) before reassigning
- If the pod comes back within 45s, it resumes seamlessly
Use Case: Kubernetes StatefulSets, planned maintenance
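Putting the two recommendations together, a consumer running in a Kubernetes pod might be configured as below. The config keys are real Kafka consumer settings; the `ConsumerConfigSketch` class, group name, and pod name are placeholders:

```java
import java.util.Properties;

public class ConsumerConfigSketch {
    // Cooperative rebalancing + static membership: pod restarts within
    // session.timeout.ms do not trigger a rebalance at all.
    static Properties forPod(String podName) {
        Properties p = new Properties();
        p.put("group.id", "payment-service");
        p.put("partition.assignment.strategy",
              "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        p.put("group.instance.id", podName); // static ID survives restarts
        p.put("session.timeout.ms", "45000"); // grace window for the pod to return
        return p;
    }

    public static void main(String[] args) {
        Properties p = forPod("payment-consumer-pod-7");
        System.out.println(p.getProperty("group.instance.id")); // payment-consumer-pod-7
    }
}
```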
10. Production Configuration Guide
Broker-Level Tuning
| Config | Description | Default | Tuned (High Throughput) |
|---|---|---|---|
| `num.network.threads` | Threads handling requests | 3 | 8 (for high concurrency) |
| `num.io.threads` | Threads writing to disk | 8 | 16 (SSD clusters) |
| `socket.send.buffer.bytes` | TCP send buffer | 100KB | 1MB (10Gbps networks) |
| `log.segment.bytes` | Max segment file size | 1GB | 1GB (keep default) |
Producer Tuning: Latency vs Throughput
| Config | Low Latency | High Throughput | Why |
|---|---|---|---|
| linger.ms | 0 | 100 | Wait time before sending batch (batching improves compression) |
| batch.size | 16KB | 1MB | Larger batches = better compression ratio |
| compression.type | none | zstd | zstd is fastest modern codec (Kafka 2.1+) |
| acks | 1 | all | 1=leader only (fast), all=ISR quorum (durable) |
| buffer.memory | 32MB | 128MB | Total memory for pending sends |
Compression Benchmark (1GB dataset):
- none: 1000 MB, 100 MB/s
- gzip: 350 MB, 40 MB/s (high CPU)
- lz4: 450 MB, 90 MB/s (balanced)
- zstd: 300 MB, 95 MB/s (best ratio + speed)
[!TIP] Interview Tip: Use `linger.ms=10` and `batch.size=256KB` as a balanced starting point. Tune based on P99 latency requirements.
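As a sketch, the balanced starting point from the tip above expressed as producer properties. The keys are real producer configs; the values are a starting point to tune against your P99 targets, not benchmark-derived constants:

```java
import java.util.Properties;

public class BalancedProducerTuning {
    static Properties balanced() {
        Properties p = new Properties();
        p.put("linger.ms", "10");          // wait up to 10ms to fill a batch
        p.put("batch.size", "262144");     // 256KB batches compress better
        p.put("compression.type", "zstd"); // best ratio/speed trade-off (Kafka 2.1+)
        p.put("acks", "all");              // durable writes
        return p;
    }

    public static void main(String[] args) {
        System.out.println(balanced().getProperty("batch.size")); // 262144
    }
}
```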
Consumer Tuning: Polling Behavior
| Config | Description | Default | Tuned |
|---|---|---|---|
| fetch.min.bytes | Min data to fetch before returning | 1 byte | 1MB (reduce poll frequency) |
| fetch.max.wait.ms | Max wait if fetch.min.bytes not met | 500ms | 100ms (lower latency) |
| max.poll.records | Max records per poll() | 500 | 100 (if processing is slow) |
| session.timeout.ms | Max time between heartbeats | 45s | 30s (faster failure detection) |
Rule of Thumb:
- Fast processing (e.g., logs): `max.poll.records=500`
- Slow processing (e.g., ML inference): `max.poll.records=10` (avoid rebalance timeout)
11. Summary
- Kafka is a Log: Think of it as a distributed file system, not a queue.
- Partitions: The unit of parallelism. Max Consumers = Max Partitions.
- Durability: Ensured via ISR and Producer Acks (`acks=all`).
- Zero Copy: The secret sauce behind its speed.
- Consumer Groups: Enable horizontal scaling of processing.
- Exactly-Once Semantics: Possible via transactions and idempotent producers.
- Rebalancing: Prefer `cooperative-sticky` for zero-downtime deployments.
- Production Tuning: Balance latency (`linger.ms=0`) vs throughput (`linger.ms=100`, `compression=zstd`).
Staff Engineer Tip: Watch out for Page Faults when consumers fall behind. If a consumer is reading “Fresh” data (at the tail of the log), it hits the RAM Page Cache (60ns latency). If the consumer lags and needs data from 1 hour ago, the OS must fetch it from the Physical Disk (10ms latency). This is the “Tail Latency Cliff.” When one consumer lags, it can cause the broker to start “thrashing” its disk I/O, potentially slowing down all other healthy consumers on that same hardware node.
Mnemonic — “Kafka = Tape Recorder, RabbitMQ = Mailbox”: RabbitMQ = Smart Broker (“I’ll route it for you”), message deleted after consumption, use for tasks. Kafka = Dumb Broker (“Just append it”), messages retained for days, use for events/streams. Kafka speed formula: Sequential I/O (no seek) + Zero Copy (sendfile syscall, 4 copies → 2) + Page Cache (RAM, not Disk). Partition = unit of parallelism. Max Consumers = Max Partitions. ISR = safety net. acks=all = strongest durability. Consumer Lag > 0 = watch for Page Faults cliff.