Kafka & Streaming
[!NOTE] This module explores the core principles of Kafka & Streaming, deriving its design decisions from first principles and hardware constraints.
1. Queue vs Stream
Most developers confuse RabbitMQ (Queue) and Kafka (Stream). They solve different problems.
- RabbitMQ (The Mailbox):
- Model: Smart Broker, Dumb Consumer.
- Behavior: You take a letter out, and it’s gone (deleted).
- Use Case: Task processing, Job Queues (e.g., “Resize this image”).
- Kafka (The Tape Recorder):
- Model: Dumb Broker, Smart Consumer.
- Behavior: Messages are appended to a log. You can listen, rewind, and listen again. Messages stick around for days (retention).
- Use Case: Event Sourcing, Analytics, Metrics, Change Data Capture (CDC).
[!TIP] Kafka is essentially a distributed Write-Ahead Log (WAL). If you understand how a database persists data (see WAL), you understand Kafka.
2. Architecture: The Log
Kafka’s core abstraction is the Log—an append-only sequence of records.
A. Topic & Partition
A Topic is a category (e.g., user_clicks). Since one server can’t hold all data, the Topic is split into Partitions.
- Partition: An ordered, immutable sequence of messages.
- Ordering: Guaranteed only within a partition. Global ordering across partitions is NOT guaranteed.
- Hash Key:
`Partition = hash(Key) % NumPartitions`. All messages for `User:123` go to the same partition (e.g., P0).
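The key-to-partition mapping can be sketched in plain Java. This is a simplification: Kafka's default partitioner hashes the serialized key bytes with murmur2, not `hashCode()`, and the class name here is illustrative.

```java
// Simplified sketch of key-based partitioning. Kafka's real default
// partitioner uses murmur2 over the key bytes; hashCode() stands in here.
public class Partitioner {
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the modulo result is never negative
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition,
        // which is what gives per-key ordering.
        System.out.println(partitionFor("User:123", 3) == partitionFor("User:123", 3));
    }
}
```

Note that the mapping only stays stable while `NumPartitions` is fixed, which is why adding partitions to a keyed topic breaks per-key ordering guarantees.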
B. The Offset
Each message in a partition has a unique ID called the Offset.
- The Consumer tracks what it has read.
- “I’ve read up to Offset 500 in Partition 0.”
C. Consumer Groups (The Secret Sauce)
How do we scale processing? We add consumers to a Consumer Group.
- Rule: A partition is consumed by exactly one consumer in the group.
- Implication: If you have 10 partitions, you can have AT MOST 10 active consumers. The 11th consumer will be idle.
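The idle-consumer rule can be demonstrated with a toy round-robin assignment (the real assignors — range, round-robin, sticky — are more sophisticated; consumer names are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy round-robin assignment of partitions to consumers in a group.
// With more consumers than partitions, the extras receive nothing.
public class GroupAssignment {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            // Partition p goes to consumer (p mod groupSize)
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> consumers = new ArrayList<>();
        for (int i = 1; i <= 11; i++) consumers.add("C" + i);
        // 10 partitions, 11 consumers: C11 gets an empty assignment (idle)
        System.out.println(assign(consumers, 10).get("C11"));
    }
}
```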
3. Interactive Demo: The Kafka Cluster Simulator
Visualize how Partitions distribute load and how Consumer Groups scale.
[!TIP] Try it yourself: Add Consumers until you have 3 (one for each Partition). Then kill one and watch the Rebalancing process!
4. Under the Hood: Why is Kafka Fast?
Kafka can handle millions of messages/sec on spinning hard disks. How?
A. Sequential I/O
Random Disk Access (seeking) is slow (~100 seeks/sec). Sequential Access is fast (~100s MB/sec). Kafka only appends to the end of the file. It never seeks.
- Result: Disk speeds approach Network speeds.
B. Zero Copy
Standard File Send involves 4 copies and 4 context switches.
- Disk → Kernel Buffer (Read)
- Kernel → User Space (App Buffer)
- User Space → Kernel Socket Buffer (Write)
- Socket Buffer → NIC (Network Card)
Kafka uses the sendfile() system call (Zero Copy):
- Disk → Kernel Buffer
- Kernel Buffer → NIC (via descriptor)
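Java exposes this path through `FileChannel.transferTo`, which on Linux maps to `sendfile()` when the target is a socket; Kafka uses exactly this call to stream log segments to consumers. The sketch below copies between two files (not a socket) just to demonstrate the API; file names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// transferTo moves bytes channel-to-channel without copying them into
// a user-space buffer (sendfile() on Linux when the target is a socket).
public class ZeroCopyDemo {
    static String copyContents(String data) {
        try {
            Path src = Files.createTempFile("segment", ".log");
            Path dst = Files.createTempFile("copy", ".log");
            Files.write(src, data.getBytes());
            try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
                 FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
                in.transferTo(0, in.size(), out); // no user-space copy
            }
            return new String(Files.readAllBytes(dst));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyContents("offset-0,offset-1,offset-2"));
    }
}
```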
5. Durability & Reliability
How do we ensure we never lose data?
A. Replication (ISR)
Each partition has one Leader and multiple Followers.
- Leader: Handles all Reads and Writes.
- Follower: Passively replicates data from the Leader.
- ISR (In-Sync Replicas): The set of followers that are “caught up” with the leader. If a leader dies, only an ISR member can become the new leader.
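The ISR rule can be modeled in a few lines. This is a toy: in real Kafka the controller performs the election and tracks ISR membership via replica fetch lag; broker names are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of leader election restricted to the ISR: replicas that
// lag behind drop out of the ISR and are not eligible to lead.
public class IsrFailover {
    static String electLeader(String deadLeader, Set<String> isr, List<String> allReplicas) {
        for (String replica : allReplicas) {
            if (!replica.equals(deadLeader) && isr.contains(replica)) {
                return replica; // first caught-up survivor becomes leader
            }
        }
        throw new IllegalStateException("No in-sync replica available (data loss risk)");
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("broker-1", "broker-2", "broker-3");
        // broker-2 is lagging, so it is outside the ISR
        Set<String> isr = new HashSet<>(Arrays.asList("broker-1", "broker-3"));
        System.out.println(electLeader("broker-1", isr, replicas)); // broker-3
    }
}
```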
B. Producer Acks
The producer decides how safe the write must be.
- acks=0 (Fire & Forget): Send and don’t wait. Fastest, but data loss possible if broker crashes immediately.
- acks=1 (Leader Ack): Wait for the Leader to append the record to its local log (page cache, not necessarily flushed to disk). Safe only as long as the Leader stays alive.
- acks=all (Strongest): Wait for the Leader AND all in-sync replicas to acknowledge. No data loss as long as one ISR member survives (pair with min.insync.replicas=2).
[!TIP] Trade-off: `acks=all` is slower (higher latency) but safest. Use it for Payments. Use `acks=1` for Metrics.
6. Log Compaction: Keeping the Latest State
Kafka can act as a database (KTable). With Log Compaction, Kafka runs a background process that keeps only the latest value for each key.
Example: User Profiles
- Input Stream:
Key: u_101, Value: {name: "Alice"}
Key: u_101, Value: {name: "Alice", city: "NYC"} (Update)
Key: u_102, Value: {name: "Bob"}
- Compacted Log (What is saved):
Key: u_101, Value: {name: "Alice", city: "NYC"}
Key: u_102, Value: {name: "Bob"}
- Use Case: Restoring application state after a crash without replaying years of history.
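Replaying a compacted log into a map captures the essence of the mechanism. This is a sketch of the end result only: the real log cleaner compacts segment files in the background, while a consumer rebuilds state exactly as below.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of log compaction semantics: replaying the log into a map
// leaves only the latest value per key, which is what the cleaner keeps.
public class CompactionDemo {
    static Map<String, String> compact(List<String[]> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : log) {
            latest.put(record[0], record[1]); // later offsets win
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String[]> log = Arrays.asList(
            new String[]{"u_101", "{name: \"Alice\"}"},
            new String[]{"u_101", "{name: \"Alice\", city: \"NYC\"}"}, // update
            new String[]{"u_102", "{name: \"Bob\"}"}
        );
        System.out.println(compact(log).get("u_101")); // the NYC version survives
    }
}
```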
7. Deep Dive: Exactly-Once Semantics (Transactions)
Kafka supports Exactly-Once Semantics (EOS) for critical use cases like payments, order processing, and financial transactions.
The Problem: Duplicates from Retries
Scenario: Producer sends OrderCreated event. Network hiccups. Producer retries. Kafka gets it twice.
Without EOS:
Producer → [OrderCreated] → Kafka (Ack Lost)
Producer → [OrderCreated] → Kafka (Retry, Duplicate!)
Consumer processes order TWICE → Double charge 💸
Solution 1: Idempotent Producer
Config: enable.idempotence=true (default in Kafka 3.0+)
How it Works:
- Producer assigns each message a monotonically increasing sequence number
- Broker tracks `(ProducerId, Partition, SeqNum)` in memory
- If a duplicate arrives, the broker silently discards it
Guarantee: At-Least-Once delivery without duplicates, i.e., effectively Exactly-Once within a single partition.
Limitation: Only works for single partition writes. If you write to multiple partitions, they’re not atomic.
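The broker-side deduplication can be sketched as follows. This is a toy model: a real broker tracks metadata for the last few batches per producer and rejects out-of-order sequences rather than just ignoring them; IDs here are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of idempotent-producer dedup: remember the highest sequence
// number seen per (producerId, partition) and drop anything at or below it,
// so a retried send cannot be appended twice.
public class DedupBroker {
    private final Map<String, Integer> lastSeq = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    boolean append(long producerId, int partition, int seq, String msg) {
        String key = producerId + "/" + partition;
        Integer seen = lastSeq.get(key);
        if (seen != null && seq <= seen) {
            return false; // duplicate from a retry: discard silently
        }
        lastSeq.put(key, seq);
        log.add(msg);
        return true;
    }

    public static void main(String[] args) {
        DedupBroker broker = new DedupBroker();
        broker.append(42L, 0, 1, "OrderCreated");
        // The retry carries the same sequence number, so it is rejected
        System.out.println(broker.append(42L, 0, 1, "OrderCreated")); // false
    }
}
```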
Solution 2: Transactional API (Multi-Partition Atomicity)
Use Case: Transfer $100 from Account A to Account B. You need to:
- Write `DebitEvent` to the `account-A` topic
- Write `CreditEvent` to the `account-B` topic
- Atomic: both succeed or both fail
Code Example (Java):
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("transactional.id", "payment-service-1"); // Unique per instance
props.put("enable.idempotence", "true"); // Required for transactions

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions(); // Register with the transaction coordinator

try {
    producer.beginTransaction();
    // All-or-nothing batch
    producer.send(new ProducerRecord<>("account-A", "user:123", debitJson));
    producer.send(new ProducerRecord<>("account-B", "user:456", creditJson));
    producer.send(new ProducerRecord<>("orders", "order:803", orderJson));
    producer.commitTransaction(); // Atomic commit
} catch (Exception e) {
    producer.abortTransaction(); // Roll back all three sends
}
What Happens Under the Hood:
- Producer writes records to the target partitions, tagged with its transactional ID
- Those records are appended to the logs but remain pending (invisible to read_committed consumers)
- On `commitTransaction()`, the transaction coordinator records the decision in the internal `__transaction_state` topic and writes COMMIT markers to each partition
- All messages in the transaction then become visible atomically
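The visibility flip can be modeled with a pending buffer. This is a deliberate simplification: real brokers never buffer separately, they append everything to the log and let read_committed consumers stop at the last stable offset until the marker arrives.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of transactional visibility: sends accumulate in a pending
// buffer, and a read_committed view sees them only after the COMMIT
// marker, all at once.
public class TxnVisibility {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    void send(String msg) { pending.add(msg); }

    void commit() { // COMMIT marker: everything becomes visible atomically
        committed.addAll(pending);
        pending.clear();
    }

    void abort() { pending.clear(); } // ABORT marker: pending writes vanish

    List<String> readCommitted() { return committed; }

    public static void main(String[] args) {
        TxnVisibility topic = new TxnVisibility();
        topic.send("DebitEvent");
        topic.send("CreditEvent");
        System.out.println(topic.readCommitted().size()); // 0 — still pending
        topic.commit();
        System.out.println(topic.readCommitted().size()); // 2 — visible together
    }
}
```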
Isolation Levels (Consumer Side)
Config: isolation.level=read_committed (default: read_uncommitted)
| Isolation Level | Behavior | Use Case |
|---|---|---|
| read_uncommitted | Reads all messages (even pending transactions) | High throughput, eventual consistency OK |
| read_committed | Only reads committed messages | Financial systems, strong consistency |
Trade-off: read_committed adds some latency (~10ms), because consumers wait for a transaction's COMMIT/ABORT marker before delivering its messages.
Production Pattern: E-Commerce Order Flow
Without Transactions (Risky):
1. Deduct inventory (Kafka write)
2. Network fails
3. Charge payment (Kafka write succeeds)
4. Customer charged, but item still in stock → Oversold!
With Transactions (Safe):
producer.beginTransaction();
// Step 1: Reserve inventory
producer.send(new ProducerRecord<>(inventoryTopic, "item-123", "RESERVE:qty-1"));
// Step 2: Charge payment
producer.send(new ProducerRecord<>(paymentTopic, "user-789", "CHARGE:$99"));
// Step 3: Create order
producer.send(new ProducerRecord<>(ordersTopic, "order-456", "CONFIRMED"));
producer.commitTransaction(); // All 3 succeed or all 3 fail
Result: No partial state. Either order fully succeeds or fully fails.
[!IMPORTANT] Interview Insight: Kafka transactions are NOT like database ACID transactions. They guarantee atomicity within Kafka, but if your consumer crashes after reading, you still need idempotent processing downstream.
8. Deep Dive: Consumer Group Rebalancing Strategies
When a consumer joins or leaves a group, Kafka rebalances partition ownership. This can cause downtime.
Rebalancing Protocol Comparison
| Strategy | Downtime | How It Works | Config | Use Case |
|---|---|---|---|---|
| Eager (Range/RoundRobin) | 🔴 ~30s stop-the-world | All consumers stop, reassign all partitions, resume | partition.assignment.strategy=range | Small groups (<10 consumers) |
| Cooperative (Incremental) | 🟢 0s (gradual) | Only affected partitions are reassigned | partition.assignment.strategy=cooperative-sticky | Large groups (>100 consumers) |
| Static Group Membership | 🟢 0s (on planned restart) | Consumer keeps same ID across restarts | group.instance.id=pod-name | Kubernetes rolling deploys |
Eager Rebalancing (Default, Legacy)
Flow:
- Consumer C3 crashes
- All consumers stop processing (revoke all partitions)
- Coordinator reassigns partitions
- All consumers resume
Downtime: ~10-30 seconds (depends on session.timeout.ms)
Problem: If you have 100 consumers and 1 crashes, all 100 stop. Overkill.
Cooperative Rebalancing (Kafka 2.4+, Recommended)
Flow:
- Consumer C3 crashes (owned P5, P8)
- Only P5 and P8 are revoked
- Coordinator reassigns P5→C1, P8→C4
- Other 97 consumers keep processing
Downtime: 0 seconds for unaffected partitions
Config:
partition.assignment.strategy=cooperative-sticky
[!TIP] Production Default: Always use `cooperative-sticky` for zero-downtime deployments.
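The cooperative flow above can be sketched as a reassignment that touches only the dead consumer's partitions. This is a toy model of the sticky idea, not Kafka's actual assignor; consumer and partition names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy cooperative-sticky rebalance: survivors keep their partitions, and
// only the dead consumer's partitions are redistributed.
public class RebalanceDemo {
    static Map<String, List<Integer>> cooperative(Map<String, List<Integer>> current, String dead) {
        Map<String, List<Integer>> next = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : current.entrySet()) {
            if (!e.getKey().equals(dead)) {
                next.put(e.getKey(), new ArrayList<>(e.getValue())); // sticky: keep ownership
            }
        }
        List<Integer> orphaned = current.get(dead);
        List<String> survivors = new ArrayList<>(next.keySet());
        for (int i = 0; i < orphaned.size(); i++) {
            next.get(survivors.get(i % survivors.size())).add(orphaned.get(i));
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> current = new LinkedHashMap<>();
        current.put("C1", new ArrayList<>(Arrays.asList(0, 1)));
        current.put("C2", new ArrayList<>(Arrays.asList(2, 3)));
        current.put("C3", new ArrayList<>(Arrays.asList(4, 5)));
        // C3 dies: only partitions 4 and 5 move; C1 and C2 never stop
        System.out.println(cooperative(current, "C3")); // {C1=[0, 1, 4], C2=[2, 3, 5]}
    }
}
```

An eager rebalance would instead revoke all six partitions from every consumer before reassigning, which is exactly the stop-the-world pause the table describes.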
Static Group Membership (Session Pinning)
Problem: Kubernetes rolling deploy restarts pods. Each restart triggers rebalance.
Solution: Assign a static group instance ID:
group.instance.id=payment-consumer-pod-7
Behavior:
- Consumer restarts with the same ID → No rebalance (partitions retained)
- Coordinator waits `session.timeout.ms` (default: 45s) before reassigning
- If the pod comes back within 45s, it resumes seamlessly
Use Case: Kubernetes StatefulSets, planned maintenance
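As a configuration sketch, static membership only requires two consumer properties. The config keys are real Kafka consumer settings; the broker address, group name, and instance ID are illustrative (in Kubernetes the instance ID would typically be derived from the pod name).

```java
import java.util.Properties;

// Consumer configuration sketch for static group membership:
// a stable group.instance.id lets a restarted consumer reclaim its
// partitions without triggering a rebalance.
public class StaticMembershipConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");           // illustrative
        props.put("group.id", "payment-consumers");             // illustrative
        props.put("group.instance.id", "payment-consumer-pod-7"); // stable per pod
        props.put("session.timeout.ms", "45000"); // grace period before reassignment
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("group.instance.id"));
    }
}
```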
9. Production Configuration Guide
Broker-Level Tuning
| Config | Description | Default | Tuned (High Throughput) |
|---|---|---|---|
| num.network.threads | Threads handling requests | 3 | 8 (for high concurrency) |
| num.io.threads | Threads writing to disk | 8 | 16 (SSD clusters) |
| socket.send.buffer.bytes | TCP send buffer | 100KB | 1MB (10Gbps networks) |
| log.segment.bytes | Max segment file size | 1GB | 1GB (keep default) |
Producer Tuning: Latency vs Throughput
| Config | Low Latency | High Throughput | Why |
|---|---|---|---|
| linger.ms | 0 | 100 | Wait time before sending batch (batching improves compression) |
| batch.size | 16KB | 1MB | Larger batches = better compression ratio |
| compression.type | none | zstd | zstd offers the best ratio/speed balance of the modern codecs (Kafka 2.1+) |
| acks | 1 | all | 1=leader only (fast), all=ISR quorum (durable) |
| buffer.memory | 32MB | 128MB | Total memory for pending sends |
Compression Benchmark (1GB dataset):
- none: 1000 MB, 100 MB/s
- gzip: 350 MB, 40 MB/s (high CPU)
- lz4: 450 MB, 90 MB/s (balanced)
- zstd: 300 MB, 95 MB/s (best ratio + speed)
[!TIP] Interview Tip: Use `linger.ms=10` and `batch.size=256KB` as a balanced starting point. Tune based on P99 latency requirements.
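That balanced starting point translates into producer properties like the following. The config keys are real Kafka producer settings; the broker address is illustrative.

```java
import java.util.Properties;

// Producer configuration sketch for a balanced latency/throughput profile:
// small batching delay, 256KB batches, zstd compression, durable acks.
public class BalancedProducerConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");        // illustrative
        props.put("linger.ms", "10");                        // small batching delay
        props.put("batch.size", String.valueOf(256 * 1024)); // 256KB batches
        props.put("compression.type", "zstd");               // best ratio/speed balance
        props.put("acks", "all");                            // durable writes
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("batch.size")); // 262144
    }
}
```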
Consumer Tuning: Polling Behavior
| Config | Description | Default | Tuned |
|---|---|---|---|
| fetch.min.bytes | Min data to fetch before returning | 1 byte | 1MB (reduce poll frequency) |
| fetch.max.wait.ms | Max wait if fetch.min.bytes not met | 500ms | 100ms (lower latency) |
| max.poll.records | Max records per poll() | 500 | 100 (if processing is slow) |
| session.timeout.ms | Time without heartbeats before a consumer is declared dead | 45s | 30s (faster failure detection) |
Rule of Thumb:
- Fast processing (e.g., logs): `max.poll.records=500`
- Slow processing (e.g., ML inference): `max.poll.records=10` (avoid rebalance timeout)
10. Summary
- Kafka is a Log: Think of it as a distributed, replicated commit log, not a queue.
- Partitions: The unit of parallelism. Max active consumers per group = number of partitions.
- Durability: Ensured via ISR and Producer Acks (`acks=all`).
- Zero Copy: The secret sauce behind its speed.
- Consumer Groups: Enable horizontal scaling of processing.
- Exactly-Once Semantics: Possible via transactions and idempotent producers.
- Rebalancing: Prefer `cooperative-sticky` for zero-downtime deployments.
- Production Tuning: Balance latency (`linger.ms=0`) vs throughput (`linger.ms=100`, `compression.type=zstd`).