Kafka & Streaming
[!NOTE] This module explores the core principles of Kafka & Streaming, deriving its design decisions from first principles and hardware constraints.
1. Queue vs Stream
Most developers confuse RabbitMQ (Queue) and Kafka (Stream). They solve different problems.
- RabbitMQ (The Mailbox):
- Model: Smart Broker, Dumb Consumer.
- Behavior: You take a letter out, and it’s gone (deleted).
- Use Case: Task processing, Job Queues (e.g., “Resize this image”).
- Kafka (The Tape Recorder):
- Model: Dumb Broker, Smart Consumer.
- Behavior: Messages are appended to a log. You can listen, rewind, and listen again. Messages stick around for days (retention).
- Use Case: Event Sourcing, Analytics, Metrics, Change Data Capture (CDC).
[!TIP] Kafka is essentially a distributed Write-Ahead Log (WAL). If you understand how a database persists data (see WAL), you understand Kafka.
2. Architecture: The Log
Kafka’s core abstraction is the Log—an append-only sequence of records.
A. Topic & Partition
A Topic is a category (e.g., user_clicks). Since one server can’t hold all data, the Topic is split into Partitions.
- Partition: An ordered, immutable sequence of messages.
- Ordering: Guaranteed only within a partition. Global ordering across partitions is NOT guaranteed.
- Hash Key:
`Partition = hash(Key) % NumPartitions`. All messages for `User:123` go to the same partition (e.g., P0).
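The key-to-partition mapping can be sketched in plain Java. This is a simplification: Kafka's default partitioner hashes the serialized key bytes with murmur2, not `hashCode()`, and the class name here is illustrative.

```java
// Simplified sketch of key-based partitioning. Kafka's real default
// partitioner uses murmur2 over the key bytes; hashCode() stands in here.
public class Partitioner {
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the modulo result is never negative
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition,
        // which is what gives per-key ordering.
        System.out.println(partitionFor("User:123", 3) == partitionFor("User:123", 3));
    }
}
```

Note that the mapping only stays stable while `NumPartitions` is fixed, which is why adding partitions to a keyed topic breaks per-key ordering guarantees.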
B. The Offset
Each message in a partition has a unique ID called the Offset.
- The Consumer tracks what it has read.
- “I’ve read up to Offset 500 in Partition 0.”
C. Consumer Groups (The Secret Sauce)
How do we scale processing? We add consumers to a Consumer Group.
- Rule: A partition is consumed by exactly one consumer in the group.
- Implication: If you have 10 partitions, you can have AT MOST 10 active consumers. The 11th consumer will be idle.
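The idle-consumer rule can be demonstrated with a toy round-robin assignment (the real assignors — range, round-robin, sticky — are more sophisticated; consumer names are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy round-robin assignment of partitions to consumers in a group.
// With more consumers than partitions, the extras receive nothing.
public class GroupAssignment {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            // Partition p goes to consumer (p mod groupSize)
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> consumers = new ArrayList<>();
        for (int i = 1; i <= 11; i++) consumers.add("C" + i);
        // 10 partitions, 11 consumers: C11 gets an empty assignment (idle)
        System.out.println(assign(consumers, 10).get("C11"));
    }
}
```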
3. Interactive Demo: The Kafka Cluster Simulator
Visualize how Partitions distribute load and how Consumer Groups scale.
[!TIP] Try it yourself: Add Consumers until you have 3 (one for each Partition). Then kill one and watch the Rebalancing process!
4. Under the Hood: Why is Kafka Fast?
Kafka can handle millions of messages/sec on spinning hard disks. How?
A. Sequential I/O
Random Disk Access (seeking) is slow (~100 seeks/sec). Sequential Access is fast (~100s MB/sec). Kafka only appends to the end of the file. It never seeks.
- Result: Disk speeds approach Network speeds.
B. Zero Copy
Standard File Send involves 4 copies and 4 context switches.
- Disk → Kernel Buffer (Read)
- Kernel → User Space (App Buffer)
- User Space → Kernel Socket Buffer (Write)
- Socket Buffer → NIC (Network Card)
Kafka uses the sendfile() system call (Zero Copy):
- Disk → Kernel Buffer
- Kernel Buffer → NIC (via descriptor)
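Java exposes this path through `FileChannel.transferTo`, which on Linux maps to `sendfile()` when the target is a socket; Kafka uses exactly this call to stream log segments to consumers. The sketch below copies between two files (not a socket) just to demonstrate the API; file names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// transferTo moves bytes channel-to-channel without copying them into
// a user-space buffer (sendfile() on Linux when the target is a socket).
public class ZeroCopyDemo {
    static String copyContents(String data) {
        try {
            Path src = Files.createTempFile("segment", ".log");
            Path dst = Files.createTempFile("copy", ".log");
            Files.write(src, data.getBytes());
            try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
                 FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
                in.transferTo(0, in.size(), out); // no user-space copy
            }
            return new String(Files.readAllBytes(dst));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyContents("offset-0,offset-1,offset-2"));
    }
}
```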
5. Durability & Reliability
How do we ensure we never lose data?
A. Replication (ISR)
Each partition has one Leader and multiple Followers.
- Leader: Handles all Reads and Writes.
- Follower: Passively replicates data from the Leader.
- ISR (In-Sync Replicas): The set of followers that are “caught up” with the leader. If a leader dies, only an ISR member can become the new leader.
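The ISR rule can be modeled in a few lines. This is a toy: in real Kafka the controller performs the election and tracks ISR membership via replica fetch lag; broker names are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of leader election restricted to the ISR: replicas that
// lag behind drop out of the ISR and are not eligible to lead.
public class IsrFailover {
    static String electLeader(String deadLeader, Set<String> isr, List<String> allReplicas) {
        for (String replica : allReplicas) {
            if (!replica.equals(deadLeader) && isr.contains(replica)) {
                return replica; // first caught-up survivor becomes leader
            }
        }
        throw new IllegalStateException("No in-sync replica available (data loss risk)");
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("broker-1", "broker-2", "broker-3");
        // broker-2 is lagging, so it is outside the ISR
        Set<String> isr = new HashSet<>(Arrays.asList("broker-1", "broker-3"));
        System.out.println(electLeader("broker-1", isr, replicas)); // broker-3
    }
}
```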
B. Producer Acks
The producer decides how safe the write must be.
- acks=0 (Fire & Forget): Send and don’t wait. Fastest, but data loss possible if broker crashes immediately.
- acks=1 (Leader Ack): Wait for the Leader to append the record to its local log (page cache, not necessarily flushed to disk). Safe only as long as the Leader stays alive.
- acks=all (Strongest): Wait for the Leader AND all in-sync replicas to acknowledge. No data loss as long as one ISR member survives (pair with min.insync.replicas=2).
[!TIP] Trade-off: `acks=all` is slower (higher latency) but safest. Use it for Payments. Use `acks=1` for Metrics.
6. Log Compaction: Keeping the Latest State
Kafka can act as a database (KTable). With Log Compaction, Kafka runs a background process that keeps only the latest value for each key.
Example: User Profiles
- Input Stream:
Key: u_101, Value: {name: "Alice"}
Key: u_101, Value: {name: "Alice", city: "NYC"} (Update)
Key: u_102, Value: {name: "Bob"}
- Compacted Log (What is saved):
Key: u_101, Value: {name: "Alice", city: "NYC"}
Key: u_102, Value: {name: "Bob"}
- Use Case: Restoring application state after a crash without replaying years of history.
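Replaying a compacted log into a map captures the essence of the mechanism. This is a sketch of the end result only: the real log cleaner compacts segment files in the background, while a consumer rebuilds state exactly as below.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of log compaction semantics: replaying the log into a map
// leaves only the latest value per key, which is what the cleaner keeps.
public class CompactionDemo {
    static Map<String, String> compact(List<String[]> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : log) {
            latest.put(record[0], record[1]); // later offsets win
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String[]> log = Arrays.asList(
            new String[]{"u_101", "{name: \"Alice\"}"},
            new String[]{"u_101", "{name: \"Alice\", city: \"NYC\"}"}, // update
            new String[]{"u_102", "{name: \"Bob\"}"}
        );
        System.out.println(compact(log).get("u_101")); // the NYC version survives
    }
}
```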
7. Deep Dive: Exactly-Once Semantics (Transactions)
Kafka supports Exactly-Once Semantics (EOS) for critical use cases like payments, order processing, and financial transactions.
The Problem: Duplicates from Retries
Scenario: Producer sends OrderCreated event. Network hiccups. Producer retries. Kafka gets it twice.
Without EOS:
Producer → [OrderCreated] → Kafka (Ack Lost)
Producer → [OrderCreated] → Kafka (Retry, Duplicate!)
Consumer processes order TWICE → Double charge 💸
Solution 1: Idempotent Producer
Config: enable.idempotence=true (default in Kafka 3.0+)
How it Works:
- Producer assigns each message a monotonically increasing sequence number
- Broker tracks `(ProducerId, Partition, SeqNum)` in memory
- If a duplicate arrives, the broker silently discards it
Guarantee: At-Least-Once delivery without duplicates, i.e., effectively Exactly-Once within a single partition.
Limitation: Only works for single partition writes. If you write to multiple partitions, they’re not atomic.
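The broker-side deduplication can be sketched as follows. This is a toy model: a real broker tracks metadata for the last few batches per producer and rejects out-of-order sequences rather than just ignoring them; IDs here are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of idempotent-producer dedup: remember the highest sequence
// number seen per (producerId, partition) and drop anything at or below it,
// so a retried send cannot be appended twice.
public class DedupBroker {
    private final Map<String, Integer> lastSeq = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    boolean append(long producerId, int partition, int seq, String msg) {
        String key = producerId + "/" + partition;
        Integer seen = lastSeq.get(key);
        if (seen != null && seq <= seen) {
            return false; // duplicate from a retry: discard silently
        }
        lastSeq.put(key, seq);
        log.add(msg);
        return true;
    }

    public static void main(String[] args) {
        DedupBroker broker = new DedupBroker();
        broker.append(42L, 0, 1, "OrderCreated");
        // The retry carries the same sequence number, so it is rejected
        System.out.println(broker.append(42L, 0, 1, "OrderCreated")); // false
    }
}
```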
Solution 2: Transactional API (Multi-Partition Atomicity)
Use Case: Transfer $100 from Account A to Account B. You need to:
- Write `DebitEvent` to the `account-A` topic
- Write `CreditEvent` to the `account-B` topic
- Atomic: both succeed or both fail
Code Example (Java):
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("transactional.id", "payment-service-1"); // Unique per instance
props.put("enable.idempotence", "true"); // Required for transactions

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions(); // Register with the transaction coordinator

try {
    producer.beginTransaction();
    // All-or-nothing batch
    producer.send(new ProducerRecord<>("account-A", "user:123", debitJson));
    producer.send(new ProducerRecord<>("account-B", "user:456", creditJson));
    producer.send(new ProducerRecord<>("orders", "order:803", orderJson));
    producer.commitTransaction(); // Atomic commit
} catch (Exception e) {
    producer.abortTransaction(); // Roll back all three sends
}
What Happens Under the Hood:
- Producer writes records to the target partitions, tagged with its transactional ID
- Those records are appended to the logs but remain pending (invisible to read_committed consumers)
- On `commitTransaction()`, the transaction coordinator records the decision in the internal `__transaction_state` topic and writes COMMIT markers to each partition
- All messages in the transaction then become visible atomically
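The visibility flip can be modeled with a pending buffer. This is a deliberate simplification: real brokers never buffer separately, they append everything to the log and let read_committed consumers stop at the last stable offset until the marker arrives.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of transactional visibility: sends accumulate in a pending
// buffer, and a read_committed view sees them only after the COMMIT
// marker, all at once.
public class TxnVisibility {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    void send(String msg) { pending.add(msg); }

    void commit() { // COMMIT marker: everything becomes visible atomically
        committed.addAll(pending);
        pending.clear();
    }

    void abort() { pending.clear(); } // ABORT marker: pending writes vanish

    List<String> readCommitted() { return committed; }

    public static void main(String[] args) {
        TxnVisibility topic = new TxnVisibility();
        topic.send("DebitEvent");
        topic.send("CreditEvent");
        System.out.println(topic.readCommitted().size()); // 0 — still pending
        topic.commit();
        System.out.println(topic.readCommitted().size()); // 2 — visible together
    }
}
```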
Isolation Levels (Consumer Side)
Config: isolation.level=read_committed (default: read_uncommitted)
| Isolation Level | Behavior | Use Case |
|---|---|---|
| read_uncommitted | Reads all messages (even pending transactions) | High throughput, eventual consistency OK |
| read_committed | Only reads committed messages | Financial systems, strong consistency |
Trade-off: read_committed adds some latency (~10ms), because consumers wait for a transaction's COMMIT/ABORT marker before delivering its messages.
Production Pattern: E-Commerce Order Flow
Without Transactions (Risky):
1. Deduct inventory (Kafka write)
2. Network fails
3. Charge payment (Kafka write succeeds)
4. Customer charged, but item still in stock → Oversold!
With Transactions (Safe):
producer.beginTransaction();
// Step 1: Reserve inventory
producer.send(new ProducerRecord<>(inventoryTopic, "item-123", "RESERVE:qty-1"));
// Step 2: Charge payment
producer.send(new ProducerRecord<>(paymentTopic, "user-789", "CHARGE:$99"));
// Step 3: Create order
producer.send(new ProducerRecord<>(ordersTopic, "order-456", "CONFIRMED"));
producer.commitTransaction(); // All 3 succeed or all 3 fail
Result: No partial state. Either order fully succeeds or fully fails.
[!IMPORTANT] Interview Insight: Kafka transactions are NOT like database ACID transactions. They guarantee atomicity within Kafka, but if your consumer crashes after reading, you still need idempotent processing downstream.
8. Deep Dive: Consumer Group Rebalancing Strategies
When a consumer joins or leaves a group, Kafka rebalances partition ownership. This can cause downtime.
Rebalancing Protocol Comparison
| Strategy | Downtime | How It Works | Config | Use Case |
|---|---|---|---|---|
| Eager (Range/RoundRobin) | 🔴 ~30s stop-the-world | All consumers stop, reassign all partitions, resume | partition.assignment.strategy=range | Small groups (<10 consumers) |
| Cooperative (Incremental) | 🟢 0s (gradual) | Only affected partitions are reassigned | partition.assignment.strategy=cooperative-sticky | Large groups (>100 consumers) |
| Static Group Membership | 🟢 0s (on planned restart) | Consumer keeps same ID across restarts | group.instance.id=pod-name | Kubernetes rolling deploys |
Eager Rebalancing (Default, Legacy)
Flow:
- Consumer C3 crashes
- All consumers stop processing (revoke all partitions)
- Coordinator reassigns partitions
- All consumers resume
Downtime: ~10-30 seconds (depends on session.timeout.ms)
Problem: If you have 100 consumers and 1 crashes, all 100 stop. Overkill.
Cooperative Rebalancing (Kafka 2.4+, Recommended)
Flow:
- Consumer C3 crashes (owned P5, P8)
- Only P5 and P8 are revoked
- Coordinator reassigns P5→C1, P8→C4
- Other 97 consumers keep processing
Downtime: 0 seconds for unaffected partitions
Config:
partition.assignment.strategy=cooperative-sticky
[!TIP] Production Default: Always use `cooperative-sticky` for zero-downtime deployments.
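The cooperative flow above can be sketched as a reassignment that touches only the dead consumer's partitions. This is a toy model of the sticky idea, not Kafka's actual assignor; consumer and partition names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy cooperative-sticky rebalance: survivors keep their partitions, and
// only the dead consumer's partitions are redistributed.
public class RebalanceDemo {
    static Map<String, List<Integer>> cooperative(Map<String, List<Integer>> current, String dead) {
        Map<String, List<Integer>> next = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : current.entrySet()) {
            if (!e.getKey().equals(dead)) {
                next.put(e.getKey(), new ArrayList<>(e.getValue())); // sticky: keep ownership
            }
        }
        List<Integer> orphaned = current.get(dead);
        List<String> survivors = new ArrayList<>(next.keySet());
        for (int i = 0; i < orphaned.size(); i++) {
            next.get(survivors.get(i % survivors.size())).add(orphaned.get(i));
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> current = new LinkedHashMap<>();
        current.put("C1", new ArrayList<>(Arrays.asList(0, 1)));
        current.put("C2", new ArrayList<>(Arrays.asList(2, 3)));
        current.put("C3", new ArrayList<>(Arrays.asList(4, 5)));
        // C3 dies: only partitions 4 and 5 move; C1 and C2 never stop
        System.out.println(cooperative(current, "C3")); // {C1=[0, 1, 4], C2=[2, 3, 5]}
    }
}
```

An eager rebalance would instead revoke all six partitions from every consumer before reassigning, which is exactly the stop-the-world pause the table describes.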
Static Group Membership (Session Pinning)
Problem: Kubernetes rolling deploy restarts pods. Each restart triggers rebalance.
Solution: Assign a static group instance ID:
group.instance.id=payment-consumer-pod-7
Behavior:
- Consumer restarts with the same ID → No rebalance (partitions retained)
- Coordinator waits `session.timeout.ms` (default: 45s) before reassigning
- If the pod comes back within 45s, it resumes seamlessly
Use Case: Kubernetes StatefulSets, planned maintenance
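As a configuration sketch, static membership only requires two consumer properties. The config keys are real Kafka consumer settings; the broker address, group name, and instance ID are illustrative (in Kubernetes the instance ID would typically be derived from the pod name).

```java
import java.util.Properties;

// Consumer configuration sketch for static group membership:
// a stable group.instance.id lets a restarted consumer reclaim its
// partitions without triggering a rebalance.
public class StaticMembershipConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");           // illustrative
        props.put("group.id", "payment-consumers");             // illustrative
        props.put("group.instance.id", "payment-consumer-pod-7"); // stable per pod
        props.put("session.timeout.ms", "45000"); // grace period before reassignment
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("group.instance.id"));
    }
}
```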
9. Production Configuration Guide
Broker-Level Tuning
| Config | Description | Default | Tuned (High Throughput) |
|---|---|---|---|
| num.network.threads | Threads handling requests | 3 | 8 (for high concurrency) |
| num.io.threads | Threads writing to disk | 8 | 16 (SSD clusters) |
| socket.send.buffer.bytes | TCP send buffer | 100KB | 1MB (10Gbps networks) |
| log.segment.bytes | Max segment file size | 1GB | 1GB (keep default) |
Producer Tuning: Latency vs Throughput
| Config | Low Latency | High Throughput | Why |
|---|---|---|---|
| linger.ms | 0 | 100 | Wait time before sending batch (batching improves compression) |
| batch.size | 16KB | 1MB | Larger batches = better compression ratio |
| compression.type | none | zstd | zstd offers the best ratio/speed balance of the modern codecs (Kafka 2.1+) |
| acks | 1 | all | 1=leader only (fast), all=ISR quorum (durable) |
| buffer.memory | 32MB | 128MB | Total memory for pending sends |
Compression Benchmark (1GB dataset):
- none: 1000 MB, 100 MB/s
- gzip: 350 MB, 40 MB/s (high CPU)
- lz4: 450 MB, 90 MB/s (balanced)
- zstd: 300 MB, 95 MB/s (best ratio + speed)
[!TIP] Interview Tip: Use `linger.ms=10` and `batch.size=256KB` as a balanced starting point. Tune based on P99 latency requirements.
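That balanced starting point translates into producer properties like the following. The config keys are real Kafka producer settings; the broker address is illustrative.

```java
import java.util.Properties;

// Producer configuration sketch for a balanced latency/throughput profile:
// small batching delay, 256KB batches, zstd compression, durable acks.
public class BalancedProducerConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");        // illustrative
        props.put("linger.ms", "10");                        // small batching delay
        props.put("batch.size", String.valueOf(256 * 1024)); // 256KB batches
        props.put("compression.type", "zstd");               // best ratio/speed balance
        props.put("acks", "all");                            // durable writes
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("batch.size")); // 262144
    }
}
```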
Consumer Tuning: Polling Behavior
| Config | Description | Default | Tuned |
|---|---|---|---|
| fetch.min.bytes | Min data to fetch before returning | 1 byte | 1MB (reduce poll frequency) |
| fetch.max.wait.ms | Max wait if fetch.min.bytes not met | 500ms | 100ms (lower latency) |
| max.poll.records | Max records per poll() | 500 | 100 (if processing is slow) |
| session.timeout.ms | Time without heartbeats before a consumer is declared dead | 45s | 30s (faster failure detection) |
Rule of Thumb:
- Fast processing (e.g., logs): `max.poll.records=500`
- Slow processing (e.g., ML inference): `max.poll.records=10` (avoid rebalance timeout)
10. Summary
- Kafka is a Log: Think of it as a distributed, replicated commit log, not a queue.
- Partitions: The unit of parallelism. Max active consumers per group = number of partitions.
- Durability: Ensured via ISR and Producer Acks (`acks=all`).
- Zero Copy: The secret sauce behind its speed.
- Consumer Groups: Enable horizontal scaling of processing.
- Exactly-Once Semantics: Possible via transactions and idempotent producers.
- Rebalancing: Prefer `cooperative-sticky` for zero-downtime deployments.
- Production Tuning: Balance latency (`linger.ms=0`) vs throughput (`linger.ms=100`, `compression.type=zstd`).