Kafka & Streaming

[!NOTE] This module explores the core principles of Kafka & Streaming, deriving its design from first principles and hardware constraints.

1. Queue vs Stream

Most developers confuse RabbitMQ (Queue) and Kafka (Stream). They solve different problems.

  • RabbitMQ (The Mailbox):
    • Model: Smart Broker, Dumb Consumer.
    • Behavior: You take a letter out, and it’s gone (deleted).
    • Use Case: Task processing, Job Queues (e.g., “Resize this image”).
  • Kafka (The Tape Recorder):
    • Model: Dumb Broker, Smart Consumer.
    • Behavior: Messages are appended to a log. You can listen, rewind, and listen again. Messages stick around for days (retention).
    • Use Case: Event Sourcing, Analytics, Metrics, Change Data Capture (CDC).

[!TIP] Kafka is essentially a distributed Write-Ahead Log (WAL). If you understand how a database persists data (see WAL), you understand Kafka.


2. Architecture: The Log

Kafka’s core abstraction is the Log—an append-only sequence of records.

A. Topic & Partition

A Topic is a category (e.g., user_clicks). Since one server can’t hold all data, the Topic is split into Partitions.

  • Partition: An ordered, immutable sequence of messages.
  • Ordering: Guaranteed only within a partition. Global ordering across partitions is NOT guaranteed.
  • Hash Key: Partition = hash(Key) % NumPartitions. All messages for User:123 go to the same partition (e.g., P0).
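The routing rule above can be sketched in a few lines. This is a toy version: Kafka's real default partitioner hashes keys with murmur2, not Java's `hashCode()`, but the property it illustrates is the same — equal keys always map to the same partition.

```java
// Toy model of key-based partitioning (Kafka's default partitioner
// actually uses murmur2, not hashCode).
public class PartitionRouter {

    public static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the result is always in [0, numPartitions)
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Every message keyed User:123 lands on the same partition
        System.out.println(partitionFor("User:123", 3) == partitionFor("User:123", 3)); // true
    }
}
```

This is also why changing `NumPartitions` on a live topic breaks key locality: the modulus changes, so existing keys start hashing to different partitions.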

B. The Offset

Each message in a partition has a unique ID called the Offset.

  • The Consumer tracks what it has read.
  • “I’ve read up to Offset 500 in Partition 0.”

C. Consumer Groups (The Secret Sauce)

How do we scale processing? We add consumers to a Consumer Group.

  • Rule: A partition is consumed by exactly one consumer in the group.
  • Implication: If you have 10 partitions, you can have AT MOST 10 active consumers. The 11th consumer will be idle.
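Both rules fall out of a simple assignment loop. The sketch below uses plain round-robin (Kafka's actual assignors — range, round-robin, sticky — are more involved), but it shows why an 11th consumer over 10 partitions sits idle:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupAssignment {

    // Simplified round-robin assignment of partitions to group members
    public static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        for (String c : consumers) out.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            // Each partition goes to exactly one consumer in the group
            out.get(consumers.get(p % consumers.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> group = new ArrayList<>();
        for (int i = 1; i <= 11; i++) group.add("C" + i);

        Map<String, List<Integer>> assignment = assign(group, 10);
        System.out.println(assignment.get("C1"));  // [0]
        System.out.println(assignment.get("C11")); // [] -- the 11th consumer is idle
    }
}
```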

3. Interactive Demo: The Kafka Cluster Simulator

Visualize how Partitions distribute load and how Consumer Groups scale.

[!TIP] Try it yourself: Add Consumers until you have 3 (one for each Partition). Then kill one and watch the Rebalancing process!


4. Under the Hood: Why is Kafka Fast?

Kafka can handle millions of messages/sec on spinning hard disks. How?

A. Sequential I/O

Random Disk Access (seeking) is slow (~100 seeks/sec). Sequential Access is fast (~100s MB/sec). Kafka only appends to the end of the file. It never seeks.

  • Result: Disk speeds approach Network speeds.

B. Zero Copy

Standard File Send involves 4 copies and 4 context switches.

  1. Disk → Kernel Buffer (Read)
  2. Kernel → User Space (App Buffer)
  3. User Space → Kernel Socket Buffer (Write)
  4. Socket Buffer → NIC (Network Card)

Kafka uses the sendfile() system call (Zero Copy):

  1. Disk → Kernel Buffer
  2. Kernel Buffer → NIC (via Descriptor)

Visualizing the Savings

STANDARD I/O (Wasteful)

[Disk] --> [Kernel] --> [User App] --> [Kernel] --> [NIC]
                  (two extra CPU copies through user space)

ZERO COPY (Efficient)

[Disk] --> [Kernel] ------------------------------> [NIC]
                  (DMA copy only)
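In the JVM, zero copy is exposed as `FileChannel.transferTo()`, which the Kafka broker uses to serve fetches; where the OS supports it, the JDK maps the call onto `sendfile()`. A runnable sketch, with an in-memory sink standing in for the socket channel:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {

    // transferTo() can move bytes Disk -> Kernel buffer -> target channel
    // without staging them in a user-space buffer.
    public static long send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0, size = src.size();
            // transferTo may send fewer bytes than requested, so loop
            while (sent < size) {
                sent += src.transferTo(sent, size - sent, target);
            }
            return sent;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("segment", ".log");
        Files.write(tmp, "hello kafka".getBytes());

        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for a socket
        long n = send(tmp, Channels.newChannel(sink));
        System.out.println(n); // 11

        Files.delete(tmp);
    }
}
```

With a real `SocketChannel` as the target, the same call path is what lets a broker stream gigabytes of log segments while barely touching the CPU.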

5. Durability & Reliability

How do we ensure we never lose data?

A. Replication (ISR)

Each partition has one Leader and multiple Followers.

  • Leader: Handles all Reads and Writes.
  • Follower: Passively replicates data from the Leader.
  • ISR (In-Sync Replicas): The set of followers that are “caught up” with the leader. If a leader dies, only an ISR member can become the new leader.

B. Producer Acks

The producer decides how safe the write must be.

  • acks=0 (Fire & Forget): Send and don’t wait. Fastest, but data loss possible if broker crashes immediately.
  • acks=1 (Leader Ack): Wait for Leader to write to disk. Safe if Leader stays alive.
  • acks=all (Strongest): Wait for Leader AND all ISRs to acknowledge. Zero data loss.

[!TIP] Trade-off: acks=all is slower (higher latency) but safest. Use it for Payments. Use acks=1 for Metrics.
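A hedged sketch of the "never lose data" end of that spectrum — producer settings only, with a placeholder broker address (note that `min.insync.replicas`, the other half of the durability story, is a topic/broker setting, not a producer one):

```java
import java.util.Properties;

public class DurableProducerConfig {

    // Producer configuration for maximum durability, per the acks discussion above.
    public static Properties durable() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");        // placeholder address
        p.put("acks", "all");                                // wait for the full ISR
        p.put("enable.idempotence", "true");                 // dedupe retries broker-side
        p.put("retries", "2147483647");                      // keep retrying transient errors
        p.put("max.in.flight.requests.per.connection", "5"); // max allowed with idempotence
        return p;
    }

    public static void main(String[] args) {
        durable().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```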

6. Log Compaction: Keeping the Latest State

Kafka can act as a database (KTable). With Log Compaction, Kafka runs a background process that keeps only the latest value for each key.

Example: User Profiles

  • Input Stream:
    1. Key: u_101, Value: {name: "Alice"}
    2. Key: u_101, Value: {name: "Alice", city: "NYC"} (Update)
    3. Key: u_102, Value: {name: "Bob"}
  • Compacted Log (What is saved):
    1. Key: u_101, Value: {name: "Alice", city: "NYC"}
    2. Key: u_102, Value: {name: "Bob"}
  • Use Case: Restoring application state after a crash without replaying years of history.
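The compaction result above can be modeled as "a map keeping the last value per key". This toy version captures the outcome only — real compaction also handles tombstones (null values that delete a key) and runs segment-by-segment in a background cleaner thread:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompactionSketch {

    // Toy model of log compaction: replay the log, keep only the latest
    // value per key (later writes overwrite earlier ones).
    public static Map<String, String> compact(List<String[]> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : log) {
            latest.put(record[0], record[1]); // record = {key, value}
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String[]> log = List.of(
            new String[]{"u_101", "{name: \"Alice\"}"},
            new String[]{"u_101", "{name: \"Alice\", city: \"NYC\"}"}, // update wins
            new String[]{"u_102", "{name: \"Bob\"}"});

        compact(log).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Restoring state after a crash is exactly this replay: read the compacted topic from offset 0 into a map, and you have the latest snapshot without years of history.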

7. Deep Dive: Exactly-Once Semantics (Transactions)

Kafka supports Exactly-Once Semantics (EOS) for critical use cases like payments, order processing, and financial transactions.

The Problem: Duplicates from Retries

Scenario: Producer sends OrderCreated event. Network hiccups. Producer retries. Kafka gets it twice.

Without EOS:

Producer → [OrderCreated] → Kafka (Ack Lost)
Producer → [OrderCreated] → Kafka (Retry, Duplicate!)
Consumer processes order TWICE → Double charge 💸

Solution 1: Idempotent Producer

Config: enable.idempotence=true (default in Kafka 3.0+)

How it Works:

  • Producer assigns each message a unique sequence number
  • Broker tracks (ProducerId, Partition, SeqNum) in memory
  • If duplicate arrives, broker silently discards it

Guarantee: At-Least-Once delivery without duplicates — effectively Exactly-Once writes to a single partition.

Limitation: Deduplication is scoped to a single producer session and partition. Writes spanning multiple partitions are not atomic.
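The broker-side dedup logic can be sketched as follows. This is a toy model — a real broker tracks sequence numbers per batch for a window of recent batches per producer, and fences stale producer epochs — but the accept/discard decision is the same:

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotenceSketch {

    // Last sequence number accepted per (producerId, partition)
    private final Map<String, Integer> lastSeq = new HashMap<>();

    // Returns true if the record is appended, false if discarded as a duplicate
    public boolean append(String producerId, int partition, int seq) {
        String key = producerId + "/" + partition;
        int last = lastSeq.getOrDefault(key, -1);
        if (seq <= last) {
            return false; // duplicate retry: silently discard
        }
        lastSeq.put(key, seq);
        return true;      // fresh record: append to the log
    }

    public static void main(String[] args) {
        IdempotenceSketch broker = new IdempotenceSketch();
        System.out.println(broker.append("p1", 0, 0)); // true
        System.out.println(broker.append("p1", 0, 0)); // false (retry deduped)
        System.out.println(broker.append("p1", 0, 1)); // true
    }
}
```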


Solution 2: Transactional API (Multi-Partition Atomicity)

Use Case: Transfer $100 from Account A to Account B. You need to:

  1. Write DebitEvent to account-A topic
  2. Write CreditEvent to account-B topic
  3. Atomic: Both succeed or both fail

Code Example (Java):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("transactional.id", "payment-service-1"); // Unique per producer instance
props.put("enable.idempotence", "true");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions(); // Register with the transaction coordinator

try {
  producer.beginTransaction();

  // All-or-nothing batch
  producer.send(new ProducerRecord<>("account-A", "user:123", debitJson));
  producer.send(new ProducerRecord<>("account-B", "user:456", creditJson));
  producer.send(new ProducerRecord<>("orders", "order:803", orderJson));

  producer.commitTransaction(); // Atomic commit
} catch (Exception e) {
  producer.abortTransaction(); // Roll back all sends in the transaction
}

What Happens Under the Hood:

  1. The producer writes records to all target partitions as part of an open transaction
  2. To read_committed consumers, those records remain invisible until the transaction resolves
  3. On commitTransaction(), the transaction coordinator records the outcome in the internal __transaction_state topic and writes COMMIT markers into each data partition
  4. The messages become visible atomically

Isolation Levels (Consumer Side)

Config: isolation.level=read_committed (default: read_uncommitted)

| Isolation Level | Behavior | Use Case |
| --- | --- | --- |
| read_uncommitted | Reads all messages (even from pending transactions) | High throughput, eventual consistency OK |
| read_committed | Only reads committed messages | Financial systems, strong consistency |

Trade-off: read_committed adds ~10ms latency (waits for transaction markers)


Production Pattern: E-Commerce Order Flow

Without Transactions (Risky):

1. Deduct inventory (Kafka write)
2. Network fails
3. Charge payment (Kafka write succeeds)
4. Customer charged, but item still in stock → Oversold!

With Transactions (Safe):

producer.beginTransaction();

// Step 1: Reserve inventory
producer.send(new ProducerRecord<>(inventoryTopic, "item-123", "RESERVE:qty-1"));

// Step 2: Charge payment
producer.send(new ProducerRecord<>(paymentTopic, "user-789", "CHARGE:$99"));

// Step 3: Create order
producer.send(new ProducerRecord<>(ordersTopic, "order-456", "CONFIRMED"));

producer.commitTransaction(); // All 3 succeed or all 3 fail

Result: No partial state. Either order fully succeeds or fully fails.

[!IMPORTANT] Interview Insight: Kafka transactions are NOT like database ACID transactions. They guarantee atomicity within Kafka, but if your consumer crashes after reading, you still need idempotent processing downstream.


8. Deep Dive: Consumer Group Rebalancing Strategies

When a consumer joins or leaves a group, Kafka rebalances partition ownership. This can cause downtime.

Rebalancing Protocol Comparison

| Strategy | Downtime | How It Works | Config | Use Case |
| --- | --- | --- | --- | --- |
| Eager (Range/RoundRobin) | 🔴 ~30s stop-the-world | All consumers stop, all partitions are reassigned, then all resume | partition.assignment.strategy=range | Small groups (<10 consumers) |
| Cooperative (Incremental) | 🟢 0s (gradual) | Only affected partitions are reassigned | partition.assignment.strategy=cooperative-sticky | Large groups (>100 consumers) |
| Static Group Membership | 🟢 0s (on planned restart) | Consumer keeps the same ID across restarts | group.instance.id=pod-name | Kubernetes rolling deploys |

Eager Rebalancing (Default, Legacy)

Flow:

  1. Consumer C3 crashes
  2. All consumers stop processing (revoke all partitions)
  3. Coordinator reassigns partitions
  4. All consumers resume

Downtime: ~10-30 seconds (depends on session.timeout.ms)

Problem: If you have 100 consumers and 1 crashes, all 100 stop. Overkill.


Cooperative Rebalancing (Incremental)

Flow:

  1. Consumer C3 crashes (owned P5, P8)
  2. Only P5 and P8 are revoked
  3. Coordinator reassigns P5→C1, P8→C4
  4. Other 97 consumers keep processing

Downtime: 0 seconds for unaffected partitions

Config:

partition.assignment.strategy=cooperative-sticky

[!TIP] Production Default: Always use cooperative-sticky for zero-downtime deployments.


Static Group Membership (Session Pinning)

Problem: Kubernetes rolling deploy restarts pods. Each restart triggers rebalance.

Solution: Assign a static group instance ID:

group.instance.id=payment-consumer-pod-7

Behavior:

  • Consumer restarts with same ID → No rebalance (partitions retained)
  • Coordinator waits session.timeout.ms (default: 45s) before reassigning
  • If pod comes back within 45s, it resumes seamlessly

Use Case: Kubernetes StatefulSets, planned maintenance


9. Production Configuration Guide

Broker-Level Tuning

| Config | Description | Default | Tuned (High Throughput) |
| --- | --- | --- | --- |
| num.network.threads | Threads handling requests | 3 | 8 (for high concurrency) |
| num.io.threads | Threads writing to disk | 8 | 16 (SSD clusters) |
| socket.send.buffer.bytes | TCP send buffer | 100KB | 1MB (10Gbps networks) |
| log.segment.bytes | Max segment file size | 1GB | 1GB (keep default) |

Producer Tuning: Latency vs Throughput

| Config | Low Latency | High Throughput | Why |
| --- | --- | --- | --- |
| linger.ms | 0 | 100 | Wait time before sending a batch (batching improves compression) |
| batch.size | 16KB | 1MB | Larger batches = better compression ratio |
| compression.type | none | zstd | zstd offers the best ratio at good speed (Kafka 2.1+) |
| acks | 1 | all | 1 = leader only (fast), all = ISR quorum (durable) |
| buffer.memory | 32MB | 128MB | Total memory for pending sends |

Compression Benchmark (1GB dataset):

  • none: 1000 MB, 100 MB/s
  • gzip: 350 MB, 40 MB/s (high CPU)
  • lz4: 450 MB, 90 MB/s (balanced)
  • zstd: 300 MB, 95 MB/s (best ratio + speed)
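Why batching improves compression can be demonstrated with the JDK's built-in gzip (the JDK ships no zstd, so absolute numbers will differ from the benchmark above): a batch of repetitive JSON events shares structure, so the codec finds far more redundancy than it would in a single record.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class CompressionDemo {

    // Compressed size of a byte array under gzip
    public static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        // A batch of 1000 near-identical JSON events, like a producer batch
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            batch.append("{\"event\":\"click\",\"user\":").append(i).append("}");
        }
        byte[] raw = batch.toString().getBytes();
        System.out.println(raw.length + " bytes raw -> " + gzipSize(raw) + " bytes gzipped");
    }
}
```

The same effect is why `linger.ms > 0` often *increases* end-to-end throughput despite adding a small send delay.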

[!TIP] Interview Tip: Use linger.ms=10 and batch.size=256KB as a balanced starting point. Tune based on P99 latency requirements.


Consumer Tuning: Polling Behavior

| Config | Description | Default | Tuned |
| --- | --- | --- | --- |
| fetch.min.bytes | Min data to accumulate before a fetch returns | 1 byte | 1MB (reduce poll frequency) |
| fetch.max.wait.ms | Max wait if fetch.min.bytes is not met | 500ms | 100ms (lower latency) |
| max.poll.records | Max records per poll() | 500 | 100 (if processing is slow) |
| session.timeout.ms | Max time between heartbeats before eviction | 45s | 30s (faster failure detection) |

Rule of Thumb:

  • Fast processing (e.g., logs): max.poll.records=500
  • Slow processing (e.g., ML inference): max.poll.records=10 (avoid rebalance timeout)

10. Summary

  • Kafka is a Log: Think of it as a distributed file system, not a queue.
  • Partitions: The unit of parallelism. Max Consumers = Max Partitions.
  • Durability: Ensured via ISR and Producer Acks (acks=all).
  • Zero Copy: The secret sauce behind its speed.
  • Consumer Groups: Enable horizontal scaling of processing.
  • Exactly-Once Semantics: Possible via transactions and idempotent producers.
  • Rebalancing: Prefer cooperative-sticky for zero-downtime deployments.
  • Production Tuning: Balance latency (linger.ms=0) vs throughput (linger.ms=100, compression=zstd).