Delivery Semantics & Reliability

This module explores the core principles of Kafka delivery semantics and reliability, deriving each guarantee from the failure modes of real networks and hardware.

Imagine you are running a high-frequency trading platform. If you lose a “buy” order, you lose money. If you duplicate a “buy” order, you buy twice as much stock and risk bankruptcy. But if you are just tracking user clicks on a website, losing a few clicks is fine, and grinding the system to a halt to guarantee every single click is recorded is a terrible tradeoff.

In distributed systems, the network is fundamentally unreliable—packets drop, nodes crash, and disks fail. Kafka provides three distinct levels of delivery guarantees, allowing you to tune the tradeoff between throughput and data safety. Let’s break them down.

1. The Three Levels of Guarantee

  • At-Most-Once: Messages may be lost, but are never redelivered. Use case: metrics, IoT sensor data (speed over accuracy).
  • At-Least-Once: Messages are never lost, but may be redelivered (duplicates). Use case: standard microservices, logging.
  • Exactly-Once: Messages are processed exactly once, even if failures occur. Use case: financial transactions, inventory management.
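On the consumer side, the difference between the first two guarantees often comes down to when offsets are committed relative to processing. The toy model below (plain Java, not the Kafka API; all names are illustrative) replays a crash mid-batch under both commit orderings:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (plain Java, not the Kafka API): a consumer reads a batch from a
// log, crashes partway through, and restarts from the last committed offset.
// Whether you commit before or after processing decides the guarantee.
class CommitOrdering {

    // Process-then-commit. The crash happens before the commit, so the
    // restart re-reads from offset 0: nothing lost, duplicates possible.
    static List<String> runAtLeastOnce(List<String> log, int crashAt) {
        List<String> processed = new ArrayList<>();
        int committed = 0;
        for (int i = committed; i < crashAt; i++) {
            processed.add(log.get(i)); // processed, but never committed
        }
        // -- crash and restart: resume from the committed offset (still 0) --
        for (int i = committed; i < log.size(); i++) {
            processed.add(log.get(i));
        }
        return processed;
    }

    // Commit-then-process. Offsets are committed up front and the crash
    // happens before processing: nothing redelivered, the batch is lost.
    static List<String> runAtMostOnce(List<String> log, int crashAt) {
        List<String> processed = new ArrayList<>();
        int committed = crashAt; // committed before any processing happened
        // -- crash and restart: resume after the committed offset --
        for (int i = committed; i < log.size(); i++) {
            processed.add(log.get(i));
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "c", "d");
        System.out.println(runAtLeastOnce(log, 2)); // [a, b, a, b, c, d]
        System.out.println(runAtMostOnce(log, 2)); // [c, d]
    }
}
```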

2. Acks (Acknowledgements)

The producer acks setting controls how many replicas must acknowledge a write for it to be considered successful.

acks=0 (None)

  • The producer sends the message and immediately considers it “sent”.
  • Risk: If the broker crashes before writing to disk, data is lost.
  • Performance: Highest throughput.

acks=1 (Leader Only)

  • The producer waits for the Leader to acknowledge the write.
  • Risk: If the Leader crashes after acknowledging but before replication, data is lost.
  • Performance: Balanced.

acks=all (All ISR)

  • The producer waits for the Leader AND all In-Sync Replicas (ISR) to acknowledge.
  • Durability: Zero data loss as long as at least one in-sync replica survives.
  • Performance: Lowest throughput (latency = max(replica_latency)).
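These settings map directly onto producer configuration. A minimal sketch with the Java client (the broker address is a placeholder; the min.insync.replicas note refers to a related topic/broker setting):

```java
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");

// Choose exactly one, depending on your durability needs:
props.put("acks", "0");   // fire-and-forget: fastest, data loss possible
props.put("acks", "1");   // leader-only: lost if the leader dies before replicating
props.put("acks", "all"); // full ISR: no loss while at least one ISR survives

// With acks=all, pair it with min.insync.replicas >= 2 (a topic/broker config)
// so a write fails fast rather than landing on a lone surviving leader.
```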

3. Retries and Idempotence

Retries

Network glitches happen. By default, the producer retries transient failures (e.g., NotEnoughReplicasException).

  • retries: Set to Integer.MAX_VALUE by default.
  • delivery.timeout.ms: Controls how long (e.g., 2 minutes) the producer will keep retrying before giving up.
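Both knobs are ordinary producer properties. A minimal sketch (the values shown match the defaults in recent Java clients):

```java
import java.util.Properties;

Properties props = new Properties();
// Retry transient send failures essentially forever...
props.put("retries", Integer.MAX_VALUE);
// ...but bound the total time a record may spend being retried.
props.put("delivery.timeout.ms", 120000); // 2 minutes
```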

The Problem with Retries: Duplicates

If a producer sends a message and the broker writes it, but the ACK is lost on the network, the producer will retry and the broker will write the message again. Result: a duplicate message.

The Solution: Idempotence (enable.idempotence=true)

When enabled, the producer assigns a Sequence Number to every message.

  1. Producer sends Msg(Seq=1).
  2. Broker writes Msg(Seq=1).
  3. ACK is lost.
  4. Producer retries Msg(Seq=1).
  5. Broker sees Seq=1 already exists from this producer ID.
  6. Broker drops the duplicate and returns success.
Important

Always set enable.idempotence=true in production (the default since Kafka 3.0). It gives you exactly-once delivery per partition within a single producer session, with negligible performance impact.
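The broker-side bookkeeping in steps 1–6 can be sketched in a few lines of plain Java. This is a simplified model, not Kafka's actual implementation (real brokers track sequence numbers per producer ID and partition, batch by batch):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of broker-side dedup: remember the highest sequence
// number written per producer, and silently drop anything already seen.
class IdempotentBroker {
    private final Map<Long, Integer> lastSeqByProducer = new HashMap<>();

    /** Returns true if the message was written, false if dropped as a duplicate. */
    boolean receive(long producerId, int seq) {
        Integer last = lastSeqByProducer.get(producerId);
        if (last != null && seq <= last) {
            return false; // duplicate retry: drop it, but still ACK success
        }
        lastSeqByProducer.put(producerId, seq);
        return true; // fresh message: append to the log
    }

    public static void main(String[] args) {
        IdempotentBroker broker = new IdempotentBroker();
        System.out.println(broker.receive(42L, 1)); // true  (written)
        System.out.println(broker.receive(42L, 1)); // false (retry after lost ACK, dropped)
        System.out.println(broker.receive(42L, 2)); // true  (next message, written)
    }
}
```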


4. Interactive: Delivery Simulator

Simulate a network failure and see how acks and retries affect data integrity.


5. Transactions (Atomic Writes)

Idempotence only protects writes to a single partition, within a single producer session. If you need to update multiple partitions atomically (e.g., “Consume from A, Transform, Produce to B”), you need Transactions.

The Transactional API ensures that a group of messages across multiple topics/partitions are either ALL visible to consumers or NONE are.

Java

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("transactional.id", "my-transactional-id"); // Required; also enables idempotence
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());

KafkaProducer<String, String> producer = new KafkaProducer<>(props);

// 1. Init: registers the transactional.id and fences off zombie producers
producer.initTransactions();

try {
  // 2. Start
  producer.beginTransaction();

  // 3. Send to multiple topics; all become visible together or not at all
  producer.send(new ProducerRecord<>("orders", "key", "value"));
  producer.send(new ProducerRecord<>("payments", "key", "value"));

  // 4. Commit: atomically makes both records visible to consumers
  producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
  // Fatal errors: this producer instance cannot continue; close it
  producer.close();
} catch (KafkaException e) {
  // Transient error: abort so consumers never see a partial write
  producer.abortTransaction();
}
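Transactions are only half the story: by default, a consumer will also read records from open or aborted transactions. To see committed data only, set isolation.level=read_committed on the consumer. A minimal sketch (standard Java consumer config; the group ID and topic names are placeholders):

```java
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "order-processor");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

// Only deliver records from committed transactions (default: read_uncommitted)
props.put("isolation.level", "read_committed");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("orders", "payments"));
```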

Go

// segmentio/kafka-go does not expose a transactional (or idempotent) producer
// API the way the Java client does; Sarama and confluent-kafka-go do.
// The closest kafka-go gets is strong at-least-once delivery:

w := &kafka.Writer{
  Addr:     kafka.TCP("localhost:9092"),
  Topic:    "orders",
  Balancer: &kafka.Hash{},

  // Wait for the full ISR before a write counts as successful (acks=all)
  RequiredAcks: kafka.RequireAll,

  // Retry transient failures; without idempotence, duplicates are possible
  MaxAttempts: 10,
}
Note

For full multi-topic transactions in Go, you typically need the Shopify/sarama library (now maintained as IBM/sarama) or confluent-kafka-go (a wrapper around the C librdkafka library), as segmentio/kafka-go focuses on simplicity; stream-processing patterns are often handled differently in Go.