AOF: The Append-Only Log

If RDB is a “Save Game” feature, then AOF (Append Only File) is the “Replay” feature.

Instead of taking a snapshot of the memory, AOF logs every single write command (like SET, INCR, LPUSH) that Redis receives.

The Chess Match Analogy

  • RDB (Snapshot): Taking a photograph of the chessboard every 5 minutes. If someone bumps the table, you can restore the board to how it looked in the last photo.
  • AOF (Log): Writing down the official chess notation for every single move as it happens (e.g., 1. e4 e5...). You can reconstruct the exact state of the board by replaying the moves from the very beginning.

When Redis restarts, it replays this log from start to finish to reconstruct the dataset.
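
To make the replay idea concrete, here is a minimal sketch in Go. It is a toy model, not Redis code: the `replay` function is a hypothetical helper that understands only SET and INCR and reads commands as plain strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// replay rebuilds a key/value dataset by applying logged commands in order.
// Only SET and INCR are modeled here; real Redis replays every write command.
func replay(log []string) (map[string]string, error) {
	data := make(map[string]string)
	for i, line := range log {
		parts := strings.Fields(line)
		if len(parts) == 0 {
			continue
		}
		switch strings.ToUpper(parts[0]) {
		case "SET":
			if len(parts) != 3 {
				return nil, fmt.Errorf("entry %d: SET needs a key and a value", i)
			}
			data[parts[1]] = parts[2]
		case "INCR":
			if len(parts) != 2 {
				return nil, fmt.Errorf("entry %d: INCR needs a key", i)
			}
			n := 0 // a missing key counts as 0 before the increment
			if cur, ok := data[parts[1]]; ok {
				v, err := strconv.Atoi(cur)
				if err != nil {
					return nil, fmt.Errorf("entry %d: value is not an integer", i)
				}
				n = v
			}
			data[parts[1]] = strconv.Itoa(n + 1)
		default:
			return nil, fmt.Errorf("entry %d: unsupported command %q", i, parts[0])
		}
	}
	return data, nil
}

func main() {
	log := []string{"SET player alice", "INCR moves", "INCR moves", "SET player bob"}
	data, err := replay(log)
	if err != nil {
		panic(err)
	}
	fmt.Println(data["player"], data["moves"]) // prints "bob 2"
}
```

Like the chess notation, the final state depends only on the ordered list of moves, which is why the log alone is enough to rebuild the dataset.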

1. Why AOF?

RDB snapshots are great, but they have a flaw: a data-loss window. If you snapshot every 5 minutes and Redis crashes at minute 4:59, you lose 4 minutes and 59 seconds of writes.

AOF shrinks this window by logging each write as it happens.

2. The Cost of Durability: fsync Policies

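Writing to disk is slow. Writing to memory is fast. If Redis forced every command to disk synchronously, each write would have to wait on disk I/O and throughput would suffer badly.

Redis writes each command to the AOF through the OS buffer cache, then uses the fsync system call to force those buffered bytes onto disk. How often fsync runs is the trade-off between speed and safety, and you control it with the appendfsync configuration: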

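| Policy | Description | Performance | Safety |
| --- | --- | --- | --- |
| always | Call fsync after every write command. | 🐢 Very Slow | ✅ Safest (effectively zero data loss) |
| everysec | Call fsync once per second (Default). | 🚀 Fast | ⚠️ About 1 second of writes at risk |
| no | Never call fsync; let the OS decide when to flush (on Linux, usually every 30s). | 🚀🚀 Fastest | ❌ High Risk |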

[!TIP] Use everysec. It is the default for a reason. You get near-memory performance, and in the worst-case crash (power failure), you lose only about one second of writes.
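
The difference between the three policies comes down to when fsync is called. The sketch below is a simplified model in Go, not how Redis implements it: `appendLog`, `openLog`, and `countSyncs` are hypothetical names, and the everysec branch checks the clock inline on each write, whereas Redis schedules that fsync in the background.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// appendLog is a toy append-only log with a configurable sync policy.
type appendLog struct {
	f        *os.File
	policy   string // "always", "everysec", or "no"
	lastSync time.Time
	syncs    int // number of fsync calls issued, kept for illustration
}

func openLog(path, policy string) (*appendLog, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	return &appendLog{f: f, policy: policy, lastSync: time.Now()}, nil
}

// write appends one command. The write itself only reaches the OS buffer
// cache; f.Sync() (fsync) is what forces it to stable storage.
func (l *appendLog) write(cmd string) error {
	if _, err := l.f.WriteString(cmd + "\n"); err != nil {
		return err
	}
	switch l.policy {
	case "always":
		return l.sync()
	case "everysec":
		if time.Since(l.lastSync) >= time.Second {
			return l.sync()
		}
	}
	return nil // "no": leave flushing to the OS
}

func (l *appendLog) sync() error {
	l.syncs++
	l.lastSync = time.Now()
	return l.f.Sync()
}

// countSyncs writes n commands to a temp file and reports how many fsyncs ran.
func countSyncs(policy string, n int) (int, error) {
	tmp, err := os.CreateTemp("", "toy-aof-*.log")
	if err != nil {
		return 0, err
	}
	path := tmp.Name()
	tmp.Close()
	defer os.Remove(path)

	l, err := openLog(path, policy)
	if err != nil {
		return 0, err
	}
	defer l.f.Close()
	for i := 0; i < n; i++ {
		if err := l.write("INCR counter"); err != nil {
			return 0, err
		}
	}
	return l.syncs, nil
}

func main() {
	for _, policy := range []string{"always", "everysec", "no"} {
		n, err := countSyncs(policy, 1000)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-8s -> %d fsync calls for 1000 writes\n", policy, n)
	}
}
```

With always, every write pays for one fsync; with everysec, a burst of writes shares at most one fsync per second; with no, the program never calls fsync at all.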

🛡️ War Story: The appendfsync always Trap

A fast-growing startup once experienced sudden, severe latency spikes in their Redis cluster during peak traffic. Engineers noticed that operations normally taking microseconds were now taking milliseconds. The culprit? They had set appendfsync always in an attempt to guarantee zero data loss for their impression counters.

Because disk I/O is orders of magnitude slower than memory, forcing an OS-level synchronous flush on every single counter increment bottlenecked the entire single-threaded Redis engine. By switching to appendfsync everysec, they restored sub-millisecond response times while accepting a calculated risk of losing about one second of data in the event of a total power failure.

3. The Problem: The Log Grows Forever

If you run INCR counter 1 million times, your AOF file will contain 1 million entries.

INCR counter
INCR counter
INCR counter
...

But the final state is just counter = 1000000. We only need one command to restore it: SET counter 1000000.
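
To put rough numbers on that growth, the sketch below estimates on-disk size, assuming each command is stored in the AOF as a Redis protocol (RESP) array of bulk strings. `respSize` is a hypothetical helper, and the estimate ignores anything else Redis may write to the file.

```go
package main

import (
	"fmt"
	"strings"
)

// respSize returns the number of bytes a command occupies when encoded as a
// RESP array of bulk strings, e.g. "*2\r\n$4\r\nINCR\r\n$7\r\ncounter\r\n".
func respSize(cmd string) int {
	args := strings.Fields(cmd)
	size := len(fmt.Sprintf("*%d\r\n", len(args)))
	for _, a := range args {
		size += len(fmt.Sprintf("$%d\r\n%s\r\n", len(a), a))
	}
	return size
}

func main() {
	const n = 1_000_000
	grown := n * respSize("INCR counter")
	compact := respSize("SET counter 1000000")
	fmt.Printf("%d INCRs: %d bytes\n", n, grown) // 27000000 bytes
	fmt.Printf("one SET: %d bytes\n", compact)   // 39 bytes
}
```

Under these assumptions, roughly 27 MB of log describes a state that fits in 39 bytes.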

This is where AOF Rewrite comes in.

The Anatomy of an AOF Rewrite

A common misconception is that Redis parses the existing massive AOF file to figure out what to delete. It does not. Reading a 100GB text file to simplify it would be incredibly slow.

Instead, Redis looks directly at its current memory state.

The Step-by-Step BGREWRITEAOF Process (before Redis 7.0):

  1. Fork Child Process: Redis creates a background child process, capturing a point-in-time snapshot of memory (utilizing OS Copy-on-Write).
  2. Build Base Log: The child process iterates through memory and writes the shortest possible sequence of commands (e.g., SET counter 1000000) into a new temporary AOF file.
  3. Buffer New Writes: While the child is writing, the main Redis process continues to serve traffic. It writes new incoming commands to both the old AOF file (for safety) and an in-memory AOF Rewrite Buffer.
  4. Append & Swap: Once the child finishes the base log, the main process flushes the Rewrite Buffer onto the end of the new AOF file, and atomically swaps it in to replace the old one.
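
The four steps above can be sketched as a small Go model. This is an illustration only: `rewrite` is a hypothetical function, the fork is replaced by a plain map standing in for the point-in-time snapshot, and every value is assumed to be a simple string.

```go
package main

import (
	"fmt"
	"sort"
)

// rewrite models the flow described above: the "child" serializes a snapshot
// of memory as one SET per key, then the "parent" appends the commands that
// arrived while the child was working (the rewrite buffer).
// Note that the old AOF is never read.
func rewrite(snapshot map[string]string, rewriteBuffer []string) []string {
	keys := make([]string, 0, len(snapshot))
	for k := range snapshot {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order, for the example only

	newAOF := make([]string, 0, len(keys)+len(rewriteBuffer))
	for _, k := range keys {
		newAOF = append(newAOF, fmt.Sprintf("SET %s %s", k, snapshot[k]))
	}
	return append(newAOF, rewriteBuffer...)
}

func main() {
	// State of memory at fork time, after one million INCRs.
	snapshot := map[string]string{"counter": "1000000", "player": "bob"}
	// Writes accepted by the parent while the child wrote the base log.
	buffer := []string{"INCR counter", "SET player carol"}

	for _, cmd := range rewrite(snapshot, buffer) {
		fmt.Println(cmd)
	}
}
```

The new log holds two SET commands for the snapshot followed by the two buffered writes, so replaying it yields the same dataset as replaying the original million-entry log plus the new traffic.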

4. Configuring AOF in Production

You typically enable AOF in redis.conf:

appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec

Automatic Rewrite

Redis automatically triggers a rewrite once the AOF file has grown by a configured percentage since the last rewrite.

# Rewrite if AOF size is 100% larger (2x) than the last rewrite size
auto-aof-rewrite-percentage 100
# ... but only if the file is at least 64MB
auto-aof-rewrite-min-size 64mb
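
How the two settings combine can be sketched as a single check. The function below is an approximation written for this chapter, not Redis source: `shouldRewrite` and its parameters are hypothetical names, and the exact boundary handling inside Redis may differ.

```go
package main

import "fmt"

const mb = 1 << 20

// shouldRewrite reports whether an automatic rewrite would be due, given the
// current AOF size, the size right after the last rewrite (base), and the
// two auto-aof-rewrite-* settings. All sizes are in bytes.
func shouldRewrite(current, base, percentage, minSize int64) bool {
	if percentage == 0 || current < minSize {
		return false // a percentage of 0 disables automatic rewrites
	}
	if base == 0 {
		base = 1 // avoid dividing by zero when no rewrite has happened yet
	}
	growth := current*100/base - 100 // growth since the last rewrite, in percent
	return growth >= percentage
}

func main() {
	// Settings from the config above: 100 percent growth, 64mb minimum.
	fmt.Println(shouldRewrite(100*mb, 60*mb, 100, 64*mb)) // false: only 66% growth
	fmt.Println(shouldRewrite(130*mb, 60*mb, 100, 64*mb)) // true: 116% growth
	fmt.Println(shouldRewrite(40*mb, 10*mb, 100, 64*mb))  // false: below the minimum size
}
```

The minimum size exists so that a tiny AOF does not trigger rewrite after rewrite just because it doubled from a few kilobytes.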

Modern Redis: Multi-Part AOF (Redis 7.0+)

In Redis 7.0, the AOF architecture was overhauled. Instead of a single massive file, Redis now uses Multi-Part AOF (MP-AOF). The AOF is split into:

  1. Base AOF: The compacted snapshot from the last rewrite.
  2. Incremental AOFs: Smaller files containing writes since the last rewrite.
  3. Manifest File: Tracks the active files.

This eliminates the legacy overhead of the in-memory AOF Rewrite Buffer, significantly reducing memory spikes during rewrites.
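
A small model helps show why the buffer is no longer needed. In the Go sketch below, the `manifest` type, its methods, and the file names (`base.1`, `incr.1`) are all illustrative assumptions rather than the real on-disk format.

```go
package main

import "fmt"

// manifest is a simplified stand-in for the MP-AOF manifest: one base file
// plus the incremental files written since that base was produced.
type manifest struct {
	seq  int
	base string
	incr []string
}

// loadOrder lists the files in the order they would be loaded at startup:
// the base first, then each incremental file.
func (m manifest) loadOrder() []string {
	return append([]string{m.base}, m.incr...)
}

// startRewrite opens a fresh incremental file, so writes that arrive during
// the rewrite go to disk rather than into an in-memory buffer.
func (m manifest) startRewrite() manifest {
	next := m
	next.incr = append(append([]string{}, m.incr...), fmt.Sprintf("incr.%d", m.seq+1))
	return next
}

// finishRewrite installs the new base and keeps only the incremental file
// opened by startRewrite; older files are now redundant.
// It assumes startRewrite was called first.
func (m manifest) finishRewrite() manifest {
	return manifest{
		seq:  m.seq + 1,
		base: fmt.Sprintf("base.%d", m.seq+1),
		incr: []string{m.incr[len(m.incr)-1]},
	}
}

func main() {
	m := manifest{seq: 1, base: "base.1", incr: []string{"incr.1"}}
	fmt.Println(m.loadOrder()) // [base.1 incr.1]
	m = m.startRewrite()
	fmt.Println(m.loadOrder()) // [base.1 incr.1 incr.2]
	m = m.finishRewrite()
	fmt.Println(m.loadOrder()) // [base.2 incr.2]
}
```

At every point in this model the manifest describes a complete, loadable dataset, and the writes made during the rewrite already live in their own file when the new base is installed.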

5. AOF in Java & Go

Java:

import redis.clients.jedis.Jedis;

public class RedisAOF {
    public static void main(String[] args) {
        // Connect to local Redis
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Trigger BGREWRITEAOF
            String response = jedis.bgrewriteaof();
            System.out.println("Rewrite started: " + response);
        }
    }
}

Go:

package main

import (
    "context"
    "fmt"
    "github.com/redis/go-redis/v9"
)

func main() {
    ctx := context.Background()
    rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

    // Trigger BGREWRITEAOF
    status := rdb.BgRewriteAOF(ctx)
    if err := status.Err(); err != nil {
        panic(err)
    }
    fmt.Println("Rewrite started:", status.Val())
}
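
BGREWRITEAOF only starts the rewrite; the work continues in the background. One way to find out when it has finished is to poll INFO persistence and look at the aof_rewrite_in_progress field. The sketch below shows just the parsing step, with a hard-coded sample string standing in for the server reply; `rewriteInProgress` is a hypothetical helper. With go-redis, the text would presumably come from something like `rdb.Info(ctx, "persistence").Result()`.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteInProgress scans INFO-style "key:value" lines and reports whether
// aof_rewrite_in_progress is set to 1.
func rewriteInProgress(info string) bool {
	for _, line := range strings.Split(info, "\n") {
		line = strings.TrimSpace(line) // INFO lines end in \r\n
		key, value, ok := strings.Cut(line, ":")
		if ok && key == "aof_rewrite_in_progress" {
			return value == "1"
		}
	}
	return false
}

func main() {
	// Sample text shaped like part of an INFO persistence reply (abridged).
	sample := "# Persistence\r\naof_enabled:1\r\naof_rewrite_in_progress:1\r\n"
	fmt.Println(rewriteInProgress(sample)) // true
}
```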

6. RDB vs AOF: The Verdict

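| Feature | RDB | AOF |
| --- | --- | --- |
| Durability | Low (minutes of writes lost) | High (about 1 second lost with everysec) |
| File Size | Compact (binary) | Larger (text command log) |
| Startup Speed | Fast | Slow (replay takes time) |
| Integrity | Robust | Can be corrupted (fixable with redis-check-aof) |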

Can we have the best of both worlds? Yes. That is what we will cover in the next chapter: Hybrid Persistence.