Data Survival: Beyond the Container

Using a Volume is step one. But what if the host disk fails? What if you accidentally delete the volume? True persistence requires a Backup Strategy.

1. The “fsync” Promise

Databases such as Postgres and MySQL are designed to be durable (the D in ACID). They use a system call named fsync() to force the OS to flush buffered writes from RAM (the page cache) to the physical storage device.

Think of RAM as a whiteboard and the physical disk as a permanent notebook. Writing on the whiteboard (RAM) is extremely fast, but if the power goes out, the data is lost. fsync() is the act of stopping, taking out a pen, and meticulously copying the whiteboard contents into the permanent notebook (Disk). It takes longer, but it survives a power outage.

  • Docker Volume: Respects fsync. When Postgres thinks data is safe, it is actually safe on the host disk.
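  • Container Layer: overlay2 also respects fsync, but writes pay a Copy-on-Write (CoW) penalty: the first change to a file that came from an image layer copies the whole file up into the writable layer, which is slow for large files. The writable layer is also deleted together with the container.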
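One way to tell which of the two cases applies to a running container is to look at its mounts. The sketch below parses the JSON printed by docker inspect --format '{{json .Mounts}}' <container> and reports whether a data directory is covered by a mount. The helper name backingMount and the sample JSON are illustrative assumptions, not output captured from a real host.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// mount holds the subset of fields we read from docker inspect's "Mounts" array.
type mount struct {
	Type        string `json:"Type"`
	Name        string `json:"Name"`
	Source      string `json:"Source"`
	Destination string `json:"Destination"`
}

// backingMount (hypothetical helper) returns the most specific mount that
// covers dataDir. found is false when dataDir lives in the container's
// writable layer, which means the data disappears with the container.
func backingMount(mountsJSON []byte, dataDir string) (mount, bool, error) {
	var mounts []mount
	if err := json.Unmarshal(mountsJSON, &mounts); err != nil {
		return mount{}, false, err
	}
	var best mount
	found := false
	for _, m := range mounts {
		dest := strings.TrimSuffix(m.Destination, "/")
		covers := dataDir == dest || strings.HasPrefix(dataDir, dest+"/")
		if covers && (!found || len(m.Destination) > len(best.Destination)) {
			best, found = m, true
		}
	}
	return best, found, nil
}

func main() {
	// Hand-written sample in the shape docker inspect produces.
	sample := []byte(`[{"Type":"volume","Name":"db-data",` +
		`"Source":"/var/lib/docker/volumes/db-data/_data",` +
		`"Destination":"/var/lib/postgresql/data"}]`)

	m, ok, err := backingMount(sample, "/var/lib/postgresql/data")
	if err != nil {
		fmt.Println("could not parse inspect output:", err)
		return
	}
	if !ok {
		fmt.Println("WARNING: data directory is in the container layer")
		return
	}
	fmt.Printf("data directory is backed by %s %q (host path %s)\n", m.Type, m.Name, m.Source)
}
```

Knowing the host path is only half the answer: it says where the bytes live, not whether that disk outlives the host.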

[!NOTE] War Story: The “It’s on a Volume” Fallacy
A common mistake among junior engineers is assuming that mounting a Docker volume (-v /var/lib/postgresql/data) is a complete data persistence strategy. In one notable incident, a startup lost days of user data because their EC2 instance was terminated. The data had been safely fsynced to the container’s volume, but that volume lived on the ephemeral EC2 instance store, not on an EBS volume or in an off-site backup. A volume protects against container termination, not host termination.

[!IMPORTANT] Backup Rule: A volume on the host is still a single point of failure. You must export it off-site (S3, NAS).


2. Backup Strategies

When designing a backup system for Docker containers, you must balance uptime requirements with data consistency. Copying a database's files while the database is actively writing to them can capture a half-finished write and leave you with an inconsistent or corrupted backup.

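| Strategy        | Speed  | Consistency    | Uptime Impact     | Best For                      |
|-----------------|--------|----------------|-------------------|-------------------------------|
| Stop and Copy   | Fast   | Guaranteed     | High (Downtime)   | Small apps / Dev environments |
| Volume Mounting | Fast   | Unpredictable* | Low               | Static assets, logs           |
| Database Dump   | Slower | Guaranteed     | Zero (Hot Backup) | Production databases          |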

*Copying raw files of an active database can cause corruption.

1. Stop and Copy (Cold Backup)

The safest method, but requires downtime.

  1. Stop the container (docker stop db).
  2. Tar the volume directory.
  3. Start the container.

2. Volume Mounting (Hot/Warm Backup)

Run a temporary container that mounts both the volume and a backup directory. This works well for static files but is dangerous for an active database unless its writes are locked or paused first:

    docker run --rm -v db-data:/data -v "$(pwd)":/backup ubuntu tar cvf /backup/backup.tar /data

3. Database Dump (Logical Backup)

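Use the database’s own tools (pg_dump, mysqldump). This is often better than a raw file copy because the dump reads a snapshot of the data at a single point in time, so it stays consistent without stopping the database. pg_dump does this by default; mysqldump needs the --single-transaction option to get the same behavior on InnoDB tables.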


3. Interactive: Data Survival Lab

Simulate a catastrophe and see if your data survives.


4. Code Example: Automating Backups

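package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

func main() {
	// File-level backup: run a temporary container that tars the volume content.
	// The archive is only guaranteed consistent if nothing is writing to the
	// volume; for a live database, stop it first or use a dump instead.
	volumeName := "db-data"
	backupDir := "/backups" // host directory, mounted at /backup in the container
	backupName := "db-snap.tar"

	cmd := exec.Command("docker", "run", "--rm",
		"-v", volumeName+":/data",
		"-v", backupDir+":/backup",
		"alpine", "tar", "cf", "/backup/"+backupName, "/data")

	// CombinedOutput captures stdout and stderr so failures are explainable.
	output, err := cmd.CombinedOutput()
	if err != nil {
		fmt.Fprintf(os.Stderr, "Backup failed: %v\n%s\n", err, output)
		os.Exit(1) // non-zero exit so cron or CI notices the failure
	}
	fmt.Printf("Backup success: %s created\n", filepath.Join(backupDir, backupName))
}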
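
The same backup in Java:

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class BackupManager {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Equivalent to: docker run --rm -v db-data:/data -v /backups:/backup alpine tar cf ...
        ProcessBuilder pb = new ProcessBuilder(
            "docker", "run", "--rm",
            "-v", "db-data:/data",
            "-v", "/backups:/backup",
            "alpine", "tar", "cf", "/backup/db-snap.tar", "/data"
        );
        // Merge stderr into stdout so a single read captures all output.
        pb.redirectErrorStream(true);

        Process p = pb.start();
        // Drain the output before waiting: a full pipe buffer would block the child.
        // readAllBytes requires Java 9 or later.
        String output = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = p.waitFor();

        if (exitCode == 0) {
            System.out.println("Backup complete.");
        } else {
            System.err.println("Backup failed (exit " + exitCode + "): " + output);
            System.exit(1); // non-zero exit so schedulers notice the failure
        }
    }
}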