Union File Systems (OverlayFS)

[!NOTE] This module explores the core principles of Union File Systems (OverlayFS), deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. The Illusion of a Filesystem

When you pull a 1GB Docker image, and then start 100 containers from it, does it use 100GB of disk space?

No. It uses almost zero extra space.

This magic is possible due to Union File Systems (specifically OverlayFS). They allow multiple directories (layers) to be stacked on top of each other and presented as a single unified view.

OverlayFS Architecture

OverlayFS uses four directories:

  1. LowerDir: Read-Only layers (The Image). These can be stacked deep (Layer 1, Layer 2…).
  2. UpperDir: Read-Write layer (The Container). Changes happen here.
  3. MergedDir: The Unified View (What the process sees).
  4. WorkDir: An empty directory on the same filesystem as UpperDir, used internally by the kernel for atomic operations (e.g., copy-up).
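The merged view follows a simple precedence rule: UpperDir shadows every LowerDir, and lower layers are searched top-down. Here is a minimal userspace sketch of that lookup rule (an illustration, not the kernel implementation; the layer contents are hypothetical maps):

```go
package main

import "fmt"

// lookup resolves a filename the way OverlayFS builds its merged view:
// UpperDir wins; otherwise the topmost LowerDir containing the file wins.
func lookup(name string, upper map[string]string, lowers []map[string]string) (string, bool) {
	if v, ok := upper[name]; ok {
		return v, true
	}
	for _, layer := range lowers { // lowers[0] is the topmost lower layer
		if v, ok := layer[name]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	lower1 := map[string]string{"os-release": "debian"}       // base image layer
	lower2 := map[string]string{"app.conf": "defaults"}       // app layer, stacked on lower1
	upper := map[string]string{"app.conf": "container-edits"} // container's RW layer

	fmt.Println(lookup("app.conf", upper, []map[string]string{lower2, lower1}))   // upper shadows lower2
	fmt.Println(lookup("os-release", upper, []map[string]string{lower2, lower1})) // falls through to lower1
}
```

The same rule explains why modifying a file in the container never touches the image: the modified copy in UpperDir simply shadows the original in LowerDir.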

2. Interactive: Layer Stack Visualizer

See exactly what happens when you read or write a file.

(Interactive widget: shows the layer stack — Merged View (virtual), UpperDir (read-write), and LowerDir (read-only image) — with a selector for read/write file operations.)

3. Code Examples

1. Go Implementation (Mounting Overlay)

This is how container runtimes assemble the filesystem before starting the container process.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	// 1. Create Directories
	dirs := []string{"lower", "upper", "work", "merged"}
	for _, d := range dirs {
		os.Mkdir(d, 0755)
	}

	// 2. Create a file in LowerDir (The "Image")
	os.WriteFile("lower/base-file.txt", []byte("I am from the image!"), 0644)

	// 3. Mount OverlayFS
	// Equivalent to: mount -t overlay overlay -o lowerdir=...,upperdir=...,workdir=... merged
	fstype := "overlay"
	flags := uintptr(0)
	data := "lowerdir=lower,upperdir=upper,workdir=work"

	err := syscall.Mount("overlay", "merged", fstype, flags, data)
	if err != nil {
		fmt.Printf("Mount failed (requires root / CAP_SYS_ADMIN): %v\n", err)
		return
	}
	defer syscall.Unmount("merged", 0)

	fmt.Println("Overlay mounted at ./merged")
	// Any write to ./merged/base-file.txt will copy it up to ./upper/base-file.txt
}
```

2. Java Implementation (Transparency)

To a Java application, OverlayFS is completely transparent. It’s just a filesystem. However, understanding it is critical for performance.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class CoWDemo {

    public static void main(String[] args) throws IOException {
        // Assumes this file already ships in the image (LowerDir),
        // so the first append forces a copy-up into UpperDir.
        Path file = Paths.get("/app/config/settings.properties");

        // 1. First Write: slow (triggers CoW copy-up)
        long start = System.nanoTime();
        Files.write(file, "updated=true".getBytes(), StandardOpenOption.APPEND);
        long duration = System.nanoTime() - start;

        System.out.printf("First write took: %d ns (CoW overhead)\n", duration);

        // 2. Second Write: fast (file already lives in UpperDir)
        start = System.nanoTime();
        Files.write(file, "ver=2".getBytes(), StandardOpenOption.APPEND);
        duration = System.nanoTime() - start;

        System.out.printf("Second write took: %d ns (Direct write)\n", duration);
    }
}
```

[!IMPORTANT] Performance Impact: The first write to a large file (e.g., a 1GB log file in an image) can stall the application because the entire file must be copied to the UpperDir layer before the write can proceed.

4. First Principles: Why Copy-On-Write?

Why not just copy the whole image for every container?

  1. Space Efficiency: 100 containers running nginx share the exact same 50MB of read-only image data on disk.
  2. Startup Speed: No copying happens at startup. The container starts instantly because the “copy” is virtual.
  3. Page Cache Sharing: Since all containers read the same physical files (inodes) in LowerDir, the Kernel can cache them in RAM once and share them across all containers.