Image Layers & UnionFS

[!NOTE] This module explores the core principles of Image Layers & UnionFS, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. The Illusion of a Single File

When you run `docker run ubuntu`, it can feel like booting a virtual machine from a single disk image. Under the hood, a Docker image is not a single file: it is a stack of read-only layers that a union filesystem (UnionFS) merges into one directory tree.

Understanding this architecture is critical for:

Minimizing Image Size: Knowing what adds weight.
Optimizing Build Speed: Leveraging the build cache.
Debugging Storage Issues: Understanding write latency.

2. Anatomy of an Image

A Docker image is an ordered collection of root filesystem changes. Each instruction in a Dockerfile that modifies the filesystem creates a new layer.

The Layer Stack

Imagine a stack of transparencies (overhead projector sheets).

Base Layer: The OS files (e.g., Ubuntu rootfs).
Middle Layers: Added files or modifications (e.g., apt-get install python).
Top Layer (Container Layer): The only writable layer. It is added when a container is created and is not part of the image itself.

When you look down from the top, you see the combined result. If a top layer has a file /etc/app.conf, it obscures any /etc/app.conf in the layers below it.
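The "look down from the top" rule can be sketched as a lookup over a list of dictionaries. This is a minimal illustration, not how a storage driver is implemented: the paths and contents are made up, and deletions (which real union filesystems record with whiteout entries) are left out.

```python
# Layers listed bottom-up: base first, topmost last.
# Paths and contents are made-up illustration data.
layers = [
    {"/etc/os-release": "ubuntu", "/etc/app.conf": "defaults"},  # base layer
    {"/usr/bin/python3": "<binary>"},                            # middle layer
    {"/etc/app.conf": "overridden"},                             # top layer
]

def lookup(path, stack):
    """Return the contents of path as seen from the top of the stack."""
    for layer in reversed(stack):   # search from the top layer down
        if path in layer:
            return layer[path]      # the first hit hides anything below it
    return None                     # the path exists in no layer

def merged_view(stack):
    """Flatten the stack into the single tree a container would see."""
    view = {}
    for layer in stack:             # bottom-up, so upper layers overwrite
        view.update(layer)
    return view

print(lookup("/etc/app.conf", layers))
print(sorted(merged_view(layers)))
```

The lower copy of /etc/app.conf still exists in the base layer; it is hidden, not deleted, which is why removing a file in a later layer does not shrink an image.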

3. Deep Dive: OverlayFS

On most modern Linux systems, Docker typically uses OverlayFS (through the overlay2 storage driver). OverlayFS uses specific terminology to describe how it merges directories:

LowerDir (Read-Only): The image layers. There can be many of these.
UpperDir (Read-Write): The container layer where your runtime changes go.
Merged (Unified View): What the container process actually sees.
WorkDir: An internal directory used by OverlayFS for atomic operations.
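To make the four directories concrete, here is a small sketch that assembles the option string an OverlayFS mount takes. The directory paths are hypothetical placeholders, not the paths Docker uses on disk. In the `lowerdir` list the leftmost entry is the topmost layer.

```python
def overlay_mount_options(lower_dirs_top_first, upper_dir, work_dir):
    """Build the -o option string for an OverlayFS mount.

    lower_dirs_top_first: read-only layers, topmost first, base last.
    upper_dir / work_dir: writable directories on the same filesystem.
    """
    if not lower_dirs_top_first:
        raise ValueError("at least one lower directory is required")
    lower = ":".join(lower_dirs_top_first)   # ':' separates stacked lowerdirs
    return f"lowerdir={lower},upperdir={upper_dir},workdir={work_dir}"

# Hypothetical paths, chosen only for illustration.
opts = overlay_mount_options(
    ["/layers/app", "/layers/python", "/layers/base"],
    "/container/upper",
    "/container/work",
)
# The last argument of the mount command is the Merged directory.
print(f"mount -t overlay overlay -o {opts} /container/merged")
```

Running the printed command requires root and existing directories; the script itself only builds the string.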

The Copy-on-Write Mechanism

Example stack, listed from top to bottom:

UpperDir: Read-write container layer
LowerDir 3: /app/config.json
LowerDir 2: Python runtime
LowerDir 1: Base OS (Ubuntu)

Action: Edit Config. The app tries to write to /app/config.json.


4. Copy-on-Write (CoW) Performance

The CoW strategy is brilliant for efficiency (you share the base Ubuntu image across 100 containers), but it comes with a write penalty.

When you modify a file for the first time:

Search: The storage driver searches the layers from the top down for the newest version of the file.
Copy: It copies the entire file from the lower layer to the upper layer.
Write: It applies the change to the copy.

[!WARNING] Performance Hit: If you have a 1 GB log file in a lower layer and you append 1 byte to it, the storage driver must first copy the entire 1 GB file up to the container layer. Only the first write to a file pays this cost; later writes go to the copy that already sits in the container layer. Always store write-heavy data (databases, logs) in Volumes, which bypass the storage driver completely.
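The search, copy, write sequence and its cost can be simulated in a few lines. This is a toy model under stated assumptions: `OverlaySim` is a hypothetical class, files are byte strings in memory, and a 1 MB file stands in for the 1 GB log.

```python
class OverlaySim:
    """Toy copy-on-write model: one read-only lower layer, one upper layer."""

    def __init__(self, lower):
        self.lower = lower        # read-only image content, never modified
        self.upper = {}           # writable container layer
        self.bytes_copied = 0     # total bytes moved by copy-up operations

    def read(self, path):
        # The upper layer wins if it holds a copy of the file.
        if path in self.upper:
            return bytes(self.upper[path])
        return self.lower[path]

    def append(self, path, data):
        if path not in self.upper:
            # First write: copy the whole file up before changing it.
            original = self.lower.get(path, b"")
            self.upper[path] = bytearray(original)
            self.bytes_copied += len(original)
        self.upper[path] += data  # the change lands only in the upper copy

# Scaled-down stand-in for the 1 GB log file.
fs = OverlaySim({"/var/log/app.log": b"x" * 1_000_000})
fs.append("/var/log/app.log", b"!")   # first write triggers the copy-up
first_cost = fs.bytes_copied
fs.append("/var/log/app.log", b"!")   # second write copies nothing more
print(first_cost, fs.bytes_copied)
```

Appending a single byte moved the full file once; the second append was free of copying, and the lower layer kept its original contents.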

5. Inspecting Layers with `docker history`

You can see exactly how your image was built and the size of each layer.

docker history my-app:latest

Output Breakdown:

| IMAGE | CREATED | CREATED BY | SIZE |
| --- | --- | --- | --- |
| `a3b4c5...` | 2 mins ago | `CMD ["./app"]` | 0B |
| `d1e2f3...` | 2 mins ago | `COPY . .` | 15MB |
| `123456...` | 4 weeks ago | `/bin/sh -c apt-get update...` | 120MB |

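Size 0B: Metadata instructions (like CMD and ENV) add no filesystem content. They only modify the image configuration JSON, so they appear as 0B entries. WORKDIR usually shows 0B as well, although it can create the directory if it does not already exist.
Missing ID: Intermediate rows may show <missing> in the IMAGE column when the image was pulled from a registry or built on a different machine, or when it was built with BuildKit. The layers themselves are present; only the intermediate image IDs are not stored locally.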
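To find which steps add weight, the SIZE column can be converted to bytes and summed. The sketch below works on the three rows from the example output above rather than calling Docker. It assumes sizes use decimal units (1 MB = 1,000,000 bytes); `parse_size` is a hypothetical helper, not part of any Docker tooling.

```python
import re

# Assumption: decimal (power-of-1000) units in the SIZE column.
_UNITS = {"B": 1, "kB": 10**3, "KB": 10**3, "MB": 10**6, "GB": 10**9}

def parse_size(text):
    """Convert a size string such as '15MB' or '0B' to a byte count."""
    match = re.fullmatch(r"([0-9.]+)\s*([A-Za-z]+)", text.strip())
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * _UNITS[unit])

# (CREATED BY, SIZE) pairs copied from the example output.
rows = [
    ('CMD ["./app"]', "0B"),
    ("COPY . .", "15MB"),
    ("/bin/sh -c apt-get update...", "120MB"),
]

total = sum(parse_size(size) for _, size in rows)
# Rows with a non-zero size are the ones that added filesystem content.
heavy = [cmd for cmd, size in rows if parse_size(size) > 0]

print(total)
print(heavy)
```

Here the package installation step accounts for most of the image, so that is the first place to look when trimming size.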

6. Summary

Immutability: Lower layers are read-only and shared.
Efficiency: Each container only consumes extra disk space for its own changes (UpperDir); the image layers are stored once and shared.
Performance: Write-heavy workloads belong in Volumes, not the container writable layer.
