Containers vs Virtual Machines: The First Principles

[!NOTE] Key Takeaway: A Virtual Machine virtualizes the Hardware. A Container virtualizes the Operating System.

1. The Fundamental Difference

When you run a Virtual Machine (VM), you are essentially running a computer inside a computer. The Hypervisor mimics the CPU, RAM, and Disk, tricking the Guest OS into thinking it has its own hardware. This is heavy: every VM needs a full kernel, init system, and drivers.

When you run a Container, you are just running a Process (like Chrome or Spotify). The magic is that the Linux Kernel lies to this process, telling it: “You are the only process on this machine.”

The Layer Visualizer

The VM stack, from top to bottom: App A, Libs / Bins, Guest OS Kernel, Hypervisor, Host Infrastructure. The Container stack skips the Guest OS and Hypervisor layers and sits directly on the Host Kernel.

2. Linux Primitives: The Secret Sauce

Docker is not magic. At its core, it is a user-friendly wrapper around two Linux kernel features: Namespaces and Control Groups (cgroups).

1. Namespaces (Isolation)

Namespaces answer the question: “What can I see?” They partition kernel resources such that one set of processes sees one set of resources, while another set of processes sees a different set.

  • PID Namespace: “I am PID 1.” (Even if the host sees you as PID 14355)
  • NET Namespace: “I have my own eth0 and IP address.”
  • MNT Namespace: “I see a different root file system.”
  • UTS Namespace: “I have my own hostname.”
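
To make the list above concrete, here is a minimal sketch that prints which namespaces the current process belongs to. It assumes a Linux host, where each entry under /proc/self/ns is a symlink of the form type:[inode]. The helper name parseNsLink is invented for this example. Two processes that print the same inode for a given type share that namespace.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNsLink (a hypothetical helper) splits a namespace symlink target such
// as "uts:[4026531838]" into the namespace type and its inode number.
func parseNsLink(target string) (string, uint64, error) {
	parts := strings.SplitN(target, ":[", 2)
	if len(parts) != 2 || !strings.HasSuffix(parts[1], "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	inode, err := strconv.ParseUint(strings.TrimSuffix(parts[1], "]"), 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("bad inode in %q: %w", target, err)
	}
	return parts[0], inode, nil
}

func main() {
	// Linux only: each entry in /proc/self/ns is a symlink naming one namespace.
	for _, name := range []string{"pid", "net", "mnt", "uts"} {
		target, err := os.Readlink("/proc/self/ns/" + name)
		if err != nil {
			fmt.Printf("%-4s unavailable: %v\n", name, err)
			continue
		}
		kind, inode, err := parseNsLink(target)
		if err != nil {
			fmt.Printf("%-4s %v\n", name, err)
			continue
		}
		fmt.Printf("%-4s namespace inode %d\n", kind, inode)
	}
}
```

Running it once on the host and once inside a container should show different inode numbers for the namespaces the container does not share with the host.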

Code: Proving Isolation with Go

Let’s use Go’s syscall package to start a process in new UTS (UNIX Time-sharing System) and PID namespaces. The new UTS namespace lets the child process change its hostname without affecting the host.

// main.go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	if len(os.Args) < 2 {
		panic("missing command")
	}

	switch os.Args[1] {
	case "run":
		run()
	case "child":
		child()
	default:
		panic("help")
	}
}

func run() {
	// /proc/self/exe is a special file pointing to the currently running executable
	cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)

	// CLONE_NEWUTS: Create a new UTS namespace (hostname isolation)
	// CLONE_NEWPID: Create a new PID namespace
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
	}

	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		fmt.Printf("Error: %v\n", err)
		os.Exit(1)
	}
}

func child() {
	fmt.Printf("Running %v as PID %d\n", os.Args[2:], os.Getpid())

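	// Change hostname INSIDE the container; only the new UTS namespace is affected
	if err := syscall.Sethostname([]byte("container")); err != nil {
		fmt.Printf("Error setting hostname: %v\n", err)
		os.Exit(1)
	}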

	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		fmt.Printf("Error: %v\n", err)
		os.Exit(1)
	}
}

[!TIP] Run this on Linux with root privileges (creating these namespaces requires CAP_SYS_ADMIN): go run main.go run /bin/bash. Inside the new shell, type hostname. You’ll see “container”. Exit and check the host’s hostname; it remains unchanged. Docker builds on this same kernel mechanism.

2. Control Groups, or cgroups (Limits)

Cgroups answer the question: “What can I use?” They limit, account for, and isolate the resource usage (CPU, memory, disk I/O, network) of a collection of processes.

  • Memory Limit: Prevent a container from eating all RAM.
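  • CPU Shares: Give each container a relative weight, so it receives a proportional slice of CPU time when the CPU is contended.
  • OOM Killer: If a container exceeds its memory limit, the kernel kills processes inside that container’s cgroup, not processes elsewhere on the system.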

3. Images vs Containers

The distinction is analogous to Class vs Object in Object-Oriented Programming.

| Concept | Docker Term | Description |
|---------|-------------|-------------|
| Class | Image | A read-only template. It contains the OS files, code, and libraries. Built from layers. |
| Object | Container | A runnable instance of an image. It adds a thin Read/Write layer on top of the image. |

The Copy-On-Write (COW) Mechanism

Docker uses a Union File System (typically OverlayFS on Linux). When a container starts, it doesn’t copy the 500 MB image. It just references it.

  • Read: Reads directly from the image layers.
  • Write: If the container needs to modify a file (e.g., /etc/nginx/nginx.conf), it copies that single file up to its writable layer and modifies it there.

This makes starting containers nearly instant. They don’t need to duplicate disk space until they modify data.
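
The read and write rules above can be modeled in a few lines. This is a toy in-memory model for intuition only, not OverlayFS’s actual implementation: the Layer and Container types and the Append method are invented for this sketch, and real copy-up works on files on disk rather than on strings in a map.

```go
package main

import "fmt"

// Layer maps a file path to its contents. Image layers are treated as read-only.
type Layer map[string]string

// Container stacks one private writable layer on top of shared image layers.
type Container struct {
	image []Layer // shared, ordered bottom to top
	upper Layer   // private to this container
}

// NewContainer copies no file data; it only references the image layers.
func NewContainer(image []Layer) *Container {
	return &Container{image: image, upper: Layer{}}
}

// Read checks the writable layer first, then the image layers from top to bottom.
func (c *Container) Read(path string) (string, bool) {
	if data, ok := c.upper[path]; ok {
		return data, true
	}
	for i := len(c.image) - 1; i >= 0; i-- {
		if data, ok := c.image[i][path]; ok {
			return data, true
		}
	}
	return "", false
}

// Append copies the file up into the writable layer on first modification,
// then edits only that private copy.
func (c *Container) Append(path, extra string) {
	if _, ok := c.upper[path]; !ok {
		data, _ := c.Read(path) // copy-up; a missing file starts empty
		c.upper[path] = data
	}
	c.upper[path] += extra
}

func main() {
	image := []Layer{
		{"/etc/nginx/nginx.conf": "worker_processes 1;\n"},
	}
	a, b := NewContainer(image), NewContainer(image)
	a.Append("/etc/nginx/nginx.conf", "# tuned in container a\n")

	fromA, _ := a.Read("/etc/nginx/nginx.conf")
	fromB, _ := b.Read("/etc/nginx/nginx.conf")
	fmt.Printf("a sees %q\n", fromA)
	fmt.Printf("b sees %q\n", fromB)
	fmt.Println("files copied up by a:", len(a.upper))
	fmt.Println("files copied up by b:", len(b.upper))
}
```

Both containers share the same image layers, so only the container that modified the file holds a private copy of it.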