Linux Namespaces

[!NOTE] This module explores the core principles of Linux Namespaces, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. The Illusion of Isolation

To the Linux kernel, a container is not a first-class object: there is no container struct. It is an illusion built from two kernel features: namespaces (what a process can see) and cgroups (what a process can use).

Namespaces partition kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources.

The Kernel View: nsproxy

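Every task (process or thread) in Linux is represented by a task_struct. This struct contains a pointer to an nsproxy, which in turn holds pointers to the namespaces the task belongs to. The user namespace is the exception: it is reached through the task's credentials rather than through nsproxy.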

  graph LR
      T1[task_struct (PID 123)] --> N1[nsproxy]
      T2[task_struct (PID 456)] --> N1
      T3[task_struct (PID 789)] --> N2[nsproxy (Container)]

      N1 --> M1[mnt_ns (Host)]
      N1 --> P1[pid_ns (Host)]
      N1 --> U1[uts_ns (Host)]

      N2 --> M2[mnt_ns (Container)]
      N2 --> P2[pid_ns (Container)]
      N2 --> U2[uts_ns (Container)]

      style T1 fill:var(--bg-card),stroke:var(--accent-main)
      style T2 fill:var(--bg-card),stroke:var(--accent-main)
      style T3 fill:var(--bg-card),stroke:var(--green-500)
      style N1 fill:var(--bg-soft),stroke:var(--text-muted)
      style N2 fill:var(--bg-soft),stroke:var(--text-muted)
  

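When you run docker run, the container runtime asks the kernel, through clone(2) or unshare(2) with CLONE_NEW* flags, to start a process whose nsproxy points at freshly created namespaces instead of the host's.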
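The namespace pointers described above are visible from user space as symbolic links under /proc/<pid>/ns. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper name parseNsLink and the nsKinds list are illustrative choices, not part of any library. Two processes that report the same inode number for an entry share that namespace.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// nsKinds lists entries commonly found under /proc/<pid>/ns (assumed set).
var nsKinds = []string{"mnt", "pid", "net", "uts", "ipc", "user", "cgroup"}

// parseNsLink splits a link target such as "uts:[4026531838]" into the
// namespace kind and the inode number that identifies the namespace.
func parseNsLink(target string) (string, uint64, error) {
	kind, rest, ok := strings.Cut(target, ":[")
	if !ok || !strings.HasSuffix(rest, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	inode, err := strconv.ParseUint(strings.TrimSuffix(rest, "]"), 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("bad inode in %q: %w", target, err)
	}
	return kind, inode, nil
}

func main() {
	// Inspect this process by default, or the PID given as first argument.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	for _, k := range nsKinds {
		target, err := os.Readlink("/proc/" + pid + "/ns/" + k)
		if err != nil {
			fmt.Printf("%-7s unavailable: %v\n", k, err)
			continue
		}
		kind, inode, err := parseNsLink(target)
		if err != nil {
			fmt.Printf("%-7s %v\n", k, err)
			continue
		}
		fmt.Printf("%-7s inode %d\n", kind, inode)
	}
}
```

Running it once on the host and once inside a container should show different inode numbers for every namespace the container does not share with the host.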

2. The 7 Namespaces

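| Namespace | Constant | Isolates |
|-----------|----------|----------|
| PID | CLONE_NEWPID | Process IDs (the first process inside sees itself as PID 1) |
| NET | CLONE_NEWNET | Network devices, stacks, ports (own localhost) |
| MNT | CLONE_NEWNS | Mount points (own / filesystem) |
| UTS | CLONE_NEWUTS | Hostname and NIS domain name |
| IPC | CLONE_NEWIPC | System V IPC, POSIX message queues |
| USER | CLONE_NEWUSER | User and group IDs (root inside can map to an unprivileged user outside) |
| CGROUP | CLONE_NEWCGROUP | Cgroup root directory view |

Linux 5.6 added an eighth type, the time namespace (CLONE_NEWTIME), which is not covered here.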
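Each constant in the table is a single bit, so a caller selects namespaces by OR-ing flags into one mask. The sketch below decodes such a mask back into table rows. The numeric values are copied from linux/sched.h and declared locally so the example compiles on any platform; the names nsFlag, table, and describeFlags are hypothetical helpers for illustration.

```go
package main

import "fmt"

// Flag values as defined in <linux/sched.h>, redeclared locally.
const (
	cloneNewNS     = 0x00020000
	cloneNewCgroup = 0x02000000
	cloneNewUTS    = 0x04000000
	cloneNewIPC    = 0x08000000
	cloneNewUser   = 0x10000000
	cloneNewPID    = 0x20000000
	cloneNewNet    = 0x40000000
)

// nsFlag pairs a table row name with its flag bit.
type nsFlag struct {
	name string
	bit  uintptr
}

// table follows the row order used in the document.
var table = []nsFlag{
	{"PID", cloneNewPID},
	{"NET", cloneNewNet},
	{"MNT", cloneNewNS},
	{"UTS", cloneNewUTS},
	{"IPC", cloneNewIPC},
	{"USER", cloneNewUser},
	{"CGROUP", cloneNewCgroup},
}

// describeFlags returns the names of all namespaces requested by flags.
func describeFlags(flags uintptr) []string {
	var names []string
	for _, f := range table {
		if flags&f.bit != 0 {
			names = append(names, f.name)
		}
	}
	return names
}

func main() {
	flags := uintptr(cloneNewUTS | cloneNewPID | cloneNewNS)
	fmt.Printf("0x%08x -> %v\n", flags, describeFlags(flags))
	// prints: 0x24020000 -> [PID MNT UTS]
}
```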

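3. Namespace Explorer

The view below is what a process on the host sees before any namespace is unshared. Each namespace changes one part of it: a PID namespace changes the ps output, a NET namespace changes the ip addr output, and a MNT namespace changes what ls / shows.

Current Process View (Host)
$ ps aux
PID  COMMAND
1    systemd
500  dockerd
999  bash (current)
$ ip addr
2: eth0: <BROADCAST> 192.168.1.50
$ ls /
bin boot dev etc home lib (Host Root)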

4. Code Examples

1. Go Implementation (System Programming)

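In Go, we use syscall.SysProcAttr to request new namespaces when cloning a process. The re-exec pattern shown here resembles what runc does, although runc performs its namespace setup in a C helper that runs before the Go runtime starts.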

package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Re-run this binary with "child" argument to act as the container process
	if len(os.Args) > 1 && os.Args[1] == "child" {
		runChild()
		return
	}

	cmd := exec.Command("/proc/self/exe", "child")
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	// REQUEST NEW NAMESPACES (requires root / CAP_SYS_ADMIN)
	// CLONE_NEWUTS: New Hostname Namespace
	// CLONE_NEWPID: New PID Namespace
	// CLONE_NEWNS:  New Mount Namespace
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}

	if err := cmd.Run(); err != nil {
		fmt.Printf("Error running child: %v\n", err)
		os.Exit(1)
	}
}

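func runChild() {
	// Set a new hostname visible ONLY in this UTS namespace
	if err := syscall.Sethostname([]byte("container-demo")); err != nil {
		fmt.Printf("Error setting hostname: %v\n", err)
		os.Exit(1)
	}

	// The first process in a new PID namespace sees itself as PID 1
	fmt.Printf("Running inside container as PID %d\n", os.Getpid())

	// Note: /proc is still the host's, so ps lists host processes
	// until a fresh proc filesystem is mounted in this mount namespace.
	cmd := exec.Command("/bin/bash")
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Printf("Error running shell: %v\n", err)
		os.Exit(1)
	}
}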

2. Java Implementation (JVM Wrapper)

Java sits on top of the JVM, which sits on top of the OS. Java does not have direct access to clone(2) flags. To create a namespace from Java, we must invoke an external tool like unshare via ProcessBuilder.

This demonstrates the “Systems Programming Gap” in Java compared to Go or C.

import java.util.ArrayList;
import java.util.List;

public class NamespaceDemo {

    public static void main(String[] args) throws Exception {
        // We cannot call clone() directly.
        // Instead, we wrap our command with 'unshare'.
        // unshare -u -p -f --mount-proc /bin/bash

        List<String> command = new ArrayList<>();
        command.add("sudo"); // Namespaces require CAP_SYS_ADMIN
        command.add("unshare");
        command.add("--uts");        // -u: UTS namespace
        command.add("--pid");        // -p: PID namespace
        command.add("--fork");       // -f: Fork new process
        command.add("--mount-proc"); // Remount /proc (implies a new mount namespace)
        command.add("bash");         // The command to run inside

        ProcessBuilder pb = new ProcessBuilder(command);
        pb.inheritIO(); // Connect to our terminal

        System.out.println("Starting shell in new Namespace...");
        Process p = pb.start();
        int exitCode = p.waitFor();

        System.out.println("Container exited with code: " + exitCode);
    }
}

[!NOTE] Why Go? One commonly cited reason Docker was written in Go is that Go offers easy access to low-level Linux syscalls (the syscall package) while keeping high-level productivity. Java needs JNI (Java Native Interface), another native bridge, or external commands to achieve the same isolation.

5. First Principles: Why do we need nsproxy?

Why couldn’t Linux just add a “Container ID” to the process struct?

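  1. Flexibility: Some processes need to share one namespace type but not another. Containers in a Kubernetes Pod, for example, share a network namespace while keeping separate PID namespaces by default. Because nsproxy holds a separate pointer for each namespace type, the kernel can mix and match, which is what makes the "sidecar" pattern possible.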
  2. Legacy Compatibility: Applications don’t need to be rewritten. A web server just binds to port 80. It doesn’t know (or care) that “port 80” is virtualized inside a NET namespace.
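The mix-and-match argument can be illustrated with a small user-space model. This is not kernel code: the types kind, namespace, and nsproxy and the function cloneProxy are simplified, hypothetical stand-ins, and the id field is a made-up label. The point is that a child copies its parent's pointers and replaces only the ones it asks for.

```go
package main

import "fmt"

type kind int

const (
	mnt kind = iota
	pid
	net
	uts
	numKinds
)

// namespace stands in for a kernel namespace object.
type namespace struct {
	kind kind
	id   int
}

// nsproxy holds one pointer per namespace type.
type nsproxy struct {
	ns [numKinds]*namespace
}

var nextID = 1

func newNamespace(k kind) *namespace {
	n := &namespace{kind: k, id: nextID}
	nextID++
	return n
}

// cloneProxy copies the parent's pointers and replaces only the kinds
// listed in fresh, mimicking clone(2) with a subset of CLONE_NEW* flags.
func cloneProxy(parent *nsproxy, fresh ...kind) *nsproxy {
	child := &nsproxy{ns: parent.ns} // array copy: the pointers are shared
	for _, k := range fresh {
		child.ns[k] = newNamespace(k)
	}
	return child
}

func main() {
	host := &nsproxy{}
	for k := mnt; k < numKinds; k++ {
		host.ns[k] = newNamespace(k)
	}

	// Two containers in one pod: the sidecar keeps the app's network
	// namespace but gets its own mount and PID namespaces.
	app := cloneProxy(host, mnt, pid, net, uts)
	sidecar := cloneProxy(app, mnt, pid)

	fmt.Println("share net:", app.ns[net] == sidecar.ns[net]) // prints: share net: true
	fmt.Println("share pid:", app.ns[pid] == sidecar.ns[pid]) // prints: share pid: false
}
```

A single "Container ID" field could not express this: the sidecar would have to be either fully inside or fully outside the app's container.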

This design decision enabled the entire container ecosystem to support existing software without modification.