Seccomp & AppArmor: Kernel Hardening

A container is just a process that shares the host kernel. Without filtering, it can make almost any system call (syscall) to that kernel, which is a massive attack surface. If an attacker finds a kernel bug in a syscall like keyctl or acct, they can crash the host or escape the container.

To lock this down, we use Seccomp (syscall filtering) and AppArmor (mandatory access control over files, network, and capabilities).

1. Seccomp: The Firewall for System Calls

Seccomp (Secure Computing Mode) allows you to whitelist or blacklist specific system calls.

  • Default Profile: Docker applies a default profile that blocks around 44 dangerous or obscure syscalls (out of 300+).
  • Custom Profile: You can create a stricter JSON profile to allow only what your app needs.

How it Works

  1. Container Process requests mkdir("foo").
  2. Kernel checks Seccomp filter.
  3. If mkdir is ALLOWED → Execute.
  4. If mkdir is BLOCKED → Return EPERM (Operation not permitted) or kill the process, depending on the action the filter specifies.
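
The four steps above can be sketched as a toy model in Python. This is not how the kernel implements seccomp (real filters are BPF programs); the SeccompFilter class and dispatch function are hypothetical names made up for illustration, and only the action names mirror the ones used in profiles.

```python
import errno
from dataclasses import dataclass, field

# Action names mirror those used in seccomp profiles; the evaluator is a toy model.
ALLOW = "SCMP_ACT_ALLOW"
ERRNO = "SCMP_ACT_ERRNO"
KILL = "SCMP_ACT_KILL"


@dataclass
class SeccompFilter:
    """Hypothetical model: a default action plus per-syscall overrides."""
    default_action: str
    rules: dict = field(default_factory=dict)  # syscall name -> action

    def check(self, syscall: str) -> str:
        # Step 2: the filter is consulted before the syscall runs.
        return self.rules.get(syscall, self.default_action)


def dispatch(flt: SeccompFilter, syscall: str) -> str:
    """Describe what happens to the calling process for one syscall."""
    action = flt.check(syscall)
    if action == ALLOW:
        return f"{syscall}: executed"  # Step 3
    if action == ERRNO:
        # Step 4, errno variant: the call returns an error to the process.
        return f"{syscall}: failed with {errno.errorcode[errno.EPERM]}"
    return f"{syscall}: process killed"  # Step 4, kill variant


if __name__ == "__main__":
    flt = SeccompFilter(
        default_action=ERRNO,
        rules={"read": ALLOW, "write": ALLOW, "reboot": KILL},
    )
    for name in ("read", "mkdir", "reboot"):
        print(dispatch(flt, name))
```

Here mkdir has no rule of its own, so it falls through to the default action, which is the behaviour a deny-by-default profile relies on.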

3. Seccomp Implementation

1. Create a Profile (seccomp.json)

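This profile whitelists ten syscalls and denies everything else with SCMP_ACT_ERRNO. It is deliberately minimal for illustration: a real nginx container needs many more syscalls (for example openat, mmap, and epoll_wait) and will not start with this list alone.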

{
    "defaultAction": "SCMP_ACT_ERRNO",
    "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
    ],
    "syscalls": [
        {
            "names": [
                "accept",
                "bind",
                "clone",
                "execve",
                "exit",
                "exit_group",
                "listen",
                "read",
                "socket",
                "write"
            ],
            "action": "SCMP_ACT_ALLOW"
        }
    ]
}
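
Before applying a deny-by-default profile, it can help to compare it against the syscalls your app actually makes. The sketch below parses a profile of the shape shown above and lists the syscalls that would fall through to the default action. The `needed` set is a made-up example, not a measured trace, and the helper names are hypothetical; the sketch also assumes the profile's default action is a deny.

```python
import json

# Same shape as seccomp.json above, inlined so the example is self-contained.
PROFILE_JSON = """
{
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {
            "names": ["accept", "bind", "clone", "execve", "exit",
                      "exit_group", "listen", "read", "socket", "write"],
            "action": "SCMP_ACT_ALLOW"
        }
    ]
}
"""


def allowed_syscalls(profile: dict) -> set:
    """Collect every syscall name that some rule explicitly allows."""
    allowed = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    return allowed


def missing_syscalls(profile: dict, needed: set) -> list:
    """Return the needed syscalls that no allow rule covers, sorted.

    Assumes a deny-by-default profile, as in the example above.
    """
    return sorted(needed - allowed_syscalls(profile))


if __name__ == "__main__":
    profile = json.loads(PROFILE_JSON)
    # Illustrative only: a hypothetical list of syscalls an app was seen making.
    needed = {"read", "write", "openat", "mmap", "epoll_wait"}
    print(missing_syscalls(profile, needed))
```

Any name this reports would have to be added to the profile, or the corresponding call would fail at runtime.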

2. Apply the Profile

docker run --security-opt seccomp=seccomp.json nginx

If a process in the container calls mkdir (which is not in the list), the call fails with “Operation not permitted”.
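
From the application's side, a blocked syscall looks like an ordinary permission error. The sketch below shows how a Python program might observe it. Since it cannot assume a seccomp filter is active, it injects a hypothetical stand-in (fake_blocked_mkdir) that raises the same error a filtered mkdir would; try_mkdir is likewise an illustrative helper, not part of any library.

```python
import errno
import os


def try_mkdir(path: str, mkdir=os.mkdir) -> str:
    """Attempt to create a directory and report how the call ended."""
    try:
        mkdir(path)
    except PermissionError as exc:
        # PermissionError covers both EPERM and EACCES, so check the errno.
        if exc.errno == errno.EPERM:
            return f"blocked: {os.strerror(errno.EPERM)}"
        raise
    return "created"


def fake_blocked_mkdir(path: str) -> None:
    """Hypothetical stand-in for a mkdir call denied with EPERM."""
    raise OSError(errno.EPERM, os.strerror(errno.EPERM), path)


if __name__ == "__main__":
    print(try_mkdir("foo", mkdir=fake_blocked_mkdir))
```

Note that EPERM alone does not prove seccomp was the cause; other mechanisms can return the same error.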


4. AppArmor: Filesystem Protection

While Seccomp restricts actions (syscalls), AppArmor restricts resources (files, networks). It uses profiles loaded into the kernel.

Example Profile (docker-nginx)

Save as /etc/apparmor.d/docker-nginx:

#include <tunables/global>

profile docker-nginx flags=(attach_disconnected,mediate_deleted) {
  # Include the base AppArmor abstractions
  #include <abstractions/base>

  # Network Access
  network inet tcp,
  network inet udp,
  network inet icmp,

  # Deny writing to /etc/
  deny /etc/** w,

  # Allow reading web files
  /usr/share/nginx/html/ r,
  /usr/share/nginx/html/** r,

  # Allow writing logs
  /var/log/nginx/* w,

  # Allow pid file
  /run/nginx.pid w,
}
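
To build intuition for how the file rules in this profile combine, here is a rough Python model of the matching logic. It is a simplification written for this article, not AppArmor's actual matcher: it handles only the two globs used above ('*' stays within one directory, '**' crosses directories), ignores glob edge cases and the rules pulled in by the include, and treats deny rules as overriding allow rules. The function names are hypothetical.

```python
import re

# (pattern, permissions, is_deny), copied from the file rules in the profile above.
RULES = [
    ("/etc/**", "w", True),
    ("/usr/share/nginx/html/", "r", False),
    ("/usr/share/nginx/html/**", "r", False),
    ("/var/log/nginx/*", "w", False),
    ("/run/nginx.pid", "w", False),
]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate the two globs used here: '**' crosses '/', '*' does not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_permitted(path: str, perm: str) -> bool:
    """Deny rules win; otherwise an allow rule must match; otherwise refuse."""
    matches = [
        deny
        for pattern, perms, deny in RULES
        if perm in perms and glob_to_regex(pattern).fullmatch(path)
    ]
    if any(matches):  # at least one matching rule is a deny
        return False
    return bool(matches)  # allowed only if some allow rule matched


if __name__ == "__main__":
    for path, perm in [
        ("/usr/share/nginx/html/index.html", "r"),
        ("/etc/nginx/nginx.conf", "w"),
        ("/var/log/nginx/old/access.log", "w"),
    ]:
        print(path, perm, is_permitted(path, perm))
```

The last case is refused in this model because '*' does not match across the extra directory level, so no allow rule applies.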

Apply AppArmor Profile

  1. Load the profile:
    sudo apparmor_parser -r -W /etc/apparmor.d/docker-nginx
    
  2. Run container with profile:
    docker run --security-opt apparmor=docker-nginx nginx
    

5. Comparison: Defense in Depth

Feature     | Seccomp                               | AppArmor
------------+---------------------------------------+----------------------------------
Scope       | Kernel system calls                   | File paths, network, capabilities
Granularity | Low (allows/blocks an entire syscall) | High (allows specific file paths)
Mechanism   | BPF filters                           | MAC (Mandatory Access Control)
Example     | “Block reboot()”                      | “Block write to /etc/shadow”

[!TIP] Use both! They are complementary: Seccomp shrinks the kernel attack surface an exploit can reach, and AppArmor limits filesystem tampering. Together, they create a formidable sandbox.