File System Basics

[!NOTE] This module covers the core principles of file systems, working from first principles and hardware constraints.

1. The Persistent Abstraction

The Problem: Imagine you are building a database engine like Postgres. A running process is ephemeral—it lives entirely in RAM. When the process crashes or the server loses power, every byte of memory is wiped clean. How do you guarantee that a user’s transaction survives a sudden blackout?

To store data permanently and reliably, Operating Systems provide the File System abstraction. It bridges the gap between the volatile, byte-addressable world of RAM and the slow, block-based reality of physical storage (HDDs/SSDs).

[!NOTE] “Everything is a file”: This Unix philosophy means that documents, directories, devices (keyboard, mouse), and even network sockets are accessed through file descriptors. This unifies the API: the same read() syscall works on a regular file and on a socket.

Anatomy of a File

At its core, a file is a linear array of bytes. The OS provides a cursor-based API, much like reading a book with a bookmark:

  1. Open: Creates a “session” (File Descriptor). The OS allocates a file table entry.
  2. Seek: Moves the cursor (Offset) to a specific byte.
  3. Read/Write: Transfers data and advances the cursor automatically.
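  4. Close: Tears down the session and releases the File Descriptor. Closing does not by itself guarantee that the data has reached the disk; durability requires an explicit fsync() before the close.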

Physical Reality vs Logical View

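  • Logical View: A contiguous sequence of bytes (offsets 0 to N-1) that applications interact with.
  • Physical Reality: Data is scattered across the spinning platters of an HDD or the NAND flash chips of an SSD in discrete Blocks (usually 4KB). The File System’s primary job is mapping logical offsets (e.g., “Byte 5000”) to physical block addresses (e.g., “Block 42, Offset 904”). With 4KB (4096-byte) blocks, Byte 5000 falls in the file’s second block at offset 5000 - 4096 = 904; the File System then looks up which physical block (here, Block 42) holds that part of the file.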

Hard links versus soft (symbolic) links is a common interview topic. Understanding the difference requires understanding Inodes.

  • Inode (Index Node): The on-disk data structure that describes the file (permissions, size, block locations). It has a number (Inode ID) that is unique within its File System.
  • Filename: A string stored in a Directory that maps to an Inode ID.

Hard Link: A directory entry that points to an existing Inode.

  • If you have a file with 2 hard links, both names point to the same data.
  • Deleting one name just removes that entry. The data remains until the Reference Count hits 0 and no process still has the file open.
  • Limitation: Cannot link across file systems (Inode numbers are only meaningful within one FS).

Soft Link (Symbolic Link): A special file that contains a path string to another file.

  • It has its own Inode.
  • If you delete the target, the link becomes “broken” (dangling).
  • Advantage: Can link across partitions and even to non-existent files.

How Hard Links and Soft Links interact with the Inode layer:

Directory Entries

file.txt → Inode 101
hard_link → Inode 101
soft_link → "file.txt" (a path string stored in the soft link's own Inode)

Inode Table

Inode 101: Ref Count 2 (file.txt and hard_link), points to Data Blocks

4. Code Examples: Managing Files

Modern languages provide abstractions, but they map to the same underlying syscalls (link, symlink, unlink).

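Java

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileLinks {
    public static void main(String[] args) throws IOException {
        Path target = Paths.get("data.txt");
        Path hardLink = Paths.get("hard-link.txt");
        Path softLink = Paths.get("soft-link.txt");

        // Remove leftovers from a previous run so that link creation
        // does not fail with FileAlreadyExistsException.
        Files.deleteIfExists(hardLink);
        Files.deleteIfExists(softLink);
        if (!Files.exists(target)) {
            Files.createFile(target);
        }

        // 1. Create a Hard Link
        // Both 'data.txt' and 'hard-link.txt' point to the same Inode
        try {
            Files.createLink(hardLink, target);
            System.out.println("Hard link created");
        } catch (IOException | UnsupportedOperationException e) {
            // Fails if the paths are on different file systems, or if
            // the file system does not support hard links
            e.printStackTrace();
        }

        // 2. Create a Symbolic Link
        // 'soft-link.txt' stores the path "data.txt"
        try {
            Files.createSymbolicLink(softLink, target);
            System.out.println("Soft link created");
        } catch (IOException | UnsupportedOperationException e) {
            // Some platforms restrict who may create symbolic links
            e.printStackTrace();
        }

        // Cleanup
        Files.delete(target); // soft-link is now broken!
        // hard-link still works and holds the data.
        // Files.exists follows links, so a dangling soft link reports false.
        System.out.println("Soft link resolves: " + Files.exists(softLink));
        System.out.println("Hard link resolves: " + Files.exists(hardLink));
    }
}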
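
Go

package main

import (
	"fmt"
	"os"
)

func main() {
	target := "data.txt"
	hardLink := "hard-link.txt"
	softLink := "soft-link.txt"

	// Remove leftovers from a previous run. Errors such as "not found"
	// are ignored on purpose.
	os.Remove(hardLink)
	os.Remove(softLink)

	f, err := os.Create(target)
	if err != nil {
		fmt.Println("Error creating target:", err)
		return
	}
	f.Close() // release the descriptor; only the name is needed below

	// 1. Create a Hard Link
	// Corresponds to 'link()' syscall
	if err := os.Link(target, hardLink); err != nil {
		fmt.Println("Error creating hard link:", err)
	} else {
		fmt.Println("Hard link created")
	}

	// 2. Create a Symbolic Link
	// Corresponds to 'symlink()' syscall
	if err := os.Symlink(target, softLink); err != nil {
		fmt.Println("Error creating soft link:", err)
	} else {
		fmt.Println("Soft link created")
	}

	// Cleanup
	// Corresponds to 'unlink()' syscall
	if err := os.Remove(target); err != nil {
		fmt.Println("Error removing target:", err)
	}
	// Soft link is broken
	// Data still exists via hardLink
}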