Journaling File Systems

[!NOTE] This module explores the core principles of journaling file systems, deriving each mechanism from first principles and the constraints of the underlying hardware.

1. The Crash Consistency Problem

File system operations are not atomic by default. A disk typically guarantees only that a single sector write either completes or does not, yet creating one file involves multiple disk writes:

  1. Allocate a new Inode (Write to Inode Bitmap).
  2. Update the Directory (Write to Directory Data Block).
  3. Initialize the Inode (Write to Inode Table).

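The Scenario: Power fails after step 1 but before step 2.

  • Result: The Inode is marked “used” in the bitmap, but no directory entry points to it. It is a Leaked Resource that cannot be reused until a repair tool finds it.
  • Worse: If power fails partway through updating a pointer, for example after the directory entry (step 2) is written but before the inode (step 3) is initialized, the file system points to garbage data.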
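The leaked-inode failure can be reproduced with a small in-memory model. This is a minimal sketch, not real file system code: the `disk` type, `createFile`, `leaked`, and the `crashAfter` parameter are hypothetical names invented for illustration, and a "crash" is simply an early return after a given number of writes.

```go
package main

import "fmt"

// disk is a hypothetical in-memory stand-in for the metadata touched by file creation.
type disk struct {
	inodeBitmap []bool         // step 1: which inodes are allocated
	directory   map[string]int // step 2: file name -> inode number
	inodeTable  map[int]string // step 3: inode number -> initialized contents
}

func newDisk(nInodes int) *disk {
	return &disk{
		inodeBitmap: make([]bool, nInodes),
		directory:   map[string]int{},
		inodeTable:  map[int]string{},
	}
}

// createFile performs the three writes in order and "loses power" once
// crashAfter writes have completed (crashAfter >= 3 means no crash).
func (d *disk) createFile(name string, ino, crashAfter int) {
	if crashAfter < 1 {
		return
	}
	d.inodeBitmap[ino] = true // 1. allocate the inode in the bitmap
	if crashAfter < 2 {
		return // power fails here: inode allocated, no directory entry
	}
	d.directory[name] = ino // 2. add the directory entry
	if crashAfter < 3 {
		return // power fails here: entry points at an uninitialized inode
	}
	d.inodeTable[ino] = "type=file size=0" // 3. initialize the inode
}

// leaked returns inodes marked used in the bitmap that no directory entry references.
func (d *disk) leaked() []int {
	referenced := map[int]bool{}
	for _, ino := range d.directory {
		referenced[ino] = true
	}
	var out []int
	for ino, used := range d.inodeBitmap {
		if used && !referenced[ino] {
			out = append(out, ino)
		}
	}
	return out
}

func main() {
	d := newDisk(16)
	d.createFile("file.txt", 10, 1) // power fails after step 1
	fmt.Println("leaked inodes:", d.leaked())
}
```

Running it prints `leaked inodes: [10]`: inode 10 is allocated but unreachable.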

The Old Solution: FSCK (File System Check)

On reboot, scan the entire disk.

  • Check every Inode, every Bitmap, every Directory.
  • Fix inconsistencies (e.g., free leaked inodes).
  • Problem: The scan time grows with the size of the file system, not with the amount of damage, so on a 10TB drive it can take hours.
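The repair pass can be sketched in the same spirit. In this hypothetical model (`fsState` and `fsck` are illustrative names, not a real tool's API), the checker must visit every directory entry and every inode bit, which is why its running time tracks disk size rather than the number of errors. Real checkers such as e2fsck do far more, and often reconnect orphaned inodes under `lost+found` instead of freeing them.

```go
package main

import "fmt"

// fsState is a hypothetical, simplified view of file system metadata.
type fsState struct {
	inodeBitmap []bool         // allocation bitmap, one entry per inode
	dirEntries  map[string]int // every directory entry on the disk: path -> inode
}

// fsck scans all metadata and repairs two kinds of inconsistency.
// It returns how many leaked inodes it freed and how many dangling entries it dropped.
func fsck(s *fsState) (freed, dropped int) {
	referenced := map[int]bool{0: true} // inode 0 is the root; it has no parent entry
	// Pass 1: walk every directory entry.
	for path, ino := range s.dirEntries {
		if ino < 0 || ino >= len(s.inodeBitmap) || !s.inodeBitmap[ino] {
			delete(s.dirEntries, path) // entry points at an unallocated inode
			dropped++
			continue
		}
		referenced[ino] = true
	}
	// Pass 2: walk the entire inode bitmap.
	for ino, used := range s.inodeBitmap {
		if used && !referenced[ino] {
			s.inodeBitmap[ino] = false // leaked inode: free it
			freed++
		}
	}
	return freed, dropped
}

func main() {
	s := &fsState{
		inodeBitmap: []bool{true, true, false, true},
		dirEntries:  map[string]int{"/a.txt": 1, "/b.txt": 2},
	}
	freed, dropped := fsck(s)
	fmt.Println("freed:", freed, "dropped:", dropped)
}
```

Here inode 3 is allocated but unreferenced (freed), and `/b.txt` points at unallocated inode 2 (dropped).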

2. The Modern Solution: Journaling (WAL)

Write-Ahead Logging (WAL): Never modify an on-disk structure in place until a description of the change has first been written durably to a log.

The Protocol

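  1. Journal Write: Write the transaction to a circular log (the Journal).
    • “TxBegin: Alloc Inode 10, Add ‘file.txt’ to Dir 5.”
  2. Journal Commit: Only after the journal write has reached disk, write a special “TxEnd” block. The transaction is now durable. (If both were issued at once, the disk could reorder them and persist TxEnd before the body, making a half-written transaction look committed.)
  3. Checkpoint: Perform the actual updates in place on the main file system structures.
  4. Free: Once the checkpoint writes are on disk, mark the journal space as free for reuse.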

Recovery Algorithm

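On reboot, read only the Journal; no full-disk scan is needed.

  • If a transaction has TxBegin but no TxEnd: discard it (it never happened).
  • If a transaction has TxEnd: replay it (redo its writes). This is safe even if some of those writes already reached the main disk, because the journal records their final contents and rewriting them produces the same result.

Recovery time depends on the size of the journal, not the disk, so it typically takes seconds.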
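Recovery is a single pass over the journal. The sketch below assumes a hypothetical in-memory record format (`record` with `begin`, `write`, and `end` kinds) and a hypothetical `recoverJournal` function; a real journal such as ext4's stores on-disk descriptor and commit blocks instead.

```go
package main

import "fmt"

// record is a hypothetical journal entry; Kind is "begin", "write", or "end".
type record struct {
	Kind  string
	TxID  int
	Block int    // for "write": destination block number
	Data  string // for "write": final block contents
}

// recoverJournal scans the journal once, replays every transaction that has a
// TxEnd record, and returns their IDs in commit order. Cost depends on the
// journal length, not the disk size.
func recoverJournal(journal []record, disk map[int]string) []int {
	var replayed []int
	pending := map[int][]record{}
	for _, r := range journal {
		switch r.Kind {
		case "begin":
			pending[r.TxID] = nil
		case "write":
			pending[r.TxID] = append(pending[r.TxID], r)
		case "end":
			for _, w := range pending[r.TxID] {
				disk[w.Block] = w.Data // redo: idempotent, safe to repeat
			}
			delete(pending, r.TxID)
			replayed = append(replayed, r.TxID)
		}
	}
	// Whatever remains in pending had TxBegin but no TxEnd: it is discarded.
	return replayed
}

func main() {
	disk := map[int]string{7: "AAA"}
	journal := []record{
		{Kind: "begin", TxID: 1},
		{Kind: "write", TxID: 1, Block: 7, Data: "BBB"},
		{Kind: "end", TxID: 1},
		{Kind: "begin", TxID: 2},
		{Kind: "write", TxID: 2, Block: 7, Data: "CCC"},
		// power failed before TxEnd for transaction 2
	}
	replayed := recoverJournal(journal, disk)
	fmt.Println("replayed:", replayed, "block 7:", disk[7])
}
```

Transaction 2 has no TxEnd, so its write of `CCC` is discarded and block 7 holds the committed value `BBB`. Running recovery a second time leaves the same result, which is why a crash during recovery is harmless.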


3. Code Example: Forcing Durability

In code, a write call usually only copies data into the OS page cache (memory). It is NOT durable: a power failure can still lose it. To force the data, and on a journaling file system the pending journal commit, onto stable storage, you must call fsync.

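```go
package main

import (
	"log"
	"os"
)

func main() {
	// 1. Open the log for appending, creating it if needed.
	f, err := os.OpenFile("important.log", os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// 2. Write data. This is a write(2) into the OS page cache only; not yet durable.
	if _, err := f.WriteString("Transaction Data\n"); err != nil {
		log.Fatalf("write failed: %v", err)
	}

	// 3. Sync to disk (fsync). On a journaling file system this forces the
	// pending journal commit. It blocks until the kernel reports the data and
	// metadata written to stable storage (a drive that ignores cache-flush
	// commands can still lose it).
	if err := f.Sync(); err != nil {
		log.Fatalf("failed to persist to disk: %v", err)
	}

	log.Println("Data is durable.")
}
```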
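
The equivalent in Java:

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class DurableWrite {
    public static void main(String[] args) throws IOException {
        // Open in append mode; try-with-resources closes the stream even on error.
        try (FileOutputStream fos = new FileOutputStream("important.log", true)) {
            // 1. Write data. FileOutputStream is unbuffered, so this goes straight
            //    to the OS page cache, which is still volatile memory.
            fos.write("Transaction Data\n".getBytes(StandardCharsets.UTF_8));

            // 2. flush() is a no-op for a bare FileOutputStream; it matters only
            //    when a buffering wrapper such as BufferedOutputStream is used.
            fos.flush();

            // 3. Sync the OS page cache to disk (fsync): the critical step for durability.
            FileDescriptor fd = fos.getFD();
            fd.sync();
        }
        System.out.println("Data is durable.");
    }
}
```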