Journaling File Systems
[!NOTE] This module explores the core principles of Journaling File Systems, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. The Crash Consistency Problem
File system operations are not atomic by default. Creating a file involves multiple disk writes:
- Allocate a new Inode (Write to Inode Bitmap).
- Update the Directory (Write to Directory Data Block).
- Initialize the Inode (Write to Inode Table).
The Scenario: The power fails after step 1 but before step 2.
- Result: The Inode is marked “used”, but no directory points to it. It is a Leaked Resource.
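- Worse: If a pointer reaches the disk before the structure it refers to (for example, the directory entry is written before the Inode is initialized), the file system follows that pointer to garbage data.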
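The scenario above can be reproduced with a toy in-memory model. This is a minimal sketch, not how any real file system lays out its structures: the `Disk` type, its three maps, and the `crashAfter` parameter are all hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Disk is a toy model of the three structures touched by file creation.
// All names are hypothetical and exist only for this illustration.
type Disk struct {
	InodeBitmap map[int]bool   // inode number -> marked used?
	Directory   map[string]int // file name -> inode number
	InodeTable  map[int]string // inode number -> initialized metadata
}

func NewDisk() *Disk {
	return &Disk{
		InodeBitmap: map[int]bool{},
		Directory:   map[string]int{},
		InodeTable:  map[int]string{},
	}
}

// CreateFile performs the three writes in order and stops once crashAfter
// of them have completed, simulating a power failure (3 means no crash).
func (d *Disk) CreateFile(name string, ino int, crashAfter int) {
	steps := []func(){
		func() { d.InodeBitmap[ino] = true },              // step 1: inode bitmap
		func() { d.Directory[name] = ino },                // step 2: directory block
		func() { d.InodeTable[ino] = "mode=0644 size=0" }, // step 3: inode table
	}
	for i, step := range steps {
		if i >= crashAfter {
			return // power fails here; later writes never happen
		}
		step()
	}
}

func main() {
	d := NewDisk()
	d.CreateFile("file.txt", 10, 1) // crash after step 1 only

	_, linked := d.Directory["file.txt"]
	fmt.Println("bitmap says used:", d.InodeBitmap[10]) // prints "bitmap says used: true"
	fmt.Println("directory entry exists:", linked)      // prints "directory entry exists: false"
}
```

Inode 10 is marked used but nothing refers to it, which is exactly the leaked resource described above.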
The Old Solution: FSCK (File System Check)
On reboot, scan the entire disk.
- Check every Inode, every Bitmap, every Directory.
- Fix inconsistencies (e.g., free leaked inodes).
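- Problem: The scan time grows with the size of the disk, not with the amount of work that was in flight at the crash. On a 10TB drive, this can take hours.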
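To make the cost concrete, here is a minimal sketch of one such check: finding and freeing leaked inodes. It assumes a simplified layout (a flat bitmap and a single directory map); `Fsck` is a hypothetical function, not the real tool's logic, which performs many more passes.

```go
package main

import "fmt"

// Fsck inspects every inode slot and frees any slot that is marked used
// but is not referenced by a directory entry. It returns how many slots
// it had to inspect and which inodes it freed.
func Fsck(bitmap []bool, dirEntries map[string]int) (scanned int, freed []int) {
	referenced := make(map[int]bool, len(dirEntries))
	for _, ino := range dirEntries {
		referenced[ino] = true
	}
	for ino, used := range bitmap {
		scanned++ // the whole bitmap is visited, even if only one inode is wrong
		if used && !referenced[ino] {
			bitmap[ino] = false
			freed = append(freed, ino)
		}
	}
	return scanned, freed
}

func main() {
	bitmap := make([]bool, 1000)
	bitmap[5], bitmap[10] = true, true    // inode 10 was leaked by a crash
	dir := map[string]int{"notes.txt": 5} // only inode 5 is referenced

	scanned, freed := Fsck(bitmap, dir)
	fmt.Println(scanned, freed) // prints "1000 [10]"
}
```

One leaked inode still costs a pass over all 1000 slots, which is why the approach scales poorly with disk size.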
2. The Modern Solution: Journaling (WAL)
Write-Ahead Logging (WAL): Never modify the disk structure in place without first writing down what you are going to do.
The Protocol
- Journal Write: Write a transaction to a circular log (The Journal).
- “TxBegin: Alloc Inode 10, Add ‘file.txt’ to Dir 5.”
- Journal Commit: Once the journal write has reached the disk, write a special “TxEnd” block. The transaction is now Durable.
- Checkpoint: Perform the actual updates to the main file system structures.
- Free: Mark the journal space as free.
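The four steps can be sketched as methods on a toy file system. This is an in-memory illustration under stated assumptions: the `FS`, `Tx`, and `Op` types are hypothetical, and where a real implementation would issue disk writes and flushes, the sketch only sets fields.

```go
package main

import "fmt"

// Op is one pending update to a main file system structure (hypothetical).
type Op struct {
	Key, Value string
}

// Tx is a journaled transaction.
type Tx struct {
	ID        int
	Ops       []Op
	Committed bool // true once the TxEnd block has been written
}

// FS holds the journal and the main structures, both in memory here.
type FS struct {
	Journal []*Tx
	Main    map[string]string
}

// JournalWrite (step 1) appends TxBegin plus the intended updates to the log.
func (fs *FS) JournalWrite(id int, ops []Op) *Tx {
	tx := &Tx{ID: id, Ops: ops}
	fs.Journal = append(fs.Journal, tx)
	return tx
}

// Commit (step 2) writes TxEnd. A real file system would flush to disk here.
func (fs *FS) Commit(tx *Tx) { tx.Committed = true }

// Checkpoint (step 3) applies the logged updates to the main structures.
func (fs *FS) Checkpoint(tx *Tx) error {
	if !tx.Committed {
		return fmt.Errorf("tx %d: cannot checkpoint before commit", tx.ID)
	}
	for _, op := range tx.Ops {
		fs.Main[op.Key] = op.Value
	}
	return nil
}

// Free (step 4) releases the journal space held by the transaction.
func (fs *FS) Free(tx *Tx) {
	for i, t := range fs.Journal {
		if t == tx {
			fs.Journal = append(fs.Journal[:i], fs.Journal[i+1:]...)
			return
		}
	}
}

func main() {
	fs := &FS{Main: map[string]string{}}
	tx := fs.JournalWrite(1, []Op{
		{"inode_bitmap[10]", "used"},
		{"dir[5]/file.txt", "inode 10"},
	})
	fs.Commit(tx)
	if err := fs.Checkpoint(tx); err != nil {
		fmt.Println(err)
		return
	}
	fs.Free(tx)
	fmt.Println(fs.Main["dir[5]/file.txt"], len(fs.Journal)) // prints "inode 10 0"
}
```

The guard in `Checkpoint` encodes the write-ahead rule: the main structures are never touched until the commit record exists.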
Recovery Algorithm
On reboot, simply read the Journal.
- If a transaction has TxBegin but no TxEnd: Discard it (It never happened).
- If a transaction has TxEnd: Replay it (Redo the writes).
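Recovery time depends on the size of the Journal, not the size of the disk, so it typically takes seconds. Replaying a transaction that was already partly checkpointed is safe because redoing the same writes yields the same result.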
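The recovery rule can be sketched as a scan over the log. This is a simplified illustration: the `Record` layout and the `Recover` function are hypothetical, and the journal is modeled as a slice rather than a circular on-disk log.

```go
package main

import "fmt"

// RecordKind distinguishes the three kinds of journal record (hypothetical).
type RecordKind int

const (
	TxBegin RecordKind = iota
	Update
	TxEnd
)

// Record is one entry in the journal.
type Record struct {
	Kind       RecordKind
	TxID       int
	Key, Value string // only meaningful for Update records
}

// Recover replays the updates of every transaction that has both TxBegin
// and TxEnd, in log order, and ignores transactions that never committed.
func Recover(journal []Record, disk map[string]string) (replayed, discarded int) {
	began := map[int]bool{}
	committed := map[int]bool{}
	for _, r := range journal {
		switch r.Kind {
		case TxBegin:
			began[r.TxID] = true
		case TxEnd:
			if began[r.TxID] {
				committed[r.TxID] = true
			}
		}
	}
	for _, r := range journal {
		if r.Kind == Update && committed[r.TxID] {
			disk[r.Key] = r.Value // redo the write; repeating it is harmless
		}
	}
	for id := range began {
		if committed[id] {
			replayed++
		} else {
			discarded++
		}
	}
	return replayed, discarded
}

func main() {
	journal := []Record{
		{Kind: TxBegin, TxID: 1},
		{Kind: Update, TxID: 1, Key: "dir[5]/file.txt", Value: "inode 10"},
		{Kind: TxEnd, TxID: 1},
		{Kind: TxBegin, TxID: 2}, // crash happened before TxEnd for tx 2
		{Kind: Update, TxID: 2, Key: "dir[5]/other.txt", Value: "inode 11"},
	}
	disk := map[string]string{}
	replayed, discarded := Recover(journal, disk)
	fmt.Println(replayed, discarded, disk["dir[5]/file.txt"]) // prints "1 1 inode 10"
}
```

The work done is bounded by the length of the journal, which is why the disk size does not appear anywhere in the function.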
3. Interactive: The Crash Simulator
[Interactive widget: simulates a power failure during a file write operation, showing the state of the Journal (WAL) and the Main Disk.]
4. Code Example: Forcing Durability
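In code, writing to a file (write) usually only copies the data into the OS Page Cache (Memory). It is NOT durable: a crash before the kernel flushes those pages loses the data. To force the data, and on a journaling file system the journal commit that covers it, onto stable storage, you must call fsync.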
```go
package main

import (
	"log"
	"os"
)

func main() {
	// 1. Open file (O_RDWR | O_CREATE)
	f, err := os.OpenFile("important.log", os.O_RDWR|os.O_CREATE, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// 2. Write data (lands in the OS page cache only, not yet on disk)
	if _, err := f.WriteString("Transaction Data"); err != nil {
		log.Fatal(err)
	}

	// 3. Sync to disk (calls fsync, which forces the journal commit)
	// This blocks until the device reports the data as persisted.
	if err := f.Sync(); err != nil {
		log.Fatal("Failed to persist to disk: ", err)
	}
	log.Println("Data is Durable.")
}
```
```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class DurableWrite {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the stream even if a write fails
        try (FileOutputStream fos = new FileOutputStream("important.log")) {
            // 1. Write data (handed to the OS, buffered in the page cache)
            fos.write("Transaction Data".getBytes(StandardCharsets.UTF_8));

            // 2. Flush Java buffer to OS buffer
            // (a no-op for a bare FileOutputStream, required if it is wrapped
            // in a BufferedOutputStream)
            fos.flush();

            // 3. Sync OS buffer to disk
            FileDescriptor fd = fos.getFD();
            fd.sync(); // The critical step for durability

            System.out.println("Data is Durable.");
        }
    }
}
```
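Section 1 noted that creating a file also writes a directory block. On many POSIX systems, syncing the file does not by itself guarantee that the new directory entry is durable, so a common pattern is to sync the parent directory as well. The sketch below assumes a POSIX-like OS where a directory can be opened and synced (this is not portable to Windows); `WriteDurable` and `run` are hypothetical helpers, not part of the standard library.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// WriteDurable creates dir/name, writes data, then syncs both the file and
// its parent directory. Hypothetical helper; assumes directories can be
// opened and synced, as on Linux.
func WriteDurable(dir, name string, data []byte) error {
	f, err := os.OpenFile(filepath.Join(dir, name), os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil { // file contents and inode
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}

	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync() // the directory entry that names the file
}

// run writes a file into a temporary directory and reads it back.
func run() (string, error) {
	dir, err := os.MkdirTemp("", "durable")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	if err := WriteDurable(dir, "important.log", []byte("Transaction Data")); err != nil {
		return "", err
	}
	got, err := os.ReadFile(filepath.Join(dir, "important.log"))
	return string(got), err
}

func main() {
	got, err := run()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

Whether the directory sync is strictly required depends on the file system and its mount options, so treat it as a defensive default rather than a universal rule.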