Mental Model

**Git is NOT just a Version Control System.** It is a content-addressable filesystem with a VCS user interface written on top of it.

The Core Concept: Content-Addressable Storage

Imagine you have a huge box of unmarked photographs. If you organize them by where you put them (e.g., “Row 3, Column 4”), that’s location-addressable storage (like a regular hard drive). But if you write a unique mathematical fingerprint of the image itself on the back of each photo, and use that fingerprint to find it—even if it moves—that’s content-addressable storage.

Many other version control systems store history as a list of file-based changes: a base version of each file plus the edits made to it over time. Git thinks differently. It stores data as a series of snapshots of a miniature filesystem.

Git is a key-value store.

  • Key: The SHA-1 hash of the content.
  • Value: The compressed content itself.
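To make the key-value idea concrete, here is a minimal in-memory sketch in Python. `ContentStore` is a hypothetical toy class, not part of Git. It assumes Git's `blob <size>\0` header for the key and, unlike Git, keeps the value uncompressed.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is derived from the value itself."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        # Key = SHA-1 of a Git-style "blob <size>\0" header plus the content.
        header = b"blob " + str(len(content)).encode() + b"\x00"
        key = hashlib.sha1(header + content).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._objects.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
k1 = store.put(b"hello world")
k2 = store.put(b"hello world")  # same content again
print(k1 == k2, len(store))  # True 1
```

Because the key depends only on the bytes, storing a duplicate costs nothing extra.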

If you have a file with the content “hello world”, Git calculates the SHA-1 hash of that content plus a short header, compresses the header and content with zlib, and stores the result in the .git/objects directory. The first two characters of the hash name a subdirectory, and the remaining 38 name the file.
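The following sketch shows what Git computes before writing that file. It builds only the path and the compressed bytes and does not touch disk. `encode_loose_blob` is a hypothetical helper. Git's zlib compression level may differ, so the compressed bytes need not match Git's byte-for-byte, but they decompress to the same data.

```python
import hashlib
import zlib


def encode_loose_blob(content: bytes) -> tuple[str, str, bytes]:
    """Return (object id, path under .git/objects, compressed bytes) for a blob."""
    # Git hashes the header "blob <size>\0" followed by the raw content.
    store = b"blob " + str(len(content)).encode() + b"\x00" + content
    oid = hashlib.sha1(store).hexdigest()
    # The first two hex digits name a subdirectory; the other 38 name the file.
    path = f"{oid[:2]}/{oid[2:]}"
    # The header and content are zlib-compressed together.
    return oid, path, zlib.compress(store)


oid, path, data = encode_loose_blob(b"hello world")
print(path)  # 95/d09f2b10159347eece71399a7e2e907ea3df4f
```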

The SHA-1 Hash

Git uses the SHA-1 hashing algorithm, which produces a 160-bit digest written as a 40-character hexadecimal string. In practice, different content yields different hashes; an accidental collision is astronomically unlikely.

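SHA-1("blob 11\0hello world") → 95d09f2b10159347eece71399a7e2e907ea3df4f

Here, blob is the object type, 11 is the content length in bytes, and \0 is a NUL byte that separates the header from the content.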
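A short worked check of this formula in Python. It assumes the text is encoded as UTF-8 before hashing (Git itself hashes whatever bytes are on disk). The point is that the size in the header counts bytes, not characters. `blob_header` and `blob_id` are illustrative names.

```python
import hashlib


def blob_header(content: bytes) -> bytes:
    # "blob", a space, the size in bytes as ASCII digits, then a NUL byte.
    return b"blob " + str(len(content)).encode("ascii") + b"\x00"


def blob_id(text: str) -> str:
    content = text.encode("utf-8")
    return hashlib.sha1(blob_header(content) + content).hexdigest()


print(blob_header("hello world".encode("utf-8")))  # b'blob 11\x00'
print(blob_header("héllo".encode("utf-8")))        # b'blob 6\x00' (é is 2 bytes)
print(blob_id("hello world"))  # 95d09f2b10159347eece71399a7e2e907ea3df4f
```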

Common Pitfall: Snapshots vs. Deltas

Many beginners assume Git stores only the differences (deltas) between versions of a file (e.g., "Line 5 changed from A to B"). Conceptually, this is false. Every commit records a complete snapshot. Each file that changed is stored in full as a new Blob, and each file that did not change simply reuses its existing Blob. If you commit a 10MB file, change one line, and commit again, Git creates a brand-new Blob holding the full 10MB of new content. We will explore how Git handles this efficiently later.
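A minimal sketch of the snapshot idea, using two hypothetical versions of a small project. The edited file is about 500 KB rather than 10MB so the example stays quick. Only blob IDs are computed, nothing is written to disk, and `blob_id` is an illustrative helper.

```python
import hashlib


def blob_id(content: bytes) -> str:
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical snapshots of a project: filename -> file content.
v1 = {"README.md": b"# Demo\n", "data.txt": b"line\n" * 100_000}
v2 = {"README.md": b"# Demo\n", "data.txt": b"line\n" * 99_999 + b"LINE\n"}

snap1 = {name: blob_id(body) for name, body in v1.items()}
snap2 = {name: blob_id(body) for name, body in v2.items()}

# The unchanged file maps to the very same blob, so nothing new is stored for it.
print(snap1["README.md"] == snap2["README.md"])  # True
# The edited file gets a brand-new blob holding its full new content.
print(snap1["data.txt"] == snap2["data.txt"])    # False
print(len(v2["data.txt"]))                        # 500000
```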

The Git Object Model

There are four primary types of objects in the Git database:

1. Blob (Binary Large Object)

  • Stores: File content ONLY.
  • Does NOT Store: Filename, permissions, or timestamp.
  • Analogy: The raw data of a photograph, without the filename “vacation.jpg”.
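Because a blob's ID covers only its bytes, two files with identical content but different names share one blob. In this minimal Python sketch, the file names and the byte string standing in for JPEG data are made up.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Only the header and the bytes are hashed: no filename, mode, or timestamp.
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


files = {
    "vacation.jpg": b"\xff\xd8\xff\xe0 pretend JPEG bytes",
    "copy-of-vacation.jpg": b"\xff\xd8\xff\xe0 pretend JPEG bytes",
}
ids = {name: blob_id(data) for name, data in files.items()}
print(len(set(ids.values())))  # 1: both names share one blob
```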

2. Tree

  • Stores: Directory structure.
  • Content: A list of pointers to Blobs (files) or other Trees (subdirectories). It attaches filenames and permissions to the blobs.
  • Analogy: A folder on your computer.
| File Mode | Type | Hash Pointer | Filename |
|---|---|---|---|
| `100644` | blob | `95d09f2b10159347eece71399a7e2e907ea3df4f` | `hello.txt` |
| `040000` | tree | `3a1b2c...` | `src/` |
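Here is a sketch of how a tree like this could be serialized and hashed, based on Git's raw tree format. Each entry is `<mode> <name>\0` followed by the 20-byte binary hash. Entries are sorted by name, and a subtree sorts as if its name ended in `/`. The body then gets a `tree <size>\0` header. The raw object writes a directory's mode as `40000`, while `git ls-tree` pads it to `040000` for display. The helpers and the `main.go` content are hypothetical.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    # Every object is hashed as "<type> <size>\0" followed by its body.
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """entries: (mode, name, hex object id); mode "40000" marks a subtree."""

    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Subtrees compare as if their name ended with "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        # "<mode> <name>\0" followed by the raw 20-byte SHA-1, not hex text.
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
    return body


hello = hash_object("blob", b"hello world")
main_go = hash_object("blob", b"package main\n")
src = hash_object("tree", tree_body([("100644", "main.go", main_go)]))
root = hash_object("tree", tree_body([("40000", "src", src), ("100644", "hello.txt", hello)]))
print(root)
```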

3. Commit

  • Stores: A pointer to the top-level Tree (the root of your project).
  • Metadata: Author, committer, timestamp, message.
  • Parent(s): Pointers to previous commit objects (none for the very first commit, one for an ordinary commit, two or more for a merge).
  • Analogy: A snapshot of the entire project at a specific moment.
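| Field | Value | Meaning |
|---|---|---|
| tree | `3a1b2c...` | Pointer to the root Tree |
| parent | `f8e7d6...` | Pointer to the previous commit |
| author | `Alice <alice@example.com> 1620000000 -0400` | Who wrote the change, and when (Unix timestamp and UTC offset) |
| committer | `Alice <alice@example.com> 1620000000 -0400` | Who recorded the commit, and when |
| message | `Add hello.txt` | Free-form text, stored after a blank line that follows the headers |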
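A minimal sketch that builds the raw text of a commit object and hashes it. To stay self-contained it points at the empty tree, whose well-known ID is `4b825dc642cb6eb9a060e54bf8d69288fbee4904`, rather than the elided `3a1b2c...`. `commit_body` is a hypothetical helper. The resulting IDs depend on every byte, including the timestamps, so no specific value is claimed here.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], author: str, committer: str, message: str) -> bytes:
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    lines += [f"author {author}", f"committer {committer}"]
    # A blank line separates the headers from the free-form message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
who = "Alice <alice@example.com> 1620000000 -0400"

root = commit_body(EMPTY_TREE, [], who, who, "Initial commit")  # no parent line
root_id = hash_object("commit", root)
child = commit_body(EMPTY_TREE, [root_id], who, who, "Second commit")
child_id = hash_object("commit", child)
print(root.decode())
print(child_id)
```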

4. Tag

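  • Stores: A pointer to another object, usually a commit. Only annotated tags are stored as objects; a lightweight tag is just a named reference.
  • Metadata: Tagger name and email, timestamp, message, and an optional GPG signature.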
  • Analogy: A sticky note on a specific page of a history book (e.g., “Version 1.0”).
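An annotated tag object follows the same pattern as a commit: a few header lines, a blank line, and the message. This sketch builds a hypothetical target commit inline and omits the optional signature. `tag_body` is an illustrative helper.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(body)}".encode() + b"\x00" + body).hexdigest()


def tag_body(target: str, target_type: str, name: str, tagger: str, message: str) -> bytes:
    return (
        f"object {target}\n"     # the tagged object's ID
        f"type {target_type}\n"  # usually "commit"
        f"tag {name}\n"          # the tag's name, e.g. v1.0
        f"tagger {tagger}\n"
        f"\n{message}\n"         # blank line, then the message
    ).encode()


who = "Alice <alice@example.com> 1620000000 -0400"
# A hypothetical root commit (empty tree, no parent) for the tag to point at.
commit = f"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\nauthor {who}\ncommitter {who}\n\nInitial commit\n"
commit_id = hash_object("commit", commit.encode())

tag = tag_body(commit_id, "commit", "v1.0", who, "Version 1.0")
print(tag.decode())
print(hash_object("tag", tag))
```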

Git Object Hash Calculator

Git hashes a Blob as the header blob <size>\0 followed by the content (blob 11\0 for “hello world”). Changing even one character of the content completely changes the hash.
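As a stand-in for a calculator, `blob_hash` (a hypothetical helper) hashes `blob <size>\0` plus the UTF-8 bytes of the text. You can use it to compare inputs that differ by a single character.

```python
import hashlib


def blob_hash(text: str) -> str:
    data = text.encode("utf-8")
    # Header "blob <byte length>\0", then the content itself.
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


a = blob_hash("hello world")
b = blob_hash("hello world!")  # one extra character
same = sum(x == y for x, y in zip(a, b))
print(a)  # 95d09f2b10159347eece71399a7e2e907ea3df4f
print(b)
print(f"{same} of 40 hex digits happen to match")
```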

Visualizing the Object Graph

When you commit, you create a directed acyclic graph of these objects. In the example below, a commit points to a root tree that lists two files:

[Figure: Commit e4f5g6h → Tree 3a1b2c → Blob README.md, Blob main.go]
  1. The Commit points to the root Tree.
  2. The Tree points to Blobs (files) and other Trees.
  3. The Blobs contain the actual data.
Important

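If you change one byte in main.go, its SHA-1 hash changes, so Git must create a new Blob. The Tree that lists main.go now references a different hash, so it becomes a new Tree (as does every parent Tree up to the root), and the Commit that points to the root Tree becomes a new Commit. Because a commit's hash transitively covers every Tree, Blob, and parent commit it references, any change to stored history changes the hashes and can be detected. This chain reaction guarantees the **integrity** of history.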
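The chain reaction can be reproduced with a small sketch. It rebuilds a flat project (one root tree and one root commit) before and after a one-byte edit to main.go. `snapshot` is a hypothetical helper, and the file contents and author line are made up. Because the tree is flat, subtree sorting is not needed.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(body)}".encode() + b"\x00" + body).hexdigest()


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Hash every blob, a flat root tree, and a root commit for the given files."""
    ids = {name: hash_object("blob", body) for name, body in files.items()}
    tree = b"".join(
        f"100644 {name}".encode() + b"\x00" + bytes.fromhex(ids[name])
        for name in sorted(ids)
    )
    ids["<tree>"] = hash_object("tree", tree)
    who = "Alice <alice@example.com> 1620000000 -0400"
    commit = f"tree {ids['<tree>']}\nauthor {who}\ncommitter {who}\n\nSnapshot\n"
    ids["<commit>"] = hash_object("commit", commit.encode())
    return ids


before = snapshot({"README.md": b"# Demo\n", "main.go": b"package main\n"})
after = snapshot({"README.md": b"# Demo\n", "main.go": b"package Main\n"})  # m -> M
for key in before:
    print(key, "changed" if before[key] != after[key] else "same")
# README.md same / main.go changed / <tree> changed / <commit> changed
```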

Hardware Reality: How Git Stores Objects Efficiently

If Git takes a complete snapshot of every modified file for every commit, wouldn’t the repository size explode? To solve this, Git uses a two-phase storage mechanism:

  1. Loose Objects (Snapshots): When you stage a new or modified file (for example with git add), Git stores it as a “loose object”: a complete, zlib-compressed snapshot of the file, kept in its own file under .git/objects. This is fast for immediate operations.
  2. Packfiles (Deltas): Periodically (when git gc runs, either manually or automatically after certain commands), and when objects are transferred during push and fetch, Git combines many loose objects into a single compressed packfile. During this packing phase, Git finally calculates the deltas (differences). Typically, it stores the most recent version of a file in full and stores older versions as reverse deltas against it, because the newest version is the one you are most likely to need quickly. This saves enormous amounts of disk space.
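The following Python toy illustrates only the reverse-delta idea: it describes an older version as copy and insert instructions against the newer one. It uses `difflib.SequenceMatcher` to find matching runs, which is not how Git computes deltas. Its tuple format is not Git's binary packfile delta encoding either. `make_delta` and `apply_delta` are hypothetical names.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list[tuple]:
    """Describe `target` as ("copy", offset, size) and ("insert", data) ops on `base`."""
    ops: list[tuple] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))  # reuse bytes already in base
        elif j2 > j1:
            ops.append(("insert", target[j1:j2]))  # new bytes stored literally
    return ops


def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        else:
            out += op[1]
    return bytes(out)


# The newest version is kept whole; the older one is stored as a delta against it.
newest = b"".join(b"line %d\n" % i for i in range(40))
older = newest.replace(b"line 20\n", b"line twenty\n")
delta = make_delta(newest, older)
literal = sum(len(op[1]) for op in delta if op[0] == "insert")
print(apply_delta(newest, delta) == older)  # True
print(len(older), "bytes rebuilt from", literal, "literal bytes")
```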

Check your own .git directory:

```shell
ls .git/objects
# 0a  1b  2c  info  pack
```

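The 2-character directories are named after the first two characters of each object's SHA-1 hash, and the remaining 38 characters form the filename. For example, the “hello world” blob is stored at .git/objects/95/d09f2b10159347eece71399a7e2e907ea3df4f. This “fan-out” strategy spreads objects across up to 256 directories instead of one, because very large directories can slow down some filesystems.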
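To see the fan-out layout and loose-object format end to end, this sketch writes a blob into a throwaway directory laid out like .git/objects and reads it back. It handles loose objects only, not packfiles, and the helper names are hypothetical. Pointing `read_loose_object` at a real repository's .git directory should work for objects that are still loose, since those are plain zlib streams.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(git_dir: str, obj_type: str, body: bytes) -> str:
    store = f"{obj_type} {len(body)}".encode() + b"\x00" + body
    oid = hashlib.sha1(store).hexdigest()
    folder = os.path.join(git_dir, "objects", oid[:2])  # fan-out directory
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, oid[2:]), "wb") as f:  # remaining 38 chars
        f.write(zlib.compress(store))
    return oid


def read_loose_object(git_dir: str, oid: str) -> tuple[str, bytes]:
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, body = raw.partition(b"\x00")  # split "<type> <size>" from the body
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError(f"corrupt object {oid}: size mismatch")
    return obj_type, body


with tempfile.TemporaryDirectory() as git_dir:
    oid = write_loose_object(git_dir, "blob", b"hello world")
    print(os.listdir(os.path.join(git_dir, "objects")))  # ['95']
    print(read_loose_object(git_dir, oid))  # ('blob', b'hello world')
```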