**Git is NOT just a Version Control System.** It is a content-addressable filesystem with a VCS user interface written on top of it.
The Core Concept: Content-Addressable Storage
Imagine you have a huge box of unmarked photographs. If you organize them by where you put them (e.g., “Row 3, Column 4”), that’s location-addressable storage (like a regular hard drive). But if you write a unique mathematical fingerprint of the image itself on the back of each photo, and use that fingerprint to find it—even if it moves—that’s content-addressable storage.
Many other version control systems (CVS and Subversion, for example) store history as a list of per-file changes. Git thinks differently: it stores data as a series of snapshots of a miniature filesystem.
Git is a key-value store.
- Key: The SHA-1 hash of the content.
- Value: The compressed content itself.
If you have a file with the content “hello world”, Git calculates the SHA-1 hash of that content (plus a header) and uses that hash as the filename to store the compressed data in the .git/objects directory.
The SHA-1 Hash
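Git uses the SHA-1 hashing algorithm to produce a 160-bit digest, written as a 40-character hexadecimal string. In practice, every distinct piece of content gets a distinct hash: collisions are theoretically possible, but vanishingly unlikely to happen by accident.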
SHA-1(“blob 11\0hello world”) → 95d09f2b10159347eece71399a7e2e907ea3df4f
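The formula above can be reproduced in a few lines of Python. This is a minimal sketch rather than Git's own code; the function name `git_blob_hash` is made up for illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git would assign to a blob holding `content`."""
    # Header: object type, a space, the content length in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b"hello world"))    # 95d09f2b10159347eece71399a7e2e907ea3df4f
print(git_blob_hash(b"hello world\n"))  # a trailing newline gives a different hash
```

The length in the header counts bytes, not characters. To check the result against a real Git installation, pipe the same bytes into `git hash-object --stdin`.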
Many beginners assume Git stores only the differences (deltas) between versions of a file (e.g., "Line 5 changed from A to B"). Conceptually, this is false. Each commit records a complete snapshot, and every file that changed is stored as a complete new Blob. If you commit a 10MB file, change one line, and commit again, Git creates a brand new Blob holding the full 10MB of new content (zlib-compressed on disk). Files that did not change are not copied: the new snapshot simply points at the Blob that already exists. We will explore how Git handles this efficiently later.
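A toy in-memory store makes the snapshot behaviour concrete. This is only a sketch: `ToyObjectStore` is a hypothetical stand-in for the .git/objects directory, not how Git is implemented.

```python
import hashlib
import zlib


class ToyObjectStore:
    """Hypothetical in-memory key-value store keyed by content hash."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        raw = b"blob %d\0" % len(content) + content
        key = hashlib.sha1(raw).hexdigest()
        # Identical content maps to the same key, so a second write adds nothing.
        self._objects.setdefault(key, zlib.compress(raw))
        return key

    def get(self, key: str) -> bytes:
        raw = zlib.decompress(self._objects[key])
        _header, _, content = raw.partition(b"\0")
        return content

    def __len__(self) -> int:
        return len(self._objects)


store = ToyObjectStore()
v1 = store.put(b"line one\nline two\n")
same = store.put(b"line one\nline two\n")  # unchanged file in a later commit
v2 = store.put(b"line one\nline 2\n")      # one line edited

print(v1 == same, v1 == v2, len(store))    # True False 2
```

The unchanged file costs nothing extra, while the edited file becomes a second, complete object.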
The Git Object Model
There are four primary types of objects in the Git database:
1. Blob (Binary Large Object)
- Stores: File content ONLY.
- Does NOT Store: Filename, permissions, or timestamp.
- Analogy: The raw data of a photograph, without the filename “vacation.jpg”.
2. Tree
- Stores: Directory structure.
- Content: A list of pointers to Blobs (files) or other Trees (subdirectories). It attaches filenames and permissions to the blobs.
- Analogy: A folder on your computer.
| File Mode | Type | Hash Pointer | Filename |
|---|---|---|---|
| 100644 | blob | 95d09f2b10159347eece71399a7e2e907ea3df4f | hello.txt |
| 040000 | tree | 3a1b2c... | src/ |
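The table is the human-readable view. On disk, each tree entry is stored as the mode, a space, the name, a NUL byte, and the 20 raw bytes of the hash; directory modes are written as `40000` (tools such as `git ls-tree` display them padded to `040000`). The sketch below serializes and parses that layout. The helper names and the `main.go` entry are made up for illustration.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' followed by the object body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def build_tree(entries) -> bytes:
    """entries: iterable of (mode, name, hex_sha) tuples."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git sorts by name; directories compare as if the name ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_sha in sorted(entries, key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_sha)
    return body


def parse_tree(body: bytes):
    """Inverse of build_tree: recover (mode, name, hex_sha) tuples."""
    entries, i = [], 0
    while i < len(body):
        nul = body.index(b"\0", i)
        mode, name = body[i:nul].split(b" ", 1)
        entries.append((mode.decode(), name.decode(), body[nul + 1:nul + 21].hex()))
        i = nul + 21
    return entries


blob = hash_object("blob", b"hello world")
sub_sha = hash_object("tree", build_tree([("100644", "main.go", blob)]))
root = build_tree([("40000", "src", sub_sha), ("100644", "hello.txt", blob)])

for entry in parse_tree(root):
    print(*entry)
print("root tree:", hash_object("tree", root))
```

Because the subtree's hash is embedded in the root tree, the root's hash depends on everything beneath it.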
3. Commit
- Stores: A pointer to the top-level Tree (the root of your project).
- Metadata: Author, committer, timestamp, message.
- Parent(s): Pointers to previous commit objects.
- Analogy: A snapshot of the entire project at a specific moment.
| Key | Value |
|---|---|
| tree | 3a1b2c... (Pointer to root Tree) |
| parent | f8e7d6... (Pointer to previous commit) |
| author | Alice <alice@example.com> 1620000000 -0400 |
| committer | Alice <alice@example.com> 1620000000 -0400 |
| message | Initial commit |
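A commit object is plain text: header lines, a blank line, then the message. The following sketch assembles one and hashes it. The function names are illustrative, and because the hashes in the table are truncated, the well-known hash of the empty tree is used here as a stand-in for the root tree.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def build_commit(tree, parents, author, committer, message) -> bytes:
    lines = [f"tree {tree}"]
    # A root commit has no parent line; a merge commit has two or more.
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


# Stand-in value: the hash of an empty tree, not the tree from the table above.
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
IDENT = "Alice <alice@example.com> 1620000000 -0400"

root = build_commit(TREE, [], IDENT, IDENT, "Initial commit")
root_sha = hash_object("commit", root)
child = build_commit(TREE, [root_sha], IDENT, IDENT, "Second commit")

print(child.decode())
print("child commit:", hash_object("commit", child))
```

Since the parent's hash is part of the child's content, each commit hash depends on all of the history before it.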
4. Tag
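- Stores: A pointer to a specific object, almost always a commit. Only annotated tags are stored as objects; a lightweight tag is just a named reference to a commit.
- Metadata: Tag name, tagger, date, message, and an optional GPG signature.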
- Analogy: A sticky note on a specific page of a history book (e.g., “Version 1.0”).
Visualizing the Object Graph
When you commit, you are creating a directed acyclic graph (DAG) of these objects:
- The Commit points to the root Tree.
- The Tree points to Blobs (files) and other Trees.
- The Blobs contain the actual data.
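If you change one byte in a file, say main.go, its SHA-1 hash changes. This forces a new Blob, which forces a new Tree for every directory above it, which forces a new Commit. Because each commit also records the hash of its parent, altering old history would change the hash of every later commit. This chain reaction guarantees the **integrity** of history.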
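The chain reaction can be observed with a small sketch. It assumes a hypothetical project containing a single file, main.go, with a fixed author and message; only the file content varies between the two runs.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def snapshot(file_content: bytes):
    """Return (blob, tree, commit) hashes for a one-file project."""
    blob = hash_object("blob", file_content)
    # The tree embeds the blob's hash as 20 raw bytes.
    tree = hash_object("tree", b"100644 main.go\0" + bytes.fromhex(blob))
    # The commit embeds the tree's hash as hex text.
    commit_body = (
        f"tree {tree}\n"
        "author Alice <alice@example.com> 1620000000 -0400\n"
        "committer Alice <alice@example.com> 1620000000 -0400\n"
        "\n"
        "Update main.go\n"
    ).encode("utf-8")
    return blob, tree, hash_object("commit", commit_body)


before = snapshot(b"package main\n")
after = snapshot(b"package main \n")  # one extra byte

for kind, old, new in zip(("blob", "tree", "commit"), before, after):
    print(f"{kind:6} {old[:8]} -> {new[:8]}")
```

All three hashes differ between the two runs, even though the author, message, and timestamp are identical.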
Hardware Reality: How Git Stores Objects Efficiently
If Git takes a complete snapshot of every modified file for every commit, wouldn’t the repository size explode? To solve this, Git uses a two-phase storage mechanism:
- Loose Objects (Snapshots): When you stage a new or modified file with git add, Git stores it as a “loose object”: a complete, zlib-compressed snapshot of the file. This is fast for immediate operations.
- Packfiles (Deltas): Periodically (or when you run git gc or push/pull), Git packs its objects. It takes many loose objects and combines them into a single compressed packfile. During this packing phase, Git finally calculates the deltas (differences). It typically stores the most recent version of a file in full and stores older versions as deltas against the newer one, because recent versions are the ones you read most often. This saves enormous amounts of disk space.
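To get a feel for why packing helps, the sketch below compares storing two near-identical versions in full against storing the newer version in full plus a delta that rebuilds the older one. It is an illustration only: Git uses its own binary delta format, and `difflib` stands in for it here. The 2000-line file is invented for the example.

```python
import difflib
import zlib

# Two versions of a hypothetical 2000-line file that differ in a single line.
old_lines = [f"line {i}\n" for i in range(2000)]
new_lines = list(old_lines)
new_lines[1000] = "line 1000 (edited)\n"

# Loose-object style: each version is compressed and stored whole.
full_old = zlib.compress("".join(old_lines).encode())
full_new = zlib.compress("".join(new_lines).encode())

# Pack style: keep the newer version whole and describe the older one
# as a delta from it (a text diff is used as a stand-in for Git's format).
delta_text = "".join(difflib.unified_diff(new_lines, old_lines, n=0))
delta_old = zlib.compress(delta_text.encode())

print("two full copies:", len(full_old) + len(full_new), "bytes")
print("full + delta:   ", len(full_new) + len(delta_old), "bytes")
```

The exact byte counts depend on the zlib version, but the delta is far smaller than a second full copy.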
Check your own .git directory:
ls .git/objects
# 0a 1b 2c info pack
The 2-character directories are the first two characters of the SHA-1 hash; the remaining 38 characters form the filename inside that directory. This “fan-out” strategy prevents having too many files in a single directory, which can slow down some filesystems.
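The layout can be reproduced outside of a repository. The sketch below writes a blob into a temporary directory using the same fan-out scheme and reads it back; the function names are illustrative, and it deliberately does not touch a real .git directory.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, content: bytes) -> str:
    raw = b"blob %d\0" % len(content) + content
    sha = hashlib.sha1(raw).hexdigest()
    # First two hex characters name the directory, the other 38 name the file.
    path = objects_dir / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return sha


def read_loose_object(objects_dir: Path, sha: str) -> bytes:
    raw = zlib.decompress((objects_dir / sha[:2] / sha[2:]).read_bytes())
    _header, _, content = raw.partition(b"\0")
    return content


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"
    sha = write_loose_object(objects, b"hello world")
    print(sha[:2], sha[2:])                 # directory name, file name
    print(read_loose_object(objects, sha))  # b'hello world'
```

Inside a real repository, `git cat-file -p <hash>` performs the equivalent lookup and decompression.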