[!TIP] Git is NOT just a Version Control System. It is a content-addressable filesystem with a VCS user interface written on top of it.
The Core Concept: Content-Addressable Storage
Many version control systems (CVS and Subversion, for example) store information as a list of per-file changes. Git thinks differently: it stores data as a series of snapshots of a miniature filesystem.
Git is a key-value store.
- Key: The SHA-1 hash of the content.
- Value: The compressed content itself.
If you have a file with the content “hello world”, Git calculates the SHA-1 hash of that content (plus a header) and uses that hash to name the file that stores the compressed data under the .git/objects directory.
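The key-value idea can be sketched in a few lines of Python. This is a toy in-memory model, not Git's implementation: the class name `ObjectStore` and its `put`/`get` methods are hypothetical, and a plain dictionary stands in for the .git/objects directory.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy in-memory content-addressable store (hypothetical, not Git's code)."""

    def __init__(self):
        # key: 40-character hex SHA-1, value: zlib-compressed header + content
        self._objects = {}

    def put(self, content: bytes, obj_type: str = "blob") -> str:
        # Git hashes a small header plus the content, not the content alone.
        header = f"{obj_type} {len(content)}\0".encode()
        payload = header + content
        key = hashlib.sha1(payload).hexdigest()
        self._objects[key] = zlib.compress(payload)
        return key

    def get(self, key: str) -> bytes:
        payload = zlib.decompress(self._objects[key])
        # Everything after the first NUL byte is the original content.
        _header, _, content = payload.partition(b"\0")
        return content

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
key = store.put(b"hello world")
store.put(b"hello world")  # identical content maps to the same key

print(key)             # 95d09f2b10159347eece71399a7e2e907ea3df4f
print(store.get(key))  # b'hello world'
print(len(store))      # 1
```

Storing the same content twice leaves a single entry, which is why identical files in a repository cost no extra space.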
The SHA-1 Hash
Git uses the SHA-1 hashing algorithm, which produces a 160-bit digest written as a 40-character hexadecimal string. Collisions are possible in principle, but in practice distinct content gets a distinct hash.
\[\text{SHA-1("blob 11\0hello world")} \rightarrow \texttt{95d09f2b10159347eece71399a7e2e907ea3df4f}\]
The Git Object Model
There are four primary types of objects in the Git database:
1. Blob (Binary Large Object)
- Stores: File content ONLY.
- Does NOT Store: Filename, permissions, or timestamp.
- Analogy: The raw data of a photograph, without the filename “vacation.jpg”.
2. Tree
- Stores: Directory structure.
- Content: A list of pointers to Blobs (files) or other Trees (subdirectories). It attaches filenames and permissions to the blobs.
- Analogy: A folder on your computer.
3. Commit
- Stores: A pointer to the top-level Tree (the root of your project).
- Metadata: Author, committer, timestamp, message.
- Parent(s): Pointers to previous commit objects.
- Analogy: A snapshot of the entire project at a specific moment.
4. Tag
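- Stores: A pointer to another object, almost always a specific commit. This describes annotated tags; a lightweight tag is only a ref and creates no object.
- Metadata: Tagger name, timestamp, message, and an optional GPG signature.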
- Analogy: A sticky note on a specific page of a history book (e.g., “Version 1.0”).
Git Object Hash Calculator
Git derives a Blob's hash from its content, so changing even one character produces a completely different hash.
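A minimal Python sketch of such a calculator follows. The function names `git_blob_hash` and `differing_hex_digits` are made up for this illustration; only the `blob <size>\0` header format comes from Git itself.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git would assign to a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


first = git_blob_hash(b"hello world")
second = git_blob_hash(b"hello World")  # a single character changed

print(first)
print(second)
print(differing_hex_digits(first, second), "of 40 hex digits differ")
```

If Git is installed, `printf 'hello world' | git hash-object --stdin` should print the same value as `first`.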
Visualizing the Object Graph
When you commit, you are creating a directed graph of these objects.
- The Commit points to the root Tree.
- The Tree points to Blobs (files) and other Trees.
- The Blobs contain the actual data.
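[!IMPORTANT] If you change one byte in main.go, its SHA-1 hash changes. Committing that change creates a new Blob, which changes the hash of every Tree that contains it, which in turn gives the new Commit a different hash. Because each commit hash covers its root tree and its parents, history cannot be altered without changing every hash downstream.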
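The chain reaction can be reproduced with a small Python sketch that serializes a blob, a tree, and a commit by hand. It is simplified under stated assumptions: a single file, no parent commit, a made-up author identity with a fixed timestamp, and entries sorted by plain name (Git's real ordering treats directories specially). The helper names are hypothetical.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """Hex SHA-1 over the header '<type> <size>' + NUL byte + body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_hash) entries as '<mode> <name>' + NUL + 20 raw hash bytes."""
    out = b""
    # Simplification: sort by name only.
    for mode, name, hex_hash in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_hash)
    return out


def commit_body(tree_hash: str, message: str) -> bytes:
    # Made-up identity and fixed timestamp so the result is reproducible.
    who = "Example Author <author@example.com> 0 +0000"
    return f"tree {tree_hash}\nauthor {who}\ncommitter {who}\n\n{message}\n".encode()


def snapshot(file_content: bytes):
    """Return (blob, tree, commit) hashes for a project containing only main.go."""
    blob = hash_object("blob", file_content)
    tree = hash_object("tree", tree_body([("100644", "main.go", blob)]))
    commit = hash_object("commit", commit_body(tree, "Update main.go"))
    return blob, tree, commit


before = snapshot(b"package main\n")
after = snapshot(b"package main \n")  # one extra byte in main.go

for label, old, new in zip(("blob", "tree", "commit"), before, after):
    print(f"{label:6} {old[:8]} -> {new[:8]}  changed={old != new}")
```

All three lines report `changed=True`: the one-byte edit alters the blob hash, the tree embeds that hash, and the commit embeds the tree hash.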
Hardware Reality: How Git Stores Objects
Git doesn’t just dump these files on disk. To save space:
- Zlib Compression: Before storing, Git compresses the content with zlib.
- Packfiles: Periodically, Git combines many loose objects into a single compressed packfile to save inodes and disk space.
Check your own .git directory:
ls .git/objects
# 0a 1b 2c info pack
Each 2-character directory name is the first two characters of a SHA-1 hash; the remaining 38 characters form the filename inside it. This “fan-out” strategy avoids putting too many files in a single directory, which can slow down some filesystems.
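The loose-object layout can be sketched in Python as follows. The functions `write_loose_object` and `read_loose_object` are hypothetical helpers, and the sketch writes into a temporary directory rather than a real repository; it covers loose objects only, not packfiles.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, content: bytes, obj_type: str = "blob") -> str:
    """Store header + content, zlib-compressed, under <first 2 hex chars>/<remaining 38>."""
    payload = f"{obj_type} {len(content)}\0".encode() + content
    sha = hashlib.sha1(payload).hexdigest()
    path = objects_dir / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(payload))
    return sha


def read_loose_object(objects_dir: Path, sha: str):
    """Decompress a loose object and split it into (type, content)."""
    payload = zlib.decompress((objects_dir / sha[:2] / sha[2:]).read_bytes())
    header, _, content = payload.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"  # stands in for .git/objects
    sha = write_loose_object(objects_dir, b"hello world")
    relative_path = f"{sha[:2]}/{sha[2:]}"
    obj_type, content = read_loose_object(objects_dir, sha)

print(relative_path)      # 95/d09f2b10159347eece71399a7e2e907ea3df4f
print(obj_type, content)  # blob b'hello world'
```

In a real repository, `git cat-file -p <hash>` performs the equivalent read and also handles objects that have been moved into packfiles.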