[!TIP] Git is NOT just a Version Control System. It is a content-addressable filesystem with a VCS user interface written on top of it.

The Core Concept: Content-Addressable Storage

Many version control systems record history as a set of files plus the changes made to each file over time. Git takes a different approach: each commit records a snapshot of a miniature filesystem.

Git is a key-value store.

  • Key: The SHA-1 hash of the content.
  • Value: The compressed content itself.

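If you have a file with the content “hello world”, Git prepends a short header (the object type, the content length in bytes, and a NUL byte, here “blob 11\0”), calculates the SHA-1 hash of header plus content, and uses that hash to name the file that holds the compressed data under the .git/objects directory.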
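The key-value idea can be sketched in a few lines of Python. This is a toy in-memory model, not how Git itself is implemented; the class name `ContentStore` and its `put`/`get` methods are made up for illustration.

```python
import hashlib
import zlib


class ContentStore:
    """Toy in-memory store where the key is derived from the value (hypothetical)."""

    def __init__(self):
        self._objects = {}  # hex digest -> zlib-compressed bytes

    def put(self, content: bytes) -> str:
        # Git hashes a short header plus the content, not the content alone.
        data = b"blob " + str(len(content)).encode() + b"\0" + content
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = zlib.compress(data)
        return key

    def get(self, key: str) -> bytes:
        data = zlib.decompress(self._objects[key])
        _header, _, content = data.partition(b"\0")
        return content


store = ContentStore()
first = store.put(b"hello world")
second = store.put(b"hello world")  # identical content maps to the same key
print(first)
print(first == second, len(store._objects))
```

Storing the same bytes twice yields the same key and a single stored object, which is why identical files cost nothing extra in a repository.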

The SHA-1 Hash

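Git uses the SHA-1 hashing algorithm, which produces a 160-bit digest written as a 40-character hexadecimal string. Git treats distinct content as having distinct hashes: collisions are possible in principle, but an accidental one is astronomically unlikely.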

\[\text{SHA-1("blob 11\0hello world")} \rightarrow \texttt{95d09f2b10159347eece71399a7e2e907ea3df4f}\]
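The formula above can be reproduced with Python's standard `hashlib`. The function name `git_blob_hash` is an assumption made for this sketch; the header layout follows the formula.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Hash content the way the formula above describes: header, NUL byte, content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b"hello world"))
# 95d09f2b10159347eece71399a7e2e907ea3df4f
print(git_blob_hash(b"hello world!"))  # one extra byte gives an unrelated hash
```

Note that the input is exactly 11 bytes with no trailing newline. A file created with a plain `echo` ends in a newline, so it is 12 bytes long and hashes differently.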

The Git Object Model

There are four primary types of objects in the Git database:

1. Blob (Binary Large Object)

  • Stores: File content ONLY.
  • Does NOT Store: Filename, permissions, or timestamp.
  • Analogy: The raw data of a photograph, without the filename “vacation.jpg”.

2. Tree

  • Stores: Directory structure.
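  • Content: A list of entries, each pairing a file mode and a name with the hash of a Blob (file) or another Tree (subdirectory). This is where filenames and permissions (in practice, little more than the executable bit) are recorded.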
  • Analogy: A folder on your computer.
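As a rough sketch of how a Tree ties names to Blobs, the snippet below serializes a flat directory as a sequence of "mode, space, name, NUL, 20 raw hash bytes" entries and hashes the result. The helper names `hash_object` and `tree_body` are hypothetical, and the plain name sort is a simplification that is only adequate for a flat list of files.

```python
import hashlib


def hash_object(kind: bytes, body: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0<body>', the layout used for every object type."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """entries: iterable of (mode, name, hex_hash) tuples for a flat directory."""
    body = b""
    # Simplification: sort by raw name bytes; subdirectories need extra ordering rules.
    for mode, name, hex_hash in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_hash)
    return body


readme = hash_object(b"blob", b"# Demo\n")
main = hash_object(b"blob", b"package main\n")
root = hash_object(b"tree", tree_body([
    ("100644", "main.go", main),
    ("100644", "README.md", readme),
]))
print(root)
print(hash_object(b"tree", tree_body([])))  # the empty tree
# 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

The filenames appear only in the tree bytes, never in the blobs, so renaming a file changes the Tree but reuses the same Blob.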

3. Commit

  • Stores: A pointer to the top-level Tree (the root of your project).
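  • Metadata: Author and committer (each with its own timestamp), plus the commit message.
  • Parent(s): Pointers to the preceding commit objects: none for the first commit, one for an ordinary commit, two or more for a merge.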
  • Analogy: A snapshot of the entire project at a specific moment.
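A commit is itself a small piece of text that names a tree, its parents, and the people involved. The sketch below builds that text and hashes it; `commit_body` is a hypothetical helper, the identity and timestamps are invented, and optional header lines (such as signatures) are left out.

```python
import hashlib


def hash_object(kind: bytes, body: bytes) -> str:
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, who, timestamp, message):
    """Build the text of a commit: tree line, parent lines, identities, message."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    ident = f"{who} {timestamp} +0000"
    lines.append("author " + ident)
    lines.append("committer " + ident)
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # hash of a tree with no entries
who = "Ada Example <ada@example.com>"  # invented identity for illustration

first = hash_object(b"commit", commit_body(EMPTY_TREE, [], who, 1700000000, "Initial commit"))
second = hash_object(b"commit", commit_body(EMPTY_TREE, [first], who, 1700000060, "Second commit"))
print(first)
print(second)
```

Because the parent's hash is part of the text being hashed, the second commit's identity depends on the first, and so on back through history.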

4. Tag

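  • Stores: A pointer to another object, almost always a commit. Only annotated tags are stored as objects; a lightweight tag is just a named reference to a commit.
  • Metadata: Tag name, tagger, message, and an optional GPG signature.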
  • Analogy: A sticky note on a specific page of a history book (e.g., “Version 1.0”).

Visualizing the Object Graph

When you commit, you are creating a directed graph of these objects.

[Diagram: Commit e4f5g6h... → Tree 3a1b2c... → Blob README.md, Blob main.go]
  1. The Commit points to the root Tree.
  2. The Tree points to Blobs (files) and other Trees.
  3. The Blobs contain the actual data.

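[!IMPORTANT] If you change one byte in main.go, its SHA-1 hash changes, so Git stores a new Blob. The Tree that lists main.go now records a different hash, so it becomes a new Tree, and the next Commit points to that new Tree. Because every commit's hash also covers the hashes of its parents, history cannot be altered without changing every hash that follows, which is what guarantees its integrity.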
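The chain reaction can be observed directly. The following is a simplified model (flat directory, fixed invented author, timestamp 0) with a hypothetical `snapshot` helper that hashes the blobs, then the tree, then the commit.

```python
import hashlib


def hash_object(kind: bytes, body: bytes) -> str:
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def snapshot(files):
    """Return (blob hashes by name, tree hash, commit hash) for a flat set of files."""
    blobs = {name: hash_object(b"blob", data) for name, data in files.items()}
    tree_bytes = b"".join(
        b"100644 " + name.encode() + b"\0" + bytes.fromhex(blobs[name])
        for name in sorted(blobs, key=str.encode)
    )
    tree = hash_object(b"tree", tree_bytes)
    ident = "A <a@example.com> 0 +0000"  # invented identity and timestamp
    text = f"tree {tree}\nauthor {ident}\ncommitter {ident}\n\nsnapshot\n"
    return blobs, tree, hash_object(b"commit", text.encode())


before = snapshot({"README.md": b"# Demo\n", "main.go": b"package main\n"})
after = snapshot({"README.md": b"# Demo\n", "main.go": b"package main \n"})  # one byte added

print(before[0]["README.md"] == after[0]["README.md"])  # True: untouched file, same blob
print(before[0]["main.go"] == after[0]["main.go"])      # False: new blob
print(before[1] == after[1])                            # False: new tree
print(before[2] == after[2])                            # False: new commit
```

Only the objects on the path from the changed file up to the commit are new; the README.md blob is shared between both snapshots.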

Hardware Reality: How Git Stores Objects

Git doesn’t just dump these files on disk. To save space:

  1. Zlib Compression: Before storing, Git compresses each object (header plus content) with zlib.
  2. Packfiles: Periodically (for example, during git gc), Git combines many loose objects into a single packfile, storing similar objects as deltas against one another, to save inodes and disk space.
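The compression step for a single loose object can be sketched with Python's `zlib` module. This only models the round trip in memory; it does not read or write a real repository, and packfiles are not covered.

```python
import zlib

content = b"hello world"
raw = b"blob " + str(len(content)).encode() + b"\0" + content
stored = zlib.compress(raw)  # the bytes a loose object file would hold

# Reading it back: inflate, then split the header from the payload.
inflated = zlib.decompress(stored)
header, _, payload = inflated.partition(b"\0")
kind, size = header.split(b" ")
print(kind, int(size), payload)
# b'blob' 11 b'hello world'
```

For tiny inputs like this one the compressed form may not be smaller than the original; the savings show up on larger, more repetitive content.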

Check your own .git directory:

ls .git/objects
# 0a  1b  2c  info  pack

Each 2-character directory name is the first two characters of an object's SHA-1 hash; the remaining 38 characters form the filename inside that directory. This “fan-out” strategy prevents having too many files in a single directory, which can slow down some filesystems.
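Mapping a hash to its loose-object location is a matter of splitting the string after the second character. The function name `loose_object_path` is an assumption for this sketch, and it only applies to objects that have not yet been moved into a packfile.

```python
from pathlib import PurePosixPath


def loose_object_path(hex_hash: str, git_dir: str = ".git") -> PurePosixPath:
    """First two hex characters name the directory, the rest name the file."""
    return PurePosixPath(git_dir) / "objects" / hex_hash[:2] / hex_hash[2:]


print(loose_object_path("95d09f2b10159347eece71399a7e2e907ea3df4f"))
# .git/objects/95/d09f2b10159347eece71399a7e2e907ea3df4f
```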