MVCC Internals: The Physics of Concurrency

[!NOTE] This module explores the core principles of the subject, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

PostgreSQL handles concurrency using MVCC (Multi-Version Concurrency Control). Unlike traditional locking systems where a reader must block a writer (or vice versa) to ensure consistency, MVCC allows multiple versions of the same row to exist simultaneously in the database.

[!IMPORTANT] The Golden Rule of Postgres Concurrency: Readers never block writers, and writers never block readers.

This is achieved by treating data as immutable at the tuple (row) level. When you “update” a row, Postgres doesn’t overwrite the data in place on the disk. Instead, it marks the old row as “dead” and inserts a brand new version of the row.

1. The Tuple Header (The Metadata)

Every row in a Postgres table (a Tuple) carries a hidden header containing metadata used for visibility checks. This header is 23 bytes (usually padded to 24) and comes before your actual column data.

The most critical fields for MVCC are:

Field Size Description
xmin 4 bytes The XID that created (inserted) this tuple version.
xmax 4 bytes The XID that deleted (or updated) this tuple version. If 0, the tuple is live.
cmin/cmax 4 bytes Command identifiers within a transaction (e.g., 1st statement, 2nd statement).
ctid 6 bytes The physical location of the next version of this tuple (Page ID, Offset). Used to chain updates.
infomask 2 bytes Bitmask of boolean flags (e.g., “Hint Bits” like XMIN_COMMITTED).

Visualizing the Tuple Header

xmin
xmax
cid
ctid
User Data (Columns)
23-byte Header (Metadata) + Actual Data

2. Interactive Visibility Simulator

How does Postgres decide if you (Transaction A) can see a specific row? It compares your Transaction ID (XID) and snapshot against the row’s xmin and xmax.

Experiment with this simulator to understand the visibility rules.

MVCC Time Machine

Current Global XID: 100

Scenario: You are an observer running SELECT * FROM users; inside Transaction 105.

Action

Observer (You)

Your XID: 105
You see rows where: 1. xmin < 105 (Committed) 2. xmax = 0 OR xmax ≥ 105
Table is empty.

3. Visibility Rules: The Logic

When your transaction starts, Postgres takes a snapshot. This snapshot contains the Transaction ID of your transaction and a list of all transactions that were active (not committed yet) at that moment.

For a tuple to be visible to you:

  1. Creation Check (xmin):
    • The transaction that created it (xmin) must have committed.
    • The xmin must be “in the past” relative to your snapshot.
  2. Deletion Check (xmax):
    • The tuple must not be deleted (xmax = 0).
    • OR, if it is deleted, the transaction that deleted it (xmax) must not be visible to you (it committed after your snapshot started or is still running).

The “Dirty” Secret of UPDATE

In Postgres, UPDATE is actually a syntactic sugar for DELETE + INSERT.

When you run UPDATE users SET name='Bob' WHERE id=1;:

  1. Postgres marks the old tuple (name=’Alice’) as dead by setting its xmax to the current transaction ID.
  2. Postgres inserts a new tuple (name=’Bob’) with xmin set to the current transaction ID.
  3. It updates the index to point to the new tuple.

This is why UPDATE heavy workloads can be expensive—they generate “garbage” (dead tuples) just like inserts and deletes.

4. VACUUM and Bloat

Since deleting a row only marks it as invisible (xmax), the data stays on disk. Over time, these dead tuples accumulate, causing Table Bloat.

The Cleanup Crew: VACUUM

The VACUUM process is responsible for reclaiming this space.

  • Standard VACUUM: Scans the table for dead tuples (where xmax is visible to everyone). It marks that space as “free” in the Free Space Map (FSM). The file size does not shrink, but the space is reused for future inserts.
  • VACUUM FULL: Rewrites the entire table to a new file, removing all bloat. This reclaims disk space but locks the table exclusively, blocking all reads and writes.
  • Autovacuum: A background daemon that wakes up periodically to run standard VACUUM on tables with significant activity.

[!WARNING] Never Disable Autovacuum Disabling autovacuum is a common production mistake. It leads to:

  1. Unbounded table growth (bloat).
  2. Index bloat (slower queries).
  3. Transaction ID Wraparound (database shutdown).

5. Transaction ID Wraparound (The Apocalypse)

xmin and xmax are 32-bit integers. This means there are only ~4 billion IDs available. If the counter overflows, old transactions might suddenly appear to be in the “future” (invisible), causing data loss.

To prevent this, Postgres uses a “frozen” XID concept and forces a special “Freeze” VACUUM to run on old data, replacing the xmin with a special FrozenTransactionId (2) to indicate “this is infinitely old and visible to everyone.”

6. Summary

  1. Readers don’t block writers: Because readers see an older version of the row.
  2. Immutability: Rows are never overwritten, only versioned.
  3. Space Cost: MVCC requires extra storage for xmin/xmax headers and dead tuples.
  4. Maintenance: Regular VACUUM is mandatory to recycle space occupied by dead tuples.