MVCC Internals: The Physics of Concurrency

PostgreSQL handles concurrency using MVCC (Multi-Version Concurrency Control). Unlike traditional locking systems where a reader must block a writer (or vice versa) to ensure consistency, MVCC allows multiple versions of the same row to exist simultaneously in the database.

[!IMPORTANT] The Golden Rule of Postgres Concurrency: Readers never block writers, and writers never block readers.

This is achieved by treating data as immutable at the tuple (row) level. When you “update” a row, Postgres doesn’t overwrite the data in place on the disk. Instead, it marks the old row as “dead” and inserts a brand new version of the row.

1. The Tuple Header (The Metadata)

Every row in a Postgres table (a Tuple) carries a hidden header containing metadata used for visibility checks. This header is 23 bytes (usually padded to 24) and comes before your actual column data.

The most critical fields for MVCC are:

Field Size Description
xmin 4 bytes The XID that created (inserted) this tuple version.
xmax 4 bytes The XID that deleted (or updated) this tuple version. If 0, the tuple is live.
cmin/cmax 4 bytes Command identifiers within a transaction (e.g., 1st statement, 2nd statement).
ctid 6 bytes The physical location of the next version of this tuple (Page ID, Offset). Used to chain updates.
infomask 2 bytes Bitmask of boolean flags (e.g., “Hint Bits” like XMIN_COMMITTED).

Visualizing the Tuple Header

xmin
xmax
cid
ctid
User Data (Columns)
23-byte Header (Metadata) + Actual Data
## 2. Interactive Visibility Simulator How does Postgres decide if *you* (Transaction A) can see a specific row? It compares your Transaction ID (XID) and snapshot against the row's `xmin` and `xmax`. Experiment with this simulator to understand the visibility rules.

MVCC Time Machine

Current Global XID: 100

Scenario: You are an observer running SELECT * FROM users; inside Transaction 105.

Action

Observer (You)

Your XID: 105
You see rows where: 1. xmin < 105 (Committed) 2. xmax = 0 OR xmax ≥ 105
Table is empty.
## 3. Visibility Rules: The Logic When your transaction starts, Postgres takes a snapshot. This snapshot contains the `Transaction ID` of your transaction and a list of all transactions that were **active** (not committed yet) at that moment. For a tuple to be visible to you: 1. **Creation Check (`xmin`)**: * The transaction that created it (`xmin`) must have committed. * The `xmin` must be "in the past" relative to your snapshot. 2. **Deletion Check (`xmax`)**: * The tuple must **not** be deleted (`xmax` = 0). * OR, if it is deleted, the transaction that deleted it (`xmax`) must **not** be visible to you (it committed after your snapshot started or is still running). ### The "Dirty" Secret of UPDATE In Postgres, `UPDATE` is actually a syntactic sugar for `DELETE` + `INSERT`. When you run `UPDATE users SET name='Bob' WHERE id=1;`: 1. Postgres marks the **old tuple** (name='Alice') as dead by setting its `xmax` to the current transaction ID. 2. Postgres inserts a **new tuple** (name='Bob') with `xmin` set to the current transaction ID. 3. It updates the index to point to the new tuple. This is why `UPDATE` heavy workloads can be expensive—they generate "garbage" (dead tuples) just like inserts and deletes. ## 4. VACUUM and Bloat Since deleting a row only *marks* it as invisible (`xmax`), the data stays on disk. Over time, these dead tuples accumulate, causing Table Bloat. ### The Cleanup Crew: VACUUM The **VACUUM** process is responsible for reclaiming this space. * **Standard VACUUM**: Scans the table for dead tuples (where `xmax` is visible to everyone). It marks that space as "free" in the Free Space Map (FSM). The file size **does not shrink**, but the space is reused for future inserts. * **VACUUM FULL**: Rewrites the entire table to a new file, removing all bloat. This reclaims disk space but **locks the table exclusively**, blocking all reads and writes. * **Autovacuum**: A background daemon that wakes up periodically to run standard VACUUM on tables with significant activity. > [!WARNING] > **Never Disable Autovacuum** > Disabling autovacuum is a common production mistake. It leads to: > 1. Unbounded table growth (bloat). > 2. Index bloat (slower queries). > 3. **Transaction ID Wraparound** (database shutdown). ## 5. Transaction ID Wraparound (The Apocalypse) `xmin` and `xmax` are 32-bit integers. This means there are only ~4 billion IDs available. If the counter overflows, old transactions might suddenly appear to be in the "future" (invisible), causing data loss. To prevent this, Postgres uses a "frozen" XID concept and forces a special "Freeze" VACUUM to run on old data, replacing the `xmin` with a special `FrozenTransactionId` (2) to indicate "this is infinitely old and visible to everyone." ## 6. Summary 1. **Readers don't block writers**: Because readers see an older version of the row. 2. **Immutability**: Rows are never overwritten, only versioned. 3. **Space Cost**: MVCC requires extra storage for `xmin`/`xmax` headers and dead tuples. 4. **Maintenance**: Regular `VACUUM` is mandatory to recycle space occupied by dead tuples.