Covering Indexes: The Holy Grail of Reads
In a standard index scan, Postgres does two things:
- Traverses the B-Tree to find the TID (pointer to the row).
- Visits the Heap (table) to fetch the actual row data.
Step 2 is random I/O and is slow. A Covering Index contains all the data required by a query, allowing Postgres to skip Step 2 entirely. This is called an Index-Only Scan.
1. The INCLUDE Clause
Before Postgres 11, to create a covering index, you had to add payload columns to the index key:
-- Old way (Pre-Postgres 11)
CREATE INDEX idx_old ON users (id, email, username);
This works, but it increases the size of the B-Tree’s internal nodes and can affect uniqueness constraints.
Postgres 11 introduced INCLUDE:
-- New way (Covering Index)
CREATE INDEX idx_users_cover
ON users (id)
INCLUDE (email, username);
Key Differences:
- Key Columns (
id): Used for tree navigation and uniqueness enforcing. - Included Columns (
email,username): Stored only in the leaf nodes as payload. They are not used for filtering or sorting, just for retrieval.
2. The Visibility Map: The Hidden Constraint
Even with a covering index, Postgres might still visit the heap. Why? MVCC.
Postgres needs to know if the row in the index is visible to your transaction (i.e., not deleted or updated by another active transaction). The index doesn’t store visibility information.
To avoid checking the heap for visibility, Postgres checks the Visibility Map.
- All Visible: If the bit for the page is set, Postgres knows all rows on that page are visible to everyone. Heap Skip!
- Not All Visible: If the bit is 0 (page is “dirty”), Postgres must visit the heap to check visibility.
The Role of VACUUM
VACUUM is responsible for updating the Visibility Map. If your table changes frequently and is not vacuumed often enough, Index-Only Scans will devolve into standard Index Scans.
3. Performance Comparison
Let’s look at EXPLAIN (ANALYZE, BUFFERS) output.
Scenario: Standard Index Scan
Query: SELECT email FROM users WHERE id = 5;
Index: CREATE INDEX ON users (id);
Index Scan using idx_users_id on users (cost=0.42..8.44 rows=1 width=32)
Index Cond: (id = 5)
Buffers: shared hit=1 read=1
hit=1: Index page.read=1: Heap page (Random I/O).
Scenario: Index-Only Scan
Index: CREATE INDEX ON users (id) INCLUDE (email);
Index Only Scan using idx_users_cover on users (cost=0.42..4.44 rows=1 width=32)
Index Cond: (id = 5)
Heap Fetches: 0
Buffers: shared hit=1
Heap Fetches: 0: The heap was never touched.read=0: Purely index access.
Hardware Reality: Random vs Sequential I/O
Why is skipping the Heap so important?
- Sequential I/O (Index Scan): Reading leaf nodes is mostly sequential. Modern NVMe SSDs can sustain 5-7 GB/s.
- Random I/O (Heap Fetch): Fetching row data from the heap requires jumping to random page locations. Even on SSDs, random 4K reads are drastically slower (e.g., 500 MB/s), and on HDDs, the seek latency (milliseconds) kills performance.
- The Difference: Eliminating heap fetches shifts your query from an IOPS-bound problem to a bandwidth-bound problem, which is orders of magnitude faster.
[!TIP] Pro Tip: FILLFACTOR for HOT Updates To keep your Visibility Map clean and sustain Index-Only Scans, set a lower
FILLFACTOR(e.g., 80-90%) on your table. This reserves free space on each page.
- Why? When you update a row, if the new version fits on the same page, Postgres performs a HOT (Heap-Only Tuple) Update.
- Benefit: HOT updates do not require modifying indexes, which keeps the index references valid and the Visibility Map stable.
4. When to Use Covering Indexes?
- Read-Heavy Workloads: Tables that are read frequently but updated less often.
- Specific Queries: You have a few queries that run millions of times a day (e.g., “Get User Profile by ID”).
- Avoid on Write-Heavy Tables: Adding columns to the index makes it larger and increases TOAST overhead if the values are large.
5. Covering Indexes: The Holy Grail of Reads
[!NOTE] This module explores the core principles of Covering Indexes, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
6. Visualizing the Flow
Diagram: Index Scan vs Index-Only Scan
Standard scan. Visits index, then visits heap for data.