Execution Plans
[!NOTE] This module explores the core principles of Execution Plans, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. Introduction to Query Planning
When you submit a SQL query to Postgres, it doesn’t just execute it blindly. Instead, the Query Planner (or Optimizer) analyzes your query and determines the most efficient way to retrieve the requested data. The result of this process is an Execution Plan.
Understanding how to read and interpret these plans is the single most important skill for optimizing Postgres performance. It tells you how the database is executing your query, allowing you to identify bottlenecks like missing indexes, inefficient joins, or poor statistics.
2. The EXPLAIN Command
The tool we use to view these plans is the EXPLAIN command.
Basic Syntax
EXPLAIN SELECT * FROM users WHERE age > 25;
This returns a text description of the plan without actually running the query.
EXPLAIN ANALYZE
To see what actually happens when the query runs, use EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT * FROM users WHERE age > 25;
[!WARNING]
EXPLAIN ANALYZEexecutes the query. Be careful when running this withDELETE,UPDATE, orINSERTstatements, as data modifications will be committed! Use a transaction withROLLBACKto test safely:BEGIN; EXPLAIN ANALYZE DELETE FROM users WHERE last_login < '2020-01-01'; ROLLBACK;
Output Formats
You can request the plan in different formats like JSON, YAML, or XML for easier parsing by tools:
EXPLAIN (FORMAT JSON) SELECT * FROM users;
3. Anatomy of a Plan
A query plan is a tree structure of nodes. Each node represents an operation (like a table scan, a join, or a sort). The bottom nodes fetch data, and the upper nodes process it.
Here is a typical output:
QUERY PLAN
-----------------------------------------------------------------------
Seq Scan on users (cost=0.00..18.50 rows=850 width=36)
Filter: (age > 25)
Key Metrics
Every node in the plan provides key estimates:
- Cost (
cost=0.00..18.50):- Start-up Cost (
0.00): The cost to fetch the first row. For a sequential scan, this is usually 0. For a sort, it’s high because all rows must be processed before the first one is returned. - Total Cost (
18.50): The estimated cost to fetch all rows. The unit is arbitrary (disk page fetches), where 1.0 is typically the cost of reading one sequential page.
- Start-up Cost (
- Rows (
rows=850):- The estimated number of rows this node will output. This is derived from table statistics.
- Width (
width=36):- The estimated average size (in bytes) of each row output by this node.
4. Scan Types
The way Postgres retrieves data from a table is called a Scan.
1. Sequential Scan (Seq Scan)
The database reads the entire table from beginning to end.
- When it’s used: When fetching a large portion of the table (typically > 5-10%), or when the table is small enough that reading it all is faster than random index lookups.
- Pros: Efficient sequential I/O (can read many pages at once).
- Cons: Reads unnecessary data if you only need a few rows.
2. Index Scan (Index Scan)
The database traverses a B-Tree (or other) index to find pointers to the matching rows in the table (Heap), then fetches the row data.
- When it’s used: When the query selects a small fraction of rows (high selectivity) and an index exists on the filter column.
- Pros: Minimizes I/O by only reading relevant pages.
- Cons: Random I/O is slower than sequential I/O per page. Requires jumping between Index and Heap.
3. Index Only Scan (Index Only Scan)
Similar to an Index Scan, but the data needed is already present in the index itself. Postgres doesn’t need to visit the Heap table.
- Condition: All requested columns must be in the index (or “covering” index), and the Visibility Map must show that the page is visible to all transactions.
- Pros: Extremely fast; avoids Heap access entirely.
4. Bitmap Heap Scan (Bitmap Index Scan + Bitmap Heap Scan)
A hybrid approach used when the number of rows is too large for a pure Index Scan (too much random I/O) but too small for a Seq Scan.
- Bitmap Index Scan: Scans the index and builds a bitmap of pages that contain matching rows. This sorts the page requests by physical location.
- Bitmap Heap Scan: Reads the table pages listed in the bitmap in sequential order, avoiding random jumps.
5. Visualizing the Tree
Imagine a query that joins two tables and sorts the result. The plan is a tree where data flows from the leaves (bottom) to the root (top).
- Seq Scans: Fetch data from
usersandorders. - Hash Join: Matches rows from both tables.
- Sort: Orders the final result.
6. Interactive: The Cost Estimator
See how the Query Planner decides between a Sequential Scan and an Index Scan.
In this simulation, we use a simplified cost model:
- Sequential Scan Cost: 1.0 per page (Sequential I/O is fast).
- Index Scan Cost: 4.0 per page (Random I/O is slow) + CPU overhead.
Adjust the sliders to see the “tipping point” where Postgres switches strategies.
The Cost Estimator
Adjust the parameters to see how the planner estimates costs.
[!TIP] Why not always use an index? Random I/O (jumping around the disk) is significantly slower than Sequential I/O. If you need 50% of the table, reading the whole thing sequentially is often faster than jumping back and forth 50% of the time! The “tipping point” is often around 5-20% selectivity depending on your hardware (SSD vs HDD).
7. Next Steps
Now that you can read a plan, we need to understand how Postgres joins data together. Check out Join Algorithms.