Execution Plans

[!NOTE] This module explores the core principles of Execution Plans, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. Introduction to Query Planning

When you submit a SQL query to Postgres, it doesn’t just execute it blindly. Instead, the Query Planner (or Optimizer) analyzes your query and determines the most efficient way to retrieve the requested data. The result of this process is an Execution Plan.

Understanding how to read and interpret these plans is the single most important skill for optimizing Postgres performance. It tells you how the database is executing your query, allowing you to identify bottlenecks like missing indexes, inefficient joins, or poor statistics.

2. The EXPLAIN Command

The tool we use to view these plans is the EXPLAIN command.

Basic Syntax

EXPLAIN SELECT * FROM users WHERE age > 25;

This returns a text description of the plan without actually running the query.

EXPLAIN ANALYZE

To see what actually happens when the query runs, use EXPLAIN ANALYZE:

EXPLAIN ANALYZE SELECT * FROM users WHERE age > 25;

[!WARNING] EXPLAIN ANALYZE executes the query. Be careful when running this with DELETE, UPDATE, or INSERT statements, as data modifications will be committed! Use a transaction with ROLLBACK to test safely:

BEGIN;
EXPLAIN ANALYZE DELETE FROM users WHERE last_login < '2020-01-01';
ROLLBACK;

Output Formats

You can request the plan in different formats like JSON, YAML, or XML for easier parsing by tools:

EXPLAIN (FORMAT JSON) SELECT * FROM users;

3. Anatomy of a Plan

A query plan is a tree structure of nodes. Each node represents an operation (like a table scan, a join, or a sort). The bottom nodes fetch data, and the upper nodes process it.

Here is a typical output:

QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on users  (cost=0.00..18.50 rows=850 width=36)
   Filter: (age > 25)

Key Metrics

Every node in the plan provides key estimates:

  1. Cost (cost=0.00..18.50):
    • Start-up Cost (0.00): The cost to fetch the first row. For a sequential scan, this is usually 0. For a sort, it’s high because all rows must be processed before the first one is returned.
    • Total Cost (18.50): The estimated cost to fetch all rows. The unit is arbitrary (disk page fetches), where 1.0 is typically the cost of reading one sequential page.
  2. Rows (rows=850):
    • The estimated number of rows this node will output. This is derived from table statistics.
  3. Width (width=36):
    • The estimated average size (in bytes) of each row output by this node.

4. Scan Types

The way Postgres retrieves data from a table is called a Scan.

1. Sequential Scan (Seq Scan)

The database reads the entire table from beginning to end.

  • When it’s used: When fetching a large portion of the table (typically > 5-10%), or when the table is small enough that reading it all is faster than random index lookups.
  • Pros: Efficient sequential I/O (can read many pages at once).
  • Cons: Reads unnecessary data if you only need a few rows.

2. Index Scan (Index Scan)

The database traverses a B-Tree (or other) index to find pointers to the matching rows in the table (Heap), then fetches the row data.

  • When it’s used: When the query selects a small fraction of rows (high selectivity) and an index exists on the filter column.
  • Pros: Minimizes I/O by only reading relevant pages.
  • Cons: Random I/O is slower than sequential I/O per page. Requires jumping between Index and Heap.

3. Index Only Scan (Index Only Scan)

Similar to an Index Scan, but the data needed is already present in the index itself. Postgres doesn’t need to visit the Heap table.

  • Condition: All requested columns must be in the index (or “covering” index), and the Visibility Map must show that the page is visible to all transactions.
  • Pros: Extremely fast; avoids Heap access entirely.

4. Bitmap Heap Scan (Bitmap Index Scan + Bitmap Heap Scan)

A hybrid approach used when the number of rows is too large for a pure Index Scan (too much random I/O) but too small for a Seq Scan.

  1. Bitmap Index Scan: Scans the index and builds a bitmap of pages that contain matching rows. This sorts the page requests by physical location.
  2. Bitmap Heap Scan: Reads the table pages listed in the bitmap in sequential order, avoiding random jumps.

5. Visualizing the Tree

Imagine a query that joins two tables and sorts the result. The plan is a tree where data flows from the leaves (bottom) to the root (top).

Sort Hash Join Seq Scan Table: users Seq Scan Table: orders
  1. Seq Scans: Fetch data from users and orders.
  2. Hash Join: Matches rows from both tables.
  3. Sort: Orders the final result.

6. Interactive: The Cost Estimator

See how the Query Planner decides between a Sequential Scan and an Index Scan.

In this simulation, we use a simplified cost model:

  • Sequential Scan Cost: 1.0 per page (Sequential I/O is fast).
  • Index Scan Cost: 4.0 per page (Random I/O is slow) + CPU overhead.

Adjust the sliders to see the “tipping point” where Postgres switches strategies.

The Cost Estimator

Adjust the parameters to see how the planner estimates costs.

Lower % means fewer rows selected.
Rows to Fetch:
50,000
Seq Scan Cost 1,000,000
Index Scan Cost 200,000
Winner: Index Scan

[!TIP] Why not always use an index? Random I/O (jumping around the disk) is significantly slower than Sequential I/O. If you need 50% of the table, reading the whole thing sequentially is often faster than jumping back and forth 50% of the time! The “tipping point” is often around 5-20% selectivity depending on your hardware (SSD vs HDD).

7. Next Steps

Now that you can read a plan, we need to understand how Postgres joins data together. Check out Join Algorithms.