B-Tree vs GIN vs GiST: Choosing the Right Index
Postgres is unique among relational databases for its extensible indexing system. While B-Tree covers 90% of use cases, understanding when to reach for GIN or GiST is what separates a database user from a database engineer.
The “Phonebook, Index, and Map” Analogy
Before diving into the technical structures, it helps to ground these concepts in physical equivalents:
- B-Tree is the Phonebook: Ordered alphabetically. Perfect for finding a specific name (equality) or everyone whose name starts with “A” through “C” (range).
- GIN is the Book Index: Located at the back of a textbook. Perfect for finding every page (row) that mentions a specific keyword (containment/intersection) across massive JSON documents or arrays.
- GiST is the City Map: Helps you find spatial relationships. Perfect for answering questions like “What points are within this bounding box?” (nearest neighbor, overlap).
This chapter breaks down the internal structures of these indexes and provides a decision framework for choosing the correct one.
1. B-Tree: The Default Workhorse
If you run CREATE INDEX without specifying a type, you get a B-Tree (Balanced Tree). It is designed to handle equality and range queries on data that can be sorted.
How it Works
A B-Tree index maintains an ordered tree structure. The leaf nodes contain pointers (TIDs) to the actual table rows (Heap).
- Self-Balancing: The tree depth remains uniform, ensuring
O(log n)lookup time. - Ordered: Leaf nodes are linked via a Doubly Linked List, allowing efficient bidirectional range scans (
BETWEEN,>,<) and ordered retrieval (ORDER BY).
Rigor: Why O(log N)?
B-Trees are “fat” trees. Unlike a binary search tree (branching factor of 2), a B-Tree node can have hundreds of children.
- Fan-out: If a node holds 300 keys, the tree height grows very slowly.
- Math: Height
h≈log_fanout(N). For 1 billion rows with a fan-out of 360, the height is only 3 or 4 (360^4 > 10^9). - Conclusion: Almost every lookup requires only 3-4 random I/Os (pages), regardless of table size.
Hardware Reality: The Page Size
Postgres stores data in 8KB Pages.
- Node Locality: A B-Tree internal node is designed to fit exactly into a single 8KB page.
- Efficiency: One disk read brings in hundreds of child pointers. This maximizes the utility of CPU cache lines and minimizes expensive disk seeks.
When to Use B-Tree
- Primary Keys & Unique Constraints: B-Trees are the only index type that can enforce uniqueness.
- Equality:
WHERE id = 55 - Ranges:
WHERE created_at > '2023-01-01' - Sorting:
ORDER BY created_at DESC - Prefix Matching:
WHERE name LIKE 'Alice%'(only if the locale supports it)
SQL Example
-- Standard B-Tree creation
CREATE INDEX idx_users_email ON users (email);
-- Supports:
SELECT * FROM users WHERE email = 'alice@example.com';
SELECT * FROM users WHERE email > 'm';
SELECT * FROM users ORDER BY email;
3. GIN: Generalized Inverted Index
GIN indexes are “Inverted Indexes,” similar to those used by search engines (like Elasticsearch). They are designed for composite values where a single row might contain multiple keys.
How it Works
Instead of storing the composite value (like a JSON document or an array) in the tree, GIN extracts the elements (keys in JSON, words in text) and stores them as keys in a B-Tree-like structure. Each key points to a list of TIDs that contain that element.
- Structure:
Element -> List[TID1, TID2, TID3...] - Compression: The TID lists are compressed using posting lists, making GIN very space-efficient for repetitive data.
- Write Overhead: Inserting a new row with a JSON document might require updating many different GIN keys. Postgres uses a “Pending List” buffer to optimize this, but writes are still slower than B-Tree.
- War Story: A startup once attached a GIN index to a massive JSON payload column used for high-throughput logging. Write latency spiked by 300% because every inserted document contained 50+ keys, meaning a single
INSERTtriggered 50+ index updates. They resolved this by replacing the GIN index with targeted B-Tree indexes on specific, frequently-queried JSON paths (e.g.,CREATE INDEX ON logs ((payload->>'status'))).
- War Story: A startup once attached a GIN index to a massive JSON payload column used for high-throughput logging. Write latency spiked by 300% because every inserted document contained 50+ keys, meaning a single
When to Use GIN
- JSONB Containment:
WHERE data @> '{"role": "admin"}' - Array Intersection:
WHERE tags && ARRAY['urgent', 'todo'] - Full Text Search:
WHERE doc_vector @@ to_tsquery('postgres') - Trigram Matching:
WHERE name LIKE '%ostgre%'(withpg_trgmextension)
SQL Example
-- Create a table with JSONB data
CREATE TABLE logs (
id SERIAL PRIMARY KEY,
payload JSONB
);
-- Create a GIN index on the JSONB column
CREATE INDEX idx_logs_payload ON logs USING GIN (payload);
-- Supports containment queries:
-- Find logs where payload contains key "level" with value "error"
SELECT * FROM logs
WHERE payload @> '{"level": "error"}';
5. GiST: Generalized Search Tree
GiST is not a single specific data structure but a framework for building them. It allows indexing of data types that cannot be strictly ordered (like 2D geometric shapes).
How it Works
GiST is a balanced tree, but instead of strict separation (all values < X go left), it uses Lossy Signatures or Bounding Boxes.
- Overlapping: In a spatial index (R-Tree), parent nodes contain a bounding box that covers all children. These boxes can overlap.
- Search: The search might need to traverse multiple branches of the tree if the query rectangle overlaps with multiple node boundaries.
- Lossy: GiST indexes might return “false positives” which the engine must filter out by checking the actual heap tuple.
When to Use GiST
- Geometric Data (PostGIS): “Find all points within 5km of this location.”
- Range Types: “Find all reservations that overlap with this time period.” (
tsrange,daterange) - Nearest Neighbor:
ORDER BY location <-> point(K-NN search). - Full Text Search: An alternative to GIN. Faster to build, smaller on disk, but slower reads.
SQL Example
-- Enable the btree_gist extension for mixing scalar and geometric types
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE reservations (
id SERIAL PRIMARY KEY,
room_id INT,
span TSRANGE
);
-- Create a GiST index to prevent overlapping bookings (Exclusion Constraint)
ALTER TABLE reservations
ADD CONSTRAINT no_overlap
EXCLUDE USING GIST (
room_id WITH =,
span WITH &&
);
7. Comparison Summary
| Feature | B-Tree | GIN | GiST |
|---|---|---|---|
| Primary Use | Scalars (Int, Text, Date) | Composites (JSON, Array, Text) | Geometric, Ranges, Custom |
| Query Types | =, <, >, ORDER BY |
@>, ?, &&, @@ |
<<, &<, &&, <-> |
| Ordering | Yes (Can retrieve sorted) | No | No (except K-NN distance) |
| Uniqueness | Yes | No | No (supports Exclusion) |
| Write Speed | Fast | Slow (High overhead) | Moderate |
| Read Speed | Very Fast | Fast | Moderate |
| Size | Small | Compact (Compressed) | Moderate |
This module explores the core principles of B-Tree vs GIN vs GiST, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
9. Visualizing Structures
10. Interactive: Index Selector
Select your data type and query pattern below to see the recommended index strategy and implementation code.