B-Tree vs GIN vs GiST: Choosing the Right Index

Postgres is unique among relational databases for its extensible indexing system. While B-Tree covers 90% of use cases, understanding when to reach for GIN or GiST is what separates a database user from a database engineer.

The “Phonebook, Index, and Map” Analogy

Before diving into the technical structures, it helps to ground these concepts in physical equivalents:

B-Tree is the Phonebook: Ordered alphabetically. Perfect for finding a specific name (equality) or everyone whose name starts with “A” through “C” (range).
GIN is the Book Index: Located at the back of a textbook. Perfect for finding every page (row) that mentions a specific keyword (containment/intersection) across massive JSON documents or arrays.
GiST is the City Map: Helps you find spatial relationships. Perfect for answering questions like “What points are within this bounding box?” (nearest neighbor, overlap).

This chapter breaks down the internal structures of these indexes and provides a decision framework for choosing the correct one.

1. B-Tree: The Default Workhorse

If you run CREATE INDEX without specifying a type, you get a B-Tree (Balanced Tree). It is designed to handle equality and range queries on data that can be sorted.

How it Works

A B-Tree index maintains an ordered tree structure. The leaf nodes contain pointers (TIDs) to the actual table rows (Heap).

Self-Balancing: The tree depth remains uniform, ensuring O(log n) lookup time.
Ordered: Leaf nodes are linked via a Doubly Linked List, allowing efficient bidirectional range scans (BETWEEN, >, <) and ordered retrieval (ORDER BY).

Rigor: Why O(log N)?

B-Trees are “fat” trees. Unlike a binary search tree (branching factor of 2), a B-Tree node can have hundreds of children.

Fan-out: If a node holds 300 keys, the tree height grows very slowly.
Math: Height h ≈ log_fanout(N). For 1 billion rows with a fan-out of 360, the height is only 3 or 4 (360^4 > 10^9).
Conclusion: Almost every lookup requires only 3-4 random I/Os (pages), regardless of table size.

Hardware Reality: The Page Size

Postgres stores data in 8KB Pages.

Node Locality: A B-Tree internal node is designed to fit exactly into a single 8KB page.
Efficiency: One disk read brings in hundreds of child pointers. This maximizes the utility of CPU cache lines and minimizes expensive disk seeks.

When to Use B-Tree

Primary Keys & Unique Constraints: B-Trees are the only index type that can enforce uniqueness.
Equality: WHERE id = 55
Ranges: WHERE created_at > '2023-01-01'
Sorting: ORDER BY created_at DESC
Prefix Matching: WHERE name LIKE 'Alice%' (only if the locale supports it)

SQL Example

-- Standard B-Tree creation
CREATE INDEX idx_users_email ON users (email);

-- Supports:
SELECT * FROM users WHERE email = 'alice@example.com';
SELECT * FROM users WHERE email > 'm';
SELECT * FROM users ORDER BY email;

3. GIN: Generalized Inverted Index

GIN indexes are “Inverted Indexes,” similar to those used by search engines (like Elasticsearch). They are designed for composite values where a single row might contain multiple keys.

How it Works

Instead of storing the composite value (like a JSON document or an array) in the tree, GIN extracts the elements (keys in JSON, words in text) and stores them as keys in a B-Tree-like structure. Each key points to a list of TIDs that contain that element.

Structure: Element -> List[TID1, TID2, TID3...]
Compression: The TID lists are compressed using posting lists, making GIN very space-efficient for repetitive data.
Write Overhead: Inserting a new row with a JSON document might require updating many different GIN keys. Postgres uses a “Pending List” buffer to optimize this, but writes are still slower than B-Tree.
- War Story: A startup once attached a GIN index to a massive JSON payload column used for high-throughput logging. Write latency spiked by 300% because every inserted document contained 50+ keys, meaning a single INSERT triggered 50+ index updates. They resolved this by replacing the GIN index with targeted B-Tree indexes on specific, frequently-queried JSON paths (e.g., CREATE INDEX ON logs ((payload->>'status'))).

When to Use GIN

JSONB Containment: WHERE data @> '{"role": "admin"}'
Array Intersection: WHERE tags && ARRAY['urgent', 'todo']
Full Text Search: WHERE doc_vector @@ to_tsquery('postgres')
Trigram Matching: WHERE name LIKE '%ostgre%' (with pg_trgm extension)

SQL Example

-- Create a table with JSONB data
CREATE TABLE logs (
    id SERIAL PRIMARY KEY,
    payload JSONB
);

-- Create a GIN index on the JSONB column
CREATE INDEX idx_logs_payload ON logs USING GIN (payload);

-- Supports containment queries:
-- Find logs where payload contains key "level" with value "error"
SELECT * FROM logs
WHERE payload @> '{"level": "error"}';

5. GiST: Generalized Search Tree

GiST is not a single specific data structure but a framework for building them. It allows indexing of data types that cannot be strictly ordered (like 2D geometric shapes).

How it Works

GiST is a balanced tree, but instead of strict separation (all values < X go left), it uses Lossy Signatures or Bounding Boxes.

Overlapping: In a spatial index (R-Tree), parent nodes contain a bounding box that covers all children. These boxes can overlap.
Search: The search might need to traverse multiple branches of the tree if the query rectangle overlaps with multiple node boundaries.
Lossy: GiST indexes might return “false positives” which the engine must filter out by checking the actual heap tuple.

When to Use GiST

Geometric Data (PostGIS): “Find all points within 5km of this location.”
Range Types: “Find all reservations that overlap with this time period.” (tsrange, daterange)
Nearest Neighbor: ORDER BY location <-> point (K-NN search).
Full Text Search: An alternative to GIN. Faster to build, smaller on disk, but slower reads.

SQL Example

-- Enable the btree_gist extension for mixing scalar and geometric types
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE reservations (
    id SERIAL PRIMARY KEY,
    room_id INT,
    span TSRANGE
);

-- Create a GiST index to prevent overlapping bookings (Exclusion Constraint)
ALTER TABLE reservations
ADD CONSTRAINT no_overlap
EXCLUDE USING GIST (
    room_id WITH =,
    span WITH &&
);

7. Comparison Summary

Feature	B-Tree	GIN	GiST
Primary Use	Scalars (Int, Text, Date)	Composites (JSON, Array, Text)	Geometric, Ranges, Custom
Query Types	`=`, `<`, `>`, `ORDER BY`	`@>`, `?`, `&&`, `@@`	`<<`, `&<`, `&&`, `<->`
Ordering	Yes (Can retrieve sorted)	No	No (except K-NN distance)
Uniqueness	Yes	No	No (supports Exclusion)
Write Speed	Fast	Slow (High overhead)	Moderate
Read Speed	Very Fast	Fast	Moderate
Size	Small	Compact (Compressed)	Moderate

Note

This module explores the core principles of B-Tree vs GIN vs GiST, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

9. Visualizing Structures

B-Tree Structure (Depth)

[ 50 ]

↙ ↘

[ 10 - 49 ]

[ 51 - 100 ]

↙ ↘ ↙ ↘

Data

GIN Structure (Inverted)

Key

Posting List (TIDs)

"admin"

[1, 5, 8, 12, ...]

"editor"

[2, 5, 9, 20, ...]

"viewer"

[3, 4, 6, 7, ...]

10. Interactive: Index Selector

Select your data type and query pattern below to see the recommended index strategy and implementation code.

1. Data Type

B-Tree vs GIN vs GiST

B-Tree vs GIN vs GiST: Choosing the Right Index

The “Phonebook, Index, and Map” Analogy

1. B-Tree: The Default Workhorse

How it Works

Rigor: Why O(log N)?

Hardware Reality: The Page Size

When to Use B-Tree

SQL Example

3. GIN: Generalized Inverted Index

How it Works

When to Use GIN

SQL Example

5. GiST: Generalized Search Tree

How it Works

When to Use GiST

SQL Example

7. Comparison Summary

9. Visualizing Structures

10. Interactive: Index Selector

Found this lesson helpful?