B-Tree vs GIN vs GiST: Choosing the Right Index

Postgres is unique among relational databases for its extensible indexing system. While B-Tree covers 90% of use cases, understanding when to reach for GIN or GiST is what separates a database user from a database engineer.

This chapter breaks down the internal structures of these indexes and provides a decision framework for choosing the correct one.

1. B-Tree: The Default Workhorse

If you run CREATE INDEX without specifying a type, you get a B-Tree (Balanced Tree). It is designed to handle equality and range queries on data that can be sorted.

How it Works

A B-Tree index maintains an ordered tree structure. The leaf nodes contain pointers (TIDs) to the actual table rows (Heap).

  • Self-Balancing: The tree depth remains uniform, ensuring O(log n) lookup time.
  • Ordered: Leaf nodes are linked via a Doubly Linked List, allowing efficient bidirectional range scans (BETWEEN, >, <) and ordered retrieval (ORDER BY).

Rigor: Why O(log N)?

B-Trees are “fat” trees. Unlike a binary search tree (branching factor of 2), a B-Tree node can have hundreds of children.

  • Fan-out: If a node holds 300 keys, the tree height grows very slowly.
  • Math: Height h &approx; log_fanout(N). For 1 billion rows with a fan-out of 360, the height is only 3 or 4 (360^4 > 10^9).
  • Conclusion: Almost every lookup requires only 3-4 random I/Os (pages), regardless of table size.

Hardware Reality: The Page Size

Postgres stores data in 8KB Pages.

  • Node Locality: A B-Tree internal node is designed to fit exactly into a single 8KB page.
  • Efficiency: One disk read brings in hundreds of child pointers. This maximizes the utility of CPU cache lines and minimizes expensive disk seeks.

When to Use B-Tree

  • Primary Keys & Unique Constraints: B-Trees are the only index type that can enforce uniqueness.
  • Equality: WHERE id = 55
  • Ranges: WHERE created_at > '2023-01-01'
  • Sorting: ORDER BY created_at DESC
  • Prefix Matching: WHERE name LIKE 'Alice%' (only if the locale supports it)

SQL Example

-- Standard B-Tree creation
CREATE INDEX idx_users_email ON users (email);

-- Supports:
SELECT * FROM users WHERE email = 'alice@example.com';
SELECT * FROM users WHERE email > 'm';
SELECT * FROM users ORDER BY email;

2. B-Tree vs GIN vs GiST: Choosing the Right Index

3. GIN: Generalized Inverted Index

GIN indexes are “Inverted Indexes,” similar to those used by search engines (like Elasticsearch). They are designed for composite values where a single row might contain multiple keys.

How it Works

Instead of storing the composite value (like a JSON document or an array) in the tree, GIN extracts the elements (keys in JSON, words in text) and stores them as keys in a B-Tree-like structure. Each key points to a list of TIDs that contain that element.

  • Structure: Element -> List[TID1, TID2, TID3...]
  • Compression: The TID lists are compressed using posting lists, making GIN very space-efficient for repetitive data.
  • Write Overhead: Inserting a new row with a JSON document might require updating many different GIN keys. Postgres uses a “Pending List” buffer to optimize this, but writes are still slower than B-Tree.

When to Use GIN

  • JSONB Containment: WHERE data @> '{"role": "admin"}'
  • Array Intersection: WHERE tags && ARRAY['urgent', 'todo']
  • Full Text Search: WHERE doc_vector @@ to_tsquery('postgres')
  • Trigram Matching: WHERE name LIKE '%ostgre%' (with pg_trgm extension)

SQL Example

-- Create a table with JSONB data
CREATE TABLE logs (
    id SERIAL PRIMARY KEY,
    payload JSONB
);

-- Create a GIN index on the JSONB column
CREATE INDEX idx_logs_payload ON logs USING GIN (payload);

-- Supports containment queries:
-- Find logs where payload contains key "level" with value "error"
SELECT * FROM logs
WHERE payload @> '{"level": "error"}';

4. B-Tree vs GIN vs GiST: Choosing the Right Index

5. GiST: Generalized Search Tree

GiST is not a single specific data structure but a framework for building them. It allows indexing of data types that cannot be strictly ordered (like 2D geometric shapes).

How it Works

GiST is a balanced tree, but instead of strict separation (all values < X go left), it uses Lossy Signatures or Bounding Boxes.

  • Overlapping: In a spatial index (R-Tree), parent nodes contain a bounding box that covers all children. These boxes can overlap.
  • Search: The search might need to traverse multiple branches of the tree if the query rectangle overlaps with multiple node boundaries.
  • Lossy: GiST indexes might return “false positives” which the engine must filter out by checking the actual heap tuple.

When to Use GiST

  • Geometric Data (PostGIS): “Find all points within 5km of this location.”
  • Range Types: “Find all reservations that overlap with this time period.” (tsrange, daterange)
  • Nearest Neighbor: ORDER BY location <-> point (K-NN search).
  • Full Text Search: An alternative to GIN. Faster to build, smaller on disk, but slower reads.

SQL Example

-- Enable the btree_gist extension for mixing scalar and geometric types
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE reservations (
    id SERIAL PRIMARY KEY,
    room_id INT,
    span TSRANGE
);

-- Create a GiST index to prevent overlapping bookings (Exclusion Constraint)
ALTER TABLE reservations
ADD CONSTRAINT no_overlap
EXCLUDE USING GIST (
    room_id WITH =,
    span WITH &&
);

6. B-Tree vs GIN vs GiST: Choosing the Right Index

7. Comparison Summary

Feature B-Tree GIN GiST
Primary Use Scalars (Int, Text, Date) Composites (JSON, Array, Text) Geometric, Ranges, Custom
Query Types =, <, >, ORDER BY @>, ?, &&, @@ <<, &<, &&, <->
Ordering Yes (Can retrieve sorted) No No (except K-NN distance)
Uniqueness Yes No No (supports Exclusion)
Write Speed Fast Slow (High overhead) Moderate
Read Speed Very Fast Fast Moderate
Size Small Compact (Compressed) Moderate

8. B-Tree vs GIN vs GiST: Choosing the Right Index

[!NOTE] This module explores the core principles of B-Tree vs GIN vs GiST, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

9. Visualizing Structures

B-Tree Structure (Depth)
[ 50 ]
↙     ↘
[ 10 - 49 ]
[ 51 - 100 ]
↙ ↘   ↙ ↘
Data
Data
Data
Data

Balanced depth, direct pointers to Heap.

GIN Structure (Inverted)
Key
Posting List (TIDs)
"admin"
[1, 5, 8, 12, ...]
"editor"
[2, 5, 9, 20, ...]
"viewer"
[3, 4, 6, 7, ...]

Keys map to lists of IDs. Efficient for containment.

10. Interactive: Index Selector

Select your data type and query pattern below to see the recommended index strategy and implementation code.