PostGIS: The Spatial Database
Most databases treat data as one-dimensional strings or numbers. But the real world is two-dimensional (or 3D/4D). PostGIS turns PostgreSQL into the world’s most advanced open-source spatial database.
It doesn’t just store coordinates; it understands geometry. It can answer questions like “What is the nearest coffee shop?” or “Is this point inside this park?” efficiently, even with millions of objects.
1. The Genesis: Why B-Trees Fail in 2D
Standard databases use B-Trees for indexing. B-Trees rely on total ordering: for any two items A and B, either A < B, A > B, or A = B.
This works for numbers (1 < 2) and strings (“apple” < “banana”).
But how do you sort 2D points? Is (1, 10) “greater” than (10, 1)?
- If you sort by X, points with similar Y but different X are far apart on disk.
- If you sort by Y, points with similar X are far apart.
This is the Curse of Dimensionality. A linear index (like a B-Tree) cannot preserve spatial locality in multiple dimensions.
The Solution: R-Trees (GiST)
PostGIS uses GiST (Generalized Search Tree) to implement R-Trees.
Instead of sorting, R-Trees group nearby objects into bounding boxes (Rectangles).
- Leaf Nodes: Actual geometries (polygons, lines, points).
- Internal Nodes: A bounding box that fully contains all its children.
To find “points inside a park”, the database checks: Does the park’s bounding box intersect the node’s bounding box? If no, it prunes the entire branch.
2. Interactive: Spatial Query Visualizer
Explore how an R-Tree organizes spatial data. Click to add points. The “Query Box” represents a search area (e.g., a viewport on a map). The green boxes show how the index groups points together.
Control Panel
Click on the canvas to add Points.
Green Boxes: R-Tree Nodes (Bounding Boxes)
Red Box: Your Query Window
Blue Points: Found in Query
3. Hardware Reality: Geometry vs Geography
PostGIS has two main data types:
- Geometry (
geometry): Cartesian plane. Flat earth. Calculations are fast (Pythagorean theorem). Unit: Degrees (usually). - Geography (
geography): Spheroid (round earth). Calculations are complex (Great Circle distance). Unit: Meters.
The Mathematics of Distance
Geometry (Euclidean): Distance = √((x2 - x1)2 + (y2 - y1)2)
Geography (Haversine/Geodesic):
Costly trigonometric functions (sin, cos, atan2).
[!TIP] Use
geometry(SRID 4326) for storage andgeographyonly when you need accurate meter calculations over long distances (like flight paths). For local city apps, projected geometry is often faster and accurate enough.
4. Code Implementation
How to store and query spatial data in Production.
A. Insert a Point (Java & Go)
First, ensure the extension is enabled:
CREATE EXTENSION IF NOT EXISTS postgis;
B. K-Nearest Neighbors (KNN)
The classic “Find 5 nearest restaurants” query.
The Naive Way (Slow):
Calculate distance to every row and sort. O(N log N).
SELECT * FROM places ORDER BY ST_Distance(location, my_loc) LIMIT 5;
The Index Way (Fast):
Use the <-> (distance) operator which GiST indexes support natively. O(K log N).
SELECT * FROM places ORDER BY location <-> my_loc LIMIT 5;