Module Review: Supercharging Postgres
We’ve explored how PostgreSQL transforms from a relational database into a specialized engine for Space, Time, and Scale.
1. Key Takeaways
- PostGIS (Space): Uses R-Trees (GiST) to index 2D space. B-Trees fail here because sorting 2D points linearly destroys spatial locality.
- TimescaleDB (Time): Uses Hypertables to partition data by time into Chunks. This makes deleting old data
O(1)(drop file) instead ofO(N)(delete rows + vacuum). - Citus (Scale): Uses Sharding to distribute data across multiple nodes. Co-location of data by a shared key (e.g.,
tenant_id) is crucial to avoid expensive network joins. - Hardware Reality:
- Space: R-Trees minimize disk seeks by grouping nearby points.
- Time: Sequential writes to recent chunks are cache-friendly.
- Scale: Network latency is the new bottleneck; minimize it with co-location.
2. Interactive Flashcards
What is the "Curse of Dimensionality" in indexing?
A standard B-Tree can only sort by one dimension effectively. Sorting by X makes Y look random, causing massive disk seeks for spatial queries. R-Trees solve this by grouping by area.
Why is DELETE slow in Postgres but fast in TimescaleDB?
Postgres DELETE marks rows as dead (MVCC) requiring VACUUM to reclaim space. TimescaleDB drops entire partitions (files), which is instant and creates no bloat.
What is Co-location in Citus?
Storing related rows (e.g., all data for Tenant 101) on the SAME node. This allows joins to happen locally without sending data over the network.
What is a Continuous Aggregate?
A materialized view in TimescaleDB that incrementally updates itself as new data arrives, making real-time analytics (e.g., hourly avg) instant.
Which index type does PostGIS use?
GiST (Generalized Search Tree), specifically implementing an R-Tree structure.
What is the difference between Geometry and Geography?
Geometry assumes a flat plane (Cartesian). Geography assumes a sphere (Earth). Geometry is faster; Geography is more accurate for long distances.
3. Cheat Sheet
| Feature | PostGIS | TimescaleDB | Citus |
|---|---|---|---|
| Primary Use | Spatial / Maps | Time-Series / IoT | Multi-tenant SaaS / Analytics |
| Core Structure | Geometry / R-Tree | Hypertable / Chunk | Distributed Table / Shard |
| Scaling Strategy | Better Indexing | Partitioning | Horizontal Sharding |
| Key Optimization | Bounding Box Pruning | Constraint Exclusion | Query Routing |
| Data Locality | Spatial (nearby points) | Temporal (nearby time) | Logical (same tenant) |
Essential Commands
PostGIS
-- Find nearest neighbors
ORDER BY location <-> my_loc LIMIT 5;
-- Point in Polygon
ST_Contains(polygon, point);
-- Distance within X meters
ST_DWithin(loc1, loc2, 1000);
TimescaleDB
-- Create Hypertable
SELECT create_hypertable('metrics', 'time');
-- Compression Policy
SELECT add_compression_policy('metrics', INTERVAL '7 days');
-- Retention Policy
SELECT add_retention_policy('metrics', INTERVAL '1 year');
Citus
-- Distribute Table
SELECT create_distributed_table('users', 'tenant_id');
-- Reference Table
SELECT create_reference_table('countries');