TimescaleDB: The Time Machine

Postgres is an amazing general-purpose database. But when you ingest millions of metrics per second (IoT, financial ticks, logs), the standard B-Tree index becomes a bottleneck, and deleting old data (DELETE FROM table WHERE time < X) causes massive table bloat.

TimescaleDB solves this by automatically partitioning your data across time and space, while still looking like a single standard SQL table.

1. The Genesis: The “DELETE” Problem

In a standard Postgres table, deleting old data is expensive.

  1. MVCC: Postgres doesn’t physically delete rows immediately. It marks them as “dead”.
  2. Vacuum: The Autovacuum process must scan the table to reclaim space.
  3. Fragmentation: This leaves the heap and indexes fragmented (Swiss cheese holes), slowing down future inserts.

The Solution: Hypertables (Partitioning)

Instead of one giant table, TimescaleDB creates a Hypertable. Physically, it splits data into small “chunks” based on time ranges (e.g., 1 day per chunk).

To delete old data, you don’t run DELETE. You simply drop the chunk. Internally this is just a DROP TABLE on the chunk’s backing table: effectively O(1), metadata-only, and it creates zero bloat.
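In practice you don’t drop chunk tables by hand; TimescaleDB exposes this through `drop_chunks` and retention policies. A sketch, assuming a hypertable named `sensor_data` (as created later in this article):

```sql
-- One-off: drop all chunks whose data is entirely older than 30 days
SELECT drop_chunks('sensor_data', older_than => INTERVAL '30 days');

-- Or schedule it: a background job drops old chunks automatically
SELECT add_retention_policy('sensor_data', drop_after => INTERVAL '30 days');
```

Either way, removing a month of data takes milliseconds instead of the minutes (plus vacuum churn) a bulk DELETE would cost.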

2. Interactive: Hypertable Chunk Visualizer

Visualize how data flows into chunks. As time progresses (X-axis), new chunks are created. When a chunk becomes “old” (based on retention policy), it is dropped instantly.

As data is ingested over time, chunks move through three states:

  • Green blocks: active chunks (hot, uncompressed)
  • Blue blocks: compressed chunks (cold)
  • Grey blocks: dropped chunks

3. Hardware Reality: Compression

Time-series data is highly repetitive.

  • Timestamp: 2023-10-01 10:00:01, 2023-10-01 10:00:02 (Delta-of-delta encoding wins here)
  • Device ID: sensor-1, sensor-1 (Dictionary encoding wins here)
  • Value: 20.5, 20.6 (XOR / Gorilla encoding wins here)

TimescaleDB uses columnar compression on older chunks. Instead of storing row-by-row (Postgres default), it pivots data to column-by-column, applying specific algorithms like Gorilla (floating point) and Delta-of-Delta (timestamps). Result: 90-95% storage savings.
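The intuition behind delta-of-delta is easy to see in a few lines. This is a toy illustration of the idea, not TimescaleDB’s actual codec (the real one packs the results into variable-width bit streams):

```java
import java.util.Arrays;

public class DeltaOfDelta {
    // Store the first value, then the first delta, then delta-of-deltas.
    static long[] encode(long[] ts) {
        long[] out = new long[ts.length];
        out[0] = ts[0];
        for (int i = 1; i < ts.length; i++) {
            long delta = ts[i] - ts[i - 1];
            long prevDelta = (i == 1) ? 0 : ts[i - 1] - ts[i - 2];
            out[i] = delta - prevDelta; // regular intervals collapse to 0
        }
        return out;
    }

    public static void main(String[] args) {
        // Timestamps sampled once per second (epoch seconds)
        long[] ts = {1696154401L, 1696154402L, 1696154403L, 1696154404L, 1696154405L};
        // Prints [1696154401, 1, 0, 0, 0]: almost everything is zero,
        // which packs into very few bits per value.
        System.out.println(Arrays.toString(encode(ts)));
    }
}
```

A regularly sampled series turns into one large value followed by near-zeros, which is why columnar, type-aware encodings beat generic row compression so decisively on time-series data.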

4. Code Implementation

A. Creating a Hypertable

```sql
-- 1. Create a standard table
CREATE TABLE sensor_data (
    time        TIMESTAMPTZ      NOT NULL,
    sensor_id   INTEGER          NOT NULL,
    temperature DOUBLE PRECISION NULL,
    humidity    DOUBLE PRECISION NULL
);

-- 2. Convert it to a Hypertable (partition by time)
SELECT create_hypertable('sensor_data', 'time');

-- 3. Enable compression (optional but recommended)
ALTER TABLE sensor_data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id'
);

-- 4. Add an automatic compression policy (compress chunks older than 7 days)
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');
```

B. Continuous Aggregates

Real-time dashboards need “Average temperature per hour”, not raw data. Standard GROUP BY is slow on billions of rows. Continuous Aggregates pre-calculate this.
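The `sensor_data_hourly` view queried below could be defined like this (a sketch, assuming the `sensor_data` hypertable from section A):

```sql
-- Pre-aggregate raw readings into hourly buckets
CREATE MATERIALIZED VIEW sensor_data_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       sensor_id,
       avg(temperature) AS avg_temp
FROM sensor_data
GROUP BY bucket, sensor_id;

-- Keep the aggregate up to date in the background
SELECT add_continuous_aggregate_policy('sensor_data_hourly',
    start_offset      => INTERVAL '3 hours',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
```

Queries then hit the pre-computed hourly rows instead of scanning billions of raw readings.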

```java
public void queryHourlyAverage(Connection conn, int sensorId) throws SQLException {
    // Querying the continuous aggregate view is fast because the
    // hourly averages are pre-calculated
    String sql = """
            SELECT bucket, avg_temp
            FROM sensor_data_hourly
            WHERE sensor_id = ?
            ORDER BY bucket DESC
            LIMIT 24
            """;
    try (PreparedStatement pstmt = conn.prepareStatement(sql)) {
        pstmt.setInt(1, sensorId);
        try (ResultSet rs = pstmt.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getTimestamp("bucket") + ": "
                        + rs.getDouble("avg_temp"));
            }
        }
    }
}
```