Streaming Replication: The Physical Clone
Most databases don’t die from bad code; they die from disk failure or datacenter outages.
Streaming Replication is Postgres’s primary defense mechanism. It creates an exact, byte-for-byte copy (a “physical clone”) of your database on another server.
In this chapter, we will dissect the Write-Ahead Log (WAL) to understand how Postgres achieves near-real-time replication, and we’ll build a mental model of the Primary-Standby relationship.
2. The Genesis: It’s All About the WAL
To understand replication, you must first respect the WAL (Write-Ahead Log).
Hardware Reality: The Disk Speed Limit
Writing to random locations on a disk (random I/O) is slow. Writing sequentially to the end of a file (sequential I/O) is fast.
Postgres exploits this physics. When you commit a transaction, Postgres does not immediately update the table data files (the "Heap"). Instead, it appends the change to the WAL.
- WAL: A sequential log of every single byte change.
- Heap: The actual table data, updated later in the background (by the "Checkpointer").
Key Insight: If you have the WAL, you can reconstruct the database state at any point in time. Replication is simply shipping these WAL records to another machine.
The Replication Pipeline
- Client sends
INSERTto Primary. - Primary writes change to local WAL.
- WAL Sender process reads WAL and sends it over network.
- WAL Receiver process on Standby receives data and writes to local WAL.
- Startup Process on Standby reads local WAL and applies changes to data files (Redo).
4. Interactive: WAL Transmission Simulator
Visualise how WAL records flow from the Primary to the Standby. Notice how Replication Slots prevent the Primary from deleting old WAL files that the Standby hasn’t received yet.
6. Configuration: Wiring it Up
Setting up streaming replication requires configuring the Primary to send WALs and the Standby to receive them.
Primary Configuration (postgresql.conf)
## 7. Turn on the faucet
wal_level = replica # Write enough data to WAL for replication
## 8. Grant access (pg_hba.conf)
## 9. host replication replicator 10.0.0.2/32 md5
Standby Configuration (postgres.auto.conf)
In modern Postgres (12+), recovery.conf is gone. You configure connection settings in postgres.auto.conf or postgresql.conf.
primary_conninfo = 'host=10.0.0.1 port=5432 user=replicator password=secret'
restore_command = 'cp /mnt/wal_archive/%f %p' # Optional: Fetch from archive if streaming fails
[!TIP] Standby Signal: Create an empty file named
standby.signalin the data directory to tell Postgres “I am a standby, do not accept writes.”
8. Replication Slots: The Safety Net
Without a Replication Slot, the Primary is oblivious to the Standby’s needs.
If the Standby falls behind (e.g., network outage for 2 hours) and the Primary rotates its WAL files (overwriting old ones), the Standby is doomed. It can no longer catch up because the history it needs is deleted.
With a Slot, the Primary says: “I will keep this WAL file on disk forever until the Standby confirms it has received it.”
Creating a Slot
SELECT pg_create_physical_replication_slot('standby_1_slot');
[!WARNING] The Danger of Slots: If a Standby dies and never comes back, the Slot remains active. The Primary will retain WAL files forever, eventually filling up the disk and crashing the Primary. Always monitor your slots!
10. Sync vs Async: The Trade-off
By default, replication is Asynchronous.
- Primary writes to WAL.
- Primary commits to client.
- Primary sends WAL to Standby (background).
Risk: If Primary crashes immediately after (2), the Standby might not have received the WAL yet. Data Loss.
Synchronous Replication fixes this:
- Primary writes to WAL.
- Primary sends WAL to Standby.
- Standby confirms receipt.
- Primary commits to client.
Cost: Every transaction latency = Network RTT + Standby Disk I/O. If Standby dies, Primary freezes.
Enabling Sync Rep
## 14. postgresql.conf
synchronous_standby_names = 'standby_1'
12. Monitoring Lag
You can query the Primary to see how far behind the Standby is.
SELECT
client_addr,
state,
write_lag,
flush_lag,
replay_lag
FROM pg_stat_replication;
- write_lag: Time since WAL was written vs sent.
- flush_lag: Time since WAL was sent vs written to Standby disk.
- replay_lag: Time since WAL was written to disk vs applied to DB.
[!NOTE] This module explores the core principles of Streaming Replication, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.