The Life of a Write: From RAM to Disk
[!NOTE] This module traces an Elasticsearch write from RAM to disk, deriving each step from first principles and hardware constraints.
1. The Hook: “Near Real-Time” (NRT)
You write a document. You search for it immediately. It’s not there. Up to one second later, it appears. Why? Because disk I/O is expensive, so Elasticsearch cheats: it batches new documents in memory and only makes them searchable at the next refresh.
The Restaurant Analogy (Mnemonic)
Think of Elasticsearch as a busy restaurant kitchen:
- Memory Buffer (The Order Pad): The waiter scribbles down your order. It’s fast, but if the waiter drops the pad (server crash), your order is lost. It’s also not yet being cooked (not searchable).
- Translog (The Carbon Copy): As soon as the waiter writes your order, a carbon copy is instantly sent to the manager’s safe. If the waiter drops their pad, the manager can recover the order from the safe.
- Refresh (The Kitchen Prep): Every 1 second, the kitchen takes all orders from the pad, chops the veggies, and puts them in the pan (Lucene Segment). Now the food is actually cooking (searchable).
- Flush (Serving the Dish): Every 30 minutes, the cooked food is finally served to the customer (fsync to disk), and the carbon copies in the manager’s safe are thrown away (Translog cleared).
2. The Write Path (Step-by-Step)
Step 1: The Memory Buffer
When you POST /index/_doc/1, the document is written to the In-Memory Buffer.
- It is NOT yet searchable.
- It is NOT yet safe (if power fails, it’s gone).
Step 2: The Translog (Safety)
Simultaneously, the document is appended to the Translog (Transaction Log) on disk.
- Hardware Reality (fsync): By default (durability set to request), Elasticsearch issues an fsync system call on the Translog before acknowledging each request. If durability is set to async, it fsyncs in the background every 5 seconds instead. fsync forces the OS to write the file's cached data out to the physical storage device instead of leaving it in the page cache.
- Purpose (Crash Recovery): If the server loses power, any un-flushed segments in the Filesystem Cache are wiped out. Upon reboot, Elasticsearch replays the Translog to reconstruct the missing segments.
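The difference between "written" and "fsynced" can be made concrete with a toy append-only log. This is a minimal sketch, not Elasticsearch's translog format: the class name TinyTranslog, the replay helper, and the one-operation-per-line layout are all assumptions made for illustration. Only the flush-then-fsync sequence reflects what the text describes.

```python
import os
import tempfile


class TinyTranslog:
    """Toy append-only log (hypothetical; not Elasticsearch's real translog)."""

    def __init__(self, path, durability="request"):
        self.durability = durability  # "request" or "async", mirroring the text
        self._file = open(path, "ab")

    def append(self, operation):
        self._file.write(operation.encode("utf-8") + b"\n")
        if self.durability == "request":
            # Pay the fsync cost before the write would be acknowledged.
            self.sync()
        # In "async" mode a background timer would call sync() periodically.

    def sync(self):
        self._file.flush()             # Python buffer -> OS page cache
        os.fsync(self._file.fileno())  # OS page cache -> storage device

    def close(self):
        self._file.close()


def replay(path):
    """Read operations back in order, as a recovery step would."""
    with open(path, "rb") as f:
        return [line.decode("utf-8").rstrip("\n") for line in f]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "translog.log")
    log = TinyTranslog(path, durability="request")
    log.append("index doc 1")
    log.append("index doc 2")
    log.close()
    print(replay(path))
```

With durability set to "request", every append pays for an fsync. That cost is what the async setting trades away.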
Step 3: Refresh (Searchability)
Every 1 second (default), the contents of the Memory Buffer are written to a new Lucene Segment in the Filesystem Cache, and the buffer is then cleared.
- Now it is searchable.
- Hardware Reality: Creating a segment only writes to RAM (the OS cache). It avoids the expensive fsync system call, which keeps a refresh cheap enough to run every second. The remaining refresh delay is why Elasticsearch is “Near Real-Time” instead of “Real-Time”.
- Immutability: Once a segment is written, it is strictly immutable. You cannot modify it. Updates simply write a new document and mark the old one as deleted.
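The refresh and immutability rules above can be sketched with a small in-memory model. The names ToyIndex and Segment are hypothetical, and the delete bookkeeping is a simplification of what Lucene does. The point is only that an update never edits an existing segment: it adds a new copy and marks the old one as deleted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """Immutable batch of (doc_id, source) pairs; frozen after creation."""
    docs: tuple


@dataclass
class ToyIndex:
    buffer: list = field(default_factory=list)    # indexed, not yet searchable
    segments: list = field(default_factory=list)  # searchable, never modified
    deleted: set = field(default_factory=set)     # (segment_no, position) marks

    def index(self, doc_id, source):
        # Keep only the newest buffered copy of a document.
        self.buffer = [(i, s) for i, s in self.buffer if i != doc_id]
        self.buffer.append((doc_id, source))

    def refresh(self):
        if not self.buffer:
            return
        new_ids = {doc_id for doc_id, _ in self.buffer}
        # Mark older copies as deleted instead of rewriting their segments.
        for seg_no, seg in enumerate(self.segments):
            for pos, (doc_id, _) in enumerate(seg.docs):
                if doc_id in new_ids:
                    self.deleted.add((seg_no, pos))
        self.segments.append(Segment(tuple(self.buffer)))
        self.buffer = []

    def search(self):
        return {
            doc_id: source
            for seg_no, seg in enumerate(self.segments)
            for pos, (doc_id, source) in enumerate(seg.docs)
            if (seg_no, pos) not in self.deleted
        }
```

In this sketch, indexing the same ID twice with a refresh after each call leaves two segments behind. The first still holds the old version, hidden only by its delete mark.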
Step 4: Flush (Persistence)
Every 30 minutes (or when the Translog reaches its maximum size, default 512MB), a Flush happens:
- All data sitting in the Filesystem Cache is finally fsynced down to the physical disk (spinning disk or SSD).
- A commit point is created.
- The Translog is cleared, as the data is now safely permanent on disk.
3. Visualizing the Path
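One way to trace the four steps end to end is a toy model that keeps each storage layer as a separate list. Everything here is an illustrative assumption (the ToyShard class, its method names, and a crash modeled as "everything in RAM disappears"). It is not how Elasticsearch is implemented, only a way to watch a document move between layers.

```python
class ToyShard:
    """Hypothetical model of the write path; each list is one storage layer."""

    def __init__(self):
        self.buffer = []           # RAM: indexed, not searchable, not safe
        self.cached_segments = []  # OS cache: searchable, not yet fsynced
        self.disk_segments = []    # disk: searchable and durable
        self.translog = []         # disk: fsynced per request by default

    def index(self, doc):
        # Steps 1 and 2: memory buffer plus translog append.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Step 3: buffer becomes a new segment in the OS cache.
        if self.buffer:
            self.cached_segments.append(tuple(self.buffer))
            self.buffer = []

    def flush(self):
        # Step 4: fsync cached segments, commit, then clear the translog.
        self.refresh()
        self.disk_segments.extend(self.cached_segments)
        self.cached_segments = []
        self.translog = []

    def searchable(self):
        segments = self.disk_segments + self.cached_segments
        return [doc for segment in segments for doc in segment]

    def crash_and_recover(self):
        # Power loss wipes RAM; only disk_segments and translog survive.
        self.buffer = []
        self.cached_segments = []
        # Replay the translog to rebuild what was lost.
        self.buffer.extend(self.translog)
        self.refresh()


if __name__ == "__main__":
    shard = ToyShard()
    shard.index("a")
    print(shard.searchable())  # nothing is searchable before a refresh
    shard.flush()
    shard.index("b")
    shard.refresh()
    shard.crash_and_recover()
    print(shard.searchable())  # "b" is rebuilt from the translog
```

In this model the translog is kept after recovery, because the replayed documents are still not on disk until the next flush.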
4. Tuning for Performance
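- Heavy Indexing? Increase refresh_interval from 1s to 30s. New documents can take up to 30s to become searchable, but indexing throughput improves because far fewer small segments are created and later merged.
- Data Safety? The default translog durability (request) is the safe choice. Change index.translog.durability to async for faster writes only if you can accept losing up to 5s of acknowledged data in a crash.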
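As a rough sketch of what these two knobs look like in practice, the helper below builds a JSON body that could be sent to the index settings endpoint (PUT /my-index/_settings, where my-index is a placeholder name). The function names are hypothetical. The setting keys index.refresh_interval and index.translog.durability are the ones discussed above. The second helper is simple arithmetic giving an upper bound on refreshes under continuous indexing.

```python
import json


def bulk_indexing_settings(refresh_interval="30s", async_translog=False):
    """Build a settings body; async_translog trades durability for speed."""
    settings = {"index.refresh_interval": refresh_interval}
    if async_translog:
        settings["index.translog.durability"] = "async"
    return settings


def max_refreshes_per_hour(refresh_interval_seconds):
    """Upper bound on refreshes (and new segments) in one hour."""
    return 3600 // refresh_interval_seconds


if __name__ == "__main__":
    print(json.dumps(bulk_indexing_settings(async_translog=True), indent=2))
    print(max_refreshes_per_hour(1), "vs", max_refreshes_per_hour(30))
```

Going from 1s to 30s cuts the ceiling from 3600 refreshes per hour to 120, which is where the throughput gain comes from.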