Elasticsearch Glossary

[!NOTE] This module defines the core terminology of Elasticsearch, deriving each definition from first principles and hardware constraints.

B

Buffer (Memory Buffer): An in-memory data structure where incoming documents are initially stored before being written to a new Lucene segment. Data in the buffer is not yet searchable.

C

Cluster: A collection of one or more connected Elasticsearch nodes that together hold all of the data and provide federated indexing and search capabilities across all nodes.

F

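Flush: The process of performing a Lucene commit, which fsyncs the segments held in the filesystem cache to persistent disk. Once those segments are durable, the operations recorded in the transaction log (Translog) are no longer needed for recovery, so the Translog is cleared.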

I

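Inverted Index: A data structure used by search engines that maps each term (word) to the list of documents (or row IDs) that contain it, so a query goes from a term straight to its matching documents instead of scanning all N documents. The term lookup is not strictly O(1), because its cost depends on how the term dictionary is implemented, but it does not grow linearly with the number of documents the way an O(N) table scan does.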
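
The term-to-documents mapping can be illustrated with a minimal Python sketch. It is not how Lucene stores its index: the whitespace tokenizer, the plain `dict` standing in for the term dictionary, and the names `build_inverted_index` and `search_all` are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return the IDs of documents that contain every term in the query."""
    term_sets = [set(index.get(term, ())) for term in query.lower().split()]
    if not term_sets:
        return []
    return sorted(set.intersection(*term_sets))


docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "quick thinking",
}
index = build_inverted_index(docs)
print(index["brown"])                    # documents containing "brown"
print(search_all(index, "quick brown"))  # documents containing both terms
```

Each query term is resolved by a dictionary lookup followed by a set intersection, so no document text is scanned at query time.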

L

Lucene: The high-performance, open-source text search engine library written in Java that powers Elasticsearch under the hood. Every Elasticsearch shard is a single, complete Lucene index.

N

Node: A single running instance of Elasticsearch (a JVM process). A node belongs to a cluster, stores data (shards), and participates in the cluster’s indexing and search operations.

R

Refresh: The operation that moves documents from the in-memory buffer into a new Lucene segment (in the filesystem cache), making the newly indexed documents searchable in near real-time. The refresh frequency is controlled by the index.refresh_interval setting, which defaults to 1s.

Replica Shard: An exact copy of a primary shard. Replicas provide high availability (failover) and increase read (search) throughput by serving queries alongside the primary shard.

S

Segment: An immutable set of files within a shard that together hold a small, self-contained inverted index. A shard consists of multiple segments, and a search must query every segment and merge the results.
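
As a rough sketch of why segment count matters for search, the snippet below models each segment as a read-only mapping and queries all of them for one term. The names `make_segment` and `search_shard` are hypothetical, and real segments hold far more than a term-to-ID mapping.

```python
from types import MappingProxyType


def make_segment(postings):
    """Freeze a term -> doc IDs mapping to mimic an immutable segment."""
    return MappingProxyType({term: tuple(ids) for term, ids in postings.items()})


def search_shard(segments, term):
    """Look the term up in every segment and merge the per-segment hits."""
    hits = []
    for segment in segments:
        hits.extend(segment.get(term, ()))
    return sorted(hits)


segments = [
    make_segment({"fox": [1], "dog": [2]}),
    make_segment({"fox": [3]}),
    make_segment({"cat": [4]}),
]
print(search_shard(segments, "fox"))
```

Every search visits every segment, so in this model the per-query work grows with the number of segments even when most of them contain no match.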

Shard (Primary Shard): A horizontal partition of an Elasticsearch index. Each shard is a self-contained, fully functional Lucene index that can be hosted on any node within the cluster, enabling horizontal scalability.
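
How a document ends up in a particular partition can be sketched as a hash of a routing value taken modulo the number of primary shards. This is a simplification: Elasticsearch uses a Murmur3 hash and a slightly more involved formula, whereas the sketch below substitutes `zlib.crc32` purely so that it runs with the standard library. `pick_shard` is a hypothetical helper.

```python
import zlib


def pick_shard(routing_value, num_primary_shards):
    """Map a routing value (by default the document ID) to a shard number."""
    digest = zlib.crc32(routing_value.encode("utf-8"))  # stand-in hash, not Murmur3
    return digest % num_primary_shards


# The same ID always lands on the same shard, as long as the shard count is fixed.
placement = {doc_id: pick_shard(doc_id, 3) for doc_id in ["a1", "b2", "c3", "d4"]}
print(placement)
```

Because the shard number depends on the shard count, changing the number of primary shards would change where existing documents are expected to live.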

T

Translog (Transaction Log): A sequentially appended log file on disk that records every write operation. It ensures data durability before a flush occurs, allowing Elasticsearch to recover uncommitted data after a crash.
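
The interplay between the buffer, refresh, flush, and the Translog can be summarized in a toy model. This is a sketch under simplifying assumptions, not Elasticsearch's implementation: `ToyShard` and all of its attributes are hypothetical, Python lists stand in for on-disk files, and a crash is modeled as losing everything that has not been committed.

```python
class ToyShard:
    """Toy model of the write path: buffer + translog -> refresh -> flush."""

    def __init__(self):
        self.buffer = []              # in memory, not searchable
        self.translog = []            # stands in for the append-only log on disk
        self.cached_segments = []     # searchable, but only in the filesystem cache
        self.committed_segments = []  # searchable and durable on disk

    def index(self, doc):
        """Record the write in the translog and stage it in the buffer."""
        self.translog.append(doc)
        self.buffer.append(doc)

    def refresh(self):
        """Turn the buffer into a new immutable segment, making it searchable."""
        if self.buffer:
            self.cached_segments.append(tuple(self.buffer))
            self.buffer.clear()

    def flush(self):
        """Commit all segments to disk, then clear the translog."""
        self.refresh()  # simplification: a commit also writes out buffered docs
        self.committed_segments.extend(self.cached_segments)
        self.cached_segments.clear()
        self.translog.clear()

    def search(self):
        """Return every document visible in any segment."""
        segments = self.committed_segments + self.cached_segments
        return [doc for segment in segments for doc in segment]

    def crash_and_recover(self):
        """Lose all uncommitted state, then replay the translog into the buffer."""
        self.buffer.clear()
        self.cached_segments.clear()
        self.buffer.extend(self.translog)


shard = ToyShard()
shard.index("doc-1")
shard.flush()              # doc-1 is now committed
shard.index("doc-2")
shard.refresh()            # doc-2 is searchable but not yet committed
shard.crash_and_recover()  # doc-2 survives only because of the translog
shard.refresh()
print(shard.search())
```

In this model a document is searchable only after a refresh and durable in a segment only after a flush. In between, the translog is the only thing that lets it survive a crash.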