# MongoDB Glossary
## A
### Accumulator
An operator that calculates a value across a group of documents, used mainly in the `$group` stage (and in stages such as `$bucket`). Examples include `$sum`, `$avg`, `$max`, `$min`, `$push`, and `$addToSet`.
### Aggregation Framework
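A pipeline-based data processing framework in MongoDB. Documents flow through a sequence of stages that filter, transform, group, and join them, which makes it suitable for complex analytics. It covers the ground of SQL clauses such as `WHERE`, `GROUP BY`, and `JOIN`, and adds document-oriented operations such as unwinding arrays.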
### Arbiter
A replica set member that does not hold data and cannot become Primary. Its only function is to vote in elections, which lets a replica set reach a voting majority (for example, two data-bearing members plus one Arbiter) without the cost of storing an extra copy of the data.
### Atlas
MongoDB's fully managed cloud database service (DBaaS). It runs on AWS, Azure, and Google Cloud.
## B
### Balancer
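A background process that monitors how sharded data is distributed across shards. If the distribution is uneven, it migrates chunks from shards holding more data to shards holding less. Older versions balanced by the number of chunks per shard; starting in MongoDB 6.0.3, the balancer balances by data size.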
### BSON
**Binary JSON**. The binary-encoded serialization format used to store documents and make remote procedure calls in MongoDB. It supports more data types than JSON (e.g., Date, ObjectId, Binary data).
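To make the format concrete, here is a minimal hand-rolled encoder for flat documents whose values are only strings or 32-bit integers, following the published BSON specification (element type `0x02` for UTF-8 strings, `0x10` for int32, little-endian lengths). It is an illustrative sketch, not a substitute for a driver's BSON library; `encode_bson` is a hypothetical name.

```python
import struct

def _cstring(s: str) -> bytes:
    # Field names are null-terminated UTF-8 "cstrings".
    return s.encode("utf-8") + b"\x00"

def encode_bson(doc: dict) -> bytes:
    """Hypothetical helper: encode a flat dict of str/int values as BSON (sketch)."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, str):
            raw = value.encode("utf-8") + b"\x00"
            # 0x02 = string: int32 byte length (including terminator), then the bytes.
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(raw)) + raw
        elif isinstance(value, int):
            # 0x10 = 32-bit signed integer, little-endian.
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # Document = int32 total length (4 bytes) + elements + trailing 0x00 (1 byte).
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

print(encode_bson({"hello": "world"}))
# b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

The 22-byte result matches the `{"hello": "world"}` example in the BSON specification. The length prefixes are what let a reader skip over fields without parsing them.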
## C
### Cardinality
The number of elements in a set or other grouping, often used to describe the "many" side of a relationship (e.g., One-to-Few vs. One-to-Squillions).
### Chunk
A contiguous range of shard key values. MongoDB partitions sharded data into chunks and distributes them across shards. The default chunk size is 128 MB (64 MB before MongoDB 6.0).
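Because chunks are contiguous, non-overlapping ranges, finding the chunk (and therefore the shard) that owns a shard key value is a sorted-range lookup. The sketch below models a chunk map as a sorted list of inclusive lower bounds; the bounds, shard names, and `owning_shard` helper are hypothetical, and real routing metadata lives on the config servers.

```python
import bisect

# Hypothetical chunk map for a numeric shard key. Each chunk covers
# [lower_bound, next_lower_bound); the first starts at "minKey" (modeled as -inf)
# and the last runs up to "maxKey".
LOWER_BOUNDS = [float("-inf"), 100, 500, 1000]
OWNERS = ["shardA", "shardB", "shardA", "shardC"]

def owning_shard(key: float) -> str:
    """Hypothetical helper: shard whose chunk contains `key` (lower bound inclusive)."""
    # bisect_right returns the index of the first bound greater than key,
    # so the owning chunk is the one just before it.
    idx = bisect.bisect_right(LOWER_BOUNDS, key) - 1
    return OWNERS[idx]

print(owning_shard(42))    # shardA  (chunk [-inf, 100))
print(owning_shard(100))   # shardB  (chunk [100, 500))
print(owning_shard(5000))  # shardC  (chunk [1000, +inf))
```

Note that one shard can own several non-adjacent chunks (`shardA` here), which is exactly what the balancer produces when it migrates chunks around.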
### Collection
A grouping of MongoDB documents. Equivalent to an RDBMS **Table**. Collections exist within a single database and do not enforce a schema by default.
### Compass
The official GUI for MongoDB. It allows you to visually explore data, run queries, and optimize performance.
### Config Server
A `mongod` instance that stores the metadata for a sharded cluster, including which chunks of data live on which shards. Since MongoDB 3.4, config servers must be deployed as a replica set.
### Cursor
A pointer to the result set of a query. The server returns results in batches, and clients iterate through the cursor to retrieve them, fetching further batches as needed.
## D
### Denormalization
The practice of improving read performance by duplicating data or grouping related data together. In MongoDB, this usually means embedding related data in a single document so that reads do not need a join.
### Document
A record in a MongoDB collection and the basic unit of data. Documents are analogous to JSON objects but are stored as BSON. Equivalent to an RDBMS **Row**.
## E
### Election
The process by which members of a replica set determine which node will become the Primary. Elections occur during initialization or when the current Primary becomes unavailable.
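A candidate becomes Primary only with votes from a strict majority of the replica set's voting members, which is why an odd number of voters (possibly including an Arbiter) is recommended. The hypothetical helpers below make the arithmetic explicit; the model deliberately ignores member priorities, election terms, and how up to date each member's oplog is.

```python
def majority(voting_members: int) -> int:
    """Hypothetical helper: smallest vote count that is a strict majority."""
    return voting_members // 2 + 1

def can_elect_primary(voting_members: int, reachable_voters: int) -> bool:
    """Hypothetical helper: can the reachable voters still form a majority? (simplified)"""
    return reachable_voters >= majority(voting_members)

# 3 voters (e.g. 2 data-bearing + 1 arbiter): losing one node still leaves a majority.
print(majority(3), can_elect_primary(3, 2))  # 2 True
# 4 voters also tolerate only one failure, so the 4th voter adds no fault tolerance.
print(majority(4), can_elect_primary(4, 2))  # 3 False
```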
## F
### Field
A key-value pair in a document. A document has zero or more fields. Fields are analogous to columns in a relational database.
## I
### Index
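A data structure that speeds up queries on a collection by storing the values of one or more fields in sorted order, with references to the matching documents, so that a query does not have to scan every document. MongoDB indexes use a B-tree structure.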
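The benefit of an index comes from keeping keys sorted, so a range query can find its bounds by binary search instead of examining every document. The sketch below uses a sorted Python list of `(key, _id)` pairs as a stand-in for B-tree leaf entries; the sample data and the `range_query` helper are hypothetical.

```python
import bisect

# Hypothetical collection keyed by _id; we index the "age" field.
docs = {1: {"age": 34}, 2: {"age": 19}, 3: {"age": 52}, 4: {"age": 27}, 5: {"age": 34}}

# The "index": (field value, _id) pairs in sorted order, like B-tree leaf entries.
age_index = sorted((d["age"], _id) for _id, d in docs.items())

def range_query(index, low, high):
    """Hypothetical helper: _ids with low <= key < high, found by two binary searches."""
    # A 1-tuple (k,) sorts before every (k, _id) pair, so these mark the range edges.
    start = bisect.bisect_left(index, (low,))
    end = bisect.bisect_left(index, (high,))
    return [_id for _, _id in index[start:end]]

print(range_query(age_index, 25, 40))  # [4, 1, 5]
```

Including `_id` in each entry mirrors how duplicate keys (the two 34s) stay distinct and ordered within the index.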
## J
### Journaling
A write-ahead logging mechanism used by the WiredTiger storage engine to ensure durability. WiredTiger writes data files at periodic checkpoints and records writes in the journal in between, so after a crash it recovers by replaying the journal from the last checkpoint.
### JSON Schema
A vocabulary that allows you to annotate and validate JSON documents. MongoDB supports it through the `$jsonSchema` operator to enforce schema validation rules.
## M
### mongod
The core database daemon of MongoDB. It handles data requests, manages data access, and performs background management operations.
### mongos
The query router in a sharded cluster. It acts as the interface between client applications and the sharded cluster. Clients connect to `mongos`, which routes queries to the appropriate shards.
## O
### ObjectId
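A 12-byte BSON type used as the default value for the `_id` field. It consists of a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value generated once per process, and a 3-byte incrementing counter. This layout makes collisions extremely unlikely across distributed systems without central coordination.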
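Because the first four bytes of an ObjectId are a big-endian count of seconds since the Unix epoch, its creation time can be read straight from the hex string. This is a standard-library sketch; drivers provide their own accessor (for example, PyMongo's `ObjectId.generation_time`), and `split_object_id` is a hypothetical name.

```python
from datetime import datetime, timezone

def split_object_id(hex_id: str):
    """Hypothetical helper: split a 24-hex-digit ObjectId into its three parts."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    timestamp = int.from_bytes(raw[0:4], "big")   # seconds since the Unix epoch
    random_part = raw[4:9].hex()                  # 5-byte per-process random value
    counter = int.from_bytes(raw[9:12], "big")    # 3-byte incrementing counter
    created = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return created, random_part, counter

print(split_object_id("507f1f77bcf86cd799439011"))
```

A side effect of the timestamp prefix is that ObjectIds generated later usually sort later, so sorting on `_id` roughly approximates insertion order (to one-second resolution).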
### Oplog (Operations Log)
A special capped collection (`local.oplog.rs`) that keeps a rolling record of all operations that modify the data. Secondaries replicate by tailing the oplog of their sync source (usually the Primary) and applying its entries.
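The two ideas in this entry, a capped log that evicts its oldest entries once full and a Secondary that applies everything after the last entry it has seen, can be modeled in a few lines. This is a toy sketch under simplifying assumptions: entries are `(ts, op, doc)` tuples with op codes borrowed from the real oplog's `i`/`u`/`d` values, updates are treated as full-document replacements, and the `Oplog` and `Secondary` classes are hypothetical.

```python
from collections import deque

class Oplog:
    """Hypothetical toy capped log: once full, each append evicts the oldest entry."""

    def __init__(self, max_entries: int):
        self.entries = deque(maxlen=max_entries)
        self.next_ts = 1  # stand-in for the real oplog timestamp

    def record(self, op: str, doc: dict) -> None:
        self.entries.append((self.next_ts, op, doc))
        self.next_ts += 1


class Secondary:
    """Hypothetical replica that applies oplog entries it has not seen yet."""

    def __init__(self):
        self.data = {}
        self.last_applied = 0

    def sync_from(self, oplog: Oplog) -> None:
        if oplog.entries and oplog.entries[0][0] > self.last_applied + 1:
            # Entries this node still needed were already evicted from the capped log.
            raise RuntimeError("fell off the oplog; a full resync is required")
        for ts, op, doc in oplog.entries:
            if ts <= self.last_applied:
                continue
            if op in ("i", "u"):   # insert, or update modeled as full replacement
                self.data[doc["_id"]] = doc
            elif op == "d":        # delete
                self.data.pop(doc["_id"], None)
            self.last_applied = ts


log = Oplog(max_entries=3)
log.record("i", {"_id": 1, "x": 1})
log.record("u", {"_id": 1, "x": 2})
node = Secondary()
node.sync_from(log)
print(node.data)  # {1: {'_id': 1, 'x': 2}}
```

The `RuntimeError` branch illustrates why oplog size matters: a Secondary that lags further behind than the oplog's window can no longer catch up incrementally.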
## P
### Pipeline
A series of stages that documents pass through in the Aggregation Framework. Each stage transforms the documents as they pass through.
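A pipeline behaves like function composition: each stage consumes the previous stage's output. The `pipeline` list below is written in the list-of-dicts form that drivers such as PyMongo accept for `Collection.aggregate`; the rest is a hypothetical pure-Python model of just those `$match`, `$sort`, and `$limit` forms, so the data flow can be followed without a database.

```python
# A pipeline as a driver would send it (e.g. PyMongo: db.orders.aggregate(pipeline)).
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$sort": {"total": -1}},
    {"$limit": 2},
]

# Hypothetical toy stages covering only the forms used above.
def stage_match(docs, spec):
    return [d for d in docs if all(d.get(k) == v for k, v in spec.items())]

def stage_sort(docs, spec):
    # Sort by keys from last to first; stable sorting gives the first key top priority.
    for key, direction in reversed(list(spec.items())):
        docs = sorted(docs, key=lambda d: d[key], reverse=(direction == -1))
    return docs

def stage_limit(docs, n):
    return docs[:n]

STAGES = {"$match": stage_match, "$sort": stage_sort, "$limit": stage_limit}

def run_pipeline(docs, pipeline):
    for stage in pipeline:
        (name, spec), = stage.items()  # each stage document has exactly one key
        docs = STAGES[name](docs, spec)
    return docs

orders = [
    {"_id": 1, "status": "shipped", "total": 40},
    {"_id": 2, "status": "pending", "total": 99},
    {"_id": 3, "status": "shipped", "total": 75},
    {"_id": 4, "status": "shipped", "total": 12},
]
print([d["_id"] for d in run_pipeline(orders, pipeline)])  # [3, 1]
```

Stage order matters: placing `$match` first shrinks the input for every later stage, which is also why the server tries to run filters as early as possible.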
### Primary
The single member in a replica set that receives all write operations. It records changes in its oplog.
## R
### Read Concern
An option that controls the consistency and isolation properties of the data read from a replica set. Levels include `local`, `available`, `majority`, `linearizable`, and `snapshot`.
### Read Preference
A setting that determines which member of a replica set the driver should read from. Options include `primary` (default), `primaryPreferred`, `secondary`, `secondaryPreferred`, and `nearest`.
### Replica Set
A group of `mongod` processes that maintain the same data set. Replica sets provide redundancy and high availability. A replica set has at most one Primary at a time; the other data-bearing members are Secondaries, and it may also include an Arbiter.
## S
### Schema Validation
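A feature that allows you to define rules for document structure and data types. By default, writes that do not conform are rejected (`validationAction: "error"`); with `validationAction: "warn"`, they are allowed but logged.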
### Secondary
A member of a replica set that replicates the data from the Primary. Secondaries can serve read operations (if configured) but cannot accept writes.
### Shard
A single replica set that holds a subset of the data in a sharded cluster.
### Shard Key
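The field or fields used to partition a collection's documents across shards. The choice is critical for performance and scalability: a good shard key has high cardinality, spreads writes evenly, and lets `mongos` route common queries to a single shard instead of broadcasting them to all shards.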
### Sharding
The practice of distributing a collection's documents across multiple machines (shards). It is MongoDB's approach to horizontal scaling: capacity grows by adding shards rather than by upgrading a single server.
### Stage
A single operation in an aggregation pipeline, such as `$match`, `$group`, or `$project`. Stages process documents and pass the results to the next stage.
## V
### View
A read-only queryable object defined by an aggregation pipeline on a source collection or another view. Standard views are computed when they are read and are not stored on disk.
## W
### WiredTiger
The default storage engine for MongoDB (since 3.2). It provides document-level concurrency, compression, and data integrity (via journaling).
### Write Amplification
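The phenomenon where a small logical update results in a much larger physical write to disk. In MongoDB, updating one field of a large document can write far more data than the field itself, because WiredTiger does not patch documents in place on disk and instead writes out modified data as whole pages. Under the legacy MMAPv1 engine, documents that outgrew their allocated space also had to be moved.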
### Write Concern
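The level of acknowledgment requested from MongoDB for write operations. For example, `w: 1` acknowledges once the Primary has applied the write, while `w: "majority"` acknowledges once the write is durably committed on a majority of the voting, data-bearing members. Starting in MongoDB 5.0, the implicit default is `"majority"` for most deployments.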
## Z
### Zone Sharding
A feature that allows administrators to associate ranges of shard key values with specific shards (zones). This is often used for geographic data distribution (e.g., keeping EU data on EU servers).
## Operators
### $bucket
Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries.
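The boundary rules are easy to get wrong: each bucket includes its lower boundary and excludes its upper one, a bucket's `_id` is its lower boundary, only non-empty buckets are output, and values outside every range go to the `default` bucket (without a `default`, they cause an error). The pure-Python model below mimics that behavior for the default `count` output; `bucket_counts` and the sample prices are hypothetical.

```python
import bisect

# Modeled stage: {"$bucket": {"groupBy": "$price", "boundaries": [0, 10, 100], "default": "Other"}}
def bucket_counts(values, boundaries, default=None):
    """Hypothetical model of $bucket with its default {count: {$sum: 1}} output."""
    counts = {}
    for v in values:
        if boundaries[0] <= v < boundaries[-1]:
            # Lower bound inclusive, upper exclusive: take the last boundary <= v.
            bucket_id = boundaries[bisect.bisect_right(boundaries, v) - 1]
        elif default is not None:
            bucket_id = default
        else:
            raise ValueError(f"{v!r} falls outside the boundaries and no default is set")
        counts[bucket_id] = counts.get(bucket_id, 0) + 1
    # Only non-empty buckets appear, in boundary order, with the default bucket last.
    order = [b for b in boundaries if b in counts]
    if default is not None and default in counts:
        order.append(default)
    return {b: counts[b] for b in order}

print(bucket_counts([5, 15, 20, 120, 8], [0, 10, 100], default="Other"))
# {0: 2, 10: 2, 'Other': 1}
```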
### $facet
Processes multiple aggregation pipelines within a single stage on the same set of input documents.
### $group
Groups input documents by a specified identifier expression and applies the accumulator expression(s), if specified, to each group.
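Conceptually, `$group` collects documents by the value of its `_id` expression and then folds each group with its accumulators. The hypothetical model below reproduces `$sum`, `$avg`, and `$push` for a single grouping field; it illustrates the semantics only, not how the server implements them.

```python
from collections import defaultdict

# Modeled stage:
# {"$group": {"_id": "$customer",
#             "total": {"$sum": "$amount"},
#             "avgAmount": {"$avg": "$amount"},
#             "amounts": {"$push": "$amount"}}}
def group_by_customer(docs):
    """Hypothetical model of the $group stage above."""
    groups = defaultdict(list)
    for d in docs:
        groups[d["customer"]].append(d["amount"])  # group key -> values in input order
    return [
        {"_id": key,
         "total": sum(vals),                  # $sum
         "avgAmount": sum(vals) / len(vals),  # $avg
         "amounts": vals}                     # $push keeps duplicates and order
        for key, vals in groups.items()
    ]

sales = [
    {"customer": "ann", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "ann", "amount": 30},
]
print(group_by_customer(sales))
# [{'_id': 'ann', 'total': 40, 'avgAmount': 20.0, 'amounts': [10, 30]},
#  {'_id': 'bob', 'total': 5, 'avgAmount': 5.0, 'amounts': [5]}]
```

This model returns groups in first-seen order, but MongoDB does not guarantee the order of `$group` output; add a `$sort` stage when order matters.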
### $limit
Limits the number of documents passed to the next stage in the pipeline.
### $lookup
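Performs a left outer join with another collection in the same database. For each input document, it adds an array field containing the matching documents from the "joined" collection (an empty array if none match). Starting in MongoDB 5.1, the joined (`from`) collection can be sharded.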
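The defining property of a left outer join is that every input document survives: documents with no match get an empty array instead of being dropped. The model below mimics the basic equality form of `$lookup` (`from`, `localField`, `foreignField`, `as`); the `lookup` function and sample collections are hypothetical, and array-valued join fields are not modeled.

```python
def lookup(docs, from_docs, local_field, foreign_field, as_field):
    """Hypothetical model of the equality-match form of $lookup (left outer join)."""
    out = []
    for d in docs:
        matches = [f for f in from_docs if f.get(foreign_field) == d.get(local_field)]
        out.append({**d, as_field: matches})  # unmatched documents get []
    return out

# Modeled stage:
# {"$lookup": {"from": "customers", "localField": "customerId",
#              "foreignField": "_id", "as": "customer"}}
orders = [{"_id": 1, "customerId": "c1"}, {"_id": 2, "customerId": "c9"}]
customers = [{"_id": "c1", "name": "Ann"}]
for row in lookup(orders, customers, "customerId", "_id", "customer"):
    print(row)
# {'_id': 1, 'customerId': 'c1', 'customer': [{'_id': 'c1', 'name': 'Ann'}]}
# {'_id': 2, 'customerId': 'c9', 'customer': []}
```

Because the result is always an array, `$lookup` is often followed by `$unwind` when exactly one match is expected.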
### $match
Filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage.
### $project
Passes along the documents with the requested fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
### $sort
Reorders the document stream by a specified sort key.
### $unwind
Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.
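With default options, `$unwind` emits one document per array element and nothing at all for documents whose field is missing, `null`, or an empty array; a non-array value is treated as a one-element array. The hypothetical model below covers the simple form `{"$unwind": "$sizes"}` only, without `preserveNullAndEmptyArrays` or `includeArrayIndex`.

```python
def unwind(docs, field):
    """Hypothetical model of {"$unwind": "$<field>"} with default options."""
    out = []
    for d in docs:
        value = d.get(field)
        if value is None or value == []:
            continue  # missing, null, or empty array: no output by default
        elements = value if isinstance(value, list) else [value]
        for element in elements:
            out.append({**d, field: element})
    return out

items = [
    {"_id": 1, "sizes": ["S", "M"]},
    {"_id": 2, "sizes": []},
    {"_id": 3, "sizes": "L"},
    {"_id": 4},
]
print(unwind(items, "sizes"))
# [{'_id': 1, 'sizes': 'S'}, {'_id': 1, 'sizes': 'M'}, {'_id': 3, 'sizes': 'L'}]
```

Documents `2` and `4` silently disappear here; setting `preserveNullAndEmptyArrays: true` on the real stage keeps them.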