Node Operations: Lifecycle Management

Managing a Cassandra cluster involves more than just reading and writing data. You must be able to scale the cluster out (add nodes), scale it in (remove nodes), and maintain data consistency (repair). This chapter dives deep into the lifecycle of a Cassandra node and the tools you need to manage it.

1. The Node Lifecycle

A Cassandra node moves through several states during its life — JOINING, NORMAL, LEAVING, DECOMMISSIONED — and understanding these states is crucial for operational stability.

2. Adding a Node (Bootstrap)

Adding a new node to a cluster is known as bootstrapping. This process ensures the new node receives the data it is responsible for based on its token assignment.

The Bootstrap Process: First Principles

Why is bootstrapping necessary? In a distributed hash ring, each node owns a specific range of tokens. When a new node joins, it splits a range, taking ownership of half (or a specific portion). The data for this new range currently resides on existing nodes.

  1. Configuration: The new node is configured with cluster_name, seeds (must NOT include itself), and auto_bootstrap: true.
  2. Gossip Handshake: The node starts and gossips with seed nodes to learn the cluster topology (schema version, peer list).
  3. Token Selection: It selects a token (or is assigned one via initial_token) to balance the ring.
  4. Streaming (The Heavy Lift):
    • The new node calculates which ranges it is now responsible for.
    • It contacts the current owners of those ranges.
    • Existing nodes stream the relevant SSTables to the new node.
  5. State Transition: JOINING → UP. Once streaming completes, the node reports as UN (Up/Normal) in nodetool status.
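The range-splitting step above can be sketched with a toy ring. This is a simplified model (one token per node, no vnodes, no replication) chosen for illustration, not Cassandra's actual partitioner:

```java
import java.util.Map;
import java.util.TreeMap;

public class TokenRing {
    // token -> node name; each node owns the range (previous token, its token]
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String name, long token) {
        ring.put(token, name);
    }

    // The owner of a token is the node with the first ring position at or
    // after it, wrapping around to the lowest token.
    public String ownerOf(long token) {
        Map.Entry<Long, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode("A", 0);
        ring.addNode("B", 100);
        ring.addNode("C", 200);
        System.out.println("token 150 owned by " + ring.ownerOf(150)); // C

        // Bootstrap: node D picks token 150, splitting C's range.
        // Keys in (100, 150] must now stream from C to D.
        ring.addNode("D", 150);
        System.out.println("now token 150 owned by " + ring.ownerOf(150)); // D
        System.out.println("token 180 still owned by " + ring.ownerOf(180)); // C
    }
}
```

Note how only C is affected by D's arrival: every other node's range is untouched, which is why bootstrap streams from a small set of current owners rather than the whole cluster.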

[!WARNING] Do not make a new node a seed node until it has fully bootstrapped. Seed nodes are gossip contact points; if a seed is bootstrapping, it cannot effectively help others join.

3. Removing a Node (Decommission)

Removing a node cleanly is done via decommissioning.

Decommission vs. Removenode

  • Decommission (nodetool decommission): The polite way to leave. The node is still online and streams its data to the new owners of its token ranges (its neighbors). Use this for planned scale-in.
  • Removenode (nodetool removenode): The aggressive way. Use this only when a node is dead (hardware failure) and cannot be brought back. It tells the cluster to forget the node and replicate its data elsewhere from existing replicas.

4. Repair (Anti-Entropy)

Cassandra is an AP system (it prioritizes Availability and Partition tolerance), so replicas can drift out of sync (e.g., mutations dropped while a node is down, network partitions). Repair is the anti-entropy mechanism that brings replicas back into agreement.

How Repair Works: Merkle Trees

Repair doesn’t blindly copy all data between replicas; that would mean transferring O(N) data every time. Instead, it uses Merkle Trees to find only the differences.

  1. Tree Construction: Each node builds a hash tree (Merkle Tree) of its data for a specific token range. Leaves are hashes of data blocks.
  2. Tree Exchange: Nodes exchange these small trees (kilobytes in size, not gigabytes).
  3. Comparison: The trees are compared. If the root hash matches, the data is identical. If not, they traverse down to find the specific leaf that differs.
  4. Streaming: Only the data corresponding to the mismatched leaves is streamed.
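The comparison step can be illustrated with a toy binary hash tree. This is a deliberately simplified sketch (fixed blocks, Java's built-in hashing); real repair builds per-range trees of configurable depth with cryptographic-strength hashes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class MerkleCompare {
    // Build a binary hash tree bottom-up in an array; index 1 is the root,
    // leaves occupy indices n..2n-1. Assumes n is a power of two.
    static int[] buildTree(String[] blocks) {
        int n = blocks.length;
        int[] tree = new int[2 * n];
        for (int i = 0; i < n; i++) tree[n + i] = Objects.hashCode(blocks[i]);
        for (int i = n - 1; i >= 1; i--) tree[i] = 31 * tree[2 * i] + tree[2 * i + 1];
        return tree;
    }

    // Walk both trees from the root; descend only where hashes differ,
    // collecting the indices of mismatched leaf blocks.
    static List<Integer> diffLeaves(int[] a, int[] b, int node, int n, List<Integer> out) {
        if (a[node] == b[node]) return out;               // subtree identical: skip it
        if (node >= n) { out.add(node - n); return out; } // mismatched leaf found
        diffLeaves(a, b, 2 * node, n, out);
        diffLeaves(a, b, 2 * node + 1, n, out);
        return out;
    }

    public static void main(String[] args) {
        String[] replica1 = {"k1=v1", "k2=v2", "k3=v3", "k4=v4"};
        String[] replica2 = {"k1=v1", "k2=STALE", "k3=v3", "k4=v4"};
        int[] t1 = buildTree(replica1), t2 = buildTree(replica2);
        List<Integer> diffs = diffLeaves(t1, t2, 1, replica1.length, new ArrayList<>());
        // Only the mismatched block needs to be streamed, not the whole dataset.
        System.out.println("Blocks to stream: " + diffs);
    }
}
```

Matching subtrees are pruned at the first equal hash, which is what keeps the exchange small: identical replicas compare exactly one pair of root hashes.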

Full vs. Incremental Repair

  • Full Repair: Checks all data. Expensive and IO-intensive.
  • Incremental Repair: Only repairs data written since the last repair. Much faster but requires careful management of SSTable marking.
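The incremental strategy amounts to filtering by a repaired marker before building any trees. A minimal sketch — the `SSTable` type and `candidatesForRepair` helper are illustrative stand-ins; real SSTables carry a `repairedAt` timestamp in their metadata:

```java
import java.util.List;
import java.util.stream.Collectors;

public class IncrementalRepairSketch {
    // Illustrative stand-in for an SSTable; real ones record a
    // repairedAt timestamp in their stats metadata.
    record SSTable(String name, boolean repaired) {}

    // Incremental repair only considers unrepaired SSTables;
    // already-repaired data is skipped entirely.
    static List<SSTable> candidatesForRepair(List<SSTable> tables) {
        return tables.stream().filter(t -> !t.repaired()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SSTable> tables = List.of(
            new SSTable("sstable-1", true),
            new SSTable("sstable-2", true),
            new SSTable("sstable-3", false)); // written since the last repair
        System.out.println("To repair: " + candidatesForRepair(tables));
    }
}
```

The "careful management" mentioned above is exactly this marking: once an SSTable is flagged repaired, it must stay consistent with its replicas, which constrains how it can later be compacted.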

5. Operational Tools (Code Examples)

While nodetool is the standard CLI, you can automate these operations using JMX (Java Management Extensions) or sidecar wrappers.

For example, checking a node's state and triggering operations directly over JMX from Java:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class NodeManager {
    // Default Cassandra JMX port is 7199
    private static final String JMX_URL = "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi";
    private static final String SS_MBEAN = "org.apache.cassandra.db:type=StorageService";

    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(JMX_URL);
        // Connect to JMX
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            ObjectName ssName = new ObjectName(SS_MBEAN);

            // 1. Check Operation Mode (NORMAL, JOINING, LEAVING, DECOMMISSIONED)
            String operationMode = (String) mbsc.getAttribute(ssName, "OperationMode");
            System.out.println("Current Mode: " + operationMode);

            // 2. Trigger Decommission (Blocking Operation)
            // This is equivalent to `nodetool decommission`
            // System.out.println("Decommissioning node...");
            // mbsc.invoke(ssName, "decommission", null, null);

            // 3. Force Keyspace Cleanup
            // Equivalent to `nodetool cleanup my_keyspace`. The exact MBean
            // signature varies by Cassandra version, so check the
            // StorageServiceMBean interface for your release before invoking.
            // mbsc.invoke(ssName, "forceKeyspaceCleanup",
            //     new Object[]{2, "my_keyspace", new String[0]},
            //     new String[]{"int", "java.lang.String", "[Ljava.lang.String;"}
            // );
        }
    }
}
```
From Go, the same status check goes through an HTTP-to-JMX bridge:

```go
package main

import (
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "strings"
)

// Go does not support JMX natively. To manage Cassandra from Go, a common
// approach is to expose JMX over HTTP with a sidecar agent such as Jolokia.
// This example assumes Jolokia is running on port 8778.

const jolokiaURL = "http://localhost:8778/jolokia"

func getNodeStatus() {
    // Request the StorageService OperationMode attribute via HTTP POST.
    reqBody := `{"type":"read","mbean":"org.apache.cassandra.db:type=StorageService","attribute":"OperationMode"}`

    resp, err := http.Post(jolokiaURL, "application/json", strings.NewReader(reqBody))
    if err != nil {
        // Handle errors properly in production code.
        panic(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }

    // Parse the value out of the Jolokia response envelope.
    var result map[string]interface{}
    if err := json.Unmarshal(body, &result); err != nil {
        panic(err)
    }
    if val, ok := result["value"]; ok {
        fmt.Printf("Node Operation Mode: %v\n", val)
    } else {
        fmt.Println("Could not retrieve status")
    }
}

func main() {
    // Ensure the Jolokia agent is installed in cassandra-env.sh:
    // JVM_OPTS="$JVM_OPTS -javaagent:/path/to/jolokia-jvm-agent.jar=port=8778,host=localhost"
    getNodeStatus()
}
```

6. Cleanup and Compaction

After adding a node, existing nodes still hold data that now belongs to the new node. Running nodetool cleanup removes these keys, reclaiming disk space.

  • Cleanup: Removes data that the node no longer owns (crucial after scaling out).
  • Compaction: Merges SSTables, evicting tombstones and improving read performance.

[!TIP] Always run nodetool cleanup on existing nodes after a new node has successfully joined and the cluster is stable. This frees up disk space.
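Conceptually, cleanup is a filter over locally held keys against the current ring. Reusing the toy ring model from the bootstrap section (again a simplification with one token per node and no replication):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CleanupSketch {
    // Toy ring as in the bootstrap example: token -> node name.
    static String ownerOf(TreeMap<Long, String> ring, long token) {
        Map.Entry<Long, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Cleanup on node `self`: keep only the tokens this node still owns.
    static List<Long> cleanup(TreeMap<Long, String> ring, String self, List<Long> held) {
        return held.stream()
                   .filter(t -> ownerOf(ring, t).equals(self))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(0L, "A");
        ring.put(100L, "B");
        ring.put(200L, "C");
        ring.put(150L, "D"); // D bootstrapped, taking (100, 150] from C

        // C still holds stale copies of keys it no longer owns
        // until cleanup runs and drops them.
        List<Long> heldByC = List.of(120L, 180L);
        System.out.println("C keeps after cleanup: " + cleanup(ring, "C", heldByC));
    }
}
```

In the sketch, C drops token 120 (now owned by D) and keeps 180; the real operation does this by rewriting each SSTable without the out-of-range partitions, which is why cleanup is disk-I/O heavy.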

7. Diagram: Data Streaming

Figure 1: Node A (the current owner) streams SSTables to the new Node B during bootstrap, while B is in the JOINING state.