Hinted Handoff
In a distributed system, nodes fail. It’s not a matter of if, but when. Network glitches, GC pauses, or hardware restarts happen all the time.
If a node is down when a write comes in, does the write fail? In Cassandra, no, provided the remaining replicas can still satisfy your Consistency Level.
Cassandra uses a mechanism called Hinted Handoff to store the write temporarily and deliver it when the node comes back.
1. How it Works
- Write Request: The client sends a write to a Coordinator Node.
- Failure Detection: The coordinator tries to forward the write to all replicas (e.g., Nodes A, B, and C).
- The “Hint”: If Node C is down (or unresponsive), the coordinator writes a “hint” locally.
- Success Response: If the coordinator can still satisfy the requested Consistency Level (e.g., `QUORUM` is met by Nodes A and B), it returns SUCCESS to the client. The client has no idea Node C missed the write.
- Handoff: When the coordinator detects (via Gossip) that Node C is back up, it “hands off” (replays) the stored hints to Node C.
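The flow above can be sketched as a toy simulation in plain Java (no driver involved; the node names, the RF=3 cluster, and the hard-coded quorum of 2 are all illustrative assumptions):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HandoffFlow {
    // Replica -> up/down status (illustrative three-replica cluster)
    static Map<String, Boolean> replicas = new LinkedHashMap<>();
    // Hints stored locally on the coordinator, keyed by target node
    static Map<String, List<String>> hints = new LinkedHashMap<>();

    /** Returns true if the write meets QUORUM (2 of 3 replicas ack). */
    static boolean write(String mutation) {
        int acks = 0;
        for (Map.Entry<String, Boolean> e : replicas.entrySet()) {
            if (e.getValue()) {
                acks++; // replica is up: it acknowledges the write
            } else {
                // replica is down: the coordinator stores a hint instead
                hints.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(mutation);
            }
        }
        return acks >= 2; // QUORUM for RF=3
    }

    /** Called when gossip reports the node is back: replay and clear its hints. */
    static List<String> handoff(String node) {
        replicas.put(node, true);
        return hints.remove(node);
    }

    public static void main(String[] args) {
        replicas.put("A", true);
        replicas.put("B", true);
        replicas.put("C", false); // Node C is down

        // QUORUM is met by A and B, so the client sees success
        System.out.println(write("INSERT ...") ? "SUCCESS" : "FAIL");
        // Node C comes back: its hints are replayed and removed
        System.out.println(handoff("C"));
    }
}
```

Note that the client never learns that Node C missed the write; only the coordinator's local `hints` map records the debt.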
What is a “Hint”?
A hint is not just a log entry. It is a specialized record stored in the hints directory on the coordinator. It contains:
- Target Node ID: The UUID of the node that missed the data.
- Hint ID: Unique identifier for the hint.
- Message Time: The timestamp of the original mutation.
- Mutation: The actual data (INSERT/UPDATE/DELETE) serialized as a blob.
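The four fields above can be modeled as a small Java record (a simplified illustration, not Cassandra's actual on-disk hint format; the serialized mutation is reduced to a plain byte array):

```java
import java.util.UUID;

/** Simplified model of a hint record (illustrative, not Cassandra's real format). */
public record Hint(UUID targetNodeId,   // the node that missed the data
                   UUID hintId,         // unique identifier for the hint
                   long messageTimeMs,  // timestamp of the original mutation
                   byte[] mutation) {   // the INSERT/UPDATE/DELETE, serialized as a blob

    public static void main(String[] args) {
        Hint h = new Hint(UUID.randomUUID(), UUID.randomUUID(),
                          System.currentTimeMillis(),
                          "INSERT INTO users ...".getBytes());
        System.out.println("Hint for node " + h.targetNodeId());
    }
}
```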
[!NOTE] Hinted Handoff is an optimization for availability. It helps the cluster recover consistency faster after a temporary outage. It is NOT a replacement for permanent repairs (like Anti-Entropy Repair).
2. Interactive Simulator: The Handoff
Visualize how a hint is stored and replayed.
3. Constraints and Limits
Hinted Handoff is awesome, but it’s not magic.
- Time Limit: Hints are not stored forever. By default, hints are stored for 3 hours (`max_hint_window_in_ms` in `cassandra.yaml`). If a node is down longer than this, the hints are discarded to save space. You must run a full repair (Anti-Entropy) when the node returns.
- Space Limit: Hints are stored on the coordinator’s disk. If the coordinator runs out of disk space, it stops saving hints.
- Performance Impact: Replaying hints puts extra load on the coordinator.
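The time limit boils down to a simple predicate. A minimal sketch in Java, assuming the default 3-hour window from `cassandra.yaml` (the method and variable names here are illustrative, not Cassandra internals):

```java
public class HintWindow {
    // Default max_hint_window_in_ms: 3 hours
    static final long MAX_HINT_WINDOW_MS = 10_800_000L;

    /**
     * Hints are only worth storing/replaying while the node has been
     * down for less than the configured window.
     */
    static boolean withinWindow(long nodeDownSinceMs, long nowMs) {
        return nowMs - nodeDownSinceMs < MAX_HINT_WINDOW_MS;
    }

    public static void main(String[] args) {
        long downSince = 0L;
        System.out.println(withinWindow(downSince, 3_600_000L));  // down 1 hour
        System.out.println(withinWindow(downSince, 14_400_000L)); // down 4 hours
    }
}
```

Once a node has been down past the window, hints stop accumulating, which is exactly why a full repair becomes mandatory after a long outage.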
4. Configuration & Code
While Hinted Handoff is a server-side feature, understanding its configuration is vital for operations.
cassandra.yaml Configuration
```yaml
# Enable or disable hinted handoff
hinted_handoff_enabled: true

# Max time to store hints (default 3 hours)
max_hint_window_in_ms: 10800000

# Directory where hints are stored
hints_directory: /var/lib/cassandra/hints
```
Client-Side: Specifying Consistency
Wait, if hinted handoff makes writes “durable”, does ConsistencyLevel.ANY mean my data is safe?
NO!
- `ConsistencyLevel.ANY`: Allows a write to succeed even if all replicas are down, as long as the coordinator can store a hint.
- Danger: If the coordinator crashes before replaying the hint, DATA IS LOST.
- Recommendation: Never use `ANY` for important data. Use at least `ONE` or `QUORUM`.
Java Example: Custom Retry Policy
If a node is down and hinted handoff is triggered, the client usually sees a success. But if too many nodes are down (cannot meet CL), the client gets an UnavailableException. You can handle this with a Retry Policy.
Below is a complete implementation of a custom Retry Policy that decides, per failure type, whether to retry or rethrow.
```java
import com.datastax.oss.driver.api.core.ConsistencyLevel;
import com.datastax.oss.driver.api.core.retry.RetryDecision;
import com.datastax.oss.driver.api.core.retry.RetryPolicy;
import com.datastax.oss.driver.api.core.servererrors.CoordinatorException;
import com.datastax.oss.driver.api.core.servererrors.WriteType;
import com.datastax.oss.driver.api.core.session.Request;
import edu.umd.cs.findbugs.annotations.NonNull;

public class MyCustomRetryPolicy implements RetryPolicy {

    @Override
    public RetryDecision onWriteTimeout(
            @NonNull Request request,
            @NonNull ConsistencyLevel cl,
            @NonNull WriteType writeType,
            int blockFor,
            int received,
            int retryCount) {
        // If we received enough acks but some were timeouts (Hinted Handoff might
        // handle it), we might want to ignore or return success based on business logic.
        // Example: if we are writing a log (CL=ONE) and it timed out, try once more.
        if (writeType == WriteType.SIMPLE && retryCount < 1) {
            return RetryDecision.RETRY_SAME;
        }
        // Default: re-throw the exception if we didn't meet the consistency level.
        return RetryDecision.RETHROW;
    }

    @Override
    public RetryDecision onReadTimeout(
            @NonNull Request request,
            @NonNull ConsistencyLevel cl,
            int blockFor,
            int received,
            boolean dataPresent,
            int retryCount) {
        // Retry a read timeout once on the same node, then give up.
        if (retryCount < 1) {
            return RetryDecision.RETRY_SAME;
        }
        return RetryDecision.RETHROW;
    }

    @Override
    public RetryDecision onUnavailable(
            @NonNull Request request,
            @NonNull ConsistencyLevel cl,
            int required,
            int alive,
            int retryCount) {
        // If not enough nodes are alive, fail fast.
        return RetryDecision.RETHROW;
    }

    @Override
    public RetryDecision onRequestAborted(
            @NonNull Request request,
            @NonNull Throwable error,
            int retryCount) {
        return RetryDecision.RETHROW;
    }

    @Override
    public RetryDecision onErrorResponse(
            @NonNull Request request,
            @NonNull CoordinatorException error,
            int retryCount) {
        return RetryDecision.RETHROW;
    }

    @Override
    public void close() {
        // No resources to close
    }
}
```
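To make the driver actually use this policy, it is registered in the driver's `application.conf` (DataStax Java driver 4.x configuration; use the fully qualified class name if the class lives in a package):

```
datastax-java-driver {
  advanced.retry-policy {
    class = MyCustomRetryPolicy
  }
}
```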
Go Example: Handling Unavailable
In Go, you check for specific errors.
```go
package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Consistency = gocql.Quorum
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal("Cannot connect:", err)
	}
	defer session.Close()

	// Attempt a write
	err = session.Query("INSERT INTO logs (id, message) VALUES (?, ?)",
		gocql.TimeUUID(), "User logged in").Exec()

	if err != nil {
		// gocql surfaces server-side failures as typed request errors,
		// so inspect the concrete type rather than comparing sentinels.
		switch e := err.(type) {
		case *gocql.RequestErrUnavailable:
			log.Printf("Cluster cannot meet Consistency Level! Need %d replicas, only %d alive.",
				e.Required, e.Alive)
			// Logic: maybe retry with lower consistency if acceptable?
		case *gocql.RequestErrWriteTimeout:
			log.Println("Write timed out. Hinted Handoff may have stored it, or it failed.")
		default:
			log.Println("Other error:", err)
		}
	} else {
		log.Println("Write success!")
	}
}
```
5. Summary
- Hinted Handoff masks temporary node failures.
- The Coordinator stores writes (hints) for down nodes.
- Hints are replayed when the node recovers.
- Limitations: Hints expire (default 3 hours). They are not a replacement for backups or repair.
- Caution: `CL.ANY` relies entirely on hints and is risky.