Recovering from a single disk failure using JBOD
Steps for recovering from a single disk failure in a disk array using JBOD (just a bunch of disks).
Cassandra might not fail from the loss of one disk in a JBOD array, but some reads
and writes may fail when:
- The operation's consistency level is ALL.
- The data being requested or written is stored on the defective disk.
- The data to be compacted is on the defective disk.
You may be able to simply replace the disk, restart Cassandra, and run nodetool repair. However, if the disk crash corrupted the Cassandra system table, you must remove the incomplete data from the other disks in the array. The procedure for doing this depends on whether the cluster uses vnodes or a single-token architecture.
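The simple recovery path described above can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not a definitive procedure: it assumes a package installation where Cassandra runs as a service named cassandra, that nodetool is on the PATH, and that the defective disk has already been replaced. The helper names run, wait_for_node, and recover_simple are hypothetical, as is the DRY_RUN variable, which prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: restart Cassandra and repair the node after a disk swap.
# Assumptions (not from the procedure itself): the service is named
# "cassandra" and nodetool is on the PATH.

# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# wait_for_node: poll until nodetool can reach the node, giving up
# after 60 attempts (about five minutes). Skipped in dry-run mode.
wait_for_node() {
    if [ "${DRY_RUN:-0}" = "1" ]; then return 0; fi
    attempts=0
    until nodetool info >/dev/null 2>&1; do
        attempts=$((attempts + 1))
        if [ "$attempts" -ge 60 ]; then return 1; fi
        sleep 5
    done
}

# recover_simple: start Cassandra, wait for it, then repair the node.
recover_simple() {
    run sudo service cassandra start || return 1
    wait_for_node || return 1
    run nodetool repair
}

# Example usage (review the commands first, then run for real):
#   DRY_RUN=1; recover_simple
#   DRY_RUN=0; recover_simple
```

Running with DRY_RUN=1 first prints the two commands (sudo service cassandra start, then nodetool repair) so they can be reviewed before anything is executed. If the repair fails, the system table may be affected and the longer procedure applies.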
Procedure
These steps are supported for Cassandra versions 3.2 and later. If a disk
fails on a node in a cluster using an earlier version of Cassandra, replace the node.
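Because these steps apply only to Cassandra 3.2 and later, it can help to check the version before starting. The following is a minimal sketch, assuming that nodetool version prints a line of the form ReleaseVersion: x.y.z and that the command is run against a node where Cassandra is still up. The function names parse_release_version and version_at_least are hypothetical helpers, not part of Cassandra.

```shell
#!/bin/sh
# parse_release_version: read `nodetool version` output on stdin and print
# only the version string (assumes a "ReleaseVersion: x.y.z" line).
parse_release_version() {
    sed -n 's/^ReleaseVersion:[[:space:]]*//p'
}

# version_at_least VERSION MIN_MAJOR MIN_MINOR
# Succeeds (exit status 0) when VERSION is MIN_MAJOR.MIN_MINOR or newer.
version_at_least() {
    va_major=${1%%.*}
    case $1 in
        *.*) va_minor=${1#*.}; va_minor=${va_minor%%[!0-9]*} ;;
        *)   va_minor=0 ;;
    esac
    # Reject anything that is not a plain number (for example, empty input).
    case $va_major in ''|*[!0-9]*) return 2 ;; esac
    case $va_minor in ''|*[!0-9]*) return 2 ;; esac
    if [ "$va_major" -gt "$2" ]; then return 0; fi
    [ "$va_major" -eq "$2" ] && [ "$va_minor" -ge "$3" ]
}

# Example usage on a running node:
#   version=$(nodetool version | parse_release_version)
#   if version_at_least "$version" 3 2; then
#       echo "JBOD recovery steps apply"
#   else
#       echo "replace the node instead"
#   fi
```

Only the major and minor components are compared, since the cutoff in the text is 3.2; patch levels and suffixes such as -beta1 are ignored.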
1. Verify that the node has a defective disk and identify the disk.
2. If the node is still running, stop Cassandra and shut down the node.
3. Replace the defective disk and restart the node.
4. If the node cannot restart:
   - Try restarting Cassandra without bootstrapping the node:
     Package installations:
     - Add the following option to the cassandra-env.sh file:
       JVM_OPTS="$JVM_OPTS -Dcassandra.allow_unsafe_replace=true"
     - Start the node.
     - After the node bootstraps, remove the -Dcassandra.allow_unsafe_replace=true parameter from cassandra-env.sh.
     - Restart the node.
     Tarball installations:
     - Start Cassandra with this option:
       $ sudo bin/cassandra -Dcassandra.allow_unsafe_replace=true
5. If Cassandra restarts, run nodetool repair on the node. If not, replace the node.
6. If the repair succeeds, the node is restored to production. Otherwise, go to step 7 or 8.
7. For a cluster using vnodes:
8. For a cluster using single-token nodes: