Recovering from a single disk failure using JBOD
Steps for recovering from a single disk failure in a disk array using JBOD (just a bunch of disks).
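A node might keep running after losing one disk in a JBOD array, but some operations can fail when any of the following is true:
- The operation's consistency level is ALL.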
- The data being requested or written is stored on the defective disk.
- The data to be compacted is on the defective disk.
You may be able to simply replace the disk, restart DSE, and run nodetool repair. However, if the disk crash corrupted the system tables, you must remove the incomplete data from the other disks in the array. The procedure for doing this depends on whether the cluster uses vnodes or a single-token architecture.
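The simple path described above (replace the disk, restart DSE, run nodetool repair) might be scripted roughly as follows. This is a minimal sketch, assuming a package installation where DSE runs as the dse service. The run helper, the recover_simple function, and the DRY_RUN variable are hypothetical names introduced here for illustration, and by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch only: assumes a package installation where DSE runs as the "dse" service.
# DRY_RUN, run, and recover_simple are illustrative names, not part of DSE.

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Call this only after the defective disk has been physically replaced.
recover_simple() {
    run sudo service dse start || return 1   # restart DSE on the node
    run nodetool repair                      # resynchronize the node's data
}
```

Calling recover_simple with DRY_RUN=0 would execute the two commands. In practice you would likely wait until the node reports as up (for example, by checking nodetool status) before starting the repair.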
cassandra-env.sh
The location of the cassandra-env.sh file depends on the type of installation:
- Package installations: /etc/dse/cassandra/cassandra-env.sh
- Tarball installations: installation_location/resources/cassandra/conf/cassandra-env.sh
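The procedure later in this topic temporarily adds a JVM option to this file and removes it again after the node bootstraps. Those two edits could be sketched as below, assuming the option sits on a line of its own. The helper names add_unsafe_replace and remove_unsafe_replace are hypothetical, and the path to cassandra-env.sh is passed in as an argument.

```shell
#!/bin/sh
# Sketch only: the helper names are illustrative, not part of DSE.
# The single quotes keep $JVM_OPTS literal so it is expanded by cassandra-env.sh.
OPT_LINE='JVM_OPTS="$JVM_OPTS -Dcassandra.allow_unsafe_replace=true"'

# Append the option to the given cassandra-env.sh unless it is already present.
add_unsafe_replace() {
    grep -qF -- "$OPT_LINE" "$1" || printf '%s\n' "$OPT_LINE" >> "$1"
}

# Remove the option line again once the node has bootstrapped.
remove_unsafe_replace() {
    tmp=$(mktemp) || return 1
    # grep exits with 1 when no lines remain; only a higher status is an error.
    grep -vF -- "$OPT_LINE" "$1" > "$tmp" || [ $? -eq 1 ] || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$1"   # overwrite in place so the file keeps its permissions
    rm -f "$tmp"
}
```

For a package installation, the argument would be /etc/dse/cassandra/cassandra-env.sh, and the script would need write access to that file.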
cassandra.yaml
The location of the cassandra.yaml file depends on the type of installation:
- Package installations: /etc/dse/cassandra/cassandra.yaml
- Tarball installations: installation_location/resources/cassandra/conf/cassandra.yaml
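Because both configuration files follow the same location pattern, a script that needs either file could derive the path from the installation type. The following is a small sketch of that idea. conf_path is a hypothetical helper, not a DSE command, and /opt/dse in the usage note is only an assumed example of installation_location.

```shell
#!/bin/sh
# Sketch only: conf_path is an illustrative helper, not a DSE command.
# Usage: conf_path package|tarball FILE [INSTALLATION_LOCATION]
conf_path() {
    case "$1" in
        package)
            echo "/etc/dse/cassandra/$2"
            ;;
        tarball)
            # Tarball installations need the installation location as a third argument.
            if [ -z "${3:-}" ]; then
                echo "tarball installations need the installation location" >&2
                return 1
            fi
            echo "$3/resources/cassandra/conf/$2"
            ;;
        *)
            echo "unknown installation type: $1" >&2
            return 1
            ;;
    esac
}
```

For example, conf_path tarball cassandra-env.sh /opt/dse prints /opt/dse/resources/cassandra/conf/cassandra-env.sh, and conf_path package cassandra.yaml prints /etc/dse/cassandra/cassandra.yaml.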
Procedure
- Verify that the node has a defective disk and identify the disk by checking the logs on the affected node. Disk failures are logged in FILE NOT FOUND entries, which identify the mount point or disk that has failed.
- If the node is still running, stop DSE and shut down the node.
- Replace the defective disk and restart the node.
- If the node cannot restart, try restarting DSE without bootstrapping the node:
  Package and Installer-Services installations:
  - Add the following option to the cassandra-env.sh file:
    JVM_OPTS="$JVM_OPTS -Dcassandra.allow_unsafe_replace=true"
  - Start DataStax Enterprise as a service.
  - After the node bootstraps, remove the -Dcassandra.allow_unsafe_replace=true parameter from cassandra-env.sh.
  - Start DataStax Enterprise as a service again.
  Tarball and Installer-No Services installations:
  - Start DataStax Enterprise with this option:
    sudo bin/dse cassandra -Dcassandra.allow_unsafe_replace=true
- If DSE restarts, run nodetool repair on the node. If not, replace the node.
- If the repair succeeds, the node is restored to production. Otherwise, continue with the step below that matches the cluster's architecture (vnodes or single-token nodes).
- For a cluster using vnodes:
- For a cluster using single-token nodes: