Recovering from a single disk failure using JBOD

How to recover from a single disk failure in a disk array using JBOD (just a bunch of disks).

Node can restart

  1. Stop Cassandra and shut down the node.
  2. Replace the failed disk.
  3. Start the node and Cassandra.
  4. Run nodetool repair on the node.
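
The sequence above could be scripted roughly as follows. This is a sketch under assumptions, not a supported tool: it assumes a package installation where Cassandra runs as the cassandra service, and the function names and the DRY_RUN variable are hypothetical. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. Assumption: Cassandra is managed with `service cassandra`.
# DRY_RUN (hypothetical) defaults to 1, so commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: stop Cassandra before shutting down the machine and swapping the disk.
before_disk_swap() {
    run sudo service cassandra stop
}

# Steps 3 and 4: once the new disk is installed and the machine is back up.
# In a real run, wait for Cassandra to finish starting before the repair.
after_disk_swap() {
    run sudo service cassandra start
    run nodetool repair
}
```

Call before_disk_swap, replace the disk, then call after_disk_swap. Set DRY_RUN=0 only after checking that the printed commands match your installation.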

Node cannot restart

If the node cannot restart, the system directory may be corrupted. Follow the procedure below that matches the node's token configuration. If the node still cannot restart after completing these steps, see Replacing a dead node or dead seed node.

If the node uses vnodes:
  1. Stop Cassandra and shut down the node.
  2. Replace the failed disk.
  3. On a healthy node run the following command:
    $ nodetool ring | grep ip_address_of_node | awk ' {print $NF ","}' | xargs
  4. On the node with the new disk, add the list of tokens from the previous step (separated by commas), under initial_token in the cassandra.yaml file.
  5. Clear each system directory for every functioning drive:

    Assuming disk1 has failed and the data_file_directories setting in cassandra.yaml lists one directory per drive:

    - /mnt1/cassandra/data
    - /mnt2/cassandra/data
    - /mnt3/cassandra/data

    Run the following commands:
    $ rm -fr /mnt2/cassandra/data/system
    $ rm -fr /mnt3/cassandra/data/system
  6. Start the node and Cassandra.
  7. Run nodetool repair.
    CAUTION: The node serves stale data until the repair is complete.
  8. After the node is fully integrated into the cluster, it is recommended to return to normal vnode settings in cassandra.yaml: set num_tokens and comment out initial_token.
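
For steps 3 and 4 above, the token list can also be built with a small helper like the one below. This is a sketch under assumptions: tokens_for_node is a hypothetical name, and the helper assumes that each token line of the nodetool ring output starts with the node address and ends with the token. Unlike a plain grep, it matches the address field exactly, so 10.0.0.1 does not also match 10.0.0.10, and it emits the list without a trailing comma.

```shell
#!/bin/sh
# tokens_for_node IP
# Reads `nodetool ring` output on stdin and prints the tokens owned by IP
# as a single comma-separated line, ready to paste under initial_token.
# Assumption: the address is the first field and the token is the last field.
tokens_for_node() {
    awk -v ip="$1" '
        $1 == ip { printf "%s%s", sep, $NF; sep = "," }
        END      { print "" }
    '
}
```

On a healthy node this would be used as: nodetool ring | tokens_for_node ip_address_of_node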

If the node uses assigned tokens (single-token architecture):
  1. Stop Cassandra and shut down the node.
  2. Replace the failed disk.
  3. Clear each system directory for every functioning drive:

    Assuming disk1 has failed and the data_file_directories setting in cassandra.yaml lists one directory per drive:

    - /mnt1/cassandra/data
    - /mnt2/cassandra/data
    - /mnt3/cassandra/data

    Run the following commands:
    $ rm -fr /mnt2/cassandra/data/system
    $ rm -fr /mnt3/cassandra/data/system
  4. Start the node and Cassandra.
  5. Run nodetool repair on the node.
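
The step that clears the system directories, which appears in both procedures above, can be expressed as a loop over the configured data directories. The following is a minimal sketch; clear_system_dirs is a hypothetical helper name, and the directories passed to it are assumed to be the entries of data_file_directories. It deletes data, so check the arguments carefully before running it.

```shell
#!/bin/sh
# clear_system_dirs FAILED_DIR DIR...
# Removes the "system" subdirectory from every listed data directory
# except FAILED_DIR, which sits on the replaced disk and is skipped.
clear_system_dirs() {
    failed="$1"
    shift
    for dir in "$@"; do
        [ "$dir" = "$failed" ] && continue
        # ${dir:?} aborts rather than expanding to "/system" if dir is empty.
        rm -rf -- "${dir:?}/system"
        echo "cleared $dir/system"
    done
}
```

With the example layout above, the call would be: clear_system_dirs /mnt1/cassandra/data /mnt1/cassandra/data /mnt2/cassandra/data /mnt3/cassandra/data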

The location of the cassandra.yaml file depends on the type of installation:

  Package installations: /etc/cassandra/cassandra.yaml
  Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml