Replace a dead node in a single-token architecture cluster

These steps apply to clusters that use a single-token architecture, not to clusters that use virtual nodes (vnodes).

Only add new nodes to the cluster. A new node is a system that HCD has never started.

The node must have absolutely NO PREVIOUS DATA in the data/, saved_caches/, commitlog/, and hints/ subdirectories.

Adding nodes previously used for testing, or that have been removed from another cluster, merges the older data and its incompatible schema into the cluster and may cause data loss or corruption.
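The "no previous data" requirement above can be checked before the node is ever started. The following is a minimal sketch, not an HCD tool: the function name node_is_clean is hypothetical, and the base directory you pass in is an assumption that depends on where your installation keeps its data/, saved_caches/, commitlog/, and hints/ subdirectories.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if none of the storage
# subdirectories under the given base directory contain any files.
node_is_clean() {
    base=$1
    for sub in data saved_caches commitlog hints; do
        dir="$base/$sub"
        # A missing directory counts as clean; an existing one must be empty.
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "not clean: $dir contains files" >&2
            return 1
        fi
    done
    return 0
}
```

For example, node_is_clean /path/to/storage && echo "safe to add" prints the message only when all four subdirectories are empty or absent. The path shown is a placeholder.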

The nodetool status command reports a two-letter code for each node. The first letter is the node's status and the second letter is its state. For example, UN means that the node is Up (its status) and Normal (its state). Different releases of HCD provide different information in the state field when the status is D (Down).
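As an illustration of how the two-letter code is read, the sketch below splits a code into its status and state. The function name decode_node_code is hypothetical, and the state letters it accepts are only those named in this procedure (N, L, J, M, S); your release may report others.

```shell
#!/bin/sh
# Hypothetical helper: translate a two-letter nodetool status code such as
# "UN" or "DS" into "<status>/<state>". Unknown codes return non-zero.
decode_node_code() {
    code=$1
    # First letter: status.
    case "$code" in
        U?) status="Up" ;;
        D?) status="Down" ;;
        *)  echo "unknown code: $code" >&2; return 1 ;;
    esac
    # Second letter: state.
    case "$code" in
        ?N) state="Normal" ;;
        ?L) state="Leaving" ;;
        ?J) state="Joining" ;;
        ?M) state="Moving" ;;
        ?S) state="Stopped" ;;
        *)  echo "unknown code: $code" >&2; return 1 ;;
    esac
    echo "$status/$state"
}
```

Calling decode_node_code DN prints Down/Normal.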

First, clarify what to expect when a node's state is Stopped. A node is in a stopped state if the nodetool drain command has been issued on the node itself, or if the disk policy is set to disk_failure_policy: stop and the policy has been triggered by disk issues. A stopped state means that the HCD process is still running and still responds to JMX commands, but gossip (port 7000) and client connections (port 9042) are stopped.
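Because a stopped node still answers JMX, you can ask the node itself whether gossip and client connections are running. The sketch below classifies the node from the text printed by nodetool statusgossip and nodetool statusbinary. It assumes that each command prints either "running" or "not running", as in Apache Cassandra; confirm the output on your HCD release. The function name node_service_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: classify a node from the output of
# `nodetool statusgossip` ($1) and `nodetool statusbinary` ($2).
# Assumption: each command prints "running" or "not running".
node_service_state() {
    gossip=$1
    binary=$2
    if [ "$gossip" = "not running" ] && [ "$binary" = "not running" ]; then
        # Process is alive (JMX answered) but both services are off.
        echo "stopped"
    elif [ "$gossip" = "running" ] && [ "$binary" = "running" ]; then
        echo "serving"
    else
        # Mixed or unrecognized output; inspect the node manually.
        echo "other"
    fi
}
```

On the node in question, this could be called as node_service_state "$(nodetool statusgossip)" "$(nodetool statusbinary)". If nodetool cannot connect at all, the HCD process is not running, which is the condition this procedure requires for the node being replaced.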

Replace a dead node in a single-token cluster

  1. Run nodetool status to verify the node’s status and state. In particular, for the node to be replaced:

    • HCD must not be running on the node; that is, the HCD Java process is stopped or the host itself is offline.

    • The node should be seen in a normal (N) state from other nodes. It should not be marked as joining (J) or leaving (L) the cluster.

      If a node status is D (down) the state can be one of:

      • N - Normal

      • L - Leaving

      • J - Joining

      • M - Moving

      • S - Stopped

        If a node enters the stopped state, its status and state are shown as DS on the node itself and as DN on all other nodes.

  2. Record the existing initial_token setting from the dead node’s cassandra.yaml.

  3. If the dead node was a seed node, change the cluster’s seed node configuration on each node:

    1. In the cassandra.yaml file for each node, remove the IP address of the dead node from the - seeds list in the seed-provider property.

    2. If the cluster needs a new seed node to replace the dead node, add the new node’s IP address to the - seeds list of the other nodes.

      Making every node a seed node is not recommended because of increased maintenance and reduced gossip performance. Gossip optimization is not critical, but it is recommended to use a small seed list (approximately three nodes per datacenter).

  4. On an existing node, gather setting information for the new node from the cassandra.yaml file:

    • cluster_name

    • endpoint_snitch

    • Other non-default settings: Use the diff tool to compare current settings with default settings.

  5. Gather rack and datacenter information:

  6. In the new node's cassandra.yaml file, set the following properties using the information gathered earlier:

    • auto_bootstrap: If this setting exists and is set to false, set it to true. (This setting is not included in the default cassandra.yaml configuration file.)

    • cluster_name

    • initial_token

    • seed list

      If the new node is a seed node, make sure it is not listed in its own - seeds list.

  7. Add the rack and datacenter configuration:

  8. Start the new node with the required options:

    Package installations:

    1. Add the following option to jvm-server.options:

      -Dcassandra.replace_address_first_boot=<address_of_dead_node>

    2. Start the node.

    3. After the node bootstraps, remove replace_address_first_boot (if specified) from jvm-server.options.

    Tarball installations:

    1. Add the following parameter to the startup command line:

      sudo bin/hcd cassandra -Dcassandra.replace_address_first_boot=<address_of_dead_node>

  9. Run nodetool status to verify that the new node has bootstrapped successfully.

    For tarball installations, the nodetool path is <installation_location>/resources/cassandra/bin.

  10. In environments that use the PropertyFileSnitch, wait at least 72 hours and then, on each node, remove the old node’s IP address from the cassandra-topology.properties file.

    This ensures that the old node's information is removed from gossip. Removing the address from the property file too soon may cause problems. Use nodetool gossipinfo to check the gossip status. The node remains in gossip until its LEFT status disappears.

    The cassandra-rackdc.properties file does not contain IP information; therefore this step is not required when using other snitches, such as GossipingPropertyFileSnitch.
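The gossip check in the last step can be scripted. The sketch below reads nodetool gossipinfo output on standard input and prints the STATUS line for one address, returning non-zero when the address has no entry. It assumes the layout used by Apache Cassandra, where each endpoint starts with an unindented line ending in /<ip> followed by indented attribute lines; verify this against your release before relying on it. The function name gossip_status_for is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the gossip STATUS line for one endpoint.
# Reads `nodetool gossipinfo` output on stdin; exits 1 if the endpoint
# has no STATUS entry (that is, it no longer appears in gossip).
gossip_status_for() {
    ip=$1
    awk -v ip="$ip" '
        # Unindented lines start a new endpoint block; match on "/<ip>" suffix.
        /^[^ \t]/ {
            in_block = (substr($0, length($0) - length(ip)) == "/" ip)
        }
        # Inside the matching block, print the STATUS attribute.
        in_block && /^[ \t]+STATUS:/ {
            sub(/^[ \t]+/, "")
            print
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

A possible use is nodetool gossipinfo | gossip_status_for <address_of_dead_node>. While the command prints a STATUS line containing LEFT, the old node is still in gossip; once it prints nothing and returns non-zero, the entry is gone.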
