Replacing a dead node or dead seed node

The following steps show you how to replace a dead node, for example after a hardware failure. The procedure is the same for vnodes and single-token nodes. Replacing a dead seed node requires extra steps.

Only add new nodes to the cluster. A new node is a system on which DataStax Enterprise (DSE) has never started. The node must have absolutely NO PREVIOUS DATA in the data directory, saved_caches, commitlog, and hints. Adding nodes that were previously used for testing, or that have been removed from another cluster, merges the older data into the cluster and may cause data loss or corruption.
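The requirement above can be checked before DSE is first started. The following is a minimal sketch, not an official tool: nonempty_dirs is a hypothetical helper name, and the example paths are assumptions, so substitute the directories configured in your cassandra.yaml file.

```shell
# Print each given directory that exists and contains at least one entry.
# Returns 0 when every directory is missing or empty, 1 otherwise.
nonempty_dirs() {
  found=0
  for dir in "$@"; do
    # ls -A also lists hidden entries, but omits "." and "..".
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
      printf '%s\n' "$dir"
      found=1
    fi
  done
  return "$found"
}

# Example call (assumed package-install paths; adjust to your configuration):
# nonempty_dirs /var/lib/cassandra/data /var/lib/cassandra/saved_caches \
#               /var/lib/cassandra/commitlog /var/lib/cassandra/hints
```

Any path that the function prints still holds files and should be inspected and cleared before the node joins the cluster.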

Where is the cassandra-topology.properties file?

The location of the cassandra-topology.properties file depends on the type of installation:

Installation type                                                Location
Package installations and Installer-Services installations      /etc/dse/cassandra/cassandra-topology.properties
Tarball installations and Installer-No Services installations   <installation_location>/resources/cassandra/conf/cassandra-topology.properties

Where is the cassandra-env.sh file?

The location of the cassandra-env.sh file depends on the type of installation:

Installation type                                                Location
Package installations and Installer-Services installations      /etc/dse/cassandra/cassandra-env.sh
Tarball installations and Installer-No Services installations   <installation_location>/resources/cassandra/conf/cassandra-env.sh

Where is the cassandra.yaml file?

The location of the cassandra.yaml file depends on the type of installation:

Installation type                                                Location
Package installations and Installer-Services installations      /etc/dse/cassandra/cassandra.yaml
Tarball installations and Installer-No Services installations   <installation_location>/resources/cassandra/conf/cassandra.yaml

Where is the cassandra-rackdc.properties file?

The location of the cassandra-rackdc.properties file depends on the type of installation:

Installation type                                                Location
Package installations and Installer-Services installations      /etc/dse/cassandra/cassandra-rackdc.properties
Tarball installations and Installer-No Services installations   <installation_location>/resources/cassandra/conf/cassandra-rackdc.properties

The output of the nodetool status command shows a two-letter code for each node. The first letter is the status of the node and the second letter is its state. For example, UN means a node that is Up (its status) and Normal (its state). Different releases of DSE provide different information in the state field when the status is D (Down).
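As an illustration of how the two-letter code is read, here is a small sketch that translates a code into words. The function name describe_node_code is hypothetical; the letters are the ones listed in the scenarios later on this page, and S (Stopped) is reported only by some releases.

```shell
# Translate a two-letter nodetool status code (for example "UN") into words.
describe_node_code() {
  code=$1
  # First letter: status.
  case ${code%?} in
    U) node_status="Up" ;;
    D) node_status="Down" ;;
    *) node_status="Unknown" ;;
  esac
  # Second letter: state.
  case ${code#?} in
    N) node_state="Normal" ;;
    L) node_state="Leaving" ;;
    J) node_state="Joining" ;;
    M) node_state="Moving" ;;
    S) node_state="Stopped" ;;
    *) node_state="Unknown" ;;
  esac
  printf '%s %s\n' "$node_status" "$node_state"
}
```

For example, describe_node_code DL prints "Down Leaving".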

Let’s first clarify what to expect when a node is in a stopped state. A node is in a stopped state if nodetool drain has been issued on the node itself, or if disk_failure_policy is set to stop and the policy has been triggered by disk issues. In a stopped state, the DSE process is still running and still responds to JMX commands, but gossip (port 7000) and client connections (port 9042) are stopped.

How a stopped or down node is reported depends on the DSE version. Because developers and administrators often compare behavior between DSE releases, the scenarios in the procedure cover specific releases of DSE 5.1.x, 6.0.x, 6.7.x, 6.8.x, and 6.9.x.

Procedure

  1. Run nodetool status to verify the node’s status and state. In particular, for the node to be replaced:

    • DataStax Enterprise (DSE) must not be running on the node; that is, the DSE Java process is stopped or the host itself is offline.

    • The node should be seen in a normal (N) state from other nodes. In other words, it should not be marked as joining (J) or leaving (L) the cluster. The exact way to check this depends on your DSE version. Be sure to read the introductory text that precedes this procedure and the scenarios (ranges of DSE versions) that follow, beginning with Scenario 1.

      • Scenario 1

      • Scenario 2

      • Scenario 3

      Scenario 1 applies to the following releases:

      • DSE 6.7.0 up to 6.7.7

      • DSE 6.0.0 up to 6.0.11

      • DSE 5.1.0 up to 5.1.17

      If a node status is D (down) the state can be one of:

      • N - Normal

      • L - Leaving

      • J - Joining

      • M - Moving

      If a node enters a stopped state, the status and state of the node are shown as:

      • UN on the node itself

      • DN from all the other nodes

      Scenario 2 applies to the following releases:

      • DSE 6.8.0 up to 6.8.25

      • DSE 6.7.8 and higher 6.7.x

      • DSE 6.0.12 and higher 6.0.x

      • DSE 5.1.18 up to 5.1.32

      If a node status is D (down) the state can only be:

      S - Stopped

      If Gossip reports the node to be down, the state information doesn’t provide details on the state of the node and always returns stopped.

      To find out whether a node with status Down is in a Normal state, or whether it was in a transitional state such as L (leaving the cluster), use the output of the nodetool ring command. Check the state reported for its IP address on any token belonging to the node, as in the following example for node 1.2.3.12:

      Datacenter: DC1
      ==========
      Address   Rack        Status State   Load            Owns     Token
                                                                     8932492356975004956
      1.2.3.10  RACK3       Up     Normal  105.33 GiB      ?        -8332242847914492341
      1.2.3.11  RACK2       Up     Normal  102.20 GiB      ?        -8236178585342294604
      1.2.3.12  RACK1       Down   Leaving 110.43 GiB      ?        -8053138995941424636
      1.2.3.10  RACK3       Up     Normal  105.33 GiB      ?        -7195762468279176051
      ...

      Scenario 3 applies to the following releases:

      • DSE 6.9.0 and higher 6.9.x

      • DSE 5.1.33 and higher 5.1.x

      If a node status is D (down) the state can be one of:

      • N - Normal

      • L - Leaving

      • J - Joining

      • M - Moving

      • S - Stopped

      If a node enters a stopped state, the status and state of the node are shown as:

      • DS on the node itself

      • DN from all the other nodes
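For the releases in which nodetool status reports every down node as Stopped, the nodetool ring check described in step 1 can be sketched as a small filter. This is an illustrative helper, not part of DSE: ring_state_for is a hypothetical name, and it assumes the column layout shown in the nodetool ring example above (Address, Rack, Status, State, ...).

```shell
# Read `nodetool ring` output on stdin and print the distinct
# "Status State" pairs reported for the given address.
ring_state_for() {
  awk -v ip="$1" '$1 == ip { print $3, $4 }' | sort -u
}

# Example call against a live cluster:
# nodetool ring | ring_state_for 1.2.3.12
```

With the sample output shown in step 1, the call for 1.2.3.12 prints "Down Leaving", which means the node was leaving the cluster when it went down.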

  2. Record the datacenter, address, and rack settings of the dead node to use later.

  3. Add the replacement node to the network and record its IP address.

  4. If the dead node was a seed node, change the cluster’s seed node configuration on each node:

    1. In the cassandra.yaml file for each node, remove the IP address of the dead node from the - seeds list in the seed_provider property.

    2. If the cluster needs a new seed node to replace the dead node, add the new node’s IP address to the - seeds list of the other nodes.

    3. Apply the change on each of the other nodes in one of two ways: restart the node so that it rereads the cassandra.yaml file, or run nodetool reloadseeds to force the running node to read the changes to the - seeds list.

      Making every node a seed node is not recommended because of increased maintenance and reduced gossip performance. Gossip optimization is not critical, but it is recommended to use a small seed list (approximately three nodes per datacenter).
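After editing the seed list in step 4, it can help to confirm on each node that the dead node’s address is really gone. The sketch below uses hypothetical helper names (list_seeds, is_seed) and assumes the usual single-line form of the entry, for example - seeds: "1.2.3.10,1.2.3.11", with plain IP addresses and no port suffix.

```shell
# Print the addresses in the first uncommented "- seeds:" entry of a
# cassandra.yaml file, one address per line.
list_seeds() {
  sed -n 's/^[[:space:]]*-[[:space:]]*seeds:[[:space:]]*//p' "$1" |
    head -n 1 | tr -d "\"' " | tr ',' '\n'
}

# Succeed if the given address appears in the seeds list of the given file.
is_seed() {
  list_seeds "$1" | grep -Fqx -- "$2"
}

# Example call (path for package installations; 1.2.3.12 is the dead node):
# if is_seed /etc/dse/cassandra/cassandra.yaml 1.2.3.12; then
#   echo "dead node is still listed as a seed"
# fi
```

The same check can be run on the replacement node with its own address, to confirm that a new seed node is not listed in its own - seeds list.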

  5. On an existing node, gather setting information for the new node from the cassandra.yaml file:

    • cluster_name

    • endpoint_snitch

    • Other non-default settings: Use the diff tool to compare current settings with default settings.
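One way to carry out the comparison in step 5 is to strip comments and blank lines from both files before running diff, so that only active settings are compared. This is a sketch under assumptions: active_settings and compare_settings are hypothetical helper names, and you must supply your own copy of the default cassandra.yaml file for the same DSE version.

```shell
# Print the active (uncommented, non-blank) lines of a configuration file.
active_settings() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# Show the active settings that differ between a default file and the file
# in use. Returns the exit status of diff: 0 if identical, 1 if different.
compare_settings() {
  default_file=$1
  current_file=$2
  tmp_default=$(mktemp) && tmp_current=$(mktemp) || return 2
  active_settings "$default_file" > "$tmp_default" || true
  active_settings "$current_file" > "$tmp_current" || true
  rc=0
  diff "$tmp_default" "$tmp_current" || rc=$?
  rm -f "$tmp_default" "$tmp_current"
  return "$rc"
}

# Example call (the first path is a placeholder for your default copy):
# compare_settings /path/to/default/cassandra.yaml /etc/dse/cassandra/cassandra.yaml
```

Lines prefixed with < come from the default file and lines prefixed with > come from the file in use.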

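  6. Gather rack and datacenter information from the snitch properties file that the cluster uses: cassandra-topology.properties for the PropertyFileSnitch, or cassandra-rackdc.properties for snitches such as the GossipingPropertyFileSnitch.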

  7. Make sure that the new node meets all prerequisites, and then install DSE on the new node, but do not start DSE.

    Be sure to install the same version of DSE as is installed on the other nodes in the cluster, as described in the installation instructions.

  8. If DSE started automatically on the node, stop it and clear the data that was added on startup.

  9. Add values to the following properties in the cassandra.yaml file from the information you gathered earlier:

    • auto_bootstrap: If this setting exists and is set to false, set it to true. (This setting is not included in the default cassandra.yaml configuration file.)

    • cluster_name

    • seed list

      If the new node is a seed node, make sure it is not listed in its own - seeds list.

  10. Add the rack and datacenter configuration that you gathered earlier to the snitch properties file that the cluster uses.

  11. Start the new node with the replace_address option, passing in the IP address of the dead node.

    • Package and Installer-Services installations:

      1. Add the following option to the cassandra-env.sh file:

        JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node"

      2. Start the node.

      3. After the node bootstraps, remove the replace_address option from cassandra-env.sh.

      4. Restart the node.

    • Tarball and Installer-No Services installations:

      • Start DataStax Enterprise from the <installation_location> with this option:

        sudo bin/dse cassandra -Dcassandra.replace_address=address_of_dead_node

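For package and Installer-Services installations, the edit to cassandra-env.sh in step 11 has to be made before the first start and undone after bootstrap. The following sketch shows one way to script both halves. The function names are hypothetical, the file usually needs root privileges to modify, and you should back it up first.

```shell
# Append the replace_address option to a cassandra-env.sh file, unless an
# uncommented replace_address option is already present.
add_replace_address() {
  env_file=$1
  dead_ip=$2
  if grep -q '^[^#]*-Dcassandra\.replace_address=' "$env_file"; then
    echo "replace_address is already set in $env_file" >&2
    return 1
  fi
  printf 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=%s"\n' "$dead_ip" >> "$env_file"
}

# Remove every line that mentions the replace_address option.
# Run this only after the node has finished bootstrapping.
remove_replace_address() {
  env_file=$1
  tmp_file=$(mktemp) || return 1
  # grep -v exits non-zero when it prints nothing, which is not an error here.
  grep -v -e '-Dcassandra\.replace_address=' "$env_file" > "$tmp_file" || true
  cat "$tmp_file" > "$env_file"   # overwrite in place to keep owner and mode
  rm -f "$tmp_file"
}

# Example calls (path for package installations; 1.2.3.12 is the dead node):
# add_replace_address /etc/dse/cassandra/cassandra-env.sh 1.2.3.12
# remove_replace_address /etc/dse/cassandra/cassandra-env.sh
```

Leaving the option in place after bootstrap is the main thing to avoid, which is why the removal is a separate, explicit call.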
  12. Run nodetool status to verify that the new node has bootstrapped successfully.

    For Tarball and Installer-No Services installations, run nodetool from:

    <installation_location>/resources/cassandra/bin
  13. In environments that use the PropertyFileSnitch, wait at least 72 hours and then, on each node, remove the old node’s IP address from the cassandra-topology.properties file.

    This ensures that the old node’s information is removed from gossip. Removing the address from the properties file too soon may cause problems. Use nodetool gossipinfo to check the gossip status. The node is still in gossip until its LEFT status disappears.

    The cassandra-rackdc.properties file does not contain IP information; therefore this step is not required when using other snitches, such as GossipingPropertyFileSnitch.
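The gossip check in step 13 can be sketched as a filter over nodetool gossipinfo output. This is an assumption-laden illustration: gossip_status_for is a hypothetical name, and the filter assumes that each endpoint starts with an unindented line ending in /<address>, followed by indented attribute lines such as STATUS. Verify this against the output of your DSE version.

```shell
# Read `nodetool gossipinfo` output on stdin and print the STATUS line
# recorded for the given address. Prints nothing if the address is absent.
gossip_status_for() {
  awk -v ip="$1" '
    # An unindented line starts a new endpoint section.
    /^[^[:space:]]/ {
      n = length($0); m = length(ip) + 1
      current = (n >= m && substr($0, n - m + 1) == "/" ip)
      next
    }
    # Inside the matching section, print its STATUS attribute.
    current && /^[[:space:]]*STATUS:/ { sub(/^[[:space:]]+/, ""); print }
  '
}

# Example call (1.2.3.12 is the old node):
# nodetool gossipinfo | gossip_status_for 1.2.3.12
```

While the call still prints a line containing LEFT, the old node is still in gossip. Empty output suggests that its entry has disappeared.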

© 2025 DataStax