Replacing a dead node or dead seed node
Steps to replace a node that has died for some reason, such as hardware failure.
The procedure for replacing a dead node is the same for vnodes and single-token nodes. Extra steps are required for replacing dead seed nodes.
jvm.options
The location of the jvm.options file depends on the type of installation:
| Package installations | /etc/dse/cassandra/jvm.options |
| Tarball installations | installation_location/resources/cassandra/conf/jvm.options |
cassandra-rackdc.properties
The location of the cassandra-rackdc.properties file depends on the type of installation:
| Package installations | /etc/dse/cassandra/cassandra-rackdc.properties |
| Tarball installations | installation_location/resources/cassandra/conf/cassandra-rackdc.properties |
cassandra.yaml
The location of the cassandra.yaml file depends on the type of installation:
| Package installations | /etc/dse/cassandra/cassandra.yaml |
| Tarball installations | installation_location/resources/cassandra/conf/cassandra.yaml |
cassandra-topology.properties
The location of the cassandra-topology.properties file depends on the type of installation:
| Package installations | /etc/dse/cassandra/cassandra-topology.properties |
| Tarball installations | installation_location/resources/cassandra/conf/cassandra-topology.properties |
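If you script this procedure, a small lookup keeps the two installation layouts straight. The following Python sketch simply encodes the location tables above; the function name config_path is hypothetical, and installation_location remains a placeholder for your own tarball directory.

```python
from pathlib import PurePosixPath

# Configuration files listed in the location tables above.
CONFIG_FILES = {
    "jvm.options",
    "cassandra-rackdc.properties",
    "cassandra.yaml",
    "cassandra-topology.properties",
}


def config_path(filename: str, install_type: str,
                installation_location: str = "installation_location") -> str:
    """Return where a DSE configuration file lives for a package or tarball install."""
    if filename not in CONFIG_FILES:
        raise ValueError(f"unknown configuration file: {filename}")
    if install_type == "package":
        base = PurePosixPath("/etc/dse/cassandra")
    elif install_type == "tarball":
        # Tarball layout: <installation_location>/resources/cassandra/conf
        base = PurePosixPath(installation_location, "resources", "cassandra", "conf")
    else:
        raise ValueError("install_type must be 'package' or 'tarball'")
    return str(base / filename)
```

For example, config_path("cassandra.yaml", "tarball", "/opt/dse") returns /opt/dse/resources/cassandra/conf/cassandra.yaml, where /opt/dse is only an illustrative directory.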
Procedure
- Run nodetool status to verify the node's status and state.
In particular, for the node to be replaced:
- DataStax Enterprise (DSE) must not be running on the node; that is, the DSE Java process is stopped or the host itself is offline.
- Other nodes should see the node in a Normal (N) state; that is, it must not be marked as Joining (J) or Leaving (L) the cluster. How you check this depends on your DSE version, so read the following explanation and the version-specific scenarios, starting with Scenario 1.
The nodetool status command prints a two-letter code for each node: the first letter is the node's status and the second is its state. For example, UN means that the node is Up (its status) and in the Normal state. When the status is D (Down), different DSE releases report different information in the state field.
First, consider what to expect when a node is stopped. A node is in the stopped state if nodetool drain has been run on the node itself, or if disk_failure_policy is set to stop and the policy has been triggered by disk issues. In the stopped state, the DSE process is still running and still responds to JMX commands, but gossip (port 7000) and client connections (port 9042) are stopped.
This behavior depends on the DSE version. Because developers and administrators often compare behavior across DSE releases, the following scenarios cover specific releases of DSE 5.1.x, 6.0.x, 6.7.x, and 6.8.x.
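To see how the two-letter code breaks down, here is a minimal Python sketch that maps each node's address to its status and state from nodetool status text. It assumes that node rows start with the two-letter code followed by the address; the function name parse_status and the sample rows in the usage note are illustrative, not real cluster output.

```python
# Letter codes used in the two-letter nodetool status column.
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving", "S": "Stopped"}


def parse_status(output: str) -> dict:
    """Map each node address to (status, state) from nodetool status text.

    Assumes node rows look like "DN  1.2.3.12  110.43 GiB ...". Header,
    legend, and separator lines do not start with such a code and are skipped.
    """
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        code = fields[0]
        if len(code) == 2 and code[0] in STATUS and code[1] in STATE:
            nodes[fields[1]] = (STATUS[code[0]], STATE[code[1]])
    return nodes
```

Given the rows "UN  1.2.3.10 ..." and "DN  1.2.3.12 ...", the sketch returns ("Up", "Normal") for 1.2.3.10 and ("Down", "Normal") for 1.2.3.12. How much the state letter of a Down node can be trusted depends on the scenario that applies to your release.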
Scenario 1
In the following releases:
- DSE 6.7.0 up to 6.7.7
- DSE 6.0.0 up to 6.0.11
- DSE 5.1.0 up to 5.1.17
If a node's status is D (Down), its state can be one of:
- N - Normal
- L - Leaving
- J - Joining
- M - Moving
If a node enters the stopped state, its status and state are shown as:
- UN on the node itself
- DN from all the other nodes
Scenario 2
In the following releases:
- DSE 6.8.0 up to 6.8.25
- DSE 6.7.8 and higher 6.7.x
- DSE 6.0.12 and higher 6.0.x
- DSE 5.1.18 up to 5.1.32
If a node's status is D (Down), its state can only be:
- S - Stopped
In other words, if gossip reports the node as down, the state field does not describe the node's actual state and always shows Stopped. To find out whether a Down node is in the Normal state or in a transitional state such as L (Leaving the cluster), use the output of nodetool ring. Check the State column for the node's IP address on any token that belongs to the node, as in the following example for node 1.2.3.12:
    Datacenter: DC1
    ==========
    Address    Rack   Status  State    Load        Owns  Token
                                                         8932492356975004956
    1.2.3.10   RACK3  Up      Normal   105.33 GiB  ?     -8332242847914492341
    1.2.3.11   RACK2  Up      Normal   102.20 GiB  ?     -8236178585342294604
    1.2.3.12   RACK1  Down    Leaving  110.43 GiB  ?     -8053138995941424636
    1.2.3.10   RACK3  Up      Normal   105.33 GiB  ?     -7195762468279176051
    ...
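The lookup described above can be sketched in Python. This assumes token rows begin with the Address, Rack, Status, and State columns, as in the example ring output; the function name ring_status_state is hypothetical.

```python
from typing import Optional, Tuple


def ring_status_state(ring_output: str, address: str) -> Optional[Tuple[str, str]]:
    """Return the (Status, State) columns that nodetool ring reports for address.

    Assumes token rows begin with: Address Rack Status State ...
    Any token row that belongs to the node can be used, so the first
    matching row is returned.
    """
    for line in ring_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == address:
            return fields[2], fields[3]
    return None
```

Applied to the example output, ring_status_state(output, "1.2.3.12") returns ("Down", "Leaving"), which tells you that the node was leaving the cluster and is therefore not in a Normal state.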
Scenario 3
In the following releases:
- DSE 6.8.26 and higher 6.8.x
- DSE 5.1.33 and higher 5.1.x
If a node's status is D (Down), its state can be one of:
- N - Normal
- L - Leaving
- J - Joining
- M - Moving
- S - Stopped
If a node enters the stopped state, its status and state are shown as:
- DS on the node itself
- DN from all the other nodes
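Because the right check depends on the release, it can help to decide the scenario from the version string first. The following Python sketch encodes the release ranges of Scenarios 1 to 3 as the first patch release of each DSE line that moved to Scenario 2 and to Scenario 3; the names SCENARIO_BOUNDARIES and status_scenario are hypothetical, and versions outside 5.1, 6.0, 6.7, and 6.8 are rejected because the scenarios do not cover them.

```python
# First patch release of each DSE line that moved to Scenario 2 and to
# Scenario 3, taken from the release ranges above (None: no such release listed).
SCENARIO_BOUNDARIES = {
    (5, 1): (18, 33),
    (6, 0): (12, None),
    (6, 7): (8, None),
    (6, 8): (0, 26),
}


def status_scenario(version: str) -> int:
    """Return which scenario (1, 2, or 3) describes a given DSE version."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    try:
        to_s2, to_s3 = SCENARIO_BOUNDARIES[(major, minor)]
    except KeyError:
        raise ValueError(f"DSE {version} is not covered by these scenarios") from None
    if to_s3 is not None and patch >= to_s3:
        return 3
    if patch >= to_s2:
        return 2
    return 1
```

If status_scenario returns 2, a Down node always shows state S in nodetool status, so check nodetool ring instead to learn whether the node was Normal or in a transitional state.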
- Record the datacenter, address, and rack settings of the dead node; you will use these later.
- Add the replacement node to the network and record its IP address.
- If the dead node was a seed node, change the cluster's seed node configuration on each node by updating the seeds list of the seed_provider property in cassandra.yaml.
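In cassandra.yaml, the seeds appear as a single comma-separated string under the seed_provider parameters (the - seeds entry). A minimal Python sketch of rewriting that string on each node, assuming you pass it the raw value; the function name update_seeds is hypothetical, and whether to promote another node to seed is your decision, so new_seed is optional.

```python
from typing import Optional


def update_seeds(seeds: str, dead_ip: str, new_seed: Optional[str] = None) -> str:
    """Rewrite a comma-separated seeds value from cassandra.yaml.

    Drops the dead node's address and, optionally, appends another seed.
    Do not pass the replacement node's own address when editing the
    replacement node's file: a new seed node must not list itself.
    """
    hosts = [h.strip() for h in seeds.split(",") if h.strip()]
    hosts = [h for h in hosts if h != dead_ip]
    if new_seed and new_seed not in hosts:
        hosts.append(new_seed)
    return ",".join(hosts)
```

For example, update_seeds("10.0.0.1,10.0.0.2", "10.0.0.2", "10.0.0.5") yields "10.0.0.1,10.0.0.5"; the addresses are illustrative.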
- On an existing node, gather setting information for the new node from the cassandra.yaml file:
- cluster_name
- endpoint_snitch
- Other non-default settings: Use the diff tool to compare current settings with default settings.
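The diff tool remains the simplest way to compare the files. If you would rather collect the differing values programmatically, the following Python sketch compares top-level key: value lines of two cassandra.yaml texts. It is deliberately naive and not a YAML parser: nested blocks such as seed_provider, list items, and comments are skipped, and anything after a # on a line is treated as a comment. The function names are hypothetical.

```python
def top_level_settings(yaml_text: str) -> dict:
    """Naively collect top-level `key: value` lines from cassandra.yaml text."""
    settings = {}
    for line in yaml_text.splitlines():
        # Skip blank lines, comments, indented (nested) lines, and list items.
        if not line.strip() or line[0] in " \t#-":
            continue
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.split("#", 1)[0].strip()
    return settings


def non_default_settings(current_yaml: str, default_yaml: str) -> dict:
    """Return top-level settings whose value differs from the default file."""
    current = top_level_settings(current_yaml)
    default = top_level_settings(default_yaml)
    return {k: v for k, v in current.items() if default.get(k) != v}
```

Pass the text of the existing node's cassandra.yaml as current_yaml and an unmodified copy of the same DSE version's file as default_yaml; the result includes cluster_name and endpoint_snitch whenever they differ from the defaults.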
- Gather rack and datacenter information:
- If the cluster uses the PropertyFileSnitch, record the rack and datacenter assignments listed in the cassandra-topology.properties file, or copy the file to the new node.
- If the cluster uses the GossipingPropertyFileSnitch, Ec2Snitch, Ec2MultiRegionSnitch, or GoogleCloudSnitch, record the rack and datacenter assignments in the dead node's cassandra-rackdc.properties file.
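With the GossipingPropertyFileSnitch, cassandra-rackdc.properties holds dc=... and rack=... lines. A minimal Python sketch for recording the dead node's assignments, assuming a simplified properties format (only # and ! comments and = separators, not the full Java properties syntax). The function name read_rackdc is hypothetical, and it returns every assignment in the file rather than assuming particular keys, because the cloud snitches may use other keys.

```python
def read_rackdc(text: str) -> dict:
    """Collect key=value assignments from cassandra-rackdc.properties text."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props
```

For a file containing dc=DC1 and rack=RACK1, the sketch returns {"dc": "DC1", "rack": "RACK1"}, which are the values to reproduce on the replacement node.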
- Make sure that the new node meets all prerequisites, and then install DataStax Enterprise on the new node, but do not start DataStax Enterprise.
Note: Be sure to install the same version of DataStax Enterprise as is installed on the other nodes in the cluster, as described in the installation instructions.
- If DataStax Enterprise started automatically on the node, stop it and clear the data that was added automatically on startup.
- Add values to the following properties in the cassandra.yaml file from the information you gathered earlier:
- auto_bootstrap: If this setting exists and is set to false, set it to true. (This setting is not included in the default cassandra.yaml configuration file.)
- cluster_name
- seed list
Warning: If the new node is a seed node, make sure it is not listed in its own seeds list (the - seeds parameter).
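Before starting the replacement node, you can sanity-check the values from this step. The following Python sketch takes a flat dict of raw cassandra.yaml values; that input format is a hypothetical convenience, so you would extract the seeds string from seed_provider yourself. It flags an explicit auto_bootstrap: false, a mismatched cluster_name, and the new node appearing in its own seeds list.

```python
def check_replacement_settings(settings: dict, cluster_name: str, new_node_ip: str) -> list:
    """Return a list of problems with the replacement node's cassandra.yaml values."""
    problems = []
    # auto_bootstrap is absent from the default file; only an explicit false is a problem.
    if settings.get("auto_bootstrap", "true").strip().lower() == "false":
        problems.append("auto_bootstrap is false; set it to true")
    if settings.get("cluster_name", "").strip("'\" ") != cluster_name:
        problems.append("cluster_name does not match the existing cluster")
    seeds = settings.get("seeds", "").strip("'\" ")
    if new_node_ip in [h.strip() for h in seeds.split(",")]:
        problems.append("the new node is listed in its own seeds list")
    return problems
```

An empty list means none of these three checks found a problem; it does not validate the rest of the file.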
- Add the rack and datacenter configuration:
- If the cluster uses the GossipingPropertyFileSnitch, Ec2Snitch, Ec2MultiRegionSnitch, or GoogleCloudSnitch, add the dead node's rack and datacenter assignments to the cassandra-rackdc.properties file on the replacement node.
- If the cluster uses the PropertyFileSnitch:
- Copy the cassandra-topology.properties file from an existing node to the replacement node.
- For each node in the cluster, edit the file to add an entry with the new node's IP address and the dead node's rack and datacenter assignments.
Important: Do not remove the entry for the dead node's IP address yet.
- Start the new node with the required options:
Package installations:
- Add the following option to jvm.options:
-Dcassandra.replace_address_first_boot=address_of_dead_node
- If applications expect QUORUM or LOCAL_QUORUM consistency levels from the cluster, also add the consistent_replace option to jvm.options with a value of either QUORUM or LOCAL_QUORUM to ensure data consistency on the replacement node. Otherwise, the node may stream from a potentially inconsistent replica, and reads may return stale data. For example:
-Ddse.consistent_replace=LOCAL_QUORUM
Tip: Other options also control repair during a consistent replace.
- Start the node.
- After the node bootstraps, remove replace_address_first_boot and consistent_replace (if specified) from jvm.options.
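Because these options must be added before the first start and removed after bootstrap, scripting both edits helps avoid leaving them in place. A minimal Python sketch, assuming jvm.options is handled as plain text with one option per line; the helper names are hypothetical, and the sketch only transforms a string, so reading and writing the actual file is left to you.

```python
from typing import Optional

REPLACE_OPTION = "-Dcassandra.replace_address_first_boot="
CONSISTENT_OPTION = "-Ddse.consistent_replace="


def add_replace_options(jvm_options: str, dead_ip: str,
                        consistency: Optional[str] = None) -> str:
    """Append the replace options to jvm.options text before the first start."""
    if consistency not in (None, "QUORUM", "LOCAL_QUORUM"):
        raise ValueError("consistent_replace must be QUORUM or LOCAL_QUORUM")
    lines = jvm_options.splitlines()
    lines.append(REPLACE_OPTION + dead_ip)
    if consistency:
        lines.append(CONSISTENT_OPTION + consistency)
    return "\n".join(lines) + "\n"


def remove_replace_options(jvm_options: str) -> str:
    """Drop the replace options again once the node has bootstrapped."""
    kept = [line for line in jvm_options.splitlines()
            if not line.strip().startswith((REPLACE_OPTION, CONSISTENT_OPTION))]
    return "\n".join(kept) + "\n"
```

Running remove_replace_options on the output of add_replace_options restores the original option lines.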
Tarball installations:
- Add the following parameter to the startup command line:
sudo bin/dse cassandra -Dcassandra.replace_address_first_boot=address_of_dead_node
- If applications expect QUORUM or LOCAL_QUORUM consistency levels from the cluster, add the consistent_replace parameter in addition to replace_address_first_boot, using a value of either QUORUM or LOCAL_QUORUM to ensure data consistency on the replacement node. Otherwise, the node may stream from a potentially inconsistent replica, and reads may return stale data. For example:
sudo bin/dse cassandra -Dcassandra.replace_address_first_boot=address_of_dead_node -Ddse.consistent_replace=LOCAL_QUORUM
Tip: Other options also control repair during a consistent replace.
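If you generate the tarball start command from a script, building it as an argument list avoids quoting mistakes. The following Python sketch reproduces the command lines shown for tarball installations and only prints the result; the function name tarball_start_command is hypothetical, and running it from installation_location is an assumption based on the relative bin/dse path.

```python
import shlex
from typing import List, Optional


def tarball_start_command(dead_ip: str, consistency: Optional[str] = None) -> List[str]:
    """Build the tarball start command for a replacement node (not executed here)."""
    if consistency not in (None, "QUORUM", "LOCAL_QUORUM"):
        raise ValueError("consistent_replace must be QUORUM or LOCAL_QUORUM")
    command = ["sudo", "bin/dse", "cassandra",
               f"-Dcassandra.replace_address_first_boot={dead_ip}"]
    if consistency:
        command.append(f"-Ddse.consistent_replace={consistency}")
    return command


if __name__ == "__main__":
    # Prints the command instead of running it; run it from installation_location.
    print(shlex.join(tarball_start_command("1.2.3.12", "LOCAL_QUORUM")))
```

With the illustrative dead-node address 1.2.3.12, the printed line matches the LOCAL_QUORUM example command, with address_of_dead_node filled in.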
- Run nodetool status to verify that the new node has bootstrapped successfully.
For tarball installations, nodetool is located in installation_location/resources/cassandra/bin.
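A successfully bootstrapped replacement node shows UN (Up, Normal); while it is still joining, it is typically reported as UJ (Up, Joining). A minimal Python sketch of that check, assuming node rows in nodetool status start with the two-letter code followed by the address; the function name has_bootstrapped is hypothetical.

```python
def has_bootstrapped(status_output: str, address: str) -> bool:
    """True if nodetool status text shows `address` as UN (Up, Normal)."""
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == address:
            return fields[0] == "UN"
    # The address does not appear in the output at all.
    return False
```

You could poll this against fresh nodetool status output until it returns True before moving on to the next step.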
- In environments that use the PropertyFileSnitch, wait at least 72 hours and then, on each node, remove the old node's IP address from the cassandra-topology.properties file.
CAUTION: Waiting ensures that the old node's information is removed from gossip; removing the entry from the property file too soon can cause problems. Use nodetool gossipinfo to check the gossip status. The node remains in gossip until its LEFT status disappears.
Note: The cassandra-rackdc.properties file does not contain IP information; therefore, this step is not required when using other snitches, such as GossipingPropertyFileSnitch.
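With the PropertyFileSnitch, cassandra-topology.properties maps each IP address to DC:RACK (for example, 1.2.3.12=DC1:RACK1). The two edits in this procedure, adding the new node with the dead node's assignment and later removing the dead node's entry, can be sketched in Python as follows. The helper names and addresses are hypothetical, only plain IPv4 keys are handled, and the sketch only transforms text; it does not wait for gossip or check nodetool gossipinfo for you.

```python
def parse_topology(text: str) -> dict:
    """Map each key (IP address or 'default') to its DC:RACK value."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            key, sep, value = line.partition("=")
            if sep:
                entries[key.strip()] = value.strip()
    return entries


def add_replacement_entry(text: str, dead_ip: str, new_ip: str) -> str:
    """Append new_ip with the dead node's DC:RACK; keep the dead node's entry."""
    dc_rack = parse_topology(text)[dead_ip]
    return text.rstrip("\n") + f"\n{new_ip}={dc_rack}\n"


def remove_entry(text: str, ip: str) -> str:
    """Remove the entry for ip; run only after the 72-hour wait described above."""
    kept = [line for line in text.splitlines()
            if line.partition("=")[0].strip() != ip]
    return "\n".join(kept) + "\n"
```

Apply add_replacement_entry on every node before starting the replacement, and apply remove_entry for the dead node's address only once its LEFT status is gone from gossip.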