Replacing a dead node or dead seed node
The following steps show you how to replace a dead node. The procedure for replacing a dead node is the same for vnodes and single-token nodes. Extra steps are required for replacing dead seed nodes.
Only add new nodes to the cluster: A new node is a system that DSE has never started. The node must have absolutely no previous data. Data loss or corruption can occur if you add a node that was previously used for testing or moved from another cluster because the older data is merged into the existing cluster data.
- Run nodetool status to verify the node’s status and state. This command returns a two-letter code describing the status and state of the node. For example, UN means the node is in Up status and Normal state.

  To replace a node, DSE must not be running on the node. This means that the DSE Java process is stopped or the host itself is offline.
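The two-letter code can be expanded mechanically. The following is a minimal sketch, not part of DSE: decode_node_code is a hypothetical helper name, and the letter meanings are the status and state values used on this page.

```shell
# Hypothetical helper: expand a two-letter nodetool status code (for example, UN).
decode_node_code() {
  code="$1"
  case "${code%?}" in        # first letter = status
    U) status="Up" ;;
    D) status="Down" ;;
    *) status="Unknown" ;;
  esac
  case "${code#?}" in        # second letter = state
    N) state="Normal" ;;
    L) state="Leaving" ;;
    J) state="Joining" ;;
    M) state="Moving" ;;
    S) state="Stopped" ;;
    *) state="Unknown" ;;
  esac
  echo "$status $state"
}

decode_node_code UN   # prints: Up Normal
decode_node_code DL   # prints: Down Leaving
```

The helper only interprets a code that you pass in. It doesn't contact the cluster, so the code itself must still come from nodetool status output.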
  A node in Stopped (S) state isn’t the same as stopping DSE on the node. A node enters the Stopped state if the nodetool drain command is issued on the node itself, or if the disk policy is set to disk_failure_policy: stop and the policy is triggered due to disk issues. A stopped state means that the DSE process is still running and can still respond to JMX commands. However, connections are stopped for gossip (port 7000) and clients (port 9042).

  To replace a node, DSE cannot be running on the node and the node must be seen in a normal (N) state from other nodes. The node you want to replace cannot be marked as Joining (J) or Leaving (L) the cluster.

  When a node is in Down (D) status, the possible state values depend on the node’s version of DSE. Review the following information for your version of DSE to understand what to expect when a node is in Down status. This helps you verify that the node is in a condition to be replaced.

  DSE 6.9.x
  In DSE 6.9.0 and later 6.9.x versions, if a node status is D (down), the state can be one of:

  - N - Normal
  - L - Leaving
  - J - Joining
  - M - Moving
  - S - Stopped

  If a node enters a stopped state, the status and state of the node are shown as DS on the node itself and DN from all the other nodes.

  DSE 6.8.0 to 6.8.25
  In DSE 6.8.0 to 6.8.25, if a node status is D (down), the state can only be S (stopped).

  If Gossip reports the node to be down, the state information doesn’t provide details on the state of the node and always returns stopped.

  Use the output of the nodetool ring command to determine if a node in the down status is in a normal state or if it was in a transitioning state, such as Leaving (leaving the cluster). Check the status reported for the node’s IP on any token belonging to the node. For example, the following result is for a node with IP 1.2.3.12:

    Datacenter: DC1
    ==========
    Address   Rack   Status  State    Load        Owns  Token
                                                        8932492356975004956
    1.2.3.10  RACK3  Up      Normal   105.33 GiB  ?     -8332242847914492341
    1.2.3.11  RACK2  Up      Normal   102.20 GiB  ?     -8236178585342294604
    1.2.3.12  RACK1  Down    Leaving  110.43 GiB  ?     -8053138995941424636
    1.2.3.10  RACK3  Up      Normal   105.33 GiB  ?     -7195762468279176051
    ...

  DSE 6.7.8 through the end of the 6.7.x series
  In DSE 6.7.8 and later 6.7.x versions, if a node status is D (down), the state can only be S (stopped).

  If Gossip reports the node to be down, the state information doesn’t provide details on the state of the node and always returns stopped.

  Use the output of the nodetool ring command to determine if a node in the down status is in a normal state or if it was in a transitioning state, such as Leaving (leaving the cluster). Check the status reported for the node’s IP on any token belonging to the node. For example, the following result is for a node with IP 1.2.3.12:

    Datacenter: DC1
    ==========
    Address   Rack   Status  State    Load        Owns  Token
                                                        8932492356975004956
    1.2.3.10  RACK3  Up      Normal   105.33 GiB  ?     -8332242847914492341
    1.2.3.11  RACK2  Up      Normal   102.20 GiB  ?     -8236178585342294604
    1.2.3.12  RACK1  Down    Leaving  110.43 GiB  ?     -8053138995941424636
    1.2.3.10  RACK3  Up      Normal   105.33 GiB  ?     -7195762468279176051
    ...

  DSE 6.7.0 to 6.7.7
  In DSE 6.7.0 to 6.7.7, if a node status is D (down), the state can be one of:

  - N - Normal
  - L - Leaving
  - J - Joining
  - M - Moving

  If a node enters a stopped state, the status and state of the node are shown as UN on the node itself and DN from all the other nodes.

  DSE 6.0.12 through the end of the 6.0.x series
  In DSE 6.0.12 and later 6.0.x versions, if a node status is D (down), the state can only be S (stopped).

  If Gossip reports the node to be down, the state information doesn’t provide details on the state of the node and always returns stopped.

  Use the output of the nodetool ring command to determine if a node in the down status is in a normal state or if it was in a transitioning state, such as Leaving (leaving the cluster). Check the status reported for the node’s IP on any token belonging to the node. For example, the following result is for a node with IP 1.2.3.12:

    Datacenter: DC1
    ==========
    Address   Rack   Status  State    Load        Owns  Token
                                                        8932492356975004956
    1.2.3.10  RACK3  Up      Normal   105.33 GiB  ?     -8332242847914492341
    1.2.3.11  RACK2  Up      Normal   102.20 GiB  ?     -8236178585342294604
    1.2.3.12  RACK1  Down    Leaving  110.43 GiB  ?     -8053138995941424636
    1.2.3.10  RACK3  Up      Normal   105.33 GiB  ?     -7195762468279176051
    ...

  DSE 6.0.0 to 6.0.11
  In DSE 6.0.0 to 6.0.11, if a node status is D (down), the state can be one of:

  - N - Normal
  - L - Leaving
  - J - Joining
  - M - Moving

  If a node enters a stopped state, the status and state of the node are shown as UN on the node itself and DN from all the other nodes.

  DSE 5.1.33 through the end of the 5.1.x series
  In DSE 5.1.33 and later 5.1.x versions, if a node status is D (down), the state can be one of:

  - N - Normal
  - L - Leaving
  - J - Joining
  - M - Moving
  - S - Stopped

  If a node enters a stopped state, the status and state of the node are shown as DS on the node itself and DN from all the other nodes.

  DSE 5.1.18 to 5.1.32
  In DSE 5.1.18 to 5.1.32, if a node status is D (down), the state can only be S (stopped).

  If Gossip reports the node to be down, the state information doesn’t provide details on the state of the node and always returns stopped.

  Use the output of the nodetool ring command to determine if a node in the down status is in a normal state or if it was in a transitioning state, such as Leaving (leaving the cluster). Check the status reported for the node’s IP on any token belonging to the node. For example, the following result is for a node with IP 1.2.3.12:

    Datacenter: DC1
    ==========
    Address   Rack   Status  State    Load        Owns  Token
                                                        8932492356975004956
    1.2.3.10  RACK3  Up      Normal   105.33 GiB  ?     -8332242847914492341
    1.2.3.11  RACK2  Up      Normal   102.20 GiB  ?     -8236178585342294604
    1.2.3.12  RACK1  Down    Leaving  110.43 GiB  ?     -8053138995941424636
    1.2.3.10  RACK3  Up      Normal   105.33 GiB  ?     -7195762468279176051
    ...

  DSE 5.1.0 to 5.1.17
  In DSE 5.1.0 to 5.1.17, if a node status is D (down), the state can be one of:

  - N - Normal
  - L - Leaving
  - J - Joining
  - M - Moving

  If a node enters a stopped state, the status and state of the node are shown as UN on the node itself and DN from all the other nodes.
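For the DSE versions where a down node is always reported as stopped, the nodetool ring check described above can be narrowed to a single IP address. This is a minimal sketch under stated assumptions: ring_state_for_ip is a hypothetical helper name, and the filter assumes the column order shown in the sample nodetool ring output (Address, Rack, Status, State).

```shell
# Hypothetical helper: read nodetool ring output on stdin and print the
# distinct "Status State" pairs reported for one IP address.
# Assumes the column order Address, Rack, Status, State, ...
ring_state_for_ip() {
  awk -v ip="$1" '$1 == ip { print $3, $4 }' | sort -u
}

# Usage (run on any live node):
#   nodetool ring | ring_state_for_ip 1.2.3.12
```

With the sample output from this page, the helper prints Down Leaving for 1.2.3.12, which indicates that the node was leaving the cluster and isn’t in the normal state required for replacement.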
- Record the datacenter, address, and rack settings of the dead node to use later.
- Add the replacement node to the network and record its IP address.
- If the dead node was a seed node, change the cluster’s seed node configuration on each node:
  - In the cassandra.yaml file for each node, remove the IP address of the dead node from the - seeds list in the seed_provider property.
  - If the cluster needs a new seed node to replace the dead node, add the new node’s IP address to the - seeds list of the other nodes.

    Making every node a seed node isn’t recommended because of increased maintenance and reduced gossip performance. Gossip optimization isn’t critical, but DataStax recommends using a small seed list of approximately three nodes per datacenter.
  - Run nodetool reloadseeds to force the node to read the changes to the - seeds list in the cassandra.yaml file.
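Before running nodetool reloadseeds, it can help to confirm on each node that the dead node’s IP address is really gone from the seed list. The following is a minimal sketch, assuming the active seeds entry sits on a single line of the form - seeds: "ip1,ip2". The helper name seed_listed and the file path in the usage comment are hypothetical.

```shell
# Hypothetical helper: succeed if the given IP appears in the active
# "- seeds:" entry of a cassandra.yaml file (commented-out lines are ignored).
seed_listed() {
  ip="$1"
  file="$2"
  grep -E '^[[:space:]]*-?[[:space:]]*seeds:' "$file" |
    sed -e 's/^.*seeds://' -e 's/[" ]//g' |
    tr ',' '\n' |
    grep -Fxq "$ip"
}

# Usage (the file path is an assumption; use your installation's location):
#   seed_listed 1.2.3.12 /etc/dse/cassandra/cassandra.yaml && echo "still a seed"
```

The exact-line match in the last grep avoids false positives such as 1.2.3.1 matching 1.2.3.10.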
- On an existing node, gather setting information for the new node from the cassandra.yaml file:
  - cluster_name
  - endpoint_snitch
  - Other non-default settings: Use the diff tool to compare current settings with default settings.
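A raw diff of two cassandra.yaml files is dominated by comment changes, so it can be easier to compare only the active settings. The following is a minimal sketch of that idea. The helper names are hypothetical, and you must supply your own copy of the default cassandra.yaml for the installed DSE version.

```shell
# Print only the active (non-comment, non-blank) lines of a YAML file.
active_settings() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Hypothetical helper: diff the active settings of a default cassandra.yaml
# (first argument) against the one currently in use (second argument).
# Lines prefixed with ">" come from the current file.
compare_yaml_settings() {
  tmp_default=$(mktemp)
  tmp_current=$(mktemp)
  active_settings "$1" > "$tmp_default"
  active_settings "$2" > "$tmp_current"
  rc=0
  diff "$tmp_default" "$tmp_current" || rc=$?
  rm -f "$tmp_default" "$tmp_current"
  return "$rc"
}

# Usage (both paths are assumptions):
#   compare_yaml_settings cassandra.yaml.default /etc/dse/cassandra/cassandra.yaml
```

As with diff itself, the helper returns 0 when the active settings are identical and 1 when they differ.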
- Gather rack and datacenter information:
  - If the cluster uses the PropertyFileSnitch, record the rack and datacenter assignments listed in the cassandra-topology.properties file, or copy the file to the new node.
  - If the cluster uses the GossipingPropertyFileSnitch, Amazon EC2 single-region snitch, Amazon EC2 multi-region snitch, or Google Cloud Platform snitch, record the rack and datacenter assignments in the dead node’s cassandra-rackdc.properties file.
- Make sure that the new node meets all prerequisites and then install DSE on the new node, but don’t start DSE.

  Be sure to install the same version of DSE as is installed on the other nodes in the cluster.
- If DSE automatically starts on the node, stop DSE and clear the data that was added automatically on startup.
- Add values to the following properties in the cassandra.yaml file from the information you gathered earlier:
  - auto_bootstrap: If false, set it to true. This option is not explicitly set in the default cassandra.yaml configuration file, and it defaults to true.
  - seeds list

    If the new node is a seed node, make sure it is not listed in its own - seeds list.
- Add the rack and datacenter configuration:
  - If the cluster uses the GossipingPropertyFileSnitch, Amazon EC2 single-region snitch, Amazon EC2 multi-region snitch, or Google Cloud Platform snitch, add the dead node’s rack and datacenter assignments to the cassandra-rackdc.properties file on the replacement node.

    Don’t remove the entry for the dead node’s IP address yet.
  - If the cluster uses the PropertyFileSnitch, copy the cassandra-topology.properties file from an existing node to the replacement node. Then, for each node in the cluster, edit the file to add an entry with the new node’s IP address and the dead node’s rack and datacenter assignments.
- Start the new node with the required options:

  Package installations

  - Add the following option to jvm-server.options:

      -Dcassandra.replace_address_first_boot=<address_of_dead_node>
  - During node replacement, the replacement node runs a repair to make data consistent with respect to LOCAL_QUORUM. To change the replace consistency to ONE (no consistency) or QUORUM (global consistency), set the consistent_replace flag. For example:

      -Ddse.consistent_replace=QUORUM

    Other options that control repair during a consistent replace are:
  - After the node bootstraps, remove replace_address_first_boot and consistent_replace (if specified) from jvm-server.options.

  Tarball installations

  Start each node with the following options:

  - For all nodes, include -Dcassandra.replace_address_first_boot= set to the address of the dead node:

      sudo bin/dse cassandra -Dcassandra.replace_address_first_boot=ADDRESS_OF_DEAD_NODE
  - If applications expect QUORUM or LOCAL_QUORUM consistency levels from the cluster, include the consistent_replace parameter set to either QUORUM or LOCAL_QUORUM:

      sudo bin/dse cassandra -Dcassandra.replace_address_first_boot=ADDRESS_OF_DEAD_NODE -Ddse.consistent_replace=QUORUM

    This ensures data consistency on the replacement node; otherwise, the node might stream from a potentially inconsistent replica, which can cause reads to return stale data.

    Other options that control repair during a consistent replace are:
- Run nodetool status to verify that the new node has bootstrapped successfully.

  For tarball installations, run this command from the /resources/cassandra/bin directory of your DSE installation.
- In environments that use the PropertyFileSnitch, wait at least 72 hours, and then remove the old node’s IP address from the cassandra-topology.properties file on each node.

  This ensures that the old node’s information is removed from gossip. If removed from the property file too soon, problems can occur. Use nodetool gossipinfo to check the gossip status. The node is still in gossip until the LEFT status disappears.

  The cassandra-rackdc.properties file doesn’t contain IP information. Therefore, this step isn’t required when using other snitches, such as GossipingPropertyFileSnitch.
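The two final checks can be scripted. The following is a minimal sketch under stated assumptions, not an official tool. Both helper names are hypothetical. The first helper assumes that nodetool status prints the two-letter code and the address as the first two columns. The second assumes that nodetool gossipinfo prints each endpoint on an unindented line ending in /IP, followed by indented attribute lines such as STATUS.

```shell
# Hypothetical helper: read nodetool status output on stdin and succeed if
# the given IP is reported as UN (Up, Normal).
node_is_up_normal() {
  awk -v ip="$1" '$1 == "UN" && $2 == ip { ok = 1 } END { exit (ok ? 0 : 1) }'
}

# Hypothetical helper: read nodetool gossipinfo output on stdin and succeed
# if the given IP still has a STATUS entry containing LEFT.
left_status_in_gossip() {
  awk -v ip="$1" '
    /^[^ \t]*\// { current = $0; sub(/^.*\//, "", current); next }
    current == ip && /STATUS/ && /LEFT/ { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage (the IP addresses are placeholders):
#   nodetool status | node_is_up_normal 1.2.3.13 && echo "replacement is UN"
#   nodetool gossipinfo | left_status_in_gossip 1.2.3.12 && echo "old node still in gossip"
```

Verify the output format on your DSE version before relying on either check, because the layout of these commands can differ between releases.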