Data replication

A replication strategy determines the nodes where replicas are placed.

DataStax Enterprise (DSE) stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the particular nodes where replicas are placed. The number of copies of each row stored across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row in the cluster. A replication factor of 2 means that there are two copies of each row, with each copy on a different node.

Never use a replication factor of 1. If the single node containing a row goes down, that row cannot be retrieved until the node is back up.

All replicas are equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes later. The replication factor also depends on the node type. Before using DSE in production, you must set the replication factors for analytics keyspaces and security keyspaces.

Two replication strategies are available:

SimpleStrategy

Use only for development, and only if you have a single datacenter and one rack. If you have, or ever intend to have, more than one datacenter, use NetworkTopologyStrategy instead. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring, without considering topology (rack or datacenter location).
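The clockwise walk described above can be sketched in a few lines of Python. This is an illustrative model only, not DSE code: it assumes a hypothetical ring with one token per node (no virtual nodes) and small integer tokens, and the function name simple_strategy_replicas is made up for this example.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, key_token, replication_factor):
    """Return the nodes that would hold replicas for key_token.

    ring is a list of (token, node_name) pairs sorted by token.
    Assumption for this sketch: each node owns exactly one token.
    """
    if replication_factor > len(ring):
        raise ValueError("replication factor exceeds the number of nodes")
    tokens = [token for token, _ in ring]
    # First replica: the node with the first token >= the key's token,
    # wrapping around to the start of the ring if necessary.
    start = bisect_left(tokens, key_token) % len(ring)
    # Additional replicas: the next nodes clockwise, ignoring rack and datacenter.
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


# Hypothetical four-node ring.
ring = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
print(simple_strategy_replicas(ring, key_token=60, replication_factor=3))
# ['node-d', 'node-a', 'node-b']
```

Because the walk ignores topology, all three replicas in this example could sit in the same rack, which is why this strategy is limited to single-rack development setups.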

NetworkTopologyStrategy

Highly recommended for most deployments because it makes expanding your cluster to multiple datacenters much easier. With this strategy, you specify how many replicas you want in each datacenter.

NetworkTopologyStrategy places replicas in the same datacenter by walking the ring clockwise until reaching the first node in another rack. It also attempts to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) can fail at the same time due to power, cooling, or network issues.
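The rack-aware walk can be approximated with the following Python sketch. It is a simplified model under stated assumptions, not the actual DSE implementation: the ring layout, node names, and the function nts_replicas_for_dc are hypothetical, each node owns a single token, and nodes in an already-used rack are only reused once every rack in the datacenter has been visited.

```python
from bisect import bisect_left


def nts_replicas_for_dc(ring, key_token, datacenter, replicas_in_dc):
    """Pick replicas for one datacenter, preferring distinct racks.

    ring is a list of (token, node, datacenter, rack) tuples sorted by token.
    """
    tokens = [entry[0] for entry in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    chosen, skipped, racks_used = [], [], set()
    for i in range(len(ring)):
        _, node, dc, rack = ring[(start + i) % len(ring)]
        if dc != datacenter:
            continue  # only nodes in the requested datacenter count
        if rack in racks_used:
            skipped.append(node)  # same rack as an earlier replica: defer
        else:
            chosen.append(node)
            racks_used.add(rack)
        if len(chosen) == replicas_in_dc:
            return chosen
    # Fewer racks than replicas: fall back to deferred nodes in ring order.
    chosen.extend(skipped[: replicas_in_dc - len(chosen)])
    return chosen


# Hypothetical six-node ring spanning two datacenters.
ring = [
    (0, "n1", "DC1", "rack1"),
    (10, "n2", "DC1", "rack1"),
    (20, "n3", "DC2", "rack1"),
    (30, "n4", "DC1", "rack2"),
    (40, "n5", "DC2", "rack2"),
    (50, "n6", "DC1", "rack3"),
]
print(nts_replicas_for_dc(ring, key_token=5, datacenter="DC1", replicas_in_dc=3))
# ['n2', 'n4', 'n6']
```

In this example each of the three DC1 replicas lands in a different rack, so the loss of any one rack leaves two replicas available.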

When deciding how many replicas to configure in each datacenter, the two primary considerations are satisfying reads locally, without incurring cross-datacenter latency, and handling failure scenarios. The two most common ways to configure multiple-datacenter clusters are:

  • Two replicas in each datacenter: This configuration tolerates the failure of a single node per replication group and still allows local reads at a consistency level of ONE.

  • Three replicas in each datacenter: This configuration tolerates either the failure of one node per replication group at a strong consistency level of LOCAL_QUORUM or multiple node failures per datacenter using consistency level ONE.

Asymmetrical replication groupings are also possible. For example, you can have three replicas in one datacenter to serve real-time application requests and use a single replica elsewhere for running analytics.
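The failure tolerances quoted for these configurations follow from simple arithmetic: a quorum is a majority of the replicas (half, rounded down, plus one), and a request can still succeed as long as the number of live replicas is at least the number the consistency level requires. The short Python sketch below works through the numbers; the function names are chosen for this illustration.

```python
def quorum(replicas):
    """Number of replicas that must respond for a quorum (a majority)."""
    return replicas // 2 + 1


def tolerated_failures(replicas, required_responses):
    """Replica nodes that can be down while the request still succeeds."""
    return replicas - required_responses


for replicas in (2, 3):
    at_one = tolerated_failures(replicas, 1)
    at_local_quorum = tolerated_failures(replicas, quorum(replicas))
    print(f"{replicas} replicas: ONE tolerates {at_one}, "
          f"LOCAL_QUORUM tolerates {at_local_quorum}")
# 2 replicas: ONE tolerates 1, LOCAL_QUORUM tolerates 0
# 3 replicas: ONE tolerates 2, LOCAL_QUORUM tolerates 1
```

With two replicas per datacenter, LOCAL_QUORUM requires both replicas, so only consistency level ONE survives a node failure. Three replicas is the smallest count at which LOCAL_QUORUM tolerates a failed node.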

Replication strategy is defined per keyspace, and is set during keyspace creation. To set up a keyspace, see Creating a keyspace.

For more replication strategy options, see Changing keyspace replication strategy.
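As an illustration of defining the strategy at keyspace creation, the following Python sketch builds a CQL CREATE KEYSPACE statement with per-datacenter replica counts. The helper create_keyspace_cql, the keyspace name app_data, and the datacenter names DC1 and DC2 are hypothetical; in a real cluster the datacenter names must match those reported by the snitch. The helper does no input validation and is meant only to show the shape of the statement.

```python
def create_keyspace_cql(keyspace, replicas_per_datacenter):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    replicas_per_datacenter maps a datacenter name to its replica count.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for datacenter, replicas in replicas_per_datacenter.items():
        parts.append(f"'{datacenter}': {int(replicas)}")
    options = ", ".join(parts)
    return (
        "CREATE KEYSPACE IF NOT EXISTS " + keyspace
        + " WITH replication = {" + options + "};"
    )


# Asymmetrical example: three replicas for real-time requests, one for analytics.
statement = create_keyspace_cql("app_data", {"DC1": 3, "DC2": 1})
print(statement)
# CREATE KEYSPACE IF NOT EXISTS app_data WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 1};
```

The resulting statement can be run in cqlsh or passed to a driver session.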
