Replication and consistency
In Cassandra-based databases, you can configure both the replication and the consistency of your data. The replication factor controls how many replica nodes store a copy of the data, which ensures its availability and durability. The consistency level specifies how many replica nodes must acknowledge a request for it to succeed.
Replication
Astra DB automatically manages replication, ensuring data is distributed across multiple cloud availability zones for fault tolerance and high availability. You cannot manually configure replication in Astra DB.
To configure data replication in Cassandra-based databases, you need to define a keyspace and specify the replication strategy and replication factor. The replication strategy controls how the data is distributed across the cluster. The replication factor determines the number of copies of data that are stored in the cluster. All tables in a keyspace use the same replication strategy and replication factor.
To ensure availability, most production databases should use a replication factor of 3.
Configure replication
DataStax recommends the NetworkTopologyStrategy to distribute data in single or multi-datacenter clusters.
To create a keyspace with 3 replicas in a single datacenter, use the following CQL command:
CREATE KEYSPACE user_profiles WITH replication = {
'class': 'NetworkTopologyStrategy',
'replication_factor': 3
};
To create a keyspace with 3 replicas in the east datacenter and 3 replicas in the west datacenter, use the following CQL command:
CREATE KEYSPACE user_profiles WITH replication = {
'class': 'NetworkTopologyStrategy',
'east': 3,
'west': 3
};
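When the datacenter layout is kept in configuration, the statements above can be generated rather than written by hand. The following is a minimal sketch, not part of any driver API: `create_keyspace_cql` is a hypothetical helper, and it assumes the keyspace and datacenter names come from a trusted source because it does no escaping.

```python
from typing import Mapping


def create_keyspace_cql(keyspace: str, replicas_by_dc: Mapping[str, int]) -> str:
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    Hypothetical helper for illustration; it does no identifier escaping,
    so pass only trusted keyspace and datacenter names.
    """
    if not replicas_by_dc:
        raise ValueError("at least one datacenter is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    for datacenter, replicas in replicas_by_dc.items():
        if replicas < 1:
            raise ValueError(f"replica count for {datacenter!r} must be at least 1")
        options.append(f"'{datacenter}': {replicas}")
    body = ",\n".join(options)
    return f"CREATE KEYSPACE {keyspace} WITH replication = {{\n{body}\n}};"


# Reproduces the multi-datacenter statement shown above.
statement = create_keyspace_cql("user_profiles", {"east": 3, "west": 3})
print(statement)
```

The generated text can then be run in a CQL shell or passed to a driver session.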
Consistency level
In Cassandra-based databases, the consistency level together with the replication factor determines the number of replica nodes that must acknowledge a read or write operation for it to succeed. You can configure the consistency level for both read and write operations.
You can set the consistency level in the driver, in the connection, or for individual operations.
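To make the relationship between consistency level and replication factor concrete, the sketch below computes how many replica acknowledgments a few common levels require in a single datacenter. It assumes the usual majority definition of a quorum, floor(RF / 2) + 1; `quorum` and `required_acks` are illustrative names, not driver functions.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Return how many replicas must acknowledge a request (single datacenter)."""
    fixed_counts = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed_counts:
        needed = fixed_counts[level]
    elif level == "QUORUM":
        needed = quorum(replication_factor)
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"level not covered by this sketch: {level}")
    if needed > replication_factor:
        # This sketch rejects levels that can never be met with so few replicas.
        raise ValueError(f"{level} needs {needed} replicas, but RF is {replication_factor}")
    return needed


for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_acks(level, replication_factor=3))
```

With a replication factor of 3, `QUORUM` needs 2 acknowledgments, so a request can still succeed while one replica is down.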
Write consistency
The database will always attempt to write data to the number of replica nodes specified by the replication factor. Write operations will only succeed if the number of nodes specified by the consistency level acknowledge the write operation.
Low write consistency levels can result in different nodes holding different versions of the data. Cassandra-based databases are eventually consistent since they use internal mechanisms to synchronize data to all nodes over time.
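The toy model below illustrates how a low write consistency level can leave replicas holding different versions of the same data. It is a deliberately simplified sketch: the `Replica` class and `write` function are hypothetical, and real clusters repair the divergence over time with mechanisms that this model omits.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    name: str
    up: bool = True
    data: Dict[str, str] = field(default_factory=dict)


def write(replicas: List[Replica], key: str, value: str, required_acks: int) -> bool:
    """Send the write to every replica; succeed if enough live replicas acknowledge."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
            acks += 1
    return acks >= required_acks


replicas = [Replica("a"), Replica("b"), Replica("c")]
first_ok = write(replicas, "email", "old@example.com", required_acks=3)  # like ALL

replicas[2].up = False  # replica c goes down
second_ok = write(replicas, "email", "new@example.com", required_acks=1)  # like ONE
replicas[2].up = True   # replica c returns with stale data

versions = {r.name: r.data["email"] for r in replicas}
print(first_ok, second_ok, versions)
```

The second write succeeds at the lower level even though replica `c` missed it, so `c` still holds the old value until the data is synchronized.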
Write consistency levels
Level | Description | Usage |
---|---|---|
ALL | All replica nodes must acknowledge the write. | This write consistency level provides the highest consistency, the highest latency, and the lowest availability of any level. |
QUORUM | A quorum of replica nodes across all datacenters must acknowledge the write. | Cross-datacenter communication may incur extra latency. |
ONE | At least one replica node must acknowledge the write. | Use for high availability and low consistency. Note: Astra DB does not support consistency level ONE. |
TWO | At least two replica nodes must acknowledge the write. | Similar to |
THREE | At least three replica nodes must acknowledge the write. | Similar to |
ANY | At least one replica node must acknowledge the write, or if no replica nodes are available, a coordinator node must store a hint. If all replica nodes are down at write time, the data will not be available until the replica nodes for that partition have recovered. | This write consistency level provides the lowest latency, the highest write availability, and the lowest consistency. Note: Astra DB does not support consistency level ANY. |
LOCAL_QUORUM | A quorum of replica nodes in the local datacenter must acknowledge the write. Avoids latency of cross-datacenter communication. | Use |
EACH_QUORUM | A quorum of replica nodes in each datacenter must acknowledge the write. | Use |
LOCAL_ONE | At least one replica node in the local datacenter must acknowledge the write. | Use Note: Astra DB does not support consistency level LOCAL_ONE. |
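The quorum-based levels differ in which replicas they count when a keyspace spans several datacenters. The sketch below works through the arithmetic for the east/west keyspace with 3 replicas in each datacenter. It assumes a quorum is a majority, floor(n / 2) + 1, and the function names are illustrative only.

```python
from typing import Dict, Mapping


def quorum(replicas: int) -> int:
    """Majority of the given number of replicas."""
    return replicas // 2 + 1


def quorum_acks(rf_by_dc: Mapping[str, int]) -> int:
    """QUORUM: a majority of all replicas, counted across every datacenter."""
    return quorum(sum(rf_by_dc.values()))


def local_quorum_acks(rf_by_dc: Mapping[str, int], local_dc: str) -> int:
    """LOCAL_QUORUM: a majority of the replicas in the local datacenter only."""
    return quorum(rf_by_dc[local_dc])


def each_quorum_acks(rf_by_dc: Mapping[str, int]) -> Dict[str, int]:
    """EACH_QUORUM: a majority of the replicas in every datacenter."""
    return {dc: quorum(rf) for dc, rf in rf_by_dc.items()}


replication = {"east": 3, "west": 3}
print("QUORUM:", quorum_acks(replication))
print("LOCAL_QUORUM (east):", local_quorum_acks(replication, "east"))
print("EACH_QUORUM:", each_quorum_acks(replication))
```

With this layout, `QUORUM` needs 4 of the 6 replicas, so at least one acknowledgment must cross datacenters, while `LOCAL_QUORUM` needs only 2 replicas in the local datacenter.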
Read consistency
The database will only attempt to read data from the number of replica nodes specified by the consistency level. Read operations will only succeed if the number of nodes specified by the consistency level acknowledge the read operation.
Read consistency levels
Level | Description | Usage |
---|---|---|
ALL | Queries return the most recent data from all replica nodes in the cluster. All replica nodes must respond. | This read consistency level provides the highest consistency, the highest latency, and the lowest availability of any level. |
QUORUM | Queries return the most recent data from a quorum of replica nodes across all datacenters. | Cross-datacenter communication may incur extra latency. |
ONE | Queries return data from the closest replica. | Use for high availability and low consistency. |
TWO | Queries return the most recent data from two of the closest replicas. Two replica nodes must respond. | Similar to |
THREE | Queries return the most recent data from three of the closest replicas. Three replica nodes must respond. | Similar to |
LOCAL_QUORUM | Queries return the most recent data from a quorum of replicas in the local datacenter. | Use |
EACH_QUORUM | Queries return the most recent data after a quorum of replica nodes in each datacenter has responded. | Use |
LOCAL_ONE | Queries return data from the closest replica node in the local datacenter. | Use |
SERIAL | Read consistency level | Use to achieve linearizable consistency for lightweight transactions. |
LOCAL_SERIAL | Read consistency level | Use to achieve linearizable consistency for lightweight transactions. |
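The phrase "most recent data" in the table can be illustrated with a small model of read reconciliation: the coordinator consults as many replicas as the consistency level requires and returns the value with the newest write timestamp. This is a simplified sketch under stated assumptions: `Response` and `reconcile` are hypothetical names, and the responses are assumed to be ordered from closest to farthest replica.

```python
from typing import List, NamedTuple


class Response(NamedTuple):
    value: str
    write_timestamp: int  # larger means written more recently


def reconcile(responses: List[Response], required: int) -> str:
    """Return the newest value among the first `required` replica responses."""
    if len(responses) < required:
        raise RuntimeError("not enough replicas responded for this consistency level")
    consulted = responses[:required]
    return max(consulted, key=lambda r: r.write_timestamp).value


# The closest replica is stale; the other two hold the newer write.
responses = [
    Response("old@example.com", write_timestamp=100),
    Response("new@example.com", write_timestamp=200),
    Response("new@example.com", write_timestamp=200),
]

print(reconcile(responses, required=1))  # like ONE: only the closest replica
print(reconcile(responses, required=2))  # like QUORUM with a replication factor of 3
```

In this model, reading from a single replica returns the stale value, while reading from two replicas surfaces the newer one.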
Immediate consistency
Immediate consistency ensures that read operations always return the most recent version of data. You can configure a combination of read and write consistency levels to achieve immediate consistency. Immediate consistency is sometimes referred to as strong consistency.
The simplest way to ensure immediate consistency is to use the consistency level ALL for both read and write operations. With this approach, both read and write operations will only succeed if all replica nodes respond to the operation. While this guarantees immediate consistency, it also results in the highest latency and lowest availability.
You can use different consistency levels for read and write operations to achieve immediate consistency. Selecting different consistency levels for read and write operations can help you balance consistency, availability, and latency.
The formula for immediate consistency is:
Write Consistency Level + Read Consistency Level > Replication Factor
The following table contains some examples of write and read consistency levels that achieve immediate consistency.
Write Consistency Level | Read Consistency Level | Description |
---|---|---|
LOCAL_QUORUM | LOCAL_QUORUM | Average latency and availability. Consistency levels are met in the local datacenter. Suitable for balanced workloads. |
ALL | ONE | High write latency, low write availability, low read latency, high read availability. Suitable for read-heavy workloads. |
ONE | ALL | Low write latency, high write availability, high read latency, low read availability. Suitable for write-heavy workloads. |
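The formula works because, when the number of replicas written plus the number of replicas read exceeds the replication factor, every read must include at least one replica that received the latest write. The sketch below checks a few combinations for a replication factor of 3 in a single datacenter. The function name and the chosen combinations are illustrative, and a quorum is assumed to be floor(RF / 2) + 1.

```python
def is_immediately_consistent(write_acks: int, read_acks: int, replication_factor: int) -> bool:
    """Apply the rule: write level + read level > replication factor."""
    return write_acks + read_acks > replication_factor


rf = 3
quorum = rf // 2 + 1  # 2 when the replication factor is 3

combinations = {
    "QUORUM write, QUORUM read": (quorum, quorum),
    "ALL write, ONE read": (rf, 1),
    "ONE write, ALL read": (1, rf),
    "QUORUM write, ONE read": (quorum, 1),
    "ONE write, ONE read": (1, 1),
}

results = {
    name: is_immediately_consistent(write_acks, read_acks, rf)
    for name, (write_acks, read_acks) in combinations.items()
}

for name, ok in results.items():
    print(f"{name}: {'immediate' if ok else 'eventual'}")
```

The first three combinations satisfy the rule. The last two do not, because a read can be served entirely by replicas that missed the latest write.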