How are consistent read and write operations handled?
An introduction to how Cassandra extends eventual consistency with tunable consistency to vary the consistency of data read and written.
Consistency refers to how up-to-date and synchronized a row of Cassandra data is on all of its replicas. Using repair operations, Cassandra data will eventually be consistent in all replicas. Repairs work to decrease the variability in replica data, but at a given time, stale data can be present. Cassandra is a AP system according to the CAP theorem, providing high availability and partition tolerance. Cassandra does have flexibility in its configuration, though, and can perform more like a CP (consistent and partition tolerant) system according to the CAP theorem, depending on the application requirements. Two consistency features are tunable consistency and linearizable consistency.
Tunable consistency
To ensure that data is written and read correctly, Cassandra extends the concept of eventual consistency by offering tunable consistency. Tunable consistency allows individual read or write operations to be as strongly consistent as required by the client application. The consistency level of each read or write operation can be set, so that the data returned is more or less consistent, based on need. The tradeoff between operation latency and consistency level can be tuned down to the per-operation level, or set globally for a cluster or datacenter. Using tunable consistency, Cassandra can act more like a CP (consistent and partition tolerant) or AP (highly available and partition tolerant) system according to the CAP theorem, depending on the application requirements.
The consistency level determines only the number of replicas that need to acknowledge the read or write operation success to the client application. For read operations, the read consistency level specifies how many replicas must respond to a read request before returning data to the client application. Read operations will use read repair to update stale data in the background if discovered during a read operation.
For write operations, the write consistency level specified how many replicas must respond to a write request before the write is considered successful. Even at low consistency levels, Cassandra writes to all replicas of the partition key, including replicas in other datacenters. The write consistency level just specifies when the coordinator can report to the client application that the write operation is considered completed. Write operations will use hinted handoffs to ensure the writes are completed when replicas are down or otherwise not responsive to the write request.
Typically, a client specifies a consistency level that is less than the replication factor specified by the keyspace. Another common practice is to write at a consistency level of QUORUM and read at a consistency level of QUORUM. The choices made depend on the client application's needs, and Cassandra provides maximum flexibility for application design.
Linearizable consistency
In ACID terms, linearizable consistency (or serial consistency) is a serial (immediate) isolation level for lightweight transactions. Cassandra does not use locking or transactional dependencies when concurrently updating multiple rows or tables. However, sometimes operations must be performed in sequence and not interrupted by other operations. For example, a typical use case is the creation of user accounts, where a duplication or overwrite will have serious consequences. Linearizable consistency is not required for all aspects of the user's account, but the unique identifier like the userID or email address that claims the account is treated differently. Such a serial operation is implemented in Cassandra with the Paxos consensus protocol, which uses a quorum-based algorithm. Lightweight transactions can be implemented without the need for a master database or two-phase commit process.
Lightweight transaction write operations use the serial consistency level for Paxos consensus and the regular consistency level for the write to the table. For more information, see Lightweight Transactions.
Calculating consistency
R + W > N
where - R is the consistency level of read operations
- W is the consistency level of write operations
- N is the number of replicas
R + W =< N
where - R is the consistency level of read operations
- W is the consistency level of write operations
- N is the number of replicas
Additional consistency examples:
- You do a write at ONE, the replica crashes one second later. The other messages are not delivered. The data is lost.
- You do a write at ONE, and the operation times out. Future reads can return the old or the new value. You will not know the data is incorrect.
- You do a write at ONE, and one of the other replicas is down. The node comes back online. The application will get old data from that node until the node gets the correct data or a read repair occurs.
- You do a write at QUORUM, and then a read at QUORUM. One of the replicas dies. You will always get the correct data.