How do I accomplish lightweight transactions with linearizable consistency?
Distributed databases present a unique challenge when data must be read and written in strict sequential order. In transactions such as creating user accounts or transferring money, a race between two competing writes must be resolved so that one write is guaranteed to precede the other. The DataStax Enterprise (DSE) database uses the Paxos consensus protocol to implement lightweight transactions that can handle concurrent operations.
The Paxos protocol is implemented in the database with linearizable consistency, which ensures transaction isolation at a level similar to the serializable level offered by relational database management systems (RDBMSs). This type of transaction is known as compare and set (CAS). Replica data is compared and any data out of date is set to the most consistent value. In DSE, the process combines the Paxos protocol with normal read and write operations to accomplish the CAS operation.
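The compare-and-set idea can be illustrated with a minimal in-memory sketch. This is not DSE code: the `InMemoryTable` class and its methods are hypothetical names chosen for illustration, and a local lock stands in for the coordination that Paxos provides across replicas. It only shows the contract of a conditional write: the condition check and the write happen as one step, and the caller learns whether the write was applied.

```python
import threading


class InMemoryTable:
    """Toy single-process table (hypothetical). A lock stands in for Paxos."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, value):
        """Compare (is the key absent?) and set (store the value) atomically.

        Returns (applied, current_value).
        """
        with self._lock:
            if key in self._rows:
                # Condition failed: report the existing value, write nothing.
                return False, self._rows[key]
            self._rows[key] = value
            return True, value

    def update_if(self, key, expected, new_value):
        """Set new_value only if the stored value equals expected."""
        with self._lock:
            if key not in self._rows or self._rows[key] != expected:
                return False, self._rows.get(key)
            self._rows[key] = new_value
            return True, new_value


table = InMemoryTable()
# Two competing account creations for the same user name: only one applies.
first = table.insert_if_not_exists("alice", {"email": "alice@example.com"})
second = table.insert_if_not_exists("alice", {"email": "other@example.com"})
print(first[0], second[0])
```

The second call reports that it was not applied and returns the value that won, which is the information a client needs in order to decide whether to retry or give up.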
The Paxos protocol is implemented as a series of phases:
1. Prepare/Promise
2. Read/Results
3. Propose/Accept
4. Commit/Acknowledge
These phases are actions that take place between a proposer and acceptors. Any node can be a proposer, and multiple proposers can be operating at the same time. For simplicity, this description will use only one proposer.
A proposer prepares by sending a message that includes a proposal number to a quorum of acceptors. Each acceptor promises to accept the proposal if the proposal number is the highest it has received. After the proposer receives a promise from a quorum of acceptors, the value for the proposal is read from each acceptor and sent back to the proposer. The proposer determines which value to use and proposes that value, along with the proposal number, to a quorum of the acceptors. Each acceptor accepts the proposal with that number if the acceptor has not already promised to accept a proposal with a higher number. The value is committed and acknowledged as a write operation if all conditions are met.
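The exchange described above can be sketched as a small single-proposer simulation. This is an illustrative model under stated assumptions, not the DSE implementation: the `Acceptor` class and `run_transaction` function are hypothetical names, in-process method calls stand in for network round trips, and the rule for choosing a value (reuse the highest-numbered value an acceptor has accepted but not yet committed, otherwise use the proposer's own value) is the classic Paxos rule.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Acceptor:
    promised: int = 0           # highest proposal number promised so far
    accepted_n: int = 0         # number of an accepted, uncommitted proposal
    accepted_value: Any = None  # value of that proposal
    committed: Any = None       # last committed value

    def prepare(self, n: int) -> bool:
        # Promise only if n is the highest proposal number seen so far.
        if n > self.promised:
            self.promised = n
            return True
        return False

    def read(self) -> Tuple[int, Any]:
        # Report any accepted-but-uncommitted proposal to the proposer.
        return self.accepted_n, self.accepted_value

    def accept(self, n: int, value: Any) -> bool:
        # Accept unless a higher-numbered proposal has been promised.
        if n >= self.promised:
            self.promised, self.accepted_n, self.accepted_value = n, n, value
            return True
        return False

    def commit(self, n: int, value: Any) -> bool:
        self.committed = value
        if self.accepted_n == n:
            # The proposal is no longer in progress once committed.
            self.accepted_n, self.accepted_value = 0, None
        return True


def run_transaction(acceptors: List[Acceptor], n: int, value: Any) -> Tuple[bool, Any]:
    """Run the four phases once. Returns (committed, chosen_value)."""
    quorum = len(acceptors) // 2 + 1

    # Phase 1: prepare / promise
    promised = [a for a in acceptors if a.prepare(n)]
    if len(promised) < quorum:
        return False, None

    # Phase 2: read / results
    prev_n, prev_value = max((a.read() for a in promised), key=lambda r: r[0])
    chosen = prev_value if prev_n > 0 else value

    # Phase 3: propose / accept
    accepted = [a for a in promised if a.accept(n, chosen)]
    if len(accepted) < quorum:
        return False, None

    # Phase 4: commit / acknowledge
    acks = [a for a in accepted if a.commit(n, chosen)]
    return len(acks) >= quorum, chosen


replicas = [Acceptor() for _ in range(3)]
ok, committed_value = run_transaction(replicas, n=1, value="alice")
stale_ok, _ = run_transaction(replicas, n=1, value="bob")   # number already used
later_ok, later_value = run_transaction(replicas, n=2, value="bob")
print(ok, stale_ok, later_ok)
```

Reusing proposal number 1 fails in the prepare phase because every acceptor has already promised that number, while the proposal numbered 2 succeeds. Each of the four phases in `run_transaction` corresponds to one round trip between the proposer and the acceptors.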
These four phases require four round trips between the node proposing a lightweight transaction and the cluster replicas involved in the transaction, so a lightweight transaction has higher latency than a normal write. Reserve lightweight transactions for situations where concurrency must be considered.
Lightweight transactions block other lightweight transactions from occurring, but do not stop normal read and write operations from occurring. Lightweight transactions use a timestamping mechanism different from that of normal operations, so mixing lightweight transactions and normal operations can result in errors. If lightweight transactions are used to write to a row within a partition, only lightweight transactions should be used for both read and write operations. This caution applies to all operations, whether individual or batched.
For example, the following series of operations can fail because a normal DELETE is mixed with a lightweight INSERT:
DELETE ...
INSERT .... IF NOT EXISTS
SELECT ....
The following series of operations will work:
DELETE ... IF EXISTS
INSERT .... IF NOT EXISTS
SELECT .....