How do I accomplish lightweight transactions with linearizable consistency?
Distributed databases present a unique challenge when data must be strictly read and written sequentially. In transactions for creating user accounts or transferring money, race conditions between two potential writes must be regulated to ensure that one write precedes the other. The DataStax Enterprise (DSE) database uses the Paxos consensus protocol to implement lightweight transactions that can handle concurrent operations.
The database implements the Paxos protocol with linearizable consistency. This ensures transaction isolation at a level similar to the serializable level that relational database management systems (RDBMSs) offer. This type of transaction is known as compare and set (CAS). Replica data is compared and any out-of-date data is set to the most consistent value. In DSE, the process combines the Paxos protocol with normal read and write operations to accomplish the CAS operation.
The Paxos protocol is implemented as a series of phases:
-
Prepare/Promise
-
Read/Results
-
Propose/Accept
-
Commit/Acknowledge
These phases are actions that take place between a proposer and acceptors. Any node can be a proposer, and multiple proposers can be operating at the same time. For simplicity, this description uses only one proposer.
In the prepare phase, a proposer sends a message that includes a proposal number to all of the nodes responsible for storing a replica of the key. Each acceptor promises to accept the proposal if the proposal number is the highest they have received. After the proposer receives a promise from a quorum of acceptors, the value for the proposal is read from each acceptor and sent back to the proposer. The proposer determines which value to use and proposes the value to a quorum of the acceptors along with the proposal number. Each acceptor accepts the proposal with a certain number if the acceptor is not already promised to a proposal with a high number. The value is committed and acknowledged as a write operation if all conditions are met.
These four phases require four round trips between a node proposing a lightweight transaction and any cluster replicas involved in the transaction. Therefore, performance will be affected. Reserve lightweight transactions for situations where concurrency must be considered.
Lightweight transactions block other lightweight transactions from occurring, but do not stop normal read and write operations from occurring. Because lightweight transactions use a timestamping mechanism different from normal operations, mixing lightweight transactions and normal operations can result in errors. When using lightweight transactions to write to a row within a partition, you must also use them for both read and write operations. This caution applies to all operations, whether individual or batched.
For example, the following series of operations can fail:
DELETE ...
INSERT .... IF NOT EXISTS
SELECT ....
The following series of operations will work:
DELETE ... IF EXISTS
INSERT .... IF NOT EXISTS
SELECT .....
Reads with linearizable consistency
A SERIAL
consistency level allows reading the current (and possibly uncommitted) state of data without proposing a new addition or update.
If a SERIAL
read finds an uncommitted transaction in progress, then the database performs a read repair as part of the commit.