How is data updated?
The DataStax Enterprise (DSE) database treats each new row as an upsert: if the new row has the same primary key as an existing row, the database processes it as an update to the existing row.
During a write, DataStax Enterprise adds each new row to the database without checking whether a duplicate record exists. As a result, many versions of the same row may exist in the database at the same time.
Periodically, the rows stored in memory are streamed to disk into structures called SSTables. At certain intervals, the database compacts smaller SSTables into larger SSTables. If the database encounters two or more versions of the same row during this process, it writes only the most recent version to the new SSTable. After compaction, the database drops the original SSTables, deleting the outdated rows.
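The compaction step described above can be illustrated with a minimal Python sketch. It is not how DSE implements compaction: the RowVersion type, the dictionary used as a stand-in for an SSTable, and the compact function are all hypothetical names chosen for illustration, and the sketch ignores deletions and timestamp ties.

```python
from typing import Dict, List, NamedTuple


class RowVersion(NamedTuple):
    timestamp: int  # write time; a larger value means a more recent write
    value: str


# Hypothetical model: an "SSTable" maps a primary key to one row version.
SSTable = Dict[str, RowVersion]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge several SSTables, keeping only the newest version of each row."""
    merged: SSTable = {}
    for table in sstables:
        for key, version in table.items():
            current = merged.get(key)
            # Keep the incoming version only if it is newer than what we hold.
            if current is None or version.timestamp > current.timestamp:
                merged[key] = version
    return merged


older = {
    "user:1": RowVersion(100, "alice@old.example"),
    "user:2": RowVersion(105, "bob@example.com"),
}
newer = {"user:1": RowVersion(200, "alice@new.example")}

# The new SSTable holds one version per key; the inputs can then be dropped.
compacted = compact([older, newer])
```

In this model, the outdated version of "user:1" disappears only because the input tables are discarded after the merge, which mirrors the way the database drops the original SSTables.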
Most DSE installations store replicas of each row on two or more nodes. Each node performs compaction independently. This means that even after out-of-date versions of a row have been dropped from one node, they may still exist on another node.
This is why the database performs another round of comparisons during a read. When a client requests data with a particular primary key, DataStax Enterprise may retrieve several versions of the row from one or more replicas. The version with the most recent timestamp is the only one returned to the client ("last-write-wins").
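The read-time comparison can be sketched the same way. The example below assumes each replica answers with either a (timestamp, value) pair or nothing at all. The read_latest function and the sample data are hypothetical, and the sketch leaves out consistency levels and any repair of stale replicas.

```python
from typing import List, Optional, Tuple

# Hypothetical response shape: (write timestamp, row value).
Version = Tuple[int, str]


def read_latest(replica_responses: List[Optional[Version]]) -> Optional[Version]:
    """Return the version with the most recent timestamp (last-write-wins)."""
    found = [v for v in replica_responses if v is not None]
    if not found:
        return None  # no replica holds a row for this primary key
    return max(found, key=lambda v: v[0])


# One replica is stale, one is current, and one has no copy of the row.
responses = [(100, "stale"), (250, "fresh"), None]
result = read_latest(responses)
```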
Some database operations may write only partial updates of a row, so some versions of a row may include some columns, but not all. During a compaction or read, the database assembles a complete version of each row from the partial updates, using the most recent version of each column.
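Assembling a row from partial updates amounts to applying last-write-wins to each column separately. The sketch below assumes every column value carries its own timestamp. The Cell and PartialRow types and the assemble_row function are hypothetical, and deletions and timestamp ties are not modeled.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, str]        # hypothetical: (write timestamp, column value)
PartialRow = Dict[str, Cell]  # column name -> cell; may omit some columns


def assemble_row(partials: List[PartialRow]) -> Dict[str, str]:
    """Build a complete row from the newest cell of each column."""
    newest: Dict[str, Cell] = {}
    for partial in partials:
        for column, cell in partial.items():
            # Compare timestamps per column, not per row.
            if column not in newest or cell[0] > newest[column][0]:
                newest[column] = cell
    return {column: value for column, (_, value) in newest.items()}


# The first write sets both columns; the later write touches only "email".
insert = {"name": (100, "Alice"), "email": (100, "alice@old.example")}
update = {"email": (200, "alice@new.example")}
row = assemble_row([insert, update])
```

The assembled row takes "name" from the earlier write and "email" from the later one, so neither stored version matches the result on its own.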