How is data updated?

A brief description of how the DataStax Distribution of Apache Cassandra 3.11 database updates data.

The DataStax Distribution of Apache Cassandra™ (DDAC) database treats each new row as an upsert: if the new row has the same primary key as that of an existing row, the database processes it as an update to the existing row.

During a write, Cassandra adds each new row to the database without checking whether a duplicate record exists. As a result, many versions of the same row can exist in the database at the same time.
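The following is a deliberately simplified model of this write path, not Cassandra's actual storage engine. It assumes each write is appended as a separate version of the row. The class and method names (`ToyStore`, `write`, `versions_of`) are hypothetical. Because there is no read-before-write, an insert and an update with the same primary key behave identically.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    """One stored version of a row (hypothetical toy model, not Cassandra's format)."""
    key: str        # primary key
    value: dict     # column name -> value
    timestamp: int  # write timestamp


@dataclass
class ToyStore:
    """Append-only toy store: writes never check for an existing row."""
    versions: list = field(default_factory=list)

    def write(self, key: str, value: dict, timestamp: int) -> None:
        # No read-before-write: the new version is simply appended,
        # so an "insert" and an "update" of the same key look the same.
        self.versions.append(Version(key, value, timestamp))

    def versions_of(self, key: str) -> list:
        # Every version ever written for this key is still present.
        return [v for v in self.versions if v.key == key]


store = ToyStore()
store.write("user:1", {"name": "Ada"}, timestamp=1)
store.write("user:1", {"name": "Ada L."}, timestamp=2)
print(len(store.versions_of("user:1")))  # 2: both versions coexist
```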

Periodically, the rows stored in memory are flushed to disk into structures called SSTables. At certain intervals, the database compacts smaller SSTables into larger SSTables. If the database encounters two or more versions of the same row during this process, it writes only the most recent version to the new SSTable. After compaction, the database drops the original SSTables, deleting the outdated rows.
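A minimal sketch of the compaction step described above, assuming a toy representation in which an SSTable is just a list of `(primary_key, value, timestamp)` tuples. Real SSTables are immutable, sorted on-disk files with a much richer format, and `compact` is a hypothetical name. The sketch keeps only the newest version of each key; on a timestamp tie it keeps the version seen first, which is a simplification.

```python
from typing import Dict, List, Tuple

# Hypothetical toy representation of one row version.
Row = Tuple[str, str, int]  # (primary_key, value, timestamp)


def compact(sstables: List[List[Row]]) -> List[Row]:
    """Merge several toy SSTables, keeping only the newest version of each key."""
    newest: Dict[str, Row] = {}
    for sstable in sstables:
        for row in sstable:
            key, _, ts = row
            current = newest.get(key)
            # Replace the kept version only if this one has a later timestamp.
            if current is None or ts > current[2]:
                newest[key] = row
    # In this toy model the output "SSTable" is sorted by primary key.
    return sorted(newest.values(), key=lambda r: r[0])


old = [("a", "v1", 1), ("b", "x1", 1)]
new = [("a", "v2", 5)]
print(compact([old, new]))  # [('a', 'v2', 5), ('b', 'x1', 1)]
```

The outdated `("a", "v1", 1)` does not survive into the merged output, which mirrors the database dropping outdated rows once the original SSTables are discarded.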

Most Cassandra installations store replicas of each row on two or more nodes. Each node performs compaction independently. This means that even if out-of-date versions of a row have been dropped from one node, they may still exist on another node.

This is why the database performs another round of comparisons during a read. When a client requests data with a particular primary key, Cassandra may retrieve several versions of the row from one or more replicas. Only the version with the most recent timestamp is returned to the client ("last-write-wins").
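The last-write-wins rule can be sketched as follows. This is a toy model under stated assumptions, not Cassandra's read path: each replica is assumed to answer with its own newest version of the row as a `(value, timestamp)` pair, or `None` if it holds no data for the key. The function name `reconcile` is hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical toy model: one replica's answer for the requested row.
Version = Tuple[str, int]  # (value, timestamp)


def reconcile(replica_responses: List[Optional[Version]]) -> Optional[Version]:
    """Last-write-wins: return the response with the most recent timestamp."""
    candidates = [r for r in replica_responses if r is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[1])


# Node B still holds an out-of-date version of the row; node A has the newer one.
node_a = ("email=new@example.com", 20)
node_b = ("email=old@example.com", 10)
print(reconcile([node_b, node_a]))  # ('email=new@example.com', 20)
```

Because the comparison uses timestamps rather than the order in which replicas respond, the stale copy on node B never reaches the client.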

Note: Some database operations write only partial updates of a row, so some versions of a row may include some columns but not all. During compaction or a read, the database assembles a complete version of each row from the partial updates, using the most recent version of each column.
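To illustrate the column-level merge, here is a minimal sketch assuming a toy model in which each partial version maps column names to `(value, timestamp)` pairs and may omit columns. `merge_partial_versions` is a hypothetical name, not a Cassandra API. Each column is resolved independently, so the assembled row can combine values that came from different writes.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: column name -> (value, timestamp) for one partial version.
Cells = Dict[str, Tuple[str, int]]


def merge_partial_versions(versions: List[Cells]) -> Dict[str, str]:
    """Assemble a full row, taking the newest value of each column independently."""
    merged: Cells = {}
    for version in versions:
        for column, (value, ts) in version.items():
            # Keep whichever value of this column has the later timestamp.
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    # Drop the timestamps from the assembled row.
    return {column: value for column, (value, _) in merged.items()}


v1 = {"name": ("Ada", 1), "email": ("ada@old.example", 1)}
v2 = {"email": ("ada@new.example", 3)}  # partial update: only "email"
v3 = {"name": ("Ada Lovelace", 2)}      # partial update: only "name"
print(merge_partial_versions([v1, v2, v3]))
# {'name': 'Ada Lovelace', 'email': 'ada@new.example'}
```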