How is data deleted?
How DataStax Enterprise deletes data and why deleted data can reappear.
The processes for deleting data are designed to improve performance and to work with the built-in data distribution and fault-tolerance properties of the DataStax Enterprise (DSE) database.
Marking a record (row or column) with a time-to-live (TTL) value means that when the specified time ends, the database marks the record with a tombstone and handles it like any other tombstoned record.
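The following Python sketch models the TTL rule above for a single value. It assumes times are whole seconds and that a value stops being live at exactly its write time plus its TTL. The `Cell` class and the `expires_at` and `read` helpers are hypothetical illustrations, not part of any DSE API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """Hypothetical model of one stored value; times are in seconds."""
    value: Optional[str]
    write_time: int
    ttl: Optional[int] = None  # None means the value never expires


def expires_at(cell: Cell) -> Optional[int]:
    """Moment at which a TTL-marked value stops being live, or None."""
    if cell.ttl is None:
        return None
    return cell.write_time + cell.ttl


def read(cell: Cell, now: int) -> Optional[str]:
    """Return the value while it is live. Once the TTL has elapsed, the value
    is treated like a tombstoned record and a read returns nothing."""
    deadline = expires_at(cell)
    if deadline is not None and now >= deadline:
        return None
    return cell.value
```

For example, a value written at second 1000 with a TTL of 60 is still readable at second 1059 and reads as deleted from second 1060 onward.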
Deletion in a distributed system
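In a multi-node cluster, the DSE database can store replicas of the same data on two or more nodes. This helps prevent data loss, but it complicates the delete process. If a node receives a delete for data it stores locally, the node marks the specified record with a tombstone and tries to pass the tombstone to the other nodes that hold replicas of that record. If one replica node is unresponsive at that time, it does not receive the tombstone immediately, so it still contains the pre-delete version of the record. If the tombstone has already been removed from the rest of the cluster by the time that node recovers, the database treats the record on the recovered node as new data and propagates it back to the other replicas. A deleted record that reappears in this way is called a zombie.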
To prevent the reappearance of zombies, the database gives each tombstone a grace period. The purpose of the grace period is to give unresponsive nodes time to recover and process tombstones normally. When a read request collects responses from multiple replicas and those responses differ, the most recent value takes precedence. For example, if one node has a tombstone but another node has a more recent change, the final result includes the more recent change.
If a node has a tombstone and another node has only an older value for the record, the final record has the tombstone. If a client writes a new value for the record during the grace period, that newer write overrides the tombstone.
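A minimal Python sketch of the read-time reconciliation described above, assuming each replica returns one timestamped version of the record and that `None` stands for a tombstone. The `Response`, `reconcile`, and `visible_value` names are hypothetical. Breaking a timestamp tie in favor of the tombstone is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Response:
    """One replica's answer for a single record (hypothetical model)."""
    timestamp: int          # write time of the version this replica holds
    value: Optional[str]    # None represents a tombstone

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


def reconcile(responses: Iterable[Response]) -> Response:
    """Pick the most recent version; on a timestamp tie the tombstone wins
    because True sorts after False in the key tuple."""
    return max(responses, key=lambda r: (r.timestamp, r.is_tombstone))


def visible_value(responses: Iterable[Response]) -> Optional[str]:
    """What the client sees: nothing if the winning version is a tombstone."""
    return reconcile(responses).value
```

With these rules, a tombstone at timestamp 20 hides an older value written at 10, while a value written at 30 hides the tombstone.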
When an unresponsive node recovers, DataStax Enterprise uses hinted handoff to replay the database mutations that the node missed while it was down. DSE does not replay a mutation for a tombstoned record during its grace period. If the node does not recover until after the grace period ends, it might miss the deletion.
After the tombstone's grace period ends, DSE deletes the tombstone during compaction.
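The toy simulation below walks through the zombie scenario from this section with three replicas and a shortened grace period of 50 seconds. Everything in it is a hypothetical model, not DSE internals: a replica is a Python dictionary that keeps only the newest version of each key, `repair` stands in for anti-entropy repair by letting the newest version of each key win on every replica, and `compact` drops tombstones whose grace period has ended.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: key -> (timestamp, value). A value of None is a
# tombstone, and its timestamp doubles as its creation time (seconds).
Replica = Dict[str, Tuple[int, Optional[str]]]


def write(replica: Replica, key: str, ts: int, value: Optional[str]) -> None:
    """Apply a write or delete; a newer version shadows an older one."""
    current = replica.get(key)
    if current is None or (ts, value is None) > (current[0], current[1] is None):
        replica[key] = (ts, value)


def compact(replica: Replica, now: int, gc_grace: int) -> None:
    """Purge tombstones whose grace period has ended."""
    expired = [k for k, (ts, v) in replica.items() if v is None and ts + gc_grace < now]
    for key in expired:
        del replica[key]


def repair(replicas: List[Replica]) -> None:
    """Copy the newest version of every key to every replica."""
    for key in set().union(*replicas):
        versions = [r[key] for r in replicas if key in r]
        ts, value = max(versions, key=lambda tv: (tv[0], tv[1] is None))
        for r in replicas:
            write(r, key, ts, value)


def simulate(repair_before_purge: bool, gc_grace: int = 50) -> List[Replica]:
    a: Replica = {}
    b: Replica = {}
    c: Replica = {}
    for r in (a, b, c):
        write(r, "row1", 1, "v1")         # data replicated to all three nodes
    for r in (a, b):
        write(r, "row1", 100, None)       # delete at t=100 while node c is down
    if repair_before_purge:
        repair([a, b, c])                 # c recovers and is repaired in time
    for r in (a, b, c):
        compact(r, now=200, gc_grace=gc_grace)  # grace period has ended
    repair([a, b, c])                     # c is repaired (too late in the zombie case)
    return [a, b, c]
```

With `repair_before_purge=False`, node `c` is repaired only after the other replicas have purged the tombstone, so its old value is the newest version left and is copied back to `a` and `b`. With `repair_before_purge=True`, the tombstone reaches `c` first and the record stays deleted.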
Expiring data
The grace period for a tombstone is set by the table property gc_grace_seconds. The default value is 864,000 seconds (ten days), and each table can have its own value for this property. On a single-node cluster, this property can safely be set to zero.
- The expiration date/time for a tombstone is the date/time of its creation plus the value of the gc_grace_seconds property.
- To completely prevent the reappearance of zombie records, run nodetool repair on a node after it recovers, and repair each table at least once within every gc_grace_seconds interval.
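The two rules in the list above reduce to simple date arithmetic. This Python sketch computes a tombstone's expiration time and flags a table whose last repair is at least one gc_grace_seconds interval old. The helper names, and the idea of tracking a per-table last-repair time, are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GC_GRACE_SECONDS = 864_000  # documented default: ten days


def tombstone_expiry(created: datetime,
                     gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> datetime:
    """Creation time plus gc_grace_seconds; after this, compaction may drop it."""
    return created + timedelta(seconds=gc_grace_seconds)


def repair_is_overdue(last_repair: datetime, now: datetime,
                      gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """True once a full gc_grace_seconds interval has passed without a repair."""
    return now - last_repair >= timedelta(seconds=gc_grace_seconds)


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(tombstone_expiry(created))  # 2024-01-11 00:00:00+00:00
```

Under the default setting, a tombstone created at midnight UTC on 1 January 2024 expires at midnight UTC on 11 January 2024.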
If all records in a table are given a TTL at creation, are allowed to expire, and are never deleted manually, it is not necessary to run nodetool repair on that table on a regular basis. For more information about expiring data with TTL, see Expiring data with TTL.
DSE also supports batched inserts and updates. Batching introduces the danger of replaying a record insertion after that record has been removed from the rest of the cluster. DSE does not replay a batched mutation for a tombstoned record that is still within its grace period.
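The sketch below shows why a late batch replay is dangerous, using the last-write-wins rule described earlier for differing replica responses. It is a hypothetical Python model, not DSE's batch log logic, and the `apply`, `purge`, and `late_batch_replay` names are illustrative: while the tombstone exists, it shadows the older replayed insert, but once the tombstone has been purged, the stale insert brings the record back.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model: key -> (timestamp, value); a value of None is a tombstone.
Store = Dict[str, Tuple[int, Optional[str]]]


def apply(store: Store, key: str, ts: int, value: Optional[str]) -> None:
    """Last write wins; on a timestamp tie the tombstone wins."""
    current = store.get(key)
    if current is None or (ts, value is None) > (current[0], current[1] is None):
        store[key] = (ts, value)


def purge(store: Store, now: int, gc_grace: int) -> None:
    """Drop tombstones whose grace period has ended."""
    for key in [k for k, (ts, v) in store.items() if v is None and ts + gc_grace < now]:
        del store[key]


def late_batch_replay(purge_first: bool) -> Store:
    store: Store = {}
    apply(store, "row1", 100, None)          # row1 deleted at t=100
    if purge_first:
        purge(store, now=200, gc_grace=50)   # grace period over, tombstone gone
    apply(store, "row1", 10, "stale")        # batched insert from t=10 replayed late
    return store
```

Without the purge, the replayed insert loses to the newer tombstone; after the purge, nothing remains to shadow it and the deleted record reappears.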
DSE supports immediate deletion through the DROP KEYSPACE and DROP TABLE statements.
Diagram legend
- Data on a node.
- Data on an unavailable replica node.
- Data removed from node.
- Tombstone indicating that data has been deleted.
- Tombstone removed from node.