What are tombstones?
Explains how tombstones are created and the impact on reads.
- Using a CQL DELETE statement
- Expiring data with time-to-live (TTL)
- Using internal operations, such as Using materialized views
- INSERT or UPDATE operations with a
null
value - UPDATE operations with a collection column
When a tombstone is created, it can be marked on different parts of a partition. Based on the location of the marker, tombstones can be categorized into one of the following groups. Each category typically corresponds to one unique type of data deletion operation.
- Partition tombstones
- Row tombstones
- Range tombstones
- ComplexColumn tombstones
- Cell tombstones
- TTL tombstones
The tombstones go through the write path, and are written to SSTables on one or more nodes. A key differentiator of a tombstone is a built-in expiration known as the grace period, set by gc_grace_seconds. At the end of its expiration period, the tombstone is deleted as part of the normal compaction process.
Having an excessive number of tombstones in a table can negatively impact application performance. Many tombstones often indicate potential issues with either the data model or in the application.
Create the keyspace and tables
In the following examples, the cycling
keyspace is used to illustrate
different tombstone categories. Two tables are used:
rank_by_year_and_cycling_name
and
cyclist_career_teams
.
cqlsh
and CQL
commands, using two different terminals is recommended. Before getting started, copy the following commands into a cqlsh
prompt to
create the cycling
keyspace, create both tables, and insert data into the
rank_by_year_and_cycling_name
table.
You insert data later into the cyclist_career_teams
table in Cell tombstones and TTL tombstones.
CREATE KEYSPACE cycling WITH replication =
{'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE cycling.rank_by_year_and_name (
race_year int,
race_name text,
rank int,
cyclist_name text,
PRIMARY KEY ((race_year, race_name), rank)
) WITH CLUSTERING ORDER BY (rank ASC);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Benjamin PRADES', 1);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Adam PHELAN', 2);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Thomas LEBAS', 3);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Giro d''Italia - Stage 11 - Forli > Imola', 'Ilnur ZAKARIN', 1);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Giro d''Italia - Stage 11 - Forli > Imola', 'Carlos BETANCUR', 2);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2014, '4th Tour of Beijing', 'Phillippe GILBERT', 1);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2014, '4th Tour of Beijing', 'Daniel MARTIN', 2);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2014, '4th Tour of Beijing', 'Johan Esteban CHAVES', 3);
CREATE TABLE cycling.cyclist_career_teams (
id UUID PRIMARY KEY,
lastname text,
teams set<text>
);
Flushing to SSTables
After each modification to a table, run the nodetool flush
command on the
cycling
keyspace to flush data from the memtable to the SSTables on disk.
This step is necessary before running sstabledump
to view the output.
nodetool flush cycling;
After flushing the cycling
keyspace, run the sstabledump
command on the SSTable, as shown in the following example.
cd /var/lib/cassandra/data/cycling/rank_by_year_and_name-bc05fba12baf11e8b4a8ad2b042f3e18
sstabledump mc-2-big-Data.db
Partition tombstones
Partition tombstones are generated when an entire partition is deleted explicitly. In the CQL DELETE statement, the WHERE clause is an equality condition against the partition key.
DELETE from cycling.rank_by_year_and_name WHERE
race_year = 2014 AND race_name = '4th Tour of Beijing';
Looking at the sstabledump output for this partition, the deletion_info
tombstone marker is at the partition level, and is not associated with any rows or cells
within the partition.
{
"partition" : {
"key" : [ "2014", "4th Tour of Beijing" ],
"position" : 0,
"deletion_info" : { "marked_deleted" : "2018-05-16T19:40:06.454282Z", "local_delete_time" : "2018-05-16T19:40:06Z" }
},
"rows" : [ ]
}
Row tombstones
Row tombstones are generated when a particular row within a partition is deleted explicitly. The schema has a composite primary key that includes both the partition key and the clustering key. In the CQL DELETE statement, the WHERE clause is an equality condition against both the partition key and the clustering key columns.
DELETE from cycling.rank_by_year_and_name WHERE
race_year = 2015 AND race_name = 'Giro d''Italia - Stage 11 - Forli > Imola' AND rank = 2;
Looking at the sstabledump output for this partition, the deletion_info
tombstone marker is at the row level, and is identified by a clustering key under the
partition. Neither the partition nor the row cells contain the tombstone marker.
{
"partition" : {
"key" : [ "2015", "Giro d'Italia - Stage 11 - Forli > Imola" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 74,
"clustering" : [ 2 ],
"deletion_info" : { "marked_deleted" : "2018-05-18T15:29:06.227148Z", "local_delete_time" : "2018-05-18T15:29:06Z" },
"cells" : [ ]
}
]
}
Range tombstones
Range tombstones occur when several rows within a partition that can be expressed through a range search are deleted explicitly. The schema has a composite primary key that includes both a partition key and a clustering key. In the CQL DELETE statement, the WHERE clause is an equality condition against the partition key, plus an inequality condition against the clustering key.
rank_by_year_and_name
table and then recreate it to populate the table
with the necessary data.DELETE from cycling.rank_by_year_and_name WHERE
race_year = 2015 AND race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu' AND rank > 1;
Looking at the sstabledump output for this partition, the deletion_info
tombstone marker is at the row level. A special boundary marker,
range_tombstone_bound
, marks the range scope (identified by the
clustering key values) of the deleted rows.
{
"partition" : {
"key" : [ "2015", "Tour of Japan - Stage 4 - Minami > Shinshu" ],
"position" : 252
},
"rows" : [
{
"type" : "range_tombstone_bound",
"start" : {
"type" : "inclusive",
"deletion_info" : { "marked_deleted" : "2018-05-18T16:09:21.474713Z", "local_delete_time" : "2018-05-18T16:09:21Z" }
}
},
{
"type" : "range_tombstone_bound",
"end" : {
"type" : "exclusive",
"clustering" : [ 1 ],
"deletion_info" : { "marked_deleted" : "2018-05-18T16:09:21.474713Z", "local_delete_time" : "2018-05-18T16:09:21Z" }
}
}
]
}
ComplexColumn tombstones
ComplexColumn tombstones are generated when inserting or updating a collection type column, such as set, list, and map.
Previously we created the
cyclist_career_teams
table. Run the following cqlsh
command to insert data into that table.
INSERT INTO cycling.cyclist_career_teams (
id,
lastname,
teams)
VALUES (cb07baad-eac8-4f65-b28a-bddc06a0de23, 'ARMITSTEAD', {
'Boels-Dolmans Cycling Team','AA Drink - Leontien.nl','Team Garmin - Cervelo' } );
Looking at the sstabledump output for this partition, no explicit manual deletion occurs on
the partition, but a deletion_info
marker is listed at the cell level for
the collection type column teams
.
{
"partition" : {
"key" : [ "cb07baad-eac8-4f65-b28a-bddc06a0de23" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 130,
"liveness_info" : { "tstamp" : "2018-05-18T16:26:23.779724Z" },
"cells" : [
{ "name" : "lastname", "value" : "ARMITSTEAD" },
{ "name" : "teams", "deletion_info" : { "marked_deleted" : "2018-05-18T16:26:23.779723Z", "local_delete_time" : "2018-05-18T16:26:23Z" } },
{ "name" : "teams", "path" : [ "AA Drink - Leontien.nl" ], "value" : "" },
{ "name" : "teams", "path" : [ "Boels-Dolmans Cycling Team" ], "value" : "" },
{ "name" : "teams", "path" : [ "Team Garmin - Cervelo" ], "value" : "" }
]
}
]
}
Cell tombstones
Cell tombstones are generated when explicitly deleting a value from a cell, such as a
column for a specific row of a partition, or when inserting or updating a cell with
null
values, as shown in the following example.
INSERT INTO cycling.rank_by_year_and_name (
race_year,
race_name,
cyclist_name,
rank)
VALUES (2018, 'Giro d''Italia - Stage 11 - Osimo > Imola', null, 1);
Looking at the "sstabledump" output for this partition, deletion_info
tombstone marker is associated with a particular cell.
{
"partition" : {
"key" : [ "2018", "Giro d'Italia - Stage 11 - Osimo > Imola" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 80,
"clustering" : [ 1 ],
"liveness_info" : { "tstamp" : "2018-05-18T17:13:42.602827Z" },
"cells" : [
{ "name" : "cyclist_name", "deletion_info" : { "local_delete_time" : "2018-05-18T17:13:42Z" }
}
]
}
]
}
TTL tombstones
TTL tombstones are generated when the TTL (time-to-live) period expires. The TTL expiration marker can occur at either the row or cell level. However, Cassandra marks TTL data differently from tombstone data that was explicitly deleted. Even if a partition has only a single row (with no clustering key), the TTL mark is still made at the row level.
The following statement sets TTL for an entire row.
INSERT INTO cycling.cyclist_career_teams (
id,
lastname,
teams)
VALUES (e7cd5752-bc0d-4157-a80f-7523add8dbcd, 'VAN DER BREGGEN', {
'Rabobank-Liv Woman Cycling Team','Sengers Ladies Cycling Team','Team Flexpoint' }) USING TTL 1;
The following statement sets TTL for a single cell.
UPDATE cycling.rank_by_year_and_name USING TTL 1
SET cyclist_name = 'Cloudy Archipelago' WHERE race_year = 2018 AND
race_name = 'Giro d''Italia - Stage 11 - Osimo > Imola' AND rank = 1;
Looking at the sstabledump output for these partitions, the first CQL statement marks the
row (partition key: e7cd5752-bc0d-4157-a80f-7523add8dbcd) with an "expired" :
true
TTL expiration marker in the liveness_info
section.
{
"partition" : {
"key" : [ "e7cd5752-bc0d-4157-a80f-7523add8dbcd" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 134,
"liveness_info" : { "tstamp" : "2018-05-18T17:38:13.135226Z", "ttl" : 1, "expires_at" : "2018-05-18T17:38:14Z", "expired" : true },
"cells" : [
{ "name" : "lastname", "value" : "VAN DER BREGGEN" },
{ "name" : "teams", "deletion_info" : { "marked_deleted" : "2018-05-18T17:38:13.135225Z", "local_delete_time" : "2018-05-18T17:38:13Z" } },
{ "name" : "teams", "path" : [ "Rabobank-Liv Woman Cycling Team" ], "value" : "" },
{ "name" : "teams", "path" : [ "Sengers Ladies Cycling Team" ], "value" : "" },
{ "name" : "teams", "path" : [ "Team Flexpoint" ], "value" : "" }
]
}
]
}
The second CQL statement marks the cell (partition key: 2018, clustering key: 1, column
name: cyclist_name) with an "expired" : true
TTL expiration marker for the
specific cell.
{
"partition" : {
"key" : [ "2018", "Giro d'Italia - Stage 11 - Osimo > Imola" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 95,
"clustering" : [ 1 ],
"cells" : [
{ "name" : "cyclist_name", "value" : "Cloudy Archipelago", "tstamp" : "2018-05-18T18:22:52.532855Z", "ttl" : 1, "expires_at" : "2018-05-18T18:22:53Z", "expired" : true }
]
}
]
}