Restoring from a snapshot
Methods for restoring from a snapshot.
About this task
Restoring a keyspace from a snapshot requires all snapshot files for the table and, if you use incremental backups, any incremental backup files created after the snapshot was taken. Streamed SSTables (from repair, decommission, and so on) are also hardlinked and included.
Restoring from snapshots and incremental backups temporarily causes intensive CPU and I/O activity on the node being restored.
Restoring from local nodes
This method copies the SSTables from the snapshots directory into the correct data directories.
-
Make sure the table schema exists and is the same as when the snapshot was created.
The nodetool snapshot command creates a table schema in the output directory. If the table does not exist, recreate it using the schema.cql file.
-
If necessary, TRUNCATE the target table.
You may not need to truncate under certain conditions. For example, if a node lost a disk, you might restart before restoring so that the node continues to receive new writes before starting the restore procedure.
Truncating is usually necessary. For example, if there was an accidental deletion of data, the tombstone from that delete has a later write timestamp than the data in the snapshot. If you restore without truncating (removing the tombstone), the database continues to shadow the restored data. This behavior also occurs for other types of overwrites and causes the same problem.
-
Locate the most recent snapshot folder. For example:
/var/lib/cassandra/data/<keyspace_name>/<table_name>-<UUID>/snapshots/<snapshot_name>
A snapshot is a hardlink to the SSTable files in the data directory for a schema table at the moment the snapshot is executed. Data is backed up into multiple .db files and the table schema is saved to schema.cql.
-
Run nodetool import. In this command, specify the data directory that contains the snapshot of the SSTable files.
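The last two steps of the local restore can be sketched in Python. This is a minimal illustration, not part of the product tooling: the helper names latest_snapshot_dir and import_command are hypothetical, the directory layout follows the example path shown in the steps, and using the directory modification time to pick the most recent snapshot is an assumption. The command is only assembled, not executed.

```python
from pathlib import Path
from typing import List, Optional


def latest_snapshot_dir(data_root: Path, keyspace: str, table: str) -> Optional[Path]:
    """Return the most recently modified snapshot directory for a table, or None."""
    # Table directories are named <table_name>-<UUID>, so match on the prefix.
    candidates = [
        snap
        for table_dir in (data_root / keyspace).glob(f"{table}-*")
        for snap in (table_dir / "snapshots").glob("*")
        if snap.is_dir()
    ]
    # Assumption: directory modification time is a reasonable proxy for snapshot age.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def import_command(keyspace: str, table: str, snapshot_dir: Path) -> List[str]:
    """Build (but do not run) a nodetool import invocation for one snapshot directory."""
    return ["nodetool", "import", keyspace, table, str(snapshot_dir)]
```

For example, calling latest_snapshot_dir(Path("/var/lib/cassandra/data"), "cycling", "cyclist_expenses") and passing the result to import_command yields an argument list that could be handed to subprocess.run after review.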
Restoring from centralized backups
This method uses sstableloader to restore snapshots.
-
Verify that the SSTable version is compatible with the current version of DSE:
-
Locate the version in the file names.
Use the version number and format in the SSTable file name to determine compatibility and upgrade requirements. The first two letters of the file name are the version, where the first letter indicates a major version and the second letter indicates a minor version.
For example, the following SSTable version is aa and the format is bti:
data/cycling/cyclist_expenses-e4f31e122bc511e8891b23da85222d3d/aa-1-bti-Data.db
-
Use the correct DSE version of sstableupgrade to create a compatible version.
For details on SSTable versions and compatibility, see DataStax Enterprise, Apache Cassandra, CQL, and SSTable compatibility.
-
Make sure the table schema exists and is the same as when the snapshot was created.
The nodetool snapshot command creates a table schema in the output directory. If the table does not exist, recreate it using the schema.cql file.
-
If necessary, TRUNCATE the target table.
You may not need to truncate under certain conditions. For example, if a node lost a disk, you might restart before restoring so that the node continues to receive new writes before starting the restore procedure.
Truncating is usually necessary. For example, if there was an accidental deletion of data, the tombstone from that delete has a later write timestamp than the data in the snapshot. If you restore without truncating (removing the tombstone), the database continues to shadow the restored data. This behavior also occurs for other types of overwrites and causes the same problem.
-
Restore the most recent snapshot using the sstableloader tool on the backed-up SSTables.
The sstableloader tool streams the SSTables to the correct nodes. You do not need to remove the commitlogs, or to drain or restart the nodes.
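The version check in the first step can be sketched as a small Python helper that splits an SSTable file name into its parts. This is an illustration only: the name parse_sstable_name is hypothetical, and the layout <version>-<generation>-<format>-<component> is assumed from the single example file name shown in that step, so other file name layouts are rejected rather than guessed.

```python
import re
from typing import NamedTuple


class SSTableName(NamedTuple):
    version: str     # two letters: major version, then minor version (for example "aa")
    generation: str  # kept as text rather than converted to a number
    fmt: str         # SSTable format (for example "bti")
    component: str   # remaining part of the name (for example "Data.db")


# Assumed layout, based on the example file name: <version>-<generation>-<format>-<component>
_PATTERN = re.compile(r"^([a-z]{2})-([^-]+)-([a-z]+)-(.+)$")


def parse_sstable_name(path: str) -> SSTableName:
    """Extract version, generation, format, and component from an SSTable path."""
    file_name = path.rsplit("/", 1)[-1]
    match = _PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unrecognized SSTable file name: {file_name}")
    return SSTableName(*match.groups())
```

Comparing the returned version against the versions supported by the target DSE release still requires the compatibility reference; the helper only extracts the fields.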