Restoring from a snapshot

Methods for restoring from a snapshot.

Restoring a keyspace from a snapshot requires all snapshot files for each table in the keyspace and, if incremental backups are in use, any incremental backup files created after the snapshot was taken. Streamed SSTables (from repair, decommission, and so on) are also hardlinked and included in the incremental backups.

Note: Restoring from snapshots and incremental backups temporarily causes intensive CPU and I/O activity on the node being restored.

Restoring from local nodes

This method copies the SSTables from the snapshots directory into the correct data directories.

  1. Make sure the table schema exists and is the same as when the snapshot was created.

    The nodetool snapshot command creates a table schema in the output directory. If the table does not exist, recreate it using the schema.cql file.

  2. If necessary, truncate the target table.
    Note: You may not need to truncate under certain conditions. For example, if a node lost a disk, you might restart before restoring so that the node continues to receive new writes before starting the restore procedure.

    Truncating is usually necessary. For example, if data was deleted accidentally, the tombstone from that delete has a later write timestamp than the data in the snapshot. Truncating removes the tombstone. If you restore without truncating, the tombstone continues to shadow the restored data. Other types of overwrites behave the same way and cause the same problem.

  3. Locate the most recent snapshot folder. For example:

    /var/lib/cassandra/data/keyspace_name/table_name-UUID/snapshots/snapshot_name

  4. Copy the SSTable files from the most recent snapshot directory to the /var/lib/cassandra/data/keyspace_name/table_name-UUID directory.
  5. Run nodetool refresh.
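
Steps 3 through 5 above can be sketched in Python. This is a minimal illustration, not a supported tool: the function names copy_snapshot_to_table_dir and refresh_command are hypothetical, the sketch copies every regular file it finds in the snapshot directory, and it only builds the nodetool refresh command line rather than running it. It assumes the table has already been checked (and truncated if needed) as described in steps 1 and 2.

```python
import shutil
from pathlib import Path
from typing import List


def copy_snapshot_to_table_dir(table_dir: Path, snapshot_name: str) -> List[Path]:
    """Copy all regular files from <table_dir>/snapshots/<snapshot_name>
    into <table_dir> and return the destination paths.

    table_dir is the table's data directory, for example
    /var/lib/cassandra/data/keyspace_name/table_name-UUID.
    """
    snapshot_dir = table_dir / "snapshots" / snapshot_name
    if not snapshot_dir.is_dir():
        raise FileNotFoundError(f"snapshot directory not found: {snapshot_dir}")

    copied = []
    for src in sorted(snapshot_dir.iterdir()):
        if src.is_file():
            dest = table_dir / src.name
            shutil.copy2(src, dest)  # copy2 also preserves file timestamps
            copied.append(dest)
    return copied


def refresh_command(keyspace: str, table: str) -> List[str]:
    """Build the argument list for 'nodetool refresh' on one table.

    The command is returned, not executed, so the caller decides
    when and how to run it (for example with subprocess.run).
    """
    return ["nodetool", "refresh", "--", keyspace, table]
```

A caller would run copy_snapshot_to_table_dir on the node that holds the snapshot and then pass the result of refresh_command to subprocess.run. Keeping the command separate from the copy makes it easy to review what will be executed before touching a live node.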

Restoring from centralized backups

This method uses sstableloader to restore snapshots.

  1. Verify that the SSTable version is compatible with the current version of DSE:
    1. Locate the version in the file names.

      Use the version number and format in the SSTable file name to determine compatibility and upgrade requirements. The first two letters of the file name are the version, where the first letter indicates a major version and the second letter indicates a minor version.

      For example, the following SSTable version is aa and the format is bti:
      data/cycling/cyclist_expenses-e4f31e122bc511e8891b23da85222d3d/aa-1-bti-Data.db
    2. If the version is not compatible, use the correct DSE version of sstableupgrade to create a compatible version.

      For details on SSTable versions and compatibility, see DataStax Enterprise, Apache Cassandra, CQL, and SSTable compatibility.

  2. Make sure the table schema exists and is the same as when the snapshot was created.

    The nodetool snapshot command creates a table schema in the output directory. If the table does not exist, recreate it using the schema.cql file.

  3. If necessary, truncate the target table.
    Note: You may not need to truncate under certain conditions. For example, if a node lost a disk, you might restart before restoring so that the node continues to receive new writes before starting the restore procedure.

    Truncating is usually necessary. For example, if data was deleted accidentally, the tombstone from that delete has a later write timestamp than the data in the snapshot. Truncating removes the tombstone. If you restore without truncating, the tombstone continues to shadow the restored data. Other types of overwrites behave the same way and cause the same problem.

  4. Restore the most recent snapshot using the sstableloader tool on the backed-up SSTables.

    The sstableloader tool streams the SSTables to the correct nodes. You do not need to remove the commit logs, drain the nodes, or restart them.
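
The version check in step 1 and the load in step 4 can be sketched in Python. This is an illustration under stated assumptions: the file name layout (version, generation, format, component, separated by hyphens) is inferred from the single example above, the helper names parse_sstable_name and sstableloader_command are hypothetical, and the host addresses and backup path in the usage note are placeholders. The sketch only builds the sstableloader command line with its -d option for the initial contact nodes; it does not run it.

```python
import re
from pathlib import Path
from typing import List, NamedTuple


class SSTableName(NamedTuple):
    version: str      # two letters, for example "aa"
    generation: str   # for example "1"
    fmt: str          # SSTable format, for example "bti"
    component: str    # for example "Data.db"

    @property
    def major(self) -> str:
        """First letter of the version: the major version."""
        return self.version[0]

    @property
    def minor(self) -> str:
        """Second letter of the version: the minor version."""
        return self.version[1]


# Assumed layout: <version>-<generation>-<format>-<component>
_NAME_PATTERN = re.compile(
    r"^(?P<version>[a-z]{2})-(?P<generation>[^-]+)-(?P<fmt>[a-z]+)-(?P<component>.+)$"
)


def parse_sstable_name(path: str) -> SSTableName:
    """Extract version, generation, format, and component from an SSTable path."""
    file_name = Path(path).name
    match = _NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unrecognized SSTable file name: {file_name}")
    return SSTableName(**match.groupdict())


def sstableloader_command(hosts: List[str], table_dir: str) -> List[str]:
    """Build the argument list for sstableloader.

    hosts are the initial contact nodes; table_dir is the directory that
    holds the backed-up SSTables and ends in keyspace_name/table_name.
    """
    return ["sstableloader", "-d", ",".join(hosts), table_dir]
```

For the example file above, parse_sstable_name returns version aa (major a, minor a) and format bti. A call such as sstableloader_command(["10.0.0.1", "10.0.0.2"], "/backups/cycling/cyclist_expenses") yields a command that can be reviewed and then passed to subprocess.run.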