Backup Service

The Backup Service provides automatic scheduled backups, manual backups, and manual restores of DataStax Enterprise (DSE) cluster data.

The OpsCenter Backup Service lets you schedule automatic backups or run manual backups of DSE cluster data. A backup is a snapshot of all on-disk data files (SSTable files) stored in the data directory. Use OpsCenter to schedule and manage backups, and to restore from those backups, across all registered DSE clusters.

OpsCenter stores backup data in a way that avoids duplicating files. A backup first flushes all in-memory writes to disk, then creates a hard link to each SSTable file of each keyspace. Traditional backup systems combine a full backup with incremental backups that store only the changes since the last full backup. OpsCenter, by contrast, can fully recreate the state of the database at the time of each backup without duplicating files.
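The hard-link step can be illustrated with a minimal Python sketch. It assumes a filesystem that supports hard links and uses a hypothetical `snapshot_sstables` helper and SSTable file name; it is not OpsCenter's implementation, and it skips the flush of in-memory writes, which DSE performs itself. The sketch shows that a snapshot shares storage with the live file, and that the data stays on disk after the live file is deleted, which is also why snapshots make disk usage grow over time.

```python
import os
import tempfile


def snapshot_sstables(data_dir: str, snapshot_dir: str) -> list[str]:
    """Hard-link every regular file in data_dir into snapshot_dir (hypothetical helper).

    No file contents are copied: each snapshot entry is a second directory
    entry pointing at the same inode as the live SSTable file.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


with tempfile.TemporaryDirectory() as root:
    data_dir = os.path.join(root, "data")
    os.makedirs(data_dir)
    live = os.path.join(data_dir, "nb-1-big-Data.db")  # hypothetical SSTable name
    with open(live, "wb") as f:
        f.write(b"x" * 4096)

    snap_dir = os.path.join(root, "snapshots", "backup-1")
    snapshot_sstables(data_dir, snap_dir)
    snap_copy = os.path.join(snap_dir, "nb-1-big-Data.db")

    # Same inode: the snapshot used no extra space for file contents.
    print(os.stat(live).st_ino == os.stat(snap_copy).st_ino)  # True

    # Deleting the live file (as compaction would) does not free the data,
    # because the snapshot's link still references it.
    os.remove(live)
    print(os.path.getsize(snap_copy))  # 4096
```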

Backups are stored locally on each node (On Server). If you configure an additional backup location (such as Amazon S3), OpsCenter creates a manifest for each backup that lists the SSTables in that backup, and uploads only SSTable files that are not already stored at that location.
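A minimal sketch of manifest-based deduplication follows. It assumes an in-memory stand-in for the remote location and deduplicates by file name, which relies on SSTable files being immutable once written. `RemoteLocation`, `upload_backup`, and `files_for_backup` are hypothetical names, not part of any OpsCenter API.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteLocation:
    """Hypothetical in-memory stand-in for a remote location such as S3."""
    files: dict[str, bytes] = field(default_factory=dict)
    manifests: dict[str, list[str]] = field(default_factory=dict)


def upload_backup(remote: RemoteLocation, backup_id: str,
                  sstables: dict[str, bytes]) -> list[str]:
    """Write a manifest for backup_id and upload only SSTables the remote lacks.

    Deduplicating by file name assumes SSTable files are immutable and
    uniquely named. Returns the names of the files actually uploaded.
    """
    uploaded = []
    for name in sorted(sstables):
        if name not in remote.files:
            remote.files[name] = sstables[name]
            uploaded.append(name)
    # The manifest lists every SSTable in the backup, including files
    # uploaded by earlier backups.
    remote.manifests[backup_id] = sorted(sstables)
    return uploaded


def files_for_backup(remote: RemoteLocation, backup_id: str) -> dict[str, bytes]:
    """Reassemble the full file set of one backup from its manifest."""
    return {name: remote.files[name] for name in remote.manifests[backup_id]}


remote = RemoteLocation()
print(upload_backup(remote, "backup-1", {"nb-1-big-Data.db": b"a",
                                         "nb-2-big-Data.db": b"b"}))
# ['nb-1-big-Data.db', 'nb-2-big-Data.db']
print(upload_backup(remote, "backup-2", {"nb-2-big-Data.db": b"b",
                                         "nb-3-big-Data.db": b"c"}))
# ['nb-3-big-Data.db']
print(sorted(files_for_backup(remote, "backup-1")))
# ['nb-1-big-Data.db', 'nb-2-big-Data.db']
```

The second backup uploads only the new file, yet its manifest still lets the full state be rebuilt.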

You can schedule backups to run automatically at a recurring interval, or run a one-time backup either immediately or at a scheduled time.
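As an illustration of recurring-interval arithmetic, the following hypothetical helper computes the next run time of a schedule that started at `start` and repeats every `interval`. OpsCenter computes its own schedules; the function name and behavior here are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def next_scheduled_run(start: datetime, interval: timedelta,
                       now: datetime) -> datetime:
    """Return the first run time at or after now for a schedule that began
    at start and repeats every interval (hypothetical helper)."""
    if now <= start:
        return start
    periods, remainder = divmod(now - start, interval)
    if remainder:  # now falls between two runs; round up to the next one
        periods += 1
    return start + periods * interval


daily_2am = datetime(2024, 1, 1, 2, 0)
print(next_scheduled_run(daily_2am, timedelta(days=1),
                         datetime(2024, 1, 3, 10, 30)))
# 2024-01-04 02:00:00
```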

Backup considerations

Backups can be taken per datacenter, per keyspace, for multiple keyspaces, or for all keyspaces in the cluster while the system is online.

Important: The Backup Service requires control over the data and structure of its destination locations. Backup destinations must be used only by OpsCenter. Any additional directories or files in those destinations can prevent the Backup Service from performing backup or restore operations correctly.

Each node must have enough free disk space to hold snapshots of its data files. A single snapshot requires little disk space, because it consists of hard links to existing files. However, snapshots cause disk usage to grow more quickly over time, because a snapshot prevents obsolete data files (for example, SSTables replaced by compaction) from being deleted.

Configure the free disk space threshold to prevent a backup from starting when free disk space on a node falls below a specified percentage. Specify how long to retain snapshot data by setting a retention policy for each backup location.

Note: OpsCenter data backups do not show or manage snapshots taken manually with the nodetool snapshot command.
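The threshold and retention settings can be sketched as two simple checks. This assumes the threshold is expressed as a percentage of free space on the filesystem that holds the data, and that a retention policy is a maximum backup age; `has_free_space` and `expired_backups` are hypothetical helpers, not OpsCenter configuration options.

```python
import shutil
from datetime import datetime, timedelta


def has_free_space(path: str, min_free_percent: float) -> bool:
    """Return True if the filesystem holding path has at least
    min_free_percent of its capacity free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.free / usage.total >= min_free_percent


def expired_backups(backup_times: dict[str, datetime], retention: timedelta,
                    now: datetime) -> list[str]:
    """Return IDs of backups older than the retention window, oldest first."""
    cutoff = now - retention
    expired = [bid for bid, taken in backup_times.items() if taken < cutoff]
    return sorted(expired, key=backup_times.__getitem__)


# Skip the backup if less than 10% of the disk holding the data is free.
if not has_free_space(".", 10.0):
    print("not enough free disk space; backup not started")

backups = {"b1": datetime(2024, 1, 1), "b2": datetime(2024, 1, 10),
           "b3": datetime(2024, 1, 20)}
print(expired_backups(backups, timedelta(days=14), datetime(2024, 1, 21)))
# ['b1']
```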

If a cluster includes DSE Search or DSE Analytics nodes, a backup job that includes keyspaces containing DSE Search or DSE Analytics data saves that data. Any Solr indexes are recreated upon restore.

Note: Keep the following caveats in mind when creating and restoring backups:
  • Restoring a snapshot that contains only the system keyspace is not allowed. The snapshot you want to restore must contain either both system and non-system keyspaces, or only non-system keyspaces.
  • Restoring a snapshot that does not contain a table definition is not allowed.
  • Restoring a snapshot to a location with insufficient disk space fails. The Restore Report indicates which nodes do not have sufficient space and how much space is necessary for a successful restore. For more information and tips for preventative measures, see Monitoring sufficient disk space for restoring backups.
  • OpsCenter does not back up indexes. Therefore, DSE must recompute the indexes after a restore.
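The caveats above can be expressed as pre-restore checks in a short sketch. `SnapshotInfo`, `restore_problems`, the node addresses, and the set of system keyspace names are assumptions for illustration; OpsCenter performs its own validation and reports disk space shortfalls in the Restore Report.

```python
from dataclasses import dataclass

# Assumed set of Cassandra system keyspaces; the exact set OpsCenter treats
# as system keyspaces may differ.
SYSTEM_KEYSPACES = {"system", "system_schema", "system_auth",
                    "system_distributed", "system_traces"}


@dataclass
class SnapshotInfo:
    """Hypothetical summary of a backup snapshot selected for restore."""
    keyspaces: set[str]
    has_table_definitions: bool


def restore_problems(snapshot: SnapshotInfo, required_bytes: dict[str, int],
                     free_bytes: dict[str, int]) -> list[str]:
    """Return reasons the restore would be refused or fail (empty if none)."""
    problems = []
    # Allowed: system + non-system keyspaces, or only non-system keyspaces.
    if not snapshot.keyspaces - SYSTEM_KEYSPACES:
        problems.append("snapshot contains no non-system keyspaces")
    if not snapshot.has_table_definitions:
        problems.append("snapshot does not contain table definitions")
    # Per-node space check, reporting how much more space each node needs.
    for node in sorted(required_bytes):
        shortfall = required_bytes[node] - free_bytes.get(node, 0)
        if shortfall > 0:
            problems.append(f"{node}: needs {shortfall} more bytes of free space")
    return problems


snap = SnapshotInfo(keyspaces={"system", "inventory"}, has_table_definitions=True)
print(restore_problems(snap,
                       required_bytes={"10.0.0.1": 50_000, "10.0.0.2": 80_000},
                       free_bytes={"10.0.0.1": 90_000, "10.0.0.2": 60_000}))
# ['10.0.0.2: needs 20000 more bytes of free space']
```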

Restoring data

When restoring data, you can restore from a previous backup, restore to a specific point in time, or restore to a different cluster. Restoring to a different cluster is known as cloning, and it supports several workflows.

In addition to keyspace backups, OpsCenter can back up commit logs, which enables point-in-time restores and finer-grained control over restored data. Point-in-time restores are available only when commit log backups are enabled together with keyspace backups. As with keyspace backups, commit log archives are retained according to a configurable retention policy.
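A point-in-time restore combines a keyspace backup with replayed commit log archives. The sketch below shows one way to choose them, assuming each archive records the time span of the writes it contains: take the newest backup at or before the target time, then replay every archive with writes between that backup and the target. `CommitLogArchive` and `plan_point_in_time_restore` are hypothetical names, not an OpsCenter or DSE API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CommitLogArchive:
    """Hypothetical archived commit log segment and the time span of its writes."""
    name: str
    first_write: datetime
    last_write: datetime


def plan_point_in_time_restore(backups: dict[str, datetime],
                               archives: list[CommitLogArchive],
                               target: datetime) -> tuple[str, list[str]]:
    """Pick the newest keyspace backup taken at or before target, plus the
    commit log archives containing writes in (backup time, target].
    Replay must still stop at target inside the last selected archive."""
    candidates = [(taken, bid) for bid, taken in backups.items() if taken <= target]
    if not candidates:
        raise ValueError("no keyspace backup was taken at or before the target time")
    base_time, base_id = max(candidates)
    replay = [a.name for a in sorted(archives, key=lambda a: a.first_write)
              if a.last_write > base_time and a.first_write <= target]
    return base_id, replay


backups = {"b1": datetime(2024, 3, 1, 0, 0), "b2": datetime(2024, 3, 2, 0, 0)}
archives = [
    CommitLogArchive("CommitLog-1", datetime(2024, 3, 1, 0, 0), datetime(2024, 3, 1, 23, 59)),
    CommitLogArchive("CommitLog-2", datetime(2024, 3, 2, 0, 0), datetime(2024, 3, 2, 12, 0)),
    CommitLogArchive("CommitLog-3", datetime(2024, 3, 2, 12, 0), datetime(2024, 3, 2, 23, 59)),
]
print(plan_point_in_time_restore(backups, archives, datetime(2024, 3, 2, 6, 0)))
# ('b2', ['CommitLog-2'])
```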