Backing up a cluster

OpsCenter provides a way to schedule and run backup operations for a DSE cluster.

OpsCenter can run one-time backup jobs and can also schedule backup jobs to run at a later date and on a recurring basis. Commit log backups make it possible to restore backup data to a specific date and time.

Note: Keep the following caveats in mind when creating and restoring backups:
  • Restoring a snapshot that contains only the system keyspace is not allowed. The snapshot you restore must contain either both system and non-system keyspaces, or only non-system keyspaces.
  • Restoring a snapshot that does not contain a table definition is not allowed.
  • Restoring a snapshot to a location with insufficient disk space fails. The Restore Report indicates which nodes do not have sufficient space and how much space is necessary for a successful restore. For more information and tips for preventative measures, see Monitoring sufficient disk space for restoring backups.
  • OpsCenter does not back up indexes. Therefore, DSE must recompute the indexes after a restore.
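The first three caveats can be checked before a restore is attempted. The sketch below is hypothetical and is not an OpsCenter API: the `SnapshotInfo` structure, the `restore_blockers` function, and the rule that any keyspace whose name starts with `system` counts as a system keyspace are all assumptions made for illustration. It simply reports which caveat a given snapshot would violate. The index caveat has no precondition to check, because indexes are rebuilt after the restore.

```python
from dataclasses import dataclass, field


def is_system_keyspace(name: str) -> bool:
    # Assumption: keyspaces named "system*" are treated as system keyspaces.
    return name.startswith("system")


@dataclass
class SnapshotInfo:
    """Hypothetical summary of a snapshot; not an OpsCenter data structure."""
    keyspaces: set[str]  # keyspaces present in the snapshot
    # "keyspace.table" entries that have data but no table definition
    tables_missing_schema: set[str] = field(default_factory=set)
    # node -> bytes of disk space the restore needs on that node
    required_bytes: dict[str, int] = field(default_factory=dict)


def restore_blockers(snap: SnapshotInfo, free_bytes: dict[str, int]) -> list[str]:
    """Return reasons the restore would be refused or would fail."""
    blockers: list[str] = []

    # Caveat 1: the snapshot must contain at least one non-system keyspace.
    non_system = [ks for ks in snap.keyspaces if not is_system_keyspace(ks)]
    if not non_system:
        blockers.append("snapshot has no non-system keyspace")

    # Caveat 2: every table needs a table definition.
    for table in sorted(snap.tables_missing_schema):
        blockers.append(f"no table definition for {table}")

    # Caveat 3: each node needs enough free disk space for its share of the data.
    for node, needed in sorted(snap.required_bytes.items()):
        free = free_bytes.get(node, 0)
        if free < needed:
            blockers.append(f"{node}: needs {needed} bytes, has {free}")

    return blockers
```

For example, a snapshot with `system` and `MyKeyspace` that needs 500 bytes on each of two nodes, where one node has only 100 bytes free, yields a single disk-space blocker for that node, which mirrors what the Restore Report would flag.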

Scheduled backup retention policies

Each scheduled backup has a retention policy that defines how OpsCenter handles the files for older backup data. For the On Server location, the default policy retains backup files for 30 days. For all other locations, the default policy retains all backups.

For each scheduled backup task and configured location, configure a time period to retain the snapshot data. OpsCenter supports minutes, hours, days, and weeks for the retention time period. For example, you can define a retention policy that removes snapshot data older than 30 days, or 26 weeks, or 3 hours. If you want to keep all backups, OpsCenter has a Retain All policy that retains the backup files indefinitely.
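A retention policy reduces to one question per backup: is it older than the retention window? The following minimal sketch models that question, assuming a policy is either a positive amount of one of the four supported units or Retain All. The `RetentionPolicy` class, `default_policy` function, and the location string `"On Server"` used as a lookup key are hypothetical names chosen for illustration, not OpsCenter configuration keys.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Time units the retention period can be expressed in.
SUPPORTED_UNITS = ("minutes", "hours", "days", "weeks")


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical model of a retention policy; not an OpsCenter API."""
    amount: int = 0
    unit: str = "days"
    retain_all: bool = False

    def __post_init__(self) -> None:
        # A time-limited policy needs a supported unit and a positive amount.
        if not self.retain_all:
            if self.unit not in SUPPORTED_UNITS:
                raise ValueError(f"unsupported unit: {self.unit!r}")
            if self.amount <= 0:
                raise ValueError("retention amount must be positive")

    def is_expired(self, backup_time: datetime, now: datetime) -> bool:
        """True when the backup is older than the retention window."""
        if self.retain_all:
            return False
        # timedelta accepts minutes, hours, days, and weeks as keywords.
        return now - backup_time > timedelta(**{self.unit: self.amount})


def default_policy(location: str) -> RetentionPolicy:
    """Defaults described above: 30 days for On Server, Retain All elsewhere."""
    if location == "On Server":
        return RetentionPolicy(30, "days")
    return RetentionPolicy(retain_all=True)
```

With `RetentionPolicy(3, "days")`, a backup taken seven days ago is expired and one taken two days ago is not, while a Retain All policy never expires anything. Making the dataclass frozen keeps a policy immutable once it is attached to a backup task.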

When a backup that was configured with a time-limited retention policy completes, OpsCenter scans the snapshot data for outdated files that do not belong to other snapshots and removes them at the next scheduled backup.

For example, consider a scheduled backup that sends data to Amazon S3, runs weekly, and has a retention policy of removing backups older than three days. The layout in the Amazon S3 bucket is as follows:

mybucket/
  snapshots/
    node-id1/
      sstables/
        MyKeyspace-MyTable-ic-4-Data.db
        MyKeyspace-MyTable-ic-5-Data.db
        MyKeyspace-MyTable-ic-6-Data.db
        MyKeyspace-MyTable-ic-7-Data.db
        ...
      1234-ABCD-2018-01-25-01-00/
        backup.json # includes 4-Data and 5-Data
        MyKeyspace/schema.json
      1234-ABCD-2018-02-01-01-00/
        backup.json # includes 5-Data, 6-Data, and 7-Data
        MyKeyspace/schema.json
   

After the February 1 backup completes, OpsCenter scans the SSTables for outdated files according to the retention policy and removes the January 25 backup files, because that backup is older than three days. Because MyKeyspace-MyTable-ic-4-Data.db was in the January 25 backup but not in the February 1 backup, it is removed. Although MyKeyspace-MyTable-ic-5-Data.db was in the January 25 backup, it is also in the latest backup, so it is retained until no backup within the retention period references it.
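The cleanup rule in this example is a set difference: candidate files come from expired snapshots, and any file still referenced by a retained snapshot is kept. The sketch below reproduces the Amazon S3 example under simplifying assumptions: each snapshot is reduced to its timestamp and the set of SSTable files its `backup.json` lists, and `cleanup_plan` is a hypothetical helper, not OpsCenter's actual implementation.

```python
from datetime import datetime, timedelta


def cleanup_plan(snapshots: dict[str, tuple[datetime, set[str]]],
                 now: datetime,
                 retention: timedelta) -> tuple[list[str], list[str]]:
    """Return (snapshots to drop, SSTable files to delete).

    A snapshot is outdated when it is older than the retention window.
    An SSTable is deleted only if no retained snapshot still references it.
    """
    expired = {name for name, (taken, _) in snapshots.items()
               if now - taken > retention}

    # Files referenced by any snapshot that is still inside the window.
    still_needed: set[str] = set()
    for name, (_, files) in snapshots.items():
        if name not in expired:
            still_needed |= files

    # Files referenced by expired snapshots are only candidates for removal.
    candidates: set[str] = set()
    for name in expired:
        candidates |= snapshots[name][1]

    return sorted(expired), sorted(candidates - still_needed)


# The Amazon S3 example above, with a three-day retention window.
snapshots = {
    "1234-ABCD-2018-01-25-01-00": (
        datetime(2018, 1, 25, 1, 0),
        {"MyKeyspace-MyTable-ic-4-Data.db", "MyKeyspace-MyTable-ic-5-Data.db"},
    ),
    "1234-ABCD-2018-02-01-01-00": (
        datetime(2018, 2, 1, 1, 0),
        {"MyKeyspace-MyTable-ic-5-Data.db", "MyKeyspace-MyTable-ic-6-Data.db",
         "MyKeyspace-MyTable-ic-7-Data.db"},
    ),
}
dropped, deleted = cleanup_plan(snapshots, now=datetime(2018, 2, 1, 1, 0),
                                retention=timedelta(days=3))
print(dropped)  # ['1234-ABCD-2018-01-25-01-00']
print(deleted)  # ['MyKeyspace-MyTable-ic-4-Data.db']
```

Only the 4-Data file is deleted: the 5-Data file appears in both snapshots, so the retained February 1 snapshot keeps it alive, exactly as described above.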