Backing up a cluster

OpsCenter provides a way to schedule and run backup operations on a cluster. Organizations can run one-time backup jobs, schedule jobs to run at a later date, or set jobs to recur. Commit log backups make it possible to restore backup data to a particular date and time.

Keep the following caveats in mind when creating and restoring backups:

  • Restoring a snapshot that contains only the system keyspace is not allowed. There must be both system and non-system keyspaces, or only non-system keyspaces in the snapshot you want to restore.

  • Restoring a snapshot that does not contain a table definition is not allowed.

  • Restoring a snapshot to a location with insufficient disk space fails. The Restore Report indicates which nodes do not have sufficient space and how much space is necessary for a successful restore. For more information and tips on preventative measures, see Monitoring sufficient disk space for restoring backups. A basic pre-restore free-space check is sketched after this list.

  • OpsCenter does not back up indexes. Therefore, DSE must recompute the indexes after a restore.
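
The disk-space caveat can be checked before starting a restore. The following is a minimal sketch, not an OpsCenter feature: it assumes you already know the approximate on-disk size of the snapshot (for example, by summing the SSTable sizes in the backup location) and compares it against the free space on a node's data directory. The path and sizes are illustrative.

import shutil

# Illustrative values, not OpsCenter settings: the node's data directory and the
# approximate total size of the SSTables in the snapshot to be restored.
DATA_DIR = "/var/lib/cassandra/data"
snapshot_size_bytes = 42 * 1024**3  # assume a ~42 GiB snapshot

usage = shutil.disk_usage(DATA_DIR)
# Leave some headroom; restored SSTables are typically compacted afterwards.
required = int(snapshot_size_bytes * 1.2)

if usage.free < required:
    print(f"Insufficient space: need about {required} bytes, only {usage.free} free")
else:
    print(f"OK: {usage.free} bytes free, about {required} bytes required")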

Scheduled backup retention policies

Each scheduled backup has a retention policy that defines how OpsCenter handles the files for older backup data. The default policy for the On Server location is to retain backup files for 30 days; for all other backup locations, the default is to retain all backups.

For each scheduled backup task and each of its configured locations, set a time period to retain the snapshot data. OpsCenter supports minutes, hours, days, and weeks for the retention period. For example, you can define a retention policy that removes snapshot data older than 30 days, 26 weeks, or 3 hours. To keep all backups, use the Retain All policy, which retains backup files indefinitely.
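
As a rough illustration of how a retention period maps to a cutoff time (a generic sketch, not OpsCenter code), each supported unit corresponds directly to a time delta; anything older than the cutoff is a candidate for removal:

from datetime import datetime, timedelta, timezone

SUPPORTED_UNITS = {"minutes", "hours", "days", "weeks"}

def retention_cutoff(amount, unit):
    """Return the timestamp before which backup data is considered outdated."""
    if unit not in SUPPORTED_UNITS:
        raise ValueError(f"unsupported unit: {unit}")
    return datetime.now(timezone.utc) - timedelta(**{unit: amount})

# The example retention periods from the text: 30 days, 26 weeks, 3 hours.
for amount, unit in [(30, "days"), (26, "weeks"), (3, "hours")]:
    print(f"{amount} {unit}: remove data older than {retention_cutoff(amount, unit)}")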

When a backup that is configured with a time-limited retention policy completes, OpsCenter scans the snapshot data for outdated files that do not belong to any other snapshot, then removes those files at the next scheduled backup.

If you remove old backup data manually, you can delete the backup.json and schema.json files whose timestamps are past the retention period. However, do not delete SSTable files manually, because a single SSTable can be shared by multiple snapshots and deleting it could affect all of them.
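
To see how the "past the retention period" check might work against a layout like the example that follows, here is a hedged sketch. It treats the backup location as a local directory tree for simplicity and assumes snapshot directory names end with a YYYY-MM-DD-HH-MM timestamp, as in the example; only the per-snapshot metadata files are listed for manual deletion, and the shared SSTables are left alone.

import os
from datetime import datetime, timedelta, timezone

SNAPSHOTS_DIR = "mybucket/snapshots/node-id1"   # illustrative path from the example below
CUTOFF = datetime.now(timezone.utc) - timedelta(days=30)

for name in os.listdir(SNAPSHOTS_DIR):
    if name == "sstables":
        continue  # never touch the shared SSTable directory by hand
    try:
        # Assumption: the directory name ends in a YYYY-MM-DD-HH-MM timestamp.
        stamp = datetime.strptime(name[-16:], "%Y-%m-%d-%H-%M").replace(tzinfo=timezone.utc)
    except ValueError:
        continue
    if stamp < CUTOFF:
        print("safe to delete:", os.path.join(SNAPSHOTS_DIR, name, "backup.json"))
        print("safe to delete:", os.path.join(SNAPSHOTS_DIR, name, "MyKeyspace", "schema.json"))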

For example, consider a scheduled backup that sends data to Amazon S3, runs weekly, and has a retention policy of removing backups older than three days. The layout in the Amazon S3 bucket is as follows:

mybucket/
  snapshots/
    node-id1/
      sstables/
        MyKeyspace-MyTable-ic-4-Data.db
        MyKeyspace-MyTable-ic-5-Data.db
        MyKeyspace-MyTable-ic-6-Data.db
        MyKeyspace-MyTable-ic-7-Data.db
        ...
      1234-ABCD-2018-01-25-01-00/
        backup.json  # includes 4-Data and 5-Data
        MyKeyspace/schema.json
      1234-ABCD-2018-02-01-01-00/
        backup.json  # includes 5-Data, 6-Data, and 7-Data
        MyKeyspace/schema.json

After the February 1 backup completes, OpsCenter scans the SSTables for outdated files according to the retention policy and removes the January 25 backup files. Because MyKeyspace-MyTable-ic-4-Data.db was referenced only by the January 25 backup, it is removed. MyKeyspace-MyTable-ic-5-Data.db was also part of the January 25 backup, but because the latest backup still references it, it is retained until every backup that references it has expired under the retention policy.
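
To make the cleanup rule concrete, here is a small sketch of the set arithmetic involved (illustrative only, not OpsCenter's implementation): an SSTable from an expired backup is removed only if no retained backup also references it.

# Which SSTables each backup.json references, per the layout above.
backups = {
    "1234-ABCD-2018-01-25-01-00": {"MyKeyspace-MyTable-ic-4-Data.db",
                                   "MyKeyspace-MyTable-ic-5-Data.db"},
    "1234-ABCD-2018-02-01-01-00": {"MyKeyspace-MyTable-ic-5-Data.db",
                                   "MyKeyspace-MyTable-ic-6-Data.db",
                                   "MyKeyspace-MyTable-ic-7-Data.db"},
}
expired = {"1234-ABCD-2018-01-25-01-00"}  # older than the three-day retention period

retained_files = set().union(*(files for name, files in backups.items() if name not in expired))
expired_files = set().union(*(backups[name] for name in expired))

# Only SSTables that no retained backup still references are removed.
print(expired_files - retained_files)  # {'MyKeyspace-MyTable-ic-4-Data.db'}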
