Backing up to Amazon S3

When you add an Amazon S3 bucket as an additional location for storing backup snapshots, the DataStax Agent automatically sends the snapshot files to the S3 bucket. To optimize storage space, all SSTables for a particular node and table are stored only once in Amazon S3.
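
The deduplication idea can be illustrated with a minimal sketch; this is not OpsCenter's actual implementation. Before uploading an SSTable component, check whether an object with the same key already exists in the bucket. The bucket name, key, and local path below are placeholder examples, and the sketch assumes boto3 with valid AWS credentials.

  # Upload an SSTable component only if it is not already present in S3.
  # Bucket, key, and local path are placeholders; requires boto3 and credentials.
  import boto3
  from botocore.exceptions import ClientError

  s3 = boto3.client("s3")

  def upload_if_missing(bucket, key, local_path):
      try:
          s3.head_object(Bucket=bucket, Key=key)  # already stored once
          return False                            # skip the re-upload
      except ClientError as e:
          if e.response["Error"]["Code"] != "404":
              raise                               # a real error, not "missing"
      s3.upload_file(local_path, bucket, key)
      return True

  upload_if_missing(
      "mybucket",
      "snapshots/node-id1/sstables/MyKeyspace-MyTable-ic-5-Data.db",
      "/path/to/MyKeyspace-MyTable-ic-5-Data.db",
  )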

The Backup Service requires control over the data and structure of its destination locations. The backup destinations must be dedicated for use only by OpsCenter. Any additional directories or files in those destinations can prevent the Backup Service from properly conducting a Backup or Restore operation.

The backup.json file contains metadata identifying which of the backed-up SSTables are included in that backup.

The Backup Service uses the AWS SDK as of OpsCenter version 6.1. With the default heap size, the maximum supported SSTable file size when backing up to S3 is 1 TB.

If OpsCenter encounters an error when backing up to S3, it retries the backup a configurable number of times (3 by default) unless it encounters an unrecoverable error such as invalid AWS credentials.
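
The retry behavior can be pictured with a short sketch. This is illustrative only: the set of unrecoverable error codes and the backoff strategy below are assumptions, not OpsCenter's internal logic.

  # Retry transient S3 failures up to max_retries, but stop immediately on
  # unrecoverable errors such as invalid AWS credentials.
  import time
  import boto3
  from botocore.exceptions import ClientError

  UNRECOVERABLE = {"InvalidAccessKeyId", "SignatureDoesNotMatch", "AccessDenied"}

  def upload_with_retries(bucket, key, local_path, max_retries=3):
      s3 = boto3.client("s3")
      for attempt in range(1, max_retries + 1):
          try:
              s3.upload_file(local_path, bucket, key)
              return
          except ClientError as e:
              if e.response["Error"]["Code"] in UNRECOVERABLE:
                  raise                    # retrying cannot help
              if attempt == max_retries:
                  raise                    # retries exhausted
              time.sleep(2 ** attempt)     # simple backoff between attempts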

The AWS credentials and bucket names are stored in cluster_name.conf (with the exception of ad hoc backups). Secure this file so that it cannot be read by unauthorized users; a quick permissions check is sketched after the list of file locations below.

The location of the cluster_name.conf file depends upon the type of installation:

  • Package installations: /etc/opscenter/clusters/cluster_name.conf

  • Tarball installations: install_location/conf/clusters/cluster_name.conf
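
One way to spot an overly permissive cluster_name.conf is to check its file mode, as in the following sketch. The path assumes a package installation and uses cluster_name as a placeholder; substitute your own values.

  # Warn if cluster_name.conf is readable by group or others.
  import os
  import stat

  conf_path = "/etc/opscenter/clusters/cluster_name.conf"  # placeholder path

  mode = os.stat(conf_path).st_mode
  if mode & (stat.S_IRGRP | stat.S_IROTH):
      print(f"WARNING: {conf_path} is readable by group/others "
            f"(mode {stat.filemode(mode)}); consider restricting it, "
            f"for example with chmod 600.")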

Amazon S3 bucket requirements

An S3 bucket destination must be unique and self-contained: no destination can be contained within another backup destination.

For example, if you configure a backup location of mybucket1, do not designate another backup location of mybucket1/myfolder1. If you configure a mybucket1/myfolder1 location, do not set up another location at mybucket1/myfolder1/mysubfolder1. Folders are supported; however, no backup destination path can be nested within another.
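
A quick way to sanity-check a list of planned destinations against this rule is sketched below. The destination strings are the examples above; this is a helper for your own planning, not part of OpsCenter.

  # Report destination pairs where one backup destination is nested inside
  # another, e.g. "mybucket1" and "mybucket1/myfolder1".
  def find_nested_destinations(destinations):
      normalized = [d.strip("/") for d in destinations]
      nested = []
      for outer in normalized:
          for inner in normalized:
              if inner != outer and inner.startswith(outer + "/"):
                  nested.append((outer, inner))   # inner lives inside outer
      return nested

  print(find_nested_destinations(["mybucket1", "mybucket1/myfolder1"]))
  # [('mybucket1', 'mybucket1/myfolder1')]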

As a recommended best practice, limit an S3 bucket to a single keyspace for OpsCenter backups. Every backup job gathers a list of all existing data files before the transfer to S3 can start, and that listing takes more time as the number of files in the bucket grows.
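
To estimate how large that listing phase has become, you can count the objects already stored under the snapshots prefix. This sketch assumes boto3 and uses the placeholder bucket name mybucket.

  # Count objects under snapshots/ to gauge the size of the pre-transfer listing.
  import boto3

  s3 = boto3.client("s3")
  paginator = s3.get_paginator("list_objects_v2")

  total = 0
  for page in paginator.paginate(Bucket="mybucket", Prefix="snapshots/"):
      total += page.get("KeyCount", 0)

  print(f"{total} objects under snapshots/")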

Amazon S3 backup hierarchy

The backup files are stored in S3 in the following hierarchy.

  mybucket/
    snapshots/
      node-id1/
        sstables/
          MyKeyspace-MyTable-ic-5-Data.db
          ...
          MyKeyspace-MyTable-ic-5-TOC.txt
          MyKeyspace-MyTable-ic-6-Data.db
          ...
        1234-ABCD-2014-10-01-01-00/
          backup.json
          MyKeyspace/schema.json
        1234-ABCD-2014-09-30-01-00/
          backup.json
          MyKeyspace/schema.json
      node-id2/
        sstables/
          MyKeyspace-MyTable-ic-1-Data.db
          ...
          MyKeyspace-MyTable-ic-2-Data.db
          ...
        1234-ABCD-2014-10-01-01-00/
          backup.json
          MyKeyspace/schema.json
        1234-ABCD-2014-09-30-01-00/
          backup.json
          MyKeyspace/schema.json
    commitlogs/
      node1/
        1435432324_Commitlog-3-1432320421.log
        1435433232_Commitlog-3-1432320422.log
        ...
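
Given this layout, the snapshots recorded for a single node can be listed by treating the slash as a delimiter, as in the following sketch. The bucket name and node ID are the example values shown above, and the sketch assumes boto3 with valid AWS credentials.

  # List per-node snapshot "directories" (for example 1234-ABCD-2014-10-01-01-00)
  # by requesting the common prefixes directly under snapshots/<node-id>/.
  import boto3

  s3 = boto3.client("s3")
  paginator = s3.get_paginator("list_objects_v2")

  prefix = "snapshots/node-id1/"
  for page in paginator.paginate(Bucket="mybucket", Prefix=prefix, Delimiter="/"):
      for cp in page.get("CommonPrefixes", []):
          name = cp["Prefix"][len(prefix):].rstrip("/")
          if name != "sstables":           # skip the shared SSTable pool
              print(name)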
