Backing up to Amazon S3
When you add an Amazon S3 bucket as an additional location for storing backup snapshots, the DataStax Agent sends the snapshot files to the S3 bucket automatically. To optimize storage space, each SSTable for a particular node and table is stored only once in Amazon S3.
The Backup Service requires control over the data and structure of its destination locations. The backup destinations must be dedicated for use only by OpsCenter. Any additional directories or files in those destinations can prevent the Backup Service from properly conducting a Backup or Restore operation.
Each backup has its own backup.json file, which contains metadata identifying which of the backed-up SSTables are included in that backup.
The Backup Service switched to the AWS SDK as of OpsCenter version 6.1. With the current heap default, a maximum SSTable file size of 1 TB is supported when backing up to S3.
If OpsCenter encounters an error when backing up to S3, it retries the backup a configurable number of times (3 by default) unless it encounters an unrecoverable error such as invalid AWS credentials.
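The retry behavior described above can be illustrated with a minimal sketch. This is not OpsCenter's implementation: the `UnrecoverableError` class and the `run_with_retries` function are hypothetical names, and the sketch assumes that "3 retries" means one initial attempt followed by up to three more.

```python
class UnrecoverableError(Exception):
    """Hypothetical marker for errors that retrying cannot fix,
    such as invalid AWS credentials."""


def run_with_retries(operation, max_retries=3):
    """Call operation() and retry it after a recoverable failure.

    Assumption: max_retries counts the attempts made after the first one,
    so operation() runs at most max_retries + 1 times.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return operation()
        except UnrecoverableError:
            # Retrying cannot help, so give up immediately.
            raise
        except Exception:
            # Recoverable failure: re-raise only once the retries are used up.
            if attempts > max_retries:
                raise
```

A caller would wrap the upload step in a function and pass it as `operation`; any failure that should stop the backup at once is raised as `UnrecoverableError`.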
The AWS credentials and bucket names are stored in cluster_name.conf (with the exception of ad hoc backups). Be sure to use proper security precautions to ensure that this file is not readable by unauthorized users.
The location of the cluster_name.conf file depends upon the type of installation:
Package installations: /etc/opscenter/clusters/cluster_name.conf
Tarball installations: install_location/conf/clusters/cluster_name.conf
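One way to verify that the configuration file is not readable by other users is to inspect its permission bits. The following is a minimal sketch for POSIX systems only; the `is_private` helper is a hypothetical name, and which owner account is appropriate depends on how OpsCenter is run in your environment.

```python
import os
import stat


def is_private(path):
    """Return True if neither the group nor other users have any
    permission (read, write, or execute) on the file at path."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For a package installation, the check would be called as `is_private("/etc/opscenter/clusters/cluster_name.conf")`, with cluster_name replaced by the name of the cluster. If it returns False, tightening the mode with `chmod 600` is one possible remedy.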
An S3 bucket destination must be unique and self-contained. Any defined destination cannot be contained within another backup destination.
For example, if you configure a backup location as mybucket1, do not designate another backup location as mybucket1/myfolder1. Likewise, if you configure mybucket1/myfolder1 as a location, do not set up another location that contains it or is contained within it. Folders are supported; however, bucket paths cannot share any portion of a backup destination.
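The containment rule can be checked before destinations are configured. The sketch below is illustrative only and is not part of OpsCenter; the function names are hypothetical. It assumes that two destinations conflict when one path is equal to, or a parent of, the other, and that sibling folders in the same bucket do not conflict.

```python
def normalize(destination):
    """Split 'bucket/folder/...' into path components, ignoring empty
    segments caused by leading, trailing, or doubled slashes."""
    return [part for part in destination.split("/") if part]


def conflicts(a, b):
    """Return True if one destination equals or contains the other."""
    pa, pb = normalize(a), normalize(b)
    shorter = min(len(pa), len(pb))
    # The destinations overlap when the shorter path is a prefix of the longer.
    return pa[:shorter] == pb[:shorter]


def find_conflicts(destinations):
    """Return every pair of destinations that overlap."""
    return [
        (a, b)
        for i, a in enumerate(destinations)
        for b in destinations[i + 1:]
        if conflicts(a, b)
    ]
```

Comparing whole path components rather than raw string prefixes keeps mybucket1 and mybucket10 from being reported as overlapping.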
As a recommended best practice, limit an S3 bucket to a single keyspace for OpsCenter backups. Because every backup job gathers a list of all existing data files before the transfer to S3 can start, that process can take more time as the number of files in the bucket grows.
The backup files are stored in S3 in the following hierarchy.
mybucket/
  snapshots/
    node-id1/
      sstables/
        MyKeyspace-MyTable-ic-5-Data.db
        ...
        MyKeyspace-MyTable-ic-5-TOC.txt
        MyKeyspace-MyTable-ic-6-Data.db
        ...
      1234-ABCD-2014-10-01-01-00/
        backup.json
        MyKeyspace/schema.json
      1234-ABCD-2014-09-30-01-00/
        backup.json
        MyKeyspace/schema.json
    node-id2/
      sstables/
        MyKeyspace-MyTable-ic-1-Data.db
        ...
        MyKeyspace-MyTable-ic-2-Data.db
        ...
      1234-ABCD-2014-10-01-01-00/
        backup.json
        MyKeyspace/schema.json
      1234-ABCD-2014-09-30-01-00/
        backup.json
        MyKeyspace/schema.json
  commitlogs/
    node1/
      1435432324_Commitlog-3-1432320421.log
      1435433232_Commitlog-3-1432320422.log
      ...
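To make the layout concrete, the following sketch builds object keys that match the hierarchy above and shows how a store-once policy for SSTables could be expressed. All function names are hypothetical, keys are assumed to be relative to the root of the backup destination, and the logic is an illustration rather than the Backup Service's actual code.

```python
def sstable_key(node_id, filename):
    """Key of an SSTable component file, shared by all backups of a node."""
    return f"snapshots/{node_id}/sstables/{filename}"


def backup_json_key(node_id, backup_id):
    """Key of the backup.json metadata file for one backup of a node."""
    return f"snapshots/{node_id}/{backup_id}/backup.json"


def schema_key(node_id, backup_id, keyspace):
    """Key of the schema.json file saved for a keyspace in one backup."""
    return f"snapshots/{node_id}/{backup_id}/{keyspace}/schema.json"


def commitlog_key(node, filename):
    """Key of an archived commit log file."""
    return f"commitlogs/{node}/{filename}"


def missing_sstables(node_id, local_files, existing_keys):
    """Return the local SSTable files whose keys are not yet in the bucket.

    existing_keys is the set of keys already present in the destination;
    files that are already stored are skipped, so each one is kept once.
    """
    return [f for f in local_files if sstable_key(node_id, f) not in existing_keys]
```

Because SSTable files live in a single sstables/ folder per node while each backup has its own folder holding backup.json, several backups can refer to the same stored file without uploading it again.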