Backup Service overview

The Backup Service allows backing up and restoring DSE cluster data.

cluster_name.conf

The location of the cluster_name.conf file depends on the type of installation:
  • Package installations: /etc/opscenter/clusters/cluster_name.conf
  • Tarball installations: install_location/conf/clusters/cluster_name.conf

Use OpsCenter to schedule and manage backups, and restore from those backups, across all registered DataStax Enterprise clusters. The Backup Service:

  • Performs all functions using the REST API or visually through the OpsCenter UI
  • Delivers smart backups that always ensure full data protection, including backups of commit logs
  • Backs up data to a local server (On Server), Amazon S3, or a custom location on the local filesystem
  • Compresses backup files to save storage
  • Allows specifying retention policies on scheduled backups
  • Easily lets admins carry out full, table-level, or point-in-time restores for a cluster
  • Notifies operations staff should backup or restore operations fail
  • Supports cloning data between clusters (such as copying data from a production cluster to a development cluster) or from another defined location (Amazon S3 or Local FS)
  • Provides detailed backup and restore reports and history

A backup is a snapshot of all on-disk data files (SSTable files) stored in the data directory. Backups are stored locally on each node (On Server). You can also specify additional locations to which the snapshot data is copied, such as a local filesystem (Local FS) or a cloud service such as Amazon S3.

Backups can be taken per datacenter, per keyspace, for selected multiple keyspaces, or for all keyspaces in the cluster while the system is online.
Note: Consider the following caveats when creating and restoring backups:
  • Restoring a snapshot that contains only the system keyspace is not allowed. There must be both system and non-system keyspaces, or only non-system keyspaces in the snapshot you want to restore.
  • Restoring a snapshot that does not contain a table definition is not allowed.
  • Restoring from a backup while Kerberos is enabled is not currently supported by OpsCenter.
  • Restoring a snapshot to a location with insufficient disk space fails. The Restore Report indicates which nodes do not have sufficient space and how much space is necessary for a successful restore. For more information and tips for preventative measures, see Monitoring sufficient disk space for restoring backups.
There must be enough free disk space on the node to accommodate making snapshots of data files. Configure the free disk space threshold to prevent backups from starting when free disk space falls below a specified percentage. A single snapshot requires little disk space. However, snapshots cause disk usage to grow more quickly over time because a snapshot prevents obsolete data files from being deleted. Specify how long to retain the snapshot data by setting a retention policy for each backup location.
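The threshold check described above can be sketched as follows. This is only an illustration of the idea, not how OpsCenter implements it; the function names and the percentage parameter are hypothetical.

```python
import shutil


def free_percent(total_bytes: int, free_bytes: int) -> float:
    """Return free space as a percentage of total capacity."""
    return free_bytes / total_bytes * 100


def backup_may_start(data_dir: str, threshold_percent: float) -> bool:
    """Return True when the filesystem holding data_dir has at least
    threshold_percent of its capacity free (hypothetical check)."""
    usage = shutil.disk_usage(data_dir)
    return free_percent(usage.total, usage.free) >= threshold_percent
```

For example, `backup_may_start("/var/lib/cassandra/data", 10.0)` would refuse to start a backup once less than 10 percent of the disk is free; the path shown is only an assumed data directory.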
Note: OpsCenter data backups do not show or manage manual snapshots taken using the nodetool snapshot command.

If a cluster includes DSE Search or DSE Analytics nodes, a backup job that includes keyspaces containing DSE Search or DSE Analytics data also saves that data. Any Solr indexes are recreated upon restore.

OpsCenter intelligently stores the backup data to prevent duplication of files. A backup first flushes all in-memory writes to disk, then makes a hard link of the SSTable files for each keyspace. Unlike traditional backup systems that use full backups and then incremental backups with deltas based on the last full backup, the OpsCenter approach allows you to fully recreate the state of the database at the time of each backup without duplicating files. If you have configured an additional Local FS or S3 location, OpsCenter creates a manifest for each backup that contains a list of the SSTables in that backup, and only uploads new SSTable files.
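The manifest-based approach can be pictured with a minimal sketch. It assumes a backup is described by a list of SSTable file names and that the set of files already present in the remote location is known; the function name and data shapes are hypothetical, not OpsCenter's actual format.

```python
def plan_upload(local_sstables, already_uploaded):
    """Return (manifest, to_upload) for one backup.

    manifest  : every SSTable that belongs to this backup
    to_upload : only the SSTables missing from the remote location
    """
    manifest = sorted(local_sstables)
    to_upload = sorted(set(local_sstables) - set(already_uploaded))
    return manifest, to_upload


# A second backup that shares one file with an earlier backup.
earlier = ["MyKeyspace-MyTable-ic-4-Data.db", "MyKeyspace-MyTable-ic-5-Data.db"]
current = ["MyKeyspace-MyTable-ic-5-Data.db", "MyKeyspace-MyTable-ic-6-Data.db"]
manifest, to_upload = plan_upload(current, earlier)
```

Here the manifest lists both files in the current backup, while only `MyKeyspace-MyTable-ic-6-Data.db` needs to be uploaded, which is why each backup can be restored in full without storing any file twice.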

You can schedule backups to run automatically at a recurring interval, or run one-off backups either at a scheduled time or on an ad hoc basis.

Backing up data using OpsCenter

The Backup Service provides a simple interface for scheduling regular or one-off backups of all or specific keyspaces in a DataStax Enterprise (DSE) cluster, and for recovering data from the stored backups.

Important: DataStax strongly recommends that organizations using DSE create a good backup and recovery plan using the Backup Service. Testing backup and restore operations on a non-production cluster is also recommended to ensure that the disaster recovery plan deployed for your organization works as intended.

The Backup Service was designed to manage enterprise-wide backup and restore operations for DSE clusters. While some administrators and operations staff believe that backups are not needed because of powerful and flexible replication capabilities in DSE, proper backup and restore procedures are still very important to implement for production clusters.

While replication does provide for copies of data to exist in multiple locations, datacenters, and cloud availability zones, all operations performed in a cluster are replicated, including operations that result in lost or incorrect data. For example, if a table is mistakenly dropped, if data is accidentally deleted, or if cluster data becomes corrupted, those adverse events are replicated to all other copies of that data. In such cases, there is no way to recover the lost data, or an uncorrupted copy of it, without a backup.

Commit log backups for point-in-time restores

In addition to keyspace backups, the Backup Service offers commit log backups, which enable point-in-time restores for finer-grained control over what is restored. Point-in-time restores are available after enabling commit log backups in conjunction with keyspace backups. As with keyspace backups, the commit log archives are retained based on a configurable retention policy.

Note: Point-in-time restores are only supported if the cluster topology is unchanged since the point in time to which you want to restore a backup.

Backup retention policies

Each scheduled backup has a retention policy that defines how OpsCenter handles the files for older backup data. The default policy for On Server backups is to retain the backup files for 30 days. The default policy for Amazon S3 and Local FS locations is Retain All. For each scheduled backup task and configured location, you can set the time period for which to retain the snapshot data. OpsCenter supports minutes, hours, days, and weeks for the retention time period. For example, you can define a retention policy that removes snapshot data older than 30 days, or 26 weeks, or 3 hours. If you want to keep all backups, the Retain All policy retains the backup files indefinitely.
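As a rough illustration of how such a policy translates into a cutoff time, the sketch below models a policy as an amount plus one of the four supported units, with `None` standing in for Retain All. The names and the representation are assumptions made for the example, not OpsCenter configuration syntax.

```python
from datetime import datetime, timedelta

SUPPORTED_UNITS = {"minutes", "hours", "days", "weeks"}


def is_expired(backup_time, now, amount, unit="days"):
    """Return True if a backup taken at backup_time is older than the policy allows.

    amount=None models a Retain All policy, which never expires a backup.
    """
    if amount is None:
        return False
    if unit not in SUPPORTED_UNITS:
        raise ValueError(f"unsupported retention unit: {unit}")
    cutoff = now - timedelta(**{unit: amount})
    return backup_time < cutoff


now = datetime(2015, 2, 1, 1, 0)
week_old = datetime(2015, 1, 25, 1, 0)
expired_3_days = is_expired(week_old, now, 3, "days")      # older than 3 days
expired_26_weeks = is_expired(week_old, now, 26, "weeks")  # well within 26 weeks
```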

When a backup that was configured with a time-limited retention policy completes, OpsCenter scans the snapshot data for outdated files that do not belong to other snapshots and removes them at the next scheduled backup.

For example, a scheduled backup sends data to S3, runs weekly, and has a retention policy of removing backups older than 3 days. The layout in the S3 bucket is as follows:

mybucket/
  snapshots/
    node-id1/
      sstables/
        MyKeyspace-MyTable-ic-4-Data.db
        MyKeyspace-MyTable-ic-5-Data.db
        MyKeyspace-MyTable-ic-6-Data.db
        MyKeyspace-MyTable-ic-7-Data.db
        ...
      1234-ABCD-2015-01-25-01-00/
        backup.json #includes 4-Data and 5-Data
        MyKeyspace/schema.json
      1234-ABCD-2015-02-01-01-00/
        backup.json #includes 5,6,7-Data
        MyKeyspace/schema.json
   

After the February 1 backup completes, OpsCenter scans the SSTables for outdated files according to the retention policy. The January 25 backup is older than 3 days, so its files are eligible for removal. Because MyKeyspace-MyTable-ic-4-Data.db was in the January 25 backup but not in the February 1 backup, it is removed. Even though MyKeyspace-MyTable-ic-5-Data.db was also in the January 25 backup, it is part of the latest backup, so it is retained until every backup that includes it has expired under the retention policy.
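The selection logic in this example amounts to a set difference: files referenced by expired backups, minus files still referenced by retained backups. The sketch below reproduces the example under that assumption; the function name is hypothetical and the manifests are plain lists rather than real backup.json content.

```python
def removable_sstables(expired_manifests, retained_manifests):
    """Return SSTables referenced only by expired backups."""
    still_needed = set().union(*retained_manifests)
    candidates = set().union(*expired_manifests)
    return sorted(candidates - still_needed)


# Manifests from the example: January 25 has expired, February 1 is retained.
jan_25 = ["MyKeyspace-MyTable-ic-4-Data.db", "MyKeyspace-MyTable-ic-5-Data.db"]
feb_01 = [
    "MyKeyspace-MyTable-ic-5-Data.db",
    "MyKeyspace-MyTable-ic-6-Data.db",
    "MyKeyspace-MyTable-ic-7-Data.db",
]
to_remove = removable_sstables([jan_25], [feb_01])
```

Only `MyKeyspace-MyTable-ic-4-Data.db` ends up in `to_remove`, matching the outcome described above.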

Backing up to Amazon S3

When you add an Amazon S3 bucket as an additional location for storing backup snapshots, the DataStax Agent sends the snapshot files to the S3 bucket automatically. To optimize storage space, each SSTable for a particular node and table is stored only once in Amazon S3.

Important: The Backup Service requires control over the data and structure of its destination locations. The AWS S3 bucket and the Local file system destinations must be dedicated for use only by OpsCenter. Any additional directories or files in those destinations can prevent the Backup Service from properly conducting a Backup or Restore operation.

The backup files are stored in S3 in the following hierarchy:

  mybucket/
    snapshots/
      node-id1/
        sstables/
          MyKeyspace-MyTable-ic-5-Data.db
          ...
          MyKeyspace-MyTable-ic-5-TOC.txt
          MyKeyspace-MyTable-ic-6-Data.db
          ...
        1234-ABCD-2014-10-01-01-00/
          backup.json
          MyKeyspace/schema.json
        1234-ABCD-2014-09-30-01-00/
          backup.json
          MyKeyspace/schema.json
      node-id2/
        sstables/
          MyKeyspace-MyTable-ic-1-Data.db
          ...
          MyKeyspace-MyTable-ic-2-Data.db
          ...
        1234-ABCD-2014-10-01-01-00/
          backup.json
          MyKeyspace/schema.json
        1234-ABCD-2014-09-30-01-00/
          backup.json
          MyKeyspace/schema.json
    commitlogs/
      node1/
        1435432324_Commitlog-3-1432320421.log
        1435433232_Commitlog-3-1432320422.log
        ...
   
The backup.json file contains metadata about which of the backed up SSTables are included in that backup.
Note: The Backup Service switched to the AWS SDK as of OpsCenter version 6.1. With the current heap default, a maximum SSTable file size of 1 TB is supported when backing up to S3.
If OpsCenter encounters an error when backing up to S3, it retries the backup a configurable number of times (3 by default) unless it encounters an unrecoverable error such as invalid AWS credentials.
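The retry behavior can be sketched as a small loop. This is an illustration only: the exception class and function names are hypothetical, and the sketch assumes that "3 retries" means up to three further attempts after the first failure.

```python
class UnrecoverableError(Exception):
    """Stands in for failures that retrying cannot fix, such as invalid credentials."""


def run_with_retries(operation, max_retries=3):
    """Call operation(), retrying ordinary failures up to max_retries times.

    An UnrecoverableError is re-raised immediately without any retry.
    """
    retries_used = 0
    while True:
        try:
            return operation()
        except UnrecoverableError:
            raise
        except Exception:
            if retries_used >= max_retries:
                raise
            retries_used += 1
```

A transient failure is therefore attempted at most four times in total with the default setting, while an unrecoverable one is attempted once.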
Warning: The AWS credentials and bucket names are stored in cluster_name.conf (with the exception of ad hoc backups). Be sure to use proper security precautions to ensure that this file is not readable by unauthorized users.
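One simple precaution is to verify that the file grants no read permission to group or other users. The sketch below is a generic POSIX permission check, not an OpsCenter feature; the function name is hypothetical, and which account should own the file depends on your installation.

```python
import os
import stat


def readable_by_group_or_others(path):
    """Return True if the file mode grants read access to group or others."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))
```

If the check returns True for cluster_name.conf, restricting the file to its owner (for example, mode 0600) removes the group and other read bits.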