Bulk uploading S3 backups using the AWS CLI

Uploading backups to S3 with the AWS CLI is a feature that must be enabled.

About this task

Use the AWS CLI instead of the AWS SDK when bulk loading backups to Amazon S3 locations. Doing so can improve performance and noticeably reduce the time it takes to complete a backup.

This feature is available as an OpsCenter Labs feature in OpsCenter versions 6.1.3 and later. As of OpsCenter version 6.5, it is officially a production feature.

For more information, see AWS CLI in the Amazon documentation.

When the AWS CLI feature is enabled, OpsCenter ignores the S3 throttling setting during backups. See Tuning throttling for AWS CLI.
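
As a rough illustration of throttling on the AWS CLI side, the sketch below prints (rather than runs) two aws configure set commands for the CLI's own S3 transfer settings, max_bandwidth and max_concurrent_requests. The helper name and the sample values are hypothetical, and treating these two settings as the ones to tune is an assumption; the throttling topic referenced above remains the authority.

```shell
#!/bin/sh
# Hypothetical helper: print (dry run) the AWS CLI commands that would cap
# S3 transfer bandwidth and concurrency for the default profile.
s3_throttle_commands() {
    max_bandwidth="$1"   # sample value, for example 50MB/s
    max_requests="$2"    # sample value, for example 10
    echo "aws configure set default.s3.max_bandwidth $max_bandwidth"
    echo "aws configure set default.s3.max_concurrent_requests $max_requests"
}

# Sample values only; choose limits that suit your network.
s3_throttle_commands 50MB/s 10
```

If the printed commands look right, they would need to be run as the user the DataStax agent runs as, so that the settings are written to that user's AWS CLI configuration.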

Prerequisites

  1. Install the AWS CLI package on every node. DataStax recommends using the Amazon bundled installer method and upgrading to the latest version of AWS CLI if it is already installed. See Install the AWS CLI using the bundled installer in the Amazon documentation for installation procedures.

    As a recommended best practice for OpsCenter, install the AWS CLI bundle as follows (APT is used here only to install unzip):

    sudo apt-get install -y unzip
    curl 'https://s3.amazonaws.com/aws-cli/awscli-bundle.zip' -o awscli-bundle.zip
    unzip awscli-bundle.zip
    sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws

    Regardless of the installation procedure used, make sure that the aws executable is on the PATH of the cassandra user, or whichever user the DataStax agent runs as.

  2. Add an S3 location for backups.
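
The PATH requirement in the first prerequisite can be checked with a short script. This is a minimal sketch: the function name check_cli_on_path is hypothetical, and the check only reflects the PATH of whichever user runs it.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command resolves on the current PATH.
check_cli_on_path() {
    cli_name="$1"
    if cli_path=$(command -v "$cli_name"); then
        echo "found: $cli_path"
        return 0
    fi
    echo "missing: $cli_name is not on PATH ($PATH)" >&2
    return 1
}

# Check for the aws executable and show its version when it is present.
if check_cli_on_path aws; then
    aws --version
else
    echo "Install the AWS CLI or adjust PATH for this user."
fi
```

To check the agent's account rather than your own, the script could be run as that user, for example with sudo -u cassandra sh check_aws.sh, where check_aws.sh is a hypothetical file name.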

Procedure

  1. Locate the cluster_name.conf file. The location of this file depends on the type of installation:

    • Package installations: /etc/opscenter/clusters/cluster_name.conf

    • Tarball installations: install_location/conf/clusters/cluster_name.conf

  2. Open cluster_name.conf for editing. Substitute cluster_name with the name of your cluster. Setting agent options through the cluster configuration file sets the corresponding property in address.yaml on every node.

    To configure the setting for all clusters managed by an OpsCenter instance, open opscenterd.conf for editing.

    The location of this file depends on the type of installation:

    • Package installations: /etc/opscenter/opscenterd.conf

    • Tarball installations: install_location/conf/opscenterd.conf

    If your environment requires it, open address.yaml for editing to configure the setting at the node level. The location of this file depends on the type of installation:

    • Package installations: /var/lib/datastax-agent/conf/address.yaml

    • Tarball installations: install_location/conf/address.yaml

    Do so for every node that requires a specific configuration override.

  3. Add the following configuration option:

    [backups]
    use_s3_cli = True

  4. Save the configuration file or files.

  5. Restart the OpsCenter daemon.

  6. If you made changes to address.yaml, restart the DataStax agents.
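
After editing, the setting can be double-checked before restarting. The sketch below builds the cluster configuration path from the locations listed in the procedure and tests whether use_s3_cli = True appears under [backups]. The function names and the cluster name my_cluster are hypothetical, and the check assumes the INI-style layout shown in the procedure, so it does not apply to address.yaml.

```shell
#!/bin/sh
# Hypothetical helper: build the path to cluster_name.conf for an install type.
cluster_conf_path() {
    install_type="$1"
    cluster_name="$2"
    install_location="${3:-}"   # only used for tarball installations
    case "$install_type" in
        package) echo "/etc/opscenter/clusters/${cluster_name}.conf" ;;
        tarball) echo "${install_location}/conf/clusters/${cluster_name}.conf" ;;
        *) echo "unknown install type: $install_type" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeed only if "use_s3_cli = True" sits under [backups].
s3_cli_enabled() {
    awk '
        # Track which section we are in each time a [section] header appears.
        /^[ \t]*\[/ { in_backups = ($0 ~ /^[ \t]*\[backups\][ \t]*$/) }
        in_backups && /^[ \t]*use_s3_cli[ \t]*=[ \t]*True[ \t]*$/ { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# my_cluster is a placeholder; substitute the name of your cluster.
conf=$(cluster_conf_path package my_cluster)
if [ -f "$conf" ] && s3_cli_enabled "$conf"; then
    echo "use_s3_cli is enabled in $conf"
else
    echo "use_s3_cli not confirmed in $conf"
fi
```

On package installations the restarts in the last two steps are typically done with sudo service opscenterd restart and sudo service datastax-agent restart; treat those service names as an assumption and confirm them for your installation.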

© 2024 DataStax