Back up data

Medusa, a tool for backing up Apache Cassandra®, DataStax Enterprise (DSE), and Hyper-Converged Database (HCD) data to an object store, is included in the Mission Control package.

These backup instructions use a local MinIO (S3-compatible) bucket as an example.

Supported object storage types for backups

Mission Control uses remote object stores to retain backed-up files and to maintain their durability. Provide an object store either as a managed service from a cloud provider or as a component within a private cloud. The following providers are known to work with Mission Control for backup and restore services.

Backup and restore providers and their corresponding storageProvider backends:

  • Amazon Simple Storage Service (S3): s3

  • S3-compatible (MinIO, OpenStack Swift, etc.): s3_compatible

  • Ceph Object Gateway: s3_rgw

  • Azure Blob Storage: azure_blobs

  • Google Cloud Storage (GCS): google_storage

Prerequisites

  • A running Mission Control environment.

  • A configured storage backend or bucket with the appropriate, supported provider.

  • Backup configuration uses the MissionControlCluster definition that matches your release version. Create a secret with the appropriate provider configuration in the target namespace before the MissionControlCluster object is created. For examples, see Create a secret.

    If the secret is not present before the cluster is created, nodes remain stuck waiting for the referenced secret to exist. After you create the secret, the nodes are scheduled and move to bootstrapping.

Create a secret

Based on your provider (S3-compatible MinIO, AWS S3, GCS, or Azure), choose one of the following example formats for secret creation:

  • S3-compatible (MinIO, OpenStack Swift, etc.)

  • AWS S3

  • Google Cloud Storage (GCS)

  • Azure (blob)

S3-compatible (MinIO, OpenStack Swift, etc.):

apiVersion: v1
kind: Secret
metadata:
  name: medusa-bucket-key
type: Opaque
stringData:
  credentials: |-
    [default]
    aws_access_key_id = minio_username
    aws_secret_access_key = minio_password

AWS S3:

apiVersion: v1
kind: Secret
metadata:
  name: medusa-bucket-key
type: Opaque
stringData:
  credentials: |-
    [default]
    aws_access_key_id = XXXXXX
    aws_secret_access_key = XXXXXXX

Google Cloud Storage (GCS):

apiVersion: v1
kind: Secret
metadata:
  name: medusa-bucket-key
type: Opaque
stringData:
  credentials: |-
    {
      "type": "service_account",
      "project_id": "gcp-project-name",
      "private_key_id": "796056f9e1d65abc3defedb60c881496f22836",
      "private_key": "-----BEGIN PRIVATE KEY-----\n...insert the full private key string...\n-----END PRIVATE KEY-----\n",
      "client_email": "medusa@gcp-project.iam.gserviceaccount.com",
      "client_id": "xxxxxxxxxxxxxxxxxxxxxxx",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/medusa%40gcp-project.iam.gserviceaccount.com"
    }

Azure (blob):

apiVersion: v1
kind: Secret
metadata:
  name: medusa-bucket-key
type: Opaque
stringData:
  credentials: |-
    {
        "storage_account": "medusa-storage-account-name",
        "key": "the key's long alphanumeric string"
    }

In the stringData.credentials section, keep the exact format shown in the example and provide the credential values that Medusa expects for your cloud storage provider's storage backend.
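As a quick sanity check, the S3-style credentials payload must parse as an INI file with a [default] section containing aws_access_key_id and aws_secret_access_key. A minimal sketch using Python's standard library (the key values here are the placeholders from the example above):

```python
import configparser

# The same credentials payload used in the S3-compatible secret example above.
credentials = """\
[default]
aws_access_key_id = minio_username
aws_secret_access_key = minio_password
"""

parser = configparser.ConfigParser()
parser.read_string(credentials)

# Medusa expects a [default] profile with both keys present.
assert parser.has_section("default")
assert parser["default"]["aws_access_key_id"] == "minio_username"
assert parser["default"]["aws_secret_access_key"] == "minio_password"
print("credentials payload is well-formed")
```

If the payload fails to parse here, the secret is malformed and Medusa will reject it at startup.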

Enable backup support on a cluster

Deploy the backup component, Medusa, to all Mission Control datacenters in a cluster by adding and defining the spec.medusa section in the MissionControlCluster manifest that matches your release version. Based on your provider (S3-compatible, AWS S3, GCS, or Azure), choose one of the following spec.medusa example formats within the MissionControlCluster manifest:

  • S3-compatible

  • AWS S3

  • GCS

  • Azure (blob)

S3-compatible:

apiVersion: missioncontrol.datastax.com/v1beta2
kind: MissionControlCluster
metadata:
  name: demo
spec:
  k8ssandra:
    medusa:
      storageProperties:
        storageProvider: s3_compatible
        bucketName: medusa-backups
        storageSecretRef:
          name: medusa-bucket-key
        host: minio-service.minio.svc.cluster.local
        port: 9000
        secure: false
    cassandra:
      serverVersion: 4.0.11
      …

AWS S3:

spec:
  k8ssandra:
    medusa:
      storageProperties:
        storageProvider: s3
        bucketName: medusa-backups
        region: us-west-2
        storageSecretRef:
          name: medusa-bucket-key
    cassandra:
      serverVersion: 4.0.11
      …

GCS:

spec:
  k8ssandra:
    medusa:
      storageProperties:
        storageProvider: google_storage
        bucketName: medusa-backups
        storageSecretRef:
          name: medusa-bucket-key
    cassandra:
      serverVersion: 4.0.11
      …

Azure (blob):

spec:
  k8ssandra:
    medusa:
      storageProperties:
        storageProvider: azure_blobs
        bucketName: medusa-backups
        storageSecretRef:
          name: medusa-bucket-key
    cassandra:
      serverVersion: 4.0.11
      …

Useful backup spec.k8ssandra.medusa.storageProperties fields:

  • storageProvider: The storage provider: google_storage, azure_blobs, s3, s3_compatible, or s3_rgw.

  • bucketName: The name of the storage bucket.

  • storageSecretRef.name: The name of the secret containing the credentials file used to access the backup storage backend.

  • host: Host to connect to for the storage backend. Omitted for GCS, S3, Azure, and local.

  • port: Port to connect to for the storage backend. Omitted for GCS, S3, Azure, and local.

  • region: Region of the storage bucket. Used for AWS S3.

  • secure: Indicates whether to use SSL to connect to the storage backend. Used for S3-compatible.

Configure backup storage

Complete these steps if you need to configure a storage bucket for your cloud provider. A storage bucket is required before you can run a data backup.

Choose User Interface (UI) or Command Line Interface (CLI) steps.

  • UI

  • CLI

  1. Access Mission Control’s UI.

  2. In the main navigation, click Settings, and then click Backup Configuration.

    Create, modify, or delete a backup storage configuration by choosing the appropriate step:

    • To set up a storage bucket for the backup, click Create a new configuration.

      • Enter the required field information and any other values that are pertinent to your environment, and click Create Backup Configuration.

    • To modify a storage bucket configuration, find its name in the list of buckets.

      • Click the overflow menu icon (3 dots) on the row of your target bucket.

      • Click Modify Configuration.

      • In the Modify Backup Configuration dialog, enter the required fields and any other values that are pertinent to your environment.

      • Click Modify Backup Configuration.

    • To delete a backup configuration, find the row with its name in the list of buckets.

      • Click the overflow menu icon (3 dots) on the row of your target bucket.

      • Click Delete Configuration.

  1. Access Mission Control’s CLI.

  2. Deploy the backup storage configuration by adding or modifying the definitions in the spec.k8ssandra.medusa.storageProperties section of the manifest that matches your release version. Make the changes in the same namespace and Data Plane where either the MissionControlCluster or cassandraDatacenter resource resides.

    spec:
      k8ssandra:
        medusa:
          storageProperties:
            storageProvider: s3_compatible
            bucketName: medusa-backups
            storageSecretRef:
              name: medusa-bucket-key
            region: default
            secure: false
            concurrentTransfers: 1
            host: "minio-service.minio.svc.cluster.local"
            port: 9000
            maxBackupAge: 0
            maxBackupCount: 0
        cassandra:
          serverVersion: 4.0.11
          …

NOTE: Useful backup spec.k8ssandra.medusa.storageProperties fields:

  • storageProvider: The storage provider: google_storage, azure_blobs, s3, s3_compatible, or s3_rgw.

  • bucketName: The name of the storage bucket to use for backups.

  • region: The region of the storage bucket. Used for AWS S3.

  • secure: Indicates whether to use SSL to connect to the storage backend. Used for S3-compatible.

  • concurrentTransfers: The number of concurrent uploads.

  • host: Host to connect to for the storage backend. Omitted for GCS, S3, Azure, and local.

  • port: Port to connect to for the storage backend. Omitted for GCS, S3, Azure, and local.

  • maxBackupAge: Maximum backup age that the purge process should observe.

  • maxBackupCount: Maximum number of backups to keep. Used by the purge process. Default is unlimited.
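To make the purge settings concrete, here is an illustrative sketch of their semantics (this is not Medusa's actual purge code, and it assumes maxBackupAge is expressed in days; 0 means unlimited for both settings):

```python
from datetime import datetime, timedelta, timezone

def backups_to_purge(backups, max_backup_age=0, max_backup_count=0, now=None):
    """Illustrative sketch of maxBackupAge / maxBackupCount purge semantics."""
    now = now or datetime.now(timezone.utc)
    # Newest first, so the count limit keeps the most recent backups.
    ordered = sorted(backups, key=lambda b: b["finishTime"], reverse=True)
    purge = set()
    if max_backup_count > 0:
        purge.update(b["name"] for b in ordered[max_backup_count:])
    if max_backup_age > 0:
        cutoff = now - timedelta(days=max_backup_age)
        purge.update(b["name"] for b in ordered if b["finishTime"] < cutoff)
    return purge

now = datetime(2024, 1, 10, tzinfo=timezone.utc)
backups = [
    {"name": "b1", "finishTime": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"name": "b2", "finishTime": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"name": "b3", "finishTime": datetime(2024, 1, 9, tzinfo=timezone.utc)},
]
# With maxBackupCount=2 and maxBackupAge=7 days, only b1 is eligible for purge:
# it falls outside the 2 most recent backups and is older than the age cutoff.
print(backups_to_purge(backups, max_backup_age=7, max_backup_count=2, now=now))
```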

Create an immediate backup

Run a backup of a Cassandra, DSE, or HCD datacenter (DC) with a MedusaBackupJob custom resource in the same namespace and Data Plane where either the MissionControlCluster or cassandraDatacenter resource resides.

Mission Control runs a synthetic full backup and names it according to how it is created. This backup type combines, at a point in time, the last full backup in storage with all data that has changed since that backup, and copies the combined result to the backup datastore. The use of immutable data files allows for this optimization.
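The idea can be illustrated with sets of immutable data files (a conceptual sketch with hypothetical file names, not Medusa's implementation): because files never change after being written, only genuinely new files need to be uploaded, while the synthetic full backup's manifest references every file currently live on the node.

```python
# Conceptual sketch of a synthetic full backup over immutable data files.
# File names are hypothetical; this is not Medusa's implementation.
last_full_backup = {"sstable-1", "sstable-2", "sstable-3"}
files_on_node_now = {"sstable-1", "sstable-3", "sstable-4", "sstable-5"}

# Only files not already present in the object store must be uploaded...
new_files = files_on_node_now - last_full_backup

# ...while the synthetic full backup manifest references every live file,
# whether it was uploaded during an earlier backup or just now.
synthetic_full_manifest = files_on_node_now

print(sorted(new_files))                # files to upload
print(sorted(synthetic_full_manifest))  # files the backup references
```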

Choose User Interface (UI) or Command Line Interface (CLI) steps to immediately back up a datacenter in a cluster.

To schedule a backup activity of a datacenter in the cluster, see Create a backup schedule.

  • UI

  • CLI

  1. Access Mission Control’s UI.

  2. In the Home Clusters dialog, click the target cluster namespace.

  3. Click the Backups tab.

  4. Click Create Backup.

  5. Change the backup Type to Run Now.

  6. Choose the target datacenter.

  7. Click Create Backup.

To view notifications from the backup activity, see Monitor backup status.

  1. Access Mission Control’s CLI.

  2. Create a MedusaBackupJob Custom Resource (CR) for your release in the same namespace and Data Plane where either the MissionControlCluster or cassandraDatacenter resource resides. This example uses cassandraDatacenter: dc1.

    apiVersion: medusa.k8ssandra.io/v1alpha1
    kind: MedusaBackupJob
    metadata:
      name: medusa-backup1
    spec:
      cassandraDatacenter: dc1

    Mission Control detects the MedusaBackupJob object creation and triggers a backup asynchronously.

To view notifications from the backup activity, see Monitor backup status.

Monitor backup status

  • UI

  • CLI

  1. Access Mission Control’s UI.

  2. In the Home Clusters dialog, click the target cluster namespace.

  3. Click the Backups tab.

    In the Backup your cluster data pane, both immediate backup jobs and any future scheduled backups are tracked.

In the Backup Activity section, review notifications of each datacenter’s immediate backup status.

In the Scheduled Backups section, review a datacenter’s scheduled backup details.

  1. Access Mission Control’s CLI.

  2. Check if the finishTime status is set in the MedusaBackupJob object.

    kubectl get medusabackupjob/medusa-backup1 -o yaml
    Sample results:

    kind: MedusaBackupJob
    metadata:
      name: medusa-backup1
    spec:
      cassandraDatacenter: demo-dc1
    status:
      ...
      finishTime: "2022-01-06T16:34:35Z"
      finished:
      - demo-dc1-default-sts-0
      - demo-dc1-default-sts-1
      - demo-dc1-default-sts-2
      startTime: "2022-01-06T16:34:30Z"
  3. Review the start and finish times in the results from this kubectl get command:

    kubectl get MedusaBackupJob -A
    Sample results
    NAME             STARTED   FINISHED
    backup1          25m       24m
    medusa-backup1   19m       19m

    All nodes with completed backups list times in the FINISHED column. When the backup operation finishes, a MedusaBackup custom resource (CR) is created with the same name as the MedusaBackupJob object. The CR materializes the backup locally on the Kubernetes cluster. The MedusaBackup object status contains the total number of nodes in the cluster at the time of the backup, the number of nodes that successfully completed the backup, and the topology of the datacenter at the time of the backup:

    apiVersion: medusa.k8ssandra.io/v1alpha1
    kind: MedusaBackup
    metadata:
      name: backup1
    status:
      startTime: '2023-09-13T12:15:57Z'
      finishTime: '2023-09-13T12:16:12Z'
      totalNodes: 2
      finishedNodes: 2
      nodes:
        - datacenter: dc1
          host: firstcluster-dc1-default-sts-0
          rack: default
          tokens:
            - -110555885826893
            - -1149279817337332700
            - -1222258121654772000
            - -127355705089199870
        - datacenter: dc1
          host: firstcluster-dc1-default-sts-1
          rack: default
          tokens:
            - -1032268962284829800
            - -1054373523049285200
            - -1058110708807841300
            - -107256661843445790
      status: SUCCESS
    spec:
      backupType: differential
      cassandraDatacenter: demo-dc1
  4. Review the resulting subset of the CR information for MedusaBackup objects:

    kubectl get MedusaBackup -A
    Sample results
    NAME             STARTED   FINISHED   NODES   COMPLETED   STATUS
    backup1          29m       28m        2       2           SUCCESS
    medusa-backup1   23m       23m        2       2           SUCCESS

NOTE: For a restore to be possible, a MedusaBackup object must exist.
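The completion criteria shown in the sample output can also be checked programmatically. A minimal sketch in Python, assuming you have exported the object with kubectl get medusabackup -o json and extracted its status field (the payload below reuses values from the sample results above):

```python
import json

# MedusaBackup status payload as it might be exported with kubectl ... -o json
# (values taken from the sample results above).
status_json = """
{
  "startTime": "2023-09-13T12:15:57Z",
  "finishTime": "2023-09-13T12:16:12Z",
  "totalNodes": 2,
  "finishedNodes": 2,
  "status": "SUCCESS"
}
"""

status = json.loads(status_json)

def backup_succeeded(status):
    # A backup is usable for restore when it finished on every node
    # and reports SUCCESS.
    return (
        status.get("status") == "SUCCESS"
        and status.get("finishedNodes") == status.get("totalNodes")
        and "finishTime" in status
    )

print(backup_succeeded(status))  # True
```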

Create a backup schedule

Schedule recurring backups of a Cassandra, DSE, or HCD datacenter (DC) with a MedusaBackupSchedule custom resource in the same namespace and Data Plane where either the MissionControlCluster or cassandraDatacenter resource resides. This section describes scheduling a future backup activity.

To run an immediate backup of a datacenter in the cluster, see Create an immediate backup.

  • UI

  • CLI

  1. Access Mission Control’s UI.

  2. In the Home Clusters dialog, click the target cluster namespace.

  3. Click the Backups tab.

  4. Click Create Scheduled Backup.

  5. To schedule a future backup of the datacenter you choose, keep Schedule as the default backup Type.

  6. Fill out the Cron expression.

  7. Choose the target datacenter.

  8. Click Create Schedule.

To view notifications from the backup activity, see Monitor backup status.

Use the MedusaBackupSchedule Custom Resource Definition (CRD) that matches your installation version. Mission Control uses your modified custom resource file, with its cronSchedule expression, to manage backup schedules:

  1. Create and modify a MedusaBackupSchedule Custom Resource Definition (CRD) file.

    apiVersion: medusa.k8ssandra.io/v1alpha1
    kind: MedusaBackupSchedule
    metadata:
      name: medusa-backup-schedule
      namespace: demo-dc1
    spec:
      backupSpec:
        backupType: differential
        cassandraDatacenter: demo-dc1
      cronSchedule: 30 1 * * *
      disabled: false

    This resource must be created in the same Data Plane and namespace where the cassandraDatacenter resource resides. This example uses cassandraDatacenter: demo-dc1 and backs up demo-dc1 every day at 1:30 AM.

    NOTE: When specifying spec.cronSchedule, the * * * * * fields from left to right indicate minute, hour, day of the month, month, and day of the week. See cronSchedule specifications.

  2. Wait for the status of the backup schedule to be updated with the last execution and next execution times.

  3. Review the execution times for the MedusaBackupSchedule object:

    kubectl get MedusaBackupSchedule -A
    Sample results
    ...
    status:
      lastExecution: "2022-07-26T01:30:00Z"
      nextSchedule: "2022-07-27T01:30:00Z"
    ...

The MedusaBackupJob and MedusaBackup objects are created with the name of the MedusaBackupSchedule object as prefix and a timestamp as suffix. For example: medusa-backup-schedule-1658626200.
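Both conventions are easy to decode. A short sketch showing the cron field order and confirming that the numeric suffix in a generated name such as medusa-backup-schedule-1658626200 is a Unix timestamp:

```python
from datetime import datetime, timezone

# cronSchedule fields, left to right: minute, hour, day of month, month, day of week.
minute, hour, day_of_month, month, day_of_week = "30 1 * * *".split()
print(f"runs at {hour}:{minute} every day")  # runs at 1:30 every day

# The suffix on generated MedusaBackupJob/MedusaBackup names is a Unix timestamp.
name = "medusa-backup-schedule-1658626200"
suffix = int(name.rsplit("-", 1)[1])
when = datetime.fromtimestamp(suffix, tz=timezone.utc)
print(when.strftime("%Y-%m-%dT%H:%M:%SZ"))  # 2022-07-24T01:30:00Z
```

Note that the decoded timestamp falls at 01:30 UTC, matching the example cronSchedule.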
