Back up data
Mission Control includes Medusa, a tool for backing up Apache Cassandra®, DataStax Enterprise (DSE), and Hyper-Converged Database (HCD) data to an object store. Medusa integrates with the Management API to coordinate snapshot operations across cluster nodes during backup and restore processes.
Mission Control runs scheduled backups in the data planes even when the control plane is unavailable. Your data protection strategy remains intact during control plane outages. For more information, see Backup and repair behavior during outages.
Supported object storage types for backups
Mission Control uses remote object stores to retain backed-up files and to maintain their durability. You can provide an object store either as a managed service from a cloud provider or as a component within a private cloud. The following providers are known to work with Mission Control for backup and restore services.
- S3 Compatible (Red Hat OpenShift Data Foundation Object Bucket Claim, SeaweedFS)
- AWS S3
- Google Cloud Storage (GCS)
- Azure (blob)
Prerequisites
- A running Mission Control environment.
- A configured storage backend or bucket with the appropriate, supported provider.
- Configuring the backup uses the MissionControlCluster definition that matches your release version. You must create a secret with the appropriate provider configuration in the target namespace before you create the MissionControlCluster object. For examples, see Create a secret.

  When the secret is not present before the cluster is created, nodes get stuck waiting for the referenced secret to exist. Upon creation of the secret, the nodes are scheduled and move to bootstrapping.
Create a secret
Based on your provider (S3-compatible, AWS S3, GCS, or Azure), choose one of the following example formats for secret creation:
- S3-compatible (Red Hat OpenShift Data Foundation or SeaweedFS)

      apiVersion: v1
      kind: Secret
      metadata:
        name: medusa-bucket-key
      type: Opaque
      stringData:
        credentials: |-
          [default]
          aws_access_key_id = S3_COMPATIBLE_USERNAME
          aws_secret_access_key = S3_COMPATIBLE_PASSWORD

  Replace the following:
  - S3_COMPATIBLE_USERNAME: The username for your S3-compatible storage
  - S3_COMPATIBLE_PASSWORD: The password for your S3-compatible storage
- AWS S3

      apiVersion: v1
      kind: Secret
      metadata:
        name: medusa-bucket-key
      type: Opaque
      stringData:
        credentials: |-
          [default]
          aws_access_key_id = XXXXXX
          aws_secret_access_key = XXXXXXX

- Google Cloud Storage (GCS)

      apiVersion: v1
      kind: Secret
      metadata:
        name: medusa-bucket-key
      type: Opaque
      stringData:
        credentials: |-
          {
            "type": "service_account",
            "project_id": "gcp-project-name",
            "private_key_id": "PRIVATE_KEY_ID",
            "private_key": "-----BEGIN PRIVATE KEY-----\n ... insert extensive alphanumeric string ... \n-----END PRIVATE KEY-----\n",
            "client_email": "medusa@gcp-project.iam.gserviceaccount.com",
            "client_id": "CLIENT_ID",
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token",
            "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
            "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/medusa%40gcp-project.iam.gserviceaccount.com"
          }

- Azure (blob)

      apiVersion: v1
      kind: Secret
      metadata:
        name: medusa-bucket-key
      type: Opaque
      stringData:
        credentials: |-
          {
            "storage_account": "medusa-storage-account-name",
            "key": "the key's long alphanumeric string"
          }

In the stringData.credentials section, keep the format shown in the example and provide the credential values that Medusa expects for your cloud storage provider's storage backend.
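The S3-style secret above can also be generated programmatically, which helps keep the credentials layout unchanged. The following is a minimal sketch, not part of Mission Control: the helper names `render_s3_credentials` and `build_secret` are hypothetical, and it relies only on the fields shown in the examples and on the fact that kubectl accepts JSON manifests as well as YAML.

```python
import json


def render_s3_credentials(access_key: str, secret_key: str) -> str:
    """Return the INI-style credentials text used in the S3 examples."""
    return (
        "[default]\n"
        f"aws_access_key_id = {access_key}\n"
        f"aws_secret_access_key = {secret_key}"
    )


def build_secret(name: str, credentials: str) -> dict:
    """Build a Secret manifest with the same keys as the examples."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "stringData": {"credentials": credentials},
    }


if __name__ == "__main__":
    # Placeholder values; read real credentials from a secure source instead.
    secret = build_secret(
        "medusa-bucket-key",
        render_s3_credentials("S3_COMPATIBLE_USERNAME", "S3_COMPATIBLE_PASSWORD"),
    )
    # Print the manifest as JSON so it can be saved to a file and applied.
    print(json.dumps(secret, indent=2))
```

Save the printed JSON to a file and apply it in the target namespace before you create the MissionControlCluster object.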
Enable backup support on a cluster
Deploy the Medusa backup component on all Mission Control datacenters in a cluster by adding and defining the spec.k8ssandra.medusa section in the MissionControlCluster manifest that matches your release version.
Based on your provider (S3-compatible, AWS S3, GCS, or Azure), choose one of the following spec.k8ssandra.medusa example formats within the MissionControlCluster manifest:
- S3-compatible

      apiVersion: missioncontrol.datastax.com/v1beta2
      kind: MissionControlCluster
      metadata:
        name: demo
      spec:
        k8ssandra:
          medusa:
            storageProperties:
              storageProvider: s3_compatible
              bucketName: medusa-backups
              storageSecretRef:
                name: medusa-bucket-key
              host: seaweedfs-s3.seaweedfs.svc.cluster.local
              port: 8333
              secure: false
          cassandra:
            serverVersion: 4.0.11
            ...

- AWS S3

      spec:
        k8ssandra:
          medusa:
            storageProperties:
              storageProvider: s3
              bucketName: medusa-backups
              region: us-west-2
              storageSecretRef:
                name: medusa-bucket-key
          cassandra:
            serverVersion: 4.0.11
            ...

- GCS

      spec:
        k8ssandra:
          medusa:
            storageProperties:
              storageProvider: google_storage
              bucketName: medusa-backups
              storageSecretRef:
                name: medusa-bucket-key
          cassandra:
            serverVersion: 4.0.11
            ...

- Azure (blob)

      spec:
        k8ssandra:
          medusa:
            storageProperties:
              storageProvider: azure_blobs
              bucketName: medusa-backups
              storageSecretRef:
                name: medusa-bucket-key
          cassandra:
            serverVersion: 4.0.11
            ...

Useful backup spec.k8ssandra.medusa.storageProperties fields:
- storageProvider: The storage provider: google_storage, azure_blobs, s3, s3_compatible, or s3_rgw.
- bucketName: The name of the storage bucket.
- storageSecretRef.name: The name of the secret containing the credentials file to connect to the backup storage backend.
- host: The host to connect to for the storage backend. Omitted for GCS, S3, Azure, and local.
- port: The port to connect to for the storage backend. Omitted for GCS, S3, Azure, and local.
- region: The region of the storage bucket. Used for AWS S3.
- secure: Indicates whether to use SSL to connect to the storage backend. Used for S3-compatible.
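The field rules above can be captured in a small helper that assembles a storageProperties mapping and flags fields that the list says to omit for a given provider. This is an illustrative sketch only: the function names `storage_properties` and `unexpected_fields` are hypothetical, and the check covers only the host and port rule stated above, not the full validation that Mission Control performs.

```python
# Providers for which the field list says host and port are omitted.
# The "local" backend is left out because its identifier is not given in the text.
HOSTLESS_PROVIDERS = {"google_storage", "s3", "azure_blobs"}


def storage_properties(provider, bucket, secret_name, *,
                       host=None, port=None, secure=None, region=None):
    """Assemble a storageProperties mapping, skipping unset optional fields."""
    props = {
        "storageProvider": provider,
        "bucketName": bucket,
        "storageSecretRef": {"name": secret_name},
    }
    optional = {"host": host, "port": port, "secure": secure, "region": region}
    props.update({key: value for key, value in optional.items() if value is not None})
    return props


def unexpected_fields(props):
    """Return host/port keys that are set for a provider that omits them."""
    if props["storageProvider"] not in HOSTLESS_PROVIDERS:
        return []
    return [key for key in ("host", "port") if key in props]


if __name__ == "__main__":
    aws = storage_properties("s3", "medusa-backups", "medusa-bucket-key",
                             region="us-west-2")
    print(aws, unexpected_fields(aws))
```

Returning a list of offending keys, rather than raising, lets a caller decide whether an extra host or port is an error or only worth a warning.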
Configure backup storage
Complete these steps if you need to configure a storage bucket for your cloud provider. A storage bucket is required before you can run a data backup.
You can use the Mission Control UI or CLI to configure backup storage:
- Use the UI
  - In the main navigation, click Settings, and then click Backup Configuration.
  - Create, modify, or delete a backup storage configuration by choosing the appropriate step:
    - To set up a storage bucket for the backup, click Create a new configuration.
      - Enter the required field information and any other values that are pertinent to your environment, and click Create Backup Configuration.
    - To modify a storage bucket configuration, find its name in the list of buckets.
      - Click More Options on the row of your target bucket.
      - Click Modify Configuration.
      - In the Modify Backup Configuration dialog, enter the required fields and any other values that are pertinent to your environment.
      - Click Modify Backup Configuration.
    - To delete a backup configuration, find the row with its name in the list of buckets.
      - Click More Options on the row of your target bucket.
      - Click Delete Configuration.
- Use the CLI

  Deploy the backup storage configuration by adding or modifying the definitions in the spec.storageProperties section in the backup MedusaConfiguration CRD that matches your release version. Make the changes in the same namespace and data plane where either the MissionControlCluster or cassandraDatacenter resource resides.

      spec:
        k8ssandra:
          medusa:
            storageProperties:
              storageProvider: azure_blobs
              bucketName: medusa-backups
              storageSecretRef:
                name: medusa-bucket-key
              region: default
              secure: "False"
              concurrentTransfers: 1
              host: "seaweedfs-s3.seaweedfs.svc.cluster.local"
              port: 8333
              maxBackupAge: 0
              maxBackupCount: 0
          cassandra:
            serverVersion: 4.0.11
            ...

  The following are useful backup spec.k8ssandra.medusa.storageProperties fields:
  - storageProvider: The storage provider: google_storage, azure_blobs, s3, s3_compatible, or s3_rgw.
  - bucketName: The name of the storage bucket to use for backups.
  - region: The region of the storage bucket. Used for AWS S3.
  - secure: Indicates whether to use SSL to connect to the storage backend. Used for S3-compatible.
  - concurrentTransfers: The number of concurrent uploads.
  - host: The host to connect to for the storage backend. Omitted for GCS, S3, Azure, and local.
  - port: The port to connect to for the storage backend. Omitted for GCS, S3, Azure, and local.
  - maxBackupAge: The maximum backup age that the purge process must observe.
  - maxBackupCount: The maximum number of backups to keep. Used by the purge process. Default is unlimited.
Create an immediate backup
Run a backup of a Cassandra or DSE datacenter (DC) by using a MedusaBackupJob CR in the same namespace and data plane where either the MissionControlCluster or cassandraDatacenter resource resides.

Mission Control runs a synthetic full backup and names it according to how it is created. This backup type combines, at a point in time, the last full backup in storage with all data that has changed since that backup. Mission Control copies this combined backup to the backup datastore. The use of immutable data files allows for this optimization.

Choose UI or CLI steps to immediately back up a datacenter in a cluster.

To schedule a backup activity of a datacenter in the cluster, see Create a backup schedule.
- Use the UI
  - In the Home Clusters dialog, click the target cluster namespace.
  - Click the Backups tab.
  - Click Create Backup.
  - Change the backup Type to Run Now.
  - Choose the target datacenter.
  - Click Create Backup.

  To view notifications from the backup activity, see Monitor backup status.
- Use the CLI
  - Create a MedusaBackupJob custom resource (CR) for your release in the same namespace and data plane where either the MissionControlCluster or cassandraDatacenter resource resides. This example uses cassandraDatacenter: dc1.

        apiVersion: medusa.k8ssandra.io/v1alpha1
        kind: MedusaBackupJob
        metadata:
          name: medusa-backup1
        spec:
          cassandraDatacenter: dc1

    Mission Control detects the MedusaBackupJob object creation and triggers a backup asynchronously.

    To view notifications from the backup activity, see Monitor backup status.
Monitor backup status
You can use the Mission Control UI or CLI to monitor backup status:
- Use the UI
  - In the Home Clusters dialog, click the target cluster namespace.
  - Click the Backups tab.

    In the Backup your cluster data pane, immediate backup jobs are tracked as well as any future scheduled backups.

    In the Backup Activity section, review notifications of each datacenter’s immediate backup status.

    In the Scheduled Backups section, review a datacenter’s scheduled backup details.
- Use the CLI
  - Check if the finishTime status is set in the MedusaBackupJob object.

        kubectl get medusabackupjob/medusa-backup1 -o yaml

    Result:

        kind: MedusaBackupJob
        metadata:
          name: medusa-backup1
        spec:
          cassandraDatacenter: demo-dc1
        status:
          ...
          ...
          finishTime: "2022-01-06T16:34:35Z"
          finished:
          - demo-dc1-default-sts-0
          - demo-dc1-default-sts-1
          - demo-dc1-default-sts-2
          startTime: "2022-01-06T16:34:30Z"

  - Review the start and finish times in the results from this kubectl get command:

        kubectl get MedusaBackupJob -A

    Result:

        NAME             STARTED   FINISHED
        backup1          25m       24m
        medusa-backup1   19m       19m

    If a node has a completed backup, the time is listed in the FINISHED column. At the end of the backup operation, a MedusaBackup CR is created with the same name as the MedusaBackupJob object. The CR materializes the backup locally on the Kubernetes cluster. The MedusaBackup object status contains the total number of nodes in the cluster at the time of the backup, the number of nodes that successfully achieved the backup, and the topology of the datacenter at the time of the backup:

        apiVersion: medusa.k8ssandra.io/v1alpha1
        kind: MedusaBackup
        metadata:
          name: backup1
        status:
          startTime: '2023-09-13T12:15:57Z'
          finishTime: '2023-09-13T12:16:12Z'
          totalNodes: 2
          finishedNodes: 2
          nodes:
            - datacenter: dc1
              host: firstcluster-dc1-default-sts-0
              rack: default
              tokens:
                - -110555885826893
                - -1149279817337332700
                - -1222258121654772000
                - -127355705089199870
            - datacenter: dc1
              host: firstcluster-dc1-default-sts-1
              rack: default
              tokens:
                - -1032268962284829800
                - -1054373523049285200
                - -1058110708807841300
                - -107256661843445790
          status: SUCCESS
        spec:
          backupType: differential
          cassandraDatacenter: demo-dc1

  - Review the resulting subset of the CR information for MedusaBackup objects:

        kubectl get MedusaBackup -A

    Result:

        NAME             STARTED   FINISHED   NODES   COMPLETED   STATUS
        backup1          29m       28m        2       2           SUCCESS
        medusa-backup1   23m       23m        2       2           SUCCESS

    For a restore to be possible, a MedusaBackup object must exist.
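When you check backups from a script, the status fields shown above can be reduced to a simple completion check. The sketch below is an assumption-laden illustration, not a Mission Control tool: it expects a MedusaBackup object already loaded as a Python dictionary (for example, from the JSON that kubectl prints with the -o json output option), and the helper names `parse_ts` and `summarize_backup` are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as 2023-09-13T12:15:57Z."""
    # Replace the trailing "Z" so that older Python versions accept the string.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_backup(backup: dict) -> dict:
    """Summarize a MedusaBackup object using only the status fields shown above."""
    status = backup.get("status", {})
    start = status.get("startTime")
    finish = status.get("finishTime")
    total = status.get("totalNodes", 0)
    finished = status.get("finishedNodes", 0)
    duration = None
    if start and finish:
        duration = (parse_ts(finish) - parse_ts(start)).total_seconds()
    complete = (
        bool(finish)
        and total > 0
        and finished == total
        and status.get("status") == "SUCCESS"
    )
    return {
        "name": backup.get("metadata", {}).get("name"),
        "complete": complete,
        "duration_seconds": duration,
    }


if __name__ == "__main__":
    example = {
        "metadata": {"name": "backup1"},
        "status": {
            "startTime": "2023-09-13T12:15:57Z",
            "finishTime": "2023-09-13T12:16:12Z",
            "totalNodes": 2,
            "finishedNodes": 2,
            "status": "SUCCESS",
        },
    }
    print(summarize_backup(example))
```

Treating a backup as complete only when every node has finished and the status is SUCCESS is a conservative choice made for this sketch.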
Create a backup schedule
Run a backup of a Cassandra, DSE, or HCD datacenter (DC) by using a MedusaBackupJob CR in the same namespace and data plane where either the MissionControlCluster or cassandraDatacenter resource resides.
This section describes scheduling a future backup activity.

To run an immediate backup of a datacenter in the cluster, see Create an immediate backup.
- Use the UI
  - In the Home Clusters dialog, click the target cluster namespace.
  - Click the Backups tab.
  - Click Create Scheduled Backup.
  - To schedule a future backup of the datacenter you choose, keep Schedule as the default backup Type.
  - Fill out the Cron expression. See cron expression tips.
  - Choose the target datacenter.
  - Click Create Schedule.

  To view notifications from the backup activity, see Monitor backup status.
- Use the CLI

  Choose the backup MedusaBackupSchedule custom resource definition (CRD) that matches your installation version. Mission Control uses your modified CR file with its cronSchedule expression to manage backup schedules:

  - Create and modify a MedusaBackupSchedule CR file.

        apiVersion: medusa.k8ssandra.io/v1alpha1
        kind: MedusaBackupSchedule
        metadata:
          name: medusa-backup-schedule
          namespace: demo-dc1
        spec:
          backupSpec:
            backupType: differential
            cassandraDatacenter: demo-dc1
          cronSchedule: 30 1 * * *
          disabled: false

    This resource must be created in the same data plane and namespace where the cassandraDatacenter resource resides. This example uses cassandraDatacenter: demo-dc1. The example definition defines a backup of demo-dc1 every day at 1:30 AM.

    When specifying spec.cronSchedule, the * * * * * definitions from left to right indicate minute, hour, day of the month, month, day of the week. See cronSchedule specifications.
  - The status of the backup schedule is updated with the last execution and next execution times.
  - Review the execution times for the MedusaBackupSchedule object:

        kubectl get MedusaBackupSchedule -A

    Result:

        ...
        status:
          lastExecution: "2022-07-26T01:30:00Z"
          nextSchedule: "2022-07-27T01:30:00Z"
        ...

    The MedusaBackupJob and MedusaBackup objects are created with the name of the MedusaBackupSchedule object as prefix and a timestamp as suffix. For example: medusa-backup-schedule-1658626200.
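Two details from the CLI steps above lend themselves to a quick check in code: the order of the five cronSchedule fields, and the timestamp suffix on generated object names. The following sketch uses hypothetical helper names (`split_cron`, `decode_backup_name`) and assumes that the suffix is a Unix epoch value in seconds, which is consistent with the example name and the 1:30 AM schedule but is not stated in the text.

```python
from datetime import datetime, timezone

# Field order as described above, from left to right.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression: str) -> dict:
    """Label the five whitespace-separated fields of a cron expression."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


def decode_backup_name(name: str, schedule_name: str) -> datetime:
    """Decode the suffix of a generated name, assuming Unix epoch seconds."""
    prefix = schedule_name + "-"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    return datetime.fromtimestamp(int(name[len(prefix):]), tz=timezone.utc)


if __name__ == "__main__":
    print(split_cron("30 1 * * *"))
    created = decode_backup_name(
        "medusa-backup-schedule-1658626200", "medusa-backup-schedule"
    )
    print(created.isoformat())
```

Under that assumption, the example name decodes to 01:30 UTC, which matches the daily 1:30 AM schedule in the example definition.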
References
- MissionControlCluster object manifest: choose the one that matches your release
- See the release-specific list of CRD backup files with their properties for:
  - MedusaBackupJob
  - MedusaBackup
  - MedusaBackupSchedule

See also
- Medusa documentation reference to determine the correct file format to use for each supported storage backend