Back up data
Medusa, a tool for backing up Apache Cassandra®, DataStax Enterprise (DSE), and Hyper-Converged Database (HCD) data in an Object Store, is included in the Mission Control package.
These backup instructions use a local MinIO (S3-compatible) bucket as an example.
Supported object storage types for backups
Mission Control uses remote object stores to retain backed up files and to maintain their durability. Provide an object store as a managed service of a cloud provider or as a component within a private cloud. The following providers are known to work with Mission Control for backup and restore services.
| Provider | Backend |
|---|---|
| S3 Compatible (MinIO, OpenStack Swift, etc.) | |
Prerequisites
- A running Mission Control environment.
- A configured storage backend or bucket with the appropriate, supported provider.
- Configuring the backup uses the MissionControlCluster definition that matches your release version. You must create a secret with the appropriate provider configuration in the target namespace before you create the MissionControlCluster object. For examples, see Create a secret.

  If the secret is not present when the cluster is created, the nodes get stuck waiting for the referenced secret to exist. After you create the secret, the nodes are scheduled and move to bootstrapping.
Create a secret
Based on your provider (S3-compatible MinIO, AWS S3, GCS, or Azure), choose one of the following example formats for secret creation:
S3-compatible (MinIO, OpenStack Swift, etc.)

apiVersion: v1
kind: Secret
metadata:
  name: medusa-bucket-key
type: Opaque
stringData:
  credentials: |-
    [default]
    aws_access_key_id = minio_username
    aws_secret_access_key = minio_password

AWS S3

apiVersion: v1
kind: Secret
metadata:
  name: medusa-bucket-key
type: Opaque
stringData:
  credentials: |-
    [default]
    aws_access_key_id = XXXXXX
    aws_secret_access_key = XXXXXXX

Google Cloud Storage (GCS)

apiVersion: v1
kind: Secret
metadata:
  name: medusa-bucket-key
type: Opaque
stringData:
  credentials: |-
    {
      "type": "service_account",
      "project_id": "gcp-project-name",
      "private_key_id": "796056f9e1d65abc3defedb60c881496f22836",
      "private_key": "-----BEGIN PRIVATE KEY-----\n
      ...
      insert extensive alphanumeric string
      ...
      \n-----END PRIVATE KEY-----\n",
      "client_email": "medusa@gcp-project.iam.gserviceaccount.com",
      "client_id": "xxxxxxxxxxxxxxxxxxxxxxx",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/medusa%40gcp-project.iam.gserviceaccount.com"
    }

Azure (blob)

apiVersion: v1
kind: Secret
metadata:
  name: medusa-bucket-key
type: Opaque
stringData:
  credentials: |-
    {
      "storage_account": "medusa-storage-account-name",
      "key": "the key's long alphanumeric string"
    }
Enable backup support on a cluster
Deploy the backup component, Medusa, on all Mission Control datacenters in a cluster by adding and defining the spec.k8ssandra.medusa section in the MissionControlCluster manifest that matches your release version.
Based on your provider (S3-compatible, AWS S3, GCS, or Azure), choose one of the following spec.k8ssandra.medusa example formats within the MissionControlCluster manifest:
S3-compatible

apiVersion: missioncontrol.datastax.com/v1beta2
kind: MissionControlCluster
metadata:
  name: demo
spec:
  k8ssandra:
    medusa:
      storageProperties:
        storageProvider: s3_compatible
        bucketName: medusa-backups
        storageSecretRef:
          name: medusa-bucket-key
        host: minio-service.minio.svc.cluster.local
        port: 9000
        secure: false
    cassandra:
      serverVersion: 4.0.11
      …

AWS S3

spec:
  k8ssandra:
    medusa:
      storageProperties:
        storageProvider: s3
        bucketName: medusa-backups
        region: us-west-2
        storageSecretRef:
          name: medusa-bucket-key
    cassandra:
      serverVersion: 4.0.11
      …

GCS

spec:
  k8ssandra:
    medusa:
      storageProperties:
        storageProvider: google_storage
        bucketName: medusa-backups
        storageSecretRef:
          name: medusa-bucket-key
    cassandra:
      serverVersion: 4.0.11
      …

Azure (blob)

spec:
  k8ssandra:
    medusa:
      storageProperties:
        storageProvider: azure_blobs
        bucketName: medusa-backups
        storageSecretRef:
          name: medusa-bucket-key
    cassandra:
      serverVersion: 4.0.11
      …
Configure backup storage
Complete these steps if you need to configure a storage bucket for your cloud provider. A storage bucket is required before you can run a data backup.
Choose UI or CLI steps.
UI

- In the main navigation, click Settings, and then click Backup Configuration.
- Create, modify, or delete a backup storage configuration by choosing the appropriate step:
  - To set up a storage bucket for the backup, click Create a new configuration.
    - Enter the required field information and any other values that are pertinent to your environment, and click Create Backup Configuration.
  - To modify a storage bucket configuration, find its name in the list of buckets.
    - Click More Options on the row of your target bucket.
    - Click Modify Configuration.
    - In the Modify Backup Configuration dialog, enter the required fields and any other values that are pertinent to your environment.
    - Click Modify Backup Configuration.
  - To delete a backup configuration, find the row with its name in the list of buckets.
    - Click More Options on the row of your target bucket.
    - Click Delete Configuration.

CLI

- Deploy the backup storage configuration by adding or modifying the definitions in the spec.storageProperties section in the backup MedusaConfiguration CRD that matches your release version. Make the changes in the same namespace and data plane where either the MissionControlCluster or cassandraDatacenter resource resides.

  spec:
    k8ssandra:
      medusa:
        storageProperties:
          storageProvider: azure_blobs
          bucketName: medusa-backups
          storageSecretRef:
            name: medusa-bucket-key
          region: default
          secure: "False"
          concurrentTransfers: 1
          host: "minio-service.minio.svc.cluster.local"
          port: 9000
          maxBackupAge: 0
          maxBackupCount: 0
      cassandra:
        serverVersion: 4.0.11
        …
The following are useful backup spec.k8ssandra.medusa.storageProperties fields:
- storageProvider: The storage provider: google_storage, azure_blobs, s3, s3_compatible, or s3_rgw.
- bucketName: The name of the storage bucket to use for backups.
- region: The region of the storage bucket. Used for AWS S3.
- secure: Indicates whether to use SSL to connect to the storage backend. Used for S3-compatible.
- concurrentTransfers: The number of concurrent uploads.
- host: Host to connect to for the storage backend. Omitted for GCS, S3, Azure, and local.
- port: Port to connect to the storage backend. Omitted for GCS, S3, Azure, and local.
- maxBackupAge: Maximum backup age that the purge process must observe.
- maxBackupCount: Maximum number of backups to keep. Used by the purge process. Default is unlimited.
Create an immediate backup
Back up a Cassandra or DSE datacenter (DC) by creating a MedusaBackupJob custom resource in the same namespace and data plane where either the MissionControlCluster or cassandraDatacenter resource resides.
Mission Control runs a synthetic full backup and names it according to how it is created. This backup type combines, at a point in time, the last full backup in storage with all data that has changed since that last full backup. Mission Control copies this combined backup to the backup datastore. The use of immutable data files allows for this optimization.
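The idea behind a synthetic full backup can be illustrated with a small conceptual sketch. This is not how Medusa is implemented: the helper name plan_backup and the file names are hypothetical, and the sketch assumes only that data files are immutable, so a file already present in the backup store under the same name does not need to be copied again.

```python
def plan_backup(current_files, already_stored):
    """Split a node's data files into those to upload and those to reuse.

    Conceptual illustration only (hypothetical helper, not a Medusa API).
    Because data files are immutable, a file name that already exists in
    the backup store is assumed to hold identical content, so it can be
    referenced by the new backup instead of being copied again.
    """
    current = set(current_files)
    stored = set(already_stored)
    to_upload = sorted(current - stored)  # files added since the last backup
    reused = sorted(current & stored)     # files already in the backup store
    return to_upload, reused


# Hypothetical file names, for illustration only.
upload, reuse = plan_backup(
    current_files=["nb-1-big-Data.db", "nb-2-big-Data.db", "nb-3-big-Data.db"],
    already_stored=["nb-1-big-Data.db", "nb-2-big-Data.db"],
)
print(upload)  # ['nb-3-big-Data.db']
print(reuse)   # ['nb-1-big-Data.db', 'nb-2-big-Data.db']
```

In this sketch, the new backup still describes all three files, but only one of them has to be transferred.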
Choose UI or CLI steps to immediately back up a datacenter in a cluster.
To schedule a backup activity of a datacenter in the cluster, see Create a backup schedule.
UI

- In the Home Clusters dialog, click the target cluster namespace.
- Click the Backups tab.
- Click Create Backup.
- Change the backup Type to Run Now.
- Choose the target datacenter.
- Click Create Backup.

To view notifications from the backup activity, see Monitor backup status.

CLI

- Create a MedusaBackupJob Custom Resource (CR) for your release in the same namespace and data plane where either the MissionControlCluster or cassandraDatacenter resource resides. This example uses cassandraDatacenter: dc1.

  apiVersion: medusa.k8ssandra.io/v1alpha1
  kind: MedusaBackupJob
  metadata:
    name: medusa-backup1
  spec:
    cassandraDatacenter: dc1

  Mission Control detects the MedusaBackupJob object creation and triggers a backup asynchronously.
To view notifications from the backup activity, see Monitor backup status.
Monitor backup status
UI

- In the Home Clusters dialog, click the target cluster namespace.
- Click the Backups tab.

  The Backup your cluster data pane tracks immediate backup jobs as well as any future scheduled backups.
  In the Backup Activity section, review notifications of each datacenter’s immediate backup status.
  In the Scheduled Backups section, review a datacenter’s scheduled backup details.

CLI

- Check if the finishTime status is set in the MedusaBackupJob object.

  kubectl get medusabackupjob/medusa-backup1 -o yaml

  Result

  kind: MedusaBackupJob
  metadata:
    name: medusa-backup1
  spec:
    cassandraDatacenter: demo-dc1
  status:
    ...
    ...
    finishTime: "2022-01-06T16:34:35Z"
    finished:
    - demo-dc1-default-sts-0
    - demo-dc1-default-sts-1
    - demo-dc1-default-sts-2
    startTime: "2022-01-06T16:34:30Z"

- Review the start and finish times in the results from this kubectl get command:

  kubectl get MedusaBackupJob -A

  Result

  NAME             STARTED   FINISHED
  backup1          25m       24m
  medusa-backup1   19m       19m

  Completed backups list a time in the FINISHED column. At the end of the backup operation, a MedusaBackup custom resource (CR) is created with the same name as the MedusaBackupJob object. The CR materializes the backup locally on the Kubernetes cluster. The MedusaBackup object status contains the total number of nodes in the cluster at the time of the backup, the number of nodes that successfully completed the backup, and the topology of the datacenter at the time of the backup:

  apiVersion: medusa.k8ssandra.io/v1alpha1
  kind: MedusaBackup
  metadata:
    name: backup1
  status:
    startTime: '2023-09-13T12:15:57Z'
    finishTime: '2023-09-13T12:16:12Z'
    totalNodes: 2
    finishedNodes: 2
    nodes:
    - datacenter: dc1
      host: firstcluster-dc1-default-sts-0
      rack: default
      tokens:
      - -110555885826893
      - -1149279817337332700
      - -1222258121654772000
      - -127355705089199870
    - datacenter: dc1
      host: firstcluster-dc1-default-sts-1
      rack: default
      tokens:
      - -1032268962284829800
      - -1054373523049285200
      - -1058110708807841300
      - -107256661843445790
    status: SUCCESS
  spec:
    backupType: differential
    cassandraDatacenter: demo-dc1

- Review the resulting subset of the CR information for MedusaBackup objects:

  kubectl get MedusaBackup -A

  Result

  NAME             STARTED   FINISHED   NODES   COMPLETED   STATUS
  backup1          29m       28m        2       2           SUCCESS
  medusa-backup1   23m       23m        2       2           SUCCESS
For a restore to be possible, a MedusaBackup object must exist.
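The status fields in the MedusaBackup example can also be checked in a script once the object has been exported as a dictionary, for example from JSON output. The following is a minimal sketch: the helper name backup_succeeded is hypothetical, and the rule it applies (status is SUCCESS and finishedNodes equals totalNodes) is an assumption based only on the fields shown in the example, not a documented contract.

```python
def backup_succeeded(medusa_backup):
    """Return True when a MedusaBackup object reports a complete backup.

    `medusa_backup` is the object as a dict, for example parsed from JSON.
    Assumption: a backup is complete when status.status is SUCCESS and
    finishedNodes equals totalNodes.
    """
    status = medusa_backup.get("status", {})
    total = status.get("totalNodes")
    finished = status.get("finishedNodes")
    return (
        status.get("status") == "SUCCESS"
        and total is not None
        and finished == total
    )


# Values taken from the example MedusaBackup object in this section.
example = {
    "kind": "MedusaBackup",
    "metadata": {"name": "backup1"},
    "status": {
        "startTime": "2023-09-13T12:15:57Z",
        "finishTime": "2023-09-13T12:16:12Z",
        "totalNodes": 2,
        "finishedNodes": 2,
        "status": "SUCCESS",
    },
}
print(backup_succeeded(example))  # True
```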
Create a backup schedule
Back up a Cassandra, DSE, or HCD datacenter (DC) by using a MedusaBackupJob Custom Resource in the same namespace and data plane where either the MissionControlCluster or cassandraDatacenter resource resides.
This section describes scheduling a future backup activity.
To run an immediate backup of a datacenter in the cluster, see Create an immediate backup.
UI

- In the Home Clusters dialog, click the target cluster namespace.
- Click the Backups tab.
- Click Create Scheduled Backup.
- To schedule a future backup of the datacenter you choose, keep Schedule as the default backup Type.
- Fill out the Cron expression.
  See cron expression tips.
- Choose the target datacenter.
- Click Create Schedule.

To view notifications from the backup activity, see Monitor backup status.

CLI

Choose the backup MedusaBackupSchedule Custom Resource Definition (CRD) that matches your installation version. Mission Control uses your modified custom resource file with its cronSchedule expression to manage backup schedules:

- Create and modify a MedusaBackupSchedule Custom Resource Definition (CRD) file.

  apiVersion: medusa.k8ssandra.io/v1alpha1
  kind: MedusaBackupSchedule
  metadata:
    name: medusa-backup-schedule
    namespace: demo-dc1
  spec:
    backupSpec:
      backupType: differential
      cassandraDatacenter: demo-dc1
    cronSchedule: 30 1 * * *
    disabled: false

  This resource must be created in the same data plane and namespace where the cassandraDatacenter resource resides. This example uses cassandraDatacenter: demo-dc1. The example definition defines a backup of demo-dc1 every day at 1:30 AM.

  NOTE: When specifying spec.cronSchedule, the * * * * * definitions from left to right indicate minute, hour, day of the month, month, day of the week. See cronSchedule specifications.

- The status of the backup schedule is updated with the last execution and next execution times.
- Review the execution times for the MedusaBackupSchedule object:

  kubectl get MedusaBackupSchedule -A

  Result

  ...
  status:
    lastExecution: "2022-07-26T01:30:00Z"
    nextSchedule: "2022-07-27T01:30:00Z"
  ...
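To make the left-to-right order of the five cronSchedule fields concrete, here is a minimal sketch that labels each field of an expression. The helper name describe_cron is hypothetical; it only splits and names the fields and does not validate ranges, steps, or lists.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map a five-field cron expression to named fields.

    Only splits and labels the fields; it does not validate ranges or
    expand steps and lists.
    """
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


fields = describe_cron("30 1 * * *")
print(fields["minute"], fields["hour"])  # 30 1
```

For the example schedule, minute 30 and hour 1 with wildcards in the remaining fields correspond to every day at 1:30 AM.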
The MedusaBackupJob and MedusaBackup objects are created with the name of the MedusaBackupSchedule object as prefix and a timestamp as suffix.
For example: medusa-backup-schedule-1658626200.
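The naming pattern can be unpacked with a short sketch. The helper name split_scheduled_backup_name is hypothetical, and treating the suffix as a Unix timestamp in seconds (UTC) is an assumption; under that assumption, the suffix in the example name decodes to a 1:30 AM UTC run time.

```python
from datetime import datetime, timezone


def split_scheduled_backup_name(name):
    """Split '<schedule-name>-<timestamp>' into its two parts.

    Assumption: the suffix is a Unix timestamp in seconds (UTC).
    """
    prefix, _, suffix = name.rpartition("-")
    if not prefix or not suffix.isdigit():
        raise ValueError(f"not a scheduled backup name: {name!r}")
    created = datetime.fromtimestamp(int(suffix), tz=timezone.utc)
    return prefix, created


schedule, created = split_scheduled_backup_name("medusa-backup-schedule-1658626200")
print(schedule)             # medusa-backup-schedule
print(created.isoformat())  # 2022-07-24T01:30:00+00:00
```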
References
- MissionControlCluster object manifest: choose the one that matches your release
- See the release-specific list of Custom Resource Definitions (CRD) backup files with their properties for:
  - MedusaBackupJob
  - MedusaBackup
  - MedusaBackupSchedule
- See also:
  - Medusa documentation reference to determine the correct file format to use for each supported storage backend