Back up data
Medusa, a tool for backing up Apache Cassandra®, DataStax Enterprise (DSE), and Hyper-Converged Database (HCD) data in an Object Store, is included in the Mission Control package.
These backup instructions use a local MinIO (S3-compatible) bucket as an example.
Supported object storage types for backups
Mission Control uses remote object stores to retain backed up files and to maintain their durability. Provide an object store as a managed service of a cloud provider or as a component within a private cloud. The following providers are known to work with Mission Control for backup and restore services.
Provider | Backend |
---|---|
AWS S3 | s3 |
S3 Compatible (MinIO, OpenStack Swift, etc.) | s3_compatible |
Google Cloud Storage (GCS) | google_storage |
Azure Blob Storage | azure_blobs |
Prerequisites
- A running Mission Control environment.
- A configured storage backend or bucket with the appropriate, supported provider.
- A secret with the appropriate provider configuration, created in the target namespace before the MissionControlCluster object is created. Configuring the backup uses the MissionControlCluster definition that matches your release version. For examples, see Create a secret.

  If the secret is not present before the cluster is created, nodes get stuck waiting for the referenced secret to exist. After the secret is created, the nodes are scheduled and move to bootstrapping.
Create a secret
Based on your provider (S3-compatible MinIO, AWS S3, GCS, or Azure), choose one of the following example formats for secret creation:
S3-compatible (MinIO, OpenStack Swift, etc.):

    apiVersion: v1
    kind: Secret
    metadata:
      name: medusa-bucket-key
    type: Opaque
    stringData:
      credentials: |-
        [default]
        aws_access_key_id = minio_username
        aws_secret_access_key = minio_password

AWS S3:

    apiVersion: v1
    kind: Secret
    metadata:
      name: medusa-bucket-key
    type: Opaque
    stringData:
      credentials: |-
        [default]
        aws_access_key_id = XXXXXX
        aws_secret_access_key = XXXXXXX

Google Cloud Storage (GCS):

    apiVersion: v1
    kind: Secret
    metadata:
      name: medusa-bucket-key
    type: Opaque
    stringData:
      credentials: |-
        {
          "type": "service_account",
          "project_id": "gcp-project-name",
          "private_key_id": "796056f9e1d65abc3defedb60c881496f22836",
          "private_key": "-----BEGIN PRIVATE KEY-----\n
          ...
          insert extensive alphanumeric string
          ...
          \n-----END PRIVATE KEY-----\n",
          "client_email": "medusa@gcp-project.iam.gserviceaccount.com",
          "client_id": "xxxxxxxxxxxxxxxxxxxxxxx",
          "auth_uri": "https://accounts.google.com/o/oauth2/auth",
          "token_uri": "https://oauth2.googleapis.com/token",
          "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
          "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/medusa%40gcp-project.iam.gserviceaccount.com"
        }

Azure (blob):

    apiVersion: v1
    kind: Secret
    metadata:
      name: medusa-bucket-key
    type: Opaque
    stringData:
      credentials: |-
        {
          "storage_account": "medusa-storage-account-name",
          "key": "the key's long alphanumeric string"
        }
Enable backup support on a cluster
Deploy the Medusa backup component on all Mission Control datacenters in a cluster by adding and defining the spec.k8ssandra.medusa section in the MissionControlCluster manifest that matches your release version.
Based on your provider (S3-compatible, AWS S3, GCS, or Azure), choose one of the following spec.k8ssandra.medusa example formats within the MissionControlCluster manifest:
S3-compatible:

    apiVersion: missioncontrol.datastax.com/v1beta2
    kind: MissionControlCluster
    metadata:
      name: demo
    spec:
      k8ssandra:
        medusa:
          storageProperties:
            storageProvider: s3_compatible
            bucketName: medusa-backups
            storageSecretRef:
              name: medusa-bucket-key
            host: minio-service.minio.svc.cluster.local
            port: 9000
            secure: false
        cassandra:
          serverVersion: 4.0.11
          …

AWS S3:

    spec:
      k8ssandra:
        medusa:
          storageProperties:
            storageProvider: s3
            bucketName: medusa-backups
            region: us-west-2
            storageSecretRef:
              name: medusa-bucket-key
        cassandra:
          serverVersion: 4.0.11
          …

GCS:

    spec:
      k8ssandra:
        medusa:
          storageProperties:
            storageProvider: google_storage
            bucketName: medusa-backups
            storageSecretRef:
              name: medusa-bucket-key
        cassandra:
          serverVersion: 4.0.11
          …

Azure (blob):

    spec:
      k8ssandra:
        medusa:
          storageProperties:
            storageProvider: azure_blobs
            bucketName: medusa-backups
            storageSecretRef:
              name: medusa-bucket-key
        cassandra:
          serverVersion: 4.0.11
          …
Configure backup storage
Complete these steps if you need to configure a storage bucket for your cloud provider. A storage bucket is required before you can run a data backup.
Choose User Interface (UI) or Command Line Interface (CLI) steps.
UI:

- In the main navigation, click Settings, and then click Backup Configuration.
- Create, modify, or delete a backup storage configuration by choosing the appropriate step:
  - To set up a storage bucket for the backup, click Create a new configuration.
    - Enter the required field information and any other values that are pertinent to your environment, and click Create Backup Configuration.
  - To modify a storage bucket configuration, find its name in the list of buckets.
    - Click the overflow menu icon (3 dots) on the row of your target bucket.
    - Click Modify Configuration.
    - In the Modify Backup Configuration dialog, enter the required fields and any other values that are pertinent to your environment.
    - Click Modify Backup Configuration.
  - To delete a backup configuration, find the row with its name in the list of buckets.
    - Click the overflow menu icon (3 dots) on the row of your target bucket.
    - Click Delete Configuration.

CLI:

- Deploy the backup storage configuration by adding or modifying the definitions in the spec.storageProperties section in the backup MedusaConfiguration CRD that matches your release version. Make the changes in the same namespace and Data Plane where either the MissionControlCluster or cassandraDatacenter resource resides.

      spec:
        k8ssandra:
          medusa:
            storageProperties:
              storageProvider: azure_blobs
              bucketName: medusa-backups
              storageSecretRef:
                name: medusa-bucket-key
              region: default
              secure: "False"
              concurrentTransfers: 1
              host: "minio-service.minio.svc.cluster.local"
              port: 9000
              maxBackupAge: 0
              maxBackupCount: 0
          cassandra:
            serverVersion: 4.0.11
            …
NOTE:
Useful backup spec.k8ssandra.medusa.storageProperties fields:
- storageProvider: The storage provider: google_storage, azure_blobs, s3, s3_compatible, or s3_rgw.
- bucketName: The name of the storage bucket to use for backups.
- region: The region of the storage bucket. Used for AWS S3.
- secure: Indicates whether to use SSL to connect to the storage backend. Used for S3-compatible.
- concurrentTransfers: The number of concurrent uploads.
- host: Host to connect to for the storage backend. Omitted for GCS, S3, Azure, and local.
- port: Port to connect to the storage backend. Omitted for GCS, S3, Azure, and local.
- maxBackupAge: Maximum backup age that the purge process should observe.
- maxBackupCount: Maximum number of backups to keep. Used by the purge process. Default is unlimited.
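The host and port notes in the field list can be expressed as a small pre-flight check. The following is a minimal sketch, not part of Mission Control: the function name `endpoint_warnings` and the sample dictionaries are hypothetical, and `local` is assumed to be the identifier for local storage. It only encodes the statement that host and port are omitted for GCS, S3, Azure, and local.

```python
# Providers for which the field notes say host and port are omitted.
# Identifiers mirror the storageProvider values used on this page;
# "local" is an assumed identifier for local storage.
PROVIDERS_WITHOUT_ENDPOINT = {"google_storage", "s3", "azure_blobs", "local"}


def endpoint_warnings(storage_properties: dict) -> list:
    """Return a warning for each host/port field that the notes say to omit."""
    provider = storage_properties.get("storageProvider")
    warnings = []
    if provider in PROVIDERS_WITHOUT_ENDPOINT:
        for field in ("host", "port"):
            if field in storage_properties:
                warnings.append(f"{field} should be omitted for {provider}")
    return warnings


# Hypothetical inputs for illustration only.
minio = {
    "storageProvider": "s3_compatible",
    "bucketName": "medusa-backups",
    "host": "minio-service.minio.svc.cluster.local",
    "port": 9000,
}
aws = {
    "storageProvider": "s3",
    "bucketName": "medusa-backups",
    "region": "us-west-2",
    "host": "example.invalid",
}

print(endpoint_warnings(minio))
print(endpoint_warnings(aws))
```

A check like this could run before a manifest is applied. It does not validate the manifest against the actual CRD schema.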
Create an immediate backup
Run a backup of a Cassandra or DSE datacenter (DC) by using a MedusaBackupJob custom resource in the same namespace and Data Plane where either the MissionControlCluster or cassandraDatacenter resource resides.
Mission Control runs a synthetic full backup, and names it according to how it is created. This backup type combines, at a point in time, the last full backup in storage with all data that has changed since that last full backup. Mission Control copies this combined backup to the backup datastore. The use of immutable data files allows for this optimization.
Choose User Interface (UI) or Command Line Interface (CLI) steps to immediately back up a datacenter in a cluster.
To schedule a backup activity of a datacenter in the cluster, see Create a backup schedule.
UI:

- In the Home Clusters dialog, click the target cluster namespace.
- Click the Backups tab.
- Click Create Backup.
- Change the backup Type to Run Now.
- Choose the target datacenter.
- Click Create Backup.

To view notifications from the backup activity, see Monitor backup status.

CLI:

- Create a MedusaBackupJob Custom Resource (CR) for your release in the same namespace and Data Plane where either the MissionControlCluster or cassandraDatacenter resource resides. This example uses cassandraDatacenter: dc1.

      apiVersion: medusa.k8ssandra.io/v1alpha1
      kind: MedusaBackupJob
      metadata:
        name: medusa-backup1
      spec:
        cassandraDatacenter: dc1

  Mission Control detects the MedusaBackupJob object creation and triggers a backup asynchronously.

To view notifications from the backup activity, see Monitor backup status.
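When immediate backups are triggered from a script, the MedusaBackupJob manifest can be generated rather than written by hand. The sketch below is an illustration only: the helper `backup_job_manifest` is hypothetical, and it emits JSON, which kubectl accepts in place of YAML. The field values are copied from the example on this page.

```python
import json


def backup_job_manifest(name: str, datacenter: str) -> dict:
    """Build a MedusaBackupJob manifest with the fields shown on this page."""
    return {
        "apiVersion": "medusa.k8ssandra.io/v1alpha1",
        "kind": "MedusaBackupJob",
        "metadata": {"name": name},
        "spec": {"cassandraDatacenter": datacenter},
    }


manifest = backup_job_manifest("medusa-backup1", "dc1")
# Print the manifest as JSON so it can be saved to a file and applied.
print(json.dumps(manifest, indent=2))
```

The printed document would be saved to a file and applied in the same namespace and Data Plane as the datacenter resource.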
Monitor backup status
UI:

- In the Home Clusters dialog, click the target cluster namespace.
- Click the Backups tab.

  The Backup your cluster data pane tracks immediate backup jobs as well as any future scheduled backups.
  In the Backup Activity section, review notifications of each datacenter’s immediate backup status.
  In the Scheduled Backups section, review a datacenter’s scheduled backup details.

CLI:

- Check if the finishTime status is set in the MedusaBackupJob object.

      kubectl get medusabackupjob/medusa-backup1 -o yaml

  Sample results

      kind: MedusaBackupJob
      metadata:
        name: medusa-backup1
      spec:
        cassandraDatacenter: demo-dc1
      status:
        ...
        ...
        finishTime: "2022-01-06T16:34:35Z"
        finished:
        - demo-dc1-default-sts-0
        - demo-dc1-default-sts-1
        - demo-dc1-default-sts-2
        startTime: "2022-01-06T16:34:30Z"

- Review the start and finish times in the results from this kubectl get command:

      kubectl get MedusaBackupJob -A

  Sample results

      NAME             STARTED   FINISHED
      backup1          25m       24m
      medusa-backup1   19m       19m

  Backup jobs that have completed list a time in the FINISHED column. At the end of the backup operation, a MedusaBackup custom resource (CR) is created with the same name as the MedusaBackupJob object. The CR materializes the backup locally on the Kubernetes cluster. The MedusaBackup object status contains the total number of nodes in the cluster at the time of the backup, the number of nodes that successfully completed the backup, and the topology of the datacenter at the time of the backup:

      apiVersion: medusa.k8ssandra.io/v1alpha1
      kind: MedusaBackup
      metadata:
        name: backup1
      status:
        startTime: '2023-09-13T12:15:57Z'
        finishTime: '2023-09-13T12:16:12Z'
        totalNodes: 2
        finishedNodes: 2
        nodes:
          - datacenter: dc1
            host: firstcluster-dc1-default-sts-0
            rack: default
            tokens:
              - -110555885826893
              - -1149279817337332700
              - -1222258121654772000
              - -127355705089199870
          - datacenter: dc1
            host: firstcluster-dc1-default-sts-1
            rack: default
            tokens:
              - -1032268962284829800
              - -1054373523049285200
              - -1058110708807841300
              - -107256661843445790
        status: SUCCESS
      spec:
        backupType: differential
        cassandraDatacenter: demo-dc1

- Review the resulting subset of the CR information for MedusaBackup objects:

      kubectl get MedusaBackup -A

  Sample results

      NAME             STARTED   FINISHED   NODES   COMPLETED   STATUS
      backup1          29m       28m        2       2           SUCCESS
      medusa-backup1   23m       23m        2       2           SUCCESS
NOTE:
For a restore to be possible, a MedusaBackup object must exist.
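The status fields described in this section (startTime, finishTime, totalNodes, finishedNodes) lend themselves to a simple completeness check. The following is a minimal sketch that assumes the status has already been loaded into a dictionary, for example from `kubectl get ... -o json` output. The function name `summarize_backup` is hypothetical, and the timestamp format is inferred from the samples on this page.

```python
from datetime import datetime

# Timestamp layout inferred from the sample results on this page.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def summarize_backup(status: dict) -> dict:
    """Summarize a MedusaBackup status: finished flag, node coverage, duration."""
    start = datetime.strptime(status["startTime"], TIME_FORMAT)
    finish_raw = status.get("finishTime")
    duration = None
    if finish_raw is not None:
        finish = datetime.strptime(finish_raw, TIME_FORMAT)
        duration = (finish - start).total_seconds()
    total = status.get("totalNodes")
    return {
        "finished": finish_raw is not None,
        "all_nodes_done": total is not None and status.get("finishedNodes") == total,
        "duration_seconds": duration,
    }


# Values copied from the MedusaBackup sample in this section.
sample = {
    "startTime": "2023-09-13T12:15:57Z",
    "finishTime": "2023-09-13T12:16:12Z",
    "totalNodes": 2,
    "finishedNodes": 2,
    "status": "SUCCESS",
}
print(summarize_backup(sample))
```

A backup whose finishTime is missing, or whose finishedNodes is lower than totalNodes, would be reported as not finished or not complete.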
Create a backup schedule
Run a backup of a Cassandra, DSE, or HCD datacenter (DC) by using a MedusaBackupJob Custom Resource in the same namespace and Data Plane where either the MissionControlCluster or cassandraDatacenter resource resides.
This section describes scheduling a future backup activity.
To run an immediate backup of a datacenter in the cluster, see Create an immediate backup.
UI:

- In the Home Clusters dialog, click the target cluster namespace.
- Click the Backups tab.
- Click Create Scheduled Backup.
- To schedule a future backup of the datacenter you choose, keep Schedule as the default backup Type.
- Fill out the Cron expression.
  See cron expression tips.
- Choose the target datacenter.
- Click Create Schedule.

To view notifications from the backup activity, see Monitor backup status.

CLI:

Choose the backup MedusaBackupSchedule Custom Resource Definition (CRD) that matches your installation version. Mission Control uses your modified custom resource file with its cronSchedule expression to manage backup schedules:

- Create and modify a MedusaBackupSchedule custom resource (CR) file.

      apiVersion: medusa.k8ssandra.io/v1alpha1
      kind: MedusaBackupSchedule
      metadata:
        name: medusa-backup-schedule
        namespace: demo-dc1
      spec:
        backupSpec:
          backupType: differential
          cassandraDatacenter: demo-dc1
        cronSchedule: 30 1 * * *
        disabled: false

  This resource must be created in the same Data Plane and namespace where the cassandraDatacenter resource resides. This example uses cassandraDatacenter: demo-dc1. The example definition defines a backup of demo-dc1 every day at 1:30 AM.

  NOTE: When specifying spec.cronSchedule, the * * * * * definitions from left to right indicate minute, hour, day of the month, month, day of the week. See cronSchedule specifications.

- The status of the backup schedule is updated with the last execution and next execution times.
- Review the execution times for the MedusaBackupSchedule object:

      kubectl get MedusaBackupSchedule -A

  Sample results

      ...
      status:
        lastExecution: "2022-07-26T01:30:00Z"
        nextSchedule: "2022-07-27T01:30:00Z"
      ...
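As a reading aid for the five cronSchedule fields (minute, hour, day of the month, month, day of the week), the sketch below labels each position of an expression. It is an illustration only: the helper `label_cron_fields` is hypothetical, it checks only the field count, and it does not validate ranges or special syntax.

```python
# Field order as described for spec.cronSchedule, left to right.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def label_cron_fields(expression: str) -> dict:
    """Map each whitespace-separated cron field to its positional meaning."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# The schedule from the example manifest: minute 30, hour 1, every day.
print(label_cron_fields("30 1 * * *"))
```

For the example schedule, the minute field is 30 and the hour field is 1, which matches the stated daily run at 1:30 AM.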
The MedusaBackupJob and MedusaBackup objects are created with the name of the MedusaBackupSchedule object as prefix and a timestamp as suffix.
For example: medusa-backup-schedule-1658626200.
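The naming pattern can be taken apart again when you need to map a generated object back to its schedule. The following sketch assumes that the suffix is a Unix timestamp in seconds. The page only says "a timestamp", so treat that as an assumption, although the example value is consistent with it. The helper `split_scheduled_name` is hypothetical.

```python
from datetime import datetime, timezone


def split_scheduled_name(name: str):
    """Split '<schedule-name>-<timestamp>' into the schedule name and a UTC time.

    Assumes the suffix is Unix epoch seconds, which is not confirmed by this page.
    """
    prefix, _, suffix = name.rpartition("-")
    if not prefix or not suffix.isdigit():
        raise ValueError(f"not a scheduled backup name: {name!r}")
    return prefix, datetime.fromtimestamp(int(suffix), tz=timezone.utc)


schedule, created = split_scheduled_name("medusa-backup-schedule-1658626200")
print(schedule, created.isoformat())
```

Under that assumption, the example name decodes to 2022-07-24 at 01:30 UTC, which lines up with the 30 1 * * * schedule used in this section.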
References
- MissionControlCluster object manifest: choose the one that matches your release.
- See the release-specific list of Custom Resource Definitions (CRD) backup files with their properties for:
  - MedusaBackupJob
  - MedusaBackup
  - MedusaBackupSchedule
- See also the Medusa documentation reference to determine the correct file format to use for each supported storage backend.