Migrate data with Astra DB Sideloader

You can use Astra DB Sideloader to migrate data to Astra DB from Apache Cassandra®, DataStax Enterprise (DSE), or Hyper-Converged Database (HCD).

Create snapshots

On each node in your origin cluster, use nodetool to create a snapshot of all keyspaces and CQL tables that you want to migrate.

Prepare to create snapshots

  1. Due to Astra DB Sideloader limitations related to materialized views, secondary indexes, and encrypted data, you might need to modify the data model on your origin cluster to prepare for the migration. For more information, see Origin cluster requirements.

  2. Optional: Before you create snapshots, consider running nodetool cleanup to remove data that no longer belongs to your nodes. This command is particularly useful after adding more nodes to a cluster because it helps ensure that each node only contains the data that it is responsible for, according to the current cluster configuration and partitioning scheme.

    If you run nodetool cleanup before you take a snapshot, you can ensure that the snapshot only includes relevant data, potentially reducing the size of the snapshot. Smaller snapshots can lead to lower overall migration times and lower network transfer costs.

    However, take adequate precautions before you run this command because the cleanup operations can introduce additional load on your origin cluster.

Run nodetool snapshot

Use nodetool snapshot to create snapshots for the tables that you want to migrate.

Don’t create snapshots of system tables or tables that you don’t want to migrate. The migration can fail if you attempt to migrate snapshots that don’t have a matching schema in the target database. Astra DB Sideloader ignores system keyspaces.

The structure of the nodetool snapshot command depends on the keyspaces and tables that you want to migrate.

Snapshot all keyspaces

Create a snapshot of all tables in all keyspaces:

nodetool snapshot -t SNAPSHOT_NAME

Replace the following:

  • SNAPSHOT_NAME: A descriptive name for the snapshot. Use the same snapshot name for each node’s snapshot; this makes it easier to programmatically upload the snapshots to the migration directory.

Snapshot specific keyspaces

Create a snapshot of all tables in one or more specified keyspaces:

Snapshot one keyspace
nodetool snapshot -t SNAPSHOT_NAME KEYSPACE_NAME
Snapshot multiple keyspaces
nodetool snapshot -t SNAPSHOT_NAME KEYSPACE_NAME_1 KEYSPACE_NAME_2

Replace the following:

  • SNAPSHOT_NAME: A descriptive name for the snapshot. Use the same snapshot name for each node’s snapshot; this makes it easier to programmatically upload the snapshots to the migration directory.

  • KEYSPACE_NAME: The name of the keyspace that you want to migrate.

    To snapshot multiple keyspaces, pass a space-separated list of keyspace names. For example, customer_data product_data purchase_history specifies three keyspaces.

Snapshot specific tables

Create a snapshot of one or more specified tables:

Snapshot one table
nodetool snapshot -kt KEYSPACE_NAME.TABLE_NAME -t SNAPSHOT_NAME
Snapshot multiple tables
nodetool snapshot -kt KEYSPACE_NAME_1.TABLE_NAME_A KEYSPACE_NAME_1.TABLE_NAME_B KEYSPACE_NAME_2.TABLE_NAME_X -t SNAPSHOT_NAME

Replace the following:

  • KEYSPACE_NAME.TABLE_NAME: The name of the table that you want to migrate and the keyspace that it belongs to, separated by a period. For example, product_data.appliances specifies the appliances table in the product_data keyspace.

    To snapshot multiple tables, pass a space-separated list of keyspace-table pairs. For example, product_data.appliances purchase_history.nevada purchase_history.wisconsin specifies the appliances table in the product_data keyspace and the nevada and wisconsin tables in the purchase_history keyspace.

  • SNAPSHOT_NAME: A descriptive name for the snapshot. Use the same snapshot name for each node’s snapshot; this makes it easier to programmatically upload the snapshots to the migration directory.

Verify snapshot creation with nodetool listsnapshots

Use nodetool listsnapshots to verify that the snapshots were created:

nodetool listsnapshots

Snapshots have a specific directory structure, such as KEYSPACE_NAME/TABLE_NAME/snapshots/SNAPSHOT_NAME/…​. Astra DB Sideloader relies on this fixed structure to properly interpret the SSTable components. Don’t modify the snapshot’s directory structure; this can cause your migration to fail.
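To see what the fixed layout looks like, you can recreate it in a scratch directory and confirm that a `*/snapshots/…` pattern, like the one used by the upload commands later in this guide, matches the SSTable files. All names in this sketch are illustrative:

```shell
# Recreate the expected KEYSPACE/TABLE/snapshots/SNAPSHOT_NAME layout
# in a scratch directory. Keyspace, table, and snapshot names are
# illustrative.
data_dir=$(mktemp -d)
mkdir -p "${data_dir}/smart_home/sensor_readings-1a2b/snapshots/migration1"
touch "${data_dir}/smart_home/sensor_readings-1a2b/snapshots/migration1/nb-1-big-Data.db"

# Count the SSTable files that a */snapshots/SNAPSHOT_NAME* pattern finds.
matches=$(find "${data_dir}" -path '*/snapshots/migration1*' -name '*.db' | wc -l)
echo "matched SSTable files: ${matches}"
```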

Optional: Use for loops for snapshot creation and validation

If the nodes in your origin cluster are named in a predictable way (for example, dse0, dse1, dse2, etc.), you can use a for loop to simplify snapshot creation. For example:

Use a for loop to snapshot all keyspaces

To snapshot all keyspaces on each node, append the nodetool command to your for loop:

for i in 0 1 2; do ssh dse${i} nodetool snapshot -t SNAPSHOT_NAME; done
Use a for loop to snapshot specific keyspaces

To snapshot one keyspace on each node, append the nodetool command to your for loop:

for i in 0 1 2; do ssh dse${i} nodetool snapshot -t SNAPSHOT_NAME KEYSPACE_NAME; done

To snapshot multiple specific keyspaces on each node, use commas (not spaces) to separate the keyspace names:

for i in 0 1 2; do ssh dse${i} nodetool snapshot -t SNAPSHOT_NAME KEYSPACE_NAME_1,KEYSPACE_NAME_2; done
Use a for loop to snapshot specific tables

To snapshot one table on each node, append the nodetool command to your for loop:

for i in 0 1 2; do ssh dse${i} nodetool snapshot -kt KEYSPACE_NAME.TABLE_NAME -t SNAPSHOT_NAME; done

To snapshot multiple specific tables on each node, use commas (not spaces) to separate the keyspace-table pairs:

for i in 0 1 2; do ssh dse${i} nodetool snapshot -kt KEYSPACE_NAME_1.TABLE_NAME_A,KEYSPACE_NAME_1.TABLE_NAME_B -t SNAPSHOT_NAME; done

You can use the same for loop structure to verify that each snapshot was successfully created:

for i in 0 1 2; do ssh dse${i} nodetool listsnapshots; done

Configure the target database

To prepare your target database for the migration, you must record the schema for each table in your origin cluster that you want to migrate, re-create these schemas in your target database, and then set environment variables required to connect to your database.

For the migration to succeed, your target database must meet the schema requirements described in this section. Additionally, your snapshots must contain compatible data and directories, as described in Origin cluster requirements and Create snapshots. For example, Astra DB doesn’t support materialized views, and Astra DB Sideloader cannot migrate encrypted data.

However, indexes don’t need to match. You can define indexes in your target database independently of the origin cluster because Astra DB Sideloader ignores Storage Attached Indexes (SAI) defined on the origin cluster. During the migration, Astra DB Sideloader automatically populates any SAI defined in your target database, even if those indexes weren’t present in your origin cluster.

  1. Get the following schema properties for each table that you want to migrate:

    • Exact keyspace name.

    • Exact table name.

    • Exact column names, data types, and the order in which they appear in the table creation DDL.

    • Exact primary key definition as defined in your origin cluster, including the partition key, clustering columns, and ascending/descending ordering clauses. You must define partition key columns and clustering columns in the exact order that they are defined on your origin cluster.

      To retrieve schema properties, you can run the DESCRIBE KEYSPACE command on your origin cluster:

      DESCRIBE KEYSPACE_NAME;

      Replace KEYSPACE_NAME with the name of the keyspace that contains the tables you want to migrate. For example, DESCRIBE smart_home;.

      Then, get the schema properties from the result:

      CREATE TABLE smart_home.sensor_readings (
          device_id UUID,
          room_id UUID,
          reading_type TEXT,
          reading_value DOUBLE,
          reading_timestamp TIMESTAMP,
          PRIMARY KEY (device_id, room_id, reading_timestamp)
      ) WITH CLUSTERING ORDER BY (room_id ASC, reading_timestamp DESC);
  2. Re-create the schemas in your target database:

    1. In the Astra Portal navigation menu, click Databases, and then click the name of your Astra DB database.

    2. Create a keyspace with the exact same name as your origin cluster’s keyspace.

    3. In your database’s CQL console, create tables with the exact same names and schemas as your origin cluster.


      Astra DB rejects or ignores some table properties, such as compaction strategy. See Astra DB Serverless database limits for more information.

  3. In your terminal, set environment variables for your target database:

    export dbID=DATABASE_ID
    export token=APPLICATION_TOKEN

    Replace the following:

    • DATABASE_ID: The ID of your target Astra DB database.

    • APPLICATION_TOKEN: An Astra application token with permission to manage the database, such as a token with the Database Administrator role.

    Later, you will add another environment variable for the migration ID.

    The curl commands in this guide assume that you have set environment variables for token, database ID, and migration ID. Running the commands without these environment variables causes error messages like <a href="/v2/databases/migrations/">Moved Permanently</a> and 404 page not found.

    Additionally, the curl commands use jq to format the JSON responses. If you don’t have jq installed, remove | jq . from the end of each command.
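Because later commands fail confusingly when these variables are unset, it can help to add a guard that fails fast. A minimal sketch; the check_migration_env name and the sample values are illustrative:

```shell
# Abort early with a clear message if a required variable is unset
# or empty. The :? expansion prints the message and exits the shell.
check_migration_env() {
  : "${dbID:?set dbID to your database ID}"
  : "${token:?set token to your application token}"
}

# Illustrative values; use your real database ID and token.
export dbID="b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d"
export token="AstraCS:example"
check_migration_env && echo "environment OK"
```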

Initialize the migration

Use the DevOps API to initialize the migration and get your migration directory path and credentials.

To learn more about the initialization process, see About Astra DB Sideloader: Initialize a migration.

The initialization process can take several minutes to complete, especially if the migration bucket doesn’t already exist.

Get a migration ID

  1. In your terminal, use the DevOps API to initialize the data migration:

    curl -X POST \
        -H "Authorization: Bearer ${token}" \
        https://api.astra.datastax.com/v2/databases/${dbID}/migrations/initialize \
        | jq .
  2. Get the migrationID from the response:

    {
      "migrationID": "272eac1d-df8e-4d1b-a7c6-71d5af232182",
      "dbID": "b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d",
      "status": "Initializing",
      "progressInfo": "",
      "uploadBucketDir": "",
      "uploadCredentials": {
        "name": "",
        "keys": null,
        "credentialExpiration": null
      },
      "expectedCleanupTime": "2025-03-04T15:14:38Z"
    }

    The migrationID is a unique identifier (UUID) for the migration.

    The response also includes the migration status. You will refer to this status multiple times throughout the migration process.

  3. Assign the migration ID to an environment variable:

    export migrationID=MIGRATION_ID

    Replace MIGRATION_ID with the migrationID returned by the initialize endpoint.
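If you save the initialize response, you can script this extraction with the same jq tool used elsewhere in this guide. A sketch using the sample response values from above:

```shell
# Extract migrationID from a saved initialize response.
# The JSON here is abbreviated from the sample response above.
response='{"migrationID":"272eac1d-df8e-4d1b-a7c6-71d5af232182","status":"Initializing"}'
export migrationID=$(printf '%s' "${response}" | jq -r '.migrationID')
echo "${migrationID}"   # 272eac1d-df8e-4d1b-a7c6-71d5af232182
```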

Check the migration status to verify initialization

  1. Check the migration status:

    curl -X GET \
        -H "Authorization: Bearer ${token}" \
        https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \
        | jq .

    A successful response contains a MigrationStatus object. It can take a few minutes for the DevOps API to reflect status changes during a migration. Immediately calling this endpoint after starting a new phase of the migration might not return the actual current status.

  2. Check the status field in the response:

    • "status": "ReceivingFiles": Initialization is complete and your upload credentials are available. Proceed to the next step.

    • "status": "Initializing": The migration is still initializing. Wait a few minutes before you check the status again.
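When scripting these status checks, a small helper can map each status to the next action. The next_action function is an illustrative helper, not part of any CLI:

```shell
# Map a migration status to the next action during initialization.
# next_action is an illustrative helper, not part of any CLI.
next_action() {
  case "$1" in
    Initializing)   echo "wait a few minutes, then poll again" ;;
    ReceivingFiles) echo "initialization complete; upload snapshots" ;;
    *)              echo "unexpected status: $1" ;;
  esac
}

next_action "Initializing"
next_action "ReceivingFiles"
```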

Get migration directory path and upload credentials

Get your migration directory path and upload credentials from the response. You need these values to upload snapshots to the migration directory.

Get AWS credentials from MigrationStatus

Securely store the uploadBucketDir, accessKeyID, secretAccessKey, and sessionToken from the response:

{
  "migrationID": "272eac1d-df8e-4d1b-a7c6-71d5af232182",
  "dbID": "b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d",
  "status": "ReceivingFiles",
  "progressInfo": "",
  "uploadBucketDir": "s3://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/",
  "uploadCredentials": {
    "name": "sessionToken",
    "keys": {
      "accessKeyID": "ASXXXXXXXXXXXXXXXXXX",
      "secretAccessKey": "2XXXXXXXXXXXXXXXWqcdV519ZubYbyfuNxbZg1Rw",
      "sessionToken": "XXXXXXXXXX"
    },
    "credentialExpiration": "2024-01-18T19:45:09Z",
    "hint": "\nexport AWS_ACCESS_KEY_ID=ASXXXXXXXXXXXXXXXXXX\nexport AWS_SECRET_ACCESS_KEY=2XXXXXXXXXXXXXXXWqcdV519ZubYbyfuNxbZg1Rw\nexport AWS_SESSION_TOKEN=XXXXXXXXXXXXXX\n"
  },
  "expectedCleanupTime": "2024-01-25T15:14:38Z"
}

uploadBucketDir is the migration directory URL. Note the trailing slash.

uploadCredentials contains the AWS credentials that authorize uploads to the migration directory, namely accessKeyID, secretAccessKey, and sessionToken.

The sessionToken expires after one hour. If your total migration takes longer than one hour, generate new credentials, and then resume the migration with the fresh credentials.

If you use automation to handle Astra DB Sideloader migrations, you might need to script a pause every hour so you can generate new credentials without unexpectedly interrupting the migration.
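Automation can compute how much time remains before credentialExpiration and pause before the sessionToken lapses. A sketch assuming GNU date (for the -d flag) and the sample expiration timestamp from above:

```shell
# Compute seconds remaining until the upload credentials expire.
# Requires GNU date; the timestamp is the sample value from above.
credentialExpiration="2024-01-18T19:45:09Z"
expires_epoch=$(date -u -d "${credentialExpiration}" +%s)
now_epoch=$(date -u +%s)
remaining=$(( expires_epoch - now_epoch ))
if [ "${remaining}" -le 0 ]; then
  echo "credentials expired; request new ones"
else
  echo "credentials valid for ${remaining} more seconds"
fi
```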

Get Google Cloud credentials from MigrationStatus

  1. Find the uploadBucketDir and the uploadCredentials in the response:

    {
      "migrationID": "272eac1d-df8e-4d1b-a7c6-71d5af232182",
      "dbID": "b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d",
      "status": "ReceivingFiles",
      "progressInfo": "",
      "uploadBucketDir": "gs://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/",
      "uploadCredentials": {
        "name": "TYPE_GOOGLE_CREDENTIALS_FILE",
        "keys": {
          "file": "CREDENTIALS_FILE"
        },
        "credentialExpiration": "2024-08-07T18:51:39Z"
      },
      "expectedCleanupTime": "2024-08-14T15:14:38Z"
    }

    uploadBucketDir is the migration directory URL. Note the trailing slash.

    uploadCredentials contains a base64-encoded file containing Google Cloud credentials that authorize uploads to the migration directory.

  2. Pipe the Google Cloud credentials file to a creds.json file:

    curl -X GET \
        -H "Authorization: Bearer ${token}" \
        https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \
        | jq -r '.uploadCredentials.keys.file' \
        | base64 -d > creds.json
  3. Securely store the uploadBucketDir and creds.json.
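You can verify the decode step end to end with a stand-in payload before running it against the real response. The service-account JSON below is a placeholder, not real credentials:

```shell
# Round-trip a placeholder payload through base64 to demonstrate
# the decode step used for the real keys.file value.
creds_file=$(mktemp)
encoded=$(printf '{"type":"service_account"}' | base64)
printf '%s' "${encoded}" | base64 -d > "${creds_file}"
cat "${creds_file}"   # {"type":"service_account"}
```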

Get Azure credentials from MigrationStatus

Securely store the uploadBucketDir and urlSignature from the response:

{
  "migrationID": "456ca4a9-0551-46c4-b8bb-90fcd136a0c3",
  "dbID": "ccefd141-8fda-4e4d-a746-a102a96657bc",
  "status": "ReceivingFiles",
  "progressInfo": "",
  "uploadBucketDir": "https://muztx5cqmp3jhe3j2guebksz.blob.core.windows.net/mig-upload-456ca4a9-0551-46c4-b8bb-90fcd136a0c3/sstables/",
  "uploadCredentials": {
    "name": "URL signature",
    "keys": {
      "url": "https://UPLOAD_BUCKET_DIR/?si=AZURE_SAS_TOKEN",
      "urlSignature": "si=AZURE_SAS_TOKEN"
    },
    "credentialExpiration": "2025-04-02T15:14:31Z"
  },
  "expectedCleanupTime": "2025-03-04T15:14:38Z"
}

uploadBucketDir is the migration directory URL. Note the trailing slash.

uploadCredentials contains url and urlSignature keys that represent an Azure Shared Access Signature (SAS) token. You need the urlSignature to upload snapshots to the migration directory. In the preceding example, these strings are truncated for readability.

Upload snapshots to the migration directory

Use your cloud provider’s CLI and your upload credentials to upload snapshots for each origin node into the migration directory.

Be aware of the following requirements for the upload commands:

  • You must include the asterisk (*) character as shown in the commands; otherwise, the commands won’t work properly.

  • With the exception of the leading :// in the migration directory path, your paths must not include double slashes (//).

  • Use the CLI that corresponds with your target database’s cloud provider. For more information, see Prepare to use Astra DB Sideloader.

  • These commands assume that you installed the cloud provider’s CLI on the nodes in your origin cluster. For more information, see Prepare to use Astra DB Sideloader.

  • You might need to modify these commands depending on your environment, node names, directory structures, and other variables.
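A small guard can catch the double-slash mistake before you run an upload. The path_ok helper below is illustrative, not part of any CLI:

```shell
# Reject destination paths that contain a double slash anywhere
# after the scheme. path_ok is an illustrative helper.
path_ok() {
  case "${1#*://}" in
    *//*) return 1 ;;
    *)    return 0 ;;
  esac
}

path_ok "s3://mig-bucket/sstables/dse0" && echo "path ok"
path_ok "s3://mig-bucket/sstables//dse0" || echo "double slash detected"
```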

Upload snapshots to AWS

  1. Set environment variables for the AWS credentials that were generated when you initialized the migration:

    export AWS_ACCESS_KEY_ID=ACCESS_KEY_ID
    export AWS_SECRET_ACCESS_KEY=SECRET_ACCESS_KEY
    export AWS_SESSION_TOKEN=SESSION_TOKEN
  2. Use the AWS CLI to upload one snapshot from one node into the migration directory:

    du -sh CASSANDRA_DATA_DIR/KEYSPACE_NAME/*/snapshots/*SNAPSHOT_NAME*; \
    aws s3 sync --only-show-errors --exclude '*' --include '*/snapshots/SNAPSHOT_NAME*' CASSANDRA_DATA_DIR/ MIGRATION_DIRNODE_NAME

    Replace the following:

    • CASSANDRA_DATA_DIR: The absolute file system path to where Cassandra data is stored on the node. For example, /var/lib/cassandra/data.

    • KEYSPACE_NAME: The name of the keyspace that contains the tables you want to migrate.

    • SNAPSHOT_NAME: The name of the snapshot backup that you created with nodetool snapshot.

    • MIGRATION_DIR: The entire uploadBucketDir value that was generated when you initialized the migration, including the trailing slash.

    • NODE_NAME: The host name of the node that your snapshots are from. It is important to use the specific node name to ensure that each node has a unique directory in the migration bucket.

    Example: Upload a snapshot with AWS CLI
    # Set environment variables
    export AWS_ACCESS_KEY_ID=XXXXXXXX
    export AWS_SECRET_ACCESS_KEY=XXXXXXXXXX
    export AWS_SESSION_TOKEN=XXXXXXXXXX
    
    # Upload "sensor_readings" snapshot from "dse0" node
    du -sh /var/lib/cassandra/data/smart_home/*/snapshots/*sensor_readings*; \
    aws s3 sync --only-show-errors --exclude '*' --include '*/snapshots/sensor_readings*' /var/lib/cassandra/data/ s3://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/dse0
  3. Monitor upload progress:

    1. Use the AWS CLI to get a list of cloud storage keys for the files that have been successfully uploaded to the migration directory:

      aws s3 ls --human-readable --summarize --recursive MIGRATION_DIR

      Replace MIGRATION_DIR with the uploadBucketDir that was generated when you initialized the migration.

    2. Compare the returned list against the files in your snapshot directory. When the lists match, the upload is complete.

      You can potentially increase upload speeds by adjusting the max_concurrent_requests, multipart_threshold, and multipart_chunksize parameters in your AWS CLI S3 configuration. However, upload time primarily depends on the snapshot size, network throughput from your origin cluster to the migration bucket, and whether the origin cluster and migration bucket are in the same region.

  4. Repeat the upload process for each snapshot (SNAPSHOT_NAME) and node (NODE_NAME) in your origin cluster.

    If your credentials expire, see Get new upload credentials.

    Use a for loop to simplify snapshot uploads

    If the nodes in your origin cluster have predictable names (for example, dse0, dse1, and dse2), then you can use a for loop to streamline the execution of the upload commands. For example:

    # Set environment variables
    export AWS_ACCESS_KEY_ID=ACCESS_KEY_ID
    export AWS_SECRET_ACCESS_KEY=SECRET_ACCESS_KEY
    export AWS_SESSION_TOKEN=SESSION_TOKEN
    
    # Loop over the sync command for all nodes
    for i in 0 1 2; do ssh dse${i} \
    "du -sh CASSANDRA_DATA_DIR/KEYSPACE_NAME/*/snapshots/*SNAPSHOT_NAME*; \
    aws s3 sync --only-show-errors --exclude '*' --include '*/snapshots/SNAPSHOT_NAME*' CASSANDRA_DATA_DIR/ MIGRATION_DIRdse${i}" & done
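The transfer parameters mentioned above (max_concurrent_requests, multipart_threshold, and multipart_chunksize) live in the AWS CLI’s S3 configuration. A sketch of the relevant ~/.aws/config section; the numbers are illustrative starting points, not recommendations:

```ini
[default]
s3 =
    max_concurrent_requests = 20
    multipart_threshold = 64MB
    multipart_chunksize = 16MB
```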


Upload snapshots to Google Cloud Storage

  1. Authenticate to Google Cloud with the creds.json file that you created when you initialized the migration:

    gcloud auth activate-service-account --key-file=creds.json

    If necessary, modify the --key-file path to match the location of your creds.json file, such as --key-file=~/.gcloud_credentials/creds.json.

    You can also use gcloud auth login --cred-file creds.json.

  2. Use gsutil to upload one snapshot from one node into the migration directory:

    gsutil -m rsync -r -d CASSANDRA_DATA_DIR/KEYSPACE_NAME/**/snapshots/SNAPSHOT_NAME/ MIGRATION_DIRNODE_NAME/

    Replace the following:

    • CASSANDRA_DATA_DIR: The absolute file system path to where Cassandra data is stored on the node. For example, /var/lib/cassandra/data.

    • KEYSPACE_NAME: The name of the keyspace that contains the tables you want to migrate.

    • SNAPSHOT_NAME: The name of the snapshot backup that you created with nodetool snapshot.

    • MIGRATION_DIR: The entire uploadBucketDir value that was generated when you initialized the migration, including the trailing slash.

    • NODE_NAME: The host name of the node that your snapshots are from. It is important to use the specific node name to ensure that each node has a unique directory in the migration bucket.

    Example: Upload a snapshot with gcloud and gsutil
    # Authenticate
    gcloud auth activate-service-account --key-file=creds.json
    
    # Upload "sensor_readings" snapshot from "dse0" node
    gsutil -m rsync -r -d /var/lib/cassandra/data/smart_home/**/snapshots/sensor_readings/ gs://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/dse0
  3. Monitor upload progress:

    1. Use gsutil to get a list of objects that have been successfully uploaded to the migration directory:

      gsutil ls -r MIGRATION_DIR

      Replace MIGRATION_DIR with the uploadBucketDir that was generated when you initialized the migration.

    2. Compare the returned list against the files in your snapshot directory. When the lists match, the upload is complete.

      The -m flag in gsutil -m rsync enables parallel synchronization, which can improve upload speed. However, upload time primarily depends on the snapshot size, network throughput from your origin cluster to the migration bucket, and whether the origin cluster and migration bucket are in the same region.

  4. Repeat the upload process for each snapshot (SNAPSHOT_NAME) and node (NODE_NAME) in your origin cluster.

    Use a for loop to simplify snapshot uploads

    If the nodes in your origin cluster have predictable names (for example, dse0, dse1, and dse2), then you can use a for loop to streamline the execution of the gsutil rsync commands. For example:

    for i in 0 1 2; do ssh dse${i} \
    "du -sh CASSANDRA_DATA_DIR/KEYSPACE_NAME/*/snapshots/*SNAPSHOT_NAME*; \
    gsutil -m rsync -r -d CASSANDRA_DATA_DIR/KEYSPACE_NAME/**/snapshots/SNAPSHOT_NAME/ MIGRATION_DIRdse${i}" & done


Upload snapshots to Azure

  1. Set environment variables for the following values:

    • AZURE_SAS_TOKEN: The urlSignature key that was generated when you initialized the migration.

    • CASSANDRA_DATA_DIR: The absolute file system path to where Cassandra data is stored on the node, including the trailing slash. For example, /var/lib/cassandra/data/.

    • SNAPSHOT_NAME: The name of the snapshot backup that you created with nodetool snapshot.

    • MIGRATION_DIR: The entire uploadBucketDir value that was generated when you initialized the migration, including the trailing slash.

    • NODE_NAME: The host name of the node that your snapshots are from. It is important to use the specific node name to ensure that each node has a unique directory in the migration bucket.

    export AZURE_SAS_TOKEN="AZURE_CREDENTIALS_URL"
    export CASSANDRA_DATA_DIR="CASSANDRA_DATA_DIR"
    export SNAPSHOT_NAME="SNAPSHOT_NAME"
    export MIGRATION_DIR="MIGRATION_DIR"
    export NODE_NAME="NODE_NAME"
  2. Use the Azure CLI to upload one snapshot from one node into the migration directory:

    for dir in $(find "$CASSANDRA_DATA_DIR" -type d -path "*/snapshots/${SNAPSHOT_NAME}*"); do
        REL_PATH="${dir#"$CASSANDRA_DATA_DIR"}"  # Remove the base path
        DEST_PATH="${MIGRATION_DIR}${NODE_NAME}/${REL_PATH}/?${AZURE_SAS_TOKEN}"
    
        azcopy sync "$dir" "$DEST_PATH" --recursive
    done
  3. Monitor upload progress:

    1. Use the Azure CLI to get the current contents of the migration directory:

      azcopy list "${MIGRATION_DIR}?${AZURE_SAS_TOKEN}"
    2. Compare the returned list against the files in your snapshot directory. When the lists match, the upload is complete.

      Upload time primarily depends on the snapshot size, network throughput from your origin cluster to the migration bucket, and whether the origin cluster and migration bucket are in the same region.

  4. Repeat the upload process for each snapshot and node in your origin cluster. Be sure to change the SNAPSHOT_NAME and NODE_NAME environment variables as needed.
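The REL_PATH line in the upload loop above relies on bash’s ${var#pattern} prefix removal. A standalone demonstration with illustrative paths:

```shell
# Demonstrate the prefix stripping used to compute REL_PATH.
# Paths are illustrative; note the trailing slash on the data dir,
# which the CASSANDRA_DATA_DIR environment variable must include.
CASSANDRA_DATA_DIR="/var/lib/cassandra/data/"
dir="/var/lib/cassandra/data/smart_home/sensor_readings-1a2b/snapshots/migration1"
REL_PATH="${dir#"$CASSANDRA_DATA_DIR"}"
echo "${REL_PATH}"   # smart_home/sensor_readings-1a2b/snapshots/migration1
```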

Uploaded snapshots are staged in the migration directory, but the data is not yet written to the target database. After uploading snapshots, you must import the data to finish the migration.

Idle migration directories are evicted

As an added security measure, migrations that remain continuously idle for one week are subject to automatic cleanup, which deletes all associated snapshots, revokes any unexpired upload credentials, and then closes the migration.

For large migrations, uploading snapshots and importing data can take several days. If you don’t plan to launch the migration within one week, or if you expect the upload and import to take several days, DataStax recommends that you manually reschedule the cleanup to avoid automatic cleanup.

Import data

After you completely upload snapshots for each origin node, import the data into your target database.

Data import is a multi-step operation that requires complete success. If one step fails, then the entire import operation stops and the migration fails.

To learn more about the data import process, see About Astra DB Sideloader: Import data.

  • Before you start the import process, make sure all snapshots are completely uploaded. For commands to monitor upload progress and compare uploaded data against the original snapshots, see Upload snapshots to the migration directory.

  • If necessary, you can pause or abort the migration during the import process. You can abort a migration up until the point at which Astra DB Sideloader starts importing SSTable metadata. After this point, you must wait for the migration to finish, and then you can use the CQL shell (cqlsh) to drop the keyspace/table in your target database before repeating the entire migration procedure.

  1. Use the DevOps API to launch the data import:

    curl -X POST \
        -H "Authorization: Bearer ${token}" \
        https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID}/launch \
        | jq .

    Although this call returns immediately, the import process takes time.

  2. Check the migration status periodically:

    curl -X GET \
        -H "Authorization: Bearer ${token}" \
        https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \
        | jq .

    A successful response contains a MigrationStatus object. It can take a few minutes for the DevOps API to reflect status changes during a migration. Immediately calling this endpoint after starting a new phase of the migration might not return the actual current status.

  3. Check the status field in the response:

    • "status": "ImportInProgress": The data is still being imported. Wait a few minutes before you check the status again.

    • "status": "MigrationDone": The import is complete, and you can proceed to Validate the migrated data.

  4. If the migration takes more than a few days, manually reschedule the cleanup to avoid automatic cleanup.

  5. If the migration fails, see Troubleshoot Astra DB Sideloader.
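The periodic status check can be wrapped in a polling loop. In this sketch, get_status is a stand-in for the curl call from step 2 piped through jq -r '.status'; the stub simulates a migration that completes on the third poll:

```shell
# Poll until the migration reports MigrationDone.
# get_status is a stand-in for:
#   curl -X GET -H "Authorization: Bearer ${token}" \
#     https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \
#     | jq -r '.status'
# This stub simulates a migration that completes on the third poll.
poll_count=0
get_status() {
  poll_count=$((poll_count + 1))
  if [ "${poll_count}" -ge 3 ]; then
    status="MigrationDone"
  else
    status="ImportInProgress"
  fi
}

get_status
while [ "${status}" != "MigrationDone" ]; do
  echo "status: ${status}; waiting before the next check..."
  # In practice, sleep between checks, for example: sleep 300
  get_status
done
echo "import complete"
```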

Validate the migrated data

After the migration is complete, you can query the migrated data using the CQL shell (cqlsh) or Data API.

You can run Cassandra Data Migrator (CDM) in validation mode for more thorough validation. CDM also offers an AutoCorrect mode to reconcile any differences that it detects.

