Migrate data with Astra DB Sideloader

You can use Astra DB Sideloader to migrate data to Astra DB from Apache Cassandra®, DataStax Enterprise (DSE), or Hyper-Converged Database (HCD).

Create snapshots

On each node in your origin cluster, use nodetool to create a backup of the data that you want to migrate, including all relevant keyspaces and CQL tables.

  1. Be aware of the Astra DB Sideloader limitations related to materialized views, secondary indexes, and encrypted data that are described in Origin cluster requirements. If necessary, modify the data model on your origin cluster to prepare for the migration.

  2. Optional: Before you create snapshots, consider running nodetool cleanup to remove data that no longer belongs to your nodes. This command is particularly useful after adding more nodes to a cluster because it helps ensure that each node only contains the data that it is responsible for, according to the current cluster configuration and partitioning scheme.

    If you run nodetool cleanup before you take a snapshot, you can ensure that the snapshot only includes relevant data, potentially reducing the size of the snapshot. Smaller snapshots can lead to lower overall migration times and lower network transfer costs.

    However, take adequate precautions before you run this command because the cleanup operations can introduce additional load on your origin cluster.
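
    For example, if your nodes follow the predictable dse0, dse1, and dse2 naming used elsewhere in this guide, a minimal cleanup sketch might look like the following. Adjust the node names for your environment, and replace KEYSPACE_NAME with a specific keyspace, or omit it to clean up all non-system keyspaces:

    for i in 0 1 2; do ssh dse${i} nodetool cleanup KEYSPACE_NAME; done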

  3. Use nodetool snapshot to create snapshots for the tables that you want to migrate.

    Don’t create snapshots of system tables or tables that you don’t want to migrate. The migration can fail if you attempt to migrate snapshots that don’t have a matching schema in the target database. Astra DB Sideloader ignores system keyspaces.

    The structure of the nodetool snapshot command depends on the keyspaces and tables that you want to migrate.

    • All keyspaces

    • Specific keyspaces

    • Specific tables

    Create a snapshot of all tables in all keyspaces:

    nodetool snapshot -t SNAPSHOT_NAME

    Replace SNAPSHOT_NAME with a descriptive name for the snapshot. Use the same snapshot name on each node. This makes it easier to programmatically upload the snapshots to the migration directory.

    Optional: Use a for loop to simplify snapshot creation

    If the nodes in your origin cluster are named in a predictable way (for example, dse0, dse1, and dse2), you can use a for loop to simplify snapshot creation. For example:

    for i in 0 1 2; do ssh dse${i} nodetool snapshot -t SNAPSHOT_NAME; done

    You can use the same for loop to verify that each snapshot was successfully created:

    for i in 0 1 2; do ssh dse${i} nodetool listsnapshots; done

    Create a snapshot of all tables in one or more keyspaces:

    Single keyspace
    nodetool snapshot -t SNAPSHOT_NAME KEYSPACE_NAME
    Multiple keyspaces
    nodetool snapshot -t SNAPSHOT_NAME KEYSPACE_NAME_1 KEYSPACE_NAME_2

    Replace the following:

    • KEYSPACE_NAME: The name of the keyspace that contains the tables you want to migrate.

      To include multiple keyspaces, list each keyspace separated by a space as shown in the example above.

    • SNAPSHOT_NAME: A descriptive name for the snapshot.

      Use the same snapshot name on each node. This makes it easier to programmatically upload the snapshots to the migration directory.

    Optional: Use a for loop to simplify snapshot creation

    If the nodes in your origin cluster are named in a predictable way (for example, dse0, dse1, and dse2), you can use a for loop to simplify snapshot creation. For example:

    for i in 0 1 2; do ssh dse${i} nodetool snapshot -t SNAPSHOT_NAME KEYSPACE_NAME; done

    To include multiple keyspaces in the snapshot, list multiple KEYSPACE_NAME values separated by spaces, such as keyspace1 keyspace2.

    You can use the same for loop to verify that each snapshot was successfully created:

    for i in 0 1 2; do ssh dse${i} nodetool listsnapshots; done

    Create a snapshot of specific tables within one or more keyspaces:

    Single table
    nodetool snapshot -kt KEYSPACE_NAME.TABLE_NAME -t SNAPSHOT_NAME
    Multiple tables from one or more keyspaces
    nodetool snapshot -kt KEYSPACE_NAME_1.TABLE_NAME_A,KEYSPACE_NAME_1.TABLE_NAME_B,KEYSPACE_NAME_2.TABLE_NAME_X -t SNAPSHOT_NAME

    Replace the following:

    • KEYSPACE_NAME: The name of the keyspace that contains the table you want to migrate.

    • TABLE_NAME: The name of the table you want to migrate.

      To include multiple tables from one or more keyspaces, list the KEYSPACE_NAME.TABLE_NAME pairs as a single comma-separated value, as shown in the example above.

    • SNAPSHOT_NAME: A descriptive name for the snapshot.

      Use the same snapshot name on each node. This makes it easier to programmatically upload the snapshots to the migration directory.

    Optional: Use a for loop to simplify snapshot creation

    If the nodes in your origin cluster are named in a predictable way (for example, dse0, dse1, and dse2), you can use a for loop to simplify snapshot creation. For example:

    for i in 0 1 2; do ssh dse${i} nodetool snapshot -kt KEYSPACE_NAME.TABLE_NAME -t SNAPSHOT_NAME; done

    To include multiple tables in the snapshot, include multiple comma-separated KEYSPACE_NAME.TABLE_NAME pairs, such as keyspace1.table1,keyspace1.table2.

    You can use the same for loop to verify that each snapshot was successfully created:

    for i in 0 1 2; do ssh dse${i} nodetool listsnapshots; done
  4. Use nodetool listsnapshots to verify that the snapshots were created:

    nodetool listsnapshots

    Snapshots have a specific directory structure, such as KEYSPACE_NAME/TABLE_NAME/snapshots/SNAPSHOT_NAME/…​. Astra DB Sideloader relies on this fixed structure to properly interpret the SSTable components. With the exception of secondary index directories (as explained in the following step), don’t modify the snapshot’s directory structure.

  5. If your origin cluster has secondary indexes (2i), remove all directories related to those indexes from all snapshots before you upload the snapshots.

    Although Astra DB ignores secondary indexes defined in the origin cluster, any secondary index directories present in your snapshots cause the migration to fail. To avoid errors, you must remove all secondary index directories from your snapshots before you upload them.

    You can find secondary index directories in the table’s snapshot directory:

    NODE_UUID/KEYSPACE_NAME/TABLE_NAME-TABLE_UUID/snapshots/SNAPSHOT_NAME/.INDEX_NAME

    For example, given the following table schema, the index directory is found at NODE_UUID/smart_home/sensor_readings-TABLE_UUID/snapshots/SNAPSHOT_NAME/.roomidx:

    CREATE TABLE IF NOT EXISTS smart_home.sensor_readings (
        device_id UUID,
        room_id UUID,
        reading_type TEXT,
        PRIMARY KEY ((device_id))
    );
    CREATE INDEX IF NOT EXISTS roomidx ON smart_home.sensor_readings(room_id);
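
    For example, the following sketch lists and then removes the dot-prefixed index directories under a snapshot. It assumes the default /var/lib/cassandra/data path and a snapshot named SNAPSHOT_NAME; review the output of the first command before deleting anything:

    # List secondary index directories inside the snapshot
    find /var/lib/cassandra/data -type d -path "*/snapshots/SNAPSHOT_NAME/.*"

    # After reviewing the list, remove those directories
    find /var/lib/cassandra/data -type d -path "*/snapshots/SNAPSHOT_NAME/.*" -prune -exec rm -rf {} +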

Configure the target database

To prepare your target database for the migration, you must record the schema for each table in your origin cluster that you want to migrate, recreate these schemas in your target database, and then set environment variables required to connect to your database.

For the migration to succeed, your target database must meet the schema requirements described in this section. Additionally, your snapshots must contain compatible data and directories, as described in Origin cluster requirements and Create snapshots. For example, Astra DB doesn’t support materialized views, and Astra DB Sideloader can’t migrate encrypted data.

However, indexes don’t need to match. You can define indexes in your target database independently from the origin cluster because Astra DB Sideloader ignores Storage Attached Indexes (SAI) defined on the origin cluster. During the migration, Astra DB Sideloader automatically populates any SAI defined in your target database, even if those SAI weren’t present in your origin cluster.

  1. Get the following schema properties for each table that you want to migrate:

    • Exact keyspace name.

    • Exact table name.

    • Exact column names, data types, and the order in which they appear in the table creation DDL.

    • Exact primary key definition as defined in your origin cluster, including the partition key, clustering columns, and ascending/descending ordering clauses. You must define partition key columns and clustering columns in the exact order that they are defined on your origin cluster.

      To retrieve schema properties, you can run the DESCRIBE KEYSPACE command on your origin cluster:

      DESCRIBE KEYSPACE_NAME;

      Replace KEYSPACE_NAME with the name of the keyspace that contains the tables you want to migrate, such as DESCRIBE smart_home;.

      Then, get the schema properties from the result:

      CREATE TABLE smart_home.sensor_readings (
          device_id UUID,
          room_id UUID,
          reading_type TEXT,
          reading_value DOUBLE,
          reading_timestamp TIMESTAMP,
          PRIMARY KEY (device_id, room_id, reading_timestamp)
      ) WITH CLUSTERING ORDER BY (room_id ASC, reading_timestamp DESC);
  2. Recreate the schemas in your target database:

    1. In the Astra Portal navigation menu, click Databases, and then click the name of your Astra DB database.

    2. Create a keyspace with the exact same name as your origin cluster’s keyspace.

    3. In your database’s CQL console, create tables with the exact same names and schemas as your origin cluster.

      Astra DB rejects or ignores some table properties, such as compaction strategy. See Database limits for more information.

  3. In your terminal, set environment variables for your target database:

    export dbID=DATABASE_ID
    export token=TOKEN

    Replace DATABASE_ID with the database ID, and replace TOKEN with an application token with the Database Administrator role.

    Later, you will add another environment variable for the migration ID.

    The curl commands in this guide assume that you have set environment variables for the token, database ID, and migration ID. Running the commands without these environment variables causes errors such as Moved Permanently and 404 page not found.

    Additionally, the curl commands use jq to format the JSON responses. If you don’t have jq installed, remove | jq . from the end of each command.
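
    Optionally, you can confirm that the variables work before you initialize the migration. For example, this sketch fetches the database details from the DevOps API; a valid token and database ID return a JSON description of your database:

    curl -X GET \
        -H "Authorization: Bearer ${token}" \
        https://api.astra.datastax.com/v2/databases/${dbID} \
        | jq .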

Initialize the migration

Use the DevOps API to initialize the migration and get your migration directory path and credentials.

What happens during initialization?

After you create snapshots on the origin cluster and pre-configure the schema on the target database, use the Astra DevOps API to initialize the migration.

When you initialize a migration, Astra DB Sideloader does the following:

  1. Creates a secure migration bucket.

    The migration bucket is only created during the first initialization. All subsequent migrations use different directories in the same migration bucket.

    DataStax owns the migration bucket, and it is located within the Astra perimeter.

  2. Generates a migration ID that is unique to the new migration.

  3. Creates a migration directory within the migration bucket that is unique to the new migration.

    The migration directory is also referred to as the uploadBucketDir. In the next phase of the migration process, you will upload your snapshots to this migration directory.

  4. Generates upload credentials that grant read/write access to the migration directory.

    The credentials are formatted according to the cloud provider where your target database is deployed.

The initialization process can take several minutes to complete, especially if the migration bucket doesn’t already exist.

  1. In your terminal, use the DevOps API to initialize the data migration:

    curl -X POST \
        -H "Authorization: Bearer ${token}" \
        https://api.astra.datastax.com/v2/databases/${dbID}/migrations/initialize \
        | jq .
  2. Get the migrationID from the response:

    {
      "migrationID": "272eac1d-df8e-4d1b-a7c6-71d5af232182",
      "dbID": "b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d",
      "status": "Initializing",
      "progressInfo": "",
      "uploadBucketDir": "",
      "uploadCredentials": {
        "name": "",
        "keys": null,
        "credentialExpiration": null
      },
      "expectedCleanupTime": "2025-03-04T15:14:38Z"
    }

    The migrationID is a unique identifier (UUID) for the migration.

    The response also includes the migration status. You will refer to this status multiple times throughout the migration process.

  3. Assign the migration ID to an environment variable:

    export migrationID=MIGRATION_ID

    Replace MIGRATION_ID with the migrationID returned by the initialize endpoint.

  4. Check the migration status:

    curl -X GET \
        -H "Authorization: Bearer ${token}" \
        https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \
        | jq .

    A successful response contains a MigrationStatus object. It can take a few minutes for the DevOps API to reflect status changes during a migration. Immediately calling this endpoint after starting a new phase of the migration might not return the actual current status.

  5. Check the status field in the response:

    • "status": "ReceivingFiles": Initialization is complete and your upload credentials are available. Proceed to the next step.

    • "status": "Initializing": The migration is still initializing. Wait a few minutes before you check the status again.

  6. Get your migration directory path and upload credentials from the response. You need these values to upload snapshots to the migration directory.

    • AWS

    • Google Cloud

    • Microsoft Azure

    MigrationStatus with AWS credentials
    {
      "migrationID": "272eac1d-df8e-4d1b-a7c6-71d5af232182",
      "dbID": "b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d",
      "status": "ReceivingFiles",
      "progressInfo": "",
      "uploadBucketDir": "s3://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/",
      "uploadCredentials": {
        "name": "sessionToken",
        "keys": {
          "accessKeyID": "ASXXXXXXXXXXXXXXXXXX",
          "secretAccessKey": "2XXXXXXXXXXXXXXXWqcdV519ZubYbyfuNxbZg1Rw",
          "sessionToken": "XXXXXXXXXX"
        },
        "credentialExpiration": "2024-01-18T19:45:09Z",
        "hint": "\nexport AWS_ACCESS_KEY_ID=ASXXXXXXXXXXXXXXXXXX\nexport AWS_SECRET_ACCESS_KEY=2XXXXXXXXXXXXXXXWqcdV519ZubYbyfuNxbZg1Rw\nexport AWS_SESSION_TOKEN=XXXXXXXXXXXXXX\n"
      },
      "expectedCleanupTime": "2024-01-25T15:14:38Z"
    }

    Securely store the uploadBucketDir, accessKeyID, secretAccessKey, and sessionToken:

    • uploadBucketDir is the migration directory URL. Note the trailing slash.

    • uploadCredentials contains the AWS credentials that authorize uploads to the migration directory, namely accessKeyID, secretAccessKey, and sessionToken.

    The sessionToken expires after one hour. If your total migration takes longer than one hour, generate new credentials, and then resume the migration with the fresh credentials.

    If you use automation to handle Astra DB Sideloader migrations, you might need to script a pause every hour so you can generate new credentials without unexpectedly interrupting the migration.
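
    If you script the migration, note that the hint field in the example response contains ready-made export statements for these credentials. As a convenience sketch (assuming jq and the response format shown above), you could apply them directly:

    eval "$(curl -s -X GET \
        -H "Authorization: Bearer ${token}" \
        https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \
        | jq -r '.uploadCredentials.hint')"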

    MigrationStatus with Google Cloud credentials
    {
      "migrationID": "272eac1d-df8e-4d1b-a7c6-71d5af232182",
      "dbID": "b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d",
      "status": "ReceivingFiles",
      "progressInfo": "",
      "uploadBucketDir": "gs://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/",
      "uploadCredentials": {
        "name": "TYPE_GOOGLE_CREDENTIALS_FILE",
        "keys": {
          "file": "CREDENTIALS_FILE"
        },
        "credentialExpiration": "2024-08-07T18:51:39Z"
      },
      "expectedCleanupTime": "2024-08-14T15:14:38Z"
    }
    1. Find the uploadBucketDir and the uploadCredentials in the response:

      • uploadBucketDir is the migration directory URL. Note the trailing slash.

      • uploadCredentials includes a base64-encoded file containing Google Cloud credentials that authorize uploads to the migration directory.

    2. Pipe the Google Cloud credentials file to a creds.json file:

      curl -X GET \
          -H "Authorization: Bearer ${token}" \
          https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \
          | jq -r '.uploadCredentials.keys.file' \
          | base64 -d > creds.json
    3. Securely store the uploadBucketDir and creds.json.

    MigrationStatus with Azure credentials
    {
      "migrationID": "456ca4a9-0551-46c4-b8bb-90fcd136a0c3",
      "dbID": "ccefd141-8fda-4e4d-a746-a102a96657bc",
      "status": "ReceivingFiles",
      "progressInfo": "",
      "uploadBucketDir": "https://muztx5cqmp3jhe3j2guebksz.blob.core.windows.net/mig-upload-456ca4a9-0551-46c4-b8bb-90fcd136a0c3/sstables/",
      "uploadCredentials": {
        "name": "URL signature",
        "keys": {
          "url": "https://UPLOAD_BUCKET_DIR/?si=AZURE_SAS_TOKEN",
          "urlSignature": "si=AZURE_SAS_TOKEN"
        },
        "credentialExpiration": "2025-04-02T15:14:31Z"
      },
      "expectedCleanupTime": "2025-03-04T15:14:38Z"
    }

    Securely store the uploadBucketDir and urlSignature:

    • uploadBucketDir is the migration directory URL. Note the trailing slash.

    • uploadCredentials contains url and urlSignature keys that represent an Azure Shared Access Signature (SAS) token. In the preceding example, these strings are truncated for readability.

      You need the urlSignature to upload snapshots to the migration directory.

Upload snapshots to the migration directory

Use your cloud provider’s CLI and your upload credentials to upload snapshots for each origin node into the migration directory.

Be aware of the following requirements for the upload commands:

  • You must include the asterisk (*) character as shown in the commands; otherwise, the commands won’t work properly.

  • Apart from the :// in the migration directory path’s URI scheme (for example, s3:// or gs://), your paths must not include double slashes (//).

  • Use the CLI that corresponds with your target database’s cloud provider. For more information, see Prepare to use Astra DB Sideloader.

  • These commands assume that you installed the cloud provider’s CLI on the nodes in your origin cluster. For more information, see Prepare to use Astra DB Sideloader.

  • You might need to modify these commands depending on your environment, node names, directory structures, and other variables.

  • AWS

  • Google Cloud

  • Microsoft Azure

  1. Set environment variables for the AWS credentials that were generated when you initialized the migration:

    export AWS_ACCESS_KEY_ID=ACCESS_KEY_ID
    export AWS_SECRET_ACCESS_KEY=SECRET_ACCESS_KEY
    export AWS_SESSION_TOKEN=SESSION_TOKEN
  2. Use the AWS CLI to upload one snapshot from one node into the migration directory:

    du -sh CASSANDRA_DATA_DIR/KEYSPACE_NAME/*/snapshots/*SNAPSHOT_NAME*; \
    aws s3 sync --only-show-errors --exclude '*' --include '*/snapshots/SNAPSHOT_NAME*' CASSANDRA_DATA_DIR/ MIGRATION_DIRNODE_NAME

    Replace the following:

    • CASSANDRA_DATA_DIR: The absolute file system path to where Cassandra data is stored on the node. For example, /var/lib/cassandra/data.

    • KEYSPACE_NAME: The name of the keyspace that contains the tables you want to migrate.

    • SNAPSHOT_NAME: The name of the snapshot backup that you created with nodetool snapshot.

    • MIGRATION_DIR: The entire uploadBucketDir value that was generated when you initialized the migration, including the trailing slash.

    • NODE_NAME: The host name of the current node you are uploading the snapshot from.

    Example: Upload a snapshot with AWS CLI
    # Set environment variables
    export AWS_ACCESS_KEY_ID=XXXXXXXX
    export AWS_SECRET_ACCESS_KEY=XXXXXXXXXX
    export AWS_SESSION_TOKEN=XXXXXXXXXX
    
    # Upload "sensor_readings" snapshot from "dse0" node
    du -sh /var/lib/cassandra/data/smart_home/*/snapshots/*sensor_readings*; \
    aws s3 sync --only-show-errors --exclude '*' --include '*/snapshots/sensor_readings*' /var/lib/cassandra/data/ s3://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/dse0
  3. Monitor upload progress:

    1. Use the AWS CLI to get a list of cloud storage keys for the files that have been successfully uploaded to the migration directory:

      aws s3 ls --human-readable --summarize --recursive MIGRATION_DIR

      Replace MIGRATION_DIR with the uploadBucketDir that was generated when you initialized the migration.

    2. Compare the returned list against the files in your snapshot directory. When the lists match, the upload is complete.

      You can potentially increase upload speeds by adjusting the max_concurrent_requests, multipart_threshold, and multipart_chunksize parameters in your AWS CLI S3 configuration. However, upload time primarily depends on the snapshot size, network throughput from your origin cluster to the migration bucket, and whether the origin cluster and migration bucket are in the same region.
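
      For example, the following commands show one way to adjust those settings through the AWS CLI configuration. The values are illustrative only; tune them for your available bandwidth and memory:

      aws configure set default.s3.max_concurrent_requests 20
      aws configure set default.s3.multipart_threshold 64MB
      aws configure set default.s3.multipart_chunksize 16MB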

  4. Repeat the upload process for each snapshot (SNAPSHOT_NAME) and node (NODE_NAME) in your origin cluster.

    If your credentials expire, see Get new upload credentials.

Optional: Use a for loop to simplify snapshot uploads

If the nodes in your origin cluster have predictable names (for example, dse0, dse1, and dse2), then you can use a for loop to streamline the execution of the upload commands. For example:

# Set environment variables
export AWS_ACCESS_KEY_ID=ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=SECRET_ACCESS_KEY
export AWS_SESSION_TOKEN=SESSION_TOKEN

# Loop over the sync command for all nodes
for i in 0 1 2; do ssh dse${i} \
"du -sh CASSANDRA_DATA_DIR/KEYSPACE_NAME/*/snapshots/*SNAPSHOT_NAME*; \
aws s3 sync --only-show-errors --exclude '*' --include '*/snapshots/SNAPSHOT_NAME*' CASSANDRA_DATA_DIR/ MIGRATION_DIRdse${i}" & done
  1. Authenticate to Google Cloud with the creds.json file that you created when you initialized the migration:

    gcloud auth activate-service-account --key-file=creds.json

    If necessary, modify the --key-file path to match the location of your creds.json file, such as --key-file=~/.gcloud_credentials/creds.json.

    You can also use gcloud auth login --cred-file creds.json.

  2. Use gsutil to upload one snapshot from one node into the migration directory:

    gsutil -m rsync -r -d CASSANDRA_DATA_DIR/KEYSPACE_NAME/**/snapshots/SNAPSHOT_NAME/ MIGRATION_DIRNODE_NAME/

    Replace the following:

    • CASSANDRA_DATA_DIR: The absolute file system path to where Cassandra data is stored on the node. For example, /var/lib/cassandra/data.

    • KEYSPACE_NAME: The name of the keyspace that contains the tables you want to migrate.

    • SNAPSHOT_NAME: The name of the snapshot backup that you created with nodetool snapshot.

    • MIGRATION_DIR: The entire uploadBucketDir value that was generated when you initialized the migration, including the trailing slash.

    • NODE_NAME: The host name of the current node you are uploading the snapshot from.

    Example: Upload a snapshot with gcloud and gsutil
    # Authenticate
    gcloud auth activate-service-account --key-file=creds.json
    
    # Upload "sensor_readings" snapshot from "dse0" node
    gsutil -m rsync -r -d /var/lib/cassandra/data/smart_home/**/snapshots/sensor_readings/ gs://ds-mig-b7e7761f-6f7f-4116-81a5-e8eefcf0cc1d/272eac1d-df8e-4d1b-a7c6-71d5af232182/sstables/dse0
  3. Monitor upload progress:

    1. Use gsutil to get a list of objects that have been successfully uploaded to the migration directory:

      gsutil ls -r MIGRATION_DIR

      Replace MIGRATION_DIR with the uploadBucketDir that was generated when you initialized the migration.

    2. Compare the returned list against the files in your snapshot directory. When the lists match, the upload is complete.

      The -m flag in gsutil -m rsync enables parallel synchronization, which can improve upload speed. However, upload time primarily depends on the snapshot size, network throughput from your origin cluster to the migration bucket, and whether the origin cluster and migration bucket are in the same region.

  4. Repeat the upload process for each snapshot (SNAPSHOT_NAME) and node (NODE_NAME) in your origin cluster.

Optional: Use a for loop to simplify snapshot uploads

If the nodes in your origin cluster have predictable names (for example, dse0, dse1, and dse2), then you can use a for loop to streamline the execution of the gsutil rsync commands. For example:

for i in 0 1 2; do ssh dse${i} \
"du -sh CASSANDRA_DATA_DIR/KEYSPACE_NAME/*/snapshots/*SNAPSHOT_NAME*; \
gsutil -m rsync -r -d CASSANDRA_DATA_DIR/KEYSPACE_NAME/**/snapshots/SNAPSHOT_NAME/ MIGRATION_DIRdse${i}/" & done
  1. Set environment variables for the following values:

    • AZURE_SAS_TOKEN: The urlSignature key that was generated when you initialized the migration.

    • CASSANDRA_DATA_DIR: The absolute file system path to where Cassandra data is stored on the node, including the trailing slash. For example, /var/lib/cassandra/data/.

    • SNAPSHOT_NAME: The name of the snapshot backup that you created with nodetool snapshot.

    • MIGRATION_DIR: The entire uploadBucketDir value that was generated when you initialized the migration, including the trailing slash.

    • NODE_NAME: The host name of the current node you are uploading the snapshot from.

    export AZURE_SAS_TOKEN="URL_SIGNATURE"
    export CASSANDRA_DATA_DIR="CASSANDRA_DATA_DIR"
    export SNAPSHOT_NAME="SNAPSHOT_NAME"
    export MIGRATION_DIR="MIGRATION_DIR"
    export NODE_NAME="NODE_NAME"
  2. Use AzCopy to upload one snapshot from one node into the migration directory:

    for dir in $(find "$CASSANDRA_DATA_DIR" -type d -path "*/snapshots/${SNAPSHOT_NAME}*"); do
        REL_PATH="${dir#"$CASSANDRA_DATA_DIR"}"  # Remove the base path
        DEST_PATH="${MIGRATION_DIR}${NODE_NAME}/${REL_PATH}/?${AZURE_SAS_TOKEN}"
    
        azcopy sync "$dir" "$DEST_PATH" --recursive
    done
  3. Monitor upload progress:

    1. Use AzCopy to list the current contents of the migration directory:

      azcopy list "${MIGRATION_DIR}?${AZURE_SAS_TOKEN}"
    2. Compare the returned list against the files in your snapshot directory. When the lists match, the upload is complete.

      Upload time primarily depends on the snapshot size, network throughput from your origin cluster to the migration bucket, and whether the origin cluster and migration bucket are in the same region.

  4. Repeat the upload process for each snapshot and node in your origin cluster. Be sure to change the SNAPSHOT_NAME and NODE_NAME environment variables as needed.

Uploaded snapshots are staged in the migration directory, but the data is not yet written to the target database. After uploading snapshots, you must import the data to finish the migration.

Idle migration directories are evicted

As an added security measure, migrations that remain continuously idle for one week are subject to automatic cleanup, which deletes all associated snapshots, revokes any unexpired upload credentials, and then closes the migration.

DataStax recommends that you manually reschedule the cleanup if you don’t plan to launch the migration within one week or if you need several days to upload snapshots or import data.

For large migrations, uploading snapshots and importing data can take several days. Make sure you manually reschedule the cleanup so that it doesn’t run before the migration finishes.

Import data

After you upload snapshots for each origin node, import the data into your target database.

Data import is a multi-step operation that requires complete success. If one step fails, then the entire import operation stops and the migration fails.

What happens during data import?

After uploading the snapshots to the migration directory, use the DevOps API to start the data import process.

During the import process, Astra DB Sideloader does the following:

  1. Revokes access to the migration directory.

    You cannot read or write to the migration directory after starting the data import process.

  2. Discovers all uploaded SSTables in the migration directory, and then groups them into approximately same-sized subsets.

  3. Runs validation checks on each subset.

  4. Converts all SSTables of each subset.

  5. Disables new compactions on the target database.

    This is the last point at which you can abort the migration.

    Once Astra DB Sideloader begins to import SSTable metadata (the next step), you cannot stop the migration.

  6. Imports metadata from each SSTable.

    If the dataset contains tombstones, any read operations on the target database can return inconsistent results during this step. Since compaction is disabled, there is no risk of permanent inconsistencies. However, in the context of Zero Downtime Migration, it’s important that the ZDM proxy continues to read from the origin cluster.

  7. Re-enables compactions on the Astra DB Serverless database.

Each step must finish successfully. If one step fails, the import operation stops and no data is imported into your target database.

If all steps finish successfully, the migration is complete and you can access the imported data in your target database.

If necessary, you can pause or abort the migration during the import process.

You can abort a migration up until the point at which Astra DB Sideloader starts importing SSTable metadata. After this point, you must wait for the migration to finish, and then you can use the CQL shell to drop the keyspace/table in your target database before repeating the entire migration procedure.
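
For example, if you need to repeat a migration that included the smart_home.sensor_readings table used earlier in this guide, you could drop the table in the CQL console, recreate it with the origin schema, and then restart the migration procedure. A sketch, assuming that example table:

DROP TABLE IF EXISTS smart_home.sensor_readings;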

  1. Use the DevOps API to launch the data import:

    curl -X POST \
        -H "Authorization: Bearer ${token}" \
        https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID}/launch \
        | jq .

    Although this call returns immediately, the import process takes time.

  2. Check the migration status periodically:

    curl -X GET \
        -H "Authorization: Bearer ${token}" \
        https://api.astra.datastax.com/v2/databases/${dbID}/migrations/${migrationID} \
        | jq .

    A successful response contains a MigrationStatus object. It can take a few minutes for the DevOps API to reflect status changes during a migration. Immediately calling this endpoint after starting a new phase of the migration might not return the actual current status.

  3. Check the status field in the response:

    • "status": "ImportInProgress": The data is still being imported. Wait a few minutes before you check the status again.

    • "status": "MigrationDone": The import is complete, and you can proceed to Validate the migrated data.

  4. If the migration takes more than a few days, manually reschedule the cleanup to avoid automatic cleanup.

  5. If the migration fails, see Troubleshoot Astra DB Sideloader.

Validate the migrated data

After the migration is complete, you can query the migrated data using the CQL shell or Data API.
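
For example, a quick spot check in the CQL console against the smart_home.sensor_readings table used in this guide’s examples:

SELECT device_id, room_id, reading_type, reading_value, reading_timestamp
FROM smart_home.sensor_readings
LIMIT 10;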

You can run Cassandra Data Migrator (CDM) in validation mode for more thorough validation. CDM also offers an AutoCorrect mode to reconcile any differences that it detects.
