Prepare to use Astra DB Sideloader

Before you use Astra DB Sideloader, review the requirements and prepare your target database, origin cluster, and administration server.

Due to the nature of the Astra DB Sideloader process and the tools involved, you need to be familiar with using the command line, including the following:

  • Installing and using CLI tools

  • Issuing curl commands

  • Basic scripting

  • Modifying example commands to fit your environment

  • Security best practices

The Astra DB Sideloader process uses authentication credentials to write to the migration directory and your database.

Make sure you understand how to securely store and use sensitive credentials when working on the command line.
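
For example, rather than typing credentials inline in commands, where they can persist in shell history, you can export them as environment variables or read them from a file restricted to your user. A minimal sketch, assuming a hypothetical application token stored in ~/.astra/token:

    # Restrict the token file to your user, then load it into an
    # environment variable for use in later commands.
    chmod 600 ~/.astra/token
    export ASTRA_DB_TOKEN="$(cat ~/.astra/token)"

    # Later commands can then reference ${ASTRA_DB_TOKEN} instead of a
    # pasted token value.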

Target Astra DB database requirements

  • Your Astra organization must be on an Enterprise subscription plan.

    Astra DB Sideloader is a premium feature that incurs costs based on usage:

    • The total amount (GB) of data processed as part of the Astra DB Sideloader workload.

    • The amount of data stored in the migration bucket, which is metered at the standard Astra DB storage rate.

    For more information and specific rates, see the Astra Pricing page.

    Migration directories are automatically cleaned up after one week of idle time.

    To minimize costs, you can manually clean up migration directories when you no longer need them.

  • Your target database must be an Astra DB Serverless database.

    If you don’t already have one, create a database. You can use either a Serverless (Non-Vector) or Serverless (Vector) database.

    Serverless (Vector) databases can store both vector and non-vector data.

  • Your target database must be in a Provisioned Capacity Unit (PCU) group. You can use either a flexible capacity PCU group or a committed capacity PCU group, depending on your long-term needs and other PCU group usage.


    Because Astra DB Sideloader operations are typically short-term, resource-intensive events, you can create a flexible capacity PCU group exclusively to support your target database during the migration.

    DataStax recommends the following flexible capacity PCU group configuration for Astra DB Sideloader migrations. For instructions, see Create a flexible capacity PCU group.

    If your target database is a Serverless (Non-Vector) database:

    • Minimum capacity: One or more units, depending on the scale of the migration.

    • Maximum capacity: Several units greater than the minimum to allow autoscaling during resource-intensive stages of the migration.

      For non-trivial migrations, consider setting the maximum to 10. For extremely large migrations, contact your DataStax account representative or DataStax Support to request more than 10 units to support your migration.

    If your target database is a Serverless (Vector) database: By default, Serverless (Vector) databases can have no more than one unit per PCU group. For any non-trivial migration, contact your DataStax account representative or DataStax Support for assistance configuring a PCU group for your target Serverless (Vector) database.

    After the migration, you can move your target database out of the flexible capacity PCU group, and then park or delete the group. Don’t park the PCU group during the Astra DB Sideloader process because databases in a parked PCU group are hibernated and unavailable for use.

    If you plan to keep your target database in a PCU group after the migration, you can create a committed capacity PCU group for your target database.

    The Astra DB Sideloader process can be extremely resource intensive. If there are any other databases in the same PCU group, the migration process can affect their performance due to resource contention.

    If your PCU groups have multiple databases, consider using a flexible capacity PCU group to temporarily isolate your target database during the migration.

    DataStax recommends the following committed capacity PCU group configuration for Astra DB Sideloader migrations. For instructions, see Create a committed capacity PCU group.

    If your target database is a Serverless (Non-Vector) database:

    • Reserved capacity: One or more units, depending on the PCU group’s normal, long-term workload requirements.

      This is the amount of long-term capacity that you want the group to have after the migration is complete.

    • Minimum capacity: Equal to or greater than the reserved capacity.

      If the minimum is greater than the reserved capacity, the surplus capacity is prepared in advance, and no autoscaling is required to access that capacity.

    • Maximum capacity: Several units greater than the minimum to allow autoscaling during resource-intensive stages of the migration.

      For non-trivial migrations, consider setting the maximum to 10. For extremely large migrations, contact your DataStax account representative or DataStax Support to request more than 10 units to support your migration.

      After the migration, you can reduce the minimum and maximum capacity to the levels required for normal database operations.

    If your target database is a Serverless (Vector) database: By default, Serverless (Vector) databases can have no more than one unit per PCU group. For any non-trivial migration, contact your DataStax account representative or DataStax Support for assistance configuring a PCU group for your target Serverless (Vector) database.

Origin cluster requirements

The following requirements, recommendations, and limitations apply to origin clusters. Review all of these to ensure that your cluster is compatible with Astra DB Sideloader.

Cluster infrastructure

  • Your origin cluster can be hosted on premises or on any cloud provider.

  • Your origin cluster must run a supported database version:

    • Apache Cassandra® 3.11 or later

    • DSE 5.1 or later

    • HCD 1.1 or later

  • Your origin cluster must use the default partitioner, Murmur3Partitioner.

    Older partitioners, such as RandomPartitioner, ByteOrderedPartitioner, and OrderPreservingPartitioner, are not supported.
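
    You can confirm the partitioner on any origin node before you begin. For example, with nodetool, or by checking cassandra.yaml (whose location varies by installation; /etc/cassandra is a common default):

      # Print the cluster's partitioner; it must be Murmur3Partitioner.
      nodetool describecluster | grep -i partitioner

      # Alternatively, check the node's configuration file directly:
      grep '^partitioner:' /etc/cassandra/cassandra.yaml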

Cloud provider CLI

To upload snapshots directly from the origin cluster, you must install your cloud provider’s CLI on each node in the origin cluster.

The tool you install depends on the cloud provider of the region where your target Astra DB database is deployed, such as the AWS CLI, the Google Cloud CLI, or the Azure CLI.

Alternatively, you can upload copies of the snapshots from a separate staging server that has the CLI installed, coordinating the uploads through the administration server. This guide doesn’t cover that approach: the CLI commands in this guide assume that you have installed your cloud provider’s CLI on the nodes in the origin cluster. If you use a staging server instead, modify the commands accordingly for your environment.
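
For illustration, a hedged sketch of an upload from one node, assuming the migration bucket is on AWS. The local snapshot path, bucket name, and prefix are placeholders; Astra DB Sideloader provides the actual bucket location and credentials when you initialize the migration:

    # Hypothetical example: copy one node's snapshot files to the migration
    # bucket with the AWS CLI. Paths and bucket names are placeholders.
    aws s3 sync \
      /var/lib/cassandra/data/my_keyspace/my_table-<table-id>/snapshots/sideloader/ \
      "s3://<migration-bucket>/<migration-id>/node-1/my_keyspace/my_table/"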

Incompatible data

  • Astra DB doesn’t support materialized views: You must replace these with SAI or an alternative data model design.

  • Astra DB Sideloader doesn’t support encrypted data: If your origin cluster uses DSE Transparent Data Encryption, be aware that Astra DB Sideloader can’t migrate these SSTables.

    If you have a mix of encrypted and unencrypted data, you can use Astra DB Sideloader to migrate the unencrypted data. After the initial migration, you can use another strategy to move the encrypted data, such as Cassandra Data Migrator (CDM) or a manual export and reupload.

  • Astra DB Sideloader doesn’t support secondary indexes: If you don’t remove or replace secondary indexes in your origin cluster, you must manually remove the index directories from your snapshots, as explained in Create snapshots.
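
    As a hedged sketch, secondary index data is stored in hidden subdirectories of each table directory, named after the index with a leading dot. The snapshot paths below are illustrative and depend on your data directory layout:

      # Illustrative only: list hidden index subdirectories inside the
      # snapshots for a given tag, then remove them after confirming that
      # they contain secondary index data.
      find /var/lib/cassandra/data/*/*/snapshots/<snapshot-name> \
        -maxdepth 1 -type d -name '.*' -print

      # After reviewing the output, rerun with -exec to delete:
      # find ... -maxdepth 1 -type d -name '.*' -exec rm -r {} +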

Administration server requirements

You need a server where you can run the Astra DB Sideloader commands.

Your administration server must have SSH access to each node in your origin cluster.

DataStax recommends that you install the following additional software on your administration server:

  • Cassandra Data Migrator (CDM) to validate imported data and, in the context of Zero Downtime Migration, reconcile it with the origin cluster.

  • jq to format JSON responses from the Astra DevOps API. The DevOps API commands in this guide use this tool.
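
    For example, piping a DevOps API response through jq (this endpoint lists the databases in your organization and illustrates the pattern; it isn’t specific to Astra DB Sideloader):

      # Pretty-print a DevOps API response with jq.
      curl -sS "https://api.astra.datastax.com/v2/databases" \
        -H "Authorization: Bearer ${ASTRA_DB_TOKEN}" \
        -H "Accept: application/json" \
        | jq .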

Additional preparation for specific migration scenarios

The following information can help you prepare for specific migration scenarios, including multi-region migrations and multiple migrations to the same database.

Multi-region migrations

Multi-region migrations can include one or more of the following scenarios:

  • Your origin cluster is deployed to multiple regions.

  • Your target database is, or will be, deployed to multiple regions.

  • You need to support multiple regions in a live migration scenario.

It is difficult to provide a one-size-fits-all solution for multi-region migrations due to the potential complexity and variability of these scenarios. For assistance planning a multi-region migration, contact your DataStax account representative or DataStax Support.

Multi-node migrations

You can migrate data from any number of nodes in your origin cluster to the same target database or multiple target databases.

When you migrate data with Astra DB Sideloader, the core process is the same whether you migrate from one node or from many. The following steps summarize the process and outline some considerations for migrating multiple nodes.

Migrate multiple nodes to one database

  1. On your origin cluster, make sure your data is valid and ready to migrate, as explained in Origin cluster requirements.

  2. From your origin cluster, create snapshots for all of the nodes that you want to migrate.

    Run nodetool snapshot as many times as necessary to capture all of your nodes. For an example, see the sketch after these steps.

  3. On your target database, replicate the schemas for all tables that you want to migrate.

    This is critical for a successful migration. If the schemas don’t match, the migration fails.

    You don’t need to make any changes based on the number of nodes, as long as the keyspaces and table schemas are replicated in the target database.

  4. Initialize the migration to prompt Astra DB Sideloader to create a migration bucket for your target database.

  5. Upload all of your node snapshots to the migration bucket.

  6. Use Astra DB Sideloader to import the data to your target database.

    Astra DB Sideloader imports snapshots from the migration bucket to your target database based on the matching schemas. The number of node snapshots that you uploaded to the migration bucket doesn’t determine the success of the import. The success of the import depends primarily on the validity of the schemas and the data in the snapshots.

  7. After the import, validate the migrated data to ensure that it matches the data in the origin cluster. For example, you can run Cassandra Data Migrator (CDM) in validation mode.
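
A minimal sketch of steps 2 and 3, assuming a single keyspace named my_keyspace (the keyspace name, snapshot tag, and file name are placeholders). Astra DB manages keyspace creation for you, so you typically recreate the keyspace in the Astra Portal and then apply the exported table definitions, adjusted for any options that Astra DB doesn’t support:

    # Step 2, on each origin node: create a tagged snapshot of the keyspace.
    nodetool snapshot -t sideloader my_keyspace

    # Step 3, on the origin: export the keyspace schema so that you can
    # recreate the matching table definitions in the target database.
    cqlsh -e "DESCRIBE KEYSPACE my_keyspace" > my_keyspace_schema.cql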

Migrate multiple nodes to multiple databases

Orchestrating concurrent migrations from multiple nodes to multiple target databases can be complex.

Consider focusing on one target database at a time, or create a migration plan to track origin nodes, target databases, migration bucket credentials, and timelines for each migration.

  1. On your origin cluster, make sure your data is valid and ready to migrate, as explained in Origin cluster requirements.

  2. From your origin cluster, create snapshots for all of the nodes that you want to migrate.

    Run nodetool snapshot as many times as necessary to capture all of your nodes.

  3. On each of your target databases, replicate the schemas for the tables that you want to migrate to each database.

    This is critical for a successful migration. If the schemas don’t match, the migration fails.

    You don’t need to make any changes based on the number of nodes, as long as the keyspaces and table schemas are replicated in the target databases.

    If you want to migrate the same data to multiple databases, you must recreate the schemas in each of those databases. Astra DB Sideloader requires a schema to be present in the target database in order to migrate data.

  4. For each target database, initialize a migration to prompt Astra DB Sideloader to create migration buckets for each database.

    At minimum, you must initialize one migration for each database.

  5. Upload the node snapshots to their corresponding migration buckets.

  6. Use Astra DB Sideloader to import the data to your target databases.

    You can import data to multiple databases at once, but each import event must be triggered separately using the unique migration ID.

    Astra DB Sideloader imports snapshots from the migration bucket to your target database based on the matching schemas. The number of node snapshots that you uploaded to the migration bucket doesn’t determine the success of the import. The success of the import depends primarily on the validity of the schemas and the data in the snapshots.

  7. After the import, validate the migrated data to ensure that it matches the data in the origin cluster. For example, you can run Cassandra Data Migrator (CDM) in validation mode.

Multiple migrations to the same database

When you initialize a migration with Astra DB Sideloader, a unique migration ID is generated for that specific migration workflow. For each migration ID, there is a unique migration directory and migration directory credentials.

If you initialize multiple migrations for the same database, you generate multiple migration IDs, each with its own migration directory and credentials.

This can be useful for breaking large migrations into smaller batches. For example, if you have 100 snapshots, you could initialize 10 migrations, and then upload 10 different snapshots to each migration directory.

You can upload snapshots to multiple migration directories at once. However, when you reach the import phase of the migration, Astra DB Sideloader can import from only one migration directory at a time per database. For example, if you have 10 migration IDs for the same database, you must run 10 separate import actions. Each import must completely finish before starting the next import.
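
As an illustrative pattern only, you could script sequential imports from the administration server. In the following hypothetical sketch, the endpoint paths and the completion status value are placeholders, not real Astra DevOps API routes; substitute the actual calls from the import instructions in this guide:

    # Hypothetical sketch: trigger imports one at a time, polling each until
    # it finishes before starting the next.
    while read -r MIGRATION_ID; do
      curl -sS -X POST \
        "https://api.astra.datastax.com/v2/<launch-endpoint>/${MIGRATION_ID}" \
        -H "Authorization: Bearer ${ASTRA_DB_TOKEN}"

      # Wait for this import to complete before starting the next one.
      until curl -sS \
          "https://api.astra.datastax.com/v2/<status-endpoint>/${MIGRATION_ID}" \
          -H "Authorization: Bearer ${ASTRA_DB_TOKEN}" \
          | jq -e '.status == "<done-status>"' >/dev/null; do
        sleep 60
      done
    done < migration-ids.txt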

After all of the imports are complete, validate the migrated data in your target database to ensure that it matches the data in the origin cluster. For example, you can run Cassandra Data Migrator (CDM) in validation mode.
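
For example, a hedged sketch of a CDM validation run, assuming CDM 4.x class names (the jar version and the origin and target connection details in cdm.properties depend on your environment):

    # Illustrative CDM validation run: compares origin and target data and
    # reports mismatches without modifying either side.
    spark-submit --properties-file cdm.properties \
      --master "local[*]" \
      --class com.datastax.cdm.job.DiffData \
      cassandra-data-migrator-4.x.x.jar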
