About Astra DB Sideloader

Astra DB Sideloader is a service running in Astra DB that directly imports data from snapshot backups that you’ve uploaded to Astra DB from an existing Apache Cassandra®, DataStax Enterprise (DSE), or Hyper-Converged Database (HCD) cluster.

Because it imports data directly, Astra DB Sideloader can offer several advantages over CQL-based tools like DataStax Bulk Loader (DSBulk) and Cassandra Data Migrator (CDM), including faster, more cost-effective data loading, and minimal performance impacts on your origin cluster and target database.

Astra DB Sideloader concepts

Origin, origin cluster: In the context of Astra DB Sideloader, this refers to your existing Cassandra, DSE, or HCD cluster.
Target, target database: In the context of Astra DB Sideloader, this refers to the Astra DB Serverless database where you will migrate your data.
Administration server: A server where you run the migration commands, including CLI commands and Astra DevOps API calls. It must have SSH access to each node in your origin cluster.
Migration: A workflow that you initiate within Astra DB Sideloader that encompasses the lifecycle of uploading and importing snapshot backups of a specific set of keyspaces or CQL tables.

This process produces artifacts and parameters including migration buckets, migration IDs, migration directories, and upload credentials. You use these components throughout the migration workflow.

The Astra DB Sideloader process

Transferring data with Astra DB Sideloader is a multi-phase process. Before you use Astra DB Sideloader, learn about the events, outcomes, warnings, and requirements of each phase:

Prepare your infrastructure

There are requirements for using Astra DB Sideloader that you must consider before you start a migration. Additionally, you must take steps to prepare your target database, origin cluster, and administration server before you begin the migration.

For more information, see Prepare to use Astra DB Sideloader.

Create snapshot backups

Astra DB Sideloader uses snapshot backup files to import SSTable data from your existing origin cluster. Each snapshot for each node in the origin cluster must include all the keyspaces and individual CQL tables that you want to migrate.

These snapshots are ideal for database migrations because creating snapshots has a negligible performance impact on the origin cluster, and the snapshots preserve metadata like writetime and ttl values.

When using Astra DB Sideloader with ZDM Proxy, Cassandra’s last-write-wins semantics ensure that new, real-time writes accurately take precedence over historical writes. Last-write-wins compares the writetime of conflicting records, and then retains the most recent write. For example, if a new write occurs in your target database with a writetime of 2023-10-01T12:05:00Z, and then Astra DB Sideloader migrates a record against the same row with a writetime of 2023-10-01T12:00:00Z, the target database retains the data from the new write because it has the most recent writetime.
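The comparison in the example above can be sketched in a few lines of Python. This is only an illustration of last-write-wins using the two timestamps from the example. The `Cell` class and `resolve` function are hypothetical names, not part of Cassandra or Astra DB, and the sketch keeps the existing cell on a tie, which is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for one column value and its writetime."""
    value: str
    writetime: datetime


def resolve(existing: Cell, incoming: Cell) -> Cell:
    """Return the cell that survives under last-write-wins.

    Simplification for this sketch: on an exact tie, keep the existing cell.
    """
    return incoming if incoming.writetime > existing.writetime else existing


# A real-time write that already landed in the target database.
live_write = Cell("new", datetime(2023, 10, 1, 12, 5, tzinfo=timezone.utc))
# A historical record for the same row that arrives later through the import.
sideloaded = Cell("historical", datetime(2023, 10, 1, 12, 0, tzinfo=timezone.utc))

winner = resolve(live_write, sideloaded)
print(winner.value)  # new
```

The order of arrival does not matter in this sketch: calling `resolve(sideloaded, live_write)` also returns the cell with the later writetime.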

For more information, see Migrate data with Astra DB Sideloader: Create snapshots.

Prepare the target database

Because snapshots don’t store schema definitions, you must pre-configure the schema definition in your target Astra DB database so that it matches the origin cluster’s schema.

For the migration to succeed, the schema in your target database must align with the schema in the origin cluster. However, you might need to modify your schema or data model to be compatible with Astra DB.

For specific requirements and more information, see Migrate data with Astra DB Sideloader: Configure the target database.

Initialize a migration

After you create snapshots on the origin cluster and pre-configure the schema on the target database, use the Astra DevOps API to initialize the migration.

Astra DB Sideloader moves data from the migration bucket to Astra DB.

When you initialize a migration, Astra DB Sideloader does the following:

Creates a secure migration bucket.

The migration bucket is created only during the first initialization. All subsequent migrations use different directories in the same migration bucket.

DataStax owns the migration bucket, and it is located within the Astra perimeter.
Generates a migration ID that is unique to the new migration.
Creates a migration directory within the migration bucket that is unique to the new migration.

The migration directory is also referred to as the uploadBucketDir. In the next phase of the migration process, you will upload your snapshots to this migration directory.
Generates upload credentials that grant read/write access to the migration directory.

The credentials are formatted according to the cloud provider where your target database is deployed.

For instructions and more information, see Migrate data with Astra DB Sideloader: Initialize the migration.

Upload snapshots

When initialization is complete, use your cloud provider’s CLI to upload your snapshots to the migration directory.

To upload snapshots directly from the origin cluster, you must install your cloud provider’s CLI on each node in the origin cluster. While it is possible to orchestrate this process through a staging server, the commands given in this documentation assume you are uploading snapshots directly from the origin cluster.

The time required to upload the snapshots depends on the size of your dataset and the network throughput between the origin cluster and the migration bucket:

Speed	Migration type	Description
Fastest	Intra-datacenter	All else equal, snapshots take the least time to upload when the origin cluster is in the same cloud provider and region as the target database.
Fast	Cross-datacenter, co-located	Uploads are slower by default when they must exit the local datacenter. The delay increases relative to the physical distance between the datacenters. For example, all else equal, uploading from AWS us-east-1 (Dulles, VA, USA) to AWS ca-central-1 (Montréal, QC, Canada) is faster than uploading from us-east-1 to us-west-2 (The Dalles, OR, USA) because Oregon is significantly farther from Virginia than Montréal is.
Variable	Cross-provider, co-located	If the target database is in a different cloud provider than the origin cluster, the upload can be slower as the data passes from one provider’s infrastructure to another. This is considered a cross-datacenter transfer, and the delay increases relative to the physical distance between the datacenters.
Slowest	Transoceanic	The slowest uploads happen when the data must travel over transoceanic cables. If the data must also change cloud providers, there can be additional delays. In this case, consider creating your target database in a co-located datacenter, and then deploy your database to other regions after the migration.
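For a rough sense of scale, the relationship between dataset size, throughput, and upload time can be worked out with simple arithmetic. The following Python sketch assumes a constant sustained throughput and ignores protocol overhead, retries, and parallel uploads from multiple nodes. The function name and the sample figures (500 GB at 1 Gbps) are hypothetical and chosen only for illustration.

```python
def estimate_upload_hours(snapshot_bytes: int, throughput_mbps: float) -> float:
    """Estimate upload time in hours.

    snapshot_bytes: total size of the snapshot files to upload, in bytes.
    throughput_mbps: sustained throughput in megabits per second (10**6 bits/s).
    """
    if throughput_mbps <= 0:
        raise ValueError("throughput_mbps must be positive")
    bits = snapshot_bytes * 8
    seconds = bits / (throughput_mbps * 1_000_000)
    return seconds / 3600


# Hypothetical example: 500 GB of snapshots over a sustained 1 Gbps link.
hours = estimate_upload_hours(500 * 10**9, 1000)
print(f"{hours:.2f} hours")  # 1.11 hours
```

Actual throughput typically falls as the upload moves down the table above, so treat any such estimate as a lower bound and measure a small test upload before planning the full migration.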

Import data

After uploading the snapshots to the migration directory, use the DevOps API to start the data import process.

During the import process, Astra DB Sideloader does the following:

Revokes access to the migration directory.

You cannot read or write to the migration directory after starting the data import process.
Discovers all uploaded SSTables in the migration directory, and then groups them into subsets of approximately equal size.
Runs validation checks on each subset.
Converts all SSTables of each subset.
Disables new compactions on the target database.

This is the last point at which you can abort the migration.

Once Astra DB Sideloader begins to import SSTable metadata (the next step), you cannot stop the migration.
Imports metadata from each SSTable.

If the dataset contains tombstones, any read operations on the target database can return inconsistent results during this step. Since compaction is disabled, there is no risk of permanent inconsistencies. However, in the context of Zero Downtime Migration, it’s important that ZDM Proxy continues to read from the origin cluster.
Re-enables compactions on the Astra DB Serverless database.

Each step must finish successfully. If one step fails, the import operation stops and no data is imported into your target database.

If all steps finish successfully, the migration is complete and you can access the imported data in your target database.
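To make the grouping step in the list above more concrete, the following Python sketch shows one common way to split files into subsets of roughly equal total size: a greedy pass that always adds the next-largest file to the currently smallest subset. This is not Astra DB Sideloader's actual algorithm, which this page does not describe. The function name, file names, and sizes are all hypothetical.

```python
import heapq
from typing import Dict, List


def group_by_size(sstable_sizes: Dict[str, int], num_subsets: int) -> List[List[str]]:
    """Greedily partition files into num_subsets groups of similar total size."""
    if num_subsets < 1:
        raise ValueError("num_subsets must be at least 1")

    # Min-heap of (total_bytes_so_far, subset_index), so the smallest subset pops first.
    heap = [(0, i) for i in range(num_subsets)]
    heapq.heapify(heap)
    subsets: List[List[str]] = [[] for _ in range(num_subsets)]

    # Place the largest files first; this keeps the final totals closer together.
    for name, size in sorted(sstable_sizes.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        subsets[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return subsets


# Hypothetical SSTable sizes in megabytes.
sizes = {"a": 90, "b": 60, "c": 50, "d": 40, "e": 10}
print(group_by_size(sizes, 2))  # [['a', 'd'], ['b', 'c', 'e']]
```

In this example the two subsets total 130 and 120, which is close to an even split without requiring an exhaustive search.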

For instructions and more information, see Migrate data with Astra DB Sideloader: Import data.
