Phase 2: Migrate and validate data
In Phase 1, you deployed ZDM Proxy instances to orchestrate live traffic to your origin and target clusters, and you connected your applications to the proxies.
In Phase 2 of ZDM, you migrate data from the origin to the target, and then validate the migrated data.
To move and validate data, you use a dedicated data migration tool. Recommended tools include Astra DB Sideloader, Cassandra Data Migrator (CDM), and DataStax Bulk Loader (DSBulk). You can also write your own custom data migration scripts or use other tools if these aren’t sufficient for your use case.
Astra DB Sideloader
This tool is exclusively for migrations that move data to Astra DB Serverless databases.
Astra DB Sideloader is a service running in Astra that imports data from snapshots of an existing DataStax Enterprise (DSE), Hyper-Converged Database (HCD), or open-source Apache Cassandra® cluster. Because it imports data directly, Astra DB Sideloader can offer several advantages over CQL-based tools like DSBulk and CDM, including faster, more cost-effective data loading, and minimal performance impacts on your origin cluster and target database.
To migrate data with Astra DB Sideloader, you use nodetool, a cloud provider’s CLI, and the Astra DevOps API:
-
nodetool: Create snapshots of your existing cluster. For compatible origin clusters, see Migrate to Astra DB Serverless.
-
Cloud provider CLI: Upload snapshots to a dedicated cloud storage bucket for your migration.
-
Astra DevOps API: Run the Astra DB Sideloader commands to write the data from cloud storage to your Astra DB Serverless database.
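The first two steps above can be sketched as command lines. The following is a minimal illustration only: it assumes AWS as the cloud provider, and the snapshot tag, keyspace name, local snapshot directory, and bucket URI are hypothetical placeholders. The actual upload path and the Astra DevOps API calls are defined in the Astra DB Sideloader documentation and are intentionally not reproduced here. The sketch only builds and prints the commands; it doesn't run them.

```python
from __future__ import annotations

import shlex
from typing import List


def snapshot_command(tag: str, keyspace: str) -> List[str]:
    """Build a `nodetool snapshot` command that tags a snapshot of one keyspace."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def upload_command(snapshot_dir: str, bucket_uri: str) -> List[str]:
    """Build an `aws s3 sync` command (assumes AWS; other providers have their own CLIs)."""
    return ["aws", "s3", "sync", snapshot_dir, bucket_uri]


def render(command: List[str]) -> str:
    """Render a command as a shell-safe string for review before running it."""
    return shlex.join(command)


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not values from the Sideloader docs.
    print(render(snapshot_command("zdm_snapshot", "my_keyspace")))
    print(render(upload_command("/path/to/snapshots/zdm_snapshot",
                                "s3://example-migration-bucket/example-prefix/")))
```

Printing the commands first makes it easier to review them for each node before anything touches the origin cluster or the storage bucket.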
You can use Astra DB Sideloader alone or with ZDM Proxy.
Astra DB Sideloader requires an Astra Enterprise plan, and it incurs costs based on usage.
To get started with this tool, see About Astra DB Sideloader.
Cassandra Data Migrator (CDM)
You can use CDM for data migration and validation between Cassandra-based databases. It offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation.
You can use CDM alone, with ZDM Proxy, or for data validation after using another data migration tool.
This tool is free, open-source software.
To get started with this tool, see the CDM repository.
DataStax Bulk Loader (DSBulk)
DSBulk is a high-performance data loading and unloading tool for Cassandra-based databases. You can use it to load, unload, and count records.
Because DSBulk doesn’t have the same data validation capabilities as CDM, it is best for migrations that don’t require extensive data validation, aside from post-migration row counts.
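A post-migration row count check could look something like the following sketch. It builds `dsbulk count` commands with the `-k` (keyspace) and `-t` (table) options and compares per-table totals. The helper names, the table names, and the connection arguments are hypothetical, and `parse_count` assumes that the last non-empty line of standard output is the total row count, which you should confirm against your DSBulk version and settings.

```python
from __future__ import annotations

from typing import Dict, List, Optional, Tuple


def count_command(keyspace: str, table: str, connection_args: List[str]) -> List[str]:
    """Build a `dsbulk count` command; connection_args carries cluster-specific options."""
    return ["dsbulk", "count", "-k", keyspace, "-t", table, *connection_args]


def parse_count(stdout: str) -> int:
    """Parse a row count, assuming the last non-empty stdout line is the total."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output to parse")
    return int(lines[-1])


def compare_counts(
    origin: Dict[str, int], target: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return tables whose counts differ, as {table: (origin_count, target_count)}."""
    mismatches = {}
    for table in sorted(set(origin) | set(target)):
        origin_count, target_count = origin.get(table), target.get(table)
        if origin_count != target_count:
            mismatches[table] = (origin_count, target_count)
    return mismatches


if __name__ == "__main__":
    # Hypothetical host and table names, for illustration only.
    print(count_command("my_keyspace", "users", ["-h", "10.0.0.1"]))
    print(compare_counts({"users": 100, "orders": 250}, {"users": 100, "orders": 249}))
```

Matching counts only show that the same number of rows exists on both sides. They don't confirm that the row contents match, which is why this approach suits migrations that don't need extensive validation.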
You can use DSBulk alone or with ZDM Proxy.
This tool is free, open-source software.
To get started with this tool, see About DataStax Bulk Loader (DSBulk).
The DSBulk Migrator tool, which was an extension of DataStax Bulk Loader (DSBulk), is deprecated. This tool is no longer recommended. Instead, use the unload, load, and count commands included with DSBulk, or use another data migration tool, such as CDM.
Other data migration tools and processes
Depending on your origin and target databases, there might be other data migration tools available for your migration, or you can write your own custom data migration scripts that use tools like Apache Spark™.
Data migration tools can be used alone or with ZDM Proxy. However, to use a data migration tool with ZDM Proxy, it must meet the following requirements:
-
Built-in data validation functionality or compatibility with another data validation tool, such as CDM. This is crucial to a successful migration.
-
Preserves the data model, including column names and data types, so that ZDM Proxy can send the same read/write statements to both databases successfully.
Migrations that perform significant data transformations might not be compatible with ZDM Proxy. The impact of data transformations depends on your specific data model, database platforms, and the scale of your migration.
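To make the two requirements above concrete, here is a conceptual sketch of the kinds of checks they imply: comparing column names and types between origin and target, and comparing rows by primary key. It operates on plain in-memory dictionaries and is not how CDM or any other tool implements validation. All names and the sample data are hypothetical.

```python
from __future__ import annotations

from typing import Any, Dict, List, Tuple

Schema = Dict[str, str]   # column name -> CQL type name
Row = Dict[str, Any]      # column name -> value
Key = Tuple[Any, ...]     # primary key values


def schema_differences(origin: Schema, target: Schema) -> List[str]:
    """List columns that are missing, extra, or typed differently on the target."""
    problems = []
    for column in sorted(set(origin) | set(target)):
        if column not in target:
            problems.append(f"{column}: missing on target")
        elif column not in origin:
            problems.append(f"{column}: present only on target")
        elif origin[column] != target[column]:
            problems.append(f"{column}: type {origin[column]} != {target[column]}")
    return problems


def row_differences(origin: Dict[Key, Row], target: Dict[Key, Row]) -> Dict[str, List[Key]]:
    """Classify primary keys as missing, extra, or mismatched on the target."""
    return {
        "missing_on_target": sorted(k for k in origin if k not in target),
        "extra_on_target": sorted(k for k in target if k not in origin),
        "mismatched": sorted(k for k in origin if k in target and origin[k] != target[k]),
    }


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    print(schema_differences({"id": "uuid", "name": "text"}, {"id": "uuid", "name": "varchar"}))
    origin_rows = {(1,): {"name": "a"}, (2,): {"name": "b"}, (3,): {"name": "c"}}
    target_rows = {(1,): {"name": "a"}, (2,): {"name": "x"}, (4,): {"name": "d"}}
    print(row_differences(origin_rows, target_rows))
```

A real validation tool must also handle concerns that this sketch ignores, such as datasets too large to hold in memory and writes that arrive while the comparison runs.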
For data-only migrations that aren’t concerned with live application traffic or minimizing downtime, your chosen tool depends on your origin and target databases, the compatibility of the schemas, and the scale of your migration. Describing all possible data migration tools is beyond the scope of this document, which focuses on full-scale platform migrations with the ZDM tools and verified ZDM-compatible data migration tools.
For more information about data transformation and migration tools, see Migrate to Astra DB Serverless.
Next steps
Don’t proceed to Phase 3 until you have replicated all preexisting data from your origin cluster to your target cluster, and you have taken time to validate that the data was migrated correctly and completely. The success of your migration and the future performance of the target cluster depend on correct and complete data. If your chosen data migration tool doesn’t have built-in validation features, you must use a separate tool for validation.
After copying and thoroughly validating your data on the target cluster, proceed to Phase 3 to test your target cluster’s production readiness.