Phase 2: Migrate and validate data

In Phase 1, you deployed ZDM Proxy instances to orchestrate live traffic to your origin and target clusters, and you connected your applications to the proxies.

In Phase 2 of ZDM, you migrate data from the origin to the target, and then validate the migrated data.

To move and validate data, you use a dedicated data migration tool. Recommended tools include Cassandra Data Migrator (CDM) and DataStax Bulk Loader (DSBulk). You can also write your own custom data migration scripts or use other tools if these aren’t sufficient for your use case.

Cassandra Data Migrator (CDM)

You can use CDM for data migration and validation between Cassandra-based databases. It offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation.

You can use CDM alone, with ZDM Proxy, or for data validation after using another data migration tool.

This tool is free, open-source software.

To get started with this tool, see the CDM repository.
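CDM runs as an Apache Spark job launched with `spark-submit`. The following minimal Python sketch builds the commands for a migration job on one table and a follow-up validation job on the same table, but it doesn't run them. The job classes `com.datastax.cdm.job.Migrate` and `com.datastax.cdm.job.DiffData` and the `spark.cdm.schema.origin.keyspaceTable` property come from the CDM README and can differ between releases, so confirm them in the repository before use. The JAR name, properties file, and table name are hypothetical placeholders.

```python
"""Build (but don't run) spark-submit commands for a CDM migration and validation.

Assumption: the job class names and the keyspaceTable property follow the CDM
README; check them against the CDM release you actually download.
"""

import shlex

# Hypothetical placeholders: replace with your real JAR and properties file.
CDM_JAR = "cassandra-data-migrator-x.y.z.jar"
PROPERTIES_FILE = "cdm.properties"

# Assumed CDM job classes: one copies data, the other compares origin and target.
JOB_CLASSES = {
    "migrate": "com.datastax.cdm.job.Migrate",
    "validate": "com.datastax.cdm.job.DiffData",
}


def cdm_command(job: str, keyspace_table: str, master: str = "local[*]") -> list[str]:
    """Return the spark-submit argument list for one CDM job on one table."""
    if job not in JOB_CLASSES:
        raise ValueError(f"unknown CDM job: {job!r}")
    return [
        "spark-submit",
        "--properties-file", PROPERTIES_FILE,
        "--conf", f"spark.cdm.schema.origin.keyspaceTable={keyspace_table}",
        "--master", master,
        "--class", JOB_CLASSES[job],
        CDM_JAR,
    ]


if __name__ == "__main__":
    # Migrate first, then validate: the same order this phase follows.
    for job in ("migrate", "validate"):
        print(shlex.join(cdm_command(job, "my_keyspace.my_table")))
```

Keeping the command as an argument list makes it easy to pass to `subprocess.run` once you have checked the values against your CDM version.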

DataStax Bulk Loader (DSBulk)

DSBulk is a high-performance data loading and unloading tool for Cassandra-based databases. You can use it to load, unload, and count records.
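As a minimal sketch of how the three commands fit together for a single table, the following Python builds (without running) an unload from the origin, a load into the target, and a row count on each cluster. It assumes the DSBulk short options `-h` (contact points), `-k` (keyspace), `-t` (table), and `-url` (where unloaded data is written and read). Credentials, TLS, and secure connect bundle options are deliberately omitted, and every host, keyspace, table, and path shown is hypothetical.

```python
"""Build (but don't run) DSBulk commands that copy one table from origin to target.

Assumptions: -h, -k, -t, and -url behave as described in the DSBulk documentation.
Authentication and TLS options are omitted and must be added for real clusters.
"""

import shlex
from typing import Optional


def dsbulk_command(
    command: str, host: str, keyspace: str, table: str, url: Optional[str] = None
) -> list[str]:
    """Return the argument list for a DSBulk unload, load, or count command."""
    if command not in ("unload", "load", "count"):
        raise ValueError(f"unsupported DSBulk command: {command!r}")
    args = ["dsbulk", command, "-h", host, "-k", keyspace, "-t", table]
    if command in ("unload", "load"):
        # unload writes data to this location; load reads it back from there.
        if url is None:
            raise ValueError(f"{command} requires a url")
        args += ["-url", url]
    return args


def copy_table_plan(
    origin_host: str, target_host: str, keyspace: str, table: str, export_dir: str
) -> list[list[str]]:
    """Unload from origin, load into target, then count rows on both clusters."""
    return [
        dsbulk_command("unload", origin_host, keyspace, table, export_dir),
        dsbulk_command("load", target_host, keyspace, table, export_dir),
        dsbulk_command("count", origin_host, keyspace, table),
        dsbulk_command("count", target_host, keyspace, table),
    ]


if __name__ == "__main__":
    # Hypothetical hosts, keyspace, table, and export directory.
    plan = copy_table_plan(
        "10.0.0.1", "10.0.0.2", "my_keyspace", "my_table", "/tmp/my_table_export"
    )
    for step in plan:
        print(shlex.join(step))
```

Keeping each step as a separate command lets you rerun only the load or only the counts if something fails partway through.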

Because DSBulk doesn’t have the same data validation capabilities as CDM, it is best for migrations that don’t require extensive data validation, aside from post-migration row counts.
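A post-migration row-count check amounts to comparing per-table totals from both clusters. The sketch below assumes you have already collected those totals (for example, by running `dsbulk count` against each cluster) into hypothetical dictionaries keyed by table name. Matching counts don't prove that the rows themselves are identical, which is why tools with deeper validation, such as CDM, suit migrations that need more than counts.

```python
"""Sketch: compare per-table row counts collected from origin and target.

Assumption: the counts were gathered separately (for example, with `dsbulk count`);
capturing them from command output is out of scope here.
"""


def count_mismatches(origin: dict[str, int], target: dict[str, int]) -> dict[str, tuple]:
    """Return {table: (origin_count, target_count)} for every table whose counts differ.

    A table present on only one side is reported with None for the other side.
    """
    mismatches = {}
    for table in sorted(origin.keys() | target.keys()):
        origin_count, target_count = origin.get(table), target.get(table)
        if origin_count != target_count:
            mismatches[table] = (origin_count, target_count)
    return mismatches


if __name__ == "__main__":
    origin_counts = {"ks.users": 1_000_000, "ks.orders": 5_250_000}  # hypothetical
    target_counts = {"ks.users": 1_000_000, "ks.orders": 5_249_870}  # hypothetical
    print(count_mismatches(origin_counts, target_counts))
    # → {'ks.orders': (5250000, 5249870)}
```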

You can use DSBulk alone or with ZDM Proxy.

This tool is free, open-source software.

To get started with this tool, see About DataStax Bulk Loader (DSBulk).

The DSBulk Migrator tool, which was an extension of DataStax Bulk Loader (DSBulk), is deprecated. This tool is no longer recommended. Instead, use the unload, load, and count commands included with DSBulk, or use another data migration tool, such as CDM.

Other data migration tools and processes

Depending on your origin and target databases, there might be other data migration tools available for your migration, or you can write your own custom data migration scripts that use tools like Apache Spark™.

Data migration tools can be used alone or with ZDM Proxy. However, to use a data migration tool with ZDM Proxy, it must meet the following requirements:

Has built-in data validation functionality or is compatible with another data validation tool, such as CDM. This is crucial to a successful migration.
Preserves the data model, including column names and data types, so that ZDM Proxy can send the same read/write statements to both databases successfully.
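One way to check the second requirement before migrating is to compare the origin and target table definitions column by column. In this minimal sketch, each schema is a hypothetical mapping of column name to CQL type string that you would build yourself, for example from the `column_name` and `type` columns of `system_schema.columns`. Any difference it reports means the data model was not preserved exactly.

```python
"""Sketch: list differences between origin and target column names and CQL types.

Assumption: each schema is a plain {column_name: cql_type} mapping built by you,
for example from a query against system_schema.columns.
"""


def schema_differences(origin: dict[str, str], target: dict[str, str]) -> list[str]:
    """Return a description of every column name or type that differs."""
    problems = []
    for name, cql_type in origin.items():
        if name not in target:
            problems.append(f"missing column on target: {name}")
        elif target[name] != cql_type:
            problems.append(f"type mismatch for {name}: {cql_type} != {target[name]}")
    # Columns that exist only on the target also change the data model.
    for name in sorted(target.keys() - origin.keys()):
        problems.append(f"extra column on target: {name}")
    return problems


if __name__ == "__main__":
    # Hypothetical table where a transformation changed a column's type.
    origin = {"id": "uuid", "name": "text", "created": "timestamp"}
    target = {"id": "uuid", "name": "text", "created": "bigint"}
    print(schema_differences(origin, target))
    # → ['type mismatch for created: timestamp != bigint']
```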

Migrations that perform significant data transformations might not be compatible with ZDM Proxy. The impact of data transformations depends on your specific data model, database platforms, and the scale of your migration.

For data-only migrations that aren't concerned with live application traffic or minimizing downtime, the right tool depends on your origin and target databases, the compatibility of their schemas, and the scale of your migration. Describing all possible data migration tools is beyond the scope of this document, which focuses on full-scale platform migrations with the ZDM tools and verified ZDM-compatible data migration tools.

For more information about data transformation and migration tools, see Migrate to DataStax Enterprise (DSE).

Next steps

Don’t proceed to Phase 3 until you have replicated all preexisting data from your origin cluster to your target cluster, and you have taken time to validate that the data was migrated correctly and completely.

The success of your migration and the future performance of the target cluster depend on correct and complete data.

If your chosen data migration tool doesn’t have built-in validation features, you must use a separate tool for validation.
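To make "correctly and completely" concrete, the following minimal in-memory sketch shows the logic of a row-level check: every preexisting origin row must exist on the target (completeness) and have identical column values there (correctness). It assumes both tables fit in memory as hypothetical `{primary_key: row}` mappings. Dedicated validation tools such as CDM perform this kind of comparison against real clusters at scale, so treat this only as an illustration.

```python
"""Sketch: row-level validation of origin rows against the target, by primary key.

Assumption: both tables are small enough to load as {primary_key: row_dict}
mappings; real validation tools compare rows directly against the clusters.
"""

from dataclasses import dataclass, field
from typing import Any, Hashable


@dataclass
class ValidationReport:
    missing_on_target: list = field(default_factory=list)  # keys absent on target
    mismatched: list = field(default_factory=list)  # keys whose values differ

    @property
    def ok(self) -> bool:
        return not self.missing_on_target and not self.mismatched


def validate_rows(
    origin: dict[Hashable, dict[str, Any]], target: dict[Hashable, dict[str, Any]]
) -> ValidationReport:
    """Check that every origin row exists on the target with identical values."""
    report = ValidationReport()
    for key, origin_row in origin.items():
        target_row = target.get(key)
        if target_row is None:
            report.missing_on_target.append(key)
        elif target_row != origin_row:
            report.mismatched.append(key)
    return report


if __name__ == "__main__":
    # Hypothetical data: row 2 was altered and row 3 was never copied.
    origin = {1: {"name": "ada"}, 2: {"name": "grace"}, 3: {"name": "alan"}}
    target = {1: {"name": "ada"}, 2: {"name": "Grace"}}
    report = validate_rows(origin, target)
    print(report.missing_on_target, report.mismatched, report.ok)
    # → [3] [2] False
```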

After copying and thoroughly validating your data on the target cluster, proceed to Phase 3 to test your target cluster's production readiness.
