Compare DataStax migration tools

DataStax migration tools include the Zero Downtime Migration (ZDM) toolkit and three data migration tools.

ZDM comprises ZDM Proxy, ZDM Utility, and ZDM Proxy Automation, which orchestrate cluster activity during the transition. To move and validate data, you use Astra DB Sideloader, Cassandra Data Migrator, or DSBulk Migrator.

You can also use Astra DB Sideloader, Cassandra Data Migrator, and DSBulk Migrator on their own, outside the context of ZDM.

ZDM Proxy

The main component of the DataStax Zero Downtime Migration toolkit is ZDM Proxy, a lightweight proxy that handles all real-time requests generated by your client applications during the migration process.

ZDM Proxy is open-source software that is available from the zdm-proxy GitHub repo. This project is open for public contributions.

The ZDM Proxy is an orchestrator for monitoring application activity and keeping multiple clusters in sync through dual writes. ZDM Proxy isn’t linked to the actual migration process. It doesn’t perform data migrations and it doesn’t have awareness of ongoing migrations. Instead, you use a data migration tool, like Astra DB Sideloader, Cassandra Data Migrator, or DSBulk Migrator, to perform the data migration and validate migrated data.

How ZDM Proxy works

DataStax created ZDM Proxy to sit between the client application and both the origin and target databases. The databases can be any CQL-compatible data store, such as Apache Cassandra®, DataStax Enterprise (DSE), or Astra DB. The proxy always sends every write operation (INSERT, UPDATE, DELETE) synchronously to both clusters at the requested consistency level:

  • If the write is successful in both clusters, it returns a successful acknowledgement to the client application.

  • If the write fails on either cluster, the failure is passed back to the client application so that it can retry the write according to its own retry policy.

This design ensures that new data is always written to both clusters, and that any failure on either cluster is always made visible to the client application. ZDM Proxy also sends all reads to the primary cluster, and then returns the result to the client application. The primary cluster is initially the origin cluster, and you change it to the target cluster at the end of the migration process.
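The dual-write and primary-read behavior described above can be sketched in a few lines of Python. This is an illustrative stand-in, not the real proxy: the `origin` and `target` objects and their `write`/`read` methods are assumed placeholder interfaces, and the real ZDM Proxy speaks the CQL native protocol rather than calling Python methods.

```python
from concurrent.futures import ThreadPoolExecutor

class DualWriteProxy:
    """Illustrative sketch of ZDM Proxy's dual-write and primary-read routing.

    `origin` and `target` are assumed stand-ins: any objects exposing
    write(statement) and read(statement).
    """

    def __init__(self, origin, target):
        self.origin = origin
        self.target = target
        self.primary = origin  # reads initially go to the origin cluster

    def write(self, statement):
        # Send the write to both clusters concurrently and wait for both.
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(c.write, statement)
                       for c in (self.origin, self.target)]
            # result() re-raises any failure, so a failed write on either
            # cluster is surfaced to the client, mirroring the proxy's design.
            results = [f.result() for f in futures]
        return all(results)

    def read(self, statement):
        # Reads are served only by the primary cluster.
        return self.primary.read(statement)

    def promote_target(self):
        # At the end of the migration, the target becomes the primary.
        self.primary = self.target
```

A client retry policy then wraps `write()` calls: if either cluster rejects the write, the raised error tells the application to retry, exactly as the real proxy passes failures back.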

ZDM Proxy is designed to be highly available. It can be scaled horizontally, so typical deployments consist of at least three instances. ZDM Proxy can be restarted in a rolling fashion, for example, to change configuration for different phases of the migration.

Key features of ZDM Proxy

  • Allows you to lift-and-shift existing application code from your origin cluster to your target cluster by changing only the connection string, if all else is compatible.

  • Reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster. You can determine an explicit cut-over point once you’re ready to commit to using the target cluster permanently.

  • Sends writes synchronously to both clusters during the migration process.

  • Read operations return the response from the primary cluster, which is the designated source of truth.

    During a migration, the primary cluster is typically the origin cluster. Near the end of the migration, you shift the primary cluster to be the target cluster.

  • Option to read asynchronously from the target cluster as well as the origin cluster. This capability, called Asynchronous Dual Reads or Read Mirroring, allows you to observe the read latencies and throughput that the target cluster can achieve under actual production load.

    • Results from the asynchronous reads executed on the target cluster are not sent back to the client application.

    • This design implies that a failure on asynchronous reads from the target cluster does not cause an error on the client application.

    • Asynchronous dual reads can be enabled and disabled dynamically with a rolling restart of the ZDM Proxy instances.

When using Asynchronous Dual Reads, any additional read load on the target cluster may impact its ability to keep up with writes. This behavior is expected and desired. The idea is to mimic the full read and write load on the target cluster so there are no surprises during the last migration phase; that is, after cutting over completely to the target cluster.
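The key property of Asynchronous Dual Reads is that the mirrored read never affects the client: its result is discarded and its failures are only counted, not returned. The sketch below illustrates that contract with assumed stand-in objects; the real proxy reports mirror failures through its metrics rather than a Python counter.

```python
import threading

class AsyncDualReadProxy:
    """Illustrative sketch of Asynchronous Dual Reads (Read Mirroring).

    `primary` and `target` are assumed stand-ins exposing read(statement).
    The client only ever sees the primary's response; the target's result
    and any target-side errors are swallowed and merely counted.
    """

    def __init__(self, primary, target):
        self.primary = primary
        self.target = target
        self.mirror_errors = 0
        self._lock = threading.Lock()

    def _mirror(self, statement):
        try:
            self.target.read(statement)  # result is intentionally discarded
        except Exception:
            with self._lock:
                self.mirror_errors += 1  # surfaced via metrics only

    def read(self, statement):
        mirror = threading.Thread(target=self._mirror, args=(statement,))
        mirror.start()                          # fire the mirrored read
        result = self.primary.read(statement)   # client sees only this
        mirror.join()  # joined here only to keep the sketch deterministic
        return result
```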

Run multiple ZDM Proxy instances

ZDM Proxy has been designed to run in a clustered fashion so that it is never a single point of failure. Unless it is for a demo or local testing environment, a ZDM Proxy deployment should always comprise multiple ZDM Proxy instances.

Throughout the documentation, the term ZDM Proxy deployment refers to the entire deployment, and ZDM Proxy instance refers to an individual proxy process in the deployment.


ZDM Utility and ZDM Proxy Automation

You can use the ZDM Utility and ZDM Proxy Automation to set up and run Ansible playbooks that deploy and manage ZDM Proxy and its monitoring stack.

Ansible is a suite of software tools that enables infrastructure as code. It is open source and its capabilities include software provisioning, configuration management, and application deployment functionality. The Ansible automation for ZDM is organized into playbooks, each implementing a specific operation. The machine from which the playbooks are run is known as the Ansible Control Host. In ZDM, the Ansible Control Host runs as a Docker container.

You use the ZDM Utility to set up Ansible in a Docker container, and then you use ZDM Proxy Automation to run the Ansible playbooks from the Docker container created by ZDM Utility.

The ZDM Utility creates the Docker container that acts as the Ansible Control Host. From that container, ZDM Proxy Automation lets you deploy and manage the ZDM Proxy instances and the associated monitoring stack, which includes Prometheus metrics collection and Grafana dashboards.

To use ZDM Utility and ZDM Proxy Automation, you must prepare the recommended infrastructure, as explained in Deployment and infrastructure considerations.

Astra DB Sideloader

Astra DB Sideloader is a service running in Astra DB that directly imports data from snapshots of your existing Cassandra-based cluster. This tool is exclusively for migrations that move data to Astra DB.

You can use Astra DB Sideloader alone or in the context of ZDM.

For more information, see Use Astra DB Sideloader with ZDM.

Cassandra Data Migrator

You can use Cassandra Data Migrator (CDM) to migrate and validate tables between Cassandra-based clusters. It is best for migrating large amounts of data and for migrations that need support for detailed logging, data verification, table column renaming, and reconciliation.

CDM offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation.

You can use CDM by itself, in the context of ZDM, or for data validation after using another migration tool, such as Astra DB Sideloader.

For more information, see Use Cassandra Data Migrator with ZDM.

DSBulk Migrator

DSBulk Migrator is an extension of DSBulk Loader. It is best for smaller migrations or migrations that don’t require data validation during the migration process.

In addition to loading and unloading CSV and JSON data, you can use DSBulk Migrator to transfer data between databases. It can read data from a table in your origin database, and then write that data to a table in your target database.
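Conceptually, that table-to-table transfer is a paged read from the origin followed by batched writes to the target. The sketch below simulates the flow with assumed in-memory stand-ins (`read_page`, `write_rows`); the real tool drives CQL reads and writes through DSBulk rather than Python callables.

```python
def transfer_table(read_page, write_rows, page_size=100):
    """Copy all rows from an origin table to a target table, page by page.

    `read_page(offset, limit)` and `write_rows(rows)` are assumed stand-ins
    for the driver calls a real migrator would make against each cluster.
    Returns the number of rows copied.
    """
    offset = 0
    copied = 0
    while True:
        page = read_page(offset, page_size)
        if not page:
            break  # origin exhausted
        write_rows(page)
        copied += len(page)
        offset += page_size
    return copied
```

Paging keeps memory bounded regardless of table size, which is why bulk tools stream rows in pages instead of materializing the whole table.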

You can use DSBulk Migrator alone or in the context of ZDM.

For more information, see Use DSBulk Migrator with ZDM.

Custom data migration processes

If you want to write your own custom data migration processes, you can use a tool like Apache Spark™.


© 2025 DataStax

