Compare DataStax migration tools
The DataStax Zero Downtime Migration (ZDM) toolkit includes ZDM Proxy, ZDM Utility, ZDM Proxy Automation, and several data migration tools.
For live migrations, ZDM Proxy orchestrates activity-in-transition on your clusters. ZDM Utility and ZDM Proxy Automation facilitate the deployment and management of ZDM Proxy.
To move and validate data, you use data migration tools. You can use these tools alone or with ZDM Proxy.
ZDM Proxy
The main component of the DataStax Zero Downtime Migration toolkit is ZDM Proxy, which is designed to be a lightweight proxy that handles all real-time requests generated by your client applications during the migration process. This tool is open-source software that is open for public contributions.
ZDM Proxy is an orchestrator for monitoring application activity and keeping multiple clusters (databases) in sync through dual writes. ZDM Proxy isn’t linked to the actual migration process. It doesn’t perform data migrations and it doesn’t have awareness of ongoing migrations. Instead, you use a data migration tool to perform the data migration and validate migrated data.
ZDM Proxy reduces risks to upgrades and migrations by decoupling the origin cluster from the target cluster and maintaining consistency between both clusters. You decide when you want to switch permanently to the target cluster.
After migrating your data, changes to your application code are usually minimal, depending on your client’s compatibility with the origin and target clusters. Typically, you only need to update the connection string.
How ZDM Proxy handles reads and writes
DataStax created ZDM Proxy to orchestrate requests between a client application and both the origin and target clusters. These clusters can be any CQL-compatible data store, such as Apache Cassandra®, DataStax Enterprise (DSE), and Astra DB.
During the migration process, you designate one cluster as the primary cluster, which serves as the source of truth for reads. For the majority of the migration process, this is typically the origin cluster. Towards the end of the migration process, when you are ready to read from your target cluster, you set the target cluster as the primary cluster.
Writes
ZDM Proxy sends every write operation (INSERT, UPDATE, DELETE) synchronously to both clusters at the requested consistency level:
- If the write is acknowledged in both clusters at the requested consistency level, then the operation returns a successful write acknowledgement to the client that issued the request.
- If the write fails in either cluster, then ZDM Proxy passes a write failure, originating from the primary cluster, back to the client. The client can then retry the request, if appropriate, based on the client’s retry policy.
This design ensures that new data is always written to both clusters, and that any failure on either cluster is always made visible to the client application.
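The write path described above can be sketched as a small in-memory model. This is an illustration only, not ZDM Proxy's implementation: the `FakeCluster` class, the `dual_write` function, and the result dictionaries are hypothetical names introduced for this example, and the sketch doesn't model consistency levels or how the real proxy chooses which error to surface.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeCluster:
    """Hypothetical stand-in for a CQL cluster; stores rows in a dict."""

    def __init__(self, name, fail_writes=False):
        self.name = name
        self.fail_writes = fail_writes
        self.rows = {}

    def write(self, key, value):
        # Simulate a write that is not acknowledged by this cluster.
        if self.fail_writes:
            raise RuntimeError(f"write failed on {self.name}")
        self.rows[key] = value
        return {"ok": True, "cluster": self.name}


def dual_write(primary, secondary, key, value):
    """Send one write to both clusters and wait for both outcomes."""

    def attempt(cluster):
        try:
            return cluster.write(key, value)
        except RuntimeError as exc:
            return {"ok": False, "cluster": cluster.name, "error": str(exc)}

    # Both writes are issued together, and the caller blocks until both finish.
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary_future = pool.submit(attempt, primary)
        secondary_future = pool.submit(attempt, secondary)
        primary_result = primary_future.result()
        secondary_result = secondary_future.result()

    # Success is reported only when both clusters acknowledge the write.
    if primary_result["ok"] and secondary_result["ok"]:
        return primary_result

    # Otherwise the caller sees a failure and can decide whether to retry.
    failed_on = [
        result["cluster"]
        for result in (primary_result, secondary_result)
        if not result["ok"]
    ]
    return {"ok": False, "failed_on": failed_on}
```

In this model, a write that succeeds on the origin but fails on the target still returns a failure to the caller, which mirrors the guarantee that a failure on either cluster is always visible to the client application.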
For information about how ZDM Proxy handles lightweight transactions (LWTs), see Lightweight Transactions and the applied flag.
Reads
By default, ZDM Proxy sends all reads to the primary cluster, and then returns the result to the client application.
If you enable asynchronous dual reads, ZDM Proxy sends asynchronous read requests to the secondary cluster (typically the target cluster) in addition to the synchronous read requests that are sent to the primary cluster.
This feature is designed to test the target cluster’s ability to handle a production workload before you permanently switch to the target cluster at the end of the migration process.
With or without asynchronous dual reads, the client application only receives results from synchronous reads on the primary cluster. The results of asynchronous reads aren’t returned to the client because asynchronous reads are for testing purposes only.
For more information, see Enable asynchronous dual reads.
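The read routing described above can be modeled in a few lines. This is a sketch under stated assumptions, not ZDM Proxy's API: `FakeCluster`, `ReadRouter`, and the `async_dual_reads` flag are hypothetical names used only to show that the client receives the primary cluster's result whether or not the secondary cluster is also queried.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeCluster:
    """Hypothetical stand-in for a CQL cluster that counts the reads it serves."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = dict(rows)
        self.reads_served = 0

    def read(self, key):
        self.reads_served += 1
        return self.rows.get(key)


class ReadRouter:
    """Hypothetical model of read routing between a primary and a secondary."""

    def __init__(self, primary, secondary, async_dual_reads=False):
        self.primary = primary
        self.secondary = secondary
        self.async_dual_reads = async_dual_reads
        self._pool = ThreadPoolExecutor(max_workers=1)

    def read(self, key):
        if self.async_dual_reads:
            # Fire-and-forget: the secondary result is never returned to the caller.
            self._pool.submit(self.secondary.read, key)
        # The caller only ever sees the synchronous result from the primary.
        return self.primary.read(key)

    def close(self):
        # Wait for any in-flight secondary reads before shutting down.
        self._pool.shutdown(wait=True)
```

With an origin that holds `{"k": "origin-value"}` as the primary and a target that holds `{"k": "target-value"}` as the secondary, `read("k")` returns `"origin-value"` in both modes. Enabling `async_dual_reads` only adds load on the secondary cluster.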
High availability and multiple ZDM Proxy instances
ZDM Proxy is designed to be highly available and to run in a clustered fashion to avoid a single point of failure.
With the exception of local test environments, DataStax recommends that all ZDM Proxy deployments have multiple ZDM Proxy instances. Deployments typically consist of three or more instances.
Throughout the ZDM documentation, the term ZDM Proxy deployment refers to the entire deployment, and ZDM Proxy instance refers to an individual proxy process in the deployment.
You can scale ZDM Proxy instances horizontally and vertically. To avoid downtime when applying configuration changes, you can perform rolling restarts on your ZDM Proxy instances.
For simplicity, you can use ZDM Utility and ZDM Proxy Automation to set up and run Ansible playbooks that deploy and manage ZDM Proxy and its monitoring stack.
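The idea behind a rolling restart can be sketched as a loop that restarts one instance at a time and stops if an instance doesn't come back healthy, so the remaining instances keep serving traffic. This is a conceptual sketch only: `rolling_restart` and its `restart` and `is_healthy` callbacks are hypothetical and aren't part of ZDM Proxy Automation, which handles these operations through its own playbooks.

```python
def rolling_restart(instances, restart, is_healthy):
    """Restart instances one at a time.

    Returns a tuple of (instances restarted successfully, first instance
    that failed its health check or None if all succeeded).
    """
    restarted = []
    for instance in instances:
        restart(instance)
        if not is_healthy(instance):
            # Halt here so the untouched instances keep handling requests.
            return restarted, instance
        restarted.append(instance)
    return restarted, None
```

Because only one instance is down at any moment, a deployment of three or more instances continues to serve client requests throughout the restart.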
ZDM Utility and ZDM Proxy Automation
You can use ZDM Utility and ZDM Proxy Automation to set up and run Ansible playbooks that deploy and manage ZDM Proxy and the associated monitoring stack.
Ansible is an open-source suite of software tools that enables infrastructure as code. Its capabilities include software provisioning, configuration management, and application deployment. The Ansible automation for ZDM is organized into playbooks, each implementing a specific operation. The machine from which the playbooks are run is known as the Ansible Control Host. In ZDM, the Ansible Control Host runs as a Docker container.
You use ZDM Utility to set up Ansible in a Docker container, and then you use ZDM Proxy Automation to run the Ansible playbooks from the Docker container created by ZDM Utility.
The Docker container that ZDM Utility creates acts as the Ansible Control Host. From this container, ZDM Proxy Automation deploys and manages the ZDM Proxy instances and the associated monitoring stack, which includes Prometheus metrics and Grafana visualizations of the metrics data.
To use ZDM Utility and ZDM Proxy Automation, you must prepare the recommended infrastructure, as explained in Deployment and infrastructure considerations.
For more information, see Set up ZDM Proxy Automation with ZDM Utility and Deploy ZDM Proxy and monitoring.
Data migration tools
You use data migration tools to move data between clusters and validate the migrated data.
You can use these tools alone or with ZDM Proxy.
Astra DB Sideloader
Astra DB Sideloader is a service running in Astra DB that imports data from snapshots of your existing Cassandra-based cluster. This tool is exclusively for migrations that move data to Astra DB.
For more information, see Use Astra DB Sideloader with ZDM Proxy.
Cassandra Data Migrator
You can use Cassandra Data Migrator (CDM) for data migration and validation between Apache Cassandra®-based databases. It offers extensive functionality and configuration options to support large and complex migrations as well as post-migration data validation.
You can use CDM by itself, with ZDM Proxy, or for data validation after using another data migration tool.
For more information, see Use Cassandra Data Migrator with ZDM Proxy.
DSBulk Migrator
DSBulk Migrator extends DSBulk Loader with migration-specific commands: migrate-live, generate-script, and generate-ddl.
It is best for smaller migrations or migrations that don’t require extensive data validation, aside from post-migration row counts.
You can use DSBulk Migrator alone or with ZDM Proxy.
For more information, see Use DSBulk Migrator with ZDM Proxy.
Custom data migration processes
If you want to write your own custom data migration processes, you can use a tool like Apache Spark™.