Components
The main component of the DataStax Zero Downtime Migration product suite is ZDM Proxy, which by design is a simple and lightweight proxy that handles all the real-time requests generated by your client applications.
ZDM Proxy is open-source software (OSS), available in its public GitHub repo. You can view the source files and contribute code for potential inclusion via pull requests (PRs) initiated on a fork of the repo.
The ZDM Proxy itself has no capability to migrate data and no awareness that a migration may be ongoing; it is not coupled to the migration process in any way.
- DataStax Zero Downtime Migration also provides the ZDM Utility and ZDM Proxy Automation to set up and run the Ansible playbooks that deploy and manage the ZDM Proxy and its monitoring stack.
- Multiple data migration tools, such as Cassandra Data Migrator and DSBulk Migrator, are available.
Role of ZDM Proxy
DataStax created ZDM Proxy to sit between the client application and both the origin and target databases. The databases can be any CQL-compatible data store, such as Apache Cassandra®, DataStax Enterprise (DSE), and Astra DB. The proxy always sends every write operation (INSERT, UPDATE, DELETE) synchronously to both clusters at the desired consistency level:
- If the write succeeds in both clusters, ZDM Proxy returns a successful acknowledgement to the client application.
- If the write fails on either cluster, ZDM Proxy passes the failure back to the client application, which can retry the write as appropriate based on its own retry policy.
This design ensures that new data is always written to both clusters, and that any failure on either cluster is always made visible to the client application. ZDM Proxy also sends all reads to the primary cluster, and then returns the result to the client application. The primary cluster is initially the origin cluster, and you change it to the target cluster at the end of the migration process.
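To make these routing rules concrete, here is a minimal Python sketch of the behavior described above. It is purely illustrative: the actual proxy is implemented in Go, and the session objects and `is_write` helper below are hypothetical stand-ins, not part of the real codebase.

```python
# Illustrative sketch only: the real ZDM Proxy is written in Go. The session
# objects stand in for connections to the origin and target clusters.
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

def is_write(statement: str) -> bool:
    # Naive classification, sufficient for this sketch.
    return statement.strip().split()[0].upper() in {"INSERT", "UPDATE", "DELETE"}

def handle_request(statement, origin_session, target_session, primary_session):
    if is_write(statement):
        # Writes fan out synchronously to BOTH clusters.
        futures = [
            executor.submit(origin_session.execute, statement),
            executor.submit(target_session.execute, statement),
        ]
        for future in futures:
            future.result()  # a failure on either cluster propagates to the client
        return "ack"         # both writes succeeded
    # Reads go only to the primary cluster: origin at first, switched
    # to target at the end of the migration.
    return primary_session.execute(statement)
```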
ZDM Proxy is designed to be highly available. It can be scaled horizontally, so typical deployments comprise a minimum of three servers. ZDM Proxy can be restarted in a rolling fashion, for example, to change configuration for different phases of the migration.
ZDM Proxy has been designed to run in a clustered fashion so that it is never a single point of failure. Unless the deployment is for a demo or local testing environment, it should always comprise multiple ZDM Proxy instances. The term ZDM Proxy refers to the whole deployment, and ZDM Proxy instance refers to an individual proxy process in the deployment.
Key features of ZDM Proxy
- Allows you to lift and shift existing application code from your origin cluster to your target cluster by changing only the connection string (see the example after this list).
- Reduces the risks of upgrades and migrations by decoupling the origin cluster from the target cluster, and lets you choose an explicit cut-over point once you're ready to commit to using the target cluster permanently.
- Bifurcates writes synchronously to both clusters during the migration process.
- For read operations, returns the response from the primary cluster, which is its designated source of truth. During a migration, the primary cluster is typically the origin cluster. Near the end of the migration, you shift the primary cluster to be the target cluster.
- Can be configured to also read asynchronously from the target cluster. This capability is called Asynchronous Dual Reads (also known as Read Mirroring), and it allows you to observe the read latencies and throughput that the target cluster can achieve under the actual production load.
  - Results from the asynchronous reads executed on the target cluster are not sent back to the client application.
  - Consequently, a failure on an asynchronous read from the target cluster does not cause an error in the client application.
  - Asynchronous Dual Reads can be enabled and disabled dynamically with a rolling restart of the ZDM Proxy instances.

When using Asynchronous Dual Reads, any additional read load on the target cluster may impact its ability to keep up with writes. This behavior is expected and desired. The idea is to mimic the full read and write load on the target cluster so there are no surprises during the last migration phase, that is, after cutting over completely to the target cluster.
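As a sketch of the connection-string change mentioned above, here is how a Python application using the DataStax Python driver might be repointed at ZDM Proxy. The proxy addresses, keyspace, and table are placeholders, not values from the real product.

```python
from cassandra.cluster import Cluster

# Before the migration, the application connects directly to the origin cluster:
# cluster = Cluster(contact_points=["origin-node1", "origin-node2"], port=9042)

# During the migration, only the contact points change, now pointing at the
# ZDM Proxy instances (placeholder addresses). The rest of the code is untouched.
cluster = Cluster(
    contact_points=["zdm-proxy-1", "zdm-proxy-2", "zdm-proxy-3"],
    port=9042,
)
session = cluster.connect("my_keyspace")  # placeholder keyspace

# Reads and writes are issued exactly as before; ZDM Proxy handles the dual
# writes and primary-cluster reads behind the scenes.
session.execute("INSERT INTO users (id, name) VALUES (uuid(), 'alice')")
```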
ZDM Utility and ZDM Proxy Automation
Ansible is an open-source suite of software tools that enables infrastructure as code, with capabilities that include software provisioning, configuration management, and application deployment.
The Ansible automation for ZDM is organized into playbooks, each implementing a specific operation. The machine from which the playbooks are run is known as the Ansible Control Host. In ZDM, the Ansible Control Host runs as a Docker container.
You use the ZDM Utility to set up Ansible in a Docker container, and ZDM Proxy Automation to run the Ansible playbooks from that container. In other words, the ZDM Utility creates the Docker container that acts as the Ansible Control Host, from which the ZDM Proxy Automation lets you deploy and manage the ZDM Proxy instances and the associated monitoring stack: Prometheus for metrics collection and Grafana for visualizing the metrics data.
ZDM Utility and ZDM Proxy Automation expect that you have already provisioned the recommended infrastructure, as outlined in Deployment and infrastructure considerations.
The source code for both of these tools is available in a public GitHub repo.
Data migration tools
As part of the overall migration process, you can use Cassandra Data Migrator and/or DSBulk Migrator to migrate your data. Other technologies such as Apache Spark™ can be used to write your own custom data migration process.
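For example, a custom Spark job can stream rows from the origin cluster to the target cluster. The following minimal PySpark sketch assumes the Spark Cassandra Connector is available on the classpath; the host names, keyspace, and table are placeholders.

```python
from pyspark.sql import SparkSession

# Minimal sketch of a custom migration job using the Spark Cassandra
# Connector (assumed to be on the classpath). Hosts, keyspace, and table
# names are placeholders.
spark = SparkSession.builder.appName("custom-zdm-migration").getOrCreate()

# Read the table from the origin cluster...
df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .option("spark.cassandra.connection.host", "origin-node1")
    .options(keyspace="my_keyspace", table="users")
    .load()
)

# ...and append the same rows to the matching table on the target cluster.
(
    df.write.format("org.apache.spark.sql.cassandra")
    .option("spark.cassandra.connection.host", "target-node1")
    .options(keyspace="my_keyspace", table="users")
    .mode("append")
    .save()
)
```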
Cassandra Data Migrator
To use Cassandra Data Migrator, the schema on your origin and target clusters must match.
Use Cassandra Data Migrator to:
- Migrate your data from any CQL-supported origin cluster to any CQL-supported target cluster. Examples of databases that support CQL are Apache Cassandra®, DataStax Enterprise (DSE), and Astra DB.
- Validate migration accuracy and performance using examples that provide a smaller, randomized data set.
- Preserve internal `writetime` timestamps and Time To Live (TTL) values.
- Take advantage of advanced data types (Sets, Lists, Maps, UDTs).
- Filter records from the origin cluster's data, using Cassandra's internal `writetime` timestamp.
- Use SSL support, including custom cipher algorithms.
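Because Cassandra Data Migrator requires matching schemas on the origin and target clusters, one rough way to sanity-check that prerequisite is with the Python driver's schema metadata. This sketch uses placeholder contact points and keyspace name, and is not part of Cassandra Data Migrator itself.

```python
from cassandra.cluster import Cluster

# Rough check that a keyspace's table definitions match between the origin
# and target clusters, using the Python driver's schema metadata.
def table_definitions(contact_point, keyspace):
    cluster = Cluster(contact_points=[contact_point])
    cluster.connect()  # populates cluster.metadata
    tables = cluster.metadata.keyspaces[keyspace].tables
    defs = {name: table.export_as_string() for name, table in tables.items()}
    cluster.shutdown()
    return defs

origin = table_definitions("origin-node1", "my_keyspace")
target = table_definitions("target-node1", "my_keyspace")

# Note: non-essential table options (compaction, caching, and so on) may
# legitimately differ between clusters; treat mismatches as a starting point.
for name in sorted(set(origin) | set(target)):
    if origin.get(name) != target.get(name):
        print(f"Schema mismatch for table: {name}")
```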
Cassandra Data Migrator is designed to:
- Connect to and compare your target database/cluster with the origin database/cluster.
- Report differences in a detailed log file.
- Optionally reconcile any missing records and fix any data inconsistencies in the target cluster by enabling `autocorrect` in a config file.
DSBulk Migrator
You can also take advantage of DSBulk Migrator to migrate smaller sets of data.
For more about both tools, see Phase 2: Migrate and validate data.