Components
The main component of the DataStax Zero Downtime Migration product suite is ZDM Proxy, which by design is a simple and lightweight proxy that handles all the real-time requests generated by your client applications.
ZDM Proxy is open-source software (OSS), available in its public GitHub repo. You can view the source files and contribute code for potential inclusion via pull requests (PRs) initiated on a fork of the repo.
The ZDM Proxy itself has no capability to migrate data and no awareness that a migration may be ongoing; it is not coupled to the migration process in any way.
- DataStax Zero Downtime Migration also provides the ZDM Utility and ZDM Proxy Automation to set up and run the Ansible playbooks that deploy and manage the ZDM Proxy and its monitoring stack.
- Multiple data migration tools, such as Cassandra Data Migrator and DSBulk Migrator, are available.
Role of ZDM Proxy
DataStax created ZDM Proxy to sit between the client application and both the origin and target databases. The databases can be any CQL-compatible data store, such as Apache Cassandra®, DataStax Enterprise (DSE), and Astra DB. The proxy always sends every write operation (INSERT, UPDATE, DELETE) synchronously to both clusters at the desired consistency level:
- If the write succeeds in both clusters, ZDM Proxy returns a successful acknowledgement to the client application.
- If the write fails on either cluster, ZDM Proxy passes the failure back to the client application, which can retry the write as appropriate based on its own retry policy.
This design ensures that new data is always written to both clusters, and that any failure on either cluster is always made visible to the client application. ZDM Proxy also sends all reads to the primary cluster, and then returns the result to the client application. The primary cluster is initially the origin cluster, and you change it to the target cluster at the end of the migration process.
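To make these routing rules concrete, here is a minimal Python sketch of the behavior described above. It is purely illustrative: the actual proxy is implemented in Go, and the session objects and `is_write` helper below are hypothetical stand-ins, not part of the real codebase.

```python
# Illustrative sketch only: the real ZDM Proxy is written in Go. The session
# objects stand in for connections to the origin and target clusters.
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

def is_write(statement: str) -> bool:
    # Naive classification, sufficient for this sketch.
    return statement.strip().split()[0].upper() in {"INSERT", "UPDATE", "DELETE"}

def handle_request(statement, origin_session, target_session, primary_session):
    if is_write(statement):
        # Writes fan out synchronously to BOTH clusters.
        futures = [
            executor.submit(origin_session.execute, statement),
            executor.submit(target_session.execute, statement),
        ]
        for future in futures:
            future.result()  # a failure on either cluster propagates to the client
        return "ack"         # both writes succeeded
    # Reads go only to the primary cluster: origin at first, switched
    # to target at the end of the migration.
    return primary_session.execute(statement)
```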
ZDM Proxy is designed to be highly available. It can be scaled horizontally, so typical deployments comprise a minimum of three servers. ZDM Proxy can be restarted in a rolling fashion, for example, to change configuration for different phases of the migration.
ZDM Proxy has been designed to run in a clustered fashion so that it is never a single point of failure. Unless the deployment is for a demo or local testing environment, it should always comprise multiple ZDM Proxy instances. The term ZDM Proxy refers to the whole deployment, and ZDM Proxy instance refers to an individual proxy process in the deployment.
Key features of ZDM Proxy
- Allows you to lift and shift existing application code from your origin cluster to your target cluster by changing only the connection string (see the example after this list).
- Reduces the risks of upgrades and migrations by decoupling the origin cluster from the target cluster, and lets you choose an explicit cut-over point once you're ready to commit to using the target cluster permanently.
- Bifurcates writes synchronously to both clusters during the migration process.
- For read operations, returns the response from the primary cluster, which is its designated source of truth. During a migration, the primary cluster is typically the origin cluster. Near the end of the migration, you shift the primary cluster to be the target cluster.
- Can be configured to also read asynchronously from the target cluster. This capability is called Asynchronous Dual Reads (also known as Read Mirroring), and it allows you to observe the read latencies and throughput that the target cluster can achieve under the actual production load.
  - Results from the asynchronous reads executed on the target cluster are not sent back to the client application.
  - Consequently, a failure on an asynchronous read from the target cluster does not cause an error in the client application.
  - Asynchronous Dual Reads can be enabled and disabled dynamically with a rolling restart of the ZDM Proxy instances.

When using Asynchronous Dual Reads, any additional read load on the target cluster may impact its ability to keep up with writes. This behavior is expected and desired. The idea is to mimic the full read and write load on the target cluster so there are no surprises during the last migration phase, that is, after cutting over completely to the target cluster.
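As a sketch of the connection-string change mentioned above, here is how a Python application using the DataStax Python driver might be repointed at ZDM Proxy. The proxy addresses, keyspace, and table are placeholders, not values from the real product.

```python
from cassandra.cluster import Cluster

# Before the migration, the application connects directly to the origin cluster:
# cluster = Cluster(contact_points=["origin-node1", "origin-node2"], port=9042)

# During the migration, only the contact points change, now pointing at the
# ZDM Proxy instances (placeholder addresses). The rest of the code is untouched.
cluster = Cluster(
    contact_points=["zdm-proxy-1", "zdm-proxy-2", "zdm-proxy-3"],
    port=9042,
)
session = cluster.connect("my_keyspace")  # placeholder keyspace

# Reads and writes are issued exactly as before; ZDM Proxy handles the dual
# writes and primary-cluster reads behind the scenes.
session.execute("INSERT INTO users (id, name) VALUES (uuid(), 'alice')")
```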
ZDM Utility and ZDM Proxy Automation
Ansible is an open-source suite of software tools that enables infrastructure as code, with capabilities that include software provisioning, configuration management, and application deployment.
The Ansible automation for ZDM is organized into playbooks, each implementing a specific operation. The machine from which the playbooks are run is known as the Ansible Control Host. In ZDM, the Ansible Control Host runs as a Docker container.
You use the ZDM Utility to set up Ansible in a Docker container, and ZDM Proxy Automation to run the Ansible playbooks from that container. In other words, the ZDM Utility creates the Docker container that acts as the Ansible Control Host, from which the ZDM Proxy Automation lets you deploy and manage the ZDM Proxy instances and the associated monitoring stack: Prometheus for metrics collection and Grafana for visualizing the metrics data.
ZDM Utility and ZDM Proxy Automation expect that you have already provisioned the recommended infrastructure, as outlined in Deployment and infrastructure considerations.
The source code for both of these tools is available in a public GitHub repo.
Data migration tools
As part of the overall migration process, you can use Cassandra Data Migrator and/or DSBulk Migrator to migrate your data. Other technologies such as Apache Spark™ can be used to write your own custom data migration process.
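For example, a custom Spark job can stream rows from the origin cluster to the target cluster. The following minimal PySpark sketch assumes the Spark Cassandra Connector is available on the classpath; the host names, keyspace, and table are placeholders.

```python
from pyspark.sql import SparkSession

# Minimal sketch of a custom migration job using the Spark Cassandra
# Connector (assumed to be on the classpath). Hosts, keyspace, and table
# names are placeholders.
spark = SparkSession.builder.appName("custom-zdm-migration").getOrCreate()

# Read the table from the origin cluster...
df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .option("spark.cassandra.connection.host", "origin-node1")
    .options(keyspace="my_keyspace", table="users")
    .load()
)

# ...and append the same rows to the matching table on the target cluster.
(
    df.write.format("org.apache.spark.sql.cassandra")
    .option("spark.cassandra.connection.host", "target-node1")
    .options(keyspace="my_keyspace", table="users")
    .mode("append")
    .save()
)
```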
Cassandra Data Migrator
To use Cassandra Data Migrator, the schema on your origin and target clusters must match.
Use Cassandra Data Migrator to:
- Migrate your data from any CQL-supported origin cluster to any CQL-supported target cluster. Examples of databases that support CQL are Apache Cassandra®, DataStax Enterprise (DSE), and Astra DB.
- Validate migration accuracy and performance using examples that provide a smaller, randomized data set.
- Preserve internal `writetime` timestamps and Time To Live (TTL) values.
- Take advantage of advanced data types (Sets, Lists, Maps, UDTs).
- Filter records from the origin cluster's data, using Cassandra's internal `writetime` timestamp.
- Use SSL support, including custom cipher algorithms.
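Because Cassandra Data Migrator requires matching schemas on the origin and target clusters, one rough way to sanity-check that prerequisite is with the Python driver's schema metadata. This sketch uses placeholder contact points and keyspace name, and is not part of Cassandra Data Migrator itself.

```python
from cassandra.cluster import Cluster

# Rough check that a keyspace's table definitions match between the origin
# and target clusters, using the Python driver's schema metadata.
def table_definitions(contact_point, keyspace):
    cluster = Cluster(contact_points=[contact_point])
    cluster.connect()  # populates cluster.metadata
    tables = cluster.metadata.keyspaces[keyspace].tables
    defs = {name: table.export_as_string() for name, table in tables.items()}
    cluster.shutdown()
    return defs

origin = table_definitions("origin-node1", "my_keyspace")
target = table_definitions("target-node1", "my_keyspace")

# Note: non-essential table options (compaction, caching, and so on) may
# legitimately differ between clusters; treat mismatches as a starting point.
for name in sorted(set(origin) | set(target)):
    if origin.get(name) != target.get(name):
        print(f"Schema mismatch for table: {name}")
```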
Cassandra Data Migrator is designed to:
- Connect to and compare your target database/cluster with the origin database/cluster.
- Report differences in a detailed log file.
- Optionally reconcile any missing records and fix any data inconsistencies in the target cluster by enabling `autocorrect` in a config file.
DSBulk Migrator
You can also take advantage of DSBulk Migrator to migrate smaller sets of data.
For more about both tools, see Phase 2: Migrate and validate data.