Overview
Cassandra Data Migrator (CDM) is a tool for migrating and validating data between origin and target Apache Cassandra-compatible clusters. It transfers data by creating multiple jobs that access the Cassandra cluster concurrently. This makes it well suited to large datasets, but it requires careful configuration to balance migration speed against the performance impact on the clusters.
The information below explains how to get started with CDM. Review the prerequisites, then choose one of the two installation options: as a container or as a JAR file.
Cassandra Data Migrator prerequisites
Read the prerequisites below before using the Cassandra Data Migrator.
- Install or switch to Java 11. The Spark binaries are compiled with this version of Java.
- Select a single VM to run this job and install Spark 3.5.3 there. No cluster is necessary for most one-time migrations. However, Spark cluster mode is also supported for complex migrations.
- Optionally, install Maven 3.9.x if you want to build the JAR for local development.
Run the following commands to install Apache Spark:
wget https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3-scala2.13.tgz
tar -xvzf spark-3.5.3-bin-hadoop3-scala2.13.tgz
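Because the prerequisites call for Java 11 specifically, it can help to confirm which JVM is active before running Spark. The following is a minimal sketch, not part of CDM itself: the java_major_version helper is a hypothetical name introduced here, and the parsing assumes the usual quoted version string that java -version prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CDM): extract the major version from a
# Java version string. Handles both the legacy "1.8.0_392" scheme and the
# modern "11.0.21" scheme.
java_major_version() {
  local version="$1"
  local major="${version%%.*}"
  if [ "$major" = "1" ]; then
    # Legacy scheme: the major version is the second field (1.8 -> 8).
    local rest="${version#*.}"
    major="${rest%%.*}"
  fi
  printf '%s\n' "$major"
}

# Inspect the local JVM only if a java binary is on the PATH.
if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; the version is the first quoted string.
  detected="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  if [ "$(java_major_version "$detected")" = "11" ]; then
    echo "Java 11 detected: $detected"
  else
    echo "Java 11 expected, found: ${detected:-unknown}"
  fi
else
  echo "No java binary found on PATH"
fi
```

If the check reports a different version, switch the active JDK before continuing.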
Cassandra Data Migrator installation methods
Both installation methods require attention to version compatibility, especially with the cdm.properties files.
Both environments also use spark-submit to run the jobs.
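To make the spark-submit step more concrete, here is a hedged sketch of how a migration job might be launched. The run_cdm_migrate wrapper and its DRY_RUN switch are hypothetical names introduced for illustration, and the file names and keyspace are placeholders. The job class com.datastax.cdm.job.Migrate and the spark.cdm.schema.origin.keyspaceTable property are assumptions based on recent CDM releases, so check them against the documentation for the version you install.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of CDM): assemble a spark-submit command
# for a CDM migration job. By default it only prints the command (dry run).
run_cdm_migrate() {
  local properties_file="$1"   # path to your cdm.properties
  local keyspace_table="$2"    # placeholder, e.g. my_keyspace.my_table
  local jar_path="$3"          # path to the downloaded CDM JAR

  # Rebuild the positional parameters as the full argument vector.
  set -- spark-submit \
    --properties-file "$properties_file" \
    --conf "spark.cdm.schema.origin.keyspaceTable=$keyspace_table" \
    --master "local[*]" \
    --class com.datastax.cdm.job.Migrate \
    "$jar_path"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Preview only: print the command instead of executing it.
    echo "$*"
  else
    "$@"
  fi
}

# Dry-run preview with placeholder file names.
run_cdm_migrate cdm.properties my_keyspace.my_table cassandra-data-migrator.jar
```

Setting DRY_RUN=0 executes the command instead of printing it. When typing the printed command by hand, keep local[*] quoted so the shell does not treat it as a glob pattern.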
Install Cassandra Data Migrator as a Container
Get the latest image that includes all dependencies from DockerHub.
All migration tools (cassandra-data-migrator, dsbulk, and cqlsh) are available in the /assets/ folder of the container.
Install Cassandra Data Migrator as a JAR file
Download the latest JAR file from the Cassandra Data Migrator GitHub repo.
Version 4.x of Cassandra Data Migrator is not backward-compatible with |