Overview

Cassandra Data Migrator (CDM) is a tool for migrating and validating data between origin and target Apache Cassandra-compatible clusters. It transfers data by running multiple jobs that access the Cassandra cluster concurrently, which makes it well suited to large datasets. Because these concurrent jobs add load to the clusters, CDM requires careful configuration to balance migration speed against performance impact.

The information below explains how to get started with CDM. Review the prerequisites, then choose one of the two installation options: a container or a JAR file.

Cassandra Data Migrator prerequisites

Read the prerequisites below before using the Cassandra Data Migrator.

  • Install or switch to Java 11. The Spark binaries are compiled with this version of Java.

  • Select a single VM to run the migration jobs and install Spark 3.5.3 on it. Most one-time migrations don't need a Spark cluster, but Spark cluster mode is also supported for complex migrations.

  • Optionally, install Maven 3.9.x if you want to build the JAR for local development.
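To confirm the Java prerequisite before going further, a small check along these lines may help. It is a minimal sketch, not part of CDM: the function names java_major_version and check_java_11 are hypothetical, and the parsing assumes the usual first line of `java -version` output (for example `openjdk version "11.0.21" 2023-10-17`).

```shell
# Hypothetical helper: print the major Java version found in one line of
# `java -version` output. Handles both the legacy "1.8.0_381" scheme and
# the current "11.0.21" scheme.
java_major_version() {
  local line="$1" version
  # Extract the quoted version string, e.g. 11.0.21 or 1.8.0_381
  version=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$version" in
    "")  return 1 ;;                                   # no version string found
    1.*) version=${version#1.}; printf '%s\n' "${version%%.*}" ;;
    *)   printf '%s\n' "${version%%[.+-]*}" ;;
  esac
}

# Hypothetical helper: report whether the java on PATH is Java 11.
check_java_11() {
  local major
  # `java -version` writes to stderr, so redirect it into the pipe
  major=$(java_major_version "$(java -version 2>&1 | head -n 1)") || {
    echo "Could not determine the Java version" >&2
    return 2
  }
  if [ "$major" = "11" ]; then
    echo "Java 11 detected"
  else
    echo "Java $major detected; the prerequisites call for Java 11" >&2
    return 1
  fi
}
```

After defining the functions, run `check_java_11` in the shell that will launch the migration jobs.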

Run the following commands to install Apache Spark:

wget https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3-scala2.13.tgz

tar -xvzf spark-3.5.3-bin-hadoop3-scala2.13.tgz
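The same two commands can be wrapped so that Spark also ends up on the PATH. This is only a sketch: the install_spark function name and the destination directory are illustrative, and it assumes the archive unpacks into a directory named after the tarball, which is the usual layout for Spark binary distributions.

```shell
# Version and package name taken from the download URL above
SPARK_VERSION="3.5.3"
SPARK_PKG="spark-${SPARK_VERSION}-bin-hadoop3-scala2.13"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PKG}.tgz"

# Hypothetical helper: download and unpack Spark into the given directory
# (default: the current directory), then expose it through SPARK_HOME and PATH.
install_spark() {
  local dest="${1:-$PWD}"
  mkdir -p "$dest" || return 1
  wget -P "$dest" "$SPARK_URL" || return 1
  tar -xzf "$dest/${SPARK_PKG}.tgz" -C "$dest" || return 1
  # Assumption: the tarball extracts to a directory with the package name
  export SPARK_HOME="$dest/${SPARK_PKG}"
  export PATH="$SPARK_HOME/bin:$PATH"
}
```

For example, `install_spark "$HOME/tools" && spark-submit --version` should print the Spark version banner if the installation succeeded.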

Cassandra Data Migrator installation methods

Both installation methods require attention to version compatibility, especially with the cdm.properties files. Both environments also use spark-submit to run the jobs.
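As a rough illustration of what running a job with spark-submit might look like, the sketch below assembles a local-mode invocation. Treat the details as assumptions to verify against the release you download: the job class com.datastax.cdm.job.Migrate, the file names, and the cdm_migrate helper with its DRY_RUN switch are not taken from this page. The options --properties-file, --master, and --class are standard spark-submit options.

```shell
# Hypothetical helper: run (or, with DRY_RUN=1, just print) a CDM migration job.
#   $1  path to the properties file, e.g. cdm.properties
#   $2  path to the CDM JAR (placeholder name; use the file you downloaded)
cdm_migrate() {
  props="$1"
  jar="$2"
  # Assumption: com.datastax.cdm.job.Migrate is the migration job class
  set -- spark-submit \
    --properties-file "$props" \
    --master 'local[*]' \
    --class com.datastax.cdm.job.Migrate \
    "$jar"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"    # show the command without running it
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 cdm_migrate cdm.properties cassandra-data-migrator.jar` prints the assembled command so it can be reviewed before a real run.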

Install Cassandra Data Migrator as a Container

Get the latest image that includes all dependencies from DockerHub.

All migration tools (cassandra-data-migrator, dsbulk, and cqlsh) are available in the /assets/ folder of the container.
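A possible way to fetch the image and look inside the container is sketched below. The image name datastax/cassandra-data-migrator and the container name cdm are assumptions that should be confirmed on DockerHub, and the two helper functions are hypothetical; only the /assets/ path comes from the text above.

```shell
# Assumed image name and tag; confirm both on DockerHub before use
CDM_IMAGE="datastax/cassandra-data-migrator:latest"
# Hypothetical container name used only in this sketch
CDM_CONTAINER="cdm"

# Pull the image and start a detached container that shares the host network
cdm_container_start() {
  docker pull "$CDM_IMAGE" &&
  docker run --name "$CDM_CONTAINER" --network host -t -d "$CDM_IMAGE"
}

# List the migration tools shipped in the container's /assets/ folder
cdm_container_list_tools() {
  docker exec "$CDM_CONTAINER" ls /assets/
}
```

Call `cdm_container_start` once, then `cdm_container_list_tools` to check that the tools are present. An interactive shell is available with `docker exec -it cdm bash`, assuming the image includes bash.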

Install Cassandra Data Migrator as a JAR file

Download the latest JAR file from the Cassandra Data Migrator GitHub repo.

Version 4.x of Cassandra Data Migrator is not backward-compatible with *.properties files created in previous versions, and package names have changed. If you’re starting a new migration, use the latest released version if possible.


© 2024 DataStax