Migrate to DataStax Enterprise

DataStax Enterprise (DSE) is designed to handle large-scale data with high performance. Migrating data to DSE can improve the speed and efficiency of data processing and querying, especially for applications requiring high throughput and low latency.

DataStax provides and supports tools that you can use for data-only migrations or full-scale platform migrations:

  • Data-only migrations: Copy, stream, or bulk load data into DSE databases. For example, write data to a DSE database from a streaming source such as Apache Kafka™ (see the sketch after this list).

  • Platform migrations: Change your applications to use your DSE databases. This is a comprehensive migration that requires migrating data to a new DSE cluster, and then updating your applications to connect to your new databases using DSE-compatible APIs and libraries.

    For platform migrations, you can use Zero Downtime Migration (ZDM) tools or perform an in-place upgrade.
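As a hedged illustration of the data-only Kafka pattern mentioned above, the following sketch consumes records from a Kafka topic and writes them to a DSE table using the DataStax Python driver and the confluent-kafka client. The hostnames, topic, keyspace, and table names are hypothetical, and the target table is assumed to exist; the streaming connectors described later handle this without custom code.

```python
# Hypothetical Kafka-to-DSE pipeline: hostnames, topic, keyspace, and
# table names are illustrative, and the shop.events table is assumed
# to exist with text columns id and payload.
from cassandra.cluster import Cluster
from confluent_kafka import Consumer

session = Cluster(["dse-node-1.example.com"]).connect("shop")
insert = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

consumer = Consumer({
    "bootstrap.servers": "kafka.example.com:9092",
    "group.id": "dse-migration",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # The key/value mapping is application-specific; here both are UTF-8 text.
    session.execute(insert, (msg.key().decode(), msg.value().decode()))
```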

Pre-migration data modeling

Before moving data to DSE, consider how your client application will query the tables. DataStax recommends pre-migration data modeling, particularly in scenarios where data types or query patterns might change, such as the following:

  • Changing the DDL, either before migration or with an ETL tool.

  • Moving from a relational database management system (RDBMS) to DSE.

  • Upgrading or changing the APIs or libraries that your applications use to access your data.

For example, because of the paradigm shift between relational databases and NoSQL, moving data directly from an RDBMS to DSE will fail; you must first remodel the data around your application's query patterns.
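To make the remodeling step concrete, here's a minimal sketch of query-first modeling with the DataStax Python driver. All keyspace, table, and column names are hypothetical, and the relational query in the comment is only an analogy:

```python
# A hypothetical query-first remodel: an RDBMS query like
#   SELECT o.order_id, o.total, c.name
#   FROM orders o JOIN customers c ON o.customer_id = c.id
#   WHERE c.id = ?;
# becomes one denormalized DSE table partitioned by the lookup key,
# so the same lookup is a single-partition read with no join.
from cassandra.cluster import Cluster

session = Cluster(["dse-node-1.example.com"]).connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS shop
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS shop.orders_by_customer (
        customer_id   uuid,
        order_id      timeuuid,
        customer_name text,
        total         decimal,
        PRIMARY KEY ((customer_id), order_id)
    ) WITH CLUSTERING ORDER BY (order_id DESC)
""")
```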

Cluster compatibility

Technically, you can migrate any data into DSE, but some sources are more compatible with DSE than others.

Because DSE is based on Apache Cassandra®, it expects data to be in a format that is compatible with Cassandra table schemas.

Migrations from open-source Apache Cassandra® and other Cassandra-based, NoSQL databases are the most compatible sources because they share the same foundational architecture as DSE. However, you must carefully evaluate the compatibility of your source database with DSE before migrating data. For example, when migrating from open-source Cassandra, make sure your version is compatible with DSE. If it isn’t, you must determine the differences between your version and DSE to ensure that your migration succeeds. Before migrating to DSE, you might need to modify your schema, disable incompatible features, or upgrade your platform to a compatible version.

Migrations from RDBMS and other non-Cassandra sources are more complex because of differences in data models, schemas, and query patterns. You might need to redesign your data models, prepare the data before migrating it, or manipulate the data during the migration process.

If your source data is schemaless or semi-structured, you can use techniques like super shredding to flatten, normalize, and map schemaless or semi-structured JSON/CSV data into a Cassandra-compatible fixed schema. Then, you can load the data into DSE with the DSBulk Loader or other data migration tools. However, super shredding can be complex and cumbersome, depending on the structure (or lack thereof) of the source data.
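The following is a minimal super-shredding sketch under those assumptions: it flattens nested JSON records into a fixed set of columns and writes a CSV that DSBulk can load. The input file and field names are hypothetical, and real-world shredding usually adds type coercion and primary key design on top of this.

```python
# Flatten semi-structured, line-delimited JSON records into a fixed,
# Cassandra-compatible CSV. The input file events.json is hypothetical.
import csv
import json

def flatten(record, prefix=""):
    """Recursively flatten nested dicts into underscore-joined column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat

with open("events.json") as src:
    rows = [flatten(json.loads(line)) for line in src]

# Use a stable column order so every row maps to the same fixed schema.
columns = sorted({col for row in rows for col in row})
with open("events.csv", "w", newline="") as dst:
    writer = csv.DictWriter(dst, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
```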

Data migration tools

DataStax offers several solutions for migrating data from other databases and data sources. Most of these tools require that the source data is in a DSE-compatible format, such as data from another NoSQL database, CSV, or JSON. If your source data is in an incompatible format, you can use an ETL tool to manipulate the data before writing it to your DSE cluster.

You can use these tools to write data to DSE databases only, or you can use them to support a full-scale platform migration.

  • Cassandra Data Migrator (CDM): Migrate and validate tables between origin Apache Cassandra® clusters and target DSE clusters, with available logging and reconciliation support.

    You can use CDM alone or in conjunction with the Zero Downtime Migration (ZDM) tools.

    DataStax recommends CDM for large-scale and sensitive migrations because of its validation and reconciliation features.

  • DataStax Bulk Loader (dsbulk): Extract and load CSV and JSON files containing Cassandra table data. You can use DSBulk to move data between compatible NoSQL databases, including Cassandra and DSE, as long as the source and target schemas are compatible. For a minimal load command, see the sketch after this list.

    Additionally, you can use DSBulk Migrator, which extends DSBulk with migration commands, such as migrate-live and generate-ddl. You can use DSBulk Migrator alone or in conjunction with the ZDM tools.

  • cqlsh COPY: Use these commands to read and write data in CSV format.

    The COPY commands mirror the COPY command that the PostgreSQL RDBMS uses for file import and export. When moving from an RDBMS, you can typically use the source database's unload utilities to write table data to files, and then load those files into DSE.

  • sstableloader: Stream data in SSTable format into a cluster. Like DSBulk, sstableloader is a bulk loading tool, but it works with SSTables rather than CSV or JSON files.
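To keep the examples in one language, here's a sketch of the DSBulk and cqlsh COPY load commands described above, invoked from Python; you would normally run them directly in a shell. The keyspace, table, file, and hostname are hypothetical.

```python
# Hypothetical bulk-load invocations; keyspace, table, file, and host
# are illustrative. Both tools must be installed and on the PATH.
import subprocess

# DSBulk: bulk load a CSV with a header row into a DSE table.
subprocess.run(
    ["dsbulk", "load",
     "-url", "events.csv",
     "-k", "shop", "-t", "events",
     "-header", "true",
     "-h", "dse-node-1.example.com"],
    check=True,
)

# cqlsh COPY: the same idea, better suited to smaller datasets.
subprocess.run(
    ["cqlsh", "dse-node-1.example.com", "-e",
     "COPY shop.events FROM 'events.csv' WITH HEADER = TRUE;"],
    check=True,
)
```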

Streaming connectors

These tools are designed for data streaming use cases, where data is continuously ingested from a source into DSE databases. For example, the DataStax Apache Kafka™ Connector writes records from Kafka topics to DSE tables.

ETL tools

If you need to change the source data before writing it to your DSE cluster, you can use an extract, transform, load (ETL) solution that is compatible with DSE, such as tools from Talend, Informatica, and Streamsets.

ETL tools provide transformation routines for manipulating source data before loading it into a DSE cluster, as well as other helpful features, such as visual interfaces and scheduling engines. This is useful when your source schema doesn't match your target DSE schema, when you need to change data types or formats, or when your source database is wholly incompatible with DSE, such as an RDBMS.
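Dedicated ETL tools provide much more than this, but the core transform step conceptually resembles the following sketch, which assumes a hypothetical RDBMS CSV export whose column names and types don't match the target DSE schema:

```python
# Hypothetical transform step: epoch-millisecond timestamps become
# ISO 8601 strings and Y/N flags become booleans, with columns renamed
# to match the target DSE table. File and column names are illustrative.
import csv
from datetime import datetime, timezone

with open("rdbms_export.csv") as src, \
        open("dse_ready.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=["id", "created_at", "active"])
    writer.writeheader()
    for row in reader:
        writer.writerow({
            "id": row["ID"],
            "created_at": datetime.fromtimestamp(
                int(row["CREATED_MS"]) / 1000, tz=timezone.utc
            ).isoformat(),
            "active": str(row["ACTIVE_FLAG"] == "Y").lower(),
        })
```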

If you are performing a platform migration where you plan to stop using the source database completely, be aware that ETL tools can require some application downtime to run the ETL pipeline and move the data to the new cluster. Additionally, ETL pipelines aren’t compatible with the ZDM tools if the clusters have different schemas because the ZDM tools must send the same CQL read/write statements to both clusters.

Platform migration options

For platform migrations where the source and target schemas are compatible, you can use the ZDM tools or perform an in-place upgrade.

If your schemas aren’t compatible, you must use an ETL tool to manipulate the data before writing it to your new DSE cluster. This type of migration requires more manual oversight and planning because there will be some downtime while you stop writes, run the ETL pipeline, and switch your applications to the new cluster.

Zero Downtime Migration (ZDM)

DataStax strongly recommends using the ZDM tools for data migrations whenever possible.

For supported migration paths, see Cluster compatibility for Zero Downtime Migration.

The ZDM tools provide the safest migration approach: blue-green deployment capabilities remove time pressure and maintain availability and operational safety throughout the process.

Here’s how the ZDM process works:

  1. Set up your new, empty DSE cluster separate from your existing cluster.

    The ZDM tools minimize risk and complexity by isolating the source and target clusters. You can configure your new cluster as needed from the start, without having to delicately orchestrate reconfiguration on a single cluster.

  2. Use the ZDM tools to orchestrate live reads and writes while you use a data migration tool to replicate your existing data to the new cluster.

  3. Validate the data on the new cluster and simulate production workloads before permanently switching your traffic to the new cluster.

Your original cluster remains running during the entire process, allowing you to seamlessly stop the migration at any point up until the last phase when you stop sending traffic to the original cluster.

Dual writes ensure that your new cluster doesn’t miss ongoing writes during the migration process, and you can maintain the ZDM proxy state as long as you need to validate and test the new cluster before switching traffic.
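To illustrate the dual-write concept, here's a sketch of what the ZDM Proxy does on your application's behalf, expressed at the driver level; in a real migration your application code doesn't change. The hostnames, keyspace, and table are hypothetical.

```python
# Conceptual sketch only: the ZDM Proxy performs dual writes for you.
# Hostnames, keyspace, and table are hypothetical.
from cassandra.cluster import Cluster

origin = Cluster(["origin-node.example.com"]).connect("shop")
target = Cluster(["dse-node-1.example.com"]).connect("shop")

insert_cql = "INSERT INTO events (id, payload) VALUES (%s, %s)"

def dual_write(event_id, payload):
    # Every live write lands on both clusters, so the new DSE cluster
    # stays in sync while historical data is migrated in the background.
    origin.execute(insert_cql, (event_id, payload))
    target.execute(insert_cql, (event_id, payload))
```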

To learn more and get started on your zero downtime migration to DSE, see DataStax migration tools.

In-place upgrades

DataStax recommends that you use this option only if you cannot use the ZDM tools.

In-place upgrades replace the database platform on your current cluster with DSE without moving your data.

However, in-place upgrades are riskier, require downtime, and involve systematic manual reconfiguration of the cluster before, during, and after the migration.

There is a higher risk of data loss or corruption due to limited rollback options and the complexity of the upgrade process. Because this option manipulates a single cluster, a rollback requires that you revert to the previous platform version, and then restore a backup of your data. Any data written to the cluster after the backup was taken is lost.

In contrast, the ZDM tools isolate the source and target clusters, allowing you to cleanly discard the target cluster if something goes wrong during the migration process.

In-place upgrades are available only for specific migration paths between open-source Apache Cassandra and DSE. For more information and instructions, see Upgrade Apache Cassandra® to DataStax Enterprise.

Migrate your applications

Platform migrations require code changes.

At minimum, you must update your application’s connection strings to point to your new DSE cluster.

Aside from the database connection, your code might not require any other changes if you already use a compatible driver and CQL statements. Additional changes depend on the differences between your source database and DSE, such as changes to query syntax, data types, APIs or libraries, and enabling DSE-specific features.
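For example, with the DataStax Python driver, the minimum change might look like the following sketch. The hostnames, credentials, and keyspace are hypothetical, and your driver, authentication, and TLS settings may differ.

```python
# Minimal connection update: point the application at the new DSE
# cluster. Hostnames, credentials, and keyspace are hypothetical.
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

# Before: Cluster(["cassandra-node.example.com"])
cluster = Cluster(
    ["dse-node-1.example.com", "dse-node-2.example.com"],
    auth_provider=PlainTextAuthProvider("app_user", "app_password"),
)
session = cluster.connect("shop")

# Existing CQL statements typically run unchanged on DSE.
row = session.execute("SELECT release_version FROM system.local").one()
print(row.release_version)
```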

Migrate to DSE Advanced Workloads

After migrating to DSE, you might need to take additional steps to prepare some of your data for use with DSE Advanced Workloads, like DSE Analytics and DSE Graph.

Get support for your migration

If you need help planning or executing your migration to DSE, contact your DataStax account representative or DataStax Support.

If you have a subscription to IBM Elite Support for Apache Cassandra, contact IBM Elite Support or your account representative to see if your plan includes migration assistance.
