Migrate to DataStax Enterprise
DataStax Enterprise (DSE) is designed to handle large-scale data with high performance.
Migrating data to DSE can improve the speed and efficiency of data processing and querying, especially for applications requiring high throughput and low latency.
DataStax Enterprise (DSE) provides and supports tools that you can use for two types of migrations:
- Data transfer: Copy, stream, or bulk load data into your DSE databases. For example, write data to a DSE database from a streaming source such as Apache Kafka™.
- Platform migration: Change your applications to use your DSE databases. Typically, this involves transferring data in addition to updating your client applications to use DSE APIs and libraries.
Before moving data to DSE, consider how your client applications will query the tables. DataStax recommends pre-migration data modeling, particularly in scenarios where data types or query patterns might change. For example, the paradigm shift between relational databases and NoSQL means that moving data directly from an RDBMS to DSE, without remodeling, will fail.
Zero Downtime Migration (ZDM)
Use the Zero Downtime Migration (ZDM) tools when performing a platform migration where you need to maintain live traffic for your applications during the migration process.
The ZDM tools orchestrate traffic between two clusters while you use a data migration tool to copy data from one cluster to the other. This blue-green approach removes time pressure while preserving availability and operational safety: dual writes ensure that your new cluster doesn't miss ongoing writes during the migration, and you can keep the ZDM proxy in place for as long as you need to validate and test the new cluster before switching traffic.
For more information and to get started with your zero downtime migration, see Phases of the Zero Downtime Migration process.
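To show how the dual-write setup is typically wired, here is a minimal sketch of launching a single ZDM proxy instance with Docker. This is an illustration only: the `datastax/zdm-proxy` image and the `ZDM_*` environment variable names are based on the OSS zdm-proxy project, and the addresses are placeholders, so verify everything against the ZDM documentation.

```bash
# Minimal sketch: one ZDM proxy instance that routes reads to the origin
# cluster and mirrors writes to both clusters. Addresses are placeholders;
# verify the image and variable names against the ZDM documentation.
docker run -d --name zdm-proxy \
  -e ZDM_ORIGIN_CONTACT_POINTS=10.0.0.10 \
  -e ZDM_ORIGIN_PORT=9042 \
  -e ZDM_TARGET_CONTACT_POINTS=10.0.0.20 \
  -e ZDM_TARGET_PORT=9042 \
  -p 9042:9042 \
  datastax/zdm-proxy
```

Client applications then point their contact points at the proxy instead of the origin cluster. Production deployments typically run multiple proxy instances and add authentication settings for each cluster.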
Data migration tools
You can migrate data into DataStax Enterprise (DSE) from other databases and data sources with several tools. You can use these tools on their own to write data to your DSE databases, or you can use them to support a full-scale platform migration. Example invocations for several of these tools follow the list.
- The Apache Cassandra Data Migrator (CDM) includes extensive functionality and configuration options for large and complex migrations, including concurrent migration jobs and post-migration data validation.
- Use the DataStax Bulk Loader (`dsbulk`) to load and unload CSV or JSON data in and out of a DSE database. Additionally, you can use DSBulk Migrator, which extends `dsbulk` with migration commands, such as `migrate-live` and `generate-ddl`.
- With `cqlsh`, use the `COPY` command to import CSV data into DSE and write CSV data from DSE to a file system. The `COPY` commands mirror what the PostgreSQL RDBMS uses for file import and export. When moving from an RDBMS, the RDBMS typically has unload utilities that write table data to a file system.
- The `sstableloader` tool bulk loads external data into a cluster.
- DSE Analytics can use Apache Spark™ to connect to a wide variety of data sources and save the data to DSE by using either the older RDD method or the newer DataFrame method.
- The DataStax Apache Pulsar™ connector is open-source software (OSS) installed in the Pulsar IO framework. The connector synchronizes records from a Pulsar topic with table rows in DataStax Enterprise (DSE) and Apache Cassandra® databases.
- The DataStax Apache Kafka™ connector synchronizes records from a Kafka topic with rows in one or more DSE database tables.
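To show how a CDM job is typically launched, here is a minimal `spark-submit` sketch. The job class, jar name, and the `spark.cdm.schema.origin.keyspaceTable` property are based on the OSS Cassandra Data Migrator project and may vary by version; the keyspace and table names are placeholders.

```bash
# Minimal CDM sketch: migrate one table, with origin and target cluster
# connection details kept in cdm.properties. Class, property, and jar
# names are assumptions; verify against your CDM release.
spark-submit \
  --properties-file cdm.properties \
  --conf spark.cdm.schema.origin.keyspaceTable="ks1.customers" \
  --master "local[*]" \
  --class com.datastax.cdm.job.Migrate \
  cassandra-data-migrator-4.x.x.jar
```

CDM also provides a post-migration validation job, launched the same way with a different job class, to compare origin and target data.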
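For example, a basic `dsbulk` round trip might look like the following; the host, keyspace, table, and file paths are placeholders.

```bash
# Load a CSV file (with a header row) into the table ks1.customers.
dsbulk load -url /tmp/customers.csv -k ks1 -t customers -h 10.0.0.20 -header true

# Unload the same table back to CSV files for spot-checking.
dsbulk unload -k ks1 -t customers -url /tmp/customers_export -h 10.0.0.20
```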
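As a sketch of the `COPY` workflow, the following statements run inside `cqlsh` against a hypothetical `ks1.customers` table:

```sql
-- Import CSV data (with a header row) into ks1.customers.
COPY ks1.customers (id, name, email) FROM '/tmp/customers.csv' WITH HEADER = TRUE;

-- Export the table back out to a CSV file on the local file system.
COPY ks1.customers TO '/tmp/customers_export.csv' WITH HEADER = TRUE;
```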
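For example, after staging SSTables in a directory laid out as `keyspace/table`, you might stream them into the cluster as follows; the contact point and path are placeholders.

```bash
# Stream the SSTables under /data/staging/ks1/customers into the cluster,
# using 10.0.0.20 as the initial contact point.
sstableloader -d 10.0.0.20 /data/staging/ks1/customers
```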
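As a sketch of the DataFrame method, the following Scala snippet (run in the `dse spark` shell, where the `spark` session is predefined) reads a CSV file and writes it to a DSE table through the Spark Cassandra Connector; the path, keyspace, and table names are placeholders.

```scala
// Read rows from an external CSV source into a DataFrame.
val df = spark.read
  .option("header", "true")
  .csv("/data/customers.csv")

// Append the rows to the DSE table ks1.customers via the
// Spark Cassandra Connector's DataFrame data source.
df.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks1", "table" -> "customers"))
  .mode("append")
  .save()
```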
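To show the shape of the Kafka connector configuration, here is a minimal sink properties sketch. The connector class name varies by connector version, and the topic, keyspace, table, contact point, and mapping values are placeholders, so verify the property names against your connector's documentation. The Pulsar connector uses an analogous topic-to-table mapping.

```properties
# Minimal DataStax Kafka connector sink sketch. All values are
# placeholders; verify property names against your connector version.
name=dse-sink
connector.class=com.datastax.oss.kafka.sink.CassandraSinkConnector
tasks.max=1
topics=customers_topic
contactPoints=10.0.0.20
loadBalancing.localDc=dc1
topic.customers_topic.ks1.customers.mapping=id=value.id, name=value.name, email=value.email
```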
ETL tools
If you need to do more than extract and load data, you can use any extract, transform, load (ETL) solution that supports DSE, such as tools from Talend, Informatica, and StreamSets.
ETL tools provide transformation routines for manipulating source data before loading the data into a DSE database, as well as other helpful features, such as visual interfaces and scheduling engines.
Many ETL vendors who support DSE supply community editions of their products that are appropriate for many use cases. Enterprise editions are also available.
In platform migration situations where you want to stop using the source database completely, ETL tools can require some application downtime to run the ETL pipeline and move the data to the new cluster. Because the ZDM tools send the same CQL read/write statements to both clusters, ETL pipelines aren’t compatible with the ZDM tools if the clusters have different schemas.