Migrate data to DataStax Enterprise
DataStax Enterprise (DSE) offers several solutions for migrating data from other databases:
- Use DataStax Bulk Loader (dsbulk) to load and unload CSV or JSON data into and out of the DSE database.
- The CQL COPY command mirrors the COPY command that the PostgreSQL RDBMS uses for file import and export. In the CQL shell (cqlsh), COPY FROM reads CSV data into DSE and COPY TO writes CSV data from DSE to a file system. Typically, an RDBMS has unload utilities for writing table data to a file system.
- The sstableloader tool bulk loads external data, in SSTable format, into a cluster.
- DSE Analytics can use Apache Spark™ to connect to a wide variety of data sources and save the data to DSE using either the older RDD API or the newer DataFrame API.
- The DataStax Apache Pulsar™ Connector is open-source software (OSS) installed in the Pulsar IO framework. It synchronizes records from a Pulsar topic with table rows in DSE and Cassandra databases.
- The DataStax Apache Kafka™ Connector synchronizes records from a Kafka topic with rows in one or more DSE database tables.
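As a rough illustration of the plain extract-load path, the Python sketch below unloads a table from a relational database into a CSV file with a header row, then builds the matching `dsbulk load` command and cqlsh `COPY FROM` statement for that file. It assumes Python 3.9+ and uses the standard-library `sqlite3` module only as a stand-in for the source RDBMS; the keyspace, table, and column names are hypothetical, and the target table is assumed to already exist in DSE with matching columns.

```python
import csv
import sqlite3
from pathlib import Path


def unload_table_to_csv(conn: sqlite3.Connection, table: str, csv_path: Path) -> int:
    """Write every row of `table` to `csv_path`, header row first; return the row count."""
    # Table names cannot be bound as query parameters, so pass only trusted names.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [desc[0] for desc in cursor.description]
    count = 0
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        for row in cursor:
            writer.writerow(row)
            count += 1
    return count


def dsbulk_load_command(csv_path: Path, keyspace: str, table: str) -> list[str]:
    """Build (but do not run) a dsbulk command that loads a CSV file with a header
    row into keyspace.table. Connection options such as contact points are omitted."""
    return [
        "dsbulk", "load",
        "-url", str(csv_path),
        "-k", keyspace,
        "-t", table,
        "-header", "true",
    ]


def cql_copy_from(keyspace: str, table: str, columns: list[str], csv_path: Path) -> str:
    """Build a cqlsh COPY FROM statement that reads the same CSV file."""
    # Assumes the path contains no single quotes.
    return (
        f"COPY {keyspace}.{table} ({', '.join(columns)}) "
        f"FROM '{csv_path}' WITH HEADER = true;"
    )
```

On a host where dsbulk is installed, the command list can be passed directly to `subprocess.run(cmd, check=True)`; keeping it as a list rather than one string avoids shell-quoting problems with file paths. The COPY statement is meant to be pasted into a cqlsh session.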
ETL tools
If a data movement task needs more than a simple extract and load, you can use any of a number of extract-transform-load (ETL) solutions that support DSE. These tools provide transformation routines that manipulate source data before loading it into a DSE target, and they offer features such as visual point-and-click interfaces, scheduling engines, and more.
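To make the "transform" step concrete, here is a minimal hand-rolled Python sketch (Python 3.9+) of the kind of work ETL tools automate: it reads records from a source CSV file, normalizes each one, and writes a new CSV file that a loader such as dsbulk or cqlsh COPY FROM could then read. The column names and the MM/DD/YYYY source date format are hypothetical, and commercial ETL tools express this logic through their own interfaces rather than through code like this.

```python
import csv
from datetime import datetime
from pathlib import Path

# Hypothetical column names for the transformed, load-ready file.
FIELDS = ["id", "email", "signup_date"]


def transform_row(row: dict[str, str]) -> dict[str, str]:
    """Trim every value, lowercase the email, and convert a MM/DD/YYYY date
    (an assumed source format) to ISO 8601 (YYYY-MM-DD)."""
    signup = datetime.strptime(row["signup_date"].strip(), "%m/%d/%Y").date()
    return {
        "id": row["id"].strip(),
        "email": row["email"].strip().lower(),
        "signup_date": signup.isoformat(),
    }


def transform_file(src: Path, dst: Path) -> int:
    """Extract records from `src`, transform each one, and write them to `dst`
    with a header row. Returns the number of rows written."""
    count = 0
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=FIELDS)
        writer.writeheader()
        for row in reader:
            writer.writerow(transform_row(row))
            count += 1
    return count
```

Keeping the per-record logic in `transform_row` separates the transformation rules from the file handling, which is roughly the split that ETL tools present as transformation components versus source and target connectors.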
Many ETL vendors that support DSE offer free community editions of their products that cover many use cases. Enterprise editions are also available.
You can download ETL tools that work with DSE from Talend, Informatica, and StreamSets.