About DataStax Bulk Loader - dsbulk

Introduction to the DataStax Bulk Loader tool.

The DataStax Bulk Loader tool (dsbulk) loads data into DSE and unloads data from DSE in either CSV or JSON format.

About this document 

Welcome to the DataStax Bulk Loader documentation. To ensure that you get the best experience in using this document, take a moment to look at the Tips for using DataStax documentation.

About DataStax Bulk Loader - dsbulk 

The DataStax Bulk Loader tool (dsbulk) is designed to load data into DSE and unload data from DSE efficiently and reliably. It handles both small and large amounts of data, and supports both developer and production environments. Using dsbulk, CSV or JSON files can be rapidly loaded into or unloaded from DSE. CSV or JSON files, whether exported from a relational database or created as original data, can be inserted into the DSE transactional database. The tool is supported on both Linux and Windows platforms.

Features in DataStax Bulk Loader 

  • Can migrate data into DSE from another DSE or Apache Cassandra™ cluster
    • Can unload data from any Cassandra 2.1 or later data source
    • Can load data to DSE 5.0 or later
  • CSV and JSON are supported formats
  • Files, directories, stdin/stdout, and web URLs can be used for either source or destination
  • Performs 2-3 times faster than cqlsh COPY, due to multi-threaded operation
  • Secure authentication via Kerberos or username/password over SSL
  • Configurable data parsing (for instance, date formatting is configurable)
  • Performance and progress reporting
  • Command line tool for both Linux and Windows:
    • Can use configuration files to simplify command line calls to dsbulk
    • Tunable parameters to optimize loading and unloading times
    • Enhancements allow secure connections for loading and unloading data
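
As a rough illustration of the load and unload operations listed above, the following Python sketch assembles dsbulk command lines without running them. It assumes the shortcut options -c (connector name), -k (keyspace), -t (table), -url (source or destination), and -h (contact host) as described in the dsbulk reference; verify them against your installed version. The keyspace ks1, the table table1, the file names, and the helper dsbulk_command are hypothetical and used only for illustration.

```python
import shlex


def dsbulk_command(operation, keyspace, table, url, connector="csv", host=None):
    """Build an argv list for a dsbulk call. This only builds the list; it does not run dsbulk.

    The option names (-c, -k, -t, -url, -h) are assumed from the dsbulk
    reference documentation and may differ between versions.
    """
    if operation not in ("load", "unload"):
        raise ValueError("operation must be 'load' or 'unload'")
    if connector not in ("csv", "json"):
        raise ValueError("connector must be 'csv' or 'json'")
    argv = ["dsbulk", operation, "-c", connector,
            "-k", keyspace, "-t", table, "-url", url]
    if host is not None:
        argv += ["-h", host]  # optional contact point
    return argv


# Hypothetical keyspace, table, and paths.
load_cmd = dsbulk_command("load", "ks1", "table1", "export.csv")
unload_cmd = dsbulk_command("unload", "ks1", "table1", "/tmp/unload", connector="json")

# Print shell-quoted command lines for inspection.
print(shlex.join(load_cmd))
print(shlex.join(unload_cmd))
```

To actually run one of these commands, the list could be passed to subprocess.run on a machine where dsbulk is installed and on the PATH. Building the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.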
See the DataStax blog post Introducing the DataStax Bulk Loader.