About DataStax Bulk Loader - dsbulk

Introduction to the DataStax Bulk Loader tool.

The DataStax Bulk Loader tool, dsbulk, loads and unloads DSE data in either CSV or JSON format.

About this document

Welcome to the DataStax Bulk Loader documentation. To get the best experience using this document, take a moment to look at the Tips for using DataStax documentation.

About DataStax Bulk Loader - dsbulk

The DataStax Bulk Loader tool is designed to load data into and unload data out of DSE efficiently and reliably. dsbulk handles both small and large amounts of data, and supports both developer and production environments. With dsbulk, CSV or JSON files can be rapidly loaded into or unloaded from DSE, and CSV or JSON files, whether exported from a relational database or containing original data, can be inserted into the DSE transactional database. The tool is supported on both Linux and Windows platforms.
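As a minimal sketch, assuming you drive dsbulk from a Python script, the following assembles (but does not run) the argument lists for a CSV load and a JSON unload. The helper `dsbulk_command`, the keyspace `ks1`, the table `table1`, the file `export.csv`, and the directory `backup_dir` are hypothetical. The shortcut options `-c` (connector name), `-url`, `-k` (keyspace), and `-t` (table) are dsbulk options, but check them against the options reference for your dsbulk version.

```python
import shlex
from typing import List


def dsbulk_command(operation: str, url: str, keyspace: str, table: str,
                   connector: str = "csv") -> List[str]:
    """Build (but do not execute) a dsbulk argument list.

    Hypothetical helper: it only arranges dsbulk's shortcut options
    -c (connector name), -url, -k (keyspace) and -t (table).
    """
    if operation not in ("load", "unload"):
        raise ValueError(f"unsupported operation: {operation}")
    if connector not in ("csv", "json"):
        raise ValueError(f"unsupported connector: {connector}")
    return ["dsbulk", operation,
            "-c", connector,
            "-url", url,
            "-k", keyspace,
            "-t", table]


# Load a CSV file into ks1.table1 (names are illustrative only).
load_cmd = dsbulk_command("load", "export.csv", "ks1", "table1")
# Unload the same table as JSON into a directory.
unload_cmd = dsbulk_command("unload", "backup_dir", "ks1", "table1",
                            connector="json")

# Print a shell-safe rendering of the load command for inspection.
print(shlex.join(load_cmd))
```

To actually run a command, the list can be passed to `subprocess.run(load_cmd, check=True)`. Building a list rather than one command string avoids shell-quoting problems with paths that contain spaces.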

Features in DataStax Bulk Loader

  • Can migrate data into DSE from another DSE or Apache Cassandra™ cluster
    • Can unload data from any Cassandra 1.2 or later data source
    • Can load data to DSE 3.2 or later
    Attention: All protocol versions are supported. Some features might not be available depending on the protocol version and server version.
  • CSV and JSON are supported formats
  • Files, directories, stdin/stdout, and web URLs can be used for either source or destination
  • 2-3 times faster than cqlsh COPY, due to multi-threaded operation
  • Secure authentication via Kerberos or username/password over SSL
  • Configurable data parsing (for instance, date formatting is configurable)
  • Performance and progress reporting
  • Command line tool for both Linux and Windows:
    • Can use configuration files to simplify command line calls to dsbulk
    • Tunable parameters to optimize loading and unloading times
    • Supports secure connections for loading and unloading data
  • Prints basic information about the associated cluster when you request verbose logging on the dsbulk command. Refer to Printing cluster information.
  • Provides details to help diagnose issues that occur during write operations to a database table. Refer to Detection of CAS write failures.
Also see the DataStax blog post Introducing the DataStax Bulk Loader.
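Several of the features listed above (configuration files, stdin/stdout as a source or destination, configurable date parsing, and verbose logging) correspond to command-line options. The sketch below, again assuming dsbulk is scripted from Python, shows how they might combine in a single load command. The helper `dsbulk_args`, the file `my_dsbulk.conf`, the keyspace `ks1`, the table `events`, and the date pattern are hypothetical. It assumes that `-url -` means stdin when loading (stdout when unloading), that `-f` names a configuration file, that settings such as `codec.date` can be passed as `--<setting> <value>`, and that `--log.verbosity 2` is the verbose level that triggers the cluster printout described in Printing cluster information. Verify each against the dsbulk options reference.

```python
import shlex
from typing import Dict, List, Optional


def dsbulk_args(operation: str, keyspace: str, table: str,
                url: str = "-",
                config_file: Optional[str] = None,
                settings: Optional[Dict[str, str]] = None,
                verbose: bool = False) -> List[str]:
    """Assemble a dsbulk argument list (hypothetical helper, not executed).

    url="-" is assumed to mean stdin for load and stdout for unload.
    settings maps dsbulk setting names (e.g. "codec.date") to values,
    emitted as "--<name> <value>" long options.
    """
    args = ["dsbulk", operation]
    if config_file is not None:
        # -f points dsbulk at a configuration file so common options
        # need not be repeated on every call.
        args += ["-f", config_file]
    args += ["-url", url, "-k", keyspace, "-t", table]
    for name, value in (settings or {}).items():
        args += [f"--{name}", value]
    if verbose:
        # Assumption: verbosity level 2 requests verbose logging,
        # which is when dsbulk prints basic cluster information.
        args += ["--log.verbosity", "2"]
    return args


# Load day/month/year dates from stdin, with a config file and verbose logging.
cmd = dsbulk_args("load", "ks1", "events",
                  config_file="my_dsbulk.conf",
                  settings={"codec.date": "dd/MM/yyyy"},
                  verbose=True)
print(shlex.join(cmd))
```

Because the URL is `-`, the input would be supplied on standard input, for example with `subprocess.run(cmd, stdin=open("data.csv", "rb"), check=True)`, where `data.csv` is again a hypothetical file.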