About DataStax Bulk Loader - dsbulk

Introduction to DataStax Bulk Loader.

You can use DataStax Bulk Loader (dsbulk) to load CSV or JSON data into, and unload it from, DataStax Enterprise (DSE) or DataStax Distribution of Apache Cassandra™ (DDAC) databases.

About this document

Welcome to the DataStax Bulk Loader documentation. To get the most out of this document, take a moment to read Tips for using DataStax documentation.

About DataStax Bulk Loader - dsbulk

DataStax Bulk Loader efficiently and reliably loads small or large amounts of data, supporting developer and production environments. With dsbulk, you can rapidly load CSV or JSON files into, or unload them from, DSE 3.2 or later or DDAC databases. To migrate data, you can insert CSV or JSON files exported from a relational database, as well as original data, into DSE or DDAC databases. The tool is supported on Linux and Windows platforms.
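As a rough illustration of the load and unload operations described above, the following sketch builds three typical dsbulk commands. The keyspace `ks1`, the table `table1`, the file names, and the output directory are hypothetical placeholders, and the `run` helper is not part of dsbulk: it only prints each command unless you set `DRY_RUN=0`, so the sketch can be tried without a running cluster.

```shell
#!/bin/sh
# Helper (not part of dsbulk): print the command by default,
# execute it only when DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Load a CSV file with a header row into ks1.table1 (hypothetical names).
run dsbulk load -url export.csv -k ks1 -t table1 -header true

# Load a JSON file into the same table; -c selects the connector.
run dsbulk load -c json -url export.json -k ks1 -t table1

# Unload the table to CSV files in a directory (hypothetical path).
run dsbulk unload -k ks1 -t table1 -url /tmp/unload
```

To execute the commands against a real cluster, run the script with `DRY_RUN=0` after replacing the placeholder names with your own.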

Features in DataStax Bulk Loader

  • CSV and JSON are supported formats
  • Files, directories, stdin/stdout, and web URLs can be used for either source or destination
  • 2-3 times faster than cqlsh COPY, due to multi-threaded operation
  • Secure authentication via Kerberos or username/password over SSL
  • Configurable data parsing (for instance, date formatting is configurable)
  • Performance and progress reporting
  • Command line tool for both Linux and Windows:
    • Can use configuration files to simplify command line calls to dsbulk
    • Tunable parameters to optimize loading and unloading times
    • Enhancements allow secure connections for loading and unloading data
  • Print basic information about the associated cluster when you request verbose logging on the dsbulk command. Refer to Printing cluster information.
  • Diagnose issues encountered during write operations. Refer to Detection of write failures.

Also see the DataStax blog post Introducing the DataStax Bulk Loader.
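Several of the features listed above map to command-line options. The sketch below shows a configuration file passed with `-f`, a custom date format set with `--codec.date`, and verbose logging requested with `--log.verbosity`. The file `./dsbulk.conf`, the keyspace, the table, and the paths are hypothetical, and the availability and accepted values of `--log.verbosity` are assumed to depend on your dsbulk version, so check the options reference for your release. As before, the `run` helper is not part of dsbulk and only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Helper (not part of dsbulk): print the command by default,
# execute it only when DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Keep shared settings in a configuration file (hypothetical path)
# so the command line stays short.
run dsbulk load -f ./dsbulk.conf -url export.csv -k ks1 -t table1

# Parse dates written as day/month/year instead of the default format.
run dsbulk load -url export.csv -k ks1 -t table1 --codec.date dd/MM/yyyy

# Request verbose logging, which also prints basic cluster information
# (assumed to require a dsbulk version that supports this option).
run dsbulk unload -k ks1 -t table1 -url /tmp/unload --log.verbosity 2
```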