About DataStax Bulk Loader

DataStax Bulk Loader (DSBulk) is open-source software that you can use to load, unload, and count database records and return related information.

You can load and unload CSV and JSON files with options for parsing and progress reporting. Load/unload targets can be compressed or uncompressed, and they can be files, directories, stdin/stdout, or URLs. Performance is optimized through multi-threaded operation, and you can tune various parameters to improve load and unload times.

DataStax Bulk Loader is compatible with the following databases:

  • Astra DB

  • Hyper-Converged Database (HCD)

  • DataStax Enterprise (DSE) 5.1, 6.8, and 6.9

  • Apache Cassandra® 2.1 and later

DataStax Bulk Loader is supported on Linux, macOS, and Windows platforms.

DSBulk architecture

The DataStax workflow engine is the key architectural component that orchestrates DSBulk operations. Its main features are as follows:

Configuration

The workflow engine collects user-supplied settings, merges them with default values, and configures the loading or unloading operation to run.
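The merge step can be pictured with a minimal Python sketch. This is an illustration of the general idea only, not DSBulk's implementation; the setting names (`connector`, `max_errors`, `header`) and the function `effective_settings` are hypothetical.

```python
# Hypothetical defaults; these keys are illustrative, not real DSBulk setting names.
DEFAULTS = {"connector": "csv", "max_errors": 100, "header": True}


def effective_settings(user_supplied):
    """Return a new dict in which user-supplied values override the defaults."""
    # Later entries win in a dict merge, so user values replace defaults.
    return {**DEFAULTS, **user_supplied}


settings = effective_settings({"connector": "json", "max_errors": 10})
print(settings)
```

Settings the user does not supply, such as `header` here, keep their default values, and the defaults themselves are never modified.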

Connection

The workflow engine handles the driver connection to your database. The workflow engine manages the driver-specific settings, and it supports authentication and SSL encryption.

Conversion

The engine handles data type conversions, such as Boolean, number, or date conversions, from any input (typically strings or raw bytes as emitted by a connector) to appropriate internal representations (typically Java Temporal or Number objects). It also handles NULL and UNSET values.
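As a rough illustration of this kind of conversion, the following Python sketch turns raw strings into typed values. It is not DSBulk's code (DSBulk converts to Java objects). The function `convert`, the `UNSET` sentinel, the accepted Boolean spellings, and the rule that an empty string means NULL while a missing field means UNSET are all assumptions made for the example.

```python
from datetime import date
from decimal import Decimal

# Sentinel standing in for an "unset" value (assumption: the field was absent).
UNSET = object()


def convert(raw, target, null_token=""):
    """Convert a raw string to a typed value; rules here are illustrative only."""
    if raw is None:
        return UNSET  # field missing from the source record
    if raw == null_token:
        return None  # explicit NULL
    text = raw.strip()
    if target == "boolean":
        lowered = text.lower()
        if lowered in ("true", "yes", "1"):
            return True
        if lowered in ("false", "no", "0"):
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    if target == "number":
        return Decimal(text)
    if target == "date":
        return date.fromisoformat(text)
    raise ValueError(f"unsupported target type: {target!r}")


print(convert("TRUE", "boolean"), convert("3.14", "number"), convert("2024-01-31", "date"))
```

Keeping NULL and UNSET distinct matters because, in this sketch, a NULL would overwrite a stored value while an UNSET field would leave it alone.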

Mapping

The engine analyzes metadata gathered from the driver and infers the appropriate INSERT or SELECT prepared statement, then checks this information against user-supplied information about the data source, to infer the bound variables to use.
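The mapping idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the engine's actual inference logic: the function `infer_insert` is hypothetical, table columns and source fields are matched by identical name only, and the keyspace, table, and column names are made up.

```python
def infer_insert(keyspace, table, columns, source_fields):
    """Build an INSERT with named bind variables for columns found in the source."""
    # Keep table column order; bind only columns that the source actually provides.
    bound = [c for c in columns if c in source_fields]
    if not bound:
        raise ValueError("no source field matches a table column")
    column_list = ", ".join(bound)
    markers = ", ".join(f":{c}" for c in bound)
    return f"INSERT INTO {keyspace}.{table} ({column_list}) VALUES ({markers})"


statement = infer_insert(
    "ks1", "users",
    columns=["id", "name", "email"],          # as reported by table metadata
    source_fields=["id", "email", "phone"],   # as found in the data source
)
print(statement)  # INSERT INTO ks1.users (id, email) VALUES (:id, :email)
```

In this sketch, the source field `phone` is ignored because no column matches it, and the column `name` is left out of the statement because the source does not supply it.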

Monitoring

The engine reports metrics about all its internal components, mainly the connector and the bulk executor.

Error Handling

The engine handles errors from both connectors and the bulk executor, and reports read, parse, and write failures. Failed records are redirected to a configurable "bad file" that contains the sources that could not be loaded.
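The "bad file" pattern can be illustrated with a short Python sketch. It assumes a line-oriented source and uses hypothetical names (`load_records`, `parse_line`); it shows the general technique of diverting failed records rather than stopping, not how DSBulk implements it.

```python
import io


def load_records(lines, parse, write, bad_file):
    """Parse and write each line; divert failing lines to bad_file.

    Returns a (succeeded, failed) pair of counts.
    """
    succeeded = failed = 0
    for line in lines:
        try:
            write(parse(line))
            succeeded += 1
        except Exception:
            # Keep the original source line so it can be fixed and reloaded later.
            bad_file.write(line.rstrip("\n") + "\n")
            failed += 1
    return succeeded, failed


def parse_line(line):
    """Hypothetical parser: expects 'integer,text'."""
    key, value = line.split(",")
    return int(key), value


stored = []
bad = io.StringIO()  # stands in for a file opened for writing
counts = load_records(["1,a", "x,b", "3,c"], parse_line, stored.append, bad)
print(counts, repr(bad.getvalue()))
```

Here the second line fails to parse, so it is written to the bad file unchanged while the other two records are still loaded.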

DSBulk loading workflow diagram

Workflow to load JSON or CSV data into an HCD

DSBulk unloading workflow diagram

Workflow to unload data from an HCD

For more information about this open-source project, including licensing, see the DSBulk GitHub repository.

To get started, see Install DataStax Bulk Loader.
