
About DataStax Bulk Loader for Apache Cassandra

DataStax Bulk Loader for Apache Cassandra® is open-source software (OSS). The latest version is 1.10. It is supported on Linux, macOS, and Windows platforms. You can use DataStax Bulk Loader (dsbulk) to load, unload, and count database records and return related information.

This OSS product loads small or large amounts of data efficiently and reliably, and it supports both development and production environments. Using dsbulk commands, you can rapidly load CSV or JSON files into, or unload them from, the following supported databases:

  • DataStax Astra DB cloud-native databases

  • DataStax Enterprise (DSE) 4.7 and later databases

  • Open source Apache Cassandra® 2.1 and later databases
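
As a rough sketch of the three operations described above, the script below prints a load, an unload, and a count command. The keyspace `ks1`, the table `table1`, the file paths, and the helper `run_or_print` are placeholders introduced for illustration, not values from this page. The commands are only executed when `DSBULK_RUN=1` is set and `dsbulk` is found on the PATH.

```shell
#!/bin/sh
# Hypothetical names: ks1, table1, and the /tmp paths are placeholders.
KEYSPACE="ks1"
TABLE="table1"

# Print a command; run it only when DSBULK_RUN=1 and dsbulk is installed.
run_or_print() {
  echo "$*"
  if [ "${DSBULK_RUN:-0}" = "1" ] && command -v dsbulk >/dev/null 2>&1; then
    "$@"
  fi
}

# Load a CSV file (with a header row) into a table.
run_or_print dsbulk load -url /tmp/export.csv -k "$KEYSPACE" -t "$TABLE" -header true

# Unload the same table into a directory of CSV files.
run_or_print dsbulk unload -k "$KEYSPACE" -t "$TABLE" -url /tmp/unload_dir

# Count the rows in the table.
run_or_print dsbulk count -k "$KEYSPACE" -t "$TABLE"
```

Printing before running makes it easy to review the exact arguments against a test cluster before pointing the same commands at production data.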

Features in DataStax Bulk Loader for Apache Cassandra

  • Open source, with contributions welcome from the developer community. See the public GitHub repo: https://github.com/datastax/dsbulk

  • CSV and JSON formats are supported, and you can optionally load data from, or unload data to, compressed files.

  • Files, directories, stdin/stdout, and web URLs can be used for either source or destination.

  • Runs 2 to 3 times faster than cqlsh COPY, due to multi-threaded operation.

  • Connect to a cloud-based Astra DB database by including the path to the secure connect bundle (SCB). You can download the SCB from Astra console after creating an Astra DB database.

  • DataStax Java driver options are available directly with dsbulk commands via the datastax-java-driver prefix.

  • Secure authentication with Kerberos, or with a username and password over SSL.

  • Configurable data parsing. For instance, date formatting is configurable.

  • Performance and progress reporting.

  • Command-line tool for Linux, macOS, and Windows:

    • Configuration files simplify command-line calls to dsbulk.

    • Tunable parameters optimize loading and unloading times.

    • Secure connections are supported for loading and unloading data.

  • In addition to the dsbulk load and dsbulk unload commands, you can use dsbulk count to return information about loaded records in supported database tables.

  • Print basic information about the associated cluster when you request verbose logging on the dsbulk command. Refer to Printing cluster information.

  • Diagnose issues encountered during write operations. Refer to Detection of write failures.

  • Resume a failed operation by using checkpoint files.
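
Several of the features listed above (configuration files, the datastax-java-driver prefix, configurable date parsing, and the Astra DB secure connect bundle) can be sketched together. The script below writes a small configuration file and prints three commands that would use it. All paths, the keyspace `ks1`, the table `table1`, the date pattern, the timeout value, and the `<client_id>`/`<client_secret>` placeholders are assumptions made for illustration; nothing is executed against a database.

```shell
#!/bin/sh
# Sketch only: ks1, table1, all paths, and the credentials are placeholders.
CONF_DIR="${TMPDIR:-/tmp}/dsbulk-sketch"
CONF_FILE="$CONF_DIR/load.conf"
mkdir -p "$CONF_DIR"

# Settings that would otherwise be repeated on every command line.
cat > "$CONF_FILE" <<'EOF'
dsbulk {
  connector.name = "csv"
  connector.csv.url = "/tmp/export.csv"
  schema.keyspace = "ks1"
  schema.table = "table1"
  # Date parsing pattern (assumed format of the input data)
  codec.date = "yyyy-MM-dd"
}
datastax-java-driver {
  basic.request.timeout = "30 seconds"
}
EOF

# Load using the configuration file.
CONFIG_CMD="dsbulk load -f $CONF_FILE"

# The same driver setting passed directly with the datastax-java-driver prefix.
PREFIX_CMD="dsbulk load -f $CONF_FILE --datastax-java-driver.basic.request.timeout \"30 seconds\""

# Astra DB: point -b at the downloaded secure connect bundle.
ASTRA_CMD="dsbulk load -url /tmp/export.csv -k ks1 -t table1 -b /tmp/secure-connect-bundle.zip -u <client_id> -p <client_secret>"

printf '%s\n' "$CONFIG_CMD" "$PREFIX_CMD" "$ASTRA_CMD"
```

Keeping connection and schema settings in a file leaves the command line short, while one-off overrides can still be passed as options.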

Also see the DataStax blog post Introducing the DataStax Bulk Loader.
