Install DataStax Bulk Loader

DataStax Bulk Loader (DSBulk) is supported on Linux, macOS, and Windows platforms, and it is compatible with the following databases:

  • Astra DB

  • Hyper-Converged Database (HCD)

  • DataStax Enterprise (DSE) 5.1, 6.8, and 6.9

  • Apache Cassandra® 2.1 and later

DataStax recommends using the latest DataStax Bulk Loader release.

You can use DataStax Bulk Loader as a standalone tool that connects remotely to a cluster. You do not need to run the tool on a cluster node, although that configuration is also supported.

Using DataStax Bulk Loader requires a Java executable, as explained in the post-install requirements and recommendations section.

Installation steps

Use one of the following options to install DSBulk.

Linux/macOS

  1. Download the DSBulk tarball or zip file from the DSBulk GitHub repo or Maven Central.

  2. Unpack the archive, replacing VERSION with the version number of your downloaded file:

    tar -xzvf dsbulk-VERSION.tar.gz

Windows

  1. Ensure you have Java installed on your Windows system.

  2. Download the DSBulk zip file for Windows from the DSBulk GitHub repo. Select .zip, not .zip.asc.

  3. Extract the contents to a directory.

  4. Optional: DSBulk attempts to find the Java executable automatically. To use a specific Java VM instead, define the JAVA_HOME environment variable so that it points to that Java installation.

  5. Adjust the port-number to your specific configuration. The default is 9042.

  6. Open a command prompt and navigate to the dsbulk-VERSION\bin directory, replacing VERSION with the version number of your downloaded archive.

  7. Run dsbulk commands from your DSBulk \bin directory.

    For example, to load data from a CSV file into your database:

    dsbulk load -url C:\PATH_TO_CSV -k KEYSPACE_NAME -t TABLE_NAME

    Replace the following:

    • PATH_TO_CSV: The path to the CSV file that you want to load.

    • KEYSPACE_NAME and TABLE_NAME: The keyspace and table in your database where you want to load the data.

You must escape all backslashes (\) in Windows paths. For more information, see Escaping command line arguments.

DSBulk has many options, such as data formatting, authentication, and performance tuning. Use dsbulk help to explore available commands and options.
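For the Linux/macOS option, the unpack step can be wrapped in a small helper. The following is a minimal sketch, not part of DSBulk: `unpack_dsbulk` is a hypothetical function name, and the sketch assumes the tarball unpacks to a directory named after the archive (for example, dsbulk-VERSION), as the paths elsewhere on this page suggest.

```shell
# Hypothetical helper (not part of DSBulk): unpack a DSBulk tarball into a
# target directory and print the path of the resulting bin directory.
unpack_dsbulk() {
  archive=$1      # path to the downloaded file, for example dsbulk-VERSION.tar.gz
  dest=${2:-.}    # where to unpack; defaults to the current directory

  tar -xzf "$archive" -C "$dest" || return 1

  # Assumption: the archive unpacks to a directory named after the archive.
  name=$(basename "$archive" .tar.gz)
  printf '%s\n' "$dest/$name/bin"
}
```

A possible use is `bin_dir=$(unpack_dsbulk dsbulk-VERSION.tar.gz "$HOME") && "$bin_dir/dsbulk" --version`, which unpacks the archive and then prints the installed version.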

Post-install requirements and recommendations

Review the following information to ensure your DSBulk installation functions as expected.

Java executable

A Java executable is required.

On macOS, Linux, and *nix systems, the rules used to find a Java executable are as follows:

  1. Use $JAVA if defined.

  2. Use ${JAVA_HOME}/bin/java if defined.

  3. Use $(/usr/libexec/java_home)/bin/java if defined.

  4. Use the first Java executable found on $PATH.
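The lookup order above can be sketched as a shell function. This is an illustration of the documented rules, not the actual DSBulk launcher script; `find_java` is a hypothetical name, and "defined" is interpreted here as "set to a non-empty value".

```shell
# Hypothetical helper: resolve a Java executable in the documented order.
find_java() {
  if [ -n "${JAVA:-}" ]; then
    # Rule 1: an explicit $JAVA wins.
    printf '%s\n' "$JAVA"
  elif [ -n "${JAVA_HOME:-}" ]; then
    # Rule 2: fall back to $JAVA_HOME/bin/java.
    printf '%s\n' "$JAVA_HOME/bin/java"
  elif [ -x /usr/libexec/java_home ] && mac_home=$(/usr/libexec/java_home 2>/dev/null); then
    # Rule 3: macOS helper that reports the default JDK home.
    printf '%s\n' "$mac_home/bin/java"
  else
    # Rule 4: first java found on $PATH; fails if none exists.
    command -v java
  fi
}

find_java || echo "No Java executable found" >&2
```

Running the sketch before invoking dsbulk shows which Java executable the rules would select in the current environment.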

On Windows systems, the rules used to find a Java executable are as follows:

  1. Use %JAVA_HOME%\bin\java if defined.

  2. Use the first Java executable found on %PATH%.

Export DSBULK_JAVA_OPTS

You can pass system properties to the DataStax Bulk Loader process by exporting the environment variable DSBULK_JAVA_OPTS. This step can be useful, for example, to configure JMX monitoring, or to configure advanced authentication schemes such as Kerberos. For example, on a Linux system:

# Remote JMX configuration
export DSBULK_JAVA_OPTS="$DSBULK_JAVA_OPTS -Dcom.sun.management.jmxremote.port=port-number"

# Kerberos configuration
export DSBULK_JAVA_OPTS="$DSBULK_JAVA_OPTS -Djava.security.krb5.conf=configuration-path-and-filename"

# Invoke DSBulk
bin/dsbulk load -h host1.com -k ks1 -t table1 -url data.csv

Prior package installs

If you previously used a package install of DSE on the node where you just installed dsbulk, and that install included a prior version of dsbulk, such as 1.9.1, do the following:

  1. After unpacking the latest version of dsbulk from the standalone tarball or zip file, update your PATH so that it points to the new version.

    For example, on a macOS node, edit your $HOME/.bashrc file, adding a command such as the following:

    export PATH=path/to/unpacked/location/dsbulk-VERSION/bin:$PATH

  2. From the command line, execute your updated .bashrc, for example:

    source ~/.bashrc

  3. Verify the dsbulk version:

    dsbulk --version
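If your .bashrc is sourced more than once, a plain export can add the same directory to PATH repeatedly. The following is a minimal sketch of one way to avoid that; `prepend_path` is a hypothetical helper, and the directory in the usage line is a placeholder for your unpacked location.

```shell
# Hypothetical helper: put a directory at the front of PATH, unless it is
# already the first entry. Keeping it first ensures the new dsbulk is found
# before any older copy from a package install.
prepend_path() {
  case "$PATH" in
    "$1" | "$1":*) ;;                  # already first; nothing to do
    *) PATH="$1${PATH:+:$PATH}" ;;     # prepend, without a stray trailing colon
  esac
  export PATH
}

# Placeholder path; replace with your unpacked location.
prepend_path "$HOME/dsbulk-VERSION/bin"

# Show which dsbulk the shell now resolves, if any.
command -v dsbulk || echo "dsbulk not found on PATH" >&2
```

If `command -v dsbulk` still prints the path of the older package install, another entry earlier in your startup files is likely overriding the change.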

Get started with DSBulk

  1. Create CQL tables.

    The dsbulk command can load, unload, and count data, but it cannot create keyspaces or tables.

  2. Run some commands. Try loading data, unloading data, and counting data.

  3. When you are comfortable with the basic commands, try the advanced and optional features.

Explore the documentation to learn more about all available features and options.
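As a starting point, the three basic operations might look like the following sketch. The keyspace, table, file, and directory names are placeholder assumptions, and the keyspace and table must already exist. The `run` wrapper and the `DRY_RUN` variable are hypothetical conveniences, not DSBulk features: by default the script only prints the commands, and it executes them when `DRY_RUN=0`.

```shell
# Assumed placeholder values; replace with your own.
KEYSPACE=ks1
TABLE=table1
CSV=data.csv
OUT_DIR=unloaded

# Hypothetical wrapper: print each command, and execute it only when
# DRY_RUN is set to 0 (the default is to print without executing).
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Load rows from a CSV file into an existing table.
run dsbulk load -k "$KEYSPACE" -t "$TABLE" -url "$CSV"

# Unload the table contents into a directory.
run dsbulk unload -k "$KEYSPACE" -t "$TABLE" -url "$OUT_DIR"

# Count the rows in the table.
run dsbulk count -k "$KEYSPACE" -t "$TABLE"
```

These commands assume a database reachable with the default connection settings. Add connection options, such as -h for a contact point, to match your environment.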
