Installing DataStax Bulk Loader for Apache Cassandra

Install DataStax Bulk Loader for Apache Cassandra.

Introduction

DataStax Bulk Loader for Apache Cassandra® 1.7.0 efficiently and reliably loads CSV or JSON data into, and unloads it from:
  • Open source Apache Cassandra® 2.1 and later databases
  • DataStax Astra cloud databases
  • DataStax Enterprise (DSE) 4.7 and later databases
Important: As of version 1.6.0, DataStax Bulk Loader for Apache Cassandra is open-source software. Join the community of developers who contribute to the product! See the public GitHub repo: https://github.com/datastax/dsbulk
DataStax recommends using the latest DataStax Bulk Loader for Apache Cassandra version, which is currently 1.7.0. DataStax Bulk Loader for Apache Cassandra is supported on Linux, macOS, and Windows platforms.

You can use DataStax Bulk Loader for Apache Cassandra as a standalone tool that connects remotely to a cluster. The tool does not need to run on a cluster node, although it can.

Using DataStax Bulk Loader for Apache Cassandra requires a Java executable, as explained in the post-install section below.
Attention: If you are using a pre-4.7 DSE release, the new DataStax Java driver options are not supported and you must use or remain on DataStax Bulk Loader 1.3.4.

For Apache Cassandra 2.1 and later databases, DataStax Bulk Loader 1.4.1 added support for load and count operations; previous DataStax Bulk Loader releases supported unload operations only.

Installation steps

Important: Apache-2.0 license agreement. By downloading this DataStax product, you agree to the terms of the open-source Apache-2.0 license agreement.
  1. Download the DataStax Bulk Loader for Apache Cassandra tarball or zip file from the Tools section of the DataStax download page.
  2. Select the package for your OS: A tar file is provided for Linux and macOS; a zip file is provided for Windows.
  3. If you agree, enable the Terms checkbox and click the Download button.
    Tip: As an alternative to steps 1-3, you can use curl to download the file. Example:
    curl -OL https://downloads.datastax.com/dsbulk/dsbulk-1.7.0.tar.gz
  4. Unpack the downloaded distribution. Linux example:
    tar -xzvf dsbulk-1.7.0.tar.gz

Result: the files are downloaded and extracted into the current directory.
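
If you prefer to keep the installation out of the current directory, tar's -C option can extract the archive into a directory of your choice. The following is a minimal sketch, not part of the product: the helper name unpack_dsbulk and the destination directory are illustrative assumptions.

```shell
# Hypothetical helper: extract a downloaded dsbulk tarball into a chosen directory.
unpack_dsbulk() {
  archive=$1   # path to the tarball, for example dsbulk-1.7.0.tar.gz
  dest=$2      # target directory, for example "$HOME/tools" (illustrative)
  # Create the target directory if needed, then extract the archive into it.
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}
```

For example, unpack_dsbulk dsbulk-1.7.0.tar.gz "$HOME/tools" should place the launcher at $HOME/tools/dsbulk-1.7.0/bin/dsbulk, assuming the archive's top-level directory is dsbulk-1.7.0.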

Post-install requirements and recommendations

Java executable is required

Using DataStax Bulk Loader for Apache Cassandra requires a Java executable.

On macOS, Linux, and *nix systems, the rules used to find a Java executable are:
  1. Use $JAVA if defined
  2. Use ${JAVA_HOME}/bin/java if defined
  3. Use $(/usr/libexec/java_home)/bin/java if available (the java_home utility is provided on macOS)
  4. Use the first Java executable found on $PATH
On Windows systems, the rules used to find a Java executable are:
  1. Use %JAVA_HOME%\bin\java if defined
  2. Use the first Java executable found on %PATH%
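
To check which Java executable the macOS, Linux, and *nix rules would select on your system, the lookup order can be approximated in a small shell function. This is a sketch of the documented order only, not the actual launcher code; the function name find_java is an assumption, and the sketch does not verify that the selected file exists.

```shell
# Hypothetical helper mirroring the documented lookup order on macOS, Linux, and *nix.
find_java() {
  if [ -n "${JAVA:-}" ]; then
    # Rule 1: an explicit $JAVA wins.
    printf '%s\n' "$JAVA"
  elif [ -n "${JAVA_HOME:-}" ]; then
    # Rule 2: fall back to the java binary under $JAVA_HOME.
    printf '%s\n' "$JAVA_HOME/bin/java"
  elif [ -x /usr/libexec/java_home ] && _jh=$(/usr/libexec/java_home 2>/dev/null) && [ -n "$_jh" ]; then
    # Rule 3: ask the java_home utility (present on macOS).
    printf '%s\n' "$_jh/bin/java"
  elif command -v java >/dev/null 2>&1; then
    # Rule 4: first java found on $PATH.
    command -v java
  else
    # Nothing matched.
    return 1
  fi
}
```

A possible use is java_bin=$(find_java) && "$java_bin" -version, which prints the version of whichever executable was selected.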
You can pass system properties to the DataStax Bulk Loader for Apache Cassandra process by exporting the environment variable DSBULK_JAVA_OPTS. This step can be useful, for example, to configure JMX monitoring, or to configure advanced authentication schemes such as Kerberos. For example, on a Linux system:
# Remote JMX configuration
export DSBULK_JAVA_OPTS="$DSBULK_JAVA_OPTS -Dcom.sun.management.jmxremote.port=port-number"
# Kerberos configuration
export DSBULK_JAVA_OPTS="$DSBULK_JAVA_OPTS -Djava.security.krb5.conf=configuration-path-and-filename"
# Invoke DSBulk
bin/dsbulk load -h host1.com -k ks1 -t table1 -url data.csv

Regarding any prior package installs

If DSE was previously installed from a package on the node where you just installed dsbulk, that installation included an earlier version of dsbulk, such as 1.3.0. After unpacking the latest version of dsbulk from the standalone tarball, update your PATH so that the new version is found first.

For example, on a macOS node, edit your $HOME/.bashrc file, adding a command such as:
export PATH=path-to-unpacked-location/dsbulk-1.7.0/bin:$PATH
From the command line, source your updated .bashrc and verify the dsbulk version. In this example, the last line is the command output:
source ~/.bashrc
dsbulk --version
DataStax Bulk Loader v1.7.0
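
Because .bashrc may be sourced more than once in a session, a plain export can add the same directory to PATH repeatedly. One way to avoid this, sketched here with a hypothetical helper named prepend_path, is to prepend the directory only when it is not already the first PATH entry:

```shell
# Hypothetical helper: put a directory first on PATH without duplicating it
# when the file is sourced again.
prepend_path() {
  case "$PATH" in
    "$1"|"$1":*) ;;            # already the first entry, leave PATH unchanged
    *) PATH="$1:$PATH" ;;      # otherwise prepend so this directory is searched first
  esac
  export PATH
}
```

In .bashrc, the export line could then be replaced by prepend_path path-to-unpacked-location/dsbulk-1.7.0/bin. Running command -v dsbulk afterwards shows which dsbulk executable the shell resolves.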

What's next?

Learn how to get started with DataStax Bulk Loader for Apache Cassandra.