Installing DataStax Bulk Loader for Apache Cassandra
Install DataStax Bulk Loader for Apache Cassandra.
Introduction
DataStax Bulk Loader for Apache Cassandra works with the following databases:
- DataStax Astra cloud databases
- DataStax Enterprise (DSE) 4.7 and later databases
- Open source Apache Cassandra® 2.1 and later databases
You can run DataStax Bulk Loader for Apache Cassandra as a standalone tool that connects remotely to a cluster. The tool does not need to run on a cluster node, although it can.
For Apache Cassandra 2.1 and later databases, DataStax Bulk Loader 1.4.1 added support for load and count operations; earlier DataStax Bulk Loader releases supported only unload operations with these databases.
Installation steps
- Download the DataStax Bulk Loader for Apache Cassandra tarball or zip file from the Tools section of the DataStax download page.
- Select the package for your OS: A tar file is provided for Linux and macOS; a zip file is provided for Windows.
- If you agree to the terms, select the Terms checkbox and click the Download button.
Tip: As an alternative to the three preceding steps, you can use curl to download the file. Example:
curl -OL https://downloads.datastax.com/dsbulk/dsbulk-1.7.0.tar.gz
- Unpack the downloaded distribution. Linux example:
tar -xzvf dsbulk-1.7.0.tar.gz
Result: The archive contents are extracted into the current directory.
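The download and unpack steps above can also be scripted. The following is a minimal sketch for Linux or macOS. It assumes that releases other than 1.7.0 use the same URL pattern as the curl example, and that the archive unpacks into a dsbulk-<version> directory; the helper names dsbulk_url and install_dsbulk are hypothetical and not part of the product.

```bash
#!/usr/bin/env bash
# Minimal sketch of the download-and-unpack steps (Linux/macOS).
# ASSUMPTION: releases other than 1.7.0 use the same URL pattern.

# Print the tarball URL for a version string such as 1.7.0.
dsbulk_url() {
  printf 'https://downloads.datastax.com/dsbulk/dsbulk-%s.tar.gz\n' "$1"
}

# Download and unpack one release into the current directory.
install_dsbulk() {
  local version="$1"
  local tarball="dsbulk-${version}.tar.gz"
  # -f: fail on HTTP errors; -O: keep the remote file name; -L: follow redirects.
  curl -fOL "$(dsbulk_url "$version")" || return 1
  tar -xzf "$tarball" || return 1
  # ASSUMPTION: the archive unpacks into ./dsbulk-<version>/.
  printf '%s\n' "$PWD/dsbulk-${version}/bin/dsbulk"
}
```

For example, install_dsbulk 1.7.0 downloads and unpacks that release and prints the expected launcher path. The -f flag makes curl return an error on an HTTP failure instead of saving an error page under the tarball's name.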
Post-install requirements and recommendations
Java executable is required
Using DataStax Bulk Loader for Apache Cassandra requires a Java executable.
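On Linux and macOS, dsbulk looks for the Java executable in this order:
- Use $JAVA if defined
- Use ${JAVA_HOME}/bin/java if defined
- Use $(/usr/libexec/java_home)/bin/java if available (macOS only)
- Use the first Java executable found on $PATH
On Windows, dsbulk looks for the Java executable in this order:
- Use %JAVA_HOME%\bin\java if defined
- Use the first Java executable found on %PATH%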
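To make the Linux and macOS lookup order concrete, here is a hypothetical shell function that mirrors it. It illustrates the documented order only; it is not the launcher script that ships with dsbulk, and the name find_java is made up for this example.

```bash
#!/usr/bin/env bash
# Hypothetical re-implementation of the documented Java lookup order (Linux/macOS).
# This is NOT the dsbulk launcher; it only mirrors the order listed above.
find_java() {
  local jh
  # 1. $JAVA, if set and non-empty.
  if [ -n "${JAVA:-}" ]; then
    printf '%s\n' "$JAVA"; return 0
  fi
  # 2. ${JAVA_HOME}/bin/java, if JAVA_HOME is set and non-empty.
  if [ -n "${JAVA_HOME:-}" ]; then
    printf '%s\n' "${JAVA_HOME}/bin/java"; return 0
  fi
  # 3. macOS only: /usr/libexec/java_home prints the default JDK home.
  if [ -x /usr/libexec/java_home ]; then
    jh=$(/usr/libexec/java_home 2>/dev/null) || jh=""
    if [ -n "$jh" ]; then
      printf '%s\n' "$jh/bin/java"; return 0
    fi
  fi
  # 4. The first java found on $PATH.
  if command -v java >/dev/null 2>&1; then
    command -v java; return 0
  fi
  echo "find_java: no Java executable found" >&2
  return 1
}
```

Running "$(find_java)" -version prints the version of the Java executable that this order would select, which is a quick way to see which JVM dsbulk is likely to use.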
Optionally, you can pass additional Java options to dsbulk by setting the DSBULK_JAVA_OPTS environment variable. This can be useful, for example, to configure JMX monitoring or advanced authentication schemes such as Kerberos. For example, on a Linux system:
# Remote JMX configuration
export DSBULK_JAVA_OPTS="$DSBULK_JAVA_OPTS -Dcom.sun.management.jmxremote.port=port-number"
# Kerberos configuration
export DSBULK_JAVA_OPTS="$DSBULK_JAVA_OPTS -Djava.security.krb5.conf=configuration-path-and-filename"
# Invoke DSBulk
bin/dsbulk load -h host1.com -k ks1 -t table1 -url data.csv
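Because each export line in this example appends to the existing value, running it again (for example, by sourcing a shell profile twice) adds the same option a second time. A minimal sketch of an idempotent alternative follows; add_dsbulk_java_opt is a hypothetical helper, not part of dsbulk.

```bash
#!/usr/bin/env bash
# Sketch: append a JVM option to DSBULK_JAVA_OPTS only if it is not already present.
# add_dsbulk_java_opt is a hypothetical helper, not part of dsbulk.
add_dsbulk_java_opt() {
  # Pad with spaces so the match is on whole options, not substrings.
  case " ${DSBULK_JAVA_OPTS:-} " in
    *" $1 "*) ;;  # already present: leave the variable unchanged
    *) DSBULK_JAVA_OPTS="${DSBULK_JAVA_OPTS:+$DSBULK_JAVA_OPTS }$1" ;;
  esac
  # Export so child processes such as the dsbulk launcher can read it.
  export DSBULK_JAVA_OPTS
}
```

For example, add_dsbulk_java_opt -Dcom.sun.management.jmxremote.port=port-number adds the JMX option once; calling it again with the same option leaves DSBULK_JAVA_OPTS unchanged.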
Regarding any prior package installs
If you previously installed DSE from a package on the node where you just installed dsbulk, that installation included an earlier version of dsbulk, such as 1.3.0. After unpacking the latest version of dsbulk from the standalone tarball, update your PATH so that it points to the new version, for example by adding the following line to ~/.bashrc:
export PATH=path-to-unpacked-location/dsbulk-1.7.0/bin:$PATH
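If the export line is kept in ~/.bashrc, every re-source prepends another copy of the directory. The sketch below uses a hypothetical prepend_path helper that removes any existing copy before putting the directory first, so the standalone dsbulk keeps precedence over the version from the DSE package.

```bash
#!/usr/bin/env bash
# Sketch: move a directory to the front of PATH without duplicating it.
# prepend_path is a hypothetical helper, not part of dsbulk.
prepend_path() {
  local dir="$1" rest="" entry old_ifs="$IFS" noglob=""
  # Remember whether globbing was already disabled, then disable it so
  # PATH entries containing * or ? are not expanded during splitting.
  case $- in *f*) noglob=1 ;; esac
  set -f
  IFS=:
  for entry in $PATH; do
    # Skip any existing copy of the directory; keep everything else in order.
    [ "$entry" = "$dir" ] && continue
    rest="${rest:+$rest:}$entry"
  done
  IFS="$old_ifs"
  # Restore globbing only if it was enabled before the call.
  [ -n "$noglob" ] || set +f
  PATH="$dir${rest:+:$rest}"
  export PATH
}
```

With the function defined in ~/.bashrc, the export line can be replaced by prepend_path path-to-unpacked-location/dsbulk-1.7.0/bin. Afterward, command -v dsbulk shows which dsbulk the shell resolves.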
From the command line, source your updated .bashrc and verify the dsbulk version. Example:
source ~/.bashrc
dsbulk --version
Result:
DataStax Bulk Loader v1.7.0
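To check the version in a script rather than by eye, you can parse the version line. This sketch assumes the output format shown above (DataStax Bulk Loader vX.Y.Z); parse_dsbulk_version and check_dsbulk_version are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: confirm that the dsbulk on PATH reports the expected version.
# ASSUMPTION: output has the form "DataStax Bulk Loader vX.Y.Z".

# Read version output on stdin and print only the version number.
parse_dsbulk_version() {
  sed -n 's/^DataStax Bulk Loader v\([0-9][0-9.]*\).*$/\1/p'
}

# Compare the installed version with the one passed as $1.
check_dsbulk_version() {
  local expected="$1" actual where
  where=$(command -v dsbulk) || where="(no dsbulk on PATH)"
  actual=$(dsbulk --version 2>/dev/null | parse_dsbulk_version)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $where reports v$actual"
  else
    echo "Mismatch: $where reports '${actual:-unknown}', expected $expected" >&2
    return 1
  fi
}
```

For example, check_dsbulk_version 1.7.0 returns a nonzero status if the dsbulk found first on PATH reports a different version, such as the older one from a DSE package install.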
What's next?
Learn how to get started with DataStax Bulk Loader for Apache Cassandra.