Install DataStax Bulk Loader
DataStax Bulk Loader (DSBulk) is supported on Linux, macOS, and Windows platforms, and it is compatible with the following databases:
-
Astra DB
-
Hyper-Converged Database (HCD)
-
DataStax Enterprise (DSE) 5.1, 6.8, and 6.9
-
Apache Cassandra® 2.1 and later
DataStax recommends using the latest DSBulk release.
You can use DSBulk as a standalone tool that connects to a local or remote cluster. For remote connections, DSBulk doesn't need to run on a cluster node, although that configuration is also supported.
DSBulk requires a Java executable, as explained in Post-installation requirements and recommendations.
Install on Linux or macOS
-
Download the DSBulk tarball or zip file from the DSBulk GitHub repo or Maven Central.
-
Unpack the archive, replacing VERSION with the version number of your downloaded file:
tar -xzvf dsbulk-VERSION.tar.gz
-
Continue to Post-installation requirements and recommendations.
Install on Windows
-
Ensure you have Java installed on your Windows system.
-
Download the DSBulk zip file for Windows from the DSBulk GitHub repo. Select the .zip file, not the .zip.asc file.
-
Extract the contents to a directory.
-
Optional: DSBulk attempts to find the Java executable automatically. To specify which Java VM DSBulk uses instead, define the JAVA_HOME environment variable.
-
Adjust the port-number to your specific configuration. The default is 9042.
-
Open a command prompt and navigate to the dsbulk-VERSION\bin directory, replacing VERSION with the version number of your downloaded archive.
-
Run dsbulk commands from your DSBulk \bin directory.
For example, to load data from a CSV file into your database:
dsbulk load -url C:\PATH_TO_CSV -k KEYSPACE_NAME -t TABLE_NAME
Replace the following:
-
PATH_TO_CSV: The path to the CSV file that you want to load.
-
KEYSPACE_NAME and TABLE_NAME: The keyspace and table in your database where you want to load the data.
You must escape all backslashes (\) in Windows paths. For more information, see Escaping command line arguments.
-
Continue to Post-installation requirements and recommendations.
Post-installation requirements and recommendations
Review the following information to ensure your DSBulk installation functions as expected.
Java executable
A Java executable is required.
On macOS, Linux, and *nix systems, the rules used to find a Java executable are as follows:
-
Use $JAVA if defined.
-
Use ${JAVA_HOME}/bin/java if defined.
-
Use $(/usr/libexec/java_home)/bin/java if defined.
-
Use the first Java executable found on $PATH.
On Windows systems, the rules used to find a Java executable are as follows:
-
Use %JAVA_HOME%\bin\java if defined.
-
Use the first Java executable found on %PATH%.
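The macOS and Linux lookup order described above can be sketched as a small shell function. This is only an illustration of the documented rules, not the actual DSBulk launcher script, and the function name find_java is hypothetical.

```shell
# Sketch of the documented lookup order; NOT the real DSBulk launcher logic.
# find_java is a hypothetical helper name used only for this illustration.
find_java() {
  # 1. Use $JAVA if it is defined.
  if [ -n "${JAVA:-}" ]; then
    printf '%s\n' "$JAVA"
    return 0
  fi
  # 2. Use ${JAVA_HOME}/bin/java if JAVA_HOME is defined.
  if [ -n "${JAVA_HOME:-}" ]; then
    printf '%s\n' "${JAVA_HOME}/bin/java"
    return 0
  fi
  # 3. On macOS, ask /usr/libexec/java_home for the default JDK location.
  if [ -x /usr/libexec/java_home ]; then
    home="$(/usr/libexec/java_home 2>/dev/null)" || home=""
    if [ -n "$home" ]; then
      printf '%s\n' "${home}/bin/java"
      return 0
    fi
  fi
  # 4. Fall back to the first java found on $PATH (fails if there is none).
  command -v java
}

# Show which executable the rules select in the current environment.
java_bin="$(find_java)" || true
echo "Java executable: ${java_bin:-not found}"
```

Running this before you invoke DSBulk can help confirm which Java installation is likely to be picked up when several are present.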
Export DSBULK_JAVA_OPTS
You can pass system properties to the DSBulk process by exporting the environment variable DSBULK_JAVA_OPTS.
This step can be useful, for example, to configure JMX monitoring, or to configure advanced authentication schemes such as Kerberos.
For example, on a Linux system:
# Remote JMX configuration
export DSBULK_JAVA_OPTS="$DSBULK_JAVA_OPTS -Dcom.sun.management.jmxremote.port=port-number"
# Kerberos configuration
export DSBULK_JAVA_OPTS="$DSBULK_JAVA_OPTS -Djava.security.krb5.conf=configuration-path-and-filename"
# Invoke DSBulk
bin/dsbulk load -h host1.com -k ks1 -t table1 -url data.csv
Prior package installs
If you previously used a package installation of DSE on the node where you just installed dsbulk, and that installation included an earlier version of dsbulk, such as 1.9.1, do the following to make sure the new version is the one that runs:
-
After unpacking the latest version of dsbulk from the standalone tarball or zip file, update your PATH so that it points to the new version.
For example, on a macOS node, edit your $HOME/.bashrc file, adding a command such as the following:
export PATH=path/to/unpacked/location/dsbulk-VERSION/bin:$PATH
-
From the command line, execute your updated .bashrc, for example:
source ~/.bashrc
-
Verify the dsbulk version:
dsbulk --version
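To confirm that the shell resolves the new installation rather than the packaged one, you can check which dsbulk executable comes first on PATH. The following is a minimal sketch; the unpack location under $HOME is an assumption, so substitute your own path and version.

```shell
# Hypothetical unpack location; replace it with your own path and version.
DSBULK_BIN="$HOME/dsbulk-VERSION/bin"

# Prepend the new location so it is searched before any older install.
export PATH="$DSBULK_BIN:$PATH"

# Print the dsbulk executable that the shell would run, if one is found.
command -v dsbulk || echo "dsbulk not found on PATH" >&2
```

If the printed path still points to the package install, the PATH change has not taken effect in the current shell session.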
Get started with DSBulk
-
The dsbulk command can load, unload, and count data, but it cannot create keyspaces or tables.
-
Run some commands. Try loading data, unloading data, and counting data.
-
When you are comfortable with the basic commands, try the advanced and optional features.
-
Explore available commands and options with dsbulk help and the DSBulk documentation.
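As a starting point, the three basic operations might look like the following sketch. It reuses the placeholder names from the earlier example (ks1, table1, data.csv) and assumes that the keyspace and table already exist and that a cluster is reachable with your default connection settings. The output directory name unloaded_data is hypothetical.

```shell
# Placeholder names from the earlier example; replace them with your own.
KEYSPACE=ks1
TABLE=table1

if command -v dsbulk >/dev/null 2>&1; then
  # Load rows from a CSV file into an existing table.
  dsbulk load -url data.csv -k "$KEYSPACE" -t "$TABLE"
  # Unload the table's rows to files in a local directory.
  dsbulk unload -k "$KEYSPACE" -t "$TABLE" -url unloaded_data
  # Count the rows in the table.
  dsbulk count -k "$KEYSPACE" -t "$TABLE"
else
  echo "dsbulk not found on PATH" >&2
fi
```

Comparing the count result with the number of rows in the CSV file is a simple way to check that the load completed as expected.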