About DataStax Bulk Loader
DataStax Bulk Loader® is open-source software (OSS). The latest version is 1.11. It is supported on Linux, macOS, and Windows platforms. You can use DataStax Bulk Loader (dsbulk) to load, unload, and count database records and return related information.
This OSS product efficiently and reliably loads small or large amounts of data, supporting developer and production environments.
With dsbulk commands, you can rapidly load CSV or JSON data into, or unload it from, the following:
-
DataStax Astra DB databases
-
DataStax Enterprise (DSE) 5.1 and 6.8 databases
-
Open source Apache Cassandra® 2.1 and later databases
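As a minimal sketch of the load, unload, and count operations, the commands might look like the following. The keyspace `ks1`, the table `table1`, and the paths under `/tmp` are hypothetical placeholders. The `run` helper is not part of dsbulk; it is an assumption added here so that each command is printed rather than executed unless `DRY_RUN=0` is set.

```shell
# Helper (not part of dsbulk): print the command by default,
# execute it only when DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Load a CSV file that has a header row into ks1.table1.
run dsbulk load -url /tmp/export.csv -k ks1 -t table1 -header true

# Unload ks1.table1 as CSV files into a directory.
run dsbulk unload -k ks1 -t table1 -url /tmp/unload_csv

# Unload the same table as JSON by switching the connector.
run dsbulk unload -c json -k ks1 -t table1 -url /tmp/unload_json

# Count the rows in ks1.table1.
run dsbulk count -k ks1 -t table1
```

Running the script as written only echoes the four commands, which makes it safe to review before pointing it at a real cluster.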
Features in DataStax Bulk Loader
-
Open source for contributions from a community of software developers. See the public DSBulk Loader GitHub repo: https://github.com/datastax/dsbulk.
-
CSV and JSON formats are supported. Optionally, you can load data from, or unload data to, compressed files.
-
Files, directories, stdin/stdout, and web URLs can be used for either source or destination.
-
Performance that is 2-3 times faster than cqlsh COPY, due to multi-threaded operation.
-
Connect to a cloud-native Astra DB database by including the path to the secure connect bundle (SCB). You can download the SCB from the Astra console after creating an Astra DB database.
-
Support in DSBulk Loader 1.11+ for the vector<type, dimension> data type, when used with Astra DB databases created with the Vector Search feature.
-
DataStax Java driver options are available directly with dsbulk commands via the datastax-java-driver prefix.
-
Secure authentication via Kerberos, or via username and password, with SSL options.
-
Configurable data parsing. For instance, date formatting is configurable.
-
Performance and progress reporting.
-
Command line tool for Linux, macOS, and Windows:
-
Can use configuration files to simplify command line calls to dsbulk.
-
Tunable parameters to optimize loading and unloading times.
-
Enhancements allow secure connections for loading and unloading data.
-
In addition to the dsbulk load and dsbulk unload commands, you can use dsbulk count to return information about loaded records in supported database tables.
-
Print basic information about the associated cluster when you request verbose logging on the dsbulk command. Refer to Printing cluster information.
-
Diagnose issues encountered during write operations. Refer to Detection of write failures.
-
Resume a failed operation by using checkpoint files.
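Several of the features listed above can be sketched as command lines. This is an illustration under assumptions, not a reference: `ks1`, `table1`, `my_user`, `my_password`, and every file path are hypothetical, and the `run` helper is not part of dsbulk. It prints each command instead of executing it unless `DRY_RUN=0` is set. Check the option names against the documentation for your dsbulk version before relying on them.

```shell
# Helper (not part of dsbulk): print the command by default,
# execute it only when DRY_RUN=0 is set in the environment.
# The printed form drops shell quoting around arguments.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical path to a secure connect bundle downloaded from the Astra console.
SCB="/path/to/secure-connect-database_name.zip"

# Astra DB: pass the secure connect bundle with -b, plus credentials.
run dsbulk load -url /tmp/export.csv -k ks1 -t table1 \
  -b "$SCB" -u my_user -p my_password

# Compressed input: read a gzip-compressed CSV file.
run dsbulk load -url /tmp/export.csv.gz -k ks1 -t table1 \
  --connector.csv.compression gzip

# Java driver option passed through the datastax-java-driver prefix.
run dsbulk unload -k ks1 -t table1 -url /tmp/unload_csv \
  --datastax-java-driver.basic.request.timeout "5 minutes"

# Configuration file: keep shared options in one file and pass it with -f.
run dsbulk load -f /tmp/dsbulk.conf -url /tmp/export.csv -k ks1 -t table1
```

Moving repeated options such as credentials or driver settings into the file given to `-f` keeps the individual command lines short, which is the simplification the configuration-file feature is meant to provide.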