dsbulk
DataStax Bulk Loader provides the dsbulk command for loading, unloading, and counting data.
The three subcommands, load, unload, and count, are straightforward.
All three require either the keyspace and table options or a schema.query.
The load and unload commands additionally require a designated data source (CSV or JSON).
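For example (the keyspace and table names below are hypothetical), counting needs only the keyspace and table, while loading also needs a data source:

dsbulk count -k ks1 -t table1
dsbulk load -k ks1 -t table1 -url data.csv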
A wide variety of options are also available to help you tailor how DataStax Bulk Loader operates.
These options have defined default values, or values inferred from the input data when loading, or from the database when unloading.
The options described here are grouped functionally, so that additional requirements can be noted.
For example, if loading or unloading CSV data, the connector.csv.url option must be set, specifying the path or URL of the CSV data file used for loading or unloading.
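As a sketch, the short option -url and the long option --connector.csv.url are interchangeable for the default CSV connector; the file and directory names below are hypothetical:

dsbulk load -k ks1 -t table1 --connector.csv.url export.csv
dsbulk unload -k ks1 -t table1 -url /tmp/unload_dir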
The standalone tool is launched using the dsbulk command from within the bin directory of your distribution.
The tool also provides inline help for all settings.
A configuration file specifying option values can be used, or options can be specified on the command line.
Options specified on the command line will override the configuration file option settings.
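A minimal sketch of this precedence, assuming a DSBulk version whose configuration files use dsbulk-prefixed HOCON keys (the file name and values here are hypothetical):

cat > my-load.conf <<'EOF'
dsbulk {
  connector.csv.url = "export.csv"
  schema.keyspace = "ks1"
  schema.table = "table1"
}
EOF
# -f points dsbulk at the file; -k on the command line overrides schema.keyspace from the file
dsbulk load -f my-load.conf -k ks2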
Synopsis

dsbulk ( load | unload | count ) [options]
    ( ( -k | --schema.keyspace ) keyspace_name ( -t | --schema.table ) table_name )
    | ( --schema.query string )
    [ help | --help ]
| Syntax conventions | Description |
|---|---|
| Capitalized words or bold | Variable value to be replaced with a user-defined value. |
| [ ] | Square brackets surround optional command arguments. Don’t type the square brackets. |
| ( ) | Parentheses can identify a group to choose from. Don’t type the parentheses. |
| \| | Pipe separates alternative options within a group or argument. Type any one of the elements. Don’t type the pipe. |
| -- | Separate the command line options from the command arguments with two hyphens. This syntax is useful when arguments can be mistaken for options. |
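For example, the target can be designated either with the keyspace and table pair or with a full query; the names below are hypothetical:

dsbulk load -k ks1 -t table1 -url data.csv
dsbulk unload --schema.query 'SELECT col1, col2 FROM ks1.table1' -url /tmp/unload_dir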
Get help
Get help and information about dsbulk and its options:
dsbulk help
Get help with a specific option:
dsbulk help connector.csv
Get short form options and help:
dsbulk -c csv --help
Get the version
Get your current version of DSBulk:
dsbulk --version
Escape and quote command line arguments
When supplied on the command line, all option values must be in valid HOCON syntax.
Control characters, the backslash character, and the double-quote character must be properly escaped, and string values containing special characters must be double-quoted, as required by the HOCON syntax.
For example, use \t to escape the tab character, and \\ to escape the backslash character:
dsbulk load -delim '\t' -url 'C:\\Users\\My Folder'
The following syntactic sugar is available for string, list, and map arguments passed directly on the command line only. All other types, and all options specified in configuration files, must be fully compliant with HOCON syntax. You must ensure that these values are properly escaped and quoted.
On the command line, when an argument expects a string, you can omit the surrounding double-quotes for convenience. For example, the following two lines are equivalent:
dsbulk load -url '"C:\\Users\\My Folder"'
dsbulk load -url 'C:\\Users\\My Folder'
On the command line, when an argument is a list, you can omit the surrounding square brackets. The following two lines are equivalent:
dsbulk load --codec.nullStrings 'NIL, NULL'
dsbulk load --codec.nullStrings '[NIL, NULL]'
On the command line, when an argument is a map, you can omit the surrounding curly braces. The following two lines are equivalent:
dsbulk load --connector.json.deserializationFeatures '{ USE_BIG_DECIMAL_FOR_FLOATS : true }'
dsbulk load --connector.json.deserializationFeatures 'USE_BIG_DECIMAL_FOR_FLOATS : true'
Detect write failures
In the Apache Cassandra documentation, you may have encountered one or more of the following terms, all of which have the same meaning:
- Lightweight Transactions (LWT), the term used in this topic
- Compare-And-Set (CAS)
- Paxos protocol

DataStax Bulk Loader detects any failures due to failed LWT write operations. In version 1.3.2 or later, records that could not be inserted are shown in two files:

- paxos.bad is the "bad file" devoted to LWT write failures.
- paxos-errors.log is the debug file devoted to LWT write failures.
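For instance, a load whose query is a conditional insert performs LWT writes; rows rejected by the IF NOT EXISTS condition would land in these files (the keyspace, table, and column names below are hypothetical):

dsbulk load -url data.csv --schema.query 'INSERT INTO ks1.table1 (pk, v) VALUES (:pk, :v) IF NOT EXISTS'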
DataStax Bulk Loader also writes any failed records to one of the following files in the operation’s directory, depending on when the failure occurred. If the failure occurred while:

- parsing data, the records are written to connector.bad.
- mapping data to the supported DSE, Astra DB, and Cassandra databases, the records are written to mapping.bad.
- inserting data into any of those supported databases, the records are written to load.bad.

The operation’s directory is the logs subdirectory under the location from which you ran the dsbulk command.
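As a sketch, after a load you might inspect these files like so; the timestamped directory name below is hypothetical, since DSBulk generates one per operation:

ls logs/
cat logs/LOAD_20240101-120000-000000/mapping.bad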
Exit codes
The dsbulk command returns an exit code to the calling process.
The following table links each integer value with its status:

| Integer value | Status value |
|---|---|
| 0 | STATUS_OK |
| 1 | STATUS_COMPLETED_WITH_ERRORS |
| 2 | STATUS_ABORTED_TOO_MANY_ERRORS |
| 3 | STATUS_ABORTED_FATAL_ERROR |
| 4 | STATUS_INTERRUPTED |
| 5 | STATUS_CRASHED |
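A calling script can branch on these codes. A minimal sketch in shell (the keyspace, table, and file names are hypothetical):

dsbulk load -k ks1 -t table1 -url data.csv
case $? in
  0) echo "load succeeded" ;;
  1) echo "load completed, but some records were rejected" ;;
  *) echo "load aborted, was interrupted, or crashed" ;;
esac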
Options
For dsbulk options, see the following: