dsbulk
DataStax Bulk Loader provides the dsbulk command for loading, unloading, and counting data to or from:

- Hyper-Converged Database (HCD) 1.0 databases
- DataStax Enterprise (DSE) 5.1, 6.8, and 6.9 databases
- Open source Apache Cassandra® 2.1 and later databases
The three subcommands, load, unload, and count, are straightforward. Each subcommand requires either the keyspace and table options, or a schema.query. The load and unload subcommands also require a designated data source (CSV or JSON).
A wide variety of options are also available to help you tailor how DataStax Bulk Loader operates. These options have default values, or values inferred from the input data (when loading) or from the database data (when unloading). The options described here are grouped functionally, so that additional requirements can be noted. For example, when loading or unloading CSV data, the connector.csv.url option must be set to the path or URL of the CSV data file.
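As a sketch of how these required pieces fit together, a minimal CSV load and the corresponding unload might look like the following; the keyspace, table, and file paths are placeholders:

```shell
# Load rows from a local CSV file into ks1.table1.
# -url is the short form of connector.csv.url; ks1, table1,
# and export.csv are placeholder names for this example.
dsbulk load -url export.csv -k ks1 -t table1

# The reverse operation: unload the same table to CSV files
# in the /tmp/export directory.
dsbulk unload -url /tmp/export -k ks1 -t table1
```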
The standalone tool is launched with the dsbulk command from the bin directory of your distribution. The tool also provides inline help for all settings. You can specify option values in a configuration file or on the command line; options specified on the command line override the configuration file settings.
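For instance, assuming a HOCON settings file named my-settings.conf (the file name and option values here are illustrative), a command-line option can override a value from the file:

```shell
# my-settings.conf (HOCON) might contain:
#   dsbulk {
#     connector.csv.url = "export.csv"
#     schema.keyspace = "ks1"
#     schema.table = "table1"
#   }

# -f loads the settings file; the -delim value given on the
# command line overrides any delimiter set in the file.
dsbulk load -f my-settings.conf -delim '|'
```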
Synopsis
dsbulk ( load | unload | count ) [options]
    ( ( -k | --keyspace ) keyspace_name ( -t | --table ) table_name )
    | ( --schema.query string )
    [ help | --help ]
| Syntax conventions | Description |
|---|---|
| *Italics* | Variable value. Replace with a user-defined value. |
| `[ ]` | Optional. Square brackets surround optional command arguments. Do not type the square brackets. |
| `( )` | Group. Parentheses identify a group to choose from. Do not type the parentheses. |
| `\|` | Or. A vertical bar separates alternative elements. Type any one of the elements. Do not type the vertical bar. |
| `--` | Separate the command line options from the command arguments with two hyphens. |
General use
Get general help about dsbulk and the common options:
dsbulk help
Get help about particular dsbulk options, such as the connector.csv options, using the help subcommand:
dsbulk help connector.csv
Run dsbulk -c csv with the --help option to see its short options, along with the general help:
dsbulk -c csv --help
Display the version number:
dsbulk --version
Escaping and Quoting Command Line Arguments
When supplied via the command line, all option values are expected to be in valid HOCON syntax. In particular, control characters, the backslash character, and the double-quote character all need to be properly escaped. For example, \t is the escape sequence for the tab character, whereas \\ is the escape sequence for the backslash character:
dsbulk load -delim '\t' -url 'C:\\Users\\My Folder'
In general, string values containing special characters also need to be properly quoted with double-quotes, as required by the HOCON syntax:
dsbulk load -url '"C:\\Users\\My Folder"'
However, when the expected type of an option is a string, the surrounding double-quotes can be omitted for convenience; note their absence in the -delim example above. Similarly, when an argument is a list, the surrounding square brackets can be omitted, making the following two lines equivalent:
dsbulk load --codec.nullStrings 'NIL, NULL'
dsbulk load --codec.nullStrings '[NIL, NULL]'
The same applies for arguments of type map: it is possible to omit the surrounding curly braces, making the following two lines equivalent:
dsbulk load --connector.json.deserializationFeatures '{ USE_BIG_DECIMAL_FOR_FLOATS : true }'
dsbulk load --connector.json.deserializationFeatures 'USE_BIG_DECIMAL_FOR_FLOATS : true'
This syntactic sugar is only available for command line arguments of type string, list, or map. All other option types, as well as all options specified in a configuration file, must be fully compliant with HOCON syntax, and it is the user’s responsibility to ensure that such options are properly escaped and quoted.
Detection of write failures
In the Cassandra documentation, you may have encountered one or more of the following terms, all of which have the same meaning:

- Lightweight Transactions (LWT), used in this topic
- Compare-And-Set (CAS)
- Paxos protocol
DataStax Bulk Loader detects any failures due to failed LWT write operations. In version 1.3.2 and later, records that could not be inserted are recorded in two files:

- paxos.bad is the "bad file" devoted to LWT write failures.
- paxos-errors.log is the debug file devoted to LWT write failures.
DataStax Bulk Loader also writes any failed records to one of the following files in the operation’s directory, depending on when the failure occurred. If the failure occurred while:

- parsing data, the records are written to connector.bad.
- mapping data to the supported DSE, DataStax Astra, or Apache Cassandra databases, the records are written to mapping.bad.
- inserting data into any of those supported databases, the records are written to load.bad.
The operation’s directory is the logs subdirectory under the location from which you ran the dsbulk command.
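To locate failed records after a run, you can list the operation’s subdirectory under logs. The directory name below is purely illustrative, since dsbulk generates a unique name per operation:

```shell
# Each run creates its own subdirectory under logs/;
# the name shown here is hypothetical.
ls logs/LOAD_20240101-120000-000000
# Typical contents: operation.log plus, when failures occurred,
# connector.bad, mapping.bad, or load.bad.
```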