dsbulk
Bulk loading and unloading tool for DataStax Enterprise (DSE), DataStax Astra, and Apache Cassandra databases.
Use the dsbulk command for loading, unloading, and counting data to or from:
- DataStax Astra cloud databases
- DataStax Enterprise (DSE) 4.7 and later databases
- Open source Apache Cassandra® 2.1 and later databases
The load, unload, and count subcommands are straightforward. Each subcommand requires either the keyspace and table options or a schema.query. The load and unload commands also require a designated data source (CSV or JSON).

A wide variety of options are also available to help you tailor how DataStax Bulk Loader operates. These options have defined default values or values inferred from the input data, if the operation is loading, or from the database data, if the operation is unloading. The options described here are grouped functionally, so that additional requirements can be noted. For example, if loading or unloading CSV data, the connector.csv.url option must be set, specifying the path or URL of the CSV data file used for loading or unloading.
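As a minimal sketch of the three subcommands, the script below builds and prints one invocation of each instead of running them. The keyspace `ks1`, the table `table1`, and the paths under `/tmp` are placeholders chosen for illustration, not values from this page.

```shell
# Placeholder values (assumptions for illustration only).
KEYSPACE=ks1
TABLE=table1
CSV_FILE=/tmp/export.csv      # input file for load
OUT_DIR=/tmp/unload_out       # output location for unload

# Each subcommand needs keyspace and table (or a schema.query instead).
LOAD_CMD="dsbulk load -k $KEYSPACE -t $TABLE --connector.csv.url $CSV_FILE"
UNLOAD_CMD="dsbulk unload -k $KEYSPACE -t $TABLE --connector.csv.url $OUT_DIR"
COUNT_CMD="dsbulk count -k $KEYSPACE -t $TABLE"

# Print the commands for review before running any of them.
printf '%s\n' "$LOAD_CMD" "$UNLOAD_CMD" "$COUNT_CMD"
```

Printing first makes it easy to check the keyspace, table, and data source before anything touches the database. Note that count needs no data source, in line with the requirement stated above for load and unload only.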
The standalone tool is launched using the dsbulk command from within the bin directory of your distribution. The tool also provides inline help for all settings. Option values can be specified in a configuration file or on the command line. Options specified on the command line override the configuration file settings.
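The following sketch illustrates the precedence rule. It writes a small configuration file and prints a command that reads it while overriding one setting. It assumes that settings are nested under a top-level `dsbulk` block and that the file is passed with `-f`; the file name, keyspace, tables, and path are placeholders.

```shell
# Write a small HOCON configuration file (all values are placeholders).
CONF_FILE="${TMPDIR:-/tmp}/dsbulk-example.conf"
cat > "$CONF_FILE" <<'EOF'
dsbulk {
  connector.name = csv
  connector.csv.url = "/tmp/export.csv"
  schema.keyspace = ks1
  schema.table = table1
}
EOF

# -f points dsbulk at the file. The -t given on the command line is meant to
# take precedence over schema.table from the file.
CMD="dsbulk load -f $CONF_FILE -t table2"
printf '%s\n' "$CMD"
```

With this command, the keyspace and data source would come from the file, while the table would be `table2` rather than `table1`.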
Synopsis
dsbulk ( load | unload | count ) [options] (( -k | --keyspace ) keyspace_name ( -t | --table ) table_name) | ( --schema.query string ) [ help | --help ]
Syntax conventions | Description |
---|---|
Italics | Variable value. Replace with a user-defined value. |
[ ] | Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the square brackets. |
( ) | Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses. |
\| | Or. A vertical bar ( \| ) separates alternative elements. Type any one of the elements. Do not type the vertical bar. |
[ -- ] | Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options. |
General use
Get general help about dsbulk and the common options:
dsbulk help
Get help about particular dsbulk options, such as the connector.csv options, using the help subcommand:
dsbulk help connector.csv
Run dsbulk -c csv with the --help option to see its short options, along with the general help:
dsbulk -c csv --help
Display the version number:
dsbulk --version
Escaping and Quoting Command Line Arguments
Option values supplied on the command line are interpreted as HOCON, so control characters and the backslash character must be escaped. For example, \t is the escape sequence that corresponds to the tab character, whereas \\ is the escape sequence for the backslash character:
dsbulk load -delim '\t' -url 'C:\\Users\\My Folder'
Written as a fully quoted HOCON string, the same path looks like this:
dsbulk load -url '"C:\\Users\\My Folder"'
However, when the expected type of an option is a string, the surrounding double quotes can be omitted for convenience; note their absence in the first example. Similarly, when an argument is a list, the surrounding square brackets can be omitted, making the following two lines equivalent:
dsbulk load --codec.nullStrings 'NIL, NULL'
dsbulk load --codec.nullStrings '[NIL, NULL]'
The same applies to arguments of type map: the surrounding curly braces can be omitted, making the following two lines equivalent:
dsbulk load --connector.json.deserializationFeatures '{ USE_BIG_DECIMAL_FOR_FLOATS : true }'
dsbulk load --connector.json.deserializationFeatures 'USE_BIG_DECIMAL_FOR_FLOATS : true'
This syntactic sugar is only available for command line arguments of type string, list, or map. All other option types, as well as all options specified in a configuration file, must be fully compliant with HOCON syntax, and it is the user's responsibility to ensure that such options are properly escaped and quoted.
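To see exactly what dsbulk receives from the shell, the sketch below stores the argument values from the examples above in single-quoted shell variables and prints them verbatim. The variable names are illustrative only; nothing here invokes dsbulk.

```shell
# Single quotes stop the shell from touching backslashes or double quotes,
# so the HOCON text is passed on exactly as typed.
DELIM='\t'                          # two characters: backslash, t
URL_SUGAR='C:\\Users\\My Folder'    # string sugar: no surrounding double quotes
URL_HOCON='"C:\\Users\\My Folder"'  # fully quoted HOCON string
NULLS_SUGAR='NIL, NULL'             # list sugar: no square brackets
NULLS_HOCON='[NIL, NULL]'           # explicit HOCON list

# printf with %s prints each value verbatim, one per line.
printf '%s\n' "$DELIM" "$URL_SUGAR" "$URL_HOCON" "$NULLS_SUGAR" "$NULLS_HOCON"
```

The shell passes `\t` through as two literal characters; turning it into a tab is left to the HOCON parsing described above. Using double quotes in the shell instead of single quotes would change how backslashes are handled, which is why the examples in this section use single quotes.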
Detection of write failures
The following terms all refer to the same feature:
- Lightweight Transactions (LWT), the term used in this topic
- Compare-And-Set (CAS)
- Paxos protocol
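DataStax Bulk Loader reports records that could not be written because of an LWT write failure in two files:
- paxos.bad is the "bad file" devoted to LWT write failures.
- paxos-errors.log is the debug file devoted to LWT write failures.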
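Other failed records are written to one of the following files, depending on when the failure occurred:
- If the failure occurred while parsing data, the records are written to connector.bad.
- If the failure occurred while mapping data to the supported DSE, DataStax Astra, or Apache Cassandra databases, the records are written to mapping.bad.
- If the failure occurred while inserting data into any of those supported databases, the records are written to load.bad.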
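These files are written to the operation's directory, which by default is located under the logs subdirectory of the location where you ran the dsbulk command.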
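After an operation finishes, a quick way to check for failures is to look for the bad files named above. The helper below is a sketch: the function name `report_bad_files` is hypothetical, and the operation directory must be supplied by the caller.

```shell
# report_bad_files DIR: list which failure files exist in an operation
# directory and how many lines each one holds.
report_bad_files() {
  dir=$1
  for name in paxos.bad connector.bad mapping.bad load.bad; do
    if [ -f "$dir/$name" ]; then
      # tr strips the padding that some wc implementations add.
      lines=$(wc -l < "$dir/$name" | tr -d ' ')
      printf '%s: %s line(s)\n' "$name" "$lines"
    fi
  done
}

# Example call; replace the argument with a real operation directory.
# report_bad_files logs/OPERATION_DIRECTORY
```

The function prints nothing when none of the files exist, which is the expected outcome of a clean run. Line counts are only a rough indicator, since a single record can span more than one line.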