DSBulk synopsis and usage
DataStax Bulk Loader provides the dsbulk command and the subcommands load, unload, count, and help.
For example, the following command loads data from a CSV file into a table in a self-managed cluster, such as DataStax Enterprise (DSE), that requires authentication with a username and password:
dsbulk load \
-u username -p password -h '10.200.1.3, 10.200.1.4' -port 9876 \
-k ks1 -t table1 \
-url filename.csv
dsbulk is launched from the bin directory of your DSBulk installation.
If your machine cannot automatically detect the path to the dsbulk executable, you might need to provide the path on the command line.
For more information, see Post-install requirements and recommendations.
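As a minimal sketch, assuming DSBulk was unpacked to a hypothetical directory such as $HOME/dsbulk (DSBULK_HOME is an illustrative variable name, not one defined on this page), you could add the bin directory to PATH for the current shell session and then check that the executable resolves:

```shell
# Hypothetical install location; change this to wherever you unpacked DSBulk.
DSBULK_HOME="${DSBULK_HOME:-$HOME/dsbulk}"

# Put the bin directory first on PATH for this shell session only.
export PATH="$DSBULK_HOME/bin:$PATH"

# Check whether the shell can now resolve the executable.
if command -v dsbulk >/dev/null 2>&1; then
  dsbulk --version
else
  echo "dsbulk not found; check that $DSBULK_HOME/bin exists" >&2
fi
```

Alternatively, you can skip the PATH change and call the executable by its full path, for example "$DSBULK_HOME/bin/dsbulk".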
|
To test a |
Synopsis
dsbulk SUBCOMMAND OPTIONS
Replace SUBCOMMAND with the operation to perform: load, unload, count, or help.
Replace OPTIONS with any of the DSBulk options, including general options and subcommand-specific options.
Many options have defined or inferred default values.
Some options are required depending on the subcommand and cluster configuration:
- Connection and authentication options: Clusters with authentication enabled require driver authentication and connection options. DataStax recommends using a configuration file for sensitive values, such as passwords.
- Schema options: load, unload, and count subcommands always require that you specify the target schema for the operation. There are multiple ways to specify the schema. For all options, see Schema options.
  - Target by keyspace and table name: To target a keyspace and table by name, use the -k (--schema.keyspace) and -t (--schema.table) options. This is best when you want to target the entire table, such as when loading data.
  - Target by CQL statement: To select data using a CQL statement, use the -query (--schema.query) option. This is best when you want to target a subset of the data in a table, such as when unloading or counting data.
    If you specify -k and -query, you can omit the keyspace name from the CQL statement in -query.
  - Target graph data: The -g (--schema.graph), -v (--schema.vertex), and -e (--schema.edge) options can be used to target graph data.
- Data source or destination: For the load and unload subcommands, you must specify a data source or destination. When loading, this is the source of the data you want to load into the table. When unloading, this is the destination where you want to write the unloaded data.
  For options and examples, see the following:
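As a rough illustration of the schema and data location options above, the following sketch prints three commands rather than running them. The run helper, the DSBULK_RUN variable, and the ./unload_out directory are hypothetical names introduced for this example only. The keyspace, table, and column names are reused from the other examples on this page. Connection and authentication options are omitted for brevity.

```shell
# Hypothetical helper: prints the command by default.
# Set DSBULK_RUN=1 to execute it against your cluster instead.
run() {
  if [ "${DSBULK_RUN:-0}" = 1 ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Target an entire table by keyspace and table name; -url is the data source.
run dsbulk load -k ks1 -t table1 -url filename.csv

# Target a subset with a CQL statement; because -k is given,
# the query can omit the keyspace name. Here -url is the destination.
run dsbulk unload -k ks1 \
  -query "SELECT id, row1, row2 FROM table1 WHERE row1='some-value'" \
  -url ./unload_out

# count needs a target schema but no data source or destination.
run dsbulk count -k ks1 -t table1
```

The printed form drops the shell quoting, so treat the output as a preview of the arguments, not as text to paste back into a terminal.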
Get help
Get help and information about dsbulk and its options:
dsbulk help
Get help with a specific option:
dsbulk help connector.csv
Get short form options and help:
dsbulk -c csv --help
Get the version
Get your current version of DSBulk:
dsbulk --version
Escape and quote command line arguments
When supplied on the command line, all option values must be in valid HOCON syntax:
- Control characters, backslashes, and double-quotes must be properly escaped.
  For example, use \t to escape the tab character, and use \\ to escape the backslash character:
  dsbulk load -delim '\t' -url 'C:\\Users\\My Folder'
- Strings containing special characters must be double-quoted.
  For example, if your file path contains spaces, you must quote the entire string, and escape the backslashes in the path:
  dsbulk load -url "C:\\Users\\My Folder\\filename.csv"
  The following schema options accept string-like values that must not be quoted: --schema.keyspace, --schema.table, --schema.graph, --schema.vertex, --schema.edge, --schema.from, and --schema.to.
- When passing CQL statements in the -query option, double-quotes must be escaped.
  For example, if your CQL statement contains a mixed-case table name, you must quote and escape the table name:
  dsbulk unload -query "SELECT id, row1, row2 FROM ks1.\\\"tableMixedCase\\\" WHERE row1='some-value'"
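Keep in mind that your shell applies its own quoting rules before DSBulk parses the value as HOCON. The following sketch assumes a POSIX-compatible shell and uses a hypothetical helper, show_args, that prints each argument exactly as the shell would hand it to a command. It can help you check what a quoted value looks like after the shell has processed it. Other shells, such as the Windows command prompt, treat backslashes and quotes differently.

```shell
# Hypothetical helper: print each argument on its own line, in brackets,
# exactly as the shell passes it to a command.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Single quotes pass backslashes through unchanged.
show_args -delim '\t'
# [-delim]
# [\t]
show_args -url 'C:\\Users\\My Folder'
# [-url]
# [C:\\Users\\My Folder]

# Inside double quotes, a POSIX shell turns \\ into \ and \" into ".
show_args -url "C:\\Users\\My Folder"
# [-url]
# [C:\Users\My Folder]
show_args -query "SELECT id FROM ks1.\\\"tableMixedCase\\\""
# [-query]
# [SELECT id FROM ks1.\"tableMixedCase\"]
```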
Syntactic sugar for string, list, and map arguments
The following syntactic sugar is available for string, list, and map arguments passed directly on the command line only. All other types, and all options specified in configuration files, must be fully compliant with HOCON syntax. You must ensure that these values are properly escaped and quoted.
On the command line, when an argument expects a string, you can omit the surrounding double-quotes for convenience. For example, the following two lines are equivalent:
dsbulk load -url '"C:\\Users\\My Folder"'
dsbulk load -url 'C:\\Users\\My Folder'
On the command line, when an argument is a list, you can omit the surrounding square brackets. The following two lines are equivalent:
dsbulk load --codec.nullStrings 'NIL, NULL'
dsbulk load --codec.nullStrings '[NIL, NULL]'
On the command line, when an argument is a map, you can omit the surrounding curly braces. The following two lines are equivalent:
dsbulk load --connector.json.deserializationFeatures '{ USE_BIG_DECIMAL_FOR_FLOATS : true }'
dsbulk load --connector.json.deserializationFeatures 'USE_BIG_DECIMAL_FOR_FLOATS : true'
Exit codes
The dsbulk command can return the following exit codes to indicate the status of an operation:
- 0: STATUS_OK
- 1: STATUS_COMPLETED_WITH_ERRORS
- 2: STATUS_ABORTED_TOO_MANY_ERRORS
- 3: STATUS_ABORTED_FATAL_ERROR
- 4: STATUS_INTERRUPTED
- 5: STATUS_CRASHED
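In a script, you can branch on these codes after an operation finishes. The following sketch defines a hypothetical helper function, dsbulk_status_name, that maps an exit code to the status names listed above. The function name and the UNKNOWN fallback are illustrative and not part of DSBulk.

```shell
# Hypothetical helper: translate a dsbulk exit code into its status name.
dsbulk_status_name() {
  case "$1" in
    0) echo STATUS_OK ;;
    1) echo STATUS_COMPLETED_WITH_ERRORS ;;
    2) echo STATUS_ABORTED_TOO_MANY_ERRORS ;;
    3) echo STATUS_ABORTED_FATAL_ERROR ;;
    4) echo STATUS_INTERRUPTED ;;
    5) echo STATUS_CRASHED ;;
    *) echo UNKNOWN ;;  # Fallback for codes not listed on this page
  esac
}

# Example usage after an operation (commented out so the sketch runs
# without a cluster):
# dsbulk load -k ks1 -t table1 -url filename.csv
# echo "dsbulk finished with $(dsbulk_status_name "$?")"
```

A wrapper script might, for instance, treat code 1 as a warning to review and treat codes 2 through 5 as failures.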