DSBulk synopsis and usage

DataStax Bulk Loader provides the dsbulk command and the subcommands load, unload, count, and help.

For example, the following command loads data from a CSV file into a table in a self-managed cluster, such as DataStax Enterprise (DSE), that requires authentication with a username and password:

```
dsbulk load \
  -u username -p password -h '10.200.1.3, 10.200.1.4' -port 9876 \
  -k ks1 -t table1 \
  -url filename.csv
```

Launch dsbulk from the bin directory of your DSBulk installation. If your shell cannot find the dsbulk executable, for example because the bin directory is not on your PATH, provide the path to the executable on the command line. For more information, see Post-install requirements and recommendations.
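As a minimal sketch of making the executable resolvable, the following adds the bin directory to PATH for the current shell session. The DSBULK_DIR variable and its default location are hypothetical placeholders, not names that DSBulk defines.

```shell
# Hypothetical install location; replace with the directory where you unpacked DSBulk.
DSBULK_DIR="${DSBULK_DIR:-$HOME/dsbulk}"

# Prepend the bin directory so the shell can resolve the dsbulk executable.
PATH="$DSBULK_DIR/bin:$PATH"
export PATH
```

Alternatively, call the executable by its path, for example "$DSBULK_DIR/bin/dsbulk" --version.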

To test a dsbulk load operation without writing the data to your database, use the --dryRun option.

Synopsis

```
dsbulk SUBCOMMAND OPTIONS
```

Replace SUBCOMMAND with the operation to perform:

load
unload
count
help

Replace OPTIONS with any of the DSBulk options, including general options and subcommand-specific options. Many options have defined or inferred default values. Some options are required depending on the subcommand and cluster configuration:

- Connection and authentication options: Clusters with authentication enabled require driver authentication and connection options. DataStax recommends using a configuration file for sensitive values, such as passwords.
- Schema options: load, unload, and count subcommands always require that you specify the target schema for the operation. There are multiple ways to specify the schema. For all options, see Schema options.
  - Target by keyspace and table name: To target a keyspace and table by name, use the -k (--schema.keyspace) and -t (--schema.table) options. This is best when you want to target the entire table, such as when loading data.
  - Target by CQL statement: To select data using a Cassandra Query Language (CQL) statement, use the -query (--schema.query) option. This is best when you want to target a subset of the data in a table, such as when unloading or counting data.

    If you specify -k and -query, you can omit the keyspace name from the CQL statement in -query.
  - Target graph data: The -g (--schema.graph), -v (--schema.vertex), and -e (--schema.edge) options can be used to target graph data.
- Data source or destination: For the load and unload subcommands, you must specify a data source or destination. When loading, this is the source of the data you want to load into the table. When unloading, this is the destination where you want to write the unloaded data.
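The schema targeting choices above can be sketched as three small shell functions. The function names are hypothetical, and ks1, table1, row1, and the path arguments are placeholders. Only options named on this page are used.

```shell
# Target a whole table by keyspace and table name when loading.
# $1 is the data source, such as a CSV file.
load_whole_table() {
  dsbulk load -k ks1 -t table1 -url "$1"
}

# Target a subset of rows with a CQL statement when unloading.
# Because -k is set, the statement omits the keyspace name.
# $1 is the destination for the unloaded data.
unload_subset() {
  dsbulk unload -k ks1 \
    -query "SELECT id, row1 FROM table1 WHERE row1 = 'some-value'" \
    -url "$1"
}

# Count rows in a table. The count subcommand needs a schema,
# but no data source or destination.
count_table() {
  dsbulk count -k ks1 -t table1
}
```

For example, load_whole_table filename.csv runs the load against ks1.table1.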

For options and examples, see the following:

Get help

Get help and information about dsbulk and its options:

```
dsbulk help
```

Get help with a specific group of options, such as the connector.csv options:

```
dsbulk help connector.csv
```

Get short form options and help:

```
dsbulk -c csv --help
```

Get the version

Get your current version of DSBulk:

```
dsbulk --version
```

Escape and quote command line arguments

When supplied on the command line, all option values must be in valid HOCON syntax:

Control characters, backslashes, and double-quotes must be properly escaped.

For example, use \t to escape the tab character, and use \\ to escape the backslash character:
```
dsbulk load -delim '\t' -url 'C:\\Users\\My Folder'
```

Strings containing special characters must be double-quoted.

For example, if your file path contains spaces, you must quote the entire string, and escape the backslashes in the path:

```
dsbulk load -url "C:\\Users\\My Folder\\filename.csv"
```
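As a minimal sketch of the escaping rules above, the following hypothetical helper (hocon_quote is not part of DSBulk) escapes backslashes and double-quotes in a raw string and wraps the result in double-quotes. It does not handle control characters such as tabs or newlines, so escape those separately.

```shell
# Escape backslashes first, then double-quotes, and wrap the result
# in double-quotes so that it forms a quoted HOCON string.
hocon_quote() {
  printf '"%s"\n' "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

# Single quotes keep the shell from touching the backslashes in the raw path.
hocon_quote 'C:\Users\My Folder\filename.csv'
# prints "C:\\Users\\My Folder\\filename.csv" (including the double-quotes)
```

You could then pass the result to an option, for example -url "$(hocon_quote 'C:\Users\My Folder\filename.csv')".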

The following schema options accept string-like values that must not be quoted: --schema.keyspace, --schema.table, --schema.graph, --schema.vertex, --schema.edge, --schema.from, and --schema.to.

When passing CQL statements in the -query option, double-quotes must be escaped.

For example, if your CQL statement contains a mixed-case table name, you must quote and escape the table name:
```
dsbulk unload -query \
  "SELECT id, row1, row2
    FROM ks1.\\\"tableMixedCase\\\"
    WHERE row1='some-value'"
```
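The three backslashes in that example come from two layers of unescaping. The following sketch, which runs in plain shell without calling dsbulk, shows the first layer: inside shell double-quotes, \\ becomes \ and \" becomes ", so the value handed to dsbulk contains \", which dsbulk is then expected to read as an escaped double-quote. The shortened query is illustrative only.

```shell
# What you type inside shell double-quotes (shortened, illustrative query).
query="SELECT id FROM ks1.\\\"tableMixedCase\\\""

# What the shell hands to the program after its own unescaping.
printf '%s\n' "$query"
# prints: SELECT id FROM ks1.\"tableMixedCase\"
```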

Syntactic sugar for string, list, and map arguments

The following syntactic sugar is available for string, list, and map arguments passed directly on the command line only. All other types, and all options specified in configuration files, must be fully compliant with HOCON syntax. You must ensure that these values are properly escaped and quoted.

On the command line, when an argument expects a string, you can omit the surrounding double-quotes for convenience. For example, the following two lines are equivalent:

```
dsbulk load -url '"C:\\Users\\My Folder"'
dsbulk load -url 'C:\\Users\\My Folder'
```

On the command line, when an argument is a list, you can omit the surrounding square brackets. The following two lines are equivalent:

```
dsbulk load --codec.nullStrings 'NIL, NULL'
dsbulk load --codec.nullStrings '[NIL, NULL]'
```

On the command line, when an argument is a map, you can omit the surrounding curly braces. The following two lines are equivalent:

```
dsbulk load --connector.json.deserializationFeatures '{ USE_BIG_DECIMAL_FOR_FLOATS : true }'
dsbulk load --connector.json.deserializationFeatures 'USE_BIG_DECIMAL_FOR_FLOATS : true'
```

Exit codes

The dsbulk command can return the following exit codes to indicate the status of an operation:

0: STATUS_OK
1: STATUS_COMPLETED_WITH_ERRORS
2: STATUS_ABORTED_TOO_MANY_ERRORS
3: STATUS_ABORTED_FATAL_ERROR
4: STATUS_INTERRUPTED
5: STATUS_CRASHED
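In a script, you might translate these codes into their status names before deciding how to proceed. The following is a minimal sketch. The dsbulk_status_name and run_dsbulk function names are hypothetical, and the mapping simply mirrors the list above.

```shell
# Translate a dsbulk exit code into the status name listed above.
dsbulk_status_name() {
  case "$1" in
    0) echo STATUS_OK ;;
    1) echo STATUS_COMPLETED_WITH_ERRORS ;;
    2) echo STATUS_ABORTED_TOO_MANY_ERRORS ;;
    3) echo STATUS_ABORTED_FATAL_ERROR ;;
    4) echo STATUS_INTERRUPTED ;;
    5) echo STATUS_CRASHED ;;
    *) echo "UNKNOWN_EXIT_CODE_$1" ;;
  esac
}

# Run any dsbulk command, report its status on stderr,
# and return the original exit code to the caller.
run_dsbulk() {
  dsbulk "$@"
  code=$?
  echo "dsbulk finished with $(dsbulk_status_name "$code")" >&2
  return "$code"
}
```

For example, run_dsbulk load -k ks1 -t table1 -url filename.csv || echo "load did not complete cleanly" reacts to any nonzero status.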
