Loading and unloading data with DataStax Bulk Loader

Use dsbulk to load and unload data in your DataStax Astra database.

Use DataStax Bulk Loader (dsbulk) to efficiently and reliably load CSV or JSON data into your DataStax Astra database and unload data from it.

You can use dsbulk as a standalone tool that connects to the database remotely. It does not need to run on a cluster node, although it can.

Tip: The dsbulk command examples often show a parameter such as -url filename.csv or -url filename.json. You can also load data from, or unload data to, compressed CSV or JSON files. For details, refer to the --connector.(csv|json).compression option.
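As a minimal sketch of loading a compressed file, the script below writes a small CSV file, compresses it with gzip, and then passes it to dsbulk with the CSV connector's compression option. It assumes that gzip is one of the values accepted by --connector.csv.compression (check the option reference for your dsbulk version); the column names and credentials are placeholders. dsbulk only runs if it is on PATH.

```shell
# Write a tiny CSV file with a header row (hypothetical columns id,name).
printf 'id,name\n1,alice\n2,bob\n' > export.csv

# Compress it; -c writes to stdout, so the original file is kept.
gzip -c export.csv > export.csv.gz

# Run the load only when dsbulk is installed; otherwise just report it.
# Assumption: "gzip" is an accepted value for --connector.csv.compression.
if command -v dsbulk >/dev/null 2>&1; then
  dsbulk load -url export.csv.gz --connector.csv.compression gzip \
    -k ks1 -t table1 \
    -b "path/to/secure-connect-database_name.zip" \
    -u database_user -p database_password -header true
else
  echo "dsbulk not found on PATH; skipping load" >&2
fi
```

For JSON files, the equivalent setting is --connector.json.compression, used together with the JSON connector.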

Prerequisites

  1. Download dsbulk.
  2. Unpack the distribution on your machine:
    tar -xzvf dsbulk-1.4.1.tar.gz
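After unpacking, it can help to confirm that the launcher is present before adding it to PATH. This is a sketch that assumes the archive unpacks into ./dsbulk-1.4.1 with the launcher at bin/dsbulk; the variable dsbulk_dir and the function check_dsbulk_home are hypothetical names used only for this example.

```shell
# Assumed location of the unpacked distribution (override dsbulk_dir if needed).
dsbulk_dir="${dsbulk_dir:-$PWD/dsbulk-1.4.1}"

# Hypothetical helper: succeed only if $1/bin/dsbulk exists and is executable.
check_dsbulk_home() {
  if [ -x "$1/bin/dsbulk" ]; then
    return 0
  fi
  echo "no executable bin/dsbulk under $1" >&2
  return 1
}

# Put the launcher on PATH for the current shell session only.
if check_dsbulk_home "$dsbulk_dir"; then
  PATH="$dsbulk_dir/bin:$PATH"
  export PATH
fi
```

To keep dsbulk on PATH across sessions, add the same PATH line to your shell profile.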

Procedure

Connect dsbulk to your Astra database by providing the path to the secure connect bundle and the username and password that you entered when creating the database.

Use the -b option to specify the location of the secure connect bundle. The specified location must be a path on the local filesystem or a valid URL.
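The following sketch checks the connection settings before dsbulk is called. It keeps them in environment variables and verifies that the -b value is either a readable local file or a URL. The variable names ASTRA_BUNDLE, ASTRA_USER, and ASTRA_PASSWORD and the function bundle_location_ok are hypothetical choices for this example; dsbulk itself does not read these variables.

```shell
# Hypothetical helper: accept a URL, or a path to a readable regular file.
bundle_location_ok() {
  case "$1" in
    http://*|https://*|file:*)
      return 0 ;;                    # URL: dsbulk fetches it itself
    *)
      [ -f "$1" ] && [ -r "$1" ] ;;  # local path: must be a readable file
  esac
}

# ASTRA_BUNDLE, ASTRA_USER and ASTRA_PASSWORD are names chosen for this sketch.
if [ -n "${ASTRA_BUNDLE:-}" ] && [ -n "${ASTRA_USER:-}" ] && [ -n "${ASTRA_PASSWORD:-}" ]; then
  if bundle_location_ok "$ASTRA_BUNDLE"; then
    # Mask the password when echoing the command.
    echo "ready: dsbulk ... -b \"$ASTRA_BUNDLE\" -u \"$ASTRA_USER\" -p ****"
  else
    echo "secure connect bundle not found: $ASTRA_BUNDLE" >&2
  fi
else
  echo "set ASTRA_BUNDLE, ASTRA_USER and ASTRA_PASSWORD first" >&2
fi
```

In the examples that follow, the literal values could then be replaced with -b "$ASTRA_BUNDLE" -u "$ASTRA_USER" -p "$ASTRA_PASSWORD".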

Note: If a secure connect bundle is specified, the following options are ignored and a warning is logged:
  • Contact points
  • Consistency levels other than LOCAL_QUORUM (loading operations only)
  • SSL configurations

See the --driver.basic.cloud.secureConnectBundle parameter for more information.

Use the following examples to load and unload data.

Loading data

Load CSV or JSON data with a dsbulk load command.

Load data from a local file

Load data from a local file, export.csv, that has a header row into keyspace ks1 and table table1:

dsbulk load -url export.csv -k ks1 -t table1 \
-b "path/to/secure-connect-database_name.zip" -u database_user -p database_password -header true

The -url value can be a path on the local filesystem or a web URL from which to read the data.
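With -header true, the first line of the file names the fields, and dsbulk maps each field to the table1 column that has the same name. The sketch below creates a matching export.csv. The table schema in the comment is hypothetical and only illustrates how the header lines up with the columns.

```shell
# Hypothetical schema that this file is meant to match:
#   CREATE TABLE ks1.table1 (id int PRIMARY KEY, name text, score double);
cat > export.csv <<'EOF'
id,name,score
1,alice,9.5
2,bob,7.25
EOF

# List the header fields one per line, to compare with the table's columns.
head -n 1 export.csv | tr ',' '\n'
```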

Specify an external data source

Load data from a file at a web URL:

dsbulk load -url https://svr/data/export.csv -k ks1 -t table1 \
-b "path/to/secure-connect-database_name.zip" -u database_user -p database_password

Specify a file with URLs

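Specify a file that contains a list of well-formed URLs, one per line, for the CSV or JSON data files to load. This example loads JSON files, so it selects the JSON connector with -c json. For CSV files, use --connector.csv.urlfile instead:

dsbulk load -c json --connector.json.urlfile "my/local/multiple-input-data-urls.txt" -k ks1 -t table1 \
-b "path/to/secure-connect-database_name.zip" -u database_user -p database_password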
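A URL file is a plain text file with one URL per line. The sketch below writes one at the path used in the example and runs a simple heuristic check that flags lines without a URL scheme. The URLs are placeholders, and lines_without_scheme is a hypothetical helper, not part of dsbulk.

```shell
# Write the list of input URLs, one per line (placeholder hosts and paths).
mkdir -p my/local
cat > my/local/multiple-input-data-urls.txt <<'EOF'
https://svr/data/export-part1.json
https://svr/data/export-part2.json
https://svr/data/export-part3.json
EOF

# Hypothetical heuristic: print every line that does not start with "scheme:".
lines_without_scheme() {
  grep -v -E '^[A-Za-z][A-Za-z0-9+.-]*:' "$1"
}

# grep exits non-zero when it prints nothing, i.e. when every line looks valid.
lines_without_scheme my/local/multiple-input-data-urls.txt || echo "all lines have a URL scheme"
```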

Load CSV data from stdin

Load CSV data from stdin as it is generated by a script, generate_data. The data is loaded into keyspace ks1 and table table1. Unless specified otherwise, the field names are read from the header row of the input.

generate_data | dsbulk load -url stdin:/ -k ks1 -t table1 \
-b "path/to/secure-connect-database_name.zip" -u database_user -p database_password
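The document does not show generate_data, so here is a minimal stand-in written as a shell function. It prints a header row first, because dsbulk reads the field names from it, and then a few data rows. The columns id and name are hypothetical and would need to match table1.

```shell
# Hypothetical stand-in for generate_data: CSV on stdout, header row first.
generate_data() {
  echo "id,name"
  i=1
  while [ "$i" -le 3 ]; do
    echo "$i,user$i"
    i=$((i + 1))
  done
}

# Usage with dsbulk (requires a reachable database):
#   generate_data | dsbulk load -url stdin:/ -k ks1 -t table1 \
#     -b "path/to/secure-connect-database_name.zip" \
#     -u database_user -p database_password
```

Because the rows are streamed, dsbulk can start loading before the generator finishes, and no intermediate file is written.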

Unloading data

Unload CSV or JSON data with a dsbulk unload command.

Unload data to a local file

Unload the data from keyspace ks1 and table table1 to a local file:

dsbulk unload -url myData.csv -k ks1 -t table1 \
-b "path/to/secure-connect-database_name.zip" -u database_user -p database_password

The -url value can be a path on the local filesystem or a valid URL.

Unload CSV data to stdout

Unload CSV data from keyspace ks1 and table table1 to stdout, from where it can be redirected to a file or piped to another program. By default, the first line written is a header row with the column names.

dsbulk unload -url stdout:/ -k ks1 -t table1 \
-b "path/to/secure-connect-database_name.zip" -u database_user -p database_password
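Data unloaded to stdout can be processed as it streams. As an illustration, the hypothetical helper below counts the data rows by skipping the header row. It assumes that only CSV data reaches stdout and that no field contains an embedded newline; quoted multi-line values would be counted more than once.

```shell
# Hypothetical helper: count CSV rows on stdin, excluding the header row.
count_data_rows() {
  tail -n +2 | wc -l | tr -d ' '
}

# Usage with dsbulk (requires a reachable database):
#   dsbulk unload -url stdout:/ -k ks1 -t table1 \
#     -b "path/to/secure-connect-database_name.zip" \
#     -u database_user -p database_password | count_data_rows
```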

What's next

Review the DataStax Bulk Loader Getting Started Guide to learn more about using dsbulk.