Loading data without a configuration file

The dsbulk command examples often show a parameter such as -url filename.csv or -url filename.json. You can also load data from, or unload data to, compressed CSV or JSON files. For details, see the --connector.(csv|json).compression option.
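For example, a gzipped CSV can be loaded by naming the codec explicitly. The sketch below uses illustrative file, keyspace, and table names; the dsbulk invocation is shown as a comment because it requires a reachable cluster:

```shell
# Create a small sample CSV (illustrative data) and compress it.
printf 'id,name\n1,alice\n2,bob\n' > export.csv
gzip -f export.csv                      # replaces export.csv with export.csv.gz

# Point dsbulk at the compressed file and name the codec
# (commented out here; run it against a live cluster):
# dsbulk load -url export.csv.gz -k ks1 -t table1 \
#   --connector.csv.compression gzip -header true
```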

Load CSV or JSON data with a dsbulk load command.

To load data into a cloud-based DataStax Astra DB database, specify the path to the secure connect bundle ZIP file, which contains the security certificates and credentials for your database, along with the username and password that you entered when creating the database. For information about downloading the secure connect bundle via the Astra Portal before running the dsbulk command, see Manage application tokens in the Astra DB documentation.

Load data from a local file

Load data from a local file export.csv with headers into keyspace ks1 and table table1:

DataStax Astra databases

dsbulk load -url export.csv -k ks1 -t table1 \
-b "path/to/secure-connect-database_name.zip" -u database_user -p database_password -header true

HCD / DSE / open source Cassandra databases

This dsbulk example shows how you can load a previously exported CSV data file into an HCD, DSE, or Cassandra database:

dsbulk load -url export.csv -k ks1 -t table1 -h '10.200.1.3, 10.200.1.4' -header true

The url value can designate the path to a resource, such as a local file, or a web URL from which to read or write data.

Specify an external data source

DataStax Astra databases

dsbulk load -url https://svr/data/export.csv -k ks1 -t table1 \
-b "path/to/secure-connect-database_name.zip" -u database_user -p database_password

HCD / DSE / open source Cassandra databases

When loading data into an HCD, DSE, or Cassandra database, you can specify a port for the cluster hosts.

dsbulk load -url https://svr/data/export.csv -k ks1 -t table1 -h '10.200.1.3, 10.200.1.4' -port 9876

Specify a file with URLs

Specify a file that contains a list of multiple, well-formed URLs for the CSV or JSON data files to load:

DataStax Astra databases

dsbulk load --connector.json.urlfile "my/local/multiple-input-data-urls.txt" -k ks1 -t table1 \
-b "path/to/secure-connect-database_name.zip" -u database_user -p database_password

HCD / DSE / open source Cassandra databases

dsbulk load --connector.json.urlfile "my/local/multiple-input-data-urls.txt" -k ks1 -t table1 -h '10.200.1.3'
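The urlfile itself is a plain text file with one URL per line. A sketch of what it might contain, using hypothetical URLs (blank lines and lines starting with # are treated as comments and skipped):

```shell
# Hypothetical contents of a urlfile: one well-formed URL per line.
cat > multiple-input-data-urls.txt <<'EOF'
# part files exported earlier (lines starting with # are comments)
https://svr/data/export-part1.csv
https://svr/data/export-part2.csv
file:///tmp/export-part3.csv
EOF
```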

Load CSV data from stdin

Load CSV data from stdin as it is generated by a script, generate_data. The data is loaded into keyspace ks1 and table table1. If field names are not specified, they are read from a header row in the input.

DataStax Astra databases

generate_data | dsbulk load -url stdin:/ -k ks1 -t table1 \
-b "path/to/secure-connect-database_name.zip" -u database_user -p database_password

HCD / DSE / open source Cassandra databases

generate_data | dsbulk load -url stdin:/ -k ks1 -t table1
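The generate_data script is a placeholder for any program that writes CSV to stdout. A minimal sketch of such a generator (hypothetical columns and rows), with the stream captured to a file here only to show its shape:

```shell
# Hypothetical generator: emits a CSV header followed by data rows on stdout.
generate_data() {
  echo 'id,name'
  for i in 1 2 3; do
    echo "$i,user$i"
  done
}

# In practice, pipe the stream straight into dsbulk:
#   generate_data | dsbulk load -url stdin:/ -k ks1 -t table1
generate_data > sample.csv
```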

Load CSV data from a CSV file to a graph vertex label table

Load CSV data from person.csv. The data is loaded into graph graph1, vertex label table vertex_label1.

HCD / DSE databases

dsbulk load -url data/vertices/person.csv -g graph1 -v vertex_label1 \
-delim '|' -header true --schema.allowMissingFields true
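The command above expects a pipe-delimited file with a header row naming the vertex properties. A sketch of what person.csv might look like, with hypothetical columns and rows:

```shell
# Hypothetical person.csv for the graph-load example: pipe-delimited,
# header row first, one vertex per data row.
mkdir -p data/vertices
cat > data/vertices/person.csv <<'EOF'
name|age|city
Alice|34|Paris
Bob|28|Austin
EOF
```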

