Unloading data examples

Unloading is the process of using the dsbulk unload command to extract data from a database table into a CSV or JSON file. Many dsbulk options used in loading and unloading are the same.

The dsbulk command examples often show a parameter such as -url filename.csv or -url filename.json. Optionally, you can load or unload data from/to compressed CSV or JSON files. For details, refer to the --connector.(csv|json).compression option.

Databases supported by DataStax Bulk Loader

DataStax Bulk Loader® supports the use of the dsbulk load, dsbulk unload, and dsbulk count commands with:

  • DataStax Astra DB

  • DataStax Enterprise (DSE) 5.1 and 6.8 databases

  • Open source Apache Cassandra® 2.1 and later databases

dsbulk unload examples

Unload data to stdout from the ks1.table1 table in a cluster with a localhost contact point. Column names in the table map to field names in the data. Field names must be emitted in a header row in the output:

dsbulk unload -k ks1 -t table1

Unload data to stdout from the ks1.table1 table and gzip the result:

dsbulk unload -k ks1 -t table1 | gzip > table1.gz

Unload data to a local directory (which may not yet exist):

dsbulk unload -url ~/data-export -k ks1 -t table1

Unload data on a cluster with authentication and SSL enabled:

dsbulk unload -h '["fe80::f861:3eff:fe1d:9d7a"]' -u myaccount -p mypassword
      --driver.auth.provider DsePlainTextAuthProvider
      --driver.ssl.provider JDK
      --driver.ssl.keystore.path /etc/dse/keystores/client.keystore
      --driver.ssl.keystore.password mysslkspass
      --driver.ssl.truststore.path /etc/dse/keystores/client.truststore
      --driver.ssl.truststore.password myssltrustpass
      -k mykeyspace -t mytable -url mytable_backup

For more details, refer to SSL options.

Unload and count data from a materialized view named health_data_mv to a local directory:

dsbulk unload count -k healthcare -t health_data_mv -url ~/export-mv

In DataStax Bulk Loader 1.3.2 and later, you can use dsbulk unload count and specify a materialized view on the -t or --table option. Previously, you could only use dsbulk unload count with tables. For related information, refer to Creating a materialized view.

Unload data using a search index:

dsbulk unload -query
  "SELECT id, petal_length, petal_width, sepal_length, sepal_width, species
    FROM dsbulkblog.iris_with_id
    WHERE solr_query = '{\\\"q\\\": \\\"species:Iris-setosa\\\"}'"
 --executor.continuousPaging.enabled false

where a search index exists for the table iris_with_id. Continuous Paging must be disabled because DSE Search does not work with that feature. The solr_query requires double-escaping of the double-quotes in the solr_query string.

Faceted queries are not supported.

Unload data using a CQL query using function calls:

dsbulk unload -query
  "SELECT id, species, writetime(species) AS TIMESTAMP, ttl(species) AS TTL FROM dsbulkblog.iris_with_id"

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com