Unloading data examples

Unloading is the process of using the dsbulk unload command to extract data from a database table into a CSV or JSON file. Many dsbulk options used in loading and unloading are the same.

The dsbulk command examples often show a parameter such as -url filename.csv or -url filename.json. Optionally, you can load or unload data from/to compressed CSV or JSON files. For details, refer to the --connector.(csv|json).compression option.

Databases supported by DataStax Bulk Loader

DataStax Bulk Loader for Apache Cassandra® supports the use of the dsbulk load, dsbulk unload, and dsbulk count commands with:

  • DataStax Astra cloud databases

  • DataStax Enterprise (DSE) 4.7 and later databases

  • Open source Apache Cassandra® 2.1 and later databases

dsbulk unload examples

Unload data to stdout from the ks1.table1 table in a cluster with a localhost contact point. Column names in the table map to field names in the data. Field names must be emitted in a header row in the output:

dsbulk unload -k ks1 -t table1

Unload data to stdout from the ks1.table1 table and gzip the result:

dsbulk unload -k ks1 -t table1 | gzip > table1.gz

Unload data to a local directory (which may not yet exist):

dsbulk unload -url ~/data-export -k ks1 -t table1

Unload data on a cluster with authentication and SSL enabled:

dsbulk unload -h '["fe80::f861:3eff:fe1d:9d7a"]' -u myaccount -p mypassword
      --driver.auth.provider DsePlainTextAuthProvider
      --driver.ssl.provider JDK
      --driver.ssl.keystore.path /etc/dse/keystores/client.keystore
      --driver.ssl.keystore.password mysslkspass
      --driver.ssl.truststore.path /etc/dse/keystores/client.truststore
      --driver.ssl.truststore.password myssltrustpass
      -k mykeyspace -t mytable -url mytable_backup

For more details, refer to SSL options.

Unload and count data from a materialized view named health_data_mv to a local directory:

dsbulk unload count -k healthcare -t health_data_mv -url ~/export-mv

In DataStax Bulk Loader 1.3.2 and later, you can use dsbulk unload count and specify a materialized view on the -t or --table option. Previously, you could only use dsbulk unload count with tables. For related information, refer to Creating a materialized view.

Unload data using a search index:

dsbulk unload -query
  "SELECT id, petal_length, petal_width, sepal_length, sepal_width, species
    FROM dsbulkblog.iris_with_id
    WHERE solr_query = '{\\\"q\\\": \\\"species:Iris-setosa\\\"}'"
 --executor.continuousPaging.enabled false

where a search index exists for the table iris_with_id. Continuous Paging must be disabled because DSE Search does not work with that feature. The solr_query requires double-escaping of the double-quotes in the solr_query string.

Faceted queries are not supported.

Unload data using a CQL query using function calls:

dsbulk unload -query
  "SELECT id, species, writetime(species) AS TIMESTAMP, ttl(species) AS TTL FROM dsbulkblog.iris_with_id"