• Glossary
  • Support
  • Downloads
  • DataStax Home
Get Live Help
Expand All
Collapse All

DataStax Bulk Loader

    • About DataStax Bulk Loader
    • Release notes
    • Architecture
    • Installing
    • Getting Started
      • Loading data
      • Unloading data
      • Counting data in tables
      • Creating configuration files
      • Loading tables that contain static and non-static columns
      • Using SSL with dsbulk
      • Printing cluster information
    • Kerberos client authentication
    • Reference
      • dsbulk
        • Loading data examples
        • Unloading data examples
        • Counting data example
        • Exit codes
      • Common options
      • Connector options
      • Count options
      • Schema options
      • Batch options
      • Codec options
      • Driver options
      • Engine options
      • Executor options
      • Logging options
      • Monitoring options
  • DataStax Bulk Loader
  • Reference
  • dsbulk
  • Unloading data examples

Unloading data examples

Unloading is the process of using the dsbulk unload command to extract data from a database table into a CSV or JSON file. Many dsbulk options used in loading and unloading are the same.

The dsbulk command examples often show a parameter such as -url filename.csv or -url filename.json. Optionally, you can load or unload data from/to compressed CSV or JSON files. For details, refer to the --connector.(csv|json).compression option.

Databases supported by DataStax Bulk Loader

DataStax Bulk Loader for Apache Cassandra® supports the use of the dsbulk load, dsbulk unload, and dsbulk count commands with:

  • DataStax Astra cloud databases

  • DataStax Enterprise (DSE) 4.7 and later databases

  • Open source Apache Cassandra® 2.1 and later databases

dsbulk unload examples

Unload data to stdout from the ks1.table1 table in a cluster with a localhost contact point. Column names in the table map to field names in the data. Field names must be emitted in a header row in the output:

dsbulk unload -k ks1 -t table1

Unload data to stdout from the ks1.table1 table and gzip the result:

dsbulk unload -k ks1 -t table1 | gzip > table1.gz

Unload data to a local directory (which may not yet exist):

dsbulk unload -url ~/data-export -k ks1 -t table1

Unload data on a cluster with authentication and SSL enabled:

dsbulk unload -h '["fe80::f861:3eff:fe1d:9d7a"]' -u myaccount -p mypassword
      --driver.auth.provider DsePlainTextAuthProvider
      --driver.ssl.provider JDK
      --driver.ssl.keystore.path /etc/dse/keystores/client.keystore
      --driver.ssl.keystore.password mysslkspass
      --driver.ssl.truststore.path /etc/dse/keystores/client.truststore
      --driver.ssl.truststore.password myssltrustpass
      -k mykeyspace -t mytable -url mytable_backup

For more details, refer to SSL options.

Unload and count data from a materialized view named health_data_mv to a local directory:

dsbulk unload count -k healthcare -t health_data_mv -url ~/export-mv

In DataStax Bulk Loader 1.3.2 and later, you can use dsbulk unload count and specify a materialized view on the -t or --table option. Previously, you could only use dsbulk unload count with tables. For related information, refer to Creating a materialized view.

Unload data using a search index:

dsbulk unload -query
  "SELECT id, petal_length, petal_width, sepal_length, sepal_width, species
    FROM dsbulkblog.iris_with_id
    WHERE solr_query = '{\\\"q\\\": \\\"species:Iris-setosa\\\"}'"
 --executor.continuousPaging.enabled false

where a search index exists for the table iris_with_id. Continuous Paging must be disabled because DSE Search does not work with that feature. The solr_query requires double-escaping of the double-quotes in the solr_query string.

Faceted queries are not supported.

Unload data using a CQL query using function calls:

dsbulk unload -query
  "SELECT id, species, writetime(species) AS TIMESTAMP, ttl(species) AS TTL FROM dsbulkblog.iris_with_id"
Loading data examples Counting data example

General Inquiries: +1 (650) 389-6000 info@datastax.com

© DataStax | Privacy policy | Terms of use

DataStax, Titan, and TitanDB are registered trademarks of DataStax, Inc. and its subsidiaries in the United States and/or other countries.

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.

Kubernetes is the registered trademark of the Linux Foundation.

landing_page landingpage