Unload data

Unloading is the process of using the dsbulk unload command to extract data from a database table into a CSV or JSON file. The load and unload commands share many of the same options.

When unloaded to a CSV file, column names in a table map to column names in the CSV data. When unloaded to a JSON file, column names in a table map to field names in the JSON data. Field names must be emitted in a header row in the output.

Unload locally without authentication

At minimum, you must specify the keyspace (-k, --schema.keyspace) and table (-t, --schema.table) from which to unload the data to a CSV or JSON file. The minimal command unloads data to stdout from a local cluster (localhost contact point) without authentication:

dsbulk unload -k ks1 -t table1

Unload with authentication

Pass the relevant options with your unload commands if your cluster requires authentication or uses SSL encryption.

You can pass the values directly or use a configuration file.

DSE, HCD, and Cassandra
Astra DB

If your database requires authentication, provide the username and password using the -u and -p options, respectively:

dsbulk unload -k ks1 -t table1 \
-u username -p password

If the cluster is remote, set the host and contact points in the driver options.

If your cluster has both authentication and SSL enabled, pass -u, -p, and the SSL options. For example:

dsbulk unload -h '["fe80::f861:3eff:fe1d:9d7a"]' -u myaccount -p mypassword
      --driver.auth.provider DsePlainTextAuthProvider
      --driver.ssl.provider JDK
      --driver.ssl.keystore.path /etc/dse/keystores/client.keystore
      --driver.ssl.keystore.password mysslkspass
      --driver.ssl.truststore.path /etc/dse/keystores/client.truststore
      --driver.ssl.truststore.password myssltrustpass
      -k mykeyspace -t mytable

dsbulk unload -k ks1 -t table1 \
-b "path/to/SCB.zip" -u token -p AstraCS:...

For Astra DB, the expected connection credentials are:

-b: Provide the path to the database’s Secure Connect Bundle (SCB) zip file. The SCB includes certificates and key files for SSL-encrypted connections as well as information about the database’s contact points.
-u: Set to the literal string token.
-p: Provide an application token. DataStax recommends using secure references to tokens, such as environment variables, rather than specifying them directly on the command line.

Set the unload destination

Use the -url option to specify the path and filename where you want to unload the data. This can be a path on the local file system or a valid remote URL. You can specify a filename only, path only, or both filename and path. The default path is export.csv in the DSBulk installation directory.

This example unloads data to a local CSV file named myData.csv. The file is created if it does not already exist. If it does exist, then it is overwritten.

dsbulk unload -url myData.csv -k ks1 -t table1

If the path includes directories that don’t exist, the command creates the necessary directories:

dsbulk unload -url ~/data-export/myData.csv -k ks1 -t table1

Unload to a compressed file

There are two ways to unload compressed files:

For supported file types, use the --connector.csv|json.compression option to specify the compression type to allow DSBulk to create the compressed file.
Pipe the unload command to your preferred compressor and filename:
```
dsbulk unload -k ks1 -t table1 | gzip > table1.gz
```

Unload data matching a CQL query

You can use the -query option to unload only data matching a given SELECT statement.

Faceted queries are not supported.

Unload data using a CQL query with a WHERE clause:

dsbulk unload -query
  "SELECT id, petal_length, petal_width, sepal_length, sepal_width, species
    FROM dsbulkblog.iris_with_id
    WHERE species='Iris-setosa' AND token(id) > :start AND token(id) <= :end"

Unload data using a CQL query with function calls:

dsbulk unload -query
  "SELECT id, species, writetime(species) AS TIMESTAMP, ttl(species) AS TTL FROM dsbulkblog.iris_with_id"

Unload data with a secondary index query:

If a secondary index exists for a table, you can use a SELECT statement to unload data based on a given query on the secondary index.

The following example uses solr_query. With DSE Search, continuous paging must be disabled because DSE Search doesn’t work with that feature. The solr_query requires double-escaping of the double-quotes in the solr_query string.
```
dsbulk unload -query
  "SELECT id, petal_length, petal_width, sepal_length, sepal_width, species
    FROM dsbulkblog.iris_with_id
    WHERE solr_query = '{\\\"q\\\": \\\"species:Iris-setosa\\\"}'"
 --executor.continuousPaging.enabled false
```

Unload a materialized view (MV)

Specify the materialized view name in the -t (--schema.table) parameter.

The following command unloads and counts data from a materialized view to a local directory:

dsbulk unload count -k healthcare -t health_data_mv -url ~/export-mv

Unload graph data from a vertex table to a CSV file

For DSE Graph tables, pass graph and vertex options with your unload command.

The following example unloads graph data from graph graph1 and table vertex_label1 to a CSV file:

dsbulk unload --schema.graph graph1 --schema.vertex vertex_label1 \
-delim '|' > unloaded_person_data.csv

Unload vector data

See Load and unload vector data.