Unload data
Unloading is the process of using the dsbulk unload
command to extract data from a database table into a CSV or JSON file.
The load
and unload
commands share many of the same options.
When unloaded to a CSV file, column names in a table map to column names in the CSV data. When unloaded to a JSON file, column names in a table map to field names in the JSON data. Field names must be emitted in a header row in the output.
Unload locally without authentication
At minimum, you must specify the keyspace (-k
, --schema.keyspace
) and table (-t
, --schema.table
) from which to unload the data to a CSV or JSON file.
The minimal command unloads data to stdout
from a local cluster (localhost
contact point) without authentication:
dsbulk unload -k ks1 -t table1
Unload with authentication
Pass the relevant options with your unload
commands if your cluster requires authentication or uses SSL encryption.
You can pass the values directly or use a configuration file.
-
DSE, HCD, and Cassandra
-
Astra DB
If your database requires authentication, provide the username and password using the -u
and -p
options, respectively:
dsbulk unload -k ks1 -t table1 \
-u username -p password
If the cluster is remote, set the host and contact points in the driver options.
If your cluster has both authentication and SSL enabled, pass -u
, -p
, and the SSL options.
For example:
dsbulk unload -h '["fe80::f861:3eff:fe1d:9d7a"]' -u myaccount -p mypassword
--driver.auth.provider DsePlainTextAuthProvider
--driver.ssl.provider JDK
--driver.ssl.keystore.path /etc/dse/keystores/client.keystore
--driver.ssl.keystore.password mysslkspass
--driver.ssl.truststore.path /etc/dse/keystores/client.truststore
--driver.ssl.truststore.password myssltrustpass
-k mykeyspace -t mytable
dsbulk unload -k ks1 -t table1 \
-b "path/to/SCB.zip" -u token -p AstraCS:...
For Astra DB, the expected connection credentials are:
-
-b
: Provide the path to the database’s Secure Connect Bundle (SCB) zip file. The SCB includes certificates and key files for SSL-encrypted connections as well as information about the database’s contact points. -
-u
: Set to the literal stringtoken
. -
-p
: Provide an application token. DataStax recommends using secure references to tokens, such as environment variables, rather than specifying them directly on the command line.
Set the unload destination
Use the -url
option to specify the path and filename where you want to unload the data.
This can be a path on the local file system or a valid remote URL.
You can specify a filename only, path only, or both filename and path.
The default path is export.csv
in the DSBulk installation directory.
This example unloads data to a local CSV file named myData.csv
.
The file is created if it does not already exist.
If it does exist, then it is overwritten.
dsbulk unload -url myData.csv -k ks1 -t table1
If the path includes directories that don’t exist, the command creates the necessary directories:
dsbulk unload -url ~/data-export/myData.csv -k ks1 -t table1
Unload to a compressed file
There are two ways to unload compressed files:
-
For supported file types, use the
--connector.csv|json.compression
option to specify the compression type to allow DSBulk to create the compressed file. -
Pipe the
unload
command to your preferred compressor and filename:dsbulk unload -k ks1 -t table1 | gzip > table1.gz
Unload data matching a CQL query
You can use the -query
option to unload only data matching a given SELECT
statement.
Faceted queries are not supported. |
-
Unload data using a CQL query with a WHERE clause:
dsbulk unload -query "SELECT id, petal_length, petal_width, sepal_length, sepal_width, species FROM dsbulkblog.iris_with_id WHERE species='Iris-setosa' AND token(id) > :start AND token(id) <= :end"
-
Unload data using a CQL query with function calls:
dsbulk unload -query "SELECT id, species, writetime(species) AS TIMESTAMP, ttl(species) AS TTL FROM dsbulkblog.iris_with_id"
-
Unload data with a secondary index query:
If a secondary index exists for a table, you can use a
SELECT
statement to unload data based on a given query on the secondary index.The following example uses
solr_query
. With DSE Search, continuous paging must be disabled because DSE Search doesn’t work with that feature. Thesolr_query
requires double-escaping of the double-quotes in thesolr_query
string.dsbulk unload -query "SELECT id, petal_length, petal_width, sepal_length, sepal_width, species FROM dsbulkblog.iris_with_id WHERE solr_query = '{\\\"q\\\": \\\"species:Iris-setosa\\\"}'" --executor.continuousPaging.enabled false
Unload a materialized view (MV)
Specify the materialized view name in the -t
(--schema.table
) parameter.
The following command unloads and counts data from a materialized view to a local directory:
dsbulk unload count -k healthcare -t health_data_mv -url ~/export-mv
Unload graph data from a vertex table to a CSV file
For DSE Graph tables, pass graph and vertex options with your unload
command.
The following example unloads graph data from graph graph1
and table vertex_label1
to a CSV file:
dsbulk unload --schema.graph graph1 --schema.vertex vertex_label1 \
-delim '|' > unloaded_person_data.csv