Unload data
Use the dsbulk unload command to extract data from a database table into a CSV or JSON file:
dsbulk unload \
-url DESTINATION \
-k KEYSPACE_NAME -t TABLE_NAME \
OPTIONS
Replace the following:
- DESTINATION: The location where you want to write the extracted data. When unloaded to a CSV file, column names in a table map to column names in the CSV data. When unloaded to a JSON file, column names in a table map to field names in the JSON data. Field names must be emitted in a header row in the output.
- KEYSPACE_NAME and TABLE_NAME: The names of the keyspace (-k, --schema.keyspace) and table (-t, --schema.table) from which you want to unload the data. Or, use other schema options to specify the target table, such as -query to select a table with a CQL statement or -g to unload graph data.
- OPTIONS: Any options for the unload command, authentication options, or other DSBulk options.
To avoid unloading excessive amounts of tombstones from DSE, HCD, or open-source Cassandra clusters, DataStax recommends that you run garbage collection (GC) at the cell level before you run the unload command.
Unload locally without authentication
The following command unloads data to stdout from a local cluster (localhost contact point) without authentication:
dsbulk unload -k ks1 -t table1
Unload with authentication
Pass the relevant options with your unload commands if your cluster requires authentication or uses SSL encryption.
You can pass the values directly or use a configuration file.
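Instead of passing credentials on the command line, you can store settings in a DSBulk configuration file and load it with -f. The file name my-cluster.conf and the settings shown below are illustrative assumptions; any DSBulk setting can appear in the file.

```shell
# Sketch: keep connection settings in a HOCON config file (name is hypothetical).
# my-cluster.conf might contain, for example:
#   dsbulk.driver.auth.provider = DsePlainTextAuthProvider
#   dsbulk.driver.auth.username = "myaccount"
#   dsbulk.driver.auth.password = "mypassword"

dsbulk unload -f my-cluster.conf -k ks1 -t table1
```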
DSE, HCD, and Cassandra
If your database requires authentication, provide the username and password using the -u and -p options, respectively:
dsbulk unload -k ks1 -t table1 \
-u username -p password
If the cluster is remote, set the host and contact points in the driver options.
If your cluster has both authentication and SSL enabled, pass -u, -p, and the SSL options.
For example:
dsbulk unload -h '["fe80::f861:3eff:fe1d:9d7a"]' -u myaccount -p mypassword \
  --driver.auth.provider DsePlainTextAuthProvider \
  --driver.ssl.provider JDK \
  --driver.ssl.keystore.path /etc/dse/keystores/client.keystore \
  --driver.ssl.keystore.password mysslkspass \
  --driver.ssl.truststore.path /etc/dse/keystores/client.truststore \
  --driver.ssl.truststore.password myssltrustpass \
  -k mykeyspace -t mytable
Astra DB
dsbulk unload -k ks1 -t table1 \
-b "path/to/SCB.zip" -u token -p AstraCS:...
For Astra DB, the expected connection credentials are:
- -b: Provide the path to the database’s Secure Connect Bundle (SCB) zip file. The SCB includes certificates and key files for SSL-encrypted connections, as well as information about the database’s contact points.
- -u: Set to the literal string token.
- -p: Provide an application token. DataStax recommends using secure references to tokens, such as environment variables, rather than specifying them directly on the command line.
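Following that recommendation, one sketch is to keep the token in an environment variable; the variable name ASTRA_TOKEN is an assumption, not a DSBulk convention.

```shell
# Export the token once, then reference it instead of typing it inline.
export ASTRA_TOKEN="AstraCS:..."   # placeholder; use your real application token

dsbulk unload -k ks1 -t table1 \
  -b "path/to/SCB.zip" \
  -u token -p "$ASTRA_TOKEN"
```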
Set the unload destination
Use the -url option to specify the filesystem path or URL where you want to write the unloaded data.
Unload to stdout
If -url is omitted or you set -url stdout:/, the unloaded data is written to standard output (stdout).
This is useful if you want to pipe the output from stdout to another command.
For example, you could pipe the output to an AWS CLI command to upload the data to an S3 bucket.
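As a sketch of that pattern, the AWS CLI accepts - as the source argument to read from stdin; the bucket name and object key here are illustrative assumptions.

```shell
# Stream the unloaded CSV from stdout straight into an S3 object.
dsbulk unload -k ks1 -t table1 | aws s3 cp - s3://my-bucket/exports/table1.csv
```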
Unload to a local directory
When -url is a filesystem path, the destination for unload is always treated as a directory.
If any directories in the path don’t exist, DSBulk attempts to create them.
The -url value expands to the long form file://hostname/path/to/directory.
Relative paths are resolved from the current working directory.
If the path starts with ~/, it expands to the user’s home directory.
The name of each output file is determined by the connector's fileNameFormat option.
- Unload to a subdirectory under the current working directory:
dsbulk unload -url myData -k ks1 -t table1
- Unload to a subdirectory under the user’s home directory:
dsbulk unload -url ~/data-export/myData -k ks1 -t table1
Unload to a URL
If the destination begins with http: or https:, DSBulk attempts to write the unloaded data directly to the specified URL.
Some connector options, such as fileNameFormat, are ignored when the destination is a URL, and not all URLs are supported for unloading.
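For example, a sketch of writing directly to an HTTP endpoint; the URL is an illustrative assumption, and the receiving service must accept the upload.

```shell
# Write unloaded data directly to a URL instead of the local filesystem.
dsbulk unload -url https://example.com/exports/table1 -k ks1 -t table1
```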
Unload to a compressed file
There are two ways to unload compressed files:
- For supported file types, use the --connector.csv.compression or --connector.json.compression option to specify the compression type, which allows DSBulk to create the compressed file.
- Pipe the unload command to your preferred compressor and filename:
dsbulk unload -k ks1 -t table1 | gzip > table1.gz
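As a sketch of the first approach, gzip is one of the compression types the CSV connector accepts; the directory name myData is an example, not a required value.

```shell
# Let DSBulk write gzip-compressed CSV files into ./myData.
dsbulk unload -k ks1 -t table1 \
  -url myData \
  --connector.csv.compression gzip
```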
Unload data matching a CQL query
You can use the -query option to unload only data matching a given SELECT statement.
Faceted queries are not supported.
- Unload data using a CQL query with a WHERE clause:
dsbulk unload -query "SELECT id, petal_length, petal_width, sepal_length, sepal_width, species FROM dsbulkblog.iris_with_id WHERE species='Iris-setosa' AND token(id) > :start AND token(id) <= :end"
- Unload data using a CQL query with function calls:
dsbulk unload -query "SELECT id, species, writetime(species) AS TIMESTAMP, ttl(species) AS TTL FROM dsbulkblog.iris_with_id"
- Unload data with a secondary index query:
If a secondary index exists for a table, you can use a SELECT statement to unload data based on a given query on the secondary index. The following example uses solr_query. With DSE Search, continuous paging must be disabled because DSE Search doesn’t work with that feature. The solr_query requires double-escaping of the double quotes in the solr_query string.
dsbulk unload -query "SELECT id, petal_length, petal_width, sepal_length, sepal_width, species FROM dsbulkblog.iris_with_id WHERE solr_query = '{\\\"q\\\": \\\"species:Iris-setosa\\\"}'" --executor.continuousPaging.enabled false
Unload a materialized view (MV)
Specify the materialized view name in the -t (--schema.table) parameter.
The following command unloads data from a materialized view to a local directory:
dsbulk unload -k healthcare -t health_data_mv -url ~/export-mv
Unload graph data
You can use DSBulk to unload data from graphs created with DSE Graph 6.8.0 and later.
To target graph data, you must specify one of the following:
- A graph name (--schema.graph) and a vertex label (--schema.vertex).
- A graph name (--schema.graph), an edge label (--schema.edge), an incoming vertex label (--schema.from), and an outgoing vertex label (--schema.to).
- A plain CQL query (--schema.query) that targets graph elements. Gremlin queries aren’t supported.
The following example unloads graph data from a graph named graph1 and a vertex label table named vertex_label1 to a CSV file:
dsbulk unload --schema.graph graph1 --schema.vertex vertex_label1 \
-delim '|' > unloaded_person_data.csv