Load and unload vector data

You can use dsbulk commands with CSV or JSON files that contain vector<type, dimension> data.

This guide shows how to use DSBulk to load vector data into, and unload vector data from, an Astra DB database.

Create a table with a vector column and a vector index

  1. Use the Astra DB CQL console or standalone cqlsh to create a table with a vector column.

    This guide creates a table named foo in a keyspace named ks1. The table has two columns: column i is an int and the primary key, and column j is a vector with three dimensions.

    token@cqlsh> CREATE TABLE ks1.foo (
        i int PRIMARY KEY,
        j vector<float, 3>
    );
  2. Create a Storage-Attached Index (SAI) on the vector column to enable vector search:

    token@cqlsh> CREATE CUSTOM INDEX ann_index ON ks1.foo (j) USING 'StorageAttachedIndex';

    You can also use the Astra DB Data API to load vector data, create vector search indexes, and run vector searches on tables. For more information, see Find data with vector search.

  3. Optional: If you created a new table for this guide, run dsbulk unload to confirm that the ks1.foo table is empty and that you can connect to your Astra DB database:

    bin/dsbulk unload -k ks1 -t foo 2> /dev/null \
    -b "path/to/SCB.zip" -u token -p AstraCS:...

    The result should show zero rows unloaded:

    Result
    ...
    total | failed | rows/s | p50ms | p99ms | p999ms
        0 |      0 |      0 |  0.00 |  0.00 |   0.00
    ...

Load vector data

Load and unload vector data using your preferred file format.

In Astra DB, the type in vector<type, dimension> is restricted to float (32-bit). Use float syntax in your JSON and CSV files, such as [8, 2.3, 58] for a vector with three dimensions.

Load vector data from a CSV file

  1. Prepare a sample data file with vector data:

    cat ../vector_test_data.csv
    vector_test_data.csv
    i,j
    1,"[8, 2.3, 58]"
    2,"[1.2, 3.4, 5.6]"
    5,"[23, 18, 3.9]"
  2. Load the data:

    bin/dsbulk load -url "./../vector_test_data.csv" -k ks1 -t foo \
    -b "path/to/SCB.zip" -u token -p AstraCS:...
    Result
    ...
    total | failed | rows/s | p50ms | p99ms | p999ms | batches
        3 |      0 |     22 |  5.10 |  6.91 |   6.91 |    1.00
    ...
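The sample CSV above could also be generated programmatically. The following is a minimal, hypothetical Python sketch (the row values mirror the sample data; they are not required by DSBulk): the csv module quotes the vector field automatically because the bracketed list contains commas, producing exactly the quoted format shown above.

```python
import csv
import io

# Hypothetical rows matching the ks1.foo schema: an int key and a 3-dimensional vector.
rows = [
    (1, [8, 2.3, 58]),
    (2, [1.2, 3.4, 5.6]),
    (5, [23, 18, 3.9]),
]

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
writer.writerow(["i", "j"])
for i, vec in rows:
    # Render the vector as a bracketed list; the embedded commas
    # force the csv writer to quote the whole field.
    writer.writerow([i, "[" + ", ".join(str(x) for x in vec) + "]"])

print(buf.getvalue())
```

Writing the same string to a file (instead of an in-memory buffer) yields a file that dsbulk load can consume as in the command above.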

Load vector data from a JSON file

  1. Create three sample JSON files with vector data, and store them in the same directory. Each file contains data for one row.

    1. Create a sample JSON file for primary key 1:

      cat ../vector_test_data_json/one.json
      one.json
      {
          "i":1,
          "j":[8, 2.3, 58]
      }
    2. Create a sample JSON file for primary key 2:

      cat ../vector_test_data_json/two.json
      two.json
      {
          "i":2,
          "j":[1.2, 3.4, 5.6]
      }
    3. Create a sample JSON file for primary key 5:

      cat ../vector_test_data_json/five.json
      five.json
      {
          "i":5,
          "j":[23, 18, 3.9]
      }
  2. Load the contents of all three sample JSON files from the directory where you created the files:

    bin/dsbulk load -url "./../vector_test_data_json" -k ks1 -t foo -c json \
    -b "path/to/SCB.zip" -u token -p AstraCS:...
    Result
    ...
    total | failed | rows/s | p50ms | p99ms | p999ms | batches
        3 |      0 |     16 | 37.18 | 39.58 |  39.58 |    1.00
    ...
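The three sample files above could likewise be generated with a short Python sketch. This is a hypothetical illustration (the temporary directory stands in for the ../vector_test_data_json directory used in the commands above); the vector column serializes as a plain JSON array of numbers, one document per file.

```python
import json
import os
import tempfile

# Hypothetical rows matching ks1.foo; the file names are illustrative only.
rows = {
    "one.json": {"i": 1, "j": [8, 2.3, 58]},
    "two.json": {"i": 2, "j": [1.2, 3.4, 5.6]},
    "five.json": {"i": 5, "j": [23, 18, 3.9]},
}

out_dir = tempfile.mkdtemp()  # stand-in for ../vector_test_data_json
for name, doc in rows.items():
    with open(os.path.join(out_dir, name), "w") as f:
        json.dump(doc, f)  # the vector becomes a JSON array of numbers

print(sorted(os.listdir(out_dir)))
```

Pointing dsbulk load -url at the directory, as shown above, loads all files it contains.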

Verify that the data was written to the table

Use the Astra DB CQL console or standalone cqlsh to read from the table and verify that the data was loaded correctly:

Run a vector search
token@cqlsh> select j from ks1.foo order by j ann of [3.4, 7.8, 9.1] limit 1;
Result
  j
 -----------------
  [1.2, 3.4, 5.6]

 (1 rows)
Select all rows (small tables only)
token@cqlsh> select * from ks1.foo;
Result
  i | j
 ---+-----------------
  5 |   [23, 18, 3.9]
  1 |    [8, 2.3, 58]
  2 | [1.2, 3.4, 5.6]

 (3 rows)
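Vector indexes in Astra DB use cosine similarity by default. Assuming that default, the ANN result above can be reproduced offline with a short Python sketch that ranks the three stored vectors against the query vector; this is an illustration of why [1.2, 3.4, 5.6] is returned, not part of the DSBulk workflow.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# The three rows loaded above, keyed by primary key i.
rows = {1: [8, 2.3, 58], 2: [1.2, 3.4, 5.6], 5: [23, 18, 3.9]}
query = [3.4, 7.8, 9.1]

# Pick the stored vector with the highest cosine similarity to the query.
best = max(rows, key=lambda k: cosine(rows[k], query))
print(best, rows[best])  # → 2 [1.2, 3.4, 5.6]
```

The nearest neighbor is the row with i = 2, matching the CQL ANN query result shown above.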

Unload vector data

Unload rows to a CSV or JSON file using the dsbulk unload command.

Unload in CSV format

Unload all rows in CSV format
bin/dsbulk unload -k ks1 -t foo \
-b "path/to/SCB.zip" -u token -p AstraCS:...
Result
...
i,j
5,"[23.0, 18.0, 3.9]"
2,"[1.2, 3.4, 5.6]"
1,"[8.0, 2.3, 58.0]"
total | failed | rows/s | p50ms | p99ms | p999ms
    3 |      0 |     16 |  2.25 |  2.97 |   2.97
...
Unload specific rows with dsbulk unload -query

The -query parameter accepts a CQL statement that selects specific rows to unload. DSBulk's built-in minimal Cassandra Query Language (CQL) parser supports a limited set of SELECT operations.

For tables with vector data, you can use a vector search (ann keyword) to select specific rows to unload. For example:

bin/dsbulk unload -query "select j from ks1.foo order by j ann of [3.4, 7.8, 9.1] limit 1" \
-b "path/to/SCB.zip" -u token -p AstraCS:...
Result
...
j
"[1.2, 3.4, 5.6]"
total | failed | rows/s | p50ms | p99ms | p999ms
    1 |      0 |      7 |  8.21 |  8.22 |   8.22
...

Unload in JSON format

Unload all rows in JSON format
bin/dsbulk unload -k ks1 -t foo \
-c json \
-b "path/to/SCB.zip" -u token -p AstraCS:...
Result
...
{"i":5,"j":[23.0,18.0,3.9]}
{"i":1,"j":[8.0,2.3,58.0]}
{"i":2,"j":[1.2,3.4,5.6]}
total | failed | rows/s | p50ms | p99ms | p999ms
    3 |      0 |     14 |  2.58 |  2.87 |   2.87
...
Unload specific rows with dsbulk unload -query

The -query parameter accepts a CQL statement that selects specific rows to unload.

For tables with vector data, you can use a vector search (ann keyword) to select specific rows to unload. For example:

bin/dsbulk unload -query "select j from ks1.foo order by j ann of [3.4, 7.8, 9.1] limit 1" \
-c json \
-b "path/to/SCB.zip" -u token -p AstraCS:...


© Copyright IBM Corporation 2026