$vector in collections

$vector is a reserved field in documents. It stores a vector embedding that is used for vector search and for the vector search component of hybrid search.

Only vector-enabled collections support the $vector field. For more information, see Create a collection that can store vector embeddings.

Insert and update a document’s $vector field

When you insert or update a document, you can use the $vector field to store a vector embedding for the document. For an example, see Insert documents with vector embeddings.

All vector embeddings in a collection should be generated by the same model with the same dimensions. Using mismatched embeddings produces unreliable and incorrect results in vector searches. Astra DB only checks that the dimensions are the same; it doesn’t check whether the embeddings are from different models.
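Astra DB only checks dimensions, so a client-side guard can catch obvious mistakes before an insert. The following is a minimal sketch that assumes your collection was created with 1024 dimensions (a hypothetical value). `validate_vector` is a hypothetical helper, not part of any DataStax client. Like the server-side check, it cannot detect embeddings that came from a different model with the same dimension.

```python
# Hypothetical: the dimension your collection was created with.
EXPECTED_DIMENSION = 1024


def validate_vector(vector: list[float], expected_dimension: int = EXPECTED_DIMENSION) -> list[float]:
    """Return the vector unchanged if its length matches, otherwise raise ValueError.

    This mirrors only the dimension check. It cannot tell which model
    produced the embedding, so model consistency is still your responsibility.
    """
    if len(vector) != expected_dimension:
        raise ValueError(
            f"expected {expected_dimension} dimensions, got {len(vector)}"
        )
    return vector
```

For example, call `validate_vector(embedding)` before placing `embedding` in a document's $vector field.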

If your collection has an embedding provider integration, you can use the $vectorize field to automatically generate a vector embedding from a string. The Data API stores the generated vector embedding in the document’s $vector field. However, you can’t include both the $vector and $vectorize fields in the same insert or update operation.
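The two ways of supplying an embedding differ only in which reserved field the document carries. The following minimal sketch uses a hypothetical `build_document` helper (not part of any DataStax client) that rejects a document carrying both fields before it reaches the Data API:

```python
from typing import Any, Optional


def build_document(
    doc_id: str,
    *,
    vector: Optional[list[float]] = None,
    vectorize: Optional[str] = None,
    **fields: Any,
) -> dict[str, Any]:
    """Build a document that carries either $vector or $vectorize, never both."""
    if vector is not None and vectorize is not None:
        raise ValueError("use either $vector or $vectorize, not both")
    document: dict[str, Any] = {"_id": doc_id, **fields}
    if vector is not None:
        # You supply the embedding yourself.
        document["$vector"] = vector
    if vectorize is not None:
        # The embedding provider integration generates an embedding
        # from this string and stores it in $vector.
        document["$vectorize"] = vectorize
    return document
```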

When you find, update, replace, or delete documents, you can use the $vector field to perform a vector search. For an example, see Use vector search to find documents.
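In the Python client, a vector search passes the search vector in the `sort` parameter of `find`. The following sketch assumes `collection` is an astrapy `Collection` for a vector-enabled collection. `nearest_documents` is a hypothetical wrapper. `limit` and `include_similarity` are keyword arguments of astrapy's `find`. Calling it requires a live collection.

```python
from typing import Any


def nearest_documents(collection: Any, query_vector: list[float], k: int = 5) -> list[dict[str, Any]]:
    """Return up to k documents ordered by similarity to query_vector.

    `collection` is expected to behave like an astrapy Collection, where
    find() with sort={"$vector": ...} performs a vector search.
    """
    cursor = collection.find(
        {},                              # no metadata filter
        sort={"$vector": query_vector},  # vector search on the $vector field
        limit=k,
        include_similarity=True,         # adds a $similarity score to each result
    )
    return list(cursor)
```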

Similarly, when you find and rerank documents, you can use the $vector field to perform a hybrid search. For an example, see Find documents with a hybrid search.

Return the $vector field

By default, the Data API excludes the $vector field from returned documents. If you want the Data API to return the $vector field, you must use a projection to explicitly include the $vector field in the response.
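With the Python client, the projection is passed as a `projection` argument. The following sketch assumes `collection` is an astrapy `Collection`. `get_document_with_vector` is a hypothetical wrapper. Which other fields are returned alongside $vector follows the Data API projection rules, so check the projection reference for your use case.

```python
from typing import Any, Optional


def get_document_with_vector(collection: Any, doc_id: str) -> Optional[dict[str, Any]]:
    """Fetch one document by _id and explicitly request its $vector field.

    `collection` is expected to behave like an astrapy Collection.
    Without a projection that names $vector, the field is omitted.
    """
    return collection.find_one(
        {"_id": doc_id},
        projection={"$vector": True},  # $vector is excluded unless requested
    )
```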

Binary encoding of vector embeddings

Python

When inserting or updating documents, you can specify the $vector field as an array of floats or use the astrapy.data_types.DataAPIVector class to represent and encode the vector embedding. Similarly, for vector searches, you can provide the search vector as an array of floats or use the astrapy.data_types.DataAPIVector class. DataAPIVector is a wrapper around a list of floats.

from astrapy.data_types import DataAPIVector

vector = DataAPIVector([.08, .68, .30])

For collections and documents, regardless of whether you use a DataAPIVector object or a list of floats, vector embeddings are binary-encoded by default, which improves performance. To change the default encoding, see Serdes Options and Custom Data Types.

When you read the value of a $vector field, the client always returns a DataAPIVector object, unless you change the default ser/des behavior.

For more information, see DataAPIVector.

TypeScript

When inserting or updating documents, you can specify the $vector field as an array of floats or use the DataAPIVector class to represent and encode the vector embedding. Similarly, for vector searches, you can provide the search vector as an array of floats or use the DataAPIVector class. DataAPIVector is a wrapper around an array of floats.

import { DataAPIVector } from '@datastax/astra-db-ts';

const vector = new DataAPIVector([0.4, -0.6, 0.2]);

For collections and documents, regardless of whether you use a DataAPIVector object or an array of floats, vector embeddings are binary-encoded by default, which improves performance. To change the default encoding, see Custom Ser/Des.

Java

When inserting or updating documents, you can specify the $vector field as an array of floats or use the DataAPIVector class to represent and encode the vector embedding. Similarly, for vector searches, you can provide the search vector as an array of floats or use the DataAPIVector class. DataAPIVector is a wrapper around an array of floats.

import com.datastax.astra.client.core.vector.DataAPIVector;

DataAPIVector vector = new DataAPIVector(new float[] {.1f, .2f});

When you send a DataAPIVector object, the vector embeddings are binary-encoded by default. DataStax recommends that you always use a DataAPIVector object instead of a list of floats to improve performance.

For more information, see DataAPIVector.

curl

When inserting or updating documents over HTTP, you can specify the $vector field either as an array of floats or as a Base64-encoded string wrapped in a $binary object. Similarly, for vector searches, you can provide the search vector as an array of floats or as a $binary object. Using $binary can improve performance. For examples, see Insert documents with vector embeddings and Use vector search to find documents.

If you use $binary, the underlying bytes must represent 32-bit floating point values in big-endian format. The byte sequence must be Base64-encoded, with = padding if needed. For the detailed specification of the encoding, see Write vector data.
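To see what a $binary value contains, here is a minimal Python sketch of the encoding described above: each component is packed as a big-endian 32-bit float, and the bytes are Base64-encoded with standard `=` padding. `encode_vector` and `insert_one_payload` are hypothetical helper names. The payload follows the Data API `insertOne` command shape.

```python
import base64
import json
import struct


def encode_vector(vector: list[float]) -> dict[str, str]:
    """Encode floats as big-endian float32 bytes, then Base64, as a $binary value."""
    # ">" = big-endian, "f" = 32-bit IEEE 754 float, one per component.
    raw = struct.pack(f">{len(vector)}f", *vector)
    # b64encode adds "=" padding when the byte length isn't a multiple of 3.
    return {"$binary": base64.b64encode(raw).decode("ascii")}


def insert_one_payload(doc_id: str, vector: list[float]) -> str:
    """Build a JSON body for the insertOne command that stores $vector as $binary."""
    document = {"_id": doc_id, "$vector": encode_vector(vector)}
    return json.dumps({"insertOne": {"document": document}})


print(encode_vector([0.25, -0.5]))  # {'$binary': 'PoAAAL8AAAA='}
```

Two floats produce 8 bytes, which is not a multiple of 3, so the Base64 string ends with one `=` pad character.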

When you read the value of a $vector field, the Data API returns either a list of floats or a binary-encoded string, depending on the format that was used to write that document.
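Because either shape can come back, client code that reads raw HTTP responses may want to normalize the value. The following minimal sketch uses a hypothetical `decode_vector` helper that accepts a plain array or a `{"$binary": ...}` object and reverses the big-endian float32 encoding:

```python
import base64
import struct
from typing import Union

VectorValue = Union[list[float], dict[str, str]]


def decode_vector(value: VectorValue) -> list[float]:
    """Normalize a returned $vector value to a list of floats.

    Handles both shapes: a plain JSON array, or {"$binary": "<Base64>"}
    holding big-endian 32-bit floats.
    """
    if isinstance(value, list):
        return [float(x) for x in value]
    raw = base64.b64decode(value["$binary"])
    # Each component occupies exactly 4 bytes.
    if len(raw) % 4 != 0:
        raise ValueError("binary vector length must be a multiple of 4 bytes")
    return list(struct.unpack(f">{len(raw) // 4}f", raw))
```

Values decoded from $binary have 32-bit precision, so a stored 0.1 comes back as approximately 0.10000000149.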

© 2025 DataStax