$vector in collections
$vector is a reserved field in documents. It stores a vector embedding that is used for vector search and for the vector search component of hybrid search.
Only vector-enabled collections support the $vector field. For more information, see Create a collection that can store vector embeddings.
Insert and update a document’s $vector field
When you insert or update a document, you can use the $vector field to store a vector embedding for the document. For an example, see Insert documents with vector embeddings.
All vector embeddings in a collection should be generated by the same model with the same dimensions. Using mismatched embeddings produces unreliable and incorrect results in vector searches. Astra DB only checks that the dimensions are the same; it doesn’t check whether the embeddings are from different models.
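Because Astra DB checks only the dimensions, a client-side check before inserting can surface length mismatches earlier. The following is a minimal sketch, not part of any Data API client: the helper name `check_dimension` and the sample documents are hypothetical, and the expected dimension is assumed to be known from how the collection was created. It cannot detect embeddings that come from a different model with the same dimensions.

```python
from typing import Sequence


def check_dimension(vector: Sequence[float], expected_dim: int) -> None:
    """Raise ValueError if the embedding length differs from the expected dimension.

    Hypothetical helper for illustration only.
    """
    if len(vector) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(vector)}"
        )


# Sample documents for a collection assumed to use 3-dimensional vectors.
documents = [
    {"_id": "a", "$vector": [0.08, 0.68, 0.30]},
    {"_id": "b", "$vector": [0.4, -0.6]},  # wrong length on purpose
]

# Collect the ids of documents whose vectors would be rejected.
mismatched_ids = []
for doc in documents:
    try:
        check_dimension(doc["$vector"], expected_dim=3)
    except ValueError:
        mismatched_ids.append(doc["_id"])
```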
If your collection has an embedding provider integration, you can use the $vectorize field to automatically generate a vector embedding from a string. The Data API stores the generated vector embedding in the document’s $vector field.
However, you can’t include both the $vector and $vectorize fields in the same insert or update operation.
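The restriction on combining the two fields can be checked on the client before sending a request. This is a small sketch for insert payloads only. The function name `has_conflicting_vector_fields` is hypothetical, and the sketch inspects top-level keys only, so it does not cover update operators.

```python
from typing import Any, Mapping


def has_conflicting_vector_fields(document: Mapping[str, Any]) -> bool:
    """Return True if a document sets both $vector and $vectorize.

    Hypothetical helper; only inspects top-level keys of an insert payload.
    """
    return "$vector" in document and "$vectorize" in document


# One field or the other is fine; both together are not allowed.
with_vectorize = {"name": "doc-1", "$vectorize": "Text to embed"}
with_both = {
    "name": "doc-2",
    "$vector": [0.08, 0.68, 0.30],
    "$vectorize": "Text to embed",
}

conflicts = [
    doc["name"]
    for doc in (with_vectorize, with_both)
    if has_conflicting_vector_fields(doc)
]
```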
Use $vector for vector search and hybrid search
When you find, update, replace, or delete documents, you can use the $vector field to perform a vector search. For an example, see Use vector search to find documents.
Similarly, when you find and rerank documents, you can use the $vector field to perform a hybrid search. For an example, see Find documents with a hybrid search.
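As a rough illustration of a vector search from the Python client, the sketch below passes the query vector in a sort clause on $vector. It assumes that `collection` behaves like an astrapy Collection whose `find` method accepts `sort` and `limit` keyword arguments. The function name `nearest_documents` and the parameter `k` are hypothetical.

```python
from typing import Any, List, Sequence


def nearest_documents(
    collection: Any, query_vector: Sequence[float], k: int = 5
) -> List[dict]:
    """Return up to k documents ordered by similarity to query_vector.

    Assumes `collection` behaves like an astrapy Collection, where a sort
    clause on $vector requests a vector search.
    """
    cursor = collection.find(
        {},  # no metadata filter; search the whole collection
        sort={"$vector": list(query_vector)},
        limit=k,
    )
    return list(cursor)
```

With a real collection object, a call such as `nearest_documents(collection, [0.08, 0.68, 0.30], k=3)` would request the three closest documents. The query vector must have the same dimensions as the collection's embeddings.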
Return the $vector field
By default, the Data API excludes the $vector field from returned documents. To return the $vector field, use a projection that explicitly includes it in the response.
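A projection that includes $vector might look like the following sketch. It assumes `collection` behaves like an astrapy Collection whose `find_one` method accepts a `projection` keyword argument. The function name `find_one_with_vector` is hypothetical.

```python
from typing import Any, Optional


def find_one_with_vector(collection: Any, filter: dict) -> Optional[dict]:
    """Fetch one document and ask the Data API to include its $vector field.

    Assumes `collection` behaves like an astrapy Collection (or any object
    with a compatible find_one method).
    """
    return collection.find_one(filter, projection={"$vector": True})
```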
Binary encoding of vector embeddings
- Python
- TypeScript
- Java
- curl
Python
When inserting or updating documents, you can specify the $vector field as an array of floats or use the astrapy.data_types.DataAPIVector class to represent and encode the vector embedding. Similarly, for vector searches, you can provide the search vector as an array of floats or use the astrapy.data_types.DataAPIVector class.
DataAPIVector is a wrapper around a list of floats.
from astrapy.data_types import DataAPIVector
vector = DataAPIVector([.08, .68, .30])
For collections and documents, regardless of whether you use a DataAPIVector object or a list of floats, vector embeddings are binary-encoded by default, which improves performance. To change the default encoding, see Serdes Options and Custom Data Types.
When you read the value of a $vector field, the client returns a DataAPIVector object unless you change the default serialization/deserialization (ser/des) behavior.
For more information, see DataAPIVector.
TypeScript
When inserting or updating documents, you can specify the $vector field as an array of floats or use the DataAPIVector class to represent and encode the vector embedding. Similarly, for vector searches, you can provide the search vector as an array of floats or use the DataAPIVector class.
DataAPIVector is a wrapper around an array of floats.
import { DataAPIVector } from '@datastax/astra-db-ts';
const vector = new DataAPIVector([0.4, -0.6, 0.2]);
For collections and documents, regardless of whether you use a DataAPIVector object or an array of floats, the vector embeddings are binary-encoded by default, which improves performance. To change the default encoding, see Custom Ser/Des.
Java
When inserting or updating documents, you can specify the $vector field as an array of floats or use the DataAPIVector class to represent and encode the vector embedding. Similarly, for vector searches, you can provide the search vector as an array of floats or use the DataAPIVector class.
DataAPIVector is a wrapper around an array of floats.
import com.datastax.astra.client.core.vector.DataAPIVector;
DataAPIVector vector = new DataAPIVector(new float[] {.1f, .2f});
When you send a DataAPIVector object, the vector embeddings are binary-encoded by default. DataStax recommends that you always use a DataAPIVector object instead of an array of floats to improve performance.
For more information, see DataAPIVector.
curl
When inserting or updating documents with HTTP, you can specify the $vector field using either an array of floats or a Base64-encoded string with $binary. Similarly, for vector searches, you can provide the search vector as an array of floats or use $binary. $binary can be more performant.
For examples, see Insert documents with vector embeddings and Use vector search to find documents.
If you use $binary, the underlying bytes must represent 32-bit floating point values in big-endian format. The byte sequence must be Base64-encoded, with = padding if needed. For the detailed specification of the encoding, see Write vector data.
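The encoding described above can be sketched with the Python standard library: pack each value as a big-endian 32-bit float, then Base64-encode the bytes. The function name `encode_vector_binary` is hypothetical. The JSON shape, an object with a single `$binary` key, is an assumption based on the description here, so check Write vector data for the authoritative format.

```python
import base64
import struct
from typing import Sequence


def encode_vector_binary(vector: Sequence[float]) -> dict:
    """Pack floats as big-endian 32-bit values and Base64-encode the bytes."""
    # ">" selects big-endian byte order; "f" is a 32-bit float.
    raw = struct.pack(f">{len(vector)}f", *vector)
    # b64encode adds "=" padding when the byte count is not a multiple of 3.
    return {"$binary": base64.b64encode(raw).decode("ascii")}


# Assumed request fragment: the encoded vector replaces the array of floats.
payload = {"$vector": encode_vector_binary([0.5, -0.25, 1.0])}
```

Each value takes 4 bytes, so a 1024-dimension embedding becomes 4096 bytes before Base64 encoding. Values that are not exactly representable as 32-bit floats are rounded when packed.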
When you read the value of a $vector field, the Data API returns either a list of floats or a binary-encoded string, depending on the format used when writing the document in the collection.
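Because a read can return either form, client code that talks to the HTTP API directly may want to normalize both to a list of floats. The sketch below assumes the binary form arrives as an object with a `$binary` key that holds the Base64 string. That shape and the function name `decode_vector` are assumptions for illustration.

```python
import base64
import struct
from typing import List, Union


def decode_vector(value: Union[list, dict]) -> List[float]:
    """Normalize a returned $vector value to a list of floats.

    Accepts either a plain list of numbers or an object of the assumed
    form {"$binary": "<Base64 string>"}.
    """
    if isinstance(value, dict) and "$binary" in value:
        raw = base64.b64decode(value["$binary"])
        count = len(raw) // 4  # 4 bytes per 32-bit float
        # struct.unpack raises struct.error if len(raw) is not 4 * count.
        return list(struct.unpack(f">{count}f", raw))
    return [float(x) for x in value]
```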