$vector and $vectorize in collections
When working with documents in the Astra Portal or Data API, there are two reserved fields for vector data: $vector and $vectorize.
Which fields you can use depends on the collection configuration.
Embedding generation methods
When you create a collection, you decide whether the collection can store structured vector data. A collection that can store vector data is known as a vector-enabled collection. For vector-enabled collections, you also decide how to provide embeddings. You must choose the options you need when you create the collection:
- For all vector-enabled collections, you can provide embeddings when you insert data (also known as bring your own embeddings).
- You can configure the collection to automatically generate embeddings with vectorize (the $vectorize reserved field).
  You can’t use $vectorize in a collection where you did not enable vectorize when you created the collection. If you want to use vectorize at all, then you must enable vectorize when you create the collection.
- If you enable vectorize, you can use both options interchangeably but not simultaneously. For example, you can use vectorize to generate embeddings for a batch of documents, and then insert a few documents with pre-generated embeddings.
  To bring your own embeddings to a collection that uses vectorize, when you insert a document, include the document’s embedding in the $vector field.
  It is critical that all embeddings in a collection are generated by the same model with the same dimensions, regardless of whether you use vectorize, bring your own embeddings, or both.
  Astra DB only checks that the dimensions are the same; it does not produce an error if the embeddings are from different models. You must ensure that the embeddings are compatible. Using mismatched embeddings produces unreliable and incorrect results in vector searches.
- For all vector-enabled collections, you can insert non-vector data.
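The rules in this list can be checked on the client before a batch is sent. The following is a minimal sketch and not part of any Astra DB client: the helper name check_batch and its expected_dimension argument are hypothetical, and the sketch assumes that "not simultaneously" means a single document should not carry both $vector and $vectorize. It only compares dimensions, so, like Astra DB itself, it cannot detect embeddings that come from different models.

```python
from typing import Any


def check_batch(documents: list[dict[str, Any]], expected_dimension: int) -> list[str]:
    """Return a list of problems found in a batch; an empty list means none were found.

    Hypothetical client-side helper, not an Astra DB API.
    """
    problems = []
    for index, doc in enumerate(documents):
        has_vector = "$vector" in doc
        has_vectorize = "$vectorize" in doc
        # Assumption: one document uses one embedding method, not both.
        if has_vector and has_vectorize:
            problems.append(f"document {index}: both $vector and $vectorize are set")
        # Dimensions are the only property that can be verified mechanically.
        if has_vector and len(doc["$vector"]) != expected_dimension:
            problems.append(
                f"document {index}: expected {expected_dimension} dimensions, "
                f"got {len(doc['$vector'])}"
            )
    return problems


batch = [
    {"title": "generated by vectorize", "$vectorize": "A short text to embed"},
    {"title": "bring your own embedding", "$vector": [0.08, 0.68, 0.30]},
    {"title": "no embedding at all"},
    {"title": "wrong size", "$vector": [0.1, 0.2]},
]

for problem in check_batch(batch, expected_dimension=3):
    print(problem)
```

Only the last document in the sample batch is flagged. The first three cover the cases described above: vectorize, bring your own embeddings, and non-vector data.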
Reserved fields
- $vector
The $vector parameter is a reserved field that stores vectors.
To bring your own embeddings when you insert documents, include $vector for each document that has an embedding.
If the collection uses vectorize, you have the option to omit $vector when you insert documents. You can use $vectorize to generate an embedding, and then Astra DB populates the document’s $vector field with the automatically generated embedding. Alternatively, if you want to bring your own embeddings to a collection that uses vectorize, you can include the $vector field when you insert documents.
Regardless of the embedding generation method, when you find, update, replace, or delete documents, you can use $vector to fetch documents by vector search. You can also use projections to include $vector in responses.
Python
When inserting or updating documents, you can use the astrapy.data_types.DataAPIVector class to represent and encode vectors. DataAPIVector is a wrapper around a list of floats.

    from astrapy.data_types import DataAPIVector

    # Wrap a list of floats so the client can encode it as a vector
    vector = DataAPIVector([.08, .68, .30])

For collections and documents, regardless of whether you use a DataAPIVector object or a list of floats, the vector embeddings are binary-encoded by default, which improves performance. To change the default encoding, see Serdes Options and Custom Data Types.
When you read the value of a vector field or column, the client always returns a DataAPIVector object, unless you change the default ser/des behavior.
For more information, see DataAPIVector.
TypeScript
When inserting or updating documents, you can use the DataAPIVector class to represent and encode vectors. DataAPIVector is a wrapper around an array of floats.

    import { DataAPIVector } from '@datastax/astra-db-ts';

    // Wrap an array of floats so the client can encode it as a vector
    const vector = new DataAPIVector([0.4, -0.6, 0.2]);

For collections and documents, regardless of whether you use a DataAPIVector object or an array of floats, the vector embeddings are binary-encoded by default, which improves performance. To change the default encoding, see Custom Ser/Des.
For more information, see Data types for tables: Vector type.
Java
When inserting or updating documents, you can use the DataAPIVector class to represent and encode vectors. DataAPIVector is a wrapper around an array of floats.

    import com.datastax.astra.client.core.vector.DataAPIVector;

    // Wrap an array of floats so the client can encode it as a vector
    DataAPIVector vector = new DataAPIVector(new float[] {.1f, .2f});

When you send a DataAPIVector object, the vector embeddings are binary-encoded by default. DataStax recommends that you always use a DataAPIVector object instead of a list of floats to improve performance.
For more information, see DataAPIVector.
curl
When inserting or updating documents with HTTP, you can specify the $vector field using either an array of floats or a Base64-encoded string with $binary. Similarly, for vector searches, you can provide the search vector as an array of floats or use $binary. $binary can be more performant. For examples, see Insert documents and Find documents.
If you use $binary, the underlying bytes must represent 32-bit floating point values in big-endian format. The byte sequence must be Base64-encoded, with = padding if needed. For the detailed specification of the encoding, see Write vector data.
When you read the value of a $vector field, the Data API returns either a list of floats or a binary-encoded string, depending on the format used when writing the document in the collection.
- $vectorize
The $vectorize parameter is a reserved field that generates embeddings automatically based on a given text string.
You can’t use $vectorize in a collection where you did not enable vectorize when you created the collection. If you want to use vectorize at all, then you must enable vectorize when you create the collection.
If the collection uses vectorize, you have the option to include this parameter when you insert documents. The value of $vectorize is the text string from which you want to generate a document’s embedding. Make sure the vectorize text string is compliant with the embedding provider’s requirements, such as the token count. Astra DB stores the resulting vector array in $vector (in the form of a list of floats).
When you find, update, replace, or delete documents in a collection that uses vectorize, you can use $vectorize to fetch documents by vector search with vectorize. You can also use projections to include $vectorize in responses.
For information about vectorize integrations and troubleshooting vectorize, see Auto-generate embeddings with vectorize.
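The $binary encoding described for HTTP requests (32-bit floats, big-endian, Base64 with = padding) can be sketched with the Python standard library. This is an illustration only: the helper names encode_vector and decode_vector are hypothetical, and the sketch assumes the encoded string is wrapped in an object with a single $binary key. See Write vector data for the authoritative format.

```python
import base64
import struct


def encode_vector(values: list[float]) -> dict[str, str]:
    """Pack floats as big-endian 32-bit values and wrap them in a $binary object."""
    packed = struct.pack(f">{len(values)}f", *values)
    # b64encode adds "=" padding when the byte count is not a multiple of 3.
    return {"$binary": base64.b64encode(packed).decode("ascii")}


def decode_vector(encoded: dict[str, str]) -> list[float]:
    """Reverse encode_vector: Base64 text back to a list of floats."""
    raw = base64.b64decode(encoded["$binary"])
    return list(struct.unpack(f">{len(raw) // 4}f", raw))


# Illustrative document; "name" is an arbitrary field.
document = {"name": "example", "$vector": encode_vector([0.5, -1.0])}
print(document)
```

Values that are not exactly representable in 32 bits lose precision when packed, so a decoded vector can differ slightly from the Python floats that went in.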
$vector and $vectorize are excluded by default from Data API responses. You can use projections to include these properties in responses.
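As a rough illustration of how the reserved fields appear in requests, the following sketch builds two Data API request bodies as Python dictionaries and prints them as JSON. The command names (insertOne, find) and the sort, projection, and options keys are assumptions about the Data API's JSON command format; verify them against Insert documents and Find documents before relying on them. The field values are placeholders.

```python
import json

# Insert: Astra DB generates the embedding from the $vectorize text.
insert_body = {
    "insertOne": {
        "document": {
            "title": "Example document",  # illustrative field
            "$vectorize": "Text to generate an embedding from",
        }
    }
}

# Find: vector search with vectorize, with both reserved fields
# explicitly included in the response through a projection.
find_body = {
    "find": {
        "sort": {"$vectorize": "Text to search for"},
        "projection": {"$vector": 1, "$vectorize": 1},
        "options": {"limit": 5},
    }
}

print(json.dumps(insert_body, indent=2))
print(json.dumps(find_body, indent=2))
```

Without the projection, the matching documents would come back without their $vector and $vectorize values, because both fields are excluded by default.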
Insert non-vector data in a vector-enabled collection
To insert a document that doesn’t need an embedding, omit $vector and $vectorize.
When using the Astra Portal to load JSON or CSV data into a collection that uses vectorize, make sure the Vector Field is set to None (no embeddings).