$vector and $vectorize in collections
When working with documents in the Astra Portal or Data API, there are two reserved fields for vector data: $vector and $vectorize.
Which fields you can use depends on the collection configuration.
Embedding generation methods
When you create a collection, you decide whether it can store structured vector data. A collection that can store vector data is known as a vector-enabled collection. For vector-enabled collections, you also decide how to provide embeddings. You must choose the options you need when you create the collection:
- For all vector-enabled collections, you can provide embeddings when you load data (also known as bring your own embeddings).
- You can configure the collection to automatically generate embeddings with vectorize (the $vectorize reserved field).
  You can’t use $vectorize in a collection where you did not enable vectorize when you created the collection. If you want to use vectorize at all, then you must enable vectorize when you create the collection.
- If you enable vectorize, you can use both options interchangeably but not simultaneously. For example, you can use vectorize to generate embeddings for a batch of documents, and then insert a few documents with pre-generated embeddings.
  To bring your own embeddings to a collection that uses vectorize, when you insert a document, include the document’s embedding in the $vector field.
  It is critical that all embeddings in a collection are generated by the same model with the same dimensions, regardless of whether you use vectorize, bring your own embeddings, or both.
  Astra DB only checks that the dimensions are the same; it does not produce an error if the embeddings are from different models. You must ensure that the embeddings are compatible. Using mismatched embeddings produces unreliable and incorrect results in vector searches.
- For all vector-enabled collections, you can insert non-vector data.
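The rules in the list above can be sketched as a client-side check that runs before you send a document. This is a minimal illustration, not Astra DB behavior: the function name check_document and its parameters are hypothetical, and it assumes that "not simultaneously" means a single document should not carry both $vector and $vectorize. Like Astra DB, it can only compare dimensions; it cannot tell whether an embedding came from a different model.

```python
from typing import Any, Mapping


def check_document(doc: Mapping[str, Any], dimension: int, vectorize_enabled: bool) -> None:
    """Raise ValueError if the document's reserved fields break the rules above.

    Hypothetical helper: dimension and vectorize_enabled describe how the
    collection was configured when it was created.
    """
    has_vector = "$vector" in doc
    has_vectorize = "$vectorize" in doc

    # $vectorize is only usable if vectorize was enabled at collection creation.
    if has_vectorize and not vectorize_enabled:
        raise ValueError("$vectorize requires a collection created with vectorize enabled")

    # Assumption: the two options are alternatives for a given document.
    if has_vector and has_vectorize:
        raise ValueError("provide either $vector or $vectorize, not both")

    # Dimension check only; a same-sized vector from another model passes.
    if has_vector and len(doc["$vector"]) != dimension:
        raise ValueError(
            f"expected {dimension} dimensions, got {len(doc['$vector'])}"
        )


if __name__ == "__main__":
    check_document({"title": "own embedding", "$vector": [0.1, 0.2, 0.3]}, 3, True)
    check_document({"title": "generated", "$vectorize": "some text"}, 3, True)
    check_document({"title": "no embedding"}, 3, False)
    print("all documents passed")
```

A document with no reserved fields passes the check, which matches the last list item: non-vector data is allowed in any vector-enabled collection.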
Reserved fields
- $vector
  The $vector parameter is a reserved field that stores vectors.
  To bring your own embeddings when you insert documents, include $vector for each document that has an embedding.
  If the collection uses vectorize, you have the option to omit $vector when you insert documents. You can use $vectorize to generate an embedding, and then Astra DB populates the document’s $vector field with the automatically generated embedding. Alternatively, if you want to bring your own embeddings to a collection that uses vectorize, you can include the $vector field when you insert documents.
  Regardless of the embedding generation method, when you find, update, replace, or delete documents, you can use $vector to fetch documents by vector search. You can also use projections to include $vector in responses.
- $vectorize
  The $vectorize parameter is a reserved field that generates embeddings automatically based on a given text string.
  You can’t use $vectorize in a collection where you did not enable vectorize when you created the collection. If you want to use vectorize at all, then you must enable vectorize when you create the collection.
  If the collection uses vectorize, you have the option to include this parameter when you insert documents. The value of $vectorize is the text string from which you want to generate a document’s embedding. Make sure the vectorize text string is compliant with the embedding provider’s requirements, such as token size. Astra DB stores the resulting vector array in $vector.
  When you find, update, replace, or delete documents in a collection that uses vectorize, you can use $vectorize to fetch documents by vector search with vectorize. You can also use projections to include $vectorize in responses.
  For information about vectorize integrations and troubleshooting vectorize, see Auto-generate embeddings with vectorize.
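To make the two reserved fields concrete, the following sketch builds request bodies for inserting and searching with each field. It assumes the Data API's JSON command style, with an insertOne command that takes a document and a find command that takes a sort clause and an options.limit value. The helper names, the title field, and the three-dimension vector are illustrative assumptions, and the code only prints the payloads; it does not send any request.

```python
import json
from typing import Any


def insert_one(document: dict[str, Any]) -> dict[str, Any]:
    """Wrap a document in an insertOne command body (assumed shape)."""
    return {"insertOne": {"document": document}}


def vector_find(sort_field: str, value: Any, limit: int = 5) -> dict[str, Any]:
    """Build a find command that sorts by similarity on a reserved field.

    sort_field is "$vector" (value is a list of floats) or "$vectorize"
    (value is a text string that the collection's vectorize setup embeds).
    """
    if sort_field not in ("$vector", "$vectorize"):
        raise ValueError("sort_field must be $vector or $vectorize")
    return {"find": {"sort": {sort_field: value}, "options": {"limit": limit}}}


# Bring your own embedding: the vector goes in $vector.
own_embedding = insert_one({"title": "Doc A", "$vector": [0.12, -0.03, 0.57]})

# Vectorize: the source text goes in $vectorize, and Astra DB fills in $vector.
generated = insert_one({"title": "Doc B", "$vectorize": "Text to embed for Doc B"})

# Vector search with a query vector, or with query text through vectorize.
by_vector = vector_find("$vector", [0.10, 0.00, 0.60])
by_text = vector_find("$vectorize", "What does Doc B say?", limit=3)

if __name__ == "__main__":
    for payload in (own_embedding, generated, by_vector, by_text):
        print(json.dumps(payload, indent=2))
```

The $vectorize variants are only meaningful for a collection that was created with vectorize enabled.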
$vector and $vectorize are excluded by default from Data API responses. You can use projections to include these properties in responses.
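Because the reserved fields are excluded by default, a query has to ask for them explicitly. The sketch below adds a projection clause to an existing find command body. It assumes that the find command accepts a projection object in which a value of 1 requests a field; the helper name include_reserved_fields is hypothetical. Whether other fields are still returned alongside the requested ones depends on the Data API's projection rules, which this sketch does not model.

```python
import copy
from typing import Any, Iterable

RESERVED_FIELDS = ("$vector", "$vectorize")


def include_reserved_fields(
    find_command: dict[str, Any],
    fields: Iterable[str] = RESERVED_FIELDS,
) -> dict[str, Any]:
    """Return a copy of a find command whose projection requests reserved fields."""
    fields = tuple(fields)
    for field in fields:
        if field not in RESERVED_FIELDS:
            raise ValueError(f"{field} is not a reserved vector field")

    # Copy first so the caller's command is left unchanged.
    command = copy.deepcopy(find_command)
    projection = command["find"].setdefault("projection", {})
    for field in fields:
        projection[field] = 1
    return command


if __name__ == "__main__":
    base = {"find": {"sort": {"$vector": [0.1, 0.0, 0.6]}, "options": {"limit": 2}}}
    print(include_reserved_fields(base, ["$vector"]))
```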
Insert non-vector data in a vector-enabled collection
To insert a document that doesn’t need an embedding, omit $vector and $vectorize.
When using the Astra Portal to load JSON or CSV data into a collection that uses vectorize, make sure the Vector Field is set to None (no embeddings).
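When documents are built programmatically, one way to follow this rule is to drop both reserved fields before inserting. The helper below is a small sketch with a hypothetical name; it only prepares the document and does not call the Data API.

```python
from typing import Any

RESERVED_FIELDS = ("$vector", "$vectorize")


def as_non_vector_document(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of doc without $vector and $vectorize."""
    return {key: value for key, value in doc.items() if key not in RESERVED_FIELDS}


if __name__ == "__main__":
    source = {"title": "Changelog", "$vectorize": "unused text", "pages": 4}
    print(as_non_vector_document(source))
```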