Work with documents
Documents represent a single row or record of data in Astra DB Serverless databases.
You use the Collection
class to work with documents through the Data API clients.
For instructions to get a Collection
object, see Work with collections.
For more information about the Data API and clients, see Get started with the Data API.
$vector and $vectorize
When working with documents in the Astra Portal or Data API, there are two reserved fields for vector data: $vector
and $vectorize
.
Which fields you can use depends on the collection configuration.
Embedding generation methods
When you create a collection, you decide if the collection can store structured vector data. This is known as a vector-enabled collection. For vector-enabled collections, you also decide how to provide embeddings. You must decide which options you need when you create the collection:
-
For all vector-enabled collections, you can provide embeddings when you load data (also known as bring your own embeddings).
-
You can configure the collection to automatically generate embeddings with vectorize (the $vectorize reserved field).
You can’t use $vectorize in a collection where you did not enable vectorize when you created the collection. If you want to use vectorize at all, then you must enable vectorize when you create the collection.
-
If you enable vectorize, you can use both options interchangeably but not simultaneously. For example, you can use vectorize to generate embeddings for a batch of documents, and then insert a few documents with pre-generated embeddings.
To bring your own embeddings to a collection that uses vectorize, when you insert a document, include the document’s embedding in the $vector field.
It is critical that all embeddings in a collection are generated by the same model with the same dimensions, regardless of whether you use vectorize, bring your own embeddings, or both.
Astra DB only checks that the dimensions are the same; it does not produce an error if the embeddings are from different models. You must ensure that the embeddings are compatible. Using mismatched embeddings produces unreliable and incorrect results in similarity searches.
-
For all vector-enabled collections, you can insert non-vector data.
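Because Astra DB validates only the dimension of each embedding, a client-side check before loading data can catch some mistakes early. The following is a minimal sketch in plain Python. The names `check_dimensions` and `EXPECTED_DIMENSION` are hypothetical and are not part of the Data API clients, and a check like this can't detect embeddings produced by a different model that happens to use the same dimension.

```python
from typing import Any, Dict, List

# Hypothetical value: the vector dimension configured for the collection.
EXPECTED_DIMENSION = 5


def check_dimensions(documents: List[Dict[str, Any]], expected: int) -> List[int]:
    """Return the positions of documents whose $vector has the wrong length.

    Documents without a $vector field are skipped, because non-vector
    documents are allowed in vector-enabled collections.
    """
    bad = []
    for position, document in enumerate(documents):
        vector = document.get("$vector")
        if vector is not None and len(vector) != expected:
            bad.append(position)
    return bad


documents = [
    {"name": "ok", "$vector": [0.1, 0.15, 0.3, 0.12, 0.05]},
    {"name": "too short", "$vector": [0.1, 0.2]},
    {"name": "no embedding"},
]
mismatched = check_dimensions(documents, EXPECTED_DIMENSION)
print(mismatched)
```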
Reserved fields
- $vector
-
The $vector parameter is a reserved field that stores vectors.
To bring your own embeddings when you insert documents, include $vector for each document that has an embedding.
If the collection uses vectorize, you have the option to omit $vector when you insert documents. You can use $vectorize to generate an embedding, and then Astra DB populates the document’s $vector field with the automatically generated embedding. Alternatively, if you want to bring your own embeddings to a collection that uses vectorize, you can include the $vector field when you insert documents.
Regardless of the embedding generation method, when you find, update, replace, or delete documents, you can use $vector to fetch documents by vector search. You can also use projections to include $vector in responses.
- $vectorize
-
The $vectorize parameter is a reserved field that generates embeddings automatically based on a given text string.
You can’t use $vectorize in a collection where you did not enable vectorize when you created the collection. If you want to use vectorize at all, then you must enable vectorize when you create the collection.
If the collection uses vectorize, you have the option to include this parameter when you insert documents. The value of $vectorize is the text string from which you want to generate a document’s embedding. Make sure the vectorize text string complies with the embedding provider’s requirements, such as token size. Astra DB stores the resulting vector array in $vector.
When you find, update, replace, or delete documents in a collection that uses vectorize, you can use $vectorize to fetch documents by vector search with vectorize. You can also use projections to include $vectorize in responses.
For information about vectorize integrations and troubleshooting vectorize, see Auto-generate embeddings with vectorize.
$vector
and $vectorize
are excluded by default from Data API responses.
You can use projections to include these properties in responses.
Insert non-vector data in a vector-enabled collection
To insert a document that doesn’t need an embedding, omit $vector and $vectorize.
When using the Astra Portal to load JSON or CSV data into a collection that uses vectorize, make sure the Vector Field is set to None (no embeddings).
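The three document shapes described in this section can be sketched as plain Python dictionaries. The helper `embedding_source` is hypothetical and exists only to illustrate the rules. It treats a document that sets both reserved fields as an error, following the statement above that the two options can't be used simultaneously.

```python
from typing import Any, Dict, Optional

# Bring your own embedding: the vector goes in the reserved $vector field.
doc_with_vector = {"title": "First", "$vector": [0.25, 0.25, 0.25, 0.25, 0.25]}
# Vectorize: the text to embed goes in the reserved $vectorize field.
doc_with_vectorize = {"title": "Second", "$vectorize": "Text to vectorize"}
# Non-vector data: both reserved fields are omitted.
doc_plain = {"title": "Third"}


def embedding_source(document: Dict[str, Any]) -> Optional[str]:
    """Report which reserved field, if any, supplies the document's embedding."""
    has_vector = "$vector" in document
    has_vectorize = "$vectorize" in document
    if has_vector and has_vectorize:
        raise ValueError("use either $vector or $vectorize, not both")
    if has_vector:
        return "$vector"
    if has_vectorize:
        return "$vectorize"
    return None


for doc in (doc_with_vector, doc_with_vectorize, doc_plain):
    print(doc["title"], embedding_source(doc))
```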
$date
-
Python
-
TypeScript
-
Java
-
curl
The handling of datetime objects, with particular emphasis on the use of naive (that is, timezone-unaware) datetimes, changed in Python client version 2.0-preview. If you are using client version 2.0-preview or later, see the description of this change in the Data API client upgrade guide.
Date and datetime objects are instances of the Python standard library datetime.datetime
and datetime.date
classes that you can use anywhere in documents.
The following example uses dates in insert
, update
, and find
commands.
Read operations from a collection always return the datetime
class, regardless of whether the original command used date
or datetime
.
import datetime
from astrapy import DataAPIClient
from astrapy.ids import ObjectId, uuid8, UUID
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection
# Insert documents containing date and datetime values:
collection.insert_one({"when": datetime.datetime.now()})
collection.insert_one({"date_of_birth": datetime.date(2000, 1, 1)})
collection.insert_one({"registered_at": datetime.date(1999, 11, 14)})
# Update a document, using a date in the filter:
collection.update_one(
{"registered_at": datetime.date(1999, 11, 14)},
{"$set": {"message": "happy Sunday!"}},
)
# Update a document, setting "last_reviewed" to the current date:
collection.update_one(
{"date_of_birth": {"$exists": True}},
{"$currentDate": {"last_reviewed": True}},
)
# Find documents by inequality on a date value:
print(
collection.find_one(
{"date_of_birth": {"$lt": datetime.date(2001, 1, 1)}},
projection={"_id": False},
)
)
# will print:
# {'date_of_birth': datetime.datetime(2000, 1, 1, 0, 0), 'last_reviewed': datetime.datetime(...now...)}
You can use standard JS Date
objects anywhere in documents to represent dates and times.
Read operations also return Date
objects for document fields stored using { $date: number }
.
The following example uses dates in insert
, update
, and find
commands:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
(async function () {
// Create an untyped collection
const collection = await db.createCollection('dates_test');
// Insert documents with some dates
await collection.insertOne({ dateOfBirth: new Date(1394104654000) });
await collection.insertOne({ dateOfBirth: new Date('1863-05-28') });
// Update a document with a date and setting lastModified to now
await collection.updateOne(
{
dateOfBirth: new Date('1863-05-28'),
},
{
$set: { message: 'Happy Birthday!' },
$currentDate: { lastModified: true },
},
);
// Prints a Date close to the current time
const found = await collection.findOne({ dateOfBirth: { $lt: new Date('1900-01-01') } });
console.log(found?.lastModified);
})();
The Data API uses the ejson standard to represent time-related objects.
The Java client introduces custom serializers for three types of objects: java.util.Date, java.util.Calendar, and java.time.Instant.
You can use these objects in documents as well as filter clauses and update clauses.
The following example uses dates in insert
, update
, and find
commands:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.Projections;
import java.time.Instant;
import java.util.Calendar;
import java.util.Date;
import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Updates.set;
public class WorkingWithDates {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
Calendar c = Calendar.getInstance();
collection.insertOne(new Document().append("registered_at", c));
collection.insertOne(new Document().append("date_of_birth", new Date()));
collection.insertOne(new Document().append("just_a_date", Instant.now()));
collection.updateOne(
eq("registered_at", c), // filter clause
set("message", "happy Sunday!")); // update clause
collection.findOne(
lt("date_of_birth", new Date(System.currentTimeMillis() - 1000 * 1000)),
new FindOneOptions().projection(Projections.exclude("_id")));
}
}
You can use $date
to represent dates as Unix timestamps in the JSON payload of a Data API command:
"date_of_birth": { "$date": 1690045891 }
The following example includes a date in an insertOne
command:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertOne": {
"document": {
"$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
"date_of_birth": { "$date": 1690045891 }
}
}
}' | jq
The following example uses the date to find and update a document with the updateOne
command:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"updateOne": {
"filter": {
"date_of_birth": { "$date": 1690045891 }
},
"update": { "$set": { "message": "Happy birthday!" } }
}
}' | jq
The following example uses the $currentDate
update operator to set a property to the current date:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"findOneAndUpdate": {
"filter": { "_id": "doc1" },
"update": {
"$currentDate": {
"createdAt": true
}
}
}
}' | jq
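When you build JSON payloads by hand, a small helper can produce the $date wrapper from a Python datetime. This is a sketch under one stated assumption: it follows the EJSON convention in which $date holds milliseconds since the Unix epoch, which matches the millisecond value passed to new Date(1394104654000) in the TypeScript example. Verify the expected unit against the Data API reference before relying on it. The function name `to_ejson_date` is hypothetical.

```python
import datetime
import json


def to_ejson_date(value: datetime.datetime) -> dict:
    """Wrap a timezone-aware datetime as {"$date": <epoch milliseconds>}.

    Assumption: the $date value is a count of milliseconds since the epoch.
    """
    if value.tzinfo is None:
        # Naive datetimes are ambiguous, so this sketch rejects them.
        raise ValueError("use a timezone-aware datetime")
    return {"$date": int(value.timestamp() * 1000)}


born = datetime.datetime(2000, 1, 1, tzinfo=datetime.timezone.utc)
payload = {"insertOne": {"document": {"date_of_birth": to_ejson_date(born)}}}
print(json.dumps(payload))
```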
Document IDs
Documents in a collection are always identified by an ID that is unique within the collection.
This identifier is stored in the reserved field _id
.
There are multiple types of document identifiers, such as string, integer, or datetime; however, the uuid
and ObjectId
types are recommended.
The Data API supports uuid
identifiers up to version 8 and ObjectId
identifiers as provided by the bson
library.
When you create a collection, you can set a default ID type that specifies how the Data API generates an _id
for any document that doesn’t have an explicit _id
field when you insert it into the collection.
If you explicitly define a document’s _id
, such as "_id": "12"
, then the server uses this value instead of generating an ID.
If explicitly defined, the _id
field must be a top-level document property.
_id
cannot be nested within another property.
Regardless of the defaultId
setting, the Data API honors document identifiers of any type, anywhere in a document, that you explicitly provide at any time:
-
You can include identifiers anywhere in a document, not only in the
_id
field. -
You can include different types of identifiers in different parts of the same document.
-
You can define identifiers at any time, such as when inserting or updating a document.
-
You can use any of a document’s identifiers for filter clauses and update/replace operations, just like any other data type.
-
Python
-
TypeScript
-
Java
-
curl
The Python client recognizes uuid
versions 1 and 3 through 8, as provided by the uuid
and uuid6
Python libraries.
The Python client also recognizes the ObjectId
from the bson
package.
For convenience, these utilities are exposed in AstraPy directly:
from astrapy.ids import (
ObjectId,
uuid1,
uuid3,
uuid4,
uuid5,
uuid6,
uuid7,
uuid8,
UUID,
)
You can generate new identifiers with statements such as new_id = uuid8()
or new_obj_id = ObjectId()
:
from astrapy import DataAPIClient
from astrapy.ids import ObjectId, uuid8, UUID
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection
collection.insert_one({"_id": uuid8(), "tag": "new_id_v_8"})
collection.insert_one(
{"_id": UUID("018e77bc-648d-8795-a0e2-1cad0fdd53f5"), "tag": "id_v_8"}
)
collection.insert_one({"_id": ObjectId(), "tag": "new_obj_id"})
collection.insert_one(
{"_id": ObjectId("6601fb0f83ffc5f51ba22b88"), "tag": "obj_id"}
)
collection.find_one_and_update(
{"_id": ObjectId("6601fb0f83ffc5f51ba22b88")},
{"$set": {"item_inventory_id": UUID("1eeeaf80-e333-6613-b42f-f739b95106e6")}},
)
All uuid
versions are instances of the UUID
class, which exposes a version
property, if you need to access it.
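As a quick illustration of the version property, the following sketch uses the standard library uuid module with the identifiers from the preceding example. It assumes that the UUID class exposed by astrapy.ids behaves like the standard library uuid.UUID for this purpose.

```python
import uuid

# Identifiers copied from the preceding example.
id_v8 = uuid.UUID("018e77bc-648d-8795-a0e2-1cad0fdd53f5")
id_v6 = uuid.UUID("1eeeaf80-e333-6613-b42f-f739b95106e6")
# A freshly generated random (version 4) identifier.
new_id = uuid.uuid4()

# The version is encoded in the first hex digit of the third group.
print(id_v8.version, id_v6.version, new_id.version)
```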
To use and generate identifiers, astra-db-ts provides the UUID
and ObjectId
classes.
These are not the same as those exported from the bson
or uuid
libraries.
Instead, these are custom classes that you must import from the astra-db-ts
package:
import { UUID, ObjectId } from '@datastax/astra-db-ts';
To generate new identifiers, you can use UUID.v4()
, UUID.v7()
, or new ObjectId()
:
import { DataAPIClient, UUID, ObjectId } from '@datastax/astra-db-ts';
// Schema for the collection
interface Person {
_id: UUID | ObjectId;
name: string;
friendId?: UUID;
}
// Reference the DB instance
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
(async function () {
// Create the collection
const collection = await db.createCollection<Person>('people');
// Insert documents w/ various IDs
await collection.insertOne({ name: 'John', _id: UUID.v4() });
await collection.insertOne({ name: 'Jane', _id: new UUID('016b1cac-14ce-660e-8974-026c927b9b91') });
await collection.insertOne({ name: 'Dan', _id: new ObjectId()});
await collection.insertOne({ name: 'Tim', _id: new ObjectId('65fd9b52d7fabba03349d013') });
// Update a document with a UUID in a non-_id field
await collection.updateOne(
{ name: 'John' },
{ $set: { friendId: new UUID('016b1cac-14ce-660e-8974-026c927b9b91') } },
);
// Find a document by a UUID in a non-_id field
const john = await collection.findOne({ name: 'John' });
const jane = await collection.findOne({ _id: john!.friendId });
// Prints 'Jane 016b1cac-14ce-660e-8974-026c927b9b91 6'
console.log(jane?.name, jane?._id.toString(), (<UUID>jane?._id).version);
})();
All UUID methods return an instance of the same class, which exposes a version
property, if you need to access it.
UUIDs can also be constructed from a string representation of the IDs, if you want to use custom generation.
The Java client defines dedicated classes to support different implementations of UUID
, particularly v6 and v7.
When a unique identifier is retrieved from the server, it is returned as a uuid
, and then it is converted to the appropriate UUID
class, based on the class definition in the defaultId option.
ObjectId
classes are extracted from the BSON package, and they represent the ObjectId
type.
UUIDs from the Java UUID
class are implemented in the UUID v4 standard.
To generate new identifiers, you can use methods like new UUIDv6()
, new UUIDv7()
, or new ObjectId()
:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.ObjectId;
import com.datastax.astra.client.model.UUIDv6;
import com.datastax.astra.client.model.UUIDv7;
import java.time.Instant;
import java.util.UUID;
import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Updates.set;
public class WorkingWithDocumentIds {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// IDs can be different JSON scalar types
// ('defaultId' options NOT set for collection)
new Document().id("abc");
new Document().id(123);
new Document().id(Instant.now());
// Working with UUIDv4
new Document().id(UUID.randomUUID());
// Working with UUIDv6
collection.insertOne(new Document().id(new UUIDv6()).append("tag", "new_id_v_6"));
UUID uuidv4 = UUID.fromString("018e77bc-648d-8795-a0e2-1cad0fdd53f5");
collection.insertOne(new Document().id(new UUIDv6(uuidv4)).append("tag", "id_v_8"));
// Working with UUIDv7
collection.insertOne(new Document().id(new UUIDv7()).append("tag", "new_id_v_7"));
// Working with ObjectIds
collection.insertOne(new Document().id(new ObjectId()).append("tag", "obj_id"));
collection.insertOne(new Document().id(new ObjectId("6601fb0f83ffc5f51ba22b88")).append("tag", "obj_id"));
collection.findOneAndUpdate(
eq((new ObjectId("6601fb0f83ffc5f51ba22b88"))),
set("item_inventory_id", UUID.fromString("1eeeaf80-e333-6613-b42f-f739b95106e6")));
}
}
When you insert a document, you can omit _id
to automatically generate an ID or you can manually specify an _id
, such as "_id": "12"
.
The following example inserts two documents with manually-defined _id
values.
One document uses the objectId
type, and the other uses the uuid
type.
"insertMany": {
"documents": [
{
"_id": { "$objectId": "6672e1cbd7fabb4e5493916f" },
"$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
"key": "value",
"amount": 53990
},
{
"_id": { "$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739" },
"$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
"key": "value",
"amount": 4600
}
]
}
When you add or update a document, you can include additional identifiers in any document property, other than _id
, just as you would any other data type.
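If you assemble such a payload programmatically, the typed ID wrappers can be produced by small helpers. The following sketch builds the same insertMany body with the Python standard library. The helper names `uuid_id` and `object_id` are hypothetical, and the ObjectId check only verifies that the value is a 24-character hexadecimal string.

```python
import json
import uuid


def uuid_id(value: uuid.UUID) -> dict:
    """Wrap a UUID in the {"$uuid": ...} form used in the payload above."""
    return {"$uuid": str(value)}


def object_id(hex_string: str) -> dict:
    """Wrap a 24-character hex string in the {"$objectId": ...} form."""
    if len(hex_string) != 24:
        raise ValueError("an ObjectId is written as 24 hex characters")
    int(hex_string, 16)  # Raises ValueError if the string is not hexadecimal.
    return {"$objectId": hex_string}


payload = {
    "insertMany": {
        "documents": [
            {"_id": object_id("6672e1cbd7fabb4e5493916f"), "amount": 53990},
            {
                "_id": uuid_id(uuid.UUID("1ef2e42c-1fdb-6ad6-aae4-e84679831739")),
                "amount": 4600,
            },
        ]
    }
}
print(json.dumps(payload, indent=2))
```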
Sort clauses
Sort and filter clauses can use only indexed fields. If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.
Data API commands, such as find
, findOne
, deleteOne
, updateOne
, and so on, can use sort
clauses to order results, either by similarity to a given vector or by the value of a given field.
Additionally, you can use a projection to include specific document properties in the response.
A projection is required if you want to return certain reserved fields, like $vector
and $vectorize
, that are excluded by default.
-
Python
-
TypeScript
-
Java
-
curl
When no particular order is required:
sort={} # (default when parameter not provided)
When sorting by a certain value in ascending/descending order:
from astrapy.constants import SortDocuments
# Ascending sort
sort={"field": SortDocuments.ASCENDING}
# Descending sort
sort={"field": SortDocuments.DESCENDING}
Be aware of the order when chaining multiple sorts. For example, when sorting first by a specific field and then by a specific subfield:
sort={
"field": SortDocuments.ASCENDING,
"subfield": SortDocuments.ASCENDING,
}
While modern Python versions preserve the order of dictionaries, it is suggested for clarity to employ a collections.OrderedDict
with chained sorts.
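A chained sort written with collections.OrderedDict might look like the following sketch. To stay self-contained, it uses the literal values 1 and -1 instead of importing SortDocuments. This assumes the ascending and descending constants correspond to 1 and -1, as in the TypeScript sort examples in this document.

```python
from collections import OrderedDict

# Assumed numeric equivalents of SortDocuments.ASCENDING and SortDocuments.DESCENDING.
ASCENDING = 1
DESCENDING = -1

# The first key is the primary sort, and the second key breaks ties.
sort = OrderedDict(
    [
        ("field", ASCENDING),
        ("subfield", DESCENDING),
    ]
)
print(list(sort.items()))
```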
You can use sort
to perform a vector similarity (ANN) search:
# Use the specified vector,
# And then sort by similarity to the given vector.
sort={"$vector": [0.4, 0.15, -0.5]}
# Generate a vector from a string,
# Run a similarity search,
# And then sort by similarity to the given vector.
# Requires a valid vectorize integration.
sort={"$vectorize": "Text to vectorize"}
Sort example
from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection
filter = {"seq": {"$exists": True}}
for doc in collection.find(filter, projection={"seq": True}, limit=5):
    print(doc["seq"])
# will print e.g.:
# 37
# 35
# 10
# 36
# 27
cursor1 = collection.find(
{},
limit=4,
sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
[doc["_id"] for doc in cursor1]
# prints: ['97e85f81-...', '1581efe4-...', '...', '...']
cursor2 = collection.find({}, limit=3)
cursor2.distinct("seq")
# prints: [37, 35, 10]
collection.insert_many([
{"tag": "A", "$vector": [4, 5]},
{"tag": "B", "$vector": [3, 4]},
{"tag": "C", "$vector": [3, 2]},
{"tag": "D", "$vector": [4, 1]},
{"tag": "E", "$vector": [2, 5]},
])
ann_tags = [
document["tag"]
for document in collection.find(
{},
sort={"$vector": [3, 3]},
limit=3,
)
]
ann_tags
# prints: ['A', 'B', 'C']
# (assuming the collection has metric VectorMetric.COSINE)
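To see why the tags come back in that order, the ranking can be reproduced locally with a brute-force cosine comparison. This is only an illustrative sketch: the database performs an approximate nearest neighbor search on the server, and the function `cosine_similarity` here is a local helper, not part of the client.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


documents = [
    {"tag": "A", "$vector": [4, 5]},
    {"tag": "B", "$vector": [3, 4]},
    {"tag": "C", "$vector": [3, 2]},
    {"tag": "D", "$vector": [4, 1]},
    {"tag": "E", "$vector": [2, 5]},
]
query = [3, 3]

# Rank every document by similarity to the query vector, most similar first.
ranked = sorted(
    documents,
    key=lambda doc: cosine_similarity(doc["$vector"], query),
    reverse=True,
)
top_tags = [doc["tag"] for doc in ranked[:3]]
print(top_tags)
```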
When no particular order is required:
{ sort: {} } // (default when parameter not provided)
When sorting by a certain value in ascending/descending order:
{ sort: { field: +1 } } // ascending
{ sort: { field: -1 } } // descending
Be aware of the order when chaining multiple sorts. ES2015+ guarantees that string keys are kept in order of insertion, so the order in which you write the keys is the order in which the sorts are applied. For example, when sorting first by a field and then by a specific subfield:
{ sort: { field: 1, subfield: 1 } }
You can use sort
to perform a vector similarity (ANN) search:
// Use the specified vector,
// And then sort by similarity to the given vector.
{ sort: { $vector: [0.4, 0.15, -0.5] } }
// Generate a vector from a string,
// Run a similarity search,
// And then sort by similarity to the given vector.
// Requires a valid vectorize integration
{ sort: { $vectorize: "Text to vectorize" } }
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some documents
await collection.insertMany([
{ name: 'Jane', age: 25, $vector: [1.0, 1.0, 1.0, 1.0, 1.0] },
{ name: 'Dave', age: 40, $vector: [0.4, 0.5, 0.6, 0.7, 0.8] },
{ name: 'Jack', age: 40, $vector: [0.1, 0.9, 0.0, 0.5, 0.7] },
]);
// Sort by age ascending, then by name descending (Jane, Jack, Dave)
const sorted1 = await collection.find({}, { sort: { age: 1, name: -1 } }).toArray();
console.log(sorted1.map(d => d.name));
// Sort by vector distance (Jane, Dave, Jack)
const sorted2 = await collection.find({}, { sort: { $vector: [1, 1, 1, 1, 1] } }).toArray();
console.log(sorted2.map(d => d.name));
})();
The sort()
operations are optional.
Use them only when needed.
Be aware of the order when chaining multiple sorts:
Sort s1 = Sorts.ascending("field1");
Sort s2 = Sorts.descending("field2");
FindOptions.Builder.sort(s1, s2);
You can use sort
to perform a vector similarity (ANN) search:
// Use the specified vector,
// And then sort by similarity to the given vector.
FindOptions.Builder
.sort(new float[] {0.4f, 0.15f, -0.5f});
// Generate a vector from a string,
// Run a similarity search,
// And then sort by similarity to the given vector.
// Requires a valid vectorize integration
FindOptions.Builder
.sort("Text to vectorize");
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOptions;
import com.datastax.astra.client.model.Sort;
import com.datastax.astra.client.model.Sorts;
import static com.datastax.astra.client.model.Filters.lt;
public class WorkingWithSorts {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Sort Clause for a vector
Sorts.vector(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f});
// Sort Clause for other fields
Sort s1 = Sorts.ascending("field1");
Sort s2 = Sorts.descending("field2");
// Build the sort clause
new FindOptions().sort(s1, s2);
// Adding vector
new FindOptions().sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f}, s1, s2);
}
}
When you run a Find command, you can append nested JSON objects that define the search criteria (sort
or filter
), projection
, and other options
.
If no particular order is required, you can search with an empty filter
:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"find": {
"filter": {}
}
}' | jq
This example finds documents by performing a vector similarity search:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"find": {
"sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
"projection": { "$vector": 1 },
"options": {
"includeSimilarity": true,
"includeSortVector": false,
"limit": 100
}
}
}' | jq
This request does the following:
-
sort compares the given vector, [0.15, 0.1, 0.1, 0.35, 0.55], against the vectors for documents in the collection, and then returns results ranked by similarity. The $vector key is a reserved property name for storing vector data.
-
projection requests that the response return the $vector for each document.
-
options.includeSimilarity requests that the response include the $similarity key with the numeric similarity score, which represents the closeness of the sort vector and the document’s vector.
-
options.includeSortVector is set to false to exclude the sortVector from the response. This is only relevant if sort includes either $vector or $vectorize and you want the response to include the sort vector. This is particularly useful with $vectorize because you don’t know the sort vector in advance.
-
options.limit specifies the maximum number of documents to return. This example limits the entire list of matching documents to 100 documents or fewer.
Vector search returns a single page of up to 1000 documents, unless you set a lower limit. Other searches (without $vector or $vectorize) return matching documents in batches of 20. Pagination occurs if there are more than 20 matching documents. For information about handling pagination, see Find documents using filter clauses.
The projection
and options
settings can make the response more focused and potentially reduce the amount of data transferred.
Response
{
"data": {
"documents": [
{
"$similarity": 1,
"$vector": [
0.15,
0.1,
0.1,
0.35,
0.55
],
"_id": "3"
},
{
"$similarity": 0.9953563,
"$vector": [
0.15,
0.17,
0.15,
0.43,
0.55
],
"_id": "18"
},
{
"$similarity": 0.9732053,
"$vector": [
0.21,
0.22,
0.33,
0.44,
0.53
],
"_id": "21"
}
],
"nextPageState": null
}
}
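For non-vector searches that span several pages, a client can follow nextPageState until it is null. The following sketch shows the loop against a fake transport so that it runs without a database. It assumes that the next page is requested by passing the previous nextPageState as options.pageState, and the names `find_all`, `fake_send`, and `PAGES` are hypothetical.

```python
from typing import Any, Callable, Dict, Iterator

Command = Dict[str, Any]


def find_all(
    send: Callable[[Command], Dict[str, Any]],
    filter_clause: Dict[str, Any],
) -> Iterator[Dict[str, Any]]:
    """Yield every matching document, requesting pages until none remain."""
    page_state = None
    while True:
        find_body: Dict[str, Any] = {"filter": filter_clause}
        if page_state is not None:
            # Assumption: the paging token is sent back as options.pageState.
            find_body["options"] = {"pageState": page_state}
        data = send({"find": find_body})["data"]
        yield from data["documents"]
        page_state = data.get("nextPageState")
        if page_state is None:
            return


# Fake transport that stands in for the HTTP call, keyed by page state.
PAGES = {
    None: {"documents": [{"_id": "1"}, {"_id": "2"}], "nextPageState": "p2"},
    "p2": {"documents": [{"_id": "3"}], "nextPageState": None},
}


def fake_send(command: Command) -> Dict[str, Any]:
    state = command["find"].get("options", {}).get("pageState")
    return {"data": PAGES[state]}


ids = [doc["_id"] for doc in find_all(fake_send, {})]
print(ids)
```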
Projection clauses
Certain document operations, such as findOne
, find
, findOneAndUpdate
, findOneAndReplace
, and findOneAndDelete
, support a projection
option that specifies which part of a document to return.
Typically, the projection specifies which fields to include or exclude.
If projection
is empty or unspecified, the Data API applies the default projection.
For documents, the default projection includes, at minimum, the document identifier (_id
) and all regular fields, which are fields not prefixed by a dollar sign ($
).
If you specify a projection, all special fields, such as _id
, $vector
, and $vectorize
, have specific inclusion and exclusion defaults that you can override individually.
However, for regular fields, the projection must either include or exclude those fields.
The projection can’t define a mix of included and excluded regular fields.
If a projection includes fields that don’t exist in a returned document, then those fields are ignored for that document.
In order to optimize the response size and improve read performance, DataStax recommends always providing an explicit projection tailored to the needs of the application. If an application relies on the presence of special fields, such as $vector or $vectorize, make sure the projection includes them explicitly. A quick, but possibly suboptimal, way to ensure the presence of special fields is to use the wildcard projection { "*": true }.
Projection syntax
A projection is expressed as a mapping of field names to boolean values.
Use true
mapping to include only the specified fields.
For example, the following true
mapping returns the document ID, field1
, and field2
:
{ "_id": true, "field1": true, "field2": true }
Alternatively, use a false
mapping to exclude the specified fields.
All other non-excluded fields are returned.
{ "field1": false, "field2": false }
The values in a projection map can be objects, booleans, decimals, or integers, but the Data API ultimately evaluates all of these as booleans.
For example, the following projection evaluates to true
(include) for all four fields:
{ "field1": true, "field2": 1, "field3": 90.0, "field4": { "keep": "yes!" } }
Whereas this projection evaluates to false
(exclude) for all four fields:
{ "field1": false, "field2": 0, "field3": 0.0, "field4": {} }
Passing null-like types (such as {}
, null
or 0
) for the whole projection
mapping is equivalent to omitting projection
.
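The two example projections above happen to line up with Python truthiness, which gives a compact way to reason about them. This is a sketch for illustration only: it is not the server's implementation, and values outside the listed types (objects, booleans, decimals, integers) might be handled differently by the Data API.

```python
from typing import Any


def is_included(value: Any) -> bool:
    """Approximate how a projection value is read as include or exclude."""
    return bool(value)


include_all = {"field1": True, "field2": 1, "field3": 90.0, "field4": {"keep": "yes!"}}
exclude_all = {"field1": False, "field2": 0, "field3": 0.0, "field4": {}}

print({name: is_included(value) for name, value in include_all.items()})
print({name: is_included(value) for name, value in exclude_all.items()})
```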
Projecting regular and special fields
For regular fields, a projection can’t mix include and exclude projections.
It can contain only true
or only false
values for regular fields.
For example, {"field1": true, "field2": false}
is an invalid projection that results in an API error.
However, the special fields _id
, $vector
, and $vectorize
have individual default inclusion and exclusion rules, regardless of the projection mapping.
Unlike regular fields, you can set the projection values for special fields independently of regular fields:
-
The _id field is included by default. You can opt to exclude it in a true mapping, such as { "_id": false, "field1": true }.
-
The $vector and $vectorize fields are excluded by default. You can opt to include these in a false mapping, such as { "field1": false, "$vector": true }.
-
The $similarity key isn’t a document field, and you can’t use this key in a projection. The $similarity value is the result of a vector ANN search operation with $vector or $vectorize. Use the includeSimilarity parameter to control the presence of $similarity in the response.
Therefore, the following are all valid projections for regular and special fields:
{ "_id": true, "field1": true, "field2": true }
{ "_id": false, "field1": true, "field2": true }
{ "_id": false, "field1": false, "field2": false }
{ "_id": true, "field1": false, "field2": false }
{ "_id": true, "field1": true, "field2": true, "$vector": true }
{ "_id": true, "field1": true, "field2": true, "$vector": false }
{ "_id": false, "field1": true, "field2": true, "$vector": true }
{ "_id": false, "field1": true, "field2": true, "$vector": false }
{ "_id": false, "field1": false, "field2": false, "$vector": true }
{ "_id": false, "field1": false, "field2": false, "$vector": false }
{ "_id": true, "field1": false, "field2": false, "$vector": true }
{ "_id": true, "field1": false, "field2": false, "$vector": false }
The wildcard projection "*"
represents the whole of the document.
If you use this projection, it must be the only key in the projection.
If set to true ({ "*": true }
), all fields are returned.
If set to false ({ "*": false }
), no fields are returned, and each document is empty ({}
).
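The rules in this section can be summarized as a small client-side check. The following sketch is a hypothetical helper, `is_valid_projection`, that encodes only the rules stated here: regular fields must be all included or all excluded, special fields are exempt, and the wildcard must be the only key. The Data API remains the authority on what it accepts.

```python
from typing import Any, Dict

# Special fields that can be set independently of regular fields.
SPECIAL_FIELDS = {"_id", "$vector", "$vectorize"}


def is_valid_projection(projection: Dict[str, Any]) -> bool:
    """Check a projection against the include/exclude rules described above."""
    if "*" in projection:
        # The wildcard must be the only key in the projection.
        return len(projection) == 1
    regular_choices = {
        bool(value)
        for name, value in projection.items()
        if name not in SPECIAL_FIELDS
    }
    # Regular fields must not mix inclusion and exclusion.
    return len(regular_choices) <= 1


print(is_valid_projection({"_id": False, "field1": True, "$vector": True}))
print(is_valid_projection({"field1": True, "field2": False}))
print(is_valid_projection({"*": True}))
```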
Projecting arrays and nested objects
For array fields, you can use a $slice
to specify which elements of the array to return.
Use one of the following formats:
// Return the first two elements
{ "arr": { "$slice": 2 } }
// Return the last two elements
{ "arr": { "$slice": -2 } }
// Skip 4 elements (from 0th index), return the next 2
{ "arr": { "$slice": [4, 2] } }
// Skip backward 4 elements (from the end), return next 2 elements (forward)
{ "arr": { "$slice": [-4, 2] } }
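The four $slice formats can be emulated locally with list slicing, which makes the expected results easy to check. This sketch follows the descriptions in the comments above and is not the server's implementation. The function name `apply_slice` is hypothetical, and edge cases such as a skip larger than the array are handled here by a simple clamp.

```python
from typing import Any, List, Union


def apply_slice(arr: List[Any], spec: Union[int, List[int]]) -> List[Any]:
    """Emulate the $slice projection formats on a Python list."""
    if isinstance(spec, int):
        # Positive: first n elements. Negative: last n elements.
        return arr[:spec] if spec >= 0 else arr[spec:]
    skip, count = spec
    # A negative skip counts backward from the end of the array.
    start = skip if skip >= 0 else max(len(arr) + skip, 0)
    return arr[start:start + count]


arr = list(range(10))
print(apply_slice(arr, 2))
print(apply_slice(arr, -2))
print(apply_slice(arr, [4, 2]))
print(apply_slice(arr, [-4, 2]))
```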
If a projection refers to a nested field, the keys in the subdocument are included or excluded as requested. If you exclude all keys of an existing subdocument, then the document is returned with the subdocument present and an empty nested object.
Examples of nested document projections
Given the following document:
{
  "_id": "z",
  "a": {
    "a1": 10,
    "a2": 20
  }
}
The results of various projections are as follows:
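Projection | Result |
---|---|
{ "a": true } | { "_id": "z", "a": { "a1": 10, "a2": 20 } } |
{ "a.a1": true } | { "_id": "z", "a": { "a1": 10 } } |
{ "a.a1": false } | { "_id": "z", "a": { "a2": 20 } } |
{ "a.a1": false, "a.a2": false } | { "_id": "z", "a": {} } |
{ "*": false } | {} |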
Referencing overlapping paths or subpaths in a projection can create conflicting clauses and return an API error. For example, this projection is invalid:
// Invalid:
{ "a.a1": true, "a": true }
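One way to catch this kind of conflict before sending a request is to compare the projection keys pairwise. The sketch below is a local pre-check only. The function find_overlapping_paths is a hypothetical helper, and it assumes that two paths conflict when one is a dot-separated prefix of the other.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def find_overlapping_paths(paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (parent, child) pairs where one path is a prefix of the other."""
    conflicts = []
    for first, second in combinations(sorted(paths), 2):
        shorter, longer = sorted((first, second), key=len)
        # "a" overlaps "a.a1", but "a" does not overlap "ab".
        if longer.startswith(shorter + "."):
            conflicts.append((shorter, longer))
    return conflicts


# Iterating over a dict yields its keys, so a projection can be passed directly.
print(find_overlapping_paths({"a.a1": True, "a": True}))
print(find_overlapping_paths({"a.a1": True, "a.a2": True}))
```

The first call reports the pair ("a", "a.a1"), and the second reports no conflicts because the two paths are siblings.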
Projection examples by language
- Python
- TypeScript
- Java
- curl
For the Python client, the projection can be any of the following:
- A dictionary (Dict[str, Any]) to include specific fields in the response, like {field_name: True}.
- A dictionary (Dict[str, Any]) to exclude specific fields from the response, like {field_name: False}.
- A list or other iterable over key names that are implied to be included in the projection.
The following two projections are equivalent:
document = collection.find_one(
    {"_id": 101},
    projection={"name": True, "city": True},
)
document = collection.find_one(
    {"_id": 101},
    projection=["name", "city"],
)
For information about default projections and handling for special fields, see the preceding explanation of projection clauses.
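The equivalence between the list form and the dictionary form in the Python examples can be pictured with a short sketch. The function normalize_projection is a hypothetical helper used only for illustration. It is not the Python client's internal code, and it assumes that every name in an iterable stands for an included field.

```python
from typing import Any, Dict, Iterable, Union

ProjectionType = Union[Dict[str, Any], Iterable[str]]


def normalize_projection(projection: ProjectionType) -> Dict[str, Any]:
    """Turn either accepted form into the dictionary form."""
    if isinstance(projection, dict):
        # Already in {field_name: flag} form; return a copy.
        return dict(projection)
    # An iterable of key names implies inclusion of each name.
    return {field_name: True for field_name in projection}


as_dict = normalize_projection({"name": True, "city": True})
as_list = normalize_projection(["name", "city"])
print(as_dict == as_list)
```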
The TypeScript client takes in an untyped Plain Old JavaScript Object (POJO) for the projection parameter.
The client also offers a StrictProjection<Schema> type that provides full autocomplete and type checking for your document schema.
When specifying a projection, make sure that you handle the return type carefully. Consider type-casting.
import { StrictProjection } from '@datastax/astra-db-ts';

// Untyped projection: a plain object keyed by field path
const doc = await collection.findOne({}, {
  projection: {
    'name': true,
    'address.city': true,
  },
});

interface MySchema {
  name: string,
  address: {
    city: string,
    state: string,
  },
}

// Typed projection: `satisfies` checks keys and values against MySchema
const typedDoc = await collection.findOne({}, {
  projection: {
    'name': 1,
    'address.city': 1,
    // @ts-expect-error - `'address.car'` does not exist in type `StrictProjection<MySchema>`
    'address.car': 0,
    // @ts-expect-error - Type `{ $slice: number }` is not assignable to type `boolean | 0 | 1 | undefined`
    'address.state': { $slice: 3 }
  } satisfies StrictProjection<MySchema>,
});
To support the projection mechanism, the Java client has different Options classes that provide the projection method in the helpers.
This method takes an array of Projection objects, each with a field name and a boolean flag indicating inclusion or exclusion.
Projection p1 = new Projection("field1", true);
Projection p2 = new Projection("field2", true);
FindOptions options1 = FindOptions.Builder.projection(p1, p2);
To simplify this syntax, you can use the Projections syntactic sugar:
FindOptions options2 = FindOptions.Builder
    .projection(Projections.include("field1", "field2"));
FindOptions options3 = FindOptions.Builder
    .projection(Projections.exclude("field1", "field2"));
The Projection class also provides a method to support $slice for array fields:
// {"arr": {"$slice": 2}}
Projection sliceOnlyStart = Projections.slice("arr", 2, null);
// {"arr": {"$slice": [-4, 2]}}
Projection sliceOnlyRange = Projections.slice("arr", -4, 2);
// You can use them freely in the different builders
FindOptions options4 = FindOptions.Builder
    .projection(sliceOnlyStart);
In an HTTP request, include projection as a find parameter:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
    "projection": { "$vector": true, "name": true, "city": true },
    "options": {
      "includeSimilarity": true,
      "includeSortVector": false,
      "limit": 100
    }
  }
}' | jq
Operators
The Data API provides query operators for filters and update operators for update clauses. Use them to find, update, replace, and delete documents:
Operator type | Name | Purpose |
---|---|---|
Logical query | $and | Joins query clauses with a logical AND. |
Logical query | $or | Joins query clauses with a logical OR. |
Logical query | $not | Returns documents that do not match the conditions of the filter clause. |
Range query | $gt | Matches documents where the given property is greater than the specified value. |
Range query | $gte | Matches documents where the given property is greater than or equal to the specified value. |
Range query | $lt | Matches documents where the given property is less than the specified value. |
Range query | $lte | Matches documents where the given property is less than or equal to the specified value. |
Comparison query | $eq | Matches documents where the value of a property equals the specified value. This is the default when you do not specify an operator. |
Comparison query | $ne | Matches documents where the value of a property does not equal the specified value. |
Comparison query | $in | Matches one or more of an array of specified values. If you have only one value to match, an array is not necessary. |
Comparison query | $nin | Matches any of the values that are NOT IN the array. |
Element query | $exists | Matches documents that have the specified property. |
Array query | $all | Matches arrays that contain all elements in the specified array. |
Array query | $size | Selects documents where the array has the specified number of elements. |
Property update | $currentDate | Used in an update operation to set a property to the current date. |
Property update | $inc | Increments the value of the property by the specified amount. |
Property update | $min | Updates the property only if the specified value is less than the existing property value. |
Property update | $max | Updates the property only if the specified value is greater than the existing property value. |
Property update | $mul | Multiplies the value of a property in the document. |
Property update | $rename | Renames the specified property in each matching document. |
Property update | $set | Sets the value of a property in each matching document. |
Property update | $setOnInsert | Sets the value of a property in the document if an upsert is performed. |
Property update | $unset | Removes the specified property from each matching document. |
Array update | $addToSet | Adds elements to the array only if they do not already exist in the set. You can use $each to add multiple elements. |
Array update | $pop | Removes the first or last item of the array, depending on the value of the operator. Use -1 to remove the first item or 1 to remove the last item. |
Array update | $push | Adds or appends data to the end of the property value. If the value is not yet an array and the property has no value, this operator creates a one-element array that contains the given item. If the value is not yet an array and the property has a non-array value, this operator creates a two-element array that has the existing value as the first entry and the given item as the second entry. You can use $each to append multiple items. |
Array update | $each | Modifies the $push and $addToSet operators to append multiple items. |
Array update | $position | Modifies the $push operator to specify the position in the array at which to add items. Use with $each. For an example, see the curl tab for Find and update a document. |
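To show how the query operators combine in a filter, the following Python sketch evaluates a filter against a single in-memory document. The matches function and the FIELD_OPERATORS table are hypothetical names introduced for this illustration. The sketch handles only top-level properties and approximates the behavior summarized in the table. The Data API evaluates filters on the server and may differ in edge cases.

```python
from typing import Any, Callable, Dict

_MISSING = object()  # sentinel for a property that is absent from the document


def _ordered(compare: Callable[[Any, Any], bool]) -> Callable[[Any, Any], bool]:
    # Range operators never match a missing property.
    return lambda value, arg: value is not _MISSING and compare(value, arg)


def _as_list(arg: Any) -> list:
    # $in and $nin accept a single value as well as an array.
    return arg if isinstance(arg, list) else [arg]


FIELD_OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$gt": _ordered(lambda value, arg: value > arg),
    "$gte": _ordered(lambda value, arg: value >= arg),
    "$lt": _ordered(lambda value, arg: value < arg),
    "$lte": _ordered(lambda value, arg: value <= arg),
    "$in": lambda value, arg: value in _as_list(arg),
    "$nin": lambda value, arg: value not in _as_list(arg),
    "$exists": lambda value, arg: (value is not _MISSING) == arg,
    "$all": lambda value, arg: isinstance(value, list)
    and all(item in value for item in arg),
    "$size": lambda value, arg: isinstance(value, list) and len(value) == arg,
}


def matches(document: Dict[str, Any], filter_clause: Dict[str, Any]) -> bool:
    """Return True if the document satisfies every clause in the filter."""
    for key, condition in filter_clause.items():
        if key == "$and":
            if not all(matches(document, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(document, sub) for sub in condition):
                return False
        elif key == "$not":
            if matches(document, condition):
                return False
        else:
            value = document.get(key, _MISSING)
            # A bare value is shorthand for $eq, the default operator.
            is_operator_clause = isinstance(condition, dict) and any(
                name.startswith("$") for name in condition
            )
            clauses = condition if is_operator_clause else {"$eq": condition}
            for name, arg in clauses.items():
                if not FIELD_OPERATORS[name](value, arg):
                    return False
    return True


doc = {"name": "Ada", "age": 36, "tags": ["a", "b"]}
print(matches(doc, {"age": {"$gte": 30, "$lt": 40}}))
print(matches(doc, {"$or": [{"name": "Bob"}, {"tags": {"$size": 2}}]}))
```

Both example filters match the sample document: the first through the range operators on age, the second through the $size clause inside $or.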