Find documents
Documents represent a single row or record of data in Astra DB Serverless databases.
You use the Collection class to work with documents through the Data API clients.
For instructions to get a Collection object, see Work with collections.
For general information about working with documents, including common operations and operators, see Work with documents.
For more information about the Data API and clients, see Get started with the Data API.
Find documents using filter clauses
Use findOne to fetch one document that matches a query, and use find to fetch multiple documents that match a query.
Sort and filter clauses can use only indexed fields. If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.
- Python
- TypeScript
- Java
- curl
For more information, see the Client reference.
Find documents matching a property, as long as the property is covered by the collection’s indexing configuration:
doc_iterator = collection.find({"category": "house_appliance"}, limit=10)
Find documents using a filter operator:
doc_iterator = collection.find({"tag": {"$exists": True}}, limit=10)
Iterate over the documents most similar to a given vector:
doc_iterator = collection.find(
    {},
    sort={"$vector": [0.55, -0.40, 0.08]},
    limit=5,
)
Iterate over similar documents by running a vector search with vectorize:
doc_iterator = collection.find(
    {},
    sort={"$vectorize": "Text to vectorize"},
    limit=5,
)
Use a projection to specify the fields returned from each document.
A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.
result = collection.find({"category": "house_appliance"}, limit=10, projection={"name": True})
Parameters:
Name | Summary |
---|---|
filter | A predicate expressed as a dictionary according to the Data API filter syntax. |
projection | Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection clauses. |
skip | Specify a number of documents to bypass (skip) before returning documents. You can use this parameter only in conjunction with an explicit sort. |
limit | Limit the total number of documents returned. |
include_similarity | If true, the response includes a $similarity key for each returned document. |
include_sort_vector | If true, the response includes the sort vector used for the search. |
sort | Use this dictionary parameter to perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize. |
max_time_ms | A timeout, in milliseconds, for each underlying HTTP request used to fetch documents as you iterate over the cursor. This method uses the collection-level timeout by default. |
Returns:
Cursor - A cursor for iterating over documents.
AstraPy cursors are compatible with for loops, and they provide a few additional features.
However, for vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.
If you need to materialize a list of all results, you can pass the cursor to list(). A cursor transitions between states, such as idle, as it is consumed.
Example response
Cursor("some_collection", idle, consumed so far: 0)
Example:
from astrapy import DataAPIClient

client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.COLLECTION

# Find all documents in the collection
# Not advisable if a very high number of matches is anticipated
for document in collection.find({}):
    print(document)

# Find all documents in the collection with a specific field value
for document in collection.find({"a": 123}):
    print(document)

# Find all documents in the collection matching a compound filter expression
matches = list(collection.find({
    "$and": [
        {"f1": 1},
        {"f2": 2},
    ]
}))

# Same as the preceding example, but using the implicit AND operator
matches = list(collection.find({
    "f1": 1,
    "f2": 2,
}))

# Use the "less than" operator in the filter expression
matches2 = list(collection.find({
    "$and": [
        {"name": "John"},
        {"price": {"$lt": 100}},
    ]
}))

# Run a $vectorize search, get back the query vector along with the documents
results_ite = collection.find(
    {},
    projection={"*": 1},
    limit=3,
    include_sort_vector=True,
    sort={"$vectorize": "Query text"},
)
query = results_ite.get_sort_vector()
for doc in results_ite:
    print(f"{doc['$vectorize']}: {doc['$vector'][:2]}... VS. {query[:2]}...")
For more information, see the Client reference.
Find documents matching a property, as long as the property is covered by the collection’s indexing configuration:
const cursor = collection.find({ category: 'house_appliance' }, { limit: 10 });
Find documents using a filter operator:
const cursor = collection.find({ tag: { $exists: true } }, { limit: 10 });
Iterate over the documents most similar to a given vector:
const cursor = collection.find({}, { sort: { $vector: [0.55, -0.40, 0.08] }, limit: 5 });
Iterate over similar documents by running a vector search with vectorize:
const cursor = collection.find({}, { sort: { $vectorize: 'Text to vectorize' }, limit: 5 });
Use a projection to specify the fields returned from each document.
A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.
const cursor = collection.find({ category: 'house_appliance' }, { limit: 10, projection: { name: 1 } });
Parameters:
Name | Summary |
---|---|
filter | A filter to select the documents to find. For a list of available operators, see Data API operators. |
options | The options for this operation. |
Options (FindOptions):
Name | Summary |
---|---|
projection | Specifies which fields to include or exclude in the returned documents. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection clauses. When specifying a projection, make sure that you handle the return type carefully. Consider type-casting. |
includeSimilarity | If true, the response includes a $similarity key for each returned document. |
includeSortVector | If true, the response includes the sort vector used for the search. |
sort | Perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize. |
skip | Specify a number of documents to bypass (skip) before returning documents. You can use this parameter only in conjunction with an explicit sort. |
limit | Limit the total number of documents returned in the lifetime of the cursor. |
maxTimeMS | The maximum time, in milliseconds, that the client waits for each underlying HTTP request to complete as you iterate over the cursor. |
Returns:
FindCursor<FoundDoc<Schema>> - A cursor you can use to iterate over the matching documents.
For vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.
If you need to materialize a list of all results, you can use toArray(). A cursor transitions between states as it is consumed.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([
    { name: 'John', age: 30, $vector: [1, 1, 1, 1, 1] },
    { name: 'Jane', age: 25, },
    { name: 'Dave', age: 40, },
  ]);

  // Gets all 3 in some order
  const unpredictable = await collection.find({}).toArray();
  console.log(unpredictable);

  // Failed find by name ([])
  const matchless = await collection.find({ name: 'Carrie' }).toArray();
  console.log(matchless);

  // Find by $gt age (John, Dave)
  const gtAgeCursor = collection.find({ age: { $gt: 25 } });
  for await (const doc of gtAgeCursor) {
    console.log(doc.name);
  }

  // Find by sorting by age (Jane, John, Dave)
  const sortedAgeCursor = collection.find({}, { sort: { age: 1 } });
  await sortedAgeCursor.forEach(console.log);

  // Find first by vector similarity (John, 1)
  const john = await collection.find({}, { sort: { $vector: [1, 1, 1, 1, 1] }, includeSimilarity: true }).next();
  console.log(john?.name, john?.$similarity);
})();
Operations on documents are performed at the Collection level.
Collection is a generic class with the default type of Document.
You can specify your own type, and the object is serialized by Jackson.
Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:
// Synchronous
FindIterable<T> find(Filter filter, FindOptions options);
// Convenience overloads that build the filter and options for you
FindIterable<T> find(FindOptions options); // no filter
FindIterable<T> find(Filter filter); // default options
FindIterable<T> find(); // default options + no filters
FindIterable<T> find(float[] vector, int limit); // semantic search
FindIterable<T> find(Filter filter, float[] vector, int limit);
For more information, see Find a document and the Client reference.
Parameters:
Name | Summary |
---|---|
filter | Criteria list to filter documents. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators. |
options | Set the different options for the find operation. |
Returns:
FindIterable<T> - A cursor that fetches up to the first 20 documents, and it can be iterated to fetch additional documents as needed.
However, for vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.
Example:
package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.FindOptions;
import com.datastax.astra.client.model.Sorts;

import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Projections.exclude;
import static com.datastax.astra.client.model.Projections.include;

public class Find {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Find Options
        FindOptions options = new FindOptions()
                .projection(include("field", "field2", "field3")) // select fields
                .projection(exclude("_id")) // exclude some fields
                .sort(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f}) // similarity vector
                .skip(1) // skip first item
                .limit(10) // stop after 10 items (max records)
                .pageState("pageState") // used for pagination
                .includeSimilarity(); // include similarity

        // Execute a find operation
        FindIterable<Document> result = collection.find(filter, options);

        // Iterate over the result
        for (Document document : result) {
            System.out.println(document);
        }
    }
}
Use the find command to retrieve multiple documents matching a query.
Retrieve documents by any property, as long as the property is covered by the collection’s indexing configuration:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": { "purchase_date": { "$date": 1690045891 } }
  }
}' | jq
Retrieve documents using a filter operator:
"find": {
  "filter": { "preferred_customer": { "$exists": true } }
}
More filter operator examples
Match values that are equal to the filter value:
"find": {
  "filter": {
    "customer": {
      "$eq": {
        "name": "Jasmine S.",
        "city": "Jersey City"
      }
    }
  }
}
Match values that are not the filter value:
"find": {
  "filter": {
    "$not": {
      "customer.address.state": "NJ"
    }
  }
}
You can use similar $not operators for arrays, such as $nin and $ne.
Match one or more of an array of specified values:
"find": {
  "filter": {
    "customer.address.city": {
      "$in": [ "Jersey City", "Orange" ]
    }
  }
}
If you have only one value to match, an array is not necessary, such as { "$in": "Jersey City" }.
The $in operator also functions as a $contains operator.
For example, a field containing the array [ 1, 2, 3 ] will match filters like { "$in": [ 2, 6 ] } or { "$in": 1 }.
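The matching rule described above can be illustrated on the client side. The following is a minimal Python sketch, assuming the semantics are exactly as stated in the preceding text; matches_in is a hypothetical helper written for illustration, not part of the Data API or any client library, and it doesn't reproduce how the server evaluates the operator.

```python
from typing import Any


def matches_in(field_value: Any, operand: Any) -> bool:
    """Hypothetical client-side illustration of the $in rule described above."""
    # A bare operand such as {"$in": 1} is treated like a one-element list.
    candidates = operand if isinstance(operand, list) else [operand]
    # An array field matches if any of its elements is a candidate.
    if isinstance(field_value, list):
        return any(item in candidates for item in field_value)
    # A scalar field matches if it equals one of the candidates.
    return field_value in candidates


print(matches_in([1, 2, 3], [2, 6]))                    # True
print(matches_in([1, 2, 3], 1))                         # True
print(matches_in("Orange", ["Jersey City", "Orange"]))  # True
print(matches_in([1, 2, 3], [7, 8]))                    # False
```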
Match all specified values:
"find": {
  "filter": {
    "items": {
      "$all": [
        {
          "car": "Sedan",
          "color": "White"
        },
        "Extended warranty"
      ]
    }
  }
}
Compound and/or operators:
"find": {
  "filter": {
    "$and": [
      {
        "$or": [
          { "customer.address.city": "Jersey City" },
          { "customer.address.city": "Orange" }
        ]
      },
      {
        "$or": [
          { "seller.name": "Jim A." },
          { "seller.name": "Tammy S." }
        ]
      }
    ]
  }
}
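When a compound filter like this one is generated from application data, it can help to build it from small pieces rather than writing the nested JSON by hand. The following is a minimal Python sketch; any_of and all_of are hypothetical helper names introduced here for illustration, and they only produce the same JSON structure shown in the preceding example.

```python
import json
from typing import Any, Dict, List


def any_of(field: str, values: List[Any]) -> Dict[str, Any]:
    """Build an $or clause that matches any of the given values for one field."""
    return {"$or": [{field: value} for value in values]}


def all_of(*clauses: Dict[str, Any]) -> Dict[str, Any]:
    """Combine clauses so that every one of them must match."""
    return {"$and": list(clauses)}


find_filter = all_of(
    any_of("customer.address.city", ["Jersey City", "Orange"]),
    any_of("seller.name", ["Jim A.", "Tammy S."]),
)

# Print the request body that carries the generated filter.
print(json.dumps({"find": {"filter": find_filter}}, indent=2))
```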
Compound range operators:
"find": {
  "filter": {
    "$and": [
      { "customer.credit_score": { "$gte": 700 } },
      { "customer.credit_score": { "$lt": 800 } }
    ]
  }
}
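To make the half-open range concrete, the following Python sketch applies the same condition (at least 700 and below 800) on the client side to three trimmed-down documents whose credit scores come from the example responses on this page. The get_path and in_range helpers are hypothetical and exist only for this illustration; the actual filtering happens on the server.

```python
from typing import Any, Dict, Optional


def get_path(document: Dict[str, Any], dotted_path: str) -> Optional[Any]:
    """Resolve a dotted path such as 'customer.credit_score' in a nested dict."""
    current: Any = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def in_range(document: Dict[str, Any], path: str, low: float, high: float) -> bool:
    """Mirror the $gte / $lt pair: low <= value < high."""
    value = get_path(document, path)
    return value is not None and low <= value < high


customers = [
    {"customer": {"name": "Yolanda Z.", "credit_score": 694}},
    {"customer": {"name": "Harold S.", "credit_score": 710}},
    {"customer": {"name": "David C.", "credit_score": 800}},
]

matching = [
    doc["customer"]["name"]
    for doc in customers
    if in_range(doc, "customer.credit_score", 700, 800)
]
print(matching)  # ['Harold S.']
```

A score of exactly 800 is excluded because the upper bound uses $lt rather than $lte.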
Retrieve documents that are most similar to a given vector:
"find": {
  "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
  "options": {
    "limit": 100
  }
}
Retrieve similar documents by running a vector search with vectorize:
"find": {
  "sort": { "$vectorize": "I'd like some talking shoes" },
  "options": {
    "limit": 100
  }
}
Use a projection to specify the fields returned from each document.
A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.
"find": {
  "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
  "projection": { "$vector": 1 },
  "options": {
    "includeSimilarity": true,
    "limit": 100
  }
}
Parameters:
Name | Summary |
---|---|
find | The Data API command to retrieve multiple documents in a collection based on one or more of filter, sort, projection, and options. |
filter | An object that defines filter criteria using the Data API filter syntax. |
sort | Perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize. |
projection | Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection clauses. |
options.includeSimilarity | If true, the response includes a $similarity key for each returned document. |
options.includeSortVector | If true, the response includes the sortVector value used for the search. |
options.skip | Specify a number of documents to bypass (skip) before returning documents. You can use this parameter only in conjunction with an explicit sort. |
options.limit | Limit the total number of documents returned. Pagination can occur if more than 20 documents are returned in the current set of matching documents. |
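The parameters above map onto a single JSON request body. As a rough sketch of how such a body could be assembled in Python before sending it, the following uses a hypothetical build_find_command helper (not part of any client library) that places filter, sort, and projection directly under find and gathers the remaining keyword arguments under options, matching the layout of the curl examples on this page.

```python
import json
from typing import Any, Dict, Optional


def build_find_command(
    find_filter: Optional[Dict[str, Any]] = None,
    sort: Optional[Dict[str, Any]] = None,
    projection: Optional[Dict[str, Any]] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a find request body, omitting unset parts."""
    body: Dict[str, Any] = {}
    if find_filter is not None:
        body["filter"] = find_filter
    if sort is not None:
        body["sort"] = sort
    if projection is not None:
        body["projection"] = projection
    if options:
        # Keys such as includeSimilarity, includeSortVector, skip, and limit go here.
        body["options"] = options
    return {"find": body}


command = build_find_command(
    sort={"$vector": [0.15, 0.1, 0.1, 0.35, 0.55]},
    projection={"$vector": 1},
    includeSimilarity=True,
    limit=100,
)
print(json.dumps(command, indent=2))
```

The printed JSON can be passed as the --data payload of a curl request like the ones shown earlier.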
Returns:
A successful response can include a data object and a status object:
- The data object contains documents, which is an array of objects. Each object represents a document matching the given query. The returned fields in each document object depend on the find parameters, namely the projection and options.
  For vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.
  For non-vector searches, pagination can occur if there are more than 20 matching documents, as indicated by the nextPageState key. nextPageState is null or omitted if there are no more documents or if the specified sort or filter operation doesn’t support pagination. nextPageState contains an ID if there are more documents to fetch.
  {
    "data": {
      "documents": [
        { "_id": { "$uuid": "018e65c9-df45-7913-89f8-175f28bd7f74" } },
        { "_id": { "$uuid": "018e65c9-e33d-749b-9386-e848739582f0" } }
      ],
      "nextPageState": null
    }
  }
  In the event of pagination, you must issue a subsequent request with a pageState ID to fetch the next page of documents that matched the filter. As long as there is a subsequent page with matching documents, the response includes a nextPageState ID, which you use as the pageState for the subsequent request. Each paginated request is exactly the same as the original request, except for the addition of the pageState in the options object:
  {
    "find": {
      "filter": { "active_user": true },
      "options": { "pageState": "NEXT_PAGE_STATE_FROM_PRIOR_RESPONSE" }
    }
  }
  Continue issuing requests with the subsequent pageState ID until you have fetched all matching documents.
- The status object contains the sortVector value if you set includeSortVector to true in the request:
  "status": { "sortVector": [0.4, 0.1, ...] }
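The pagination procedure described above amounts to a loop that repeats the same request with the latest pageState until nextPageState is null or absent. The following is a minimal Python sketch under stated assumptions: send_command is a hypothetical callable that stands in for the HTTP POST to the collection endpoint and returns the parsed JSON response, and fake_send simulates two pages so the loop can be run without a database.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iterate_all_documents(
    send_command: Callable[[Dict[str, Any]], Dict[str, Any]],
    find_body: Dict[str, Any],
) -> Iterator[Dict[str, Any]]:
    """Yield every matching document, following nextPageState across pages."""
    page_state: Optional[str] = None
    while True:
        # Repeat the original request, adding pageState to options when present.
        body = dict(find_body)
        options = dict(find_body.get("options", {}))
        if page_state is not None:
            options["pageState"] = page_state
        if options:
            body["options"] = options
        response = send_command({"find": body})
        data = response.get("data", {})
        yield from data.get("documents", [])
        page_state = data.get("nextPageState")
        if not page_state:
            break  # null or omitted: no more pages


def fake_send(command: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the HTTP call: serves two canned pages keyed by pageState."""
    pages = {
        None: {"documents": [{"_id": "1"}, {"_id": "2"}], "nextPageState": "abc"},
        "abc": {"documents": [{"_id": "3"}], "nextPageState": None},
    }
    state = command["find"].get("options", {}).get("pageState")
    return {"data": pages[state]}


ids = [doc["_id"] for doc in iterate_all_documents(fake_send, {"filter": {"active_user": True}})]
print(ids)  # ['1', '2', '3']
```

In real use, send_command would wrap an HTTP client call that sets the Token and Content-Type headers shown in the curl examples.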
Example:
Example of a simple property filter
This example uses a simple filter based on two document properties:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": {
      "customer.address.city": "Hoboken",
      "customer.address.state": "NJ"
    }
  }
}' | jq
The response returned one matching document:
{
"data": {
"documents": [
{
"$vector": [
0.1,
0.15,
0.3,
0.12,
0.09
],
"_id": "17",
"amount": 54900,
"customer": {
"address": {
"address_line": "1234 Main St",
"city": "Hoboken",
"state": "NJ"
},
"age": 61,
"credit_score": 694,
"name": "Yolanda Z.",
"phone": "123-456-1177"
},
"items": [
{
"car": "Tesla Model 3",
"color": "Blue"
},
"Extended warranty - 5 years"
],
"purchase_date": {
"$date": 1702660291
},
"purchase_type": "Online",
"seller": {
"location": "Jersey City NJ",
"name": "Jim A."
},
"status": "active"
}
],
"nextPageState": null
}
}
Example of logical operators in a filter
This example uses the $and and $or logical operators to retrieve documents matching one condition from each $or clause.
In this case, the customer.address.city must be either Jersey City or Orange and the seller.name must be either Jim A. or Tammy S.
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": {
      "$and": [
        {
          "$or": [
            { "customer.address.city": "Jersey City" },
            { "customer.address.city": "Orange" }
          ]
        },
        {
          "$or": [
            { "seller.name": "Jim A." },
            { "seller.name": "Tammy S." }
          ]
        }
      ]
    }
  }
}' | jq
The response returned two matching documents:
{
"data": {
"documents": [
{
"$vector": [
0.3,
0.23,
0.15,
0.17,
0.4
],
"_id": "8",
"amount": 46900,
"customer": {
"address": {
"address_line": "1234 Main St",
"city": "Orange",
"state": "NJ"
},
"age": 29,
"credit_score": 710,
"name": "Harold S.",
"phone": "123-456-8888"
},
"items": [
{
"car": "BMW X3 SUV",
"color": "Black"
},
"Extended warranty - 5 years"
],
"purchase_date": {
"$date": 1693329091
},
"purchase_type": "In Person",
"seller": {
"location": "Staten Island NYC",
"name": "Tammy S."
},
"status": "active"
},
{
"$vector": [
0.25,
0.045,
0.38,
0.31,
0.67
],
"_id": "5",
"amount": 94990,
"customer": {
"address": {
"address_line": "32345 Main Ave",
"city": "Jersey City",
"state": "NJ"
},
"age": 50,
"credit_score": 800,
"name": "David C.",
"phone": "123-456-5555"
},
"items": [
{
"car": "Tesla Model S",
"color": "Red"
},
"Extended warranty - 5 years"
],
"purchase_date": {
"$date": 1690996291
},
"purchase_type": "Online",
"seller": {
"location": "Jersey City NJ",
"name": "Jim A."
},
"status": "active"
}
],
"nextPageState": null
}
}