Find documents reference
Documents represent a single row or record of data in Astra DB Serverless databases.
You use the Collection class to work with documents through the Data API.
For instructions to get a Collection object, see the Collections reference.
Astra DB APIs use the term keyspace to refer to both namespaces and keyspaces.
For general information about working with documents, including common operations and operators, see the Documents reference.
Prerequisites
- Review the prerequisites and other information in Intro to Astra DB APIs.
- Create a Serverless (Vector) database.
- Learn how to instantiate a DataAPIClient object and connect to your database.
Find a document
Retrieve a single document from a collection using various filter and query options.
Sort and filter operations can use only indexed fields. If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.
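The indexing restriction above can be checked locally before a query is sent. The following is a minimal sketch, not part of any Astra DB client: `referenced_fields` and `unindexed_fields` are hypothetical helpers, the list of indexed fields is supplied by you, and the check uses exact field-path matching as a simplifying assumption. It only handles filter clauses.

```python
from typing import Any, Dict, Iterable, Set


def referenced_fields(clause: Dict[str, Any]) -> Set[str]:
    """Collect the document field paths that a filter clause references."""
    fields: Set[str] = set()
    for key, value in clause.items():
        if key in ("$and", "$or"):
            # Logical operators hold a list of sub-clauses.
            for sub_clause in value:
                fields |= referenced_fields(sub_clause)
        elif key == "$not":
            # $not wraps a single sub-clause.
            fields |= referenced_fields(value)
        else:
            # Any other key is a field path, such as "customer.address.city".
            fields.add(key)
    return fields


def unindexed_fields(clause: Dict[str, Any], indexed: Iterable[str]) -> Set[str]:
    """Return the referenced fields that are missing from the indexed list."""
    # Assumption: exact path matching against a list that you maintain yourself.
    return referenced_fields(clause) - set(indexed)


example_filter = {
    "$and": [
        {"customer.credit_score": {"$gte": 700}},
        {"$or": [{"seller.name": "Jim A."}, {"status": "active"}]},
    ]
}
print(unindexed_fields(example_filter, ["customer.credit_score", "seller.name"]))
```

An empty result suggests that the filter only touches fields you listed as indexed. The server remains the authority on what is actually indexed.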
- Python
- TypeScript
- Java
- curl
For more information, see the Client reference.
Retrieve a single document from a collection by its _id:
document = collection.find_one({"_id": 101})
Retrieve a single document from a collection by any property, as long as the property is covered by the collection’s indexing configuration:
document = collection.find_one({"location": "warehouse_C"})
Retrieve a single document from a collection by an arbitrary filtering clause:
document = collection.find_one({"tag": {"$exists": True}})
Retrieve the document that is most similar to a given vector:
result = collection.find_one({}, sort={"$vector": [.12, .52, .32]})
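To illustrate what "most similar" means, here is a minimal local sketch that assumes a cosine similarity metric. The metric actually used depends on how the collection was created, and the server uses a vector index rather than the brute-force scan shown here. The `cosine_similarity` and `most_similar` helpers are hypothetical and are not part of AstraPy. Skipping documents without a $vector field is a simplifying choice.

```python
import math
from typing import Any, Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(
    docs: List[Dict[str, Any]], query: Sequence[float]
) -> Optional[Dict[str, Any]]:
    """Brute-force stand-in for a vector-sorted find_one: best match or None."""
    candidates = [doc for doc in docs if "$vector" in doc]
    if not candidates:
        return None
    return max(candidates, key=lambda doc: cosine_similarity(doc["$vector"], query))


sample_docs = [
    {"_id": 1, "$vector": [1.0, 0.0, 0.0]},
    {"_id": 2, "$vector": [0.12, 0.52, 0.32]},
    {"_id": 3},  # no vector, so it is skipped in this sketch
]
best = most_similar(sample_docs, [0.12, 0.52, 0.32])
```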
Retrieve the most similar document by running a vector search with vectorize:
result = collection.find_one({}, sort={"$vectorize": "Text to vectorize"})
Use a projection to specify the fields returned from each document.
A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.
result = collection.find_one({"_id": 101}, projection={"name": True})
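The effect of a projection can be pictured with a small local sketch. It is a simplification that covers top-level keys only and rests on stated assumptions: `_id` is returned unless explicitly excluded, `$vector` and `$vectorize` are omitted unless requested, and `{"*": True}` returns everything. The `apply_projection` helper is hypothetical. The real projection is applied by the Data API on the server.

```python
from typing import Any, Dict, Optional

RESERVED = {"$vector", "$vectorize"}  # assumed to be excluded by default


def apply_projection(
    doc: Dict[str, Any], projection: Optional[Dict[str, bool]] = None
) -> Dict[str, Any]:
    """Approximate, top-level-only model of how a projection shapes a document."""
    if not projection:
        # Default projection: regular fields only.
        return {k: v for k, v in doc.items() if k not in RESERVED}
    if projection.get("*"):
        # Wildcard: return every field, reserved ones included.
        return dict(doc)
    included = {k for k, flag in projection.items() if flag}
    excluded = {k for k, flag in projection.items() if not flag}
    if included:
        # Inclusion projection: the listed fields, plus _id unless excluded.
        keep = included | ({"_id"} - excluded)
        return {k: v for k, v in doc.items() if k in keep}
    # Exclusion projection: everything except the listed and reserved fields.
    return {k: v for k, v in doc.items() if k not in excluded and k not in RESERVED}


document = {"_id": 101, "name": "John Doe", "seq": 37, "$vector": [0.12, 0.52, 0.32]}
print(apply_projection(document, {"name": True}))
```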
Parameters:
Name | Type | Summary |
---|---|---|
|
|
A predicate expressed as a dictionary according to the Data API filter syntax.
For example: |
|
|
Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations. |
|
|
If true, the response includes a |
|
|
Use this dictionary parameter to perform a vector similarity search or set the order in which documents are returned.
For similarity searches, this parameter can use either |
|
|
A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default. |
Returns:
Union[Dict[str, Any], None] - Either the found document as a dictionary or None if no matching document is found.
Example response
{'_id': 101, 'name': 'John Doe', '$vector': [0.12, 0.52, 0.32]}
Example:
from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection
collection.find_one()
# prints: {'_id': '68d1e515-...', 'seq': 37}
collection.find_one({"seq": 10})
# prints: {'_id': 'd560e217-...', 'seq': 10}
collection.find_one({"seq": 1011})
# (returns None for no matches)
collection.find_one(projection={"seq": False})
# prints: {'_id': '68d1e515-...'}
collection.find_one(
{},
sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
# prints: {'_id': '97e85f81-...', 'seq': 69}
collection.find_one(sort={"$vector": [1, 0]}, projection={"*": True})
# prints: {'_id': '...', 'tag': 'D', '$vector': [4.0, 1.0]}
For more information, see the Client reference.
Retrieve a single document from a collection by its _id:
const doc = await collection.findOne({ _id: '101' });
Retrieve a single document from a collection by any property, as long as the property is covered by the collection’s indexing configuration:
const doc = await collection.findOne({ location: 'warehouse_C' });
Retrieve a single document from a collection by an arbitrary filtering clause:
const doc = await collection.findOne({ tag: { $exists: true } });
Retrieve the document that is most similar to a given vector:
const doc = await collection.findOne({}, { sort: { $vector: [.12, .52, .32] } });
Retrieve the most similar document by running a vector search with vectorize:
const doc = await collection.findOne({}, { sort: { $vectorize: 'Text to vectorize' } });
Use a projection to specify the fields returned from each document.
A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.
const doc = await collection.findOne({ _id: '101' }, { projection: { name: 1 } });
Parameters:
Name | Type | Summary |
---|---|---|
|
A filter to select the document to find. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options. |
|
|
The options for this operation. |
Options (FindOneOptions):
Name | Type | Summary |
---|---|---|
Specifies which fields to include or exclude in the returned documents. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations. When specifying a projection, make sure that you handle the return type carefully. Consider type-casting. |
||
|
If true, the response includes a |
|
Perform a vector similarity search or set the order in which documents are returned.
For similarity searches, |
||
|
The maximum time in milliseconds that the client should wait for the operation to complete. |
Returns:
Promise<FoundDoc<Schema> | null> - A promise that resolves to the found document (including $similarity, if applicable), or null if no matching document is found.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some documents
await collection.insertMany([
{ name: 'John', age: 30, $vector: [1, 1, 1, 1, 1] },
{ name: 'Jane', age: 25, },
{ name: 'Dave', age: 40, },
]);
// Unpredictably prints one of their names
const unpredictable = await collection.findOne({});
console.log(unpredictable?.name);
// Failed find by name (null)
const failed = await collection.findOne({ name: 'Carrie' });
console.log(failed);
// Find by $gt age (Dave)
const dave = await collection.findOne({ age: { $gt: 30 } });
console.log(dave?.name);
// Find by sorting by age (Jane)
const jane = await collection.findOne({}, { sort: { age: 1 } });
console.log(jane?.name);
// Find by vector similarity (John, 1)
const john = await collection.findOne({}, { sort: { $vector: [1, 1, 1, 1, 1] }, includeSimilarity: true });
console.log(john?.name, john?.$similarity);
})();
Operations on documents are performed at the Collection level.
Collection is a generic class with the default type of Document.
You can specify your own type, and the object is serialized by Jackson.
For more information, see the Client reference.
Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:
// Synchronous
Optional<T> findOne(Filter filter);
Optional<T> findOne(Filter filter, FindOneOptions options);
Optional<T> findById(Object id); // builds the _id filter for you
// Asynchronous
CompletableFuture<Optional<T>> findOneAsync(Filter filter);
CompletableFuture<Optional<T>> findOneAsync(Filter filter, FindOneOptions options);
CompletableFuture<Optional<T>> findByIdAsync(Object id);
You can retrieve documents in various ways, for example:
- Retrieve a single document from a collection by its _id.
- Retrieve a single document from a collection by any property, as long as the property is covered by the collection’s indexing configuration.
- Retrieve a single document from a collection by an arbitrary filtering clause.
- Retrieve the document that is most similar to a given vector.
- Retrieve the most similar document by running a vector search with vectorize.
Additionally, you can use a projection to specify the fields returned from each document.
A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.
In the underlying HTTP request, the findOne command is a JSON object that contains the filter, projection, sort, and options parameters, for example:
{
"findOne": {
"filter": {
"$and": [
{ "field2": { "$gt": 10 } },
{ "field3": { "$lt": 20 } },
{ "field4": { "$eq": "value" } }
]
},
"projection": {
"_id": 0,
"field": 1,
"field2": 1,
"field3": 1
},
"sort": {
"$vector": [0.25, 0.25, 0.25,0.25, 0.25]
},
"options": {
"includeSimilarity": true
}
}
}
You can define the preceding JSON object in Java as follows:
collection.findOne(
    Filters.and(
        Filters.gt("field2", 10),
        Filters.lt("field3", 20),
        Filters.eq("field4", "value")
    ),
    new FindOneOptions()
        .projection(Projections.include("field", "field2", "field3"))
        .projection(Projections.exclude("_id"))
        .vector(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f})
        .includeSimilarity()
);
// The same call, using static imports
collection.findOne(
and(
gt("field2", 10),
lt("field3", 20),
eq("field4", "value")
),
vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
.projection(Projections.include("field", "field2", "field3"))
.projection(Projections.exclude("_id"))
.includeSimilarity()
);
Parameters:
Name | Type | Summary |
---|---|---|
|
|
Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options. |
|
Set the different options for the
|
Returns:
Optional<T> - The document matching the filter, or Optional.empty() if no document is found.
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindOneOptions;
import java.util.Optional;
import static com.datastax.astra.client.model.Filters.and;
import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Filters.gt;
import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Projections.exclude;
import static com.datastax.astra.client.model.Projections.include;
public class FindOne {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Complete FindOne
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
FindOneOptions options = new FindOneOptions()
.projection(include("field", "field2", "field3"))
.projection(exclude("_id"))
.sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
.includeSimilarity();
Optional<Document> result = collection.findOne(filter, options);
// The same call, using static imports
collection.findOne(and(
gt("field2", 10),
lt("field3", 20),
eq("field4", "value")),
new FindOneOptions().sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
.projection(include("field", "field2", "field3"))
.projection(exclude("_id"))
.includeSimilarity()
);
// Find one document with a vectorize search
collection.findOne(and(
gt("field2", 10),
lt("field3", 20),
eq("field4", "value")),
new FindOneOptions().sort("Life is too short to be living somebody else's dream.")
.projection(include("field", "field2", "field3"))
.projection(exclude("_id"))
.includeSimilarity()
);
collection.insertOne(new Document()
.append("field", "value")
.append("field2", 15)
.append("field3", 15)
.vectorize("Life is too short to be living somebody else's dream."));
}
}
Use the findOne command to retrieve a document.
Retrieve a single document from a collection by its _id:
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"findOne": {
"filter": { "_id": "018e65c9-df45-7913-89f8-175f28bd7f74" }
}
}' | jq
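For readers who prefer to script the same call, the following sketch builds an equivalent findOne request with the Python standard library. It only constructs the request and does not send it. The `build_find_one_request` helper and the `https://example.invalid` endpoint are hypothetical placeholders. The URL path and headers mirror the preceding curl command.

```python
import json
import urllib.request
from typing import Any, Dict


def build_find_one_request(
    api_endpoint: str,
    keyspace: str,
    collection: str,
    token: str,
    filter_clause: Dict[str, Any],
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a findOne command."""
    url = f"{api_endpoint.rstrip('/')}/api/json/v1/{keyspace}/{collection}"
    body = json.dumps({"findOne": {"filter": filter_clause}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Token": token, "Content-Type": "application/json"},
    )


# Placeholder values; substitute your own endpoint, keyspace, collection, and token.
request = build_find_one_request(
    "https://example.invalid",
    "ASTRA_DB_KEYSPACE",
    "ASTRA_DB_COLLECTION",
    "ASTRA_DB_APPLICATION_TOKEN",
    {"_id": "018e65c9-df45-7913-89f8-175f28bd7f74"},
)
print(request.full_url)
```

To send the request, you could pass it to `urllib.request.urlopen` and parse the response body with `json.loads`.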
Retrieve a single document from a collection by any property, as long as the property is covered by the collection’s indexing configuration:
"findOne": {
"filter": { "purchase_date": { "$date": 1690045891 } }
}
Retrieve a single document from a collection by an arbitrary filtering clause:
"findOne": {
"filter": { "preferred_customer": { "$exists": true } }
}
Retrieve the document that is most similar to a given vector:
"findOne": {
"sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] }
}
Retrieve the most similar document by running a vector search with vectorize:
"findOne": {
"sort": { "$vectorize": "I'd like some talking shoes" }
}
Use a projection to specify the fields returned from each document.
A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.
"findOne": {
"sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
"projection": { "$vector": 1 }
}
Parameters:
Name | Type | Summary |
---|---|---|
|
|
The Data API command to retrieve a document in a collection based on one or more of |
|
|
An object that defines filter criteria using the Data API filter syntax.
For example: |
|
|
Perform a vector similarity search or set the order in which documents are returned.
For similarity searches, this parameter can use either |
|
|
Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations. |
|
|
If true, the response includes a
|
Returns:
A successful response includes a data object that contains a document object representing the document matching the given query.
The returned document fields depend on the findOne parameters, namely the projection and options.
"data": {
"document": {
"_id": "14"
}
}
Example:
This request retrieves a document from a collection by its _id with the default projection and options:
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"findOne": {
"filter": { "_id": "14" }
}
}' | jq
The response contains the document’s _id and all regular fields.
The default projection excludes $vector and $vectorize.
{
"data": {
"document": {
"_id": "14",
"amount": 110400,
"customer": {
"address": {
"address_line": "1414 14th Pl",
"city": "Brooklyn",
"state": "NY"
},
"age": 44,
"credit_score": 702,
"name": "Kris S.",
"phone": "123-456-1144"
},
"items": [
{
"car": "Tesla Model X",
"color": "White"
}
],
"purchase_date": {
"$date": 1698513091
},
"purchase_type": "In Person",
"seller": {
"location": "Brooklyn NYC",
"name": "Jasmine S."
}
}
}
}
Find documents using filtering options
Whereas findOne fetches one document that matches a query, find fetches multiple documents that match a query.
Sort and filter operations can use only indexed fields. If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.
- Python
- TypeScript
- Java
- curl
For more information, see the Client reference.
Find documents matching a property, as long as the property is covered by the collection’s indexing configuration:
doc_iterator = collection.find({"category": "house_appliance"}, limit=10)
Find documents matching a filter operator:
doc_iterator = collection.find({"tag": {"$exists": True}}, limit=10)
Iterate over the documents most similar to a given vector:
doc_iterator = collection.find(
{},
sort={"$vector": [0.55, -0.40, 0.08]},
limit=5,
)
Iterate over similar documents by running a vector search with vectorize:
doc_iterator = collection.find(
{},
sort={"$vectorize": "Text to vectorize"},
limit=5,
)
Use a projection to specify the fields returned from each document.
A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.
result = collection.find({"category": "house_appliance"}, limit=10, projection={"name": True})
Parameters:
Name | Type | Summary |
---|---|---|
|
|
A predicate expressed as a dictionary according to the Data API filter syntax.
For example: |
|
|
Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations. |
|
|
Specify a number of documents to bypass (skip) before returning documents.
The first You can use this parameter only in conjunction with an explicit |
|
|
Limit the total number of documents returned.
Once |
|
|
If true, the response includes a |
|
|
If true, the response includes the You can’t use |
|
|
Use this dictionary parameter to perform a vector similarity search or set the order in which documents are returned.
For similarity searches, this parameter can use either |
|
|
A timeout, in milliseconds, for each underlying HTTP request used to fetch documents as you iterate over the cursor. This method uses the collection-level timeout by default. |
Returns:
Cursor - A cursor for iterating over documents.
AstraPy cursors are compatible with for loops, and they provide a few additional features.
However, for vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.
If you need to materialize a list of all results, you can use A cursor, while it is consumed, transitions between |
Example response
Cursor("some_collection", new, retrieved so far: 0)
Example:
from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.COLLECTION
# Find all documents in the collection
# Not advisable if a very high number of matches is anticipated
for document in collection.find({}):
    print(document)
# Find all documents in the collection with a specific field value
for document in collection.find({"a": 123}):
    print(document)
# Find all documents in the collection matching a compound filter expression
matches = list(collection.find({
"$and": [
{"f1": 1},
{"f2": 2},
]
}))
# Same as the preceding example, but using the implicit AND operator
matches = list(collection.find({
"f1": 1,
"f2": 2,
}))
# Use the "less than" operator in the filter expression
matches2 = list(collection.find({
"$and": [
{"name": "John"},
{"price": {"$lt": 100}},
]
}))
# Run a $vectorize search, get back the query vector along with the documents
results_ite = collection.find(
{},
projection={"*": 1},
limit=3,
include_sort_vector=True,
sort={"$vectorize": "Query text"},
)
query = results_ite.get_sort_vector()
for doc in results_ite:
    print(f"{doc['$vectorize']}: {doc['$vector'][:2]}... VS. {query[:2]}...")
For more information, see the Client reference.
Find documents matching a property, as long as the property is covered by the collection’s indexing configuration:
const cursor = collection.find({ category: 'house_appliance' }, { limit: 10 });
Find documents matching a filter operator:
const cursor = collection.find({ tag: { $exists: true } }, { limit: 10 });
Iterate over the documents most similar to a given vector:
const cursor = collection.find({}, { sort: { $vector: [0.55, -0.40, 0.08] }, limit: 5 });
Iterate over similar documents by running a vector search with vectorize:
const cursor = collection.find({}, { sort: { $vectorize: 'Text to vectorize' }, limit: 5 });
Use a projection to specify the fields returned from each document.
A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.
const cursor = collection.find({ category: 'house_appliance' }, { limit: 10, projection: { name: 1 } });
Parameters:
Name | Type | Summary |
---|---|---|
|
A filter to select the documents to find. For a list of available operators, see Data API operators. |
|
|
The options for this operation. |
Options (FindOptions):
Name | Type | Summary |
---|---|---|
Specifies which fields to include or exclude in the returned documents. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations. When specifying a projection, make sure that you handle the return type carefully. Consider type-casting. |
||
|
If true, the response includes a |
|
|
If true, the response includes the You can’t use You can also access this through |
|
Perform a vector similarity search or set the order in which documents are returned.
For similarity searches, this parameter can use either |
||
|
Specify a number of documents to bypass (skip) before returning documents.
The first You can use this parameter only in conjunction with an explicit |
|
|
Limit the total number of documents returned in the lifetime of the cursor.
Once |
|
|
The maximum time in milliseconds that the client should wait for the operation to complete each underlying HTTP request as you iterate over the cursor. |
Returns:
FindCursor<FoundDoc<Schema>> - A cursor you can use to iterate over the matching documents.
For vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.
If you need to materialize a list of all results, you can use A cursor, while it is consumed, transitions between |
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some documents
await collection.insertMany([
{ name: 'John', age: 30, $vector: [1, 1, 1, 1, 1] },
{ name: 'Jane', age: 25, },
{ name: 'Dave', age: 40, },
]);
// Gets all 3 in some order
const unpredictable = await collection.find({}).toArray();
console.log(unpredictable);
// Failed find by name ([])
const matchless = await collection.find({ name: 'Carrie' }).toArray();
console.log(matchless);
// Find by $gt age (John, Dave)
const gtAgeCursor = collection.find({ age: { $gt: 25 } });
for await (const doc of gtAgeCursor) {
console.log(doc.name);
}
// Find by sorting by age (Jane, John, Dave)
const sortedAgeCursor = collection.find({}, { sort: { age: 1 } });
await sortedAgeCursor.forEach(console.log);
// Find first by vector similarity (John, 1)
const john = await collection.find({}, { sort: { $vector: [1, 1, 1, 1, 1] }, includeSimilarity: true }).next();
console.log(john?.name, john?.$similarity);
})();
Operations on documents are performed at the Collection level.
Collection is a generic class with the default type of Document.
You can specify your own type, and the object is serialized by Jackson.
Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:
// Synchronous
FindIterable<T> find(Filter filter, FindOptions options);
// Helper to build filter and options above ^
FindIterable<T> find(FindOptions options); // no filter
FindIterable<T> find(Filter filter); // default options
FindIterable<T> find(); // default options + no filters
FindIterable<T> find(float[] vector, int limit); // semantic search
FindIterable<T> find(Filter filter, float[] vector, int limit);
For more information, see Find a document and the Client reference.
Parameters:
Name | Type | Summary |
---|---|---|
|
|
Criteria list to filter documents. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators. |
|
Set the different options for the
|
Returns:
FindIterable<T> - A cursor that fetches up to the first 20 documents, and it can be iterated to fetch additional documents as needed.
However, for vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.
The The You can use the |
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.FindOptions;
import com.datastax.astra.client.model.Sorts;
import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Projections.exclude;
import static com.datastax.astra.client.model.Projections.include;
public class Find {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Building a filter
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
// Find Options
FindOptions options = new FindOptions()
.projection(include("field", "field2", "field3")) // select fields
.projection(exclude("_id")) // exclude some fields
.sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f}) // similarity vector
.skip(1) // skip first item
.limit(10) // stop after 10 items (max records)
.pageState("pageState") // used for pagination
.includeSimilarity(); // include similarity
// Execute a find operation
FindIterable<Document> result = collection.find(filter, options);
// Iterate over the result
for (Document document : result) {
System.out.println(document);
}
}
}
Use the find command to retrieve multiple documents matching a query.
Retrieve documents by any property, as long as the property is covered by the collection’s indexing configuration:
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"find": {
"filter": { "purchase_date": { "$date": 1690045891 } }
}
}' | jq
Retrieve documents matching a filter operator:
"find": {
"filter": { "preferred_customer": { "$exists": true } }
}
More filter operator examples
Match values that are equal to the filter value:
"find": {
"filter": {
"customer": {
"$eq": {
"name": "Jasmine S.",
"city": "Jersey City"
}
}
}
}
Match values that are not the filter value:
"find": {
"filter": {
"$not": {
"customer.address.state": "NJ"
}
}
}
You can use similar $not operators for arrays, such as $nin and $ne.
Match any of the specified values in an array:
"find": {
"filter": {
"customer.address.city": {
"$in": [ "Jersey City", "Orange" ]
}
}
}
Match all in an array:
"find": {
"filter": {
"items": {
"$all": [
{
"car": "Sedan",
"color": "White"
},
"Extended warranty"
]
}
}
}
Compound and/or operators:
"find": {
"filter": {
"$and": [
{
"$or": [
{ "customer.address.city": "Jersey City" },
{ "customer.address.city": "Orange" }
]
},
{
"$or": [
{ "seller.name": "Jim A." },
{ "seller.name": "Tammy S." }
]
}
]
}
}
Compound range operators:
"find": {
"filter": {
"$and": [
{ "customer.credit_score": { "$gte": 700 } },
{ "customer.credit_score": { "$lt": 800 } }
]
}
}
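To make the semantics of the logical and range operators above concrete, here is a small local evaluator. It is an illustrative sketch only: the `matches` and `get_path` helpers are hypothetical, it supports just a handful of operators and dotted field paths, and actual filtering is performed by the Data API on the server. The sample documents reuse field values from the example responses on this page.

```python
from typing import Any, Dict

_MISSING = object()

# The subset of comparison operators that this sketch supports.
_COMPARATORS = {
    "$eq": lambda value, operand: value == operand,
    "$gt": lambda value, operand: value > operand,
    "$gte": lambda value, operand: value >= operand,
    "$lt": lambda value, operand: value < operand,
    "$lte": lambda value, operand: value <= operand,
    "$in": lambda value, operand: value in operand,
}


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as "customer.address.city"."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(doc: Dict[str, Any], filter_clause: Dict[str, Any]) -> bool:
    """Return True if the document satisfies every condition in the clause."""
    for key, condition in filter_clause.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
        else:
            value = get_path(doc, key)
            if value is _MISSING:
                return False
            is_operator_dict = (
                isinstance(condition, dict)
                and bool(condition)
                and all(name.startswith("$") for name in condition)
            )
            if is_operator_dict:
                if not all(_COMPARATORS[op](value, arg) for op, arg in condition.items()):
                    return False
            elif value != condition:
                # A plain value is an implicit equality check.
                return False
    return True


doc_8 = {
    "customer": {"address": {"city": "Orange", "state": "NJ"}, "credit_score": 710},
    "seller": {"name": "Tammy S."},
}
doc_17 = {
    "customer": {"address": {"city": "Hoboken", "state": "NJ"}, "credit_score": 694},
    "seller": {"name": "Jim A."},
}
range_filter = {
    "$and": [
        {"customer.credit_score": {"$gte": 700}},
        {"customer.credit_score": {"$lt": 800}},
    ]
}
logical_filter = {
    "$and": [
        {"$or": [{"customer.address.city": "Jersey City"}, {"customer.address.city": "Orange"}]},
        {"$or": [{"seller.name": "Jim A."}, {"seller.name": "Tammy S."}]},
    ]
}
```

Multiple keys in one clause behave like an implicit AND, which is why `{"customer.address.city": "Hoboken", "customer.address.state": "NJ"}` needs no explicit `$and`.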
Retrieve documents that are most similar to a given vector:
"find": {
"sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
"options": {
"limit": 100
}
}
Retrieve similar documents by running a vector search with vectorize:
"find": {
"sort": { "$vectorize": "I'd like some talking shoes" },
"options": {
"limit": 100
}
}
Use a projection to specify the fields returned from each document.
A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.
"find": {
"sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
"projection": { "$vector": 1 },
"options": {
"includeSimilarity": true,
"limit": 100
}
}
Parameters:
Name | Type | Summary |
---|---|---|
|
|
The Data API command to retrieve multiple documents in a collection based on one or more of |
|
|
An object that defines filter criteria using the Data API filter syntax.
For example: |
|
|
Perform a vector similarity search or set the order in which documents are returned.
For similarity searches, this parameter can use either |
|
|
Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations. |
|
|
If true, the response includes a
|
|
|
If true, the response includes the
You can’t use |
|
|
Specify a number of documents to bypass (skip) before returning documents.
The first You can use this parameter only in conjunction with an explicit |
|
|
Limit the total number of documents returned.
Pagination can occur if more than 20 documents are returned in the current set of matching documents.
Once the |
Returns:
A successful response can include a data object and a status object:
- The data object contains documents, which is an array of objects. Each object represents a document matching the given query. The returned fields in each document object depend on the find parameters, namely the projection and options.
  For vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.
  For non-vector searches, pagination occurs if there are more than 20 matching documents, as indicated by the nextPageState key. If there are no more documents, nextPageState is null or omitted. If there are more documents, nextPageState contains an ID.
  {
    "data": {
      "documents": [
        { "_id": { "$uuid": "018e65c9-df45-7913-89f8-175f28bd7f74" } },
        { "_id": { "$uuid": "018e65c9-e33d-749b-9386-e848739582f0" } }
      ],
      "nextPageState": null
    }
  }
  In the event of pagination, you must issue a subsequent request with a pageState ID to fetch the next page of documents that matched the filter. As long as there is a subsequent page with matching documents, the response includes a nextPageState ID, which you use as the pageState for the subsequent request. Each paginated request is exactly the same as the original request, except for the addition of the pageState in the options object:
  {
    "find": {
      "filter": { "active_user": true },
      "options": { "pageState": "NEXT_PAGE_STATE_FROM_PRIOR_RESPONSE" }
    }
  }
  Continue issuing requests with the subsequent pageState ID until you have fetched all matching documents.
- The status object contains the sortVector value if you set includeSortVector to true in the request:
  "status": { "sortVector": [0.4, 0.1, ...] }
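The pagination flow described above can be sketched as a loop that resubmits the same find command with the latest page state until nextPageState is null or omitted. This is a minimal sketch under stated assumptions: `iterate_pages` is a hypothetical helper, `send` stands for any function that posts a JSON payload to the Data API and returns the parsed response, and `fake_send` is a stand-in transport with invented page IDs so that the example runs without a database.

```python
from typing import Any, Callable, Dict, Iterator

Payload = Dict[str, Any]


def iterate_pages(
    send: Callable[[Payload], Payload], find_body: Payload
) -> Iterator[Dict[str, Any]]:
    """Yield every matching document, following nextPageState across pages."""
    page_state = None
    while True:
        body = dict(find_body)
        options = dict(find_body.get("options", {}))
        if page_state is not None:
            # Same request as before, plus the page state from the prior response.
            options["pageState"] = page_state
        if options:
            body["options"] = options
        response = send({"find": body})
        data = response.get("data", {})
        yield from data.get("documents", [])
        page_state = data.get("nextPageState")
        if not page_state:
            # null or omitted: no more pages.
            break


def fake_send(payload: Payload) -> Payload:
    """Stand-in transport that serves two invented pages (illustration only)."""
    pages = {None: (["a", "b"], "PAGE_2"), "PAGE_2": (["c"], None)}
    state = payload["find"].get("options", {}).get("pageState")
    ids, next_state = pages[state]
    return {"data": {"documents": [{"_id": i} for i in ids], "nextPageState": next_state}}


all_ids = [doc["_id"] for doc in iterate_pages(fake_send, {"filter": {"active_user": True}})]
print(all_ids)
```

The client libraries shown in the other tabs return cursors that you iterate over instead, so a loop like this is mainly relevant when you call the HTTP API directly.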
Example:
Example of a simple property filter
This example uses a simple filter based on two document properties:
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"find": {
"filter": {
"customer.address.city": "Hoboken",
"customer.address.state": "NJ"
}
}
}' | jq
The response returned one matching document:
{
"data": {
"documents": [
{
"$vector": [
0.1,
0.15,
0.3,
0.12,
0.09
],
"_id": "17",
"amount": 54900,
"customer": {
"address": {
"address_line": "1234 Main St",
"city": "Hoboken",
"state": "NJ"
},
"age": 61,
"credit_score": 694,
"name": "Yolanda Z.",
"phone": "123-456-1177"
},
"items": [
{
"car": "Tesla Model 3",
"color": "Blue"
},
"Extended warranty - 5 years"
],
"purchase_date": {
"$date": 1702660291
},
"purchase_type": "Online",
"seller": {
"location": "Jersey City NJ",
"name": "Jim A."
},
"status": "active"
}
],
"nextPageState": null
}
}
Example of logical operators in a filter
This example uses the $and and $or logical operators to retrieve documents matching one condition from each $or clause.
In this case, the customer.address.city must be either Jersey City or Orange, and the seller.name must be either Jim A. or Tammy S.
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"find": {
"filter": {
"$and": [
{
"$or": [
{ "customer.address.city": "Jersey City" },
{ "customer.address.city": "Orange" }
]
},
{
"$or": [
{ "seller.name": "Jim A." },
{ "seller.name": "Tammy S." }
]
}
]
}
}
}' | jq
The response returned two matching documents:
{
"data": {
"documents": [
{
"$vector": [
0.3,
0.23,
0.15,
0.17,
0.4
],
"_id": "8",
"amount": 46900,
"customer": {
"address": {
"address_line": "1234 Main St",
"city": "Orange",
"state": "NJ"
},
"age": 29,
"credit_score": 710,
"name": "Harold S.",
"phone": "123-456-8888"
},
"items": [
{
"car": "BMW X3 SUV",
"color": "Black"
},
"Extended warranty - 5 years"
],
"purchase_date": {
"$date": 1693329091
},
"purchase_type": "In Person",
"seller": {
"location": "Staten Island NYC",
"name": "Tammy S."
},
"status": "active"
},
{
"$vector": [
0.25,
0.045,
0.38,
0.31,
0.67
],
"_id": "5",
"amount": 94990,
"customer": {
"address": {
"address_line": "32345 Main Ave",
"city": "Jersey City",
"state": "NJ"
},
"age": 50,
"credit_score": 800,
"name": "David C.",
"phone": "123-456-5555"
},
"items": [
{
"car": "Tesla Model S",
"color": "Red"
},
"Extended warranty - 5 years"
],
"purchase_date": {
"$date": 1690996291
},
"purchase_type": "Online",
"seller": {
"location": "Jersey City NJ",
"name": "Jim A."
},
"status": "active"
}
],
"nextPageState": null
}
}
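To make the matching rule concrete, the following local sketch evaluates this kind of filter against documents that are already in memory. It only illustrates the $and and $or semantics described above for equality conditions on dot-notation paths. The names matches and get_path are hypothetical helpers, and the Data API itself evaluates filters on the server.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel for "path not present in the document"


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dot-notation path through nested dictionaries."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(doc: Dict[str, Any], filter_: Dict[str, Any]) -> bool:
    """Return True if `doc` satisfies every clause in `filter_`.

    Supports only $and, $or, and plain equality (an illustrative subset).
    """
    for key, condition in filter_.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif get_path(doc, key) != condition:
            return False
    return True
```

With the filter from the request above, a document whose city is Orange and whose seller is Tammy S. matches, while a document whose city is Hoboken does not, regardless of its seller.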
Find distinct values across documents
Get a list of the distinct values of a certain key in a collection.
Sort and filter operations can use only indexed fields. If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.
-
Python
-
TypeScript
-
Java
-
curl
For more information, see the Client reference.
collection.distinct("category")
Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.
collection.distinct(
"food.allergies",
filter={"registered_for_dinner": True},
)
Parameters:
Name | Type | Summary |
---|---|---|
key | str | The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels, as in the food.allergies and food.1 keys used in this section. |
filter | Optional[Dict[str, Any]] | A predicate expressed as a dictionary according to the Data API filter syntax, such as {"registered_for_dinner": True}. |
max_time_ms | Optional[int] | A timeout, in milliseconds, for the operation. This method uses the collection-level timeout by default. |
Returns:
List[Any]
- A list of the distinct values encountered. Documents that lack the requested key are ignored.
Example response
['home_appliance', None, 'sports_equipment', {'cat_id': 54, 'cat_name': 'gardening_gear'}]
For details on the behavior of "distinct" in conjunction with real-time changes in the collection contents, see the discussion in the Sort examples values section.
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection
collection.insert_many(
[
{"name": "Marco", "food": ["apple", "orange"], "city": "Helsinki"},
{"name": "Emma", "food": {"likes_fruit": True, "allergies": []}},
]
)
collection.distinct("name")
# prints: ['Marco', 'Emma']
collection.distinct("city")
# prints: ['Helsinki']
collection.distinct("food")
# prints: ['apple', 'orange', {'likes_fruit': True, 'allergies': []}]
collection.distinct("food.1")
# prints: ['orange']
collection.distinct("food.allergies")
# prints: []
collection.distinct("food.likes_fruit")
# prints: [True]
For more information, see the Client reference.
const unique = await collection.distinct('category');
Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.
const unique = await collection.distinct(
'food.allergies',
{ registeredForDinner: true },
);
Parameters:
Name | Type | Summary |
---|---|---|
key | Key | The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels, as in the food.allergies and food.1 keys used in this section. |
filter? | Filter<Schema> | A filter to select the documents to use. If not provided, all documents will be used. |
Returns:
Promise<Flatten<(SomeDoc & ToDotNotation<FoundDoc<Schema>>)[Key]>[]>
- A promise that resolves to the distinct values.
The return type is mostly accurate, but with complex keys you might need to manually cast the result to the expected type.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some documents
await collection.insertOne({ name: 'Marco', food: ['apple', 'orange'], city: 'Helsinki' });
await collection.insertOne({ name: 'Emma', food: { likes_fruit: true, allergies: [] } });
// ['Marco', 'Emma']
await collection.distinct('name')
// ['Helsinki']
await collection.distinct('city')
// ['apple', 'orange', { likes_fruit: true, allergies: [] }]
await collection.distinct('food')
// ['orange']
await collection.distinct('food.1')
// []
await collection.distinct('food.allergies')
// [true]
await collection.distinct('food.likes_fruit')
})();
Gets the distinct values of the specified field name.
// Synchronous
DistinctIterable<T,F> distinct(String fieldName, Filter filter, Class<F> resultClass);
DistinctIterable<T,F> distinct(String fieldName, Class<F> resultClass);
// Asynchronous
CompletableFuture<DistinctIterable<T,F>> distinctAsync(String fieldName, Filter filter, Class<F> resultClass);
CompletableFuture<DistinctIterable<T,F>> distinctAsync(String fieldName, Class<F> resultClass);
Parameters:
Name | Type | Summary |
---|---|---|
fieldName | String | The name of the field whose distinct values are returned. |
filter (optional) | Filter | Criteria used to filter the documents. The filter is a JSON object that can contain any valid Data API filter expression. |
resultClass | Class<F> | The type of the field's values. |
Returns:
DistinctIterable<T,F>
- List of distinct values of the specified field name.
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DistinctIterable;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import static com.datastax.astra.client.model.Filters.lt;
public class Distinct {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Building a filter
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
// Execute the distinct operation, first without a filter and then with one
DistinctIterable<Document, String> result = collection
.distinct("field", String.class);
DistinctIterable<Document, String> result2 = collection
.distinct("field", filter, String.class);
// Iterate over the result
for (String fieldValue : result) {
System.out.println(fieldValue);
}
}
}
This operation has no literal equivalent in HTTP.
Instead, you can use Find documents using filtering options, and then use jq or another utility to extract _id or other desired values from the response.
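As one possible stand-in for jq, the following Python sketch collects the distinct values of a dot-notation key from a parsed find response. It is an approximation under stated assumptions, not the clients' implementation: distinct_from_response and _resolve are hypothetical names, list values contribute each of their items, numeric path segments index into lists, and the response is assumed to have the data.documents shape shown earlier. If the find results are paginated, you would need to run it over the documents from every page.

```python
from typing import Any, Dict, Iterator, List


def _resolve(value: Any, parts: List[str]) -> Iterator[Any]:
    """Yield every value reachable from `value` by following `parts`."""
    if not parts:
        # End of the path: a list contributes each of its items.
        if isinstance(value, list):
            yield from value
        else:
            yield value
        return
    head, rest = parts[0], parts[1:]
    if isinstance(value, dict):
        if head in value:
            yield from _resolve(value[head], rest)
    elif isinstance(value, list):
        if head.isdigit():
            # A numeric segment selects a single list position.
            index = int(head)
            if index < len(value):
                yield from _resolve(value[index], rest)
        else:
            # Otherwise look for the key inside each list item.
            for item in value:
                yield from _resolve(item, parts)


def distinct_from_response(response: Dict[str, Any], key: str) -> List[Any]:
    """Return distinct values for `key`, in order of first appearance."""
    seen: List[Any] = []
    for document in response["data"]["documents"]:
        for found in _resolve(document, key.split(".")):
            # Equality comparison also works for unhashable values (dicts).
            if found not in seen:
                seen.append(found)
    return seen
```

Documents that lack the requested key contribute nothing to the result, which mirrors the behavior described for the client methods in this section.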