Find documents reference

A document represents a single row or record of data in an Astra DB Serverless database. You use the Collection class to work with documents through the Data API. For instructions to get a Collection object, see the Collections reference.

Astra DB APIs use the term keyspace to refer to both namespaces and keyspaces.

For general information about working with documents, including common operations and operators, see the Documents reference.

Find a document

Retrieve a single document from a collection using various filter and query options.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the Client reference.

Retrieve a single document from a collection by its _id:

document = collection.find_one({"_id": 101})

Retrieve a single document from a collection by any property, as long as the property is covered by the collection’s indexing configuration:

document = collection.find_one({"location": "warehouse_C"})

Retrieve a single document from a collection by an arbitrary filtering clause:

document = collection.find_one({"tag": {"$exists": True}})

Retrieve the document that is most similar to a given vector:

result = collection.find_one({}, sort={"$vector": [.12, .52, .32]})

Retrieve the most similar document by running a vector search with vectorize:

result = collection.find_one({}, sort={"$vectorize": "Text to vectorize"})

Use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

result = collection.find_one({"_id": 101}, projection={"name": True})

Returns:

Union[Dict[str, Any], None] - Either the found document as a dictionary or None if no matching document is found.

Example response
{'_id': 101, 'name': 'John Doe', '$vector': [0.12, 0.52, 0.32]}
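Because find_one returns None when nothing matches, calling code usually checks the result before reading any fields. The following is a minimal sketch of that pattern. The describe_document helper is hypothetical (it is not part of AstraPy), and the sample values stand in for real find_one results.

```python
from typing import Any, Dict, Optional


def describe_document(document: Optional[Dict[str, Any]]) -> str:
    """Summarize a find_one-style result (hypothetical helper, not an AstraPy API)."""
    if document is None:
        # No document matched the filter.
        return "no match"
    # Fields left out by the projection may be absent, so read them with .get().
    name = document.get("name", "<no name>")
    return f"{document['_id']}: {name}"


# Stand-ins for values that collection.find_one(...) could return.
found = {"_id": 101, "name": "John Doe", "$vector": [0.12, 0.52, 0.32]}
print(describe_document(found))  # 101: John Doe
print(describe_document(None))   # no match
```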

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations.

include_similarity

Optional[bool]

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. Only valid for vector ANN search with $vector or $vectorize.

sort

Optional[Dict[str, Any]]

Use this dictionary parameter to perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Sort operations.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.
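The filter examples in the table are plain dictionaries, so they can be assembled programmatically before being passed to find_one. The following is a small sketch that assumes a hypothetical helper named all_of, which is not part of AstraPy.

```python
from typing import Any, Dict


def all_of(*clauses: Dict[str, Any]) -> Dict[str, Any]:
    """Combine filter clauses with $and (hypothetical helper, not an AstraPy API)."""
    if not clauses:
        return {}  # an empty filter matches every document
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": list(clauses)}


name_is_john = {"name": "John"}
price_below_100 = {"price": {"$lt": 100}}

combined = all_of(name_is_john, price_below_100)
print(combined)
# The result could then be used as the filter, e.g. collection.find_one(combined)
```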

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.find_one()
# prints: {'_id': '68d1e515-...', 'seq': 37}
collection.find_one({"seq": 10})
# prints: {'_id': 'd560e217-...', 'seq': 10}
collection.find_one({"seq": 1011})
# (returns None for no matches)
collection.find_one(projection={"seq": False})
# prints: {'_id': '68d1e515-...'}
collection.find_one(
    {},
    sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
# prints: {'_id': '97e85f81-...', 'seq': 69}
collection.find_one(sort={"$vector": [1, 0]}, projection={"*": True})
# prints: {'_id': '...', 'tag': 'D', '$vector': [4.0, 1.0]}

For more information, see the Client reference.

Retrieve a single document from a collection by its _id:

const doc = await collection.findOne({ _id: '101' });

Retrieve a single document from a collection by any property, as long as the property is covered by the collection’s indexing configuration:

const doc = await collection.findOne({ location: 'warehouse_C' });

Retrieve a single document from a collection by an arbitrary filtering clause:

const doc = await collection.findOne({ tag: { $exists: true } });

Retrieve the document that is most similar to a given vector:

const doc = await collection.findOne({}, { sort: { $vector: [.12, .52, .32] } });

Retrieve the most similar document by running a vector search with vectorize:

const doc = await collection.findOne({}, { sort: { $vectorize: 'Text to vectorize' } });

Use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

const doc = await collection.findOne({ _id: '101' }, { projection: { name: 1 } });

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to find. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

options?

FindOneOptions

The options for this operation.

Options (FindOneOptions):

Name Type Summary

projection?

Projection

Specifies which fields to include or exclude in the returned documents. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations.

When specifying a projection, make sure that you handle the return type carefully. Consider type-casting.

includeSimilarity?

boolean

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. This is only valid when performing a vector search with $vector or $vectorize.

sort?

Sort

Perform a vector similarity search or set the order in which documents are returned. For similarity searches, sort can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Sort operations.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete.

Returns:

Promise<FoundDoc<Schema> | null> - A promise that resolves to the found document (including $similarity, if applicable), or null if no matching document is found.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([
    { name: 'John', age: 30, $vector: [1, 1, 1, 1, 1] },
    { name: 'Jane', age: 25, },
    { name: 'Dave', age: 40, },
  ]);

  // Unpredictably prints one of their names
  const unpredictable = await collection.findOne({});
  console.log(unpredictable?.name);

  // Failed find by name (null)
  const failed = await collection.findOne({ name: 'Carrie' });
  console.log(failed);

  // Find by $gt age (Dave)
  const dave = await collection.findOne({ age: { $gt: 30 } });
  console.log(dave?.name);

  // Find by sorting by age (Jane)
  const jane = await collection.findOne({}, { sort: { age: 1 } });
  console.log(jane?.name);

  // Find by vector similarity (John, 1)
  const john = await collection.findOne({}, { sort: { $vector: [1, 1, 1, 1, 1] }, includeSimilarity: true });
  console.log(john?.name, john?.$similarity);
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson. For more information, see the Client reference.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
Optional<T> findOne(Filter filter);
Optional<T> findOne(Filter filter, FindOneOptions options);
Optional<T> findById(Object id); // builds the filter for you

// Asynchronous
CompletableFuture<Optional<T>> findOneAsync(Filter filter);
CompletableFuture<Optional<T>> findOneAsync(Filter filter, FindOneOptions options);
CompletableFuture<Optional<T>> findByIdAsync(Object id);

You can retrieve documents in various ways, for example by _id, by an indexed property, by a filter clause, or by vector similarity.

Additionally, you can use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

In the underlying HTTP request, the findOne command is a JSON object that contains the filter, projection, sort, and options parameters, for example:

{
  "findOne": {
    "filter": {
      "$and": [
        { "field2": { "$gt": 10 } },
        { "field3": { "$lt": 20 } },
        { "field4": { "$eq": "value" } }
      ]
    },
    "projection": {
      "_id": 0,
      "field": 1,
      "field2": 1,
      "field3": 1
    },
    "sort": {
      "$vector": [0.25, 0.25, 0.25,0.25, 0.25]
    },
    "options": {
      "includeSimilarity": true
    }
  }
}

You can define the preceding JSON object in Java as follows:

collection.findOne(
  Filters.and(
   Filters.gt("field2", 10),
   Filters.lt("field3", 20),
   Filters.eq("field4", "value")
  ),
  new FindOneOptions()
   .projection(Projections.include("field", "field2", "field3"))
   .projection(Projections.exclude("_id"))
   .vector(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f})
   .includeSimilarity()
);

// The same call, using static imports
collection.findOne(
  and(
   gt("field2", 10),
   lt("field3", 20),
   eq("field4", "value")
  ),
  vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
   .projection(Projections.include("field", "field2", "field3"))
   .projection(Projections.exclude("_id"))
   .includeSimilarity()
);

Parameters:

Name Type Summary

filter

Filter

Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

options (optional)

FindOneOptions

Set the different options for the findOne operation, including the following:

  • sort(): Perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Sort operations.

  • projection(): A list of flags that select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations.

  • includeSimilarity(): If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. This is only valid for vector ANN search with $vector or $vectorize.

Returns:

Optional<T> - The document matching the filter, or Optional.empty() if no document is found.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindOneOptions;

import java.util.Optional;

import static com.datastax.astra.client.model.Filters.and;
import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Filters.gt;
import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Projections.exclude;
import static com.datastax.astra.client.model.Projections.include;

public class FindOne {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Complete FindOne
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));
        FindOneOptions options = new FindOneOptions()
                .projection(include("field", "field2", "field3"))
                .projection(exclude("_id"))
                .sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
                .includeSimilarity();
        Optional<Document> result = collection.findOne(filter, options);

        // The same call, using static imports
        collection.findOne(and(
                gt("field2", 10),
                lt("field3", 20),
                eq("field4", "value")),
               new FindOneOptions().sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
                .projection(include("field", "field2", "field3"))
                .projection(exclude("_id"))
                .includeSimilarity()
        );

        // Find one document with vectorize
        collection.findOne(and(
                        gt("field2", 10),
                        lt("field3", 20),
                        eq("field4", "value")),
                new FindOneOptions().sort("Life is too short to be living somebody else's dream.")
                        .projection(include("field", "field2", "field3"))
                        .projection(exclude("_id"))
                        .includeSimilarity()
        );

        collection.insertOne(new Document()
                .append("field", "value")
                .append("field2", 15)
                .append("field3", 15)
                .vectorize("Life is too short to be living somebody else's dream."));

    }
}

Use the findOne command to retrieve a document.

Retrieve a single document from a collection by its _id:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findOne": {
    "filter": { "_id": "018e65c9-df45-7913-89f8-175f28bd7f74" }
  }
}' | jq

Retrieve a single document from a collection by any property, as long as the property is covered by the collection’s indexing configuration:

"findOne": {
  "filter": { "purchase_date": { "$date": 1690045891 } }
}

Retrieve a single document from a collection by an arbitrary filtering clause:

"findOne": {
  "filter": { "preferred_customer": { "$exists": true } }
}

Retrieve the document that is most similar to a given vector:

"findOne": {
  "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] }
}

Retrieve the most similar document by running a vector search with vectorize:

"findOne": {
  "sort": { "$vectorize": "I'd like some talking shoes" }
}

Use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

"findOne": {
  "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
  "projection": { "$vector": 1 }
}
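The shorter snippets above show only the findOne object. As a sketch of how such a fragment fits into a complete request, the following Python code wraps a fragment in the full JSON body and prepares, without sending, a POST request with the same headers as the curl example. The endpoint, keyspace, collection, and token values are placeholders, the https:// prefix is added only so the placeholder parses as a URL, and build_find_one_request is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Any, Dict


def build_find_one_request(
    api_endpoint: str,
    keyspace: str,
    collection: str,
    token: str,
    find_one: Dict[str, Any],
) -> urllib.request.Request:
    """Prepare, but do not send, a findOne request (hypothetical helper)."""
    url = f"{api_endpoint}/api/json/v1/{keyspace}/{collection}"
    body = json.dumps({"findOne": find_one}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


request = build_find_one_request(
    "https://ASTRA_DB_API_ENDPOINT",  # placeholder values throughout
    "ASTRA_DB_KEYSPACE",
    "ASTRA_DB_COLLECTION",
    "ASTRA_DB_APPLICATION_TOKEN",
    {
        "sort": {"$vector": [0.15, 0.1, 0.1, 0.35, 0.55]},
        "projection": {"$vector": 1},
    },
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(request)
```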

Parameters:

Name Type Summary

findOne

command

The Data API command to retrieve a document in a collection based on one or more of filter, sort, projection, and options.

filter

object

An object that defines filter criteria using the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

sort

object

Perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Sort operations.

projection

object

Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations.

options.includeSimilarity

boolean

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. This is only valid for vector ANN search with $vector or $vectorize.

"options": { "includeSimilarity": true }

Returns:

A successful response includes a data object that contains a document object representing the document matching the given query. The returned document fields depend on the findOne parameters, namely the projection and options.

"data": {
  "document": {
    "_id": "14"
  }
}

Example:

This request retrieves a document from a collection by its _id with the default projection and options:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findOne": {
    "filter": { "_id": "14" }
  }
}' | jq

The response contains the document’s _id and all regular fields. The default projection excludes $vector and $vectorize.

{
  "data": {
    "document": {
      "_id": "14",
      "amount": 110400,
      "customer": {
        "address": {
          "address_line": "1414 14th Pl",
          "city": "Brooklyn",
          "state": "NY"
        },
        "age": 44,
        "credit_score": 702,
        "name": "Kris S.",
        "phone": "123-456-1144"
      },
      "items": [
        {
          "car": "Tesla Model X",
          "color": "White"
        }
      ],
      "purchase_date": {
        "$date": 1698513091
      },
      "purchase_type": "In Person",
      "seller": {
        "location": "Brooklyn NYC",
        "name": "Jasmine S."
      }
    }
  }
}
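Client code typically needs to pull the document out of the data object. The following sketch parses a trimmed copy of the response above. It assumes that a request with no match returns a null document, which is an assumption rather than something this page states, and extract_document is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Optional


def extract_document(response_text: str) -> Optional[Dict[str, Any]]:
    """Return the document from a findOne response body, or None (hypothetical helper)."""
    payload = json.loads(response_text)
    # Assumption: a missing or null "document" means that nothing matched.
    return payload.get("data", {}).get("document")


# Trimmed copy of the example response shown above.
response_text = """
{
  "data": {
    "document": {
      "_id": "14",
      "amount": 110400,
      "customer": { "name": "Kris S.", "age": 44 },
      "purchase_type": "In Person"
    }
  }
}
"""

document = extract_document(response_text)
if document is not None:
    print(document["_id"], document["customer"]["name"])
```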

Find documents using filtering options

Whereas findOne fetches a single document that matches a query, find fetches multiple documents that match a query.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the Client reference.

Find documents matching a property, as long as the property is covered by the collection’s indexing configuration:

doc_iterator = collection.find({"category": "house_appliance"}, limit=10)

Find documents matching a filter operator:

doc_iterator = collection.find({"tag": {"$exists": True}}, limit=10)

Iterate over the documents most similar to a given vector:

doc_iterator = collection.find(
    {},
    sort={"$vector": [0.55, -0.40, 0.08]},
    limit=5,
)

Iterate over similar documents by running a vector search with vectorize:

doc_iterator = collection.find(
    {},
    sort={"$vectorize": "Text to vectorize"},
    limit=5,
)

Use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

doc_iterator = collection.find({"category": "house_appliance"}, limit=10, projection={"name": True})

Returns:

Cursor - A cursor for iterating over documents. AstraPy cursors are compatible with for loops, and they provide a few additional features. However, for vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.

collection.find returns a cursor that must be iterated over to fetch matching documents.

If you need to materialize a list of all results, you can use list(). However, be aware that the time and memory required for this operation depend on the number of results.

A cursor, while it is consumed, transitions between initialized, running, and exhausted status. exhausted indicates there are no more documents to read.

Example response
Cursor("some_collection", new, retrieved so far: 0)
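The trade-off between iterating lazily and materializing everything with list() can be sketched without a database. In the following example, a plain generator named fake_cursor stands in for the cursor returned by collection.find (it is a hypothetical stand-in, not an AstraPy object), and itertools.islice reads only the first few results.

```python
from itertools import islice
from typing import Any, Dict, Iterator, List

fetched: List[int] = []  # records which documents were actually produced


def fake_cursor(total: int) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for a cursor: yields documents one at a time."""
    for seq in range(total):
        fetched.append(seq)
        yield {"_id": seq, "seq": seq}


# Lazy consumption: only the documents that are read get produced.
first_three = list(islice(fake_cursor(1000), 3))
produced_lazily = len(fetched)
print(len(first_three), produced_lazily)

# Materializing with list() walks the whole iterator and holds every result in memory.
fetched.clear()
everything = list(fake_cursor(1000))
print(len(everything), len(fetched))
```

A real cursor fetches documents from the server in pages rather than one at a time, so the granularity differs, but the cost of list() still grows with the number of results.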

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations.

skip

Optional[int]

Specify a number of documents to bypass (skip) before returning documents. The first n documents matching the query are discarded from the results, and the results begin at the skip+1 document. For example, if skip=5, the first 5 documents are discarded, and the results begin at the 6th document.

You can use this parameter only in conjunction with an explicit sort criterion of the ascending/descending type. It is not valid with vector ANN search (with $vector or $vectorize).

limit

Optional[int]

Limit the total number of documents returned. Once limit is reached, or the cursor is exhausted due to lack of matching documents, nothing more is returned.

include_similarity

Optional[bool]

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. Only valid for vector ANN search with $vector or $vectorize.

include_sort_vector

Optional[bool]

If true, the response includes the sortVector. The default is false. This is only relevant if sort includes either $vector or $vectorize and you want the response to include the sort vector. This can be useful for $vectorize because you don’t know the sort vector in advance.

You can’t use include_sort_vector with find_one(). However, you can use include_sort_vector and limit=1 with find().

sort

Optional[Dict[str, Any]]

Use this dictionary parameter to perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Sort operations.

max_time_ms

Optional[int]

A timeout, in milliseconds, for each underlying HTTP request used to fetch documents as you iterate over the cursor. This method uses the collection-level timeout by default.
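The interaction of sort, skip, and limit described in the table can be illustrated on an in-memory list. This sketch only mimics the documented semantics on the client side. It is not how the Data API evaluates a query, and emulate_find is a hypothetical name.

```python
from typing import Any, Dict, List, Optional


def emulate_find(
    documents: List[Dict[str, Any]],
    sort_field: str,
    skip: int = 0,
    limit: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Mimic ascending sort, then skip, then limit on a plain list (illustration only)."""
    ordered = sorted(documents, key=lambda doc: doc[sort_field])
    remaining = ordered[skip:]  # discard the first `skip` documents
    return remaining if limit is None else remaining[:limit]


documents = [{"seq": n} for n in (9, 3, 7, 1, 5, 8, 2, 6, 4, 10)]

# With skip=5, results begin at the 6th document in sort order.
page = emulate_find(documents, "seq", skip=5, limit=3)
print([doc["seq"] for doc in page])  # [6, 7, 8]
```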

Example:

from astrapy import DataAPIClient
import astrapy

client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.COLLECTION

# Find all documents in the collection
# Not advisable if a very high number of matches is anticipated
for document in collection.find({}):
    print(document)

# Find all documents in the collection with a specific field value
for document in collection.find({"a": 123}):
    print(document)

# Find all documents in the collection matching a compound filter expression
matches = list(collection.find({
    "$and": [
      {"f1": 1},
      {"f2": 2},
    ]
}))

# Same as the preceding example, but using the implicit AND operator
matches = list(collection.find({
    "f1": 1,
    "f2": 2,
}))

# Use the "less than" operator in the filter expression
matches2 = list(collection.find({
    "$and": [
      {"name": "John"},
      {"price": {"$lt": 100}},
    ]
}))

# Run a $vectorize search, get back the query vector along with the documents
results_ite = collection.find(
    {},
    projection={"*": 1},
    limit=3,
    include_sort_vector=True,
    sort={"$vectorize": "Query text"},
)
query = results_ite.get_sort_vector()
for doc in results_ite:
    print(f"{doc['$vectorize']}: {doc['$vector'][:2]}... VS. {query[:2]}...")

For more information, see the Client reference.

Find documents matching a property, as long as the property is covered by the collection’s indexing configuration:

const cursor = collection.find({ category: 'house_appliance' }, { limit: 10 });

Find documents matching a filter operator:

const cursor = collection.find({ tag: { $exists: true } }, { limit: 10 });

Iterate over the documents most similar to a given vector:

const cursor = collection.find({}, { sort: { $vector: [0.55, -0.40, 0.08] }, limit: 5 });

Iterate over similar documents by running a vector search with vectorize:

const cursor = collection.find({}, { sort: { $vectorize: 'Text to vectorize' }, limit: 5 });

Use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

const cursor = collection.find({ category: 'house_appliance' }, { limit: 10, projection: { name: 1 } });

Returns:

FindCursor<FoundDoc<Schema>> - A cursor you can use to iterate over the matching documents. For vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.

collection.find returns a cursor that must be iterated over to fetch matching documents.

If you need to materialize a list of all results, you can use toArray(). However, be aware that the time and memory required for this operation depend on the number of results.

A cursor, while it is consumed, transitions between initialized, running, and exhausted status. exhausted indicates there are no more documents to read.

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the documents to find. For a list of available operators, see Data API operators.

options?

FindOptions

The options for this operation.

Options (FindOptions):

Name Type Summary

projection?

Projection

Specifies which fields to include or exclude in the returned documents. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations.

When specifying a projection, make sure that you handle the return type carefully. Consider type-casting.

includeSimilarity?

boolean

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. Only valid for vector ANN search with $vector or $vectorize.

includeSortVector?

boolean

If true, the response includes the sortVector. The default is false. This is only relevant if sort includes either $vector or $vectorize and you want the response to include the sort vector. This can be useful for $vectorize because you don’t know the sort vector in advance.

You can’t use includeSortVector with findOne(). However, you can use includeSortVector and limit: 1 with find().

You can also access this through await cursor.getSortVector().

sort?

Sort

Perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Sort operations.

skip?

number

Specify a number of documents to bypass (skip) before returning documents. The first n documents matching the query are discarded from the results, and the results begin at the skip+1 document. For example, if skip: 5, the first 5 documents are discarded, and the results begin at the 6th document.

You can use this parameter only in conjunction with an explicit sort criterion of the ascending/descending type. It is not valid with vector ANN search (with $vector or $vectorize).

limit?

number

Limit the total number of documents returned in the lifetime of the cursor. Once limit is reached, or the cursor is exhausted due to lack of matching documents, nothing more is returned.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete each underlying HTTP request as you iterate over the cursor.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([
    { name: 'John', age: 30, $vector: [1, 1, 1, 1, 1] },
    { name: 'Jane', age: 25, },
    { name: 'Dave', age: 40, },
  ]);

  // Gets all 3 in some order
  const unpredictable = await collection.find({}).toArray();
  console.log(unpredictable);

  // Failed find by name ([])
  const matchless = await collection.find({ name: 'Carrie' }).toArray();
  console.log(matchless);

  // Find by $gt age (John, Dave)
  const gtAgeCursor = collection.find({ age: { $gt: 25 } });
  for await (const doc of gtAgeCursor) {
    console.log(doc.name);
  }

  // Find by sorting by age (Jane, John, Dave)
  const sortedAgeCursor = collection.find({}, { sort: { age: 1 } });
  await sortedAgeCursor.forEach(console.log);

  // Find first by vector similarity (John, 1)
  const john = await collection.find({}, { sort: { $vector: [1, 1, 1, 1, 1] }, includeSimilarity: true }).next();
  console.log(john?.name, john?.$similarity);
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
FindIterable<T> find(Filter filter, FindOptions options);
// Helper to build filter and options above ^
FindIterable<T> find(FindOptions options); // no filter
FindIterable<T> find(Filter filter); // default options
FindIterable<T> find(); // default options + no filters
FindIterable<T> find(float[] vector, int limit); // semantic search
FindIterable<T> find(Filter filter, float[] vector, int limit);

For more information, see Find a document and the Client reference.

Returns:

FindIterable<T> - A cursor that fetches up to the first 20 documents and can be iterated to fetch additional documents as needed. However, for vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.

The FindIterable is an Iterable that you can use in a for loop to iterate over the returned documents.

The FindIterable fetches chunks of documents, and then fetches more as needed. The FindIterable is a lazy iterator, meaning that it only fetches the next chunk of documents when needed.

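You can use the .all() method to exhaust the iterable and collect all remaining documents, but use it with caution because the time and memory required depend on the number of results.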

Parameters:

Name Type Summary

filter

Filter

Criteria list to filter documents. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators.

options (optional)

FindOptions

Set the different options for the find operation, including the following:

  • sort(): Perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Sort operations.

  • projection(): A list of flags that select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations.

  • includeSimilarity(): If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. This is only valid for vector ANN search with $vector or $vectorize.

  • includeSortVector(): If true, the response includes the sortVector. The default is false. This is only relevant if sort includes either $vector or $vectorize and you want the response to include the sort vector. This can be useful for $vectorize because you don’t know the sort vector in advance.

    You can’t use includeSortVector with findOne(). However, you can use includeSortVector and limit(1) with find().

  • limit(): Limit the total number of documents returned. Once the limit is reached, or the cursor is exhausted due to lack of matching documents, nothing more is returned.

  • skip(): Specify a number of documents to bypass (skip) before returning documents. The first n documents matching the query are discarded from the results, and the results begin at the skip+1 document. For example, with skip(5), the first 5 documents are discarded, and the results begin at the 6th document.

    You can use this parameter only in conjunction with an explicit sort criterion of the ascending/descending type. It is not valid with vector ANN search (with $vector or $vectorize).

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.FindOptions;
import com.datastax.astra.client.model.Sorts;

import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Projections.exclude;
import static com.datastax.astra.client.model.Projections.include;

public class Find {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Find Options
        FindOptions options = new FindOptions()
                .projection(include("field", "field2", "field3")) // select fields
                .projection(exclude("_id")) // exclude some fields
                .sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f}) // similarity vector
                // skip() is not set here: it is not valid with a vector sort
                .limit(10) // stop after 10 items (max records)
                .pageState("pageState") // used for pagination
                .includeSimilarity(); // include similarity

        // Execute a find operation
        FindIterable<Document> result = collection.find(filter, options);

        // Iterate over the result
        for (Document document : result) {
            System.out.println(document);
        }
    }
}

Use the find command to retrieve multiple documents matching a query.

Retrieve documents by any property, as long as the property is covered by the collection’s indexing configuration:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": { "purchase_date": { "$date": 1690045891 } }
  }
}' | jq
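
For readers who call the Data API from a script instead of a shell, the curl request above can be sketched with Python's standard library. The helper name make_find_request and the example.invalid endpoint are assumptions for illustration. Only the URL path, the Token and Content-Type headers, and the shape of the find body are taken from the curl example.

```python
import json
import urllib.request


def make_find_request(api_endpoint, keyspace, collection, token, find_body):
    """Build (but do not send) a POST request equivalent to the curl call.

    make_find_request is a hypothetical helper, not part of any client library.
    """
    url = f"{api_endpoint}/api/json/v1/{keyspace}/{collection}"
    payload = json.dumps({"find": find_body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: substitute your own endpoint, keyspace, collection, and token.
request = make_find_request(
    "https://example.invalid",
    "ASTRA_DB_KEYSPACE",
    "ASTRA_DB_COLLECTION",
    "ASTRA_DB_APPLICATION_TOKEN",
    {"filter": {"purchase_date": {"$date": 1690045891}}},
)
# To send it, pass the request to urllib.request.urlopen() and parse the JSON body.
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.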

Retrieve documents matching a filter operator:

"find": {
  "filter": { "preferred_customer": { "$exists": true } }
}

More filter operator examples

Match values that are equal to the filter value:

"find": {
  "filter": {
    "customer": {
      "$eq": {
        "name": "Jasmine S.",
        "city": "Jersey City"
      }
    }
  }
}

Match values that are not the filter value:

"find": {
  "filter": {
    "$not": {
      "customer.address.state": "NJ"
    }
  }
}

You can use similar $not operators for arrays, such as $nin and $ne.

Match any of the specified values in an array:

"find": {
  "filter": {
    "customer.address.city": {
      "$in": [ "Jersey City", "Orange" ]
    }
  }
}

Match all in an array:

"find": {
  "filter": {
    "items": {
      "$all": [
        {
          "car": "Sedan",
          "color": "White"
        },
        "Extended warranty"
      ]
    }
  }
}

Compound and/or operators:

"find": {
  "filter": {
    "$and": [
      {
        "$or": [
          { "customer.address.city": "Jersey City" },
          { "customer.address.city": "Orange" }
        ]
      },
      {
        "$or": [
          { "seller.name": "Jim A." },
          { "seller.name": "Tammy S." }
        ]
      }
    ]
  }
}

Compound range operators:

"find": {
  "filter": {
    "$and": [
      { "customer.credit_score": { "$gte": 700 } },
      { "customer.credit_score": { "$lt": 800 } }
    ]
  }
}
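
To make the semantics of the compound filters above concrete, here is a minimal in-memory sketch of how such a filter selects documents. It is an illustration only, not how the Data API evaluates filters: the names get_path and matches are hypothetical, only a handful of operators are covered, and array traversal is ignored. The sample documents are trimmed copies of documents that appear in this page's example responses.

```python
def get_path(doc, path):
    """Follow a dot-notation path through nested dicts; return (found, value)."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return False, None
        current = current[part]
    return True, current


# Comparison operators covered by this sketch (a small subset only).
OPERATORS = {
    "$eq": lambda value, operand: value == operand,
    "$gte": lambda value, operand: value >= operand,
    "$lt": lambda value, operand: value < operand,
    "$in": lambda value, operand: value in operand,
}


def compare(op, value, operand):
    if op not in OPERATORS:
        raise ValueError(f"operator {op} is not covered by this sketch")
    return OPERATORS[op](value, operand)


def matches(doc, flt):
    """Return True if doc satisfies every clause in flt."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(doc, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(doc, sub) for sub in cond)
        else:
            found, value = get_path(doc, key)
            if not found:
                ok = False
            elif isinstance(cond, dict) and any(k.startswith("$") for k in cond):
                ok = all(compare(op, value, operand) for op, operand in cond.items())
            else:
                ok = value == cond  # plain value: implicit equality
        if not ok:
            return False
    return True


docs = [
    {"_id": "17", "customer": {"address": {"city": "Hoboken"}, "credit_score": 694},
     "seller": {"name": "Jim A."}},
    {"_id": "8", "customer": {"address": {"city": "Orange"}, "credit_score": 710},
     "seller": {"name": "Tammy S."}},
    {"_id": "5", "customer": {"address": {"city": "Jersey City"}, "credit_score": 800},
     "seller": {"name": "Jim A."}},
]

range_filter = {"$and": [
    {"customer.credit_score": {"$gte": 700}},
    {"customer.credit_score": {"$lt": 800}},
]}
print([d["_id"] for d in docs if matches(d, range_filter)])  # ['8']
```

The score of 800 is excluded because $lt is a strict comparison, while 700 itself would be included by $gte.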

Retrieve documents that are most similar to a given vector:

"find": {
  "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
  "options": {
    "limit": 100
  }
}

Retrieve similar documents by running a vector search with vectorize:

"find": {
  "sort": { "$vectorize": "I'd like some talking shoes" },
  "options": {
    "limit": 100
  }
}

Use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

"find": {
  "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
  "projection": { "$vector": 1 },
  "options": {
    "includeSimilarity": true,
    "limit": 100
  }
}

Parameters:

Name Type Summary

find

command

The Data API command to retrieve multiple documents in a collection based on one or more of filter, sort, projection, and options.

filter

object

An object that defines filter criteria using the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators.

sort

object

Perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Sort operations.

projection

object

Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Projection operations.

options.includeSimilarity

boolean

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and each document’s vector. This is only valid for vector ANN search with $vector or $vectorize.

"options": { "includeSimilarity": true }

options.includeSortVector

boolean

If true, the response includes the sortVector. The default is false. This is only relevant if sort includes either $vector or $vectorize and you want the response to include the sort vector. This can be useful for $vectorize because you don’t know the sort vector in advance.

"options": { "includeSortVector": true }

You can’t use includeSortVector with findOne. However, you can use includeSortVector and limit: 1 with find.

options.skip

integer

Specify a number of documents to bypass (skip) before returning documents. The first n documents matching the query are discarded from the results, and the results begin at the skip+1 document. For example, if "skip": 5, the first 5 documents are discarded, and the results begin at the 6th document.

You can use this parameter only in conjunction with an explicit sort criterion of the ascending/descending type. It is not valid with vector ANN search (with $vector or $vectorize).

options.limit

integer

Limit the total number of documents returned. Pagination can occur if more than 20 documents are returned in the current set of matching documents. Once the limit is reached, either in a single response or the last page of a paginated response, nothing more is returned.
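
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: build_find_payload is a hypothetical helper, not part of any client library, and it places skip and limit inside the options object in the same way the limit examples on this page do.

```python
def build_find_payload(filter_=None, sort=None, projection=None,
                       skip=None, limit=None, include_similarity=False):
    """Assemble a find command body and enforce the documented constraints."""
    sort = sort or {}
    has_vector = "$vector" in sort
    has_vectorize = "$vectorize" in sort
    if has_vector and has_vectorize:
        raise ValueError("sort can use $vector or $vectorize, but not both")
    vector_search = has_vector or has_vectorize
    if skip is not None and (vector_search or not sort):
        raise ValueError("skip requires an explicit ascending/descending sort")
    if include_similarity and not vector_search:
        raise ValueError("includeSimilarity is only valid for vector search")

    find = {}
    if filter_ is not None:
        find["filter"] = filter_
    if sort:
        find["sort"] = sort
    if projection is not None:
        find["projection"] = projection

    options = {}
    if skip is not None:
        options["skip"] = skip
    if limit is not None:
        options["limit"] = limit
    if include_similarity:
        options["includeSimilarity"] = True
    if options:
        find["options"] = options
    return {"find": find}


# Reproduces the vector search with projection shown earlier on this page.
payload = build_find_payload(
    sort={"$vector": [0.15, 0.1, 0.1, 0.35, 0.55]},
    projection={"$vector": 1},
    limit=100,
    include_similarity=True,
)
```

Failing early with a ValueError is a design choice for this sketch; the Data API itself reports invalid combinations in its response.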

Returns:

A successful response can include a data object and a status object:

  • The data object contains documents, which is an array of objects. Each object represents a document matching the given query. The returned fields in each document object depend on the find parameters, namely the projection and options.

    For vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.

    For non-vector searches, pagination occurs if there are more than 20 matching documents, as indicated by the nextPageState key. If there are no more documents, nextPageState is null or omitted. If there are more documents, nextPageState contains an ID.

    {
      "data": {
        "documents": [
          {
            "_id": { "$uuid": "018e65c9-df45-7913-89f8-175f28bd7f74" }
          },
          {
            "_id": { "$uuid": "018e65c9-e33d-749b-9386-e848739582f0" }
          }
        ],
        "nextPageState": null
      }
    }

    In the event of pagination, you must issue a subsequent request with a pageState ID to fetch the next page of documents that matched the filter. As long as there is a subsequent page with matching documents, the response includes a nextPageState ID, which you use as the pageState for the subsequent request. Each paginated request is exactly the same as the original request, except for the addition of the pageState in the options object:

    {
      "find": {
        "filter": { "active_user": true },
        "options": { "pageState": "NEXT_PAGE_STATE_FROM_PRIOR_RESPONSE" }
      }
    }

    Continue issuing requests with the subsequent pageState ID until you have fetched all matching documents.

  • The status object contains the sortVector value if you set includeSortVector to true in the request:

    "status": { "sortVector": [0.4, 0.1, ...] }

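The pagination flow described above can be sketched as a Python generator. This is a minimal sketch, not a client implementation: iter_find_pages is a hypothetical name, and send stands for any callable you supply that posts a JSON payload to the Data API and returns the parsed JSON response.

```python
def iter_find_pages(send, find_body):
    """Yield every matching document, following nextPageState until it is empty.

    send: hypothetical callable that takes a request payload (dict) and
          returns the parsed response (dict).
    find_body: the contents of the "find" object for the first request.
    """
    page_state = None
    while True:
        # Each request repeats the original body; only pageState is added.
        body = dict(find_body)
        options = dict(body.get("options", {}))
        if page_state is not None:
            options["pageState"] = page_state
        if options:
            body["options"] = options

        response = send({"find": body})
        data = response.get("data", {})
        yield from data.get("documents", [])

        # A null or missing nextPageState means there are no more pages.
        page_state = data.get("nextPageState")
        if not page_state:
            break
```

Because this is a generator, a caller that stops iterating early never requests the remaining pages.
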
Examples:

Example of simple property filter

This example uses a simple filter based on two document properties:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": {
      "customer.address.city": "Hoboken",
      "customer.address.state": "NJ"
    }
  }
}' | jq

The response returned one matching document:

{
  "data": {
    "documents": [
      {
        "$vector": [
          0.1,
          0.15,
          0.3,
          0.12,
          0.09
        ],
        "_id": "17",
        "amount": 54900,
        "customer": {
          "address": {
            "address_line": "1234 Main St",
            "city": "Hoboken",
            "state": "NJ"
          },
          "age": 61,
          "credit_score": 694,
          "name": "Yolanda Z.",
          "phone": "123-456-1177"
        },
        "items": [
          {
            "car": "Tesla Model 3",
            "color": "Blue"
          },
          "Extended warranty - 5 years"
        ],
        "purchase_date": {
          "$date": 1702660291
        },
        "purchase_type": "Online",
        "seller": {
          "location": "Jersey City NJ",
          "name": "Jim A."
        },
        "status": "active"
      }
    ],
    "nextPageState": null
  }
}

Example of logical operators in a filter

This example uses the $and and $or logical operators to retrieve documents matching one condition from each $or clause. In this case, the customer.address.city must be either Jersey City or Orange, and the seller.name must be either Jim A. or Tammy S.

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": {
      "$and": [
        {
          "$or": [
            { "customer.address.city": "Jersey City" },
            { "customer.address.city": "Orange" }
          ]
        },
        {
          "$or": [
            { "seller.name": "Jim A." },
            { "seller.name": "Tammy S." }
          ]
        }
      ]
    }
  }
}' | jq

The response returned two matching documents:

{
  "data": {
    "documents": [
      {
        "$vector": [
          0.3,
          0.23,
          0.15,
          0.17,
          0.4
        ],
        "_id": "8",
        "amount": 46900,
        "customer": {
          "address": {
            "address_line": "1234 Main St",
            "city": "Orange",
            "state": "NJ"
          },
          "age": 29,
          "credit_score": 710,
          "name": "Harold S.",
          "phone": "123-456-8888"
        },
        "items": [
          {
            "car": "BMW X3 SUV",
            "color": "Black"
          },
          "Extended warranty - 5 years"
        ],
        "purchase_date": {
          "$date": 1693329091
        },
        "purchase_type": "In Person",
        "seller": {
          "location": "Staten Island NYC",
          "name": "Tammy S."
        },
        "status": "active"
      },
      {
        "$vector": [
          0.25,
          0.045,
          0.38,
          0.31,
          0.67
        ],
        "_id": "5",
        "amount": 94990,
        "customer": {
          "address": {
            "address_line": "32345 Main Ave",
            "city": "Jersey City",
            "state": "NJ"
          },
          "age": 50,
          "credit_score": 800,
          "name": "David C.",
          "phone": "123-456-5555"
        },
        "items": [
          {
            "car": "Tesla Model S",
            "color": "Red"
          },
          "Extended warranty - 5 years"
        ],
        "purchase_date": {
          "$date": 1690996291
        },
        "purchase_type": "Online",
        "seller": {
          "location": "Jersey City NJ",
          "name": "Jim A."
        },
        "status": "active"
      }
    ],
    "nextPageState": null
  }
}

Find distinct values across documents

Get a list of the distinct values of a certain key in a collection.

distinct is a client-side operation, which effectively browses all required documents using the logic of the find command, and then collects the unique values found for key. There can be performance, latency, and billing implications if there are many matching documents.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the Client reference.

collection.distinct("category")

Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.

collection.distinct(
    "food.allergies",
    filter={"registered_for_dinner": True},
)

Returns:

List[Any] - A list of the distinct values encountered. Documents that lack the requested key are ignored.

Example response
['home_appliance', None, 'sports_equipment', {'cat_id': 54, 'cat_name': 'gardening_gear'}]

Parameters:

Name Type Summary

key

str

The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels. Example of acceptable key values: "field", "field.subfield", "field.3", and "field.3.subfield". If lists are encountered and no numeric index is specified, all items in the list are visited.

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the operation. This method uses the collection-level timeout by default.

For details on the behavior of "distinct" in conjunction with real-time changes in the collection contents, see the discussion in the Sort examples values section.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_many(
    [
        {"name": "Marco", "food": ["apple", "orange"], "city": "Helsinki"},
        {"name": "Emma", "food": {"likes_fruit": True, "allergies": []}},
    ]
)

collection.distinct("name")
# returns: ['Marco', 'Emma']
collection.distinct("city")
# returns: ['Helsinki']
collection.distinct("food")
# returns: ['apple', 'orange', {'likes_fruit': True, 'allergies': []}]
collection.distinct("food.1")
# returns: ['orange']
collection.distinct("food.allergies")
# returns: []
collection.distinct("food.likes_fruit")
# returns: [True]
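
The results above follow from the key-path rules in the Parameters table: dot-notation descends into nested fields, a numeric segment selects a list index, and a list reached without a numeric index has all of its items visited. The following in-memory sketch reproduces those rules for the two sample documents. It is an illustration only, not the astrapy implementation, and the names distinct_values and _values_at are hypothetical.

```python
def _values_at(value, parts):
    """Yield every value reachable from `value` by following the path `parts`."""
    if not parts:
        # End of the path: a list contributes its items, anything else itself.
        if isinstance(value, list):
            yield from value
        else:
            yield value
        return
    head, rest = parts[0], parts[1:]
    if isinstance(value, dict):
        # Documents that lack the requested key contribute nothing.
        if head in value:
            yield from _values_at(value[head], rest)
    elif isinstance(value, list):
        if head.isdigit():
            index = int(head)
            if index < len(value):
                yield from _values_at(value[index], rest)
        else:
            # No numeric index given: visit every item in the list.
            for item in value:
                yield from _values_at(item, parts)


def distinct_values(documents, key):
    """Collect unique values for `key`, keeping first-seen order."""
    seen = []  # a list, because dict values are not hashable
    for doc in documents:
        for value in _values_at(doc, key.split(".")):
            # Simplification: == treats 1 and True as the same value.
            if value not in seen:
                seen.append(value)
    return seen


documents = [
    {"name": "Marco", "food": ["apple", "orange"], "city": "Helsinki"},
    {"name": "Emma", "food": {"likes_fruit": True, "allergies": []}},
]
print(distinct_values(documents, "food.1"))  # ['orange']
```

Because the real operation reads all matching documents before deduplicating on the client, the cost grows with the number of matching documents rather than with the number of distinct values.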

For more information, see the Client reference.

const unique = await collection.distinct('category');

Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.

const unique = await collection.distinct(
  'food.allergies',
  { registeredForDinner: true },
);

Parameters:

Name Type Summary

key

string

The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels. Example of acceptable key values: 'field', 'field.subfield', 'field.3', and 'field.3.subfield'. If lists are encountered and no numeric index is specified, all items in the list are visited.

filter?

Filter<Schema>

A filter to select the documents to use. If not provided, all documents will be used.

Returns:

Promise<Flatten<(SomeDoc & ToDotNotation<FoundDoc<Schema>>)[Key]>[]> - A promise which resolves to the unique distinct values.

The return type is mostly accurate, but with complex keys, you might need to manually cast the return type to the expected type.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertOne({ name: 'Marco', food: ['apple', 'orange'], city: 'Helsinki' });
  await collection.insertOne({ name: 'Emma', food: { likes_fruit: true, allergies: [] } });

  // ['Marco', 'Emma']
  await collection.distinct('name')

  // ['Helsinki']
  await collection.distinct('city')

  // ['apple', 'orange', { likes_fruit: true, allergies: [] }]
  await collection.distinct('food')

  // ['orange']
  await collection.distinct('food.1')

  // []
  await collection.distinct('food.allergies')

  // [true]
  await collection.distinct('food.likes_fruit')
})();

Gets the distinct values of the specified field name.

// Synchronous
DistinctIterable<T,F> distinct(String fieldName, Filter filter, Class<F> resultClass);
DistinctIterable<T,F> distinct(String fieldName, Class<F> resultClass);

// Asynchronous
CompletableFuture<DistinctIterable<T,F>> distinctAsync(String fieldName, Filter filter, Class<F> resultClass);
CompletableFuture<DistinctIterable<T,F>> distinctAsync(String fieldName, Class<F> resultClass);

Returns:

DistinctIterable<T,F> - An iterable over the distinct values of the specified field.

Parameters:

Name Type Summary

fieldName

String

The name of the field whose distinct values are collected.

filter

Filter

Criteria list to filter the documents. The filter is a JSON object that can contain any valid Data API filter expression.

resultClass

Class

The class of the field's values. The returned distinct values are of this type.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DistinctIterable;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;

import static com.datastax.astra.client.model.Filters.lt;

public class Distinct {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Execute distinct operations, without and with a filter
        DistinctIterable<Document, String> result = collection
                .distinct("field", String.class);
        DistinctIterable<Document, String> result2 = collection
                .distinct("field", filter, String.class);

        // Iterate over the result
        for (String fieldValue : result) {
            System.out.println(fieldValue);
        }
    }
}

This operation has no literal equivalent in HTTP. Instead, you can use Find documents using filtering options, and then use jq or another utility to extract _id or other desired values from the response.

© 2024 DataStax | Privacy policy | Terms of use
