Count documents reference

Documents represent a single row or record of data in Astra DB Serverless databases. You use the Collection class to work with documents through the Data API. For instructions to get a Collection object, see the Collections reference.

Astra DB APIs use the term keyspace to refer to both namespaces and keyspaces.

For general information about working with documents, including common operations and operators, see the Documents reference.

Count documents in a collection

Get the count of documents in a collection. Count all documents or apply filtering to count a subset of documents.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

Examples are available for Python, TypeScript, Java, and curl.

Python

For more information, see the Client reference.

Count all documents in a collection up to the specified limit:

collection.count_documents({}, upper_bound=500)

Get the count of the documents in a collection matching a filter condition up to the specified limit:

collection.count_documents({"seq":{"$gt": 15}}, upper_bound=50)

Returns:

int - The exact count of the documents counted as requested, unless it exceeds the caller-provided or API-set upper bound. In case of overflow, an exception is raised.

Example response
320

This operation is suited to use cases where the number of documents to count is moderate. The Data API does not support exact counting of an arbitrary number of documents because it is a slow, expensive operation. If the count exceeds the server-side threshold, an exception is raised. If you need to count large numbers of documents, consider using estimated_document_count (estimatedDocumentCount in the Data API) or dsbulk count instead.
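One way to act on this advice is to try an exact count first and fall back to the collection-wide estimate when the count overflows. The following is a minimal sketch, not part of astrapy: the helper `count_or_estimate` and the stand-in exception class `TooManyToCount` are hypothetical, and the two callables are placeholders for `collection.count_documents` and `collection.estimated_document_count`. With astrapy, you would catch `astrapy.exceptions.TooManyDocumentsToCountException` instead of the stand-in. Because the estimate ignores filters, the sketch only falls back for unfiltered counts.

```python
from typing import Any, Callable, Dict, Tuple


class TooManyToCount(Exception):
    """Hypothetical stand-in for astrapy.exceptions.TooManyDocumentsToCountException."""


def count_or_estimate(
    exact_count: Callable[..., int],
    estimate: Callable[[], int],
    filter: Dict[str, Any],
    upper_bound: int,
) -> Tuple[int, bool]:
    """Return (count, is_exact).

    Tries an exact count first. If it overflows and no filter was given,
    falls back to the collection-wide estimate. Estimates cannot apply a
    filter, so an overflowing filtered count is re-raised instead.
    """
    try:
        return exact_count(filter, upper_bound=upper_bound), True
    except TooManyToCount:
        if filter:
            raise
        return estimate(), False


# Fake callable that mimics a collection holding 20 documents
def fake_exact(filter, upper_bound):
    n = 20
    if n > upper_bound:
        raise TooManyToCount(f"{n} documents exceed upper_bound={upper_bound}")
    return n


print(count_or_estimate(fake_exact, lambda: 20, {}, upper_bound=100))  # (20, True)
print(count_or_estimate(fake_exact, lambda: 20, {}, upper_bound=10))   # (20, False)
```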

Parameters:

filter (Optional[Dict[str, Any]]): A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. If not provided, all documents are counted. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

upper_bound (int): A required ceiling on the result of the count operation. If the actual number of documents exceeds this value, an exception is raised. An exception is also raised if the actual number of documents exceeds the maximum count that the Data API can reach, regardless of upper_bound.

max_time_ms (Optional[int]): A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.
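To make the interaction between filter, upper_bound, and the server-side limit concrete, here is a small in-memory model in plain Python. It is an illustration only, not how the Data API evaluates queries: it supports just equality, $gt, $lt, and $and, and it assumes a server-side maximum of 1000, the limit referenced in the Java example's comments and in the curl overflow response on this page. The names `matches`, `model_count`, `SERVER_MAX_COUNT`, and `CountOverflow` are hypothetical.

```python
from typing import Any, Dict, List

SERVER_MAX_COUNT = 1000  # assumed Data API ceiling for exact counts


class CountOverflow(Exception):
    """Hypothetical error raised when a count exceeds a bound."""


def matches(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Evaluate a tiny subset of the filter syntax against one document."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            if key not in doc:
                return False
            value = doc[key]
            for op, operand in cond.items():
                if op == "$gt":
                    if not value > operand:
                        return False
                elif op == "$lt":
                    if not value < operand:
                        return False
                else:
                    raise ValueError(f"operator {op!r} is not modeled in this sketch")
        elif doc.get(key) != cond:
            return False
    return True


def model_count(docs: List[Dict[str, Any]], flt: Dict[str, Any], upper_bound: int) -> int:
    """Count matching docs; raise if the caller's or the server's bound is exceeded."""
    count = sum(1 for d in docs if matches(d, flt))
    if count > upper_bound:
        raise CountOverflow(f"more than {upper_bound} documents (caller bound)")
    if count > SERVER_MAX_COUNT:
        raise CountOverflow(f"more than {SERVER_MAX_COUNT} documents (server limit)")
    return count
```

Run against the same 20 documents as the example below, `model_count(docs, {}, upper_bound=100)` gives 20, `{"seq": {"$gt": 15}}` gives 4, and `upper_bound=10` raises, matching the documented behavior.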

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

# Assuming the collection starts empty, insert 20 documents with seq values 0 through 19
collection.insert_many([{"seq": i} for i in range(20)])

collection.count_documents({}, upper_bound=100)
# returns: 20
collection.count_documents({"seq": {"$gt": 15}}, upper_bound=100)
# returns: 4 (seq values 16 through 19)
collection.count_documents({}, upper_bound=10)
# raises: astrapy.exceptions.TooManyDocumentsToCountException

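TypeScript

For more information, see the Client reference.

Count all documents in a collection up to the specified limit:

const numDocs = await collection.countDocuments({}, 500);

Get the count of the documents in a collection matching a filter condition up to the specified limit: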

const numDocs = await collection.countDocuments({ seq: { $gt: 15 } }, 50);

Parameters:

filter (Filter<Schema>): A filter to select the documents to count. If not provided, all documents are counted. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

upperBound (number): A required ceiling on the result of the count operation. If the actual number of documents exceeds this value, an exception is raised. An exception is also raised if the actual number of documents exceeds the maximum count that the Data API can reach, regardless of upperBound.

options? (WithTimeout): Options for this operation, such as a timeout.

Returns:

Promise<number> - A promise that resolves to the exact count of the documents counted as requested, unless it exceeds the caller-provided or API-set upper bound, in which case an exception is raised.

This operation is suited to use cases where the number of documents to count is moderate. Exact counting of an arbitrary number of documents is a slow, expensive operation that is not supported by the Data API. If the count total exceeds the server-side threshold, an exception is raised. If you need to count large numbers of documents, consider using estimatedDocumentCount or dsbulk count instead.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Assuming the collection starts empty, insert 20 documents with seq values 0 through 19
  await collection.insertMany(Array.from({ length: 20 }, (_, i) => ({ seq: i })));

  // Prints 20
  console.log(await collection.countDocuments({}, 100));

  // Prints 4 (seq values 16 through 19)
  console.log(await collection.countDocuments({ seq: { $gt: 15 } }, 100));

  try {
    // Throws TooManyDocumentsToCountError because 20 documents exceed the upper bound of 10
    await collection.countDocuments({}, 10);
  } catch (e) {
    console.error(e);
  }
})();

Java

Count all documents, or count only the documents in a collection that match a filter condition:

// Synchronous
int countDocuments(int upperBound)
    throws TooManyDocumentsToCountException;

int countDocuments(Filter filter, int upperBound)
    throws TooManyDocumentsToCountException;

Parameters:

filter (Filter, optional): A filter to select documents to count. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. If not provided, all documents are counted. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

upperBound (int): A required ceiling on the result of the count operation. If the actual number of documents exceeds this value, an exception is raised. An exception is also raised if the actual number of documents exceeds the maximum count that the Data API can reach, regardless of upperBound.

Returns:

int - The exact count of the documents counted as requested, unless it exceeds the caller-provided or API-set upper bound. In case of overflow, an exception is raised.

TooManyDocumentsToCountException is a checked exception that is thrown when the actual number of documents exceeds the upper bound set by the caller or the API. It indicates that more matching documents exist beyond the count threshold. Consider narrowing your filter to count fewer documents at once. If you need to count large numbers of documents, consider using estimatedDocumentCount or dsbulk count instead.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.exception.TooManyDocumentsToCountException;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;

public class CountDocuments {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Build a filter: field2 > 10 AND field3 < 20 AND field4 == "value"
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                Filters.lt("field3", 20),
                Filters.eq("field4", "value"));

        try {
            // Count all documents, up to 500
            int total = collection.countDocuments(500);
            System.out.println("Total: " + total);

            // Count only the documents matching the filter, up to 500
            int matching = collection.countDocuments(filter, 500);
            System.out.println("Matching: " + matching);
        } catch (TooManyDocumentsToCountException tmde) {
            // Thrown if the count is above the upper bound (500 here) or above the 1000 limit
            System.err.println("Too many documents to count: " + tmde.getMessage());
        }
    }
}

curl

Use the Data API countDocuments command to obtain the exact count of documents in a collection:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{ "countDocuments": {} }' | jq

You can provide an optional filter condition to count only documents matching the filter:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "countDocuments": {
    "filter": {
      "year": { "$gt": 2000 }
    }
  }
}' | jq
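If you call the Data API from Python without a client library, you can send the same countDocuments command with the standard library. This is a sketch: `build_count_request` is a hypothetical helper, and `https://example.invalid` plus the ASTRA_DB_* strings are placeholders for the values used in the curl examples. The code only builds the request; sending it with `urllib.request.urlopen` requires a real endpoint and token.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_count_request(
    api_endpoint: str,
    keyspace: str,
    collection: str,
    token: str,
    filter: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build a POST request whose JSON body is a countDocuments command."""
    command: Dict[str, Any] = {}
    if filter is not None:
        command["filter"] = filter
    body = json.dumps({"countDocuments": command}).encode("utf-8")
    url = f"{api_endpoint.rstrip('/')}/api/json/v1/{keyspace}/{collection}"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Token": token, "Content-Type": "application/json"},
    )


API_ENDPOINT = "https://example.invalid"  # placeholder: use your ASTRA_DB_API_ENDPOINT
req = build_count_request(
    API_ENDPOINT, "ASTRA_DB_KEYSPACE", "ASTRA_DB_COLLECTION",
    "ASTRA_DB_APPLICATION_TOKEN", filter={"year": {"$gt": 2000}},
)
# Against a real database, you could send it like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["status"])
```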

Parameters:

countDocuments (command): A command to return an exact count of documents in a collection.

filter (object): An optional filter to select the documents to count. If not provided, all documents are counted. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

Returns:

A successful response returns count in the status object. This is the exact count of the documents counted as requested, unless it exceeds the API-set upper bound. In that case, the response also includes "moreData": true to report the overflow.

Response within upper bound
{
  "status": {
    "count": 105
  }
}
Response exceeding upper bound
{
  "status": {
    "moreData": true,
    "count": 1000
  }
}

This operation is suited to use cases where the number of documents to count is moderate. Exact counting of an arbitrary number of documents is a slow, expensive operation that is not supported by the Data API. If the count total exceeds the server-side threshold, the response includes "moreData": true to indicate that there are more matching documents beyond the count threshold.
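When you parse the raw response yourself, check moreData before treating count as exact. The following is a minimal Python sketch; `read_count` is a hypothetical helper that assumes the status shape shown in the example responses above.

```python
from typing import Any, Dict, Tuple


def read_count(response: Dict[str, Any]) -> Tuple[int, bool]:
    """Return (count, is_exact) from a countDocuments response body.

    When moreData is true, count is only the server-side threshold and the
    real number of matching documents is greater than that value.
    """
    status = response.get("status")
    if status is None or "count" not in status:
        raise ValueError(f"unexpected response: {response!r}")
    is_exact = not status.get("moreData", False)
    return status["count"], is_exact


print(read_count({"status": {"count": 105}}))                     # (105, True)
print(read_count({"status": {"moreData": True, "count": 1000}}))  # (1000, False)
```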

If you need to count large numbers of documents, consider using estimatedDocumentCount or dsbulk count instead.

Estimate document count in a collection

Get an approximate document count for an entire collection. Filtering isn’t supported. With the clients, you can set only standard options, such as a timeout in milliseconds.

In the estimatedDocumentCount command’s response, the document count is based on current system statistics at the time the database server receives the request. Because documents might be added or deleted while updates are in progress, the actual number of documents in the collection can be lower or higher than the estimate.

Examples are available for Python, TypeScript, Java, and curl.

Python

For more information, see the Client reference.

Get an approximate document count for a collection:

collection.estimated_document_count()

Returns:

int - A server-side estimate of the total number of documents in the collection.

Example response
37500

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

# Prints a server-side estimate, such as 37500
print(collection.estimated_document_count())

TypeScript

For more information, see the Client reference.

Get an approximate document count for a collection:

const estNumDocs = await collection.estimatedDocumentCount();

Returns:

Promise<number> - A promise that resolves to a server-side estimate of the total number of documents in the collection.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  console.log(await collection.estimatedDocumentCount());
})();

Java

For more information, see the Client reference.

Get an approximate document count for a collection:

long estimatedDocumentCount();
long estimatedDocumentCount(EstimatedCountDocumentsOptions options);

Returns:

long - A server-side estimate of the total number of documents in the collection. This estimate is built from the SSTable files.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.EstimatedCountDocumentsOptions;
import com.datastax.astra.internal.command.LoggingCommandObserver;

public class EstimateCountDocuments {

    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Estimate the document count with default options (filters are not supported)
        long estimatedCount = collection.estimatedDocumentCount();
        System.out.println("Estimated count: " + estimatedCount);

        // Estimate with options: register an observer that logs the underlying command
        EstimatedCountDocumentsOptions options = new EstimatedCountDocumentsOptions()
                .registerObserver("logger", new LoggingCommandObserver(DataAPIClient.class));
        long estimatedCount2 = collection.estimatedDocumentCount(options);
        System.out.println("Estimated count (with logging): " + estimatedCount2);
    }
}

curl

Use the estimatedDocumentCount command to get an approximate document count for a collection:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{ "estimatedDocumentCount": {} }' | jq

Returns:

A successful request returns count, which is an estimate of the total number of documents in the collection:

{ "status": { "count": 37500 } }


© 2024 DataStax | Privacy policy | Terms of use
