Find distinct values

Each document represents a single row or record of data in an Astra DB Serverless database.

You use the Collection class to work with documents through the Data API clients. For instructions to get a Collection object, see Work with collections.

For general information about working with documents, including common operations and operators, see Work with documents.

For more information about the Data API and clients, see Get started with the Data API.

Find distinct values across documents

Get a list of the distinct values of a certain key in a collection.

distinct is a client-side operation: the client retrieves every matching document using the logic of the find command, and then collects the unique values found for key. If many documents match, this can affect performance, latency, and billing.
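To make the cost of a client-side operation concrete, the following is a minimal Python sketch of "read every matching document, then keep the unique values". It is not the client's actual implementation: collect_distinct and the sample documents are hypothetical, and the in-memory list stands in for the results that a find call would return. Values are compared through a JSON fingerprint because a value can be a dictionary, which is not hashable.

```python
import json
from typing import Any, Dict, Iterable, List


def collect_distinct(documents: Iterable[Dict[str, Any]], key: str) -> List[Any]:
    """Collect the unique values of a top-level key, in first-seen order.

    Hypothetical helper for illustration only; not part of any client library.
    """
    seen = set()
    unique: List[Any] = []
    for doc in documents:  # every matching document is read once
        if key not in doc:
            continue  # documents that lack the key are ignored
        # Dictionaries are not hashable, so compare a canonical JSON form.
        fingerprint = json.dumps(doc[key], sort_keys=True)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(doc[key])
    return unique


# Made-up documents standing in for the output of a find call.
matching_documents = [
    {"_id": 1, "category": "home_appliance"},
    {"_id": 2, "category": None},
    {"_id": 3, "category": "home_appliance"},
    {"_id": 4, "category": {"cat_id": 54, "cat_name": "gardening_gear"}},
    {"_id": 5},
]

result = collect_distinct(matching_documents, "category")
print(result)
```

The loop touches all five documents to produce three values, which is why the work grows with the number of matching documents rather than with the number of distinct values.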

Sort and filter clauses can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the Client reference.

collection.distinct("category")

Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.

collection.distinct(
    "food.allergies",
    filter={"registered_for_dinner": True},
)

Parameters:

Name | Type | Summary
key | str | The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels. Examples of acceptable key values: "field", "field.subfield", "field.3", and "field.3.subfield". If lists are encountered and no numeric index is specified, all items in the list are visited.
filter | Optional[Dict[str, Any]] | A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.
max_time_ms | Optional[int] | A timeout, in milliseconds, for the operation. This method uses the collection-level timeout by default.

Returns:

List[Any] - A list of the distinct values encountered. Documents that lack the requested key are ignored.

Example response
['home_appliance', None, 'sports_equipment', {'cat_id': 54, 'cat_name': 'gardening_gear'}]
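The dot-notation rules described for key (descend into sub-documents, use a numeric segment as a list index, visit every list item otherwise, and ignore documents that lack the key) can be sketched in plain Python as follows. This is an illustration under those stated rules, not the library's code: iter_values and distinct_by_path are hypothetical names, and the two sample documents match the ones used in the example on this page.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def iter_values(value: Any, path: List[str]) -> Iterator[Any]:
    """Yield every value reachable from `value` by following `path`."""
    if not path:
        # End of the path: a list contributes its items, anything else itself.
        if isinstance(value, list):
            yield from value
        else:
            yield value
        return
    head, rest = path[0], path[1:]
    if isinstance(value, dict):
        if head in value:
            yield from iter_values(value[head], rest)
    elif isinstance(value, list):
        if head.isascii() and head.isdigit():
            index = int(head)  # numeric segment: pick one list item
            if index < len(value):
                yield from iter_values(value[index], rest)
        else:
            for item in value:  # no index given: visit all items
                yield from iter_values(item, path)
    # Scalars cannot be descended into, so nothing is yielded for them.


def distinct_by_path(documents: Iterable[Dict[str, Any]], key: str) -> List[Any]:
    """Collect unique values for a dot-notation key, in first-seen order."""
    seen = set()
    unique: List[Any] = []
    for doc in documents:
        for found in iter_values(doc, key.split(".")):
            fingerprint = json.dumps(found, sort_keys=True)
            if fingerprint not in seen:
                seen.add(fingerprint)
                unique.append(found)
    return unique


docs = [
    {"name": "Marco", "food": ["apple", "orange"], "city": "Helsinki"},
    {"name": "Emma", "food": {"likes_fruit": True, "allergies": []}},
]

print(distinct_by_path(docs, "food"))
print(distinct_by_path(docs, "food.1"))
print(distinct_by_path(docs, "food.allergies"))
```

In this sketch, "food.allergies" yields no values because the only allergies list is empty, and the strings in the other document's food list have no sub-fields to descend into.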

For details on the behavior of "distinct" in conjunction with real-time changes in the collection contents, see the discussion in the Sort examples values section.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_many(
    [
        {"name": "Marco", "food": ["apple", "orange"], "city": "Helsinki"},
        {"name": "Emma", "food": {"likes_fruit": True, "allergies": []}},
    ]
)

collection.distinct("name")
# prints: ['Marco', 'Emma']
collection.distinct("city")
# prints: ['Helsinki']
collection.distinct("food")
# prints: ['apple', 'orange', {'likes_fruit': True, 'allergies': []}]
collection.distinct("food.1")
# prints: ['orange']
collection.distinct("food.allergies")
# prints: []
collection.distinct("food.likes_fruit")
# prints: [True]

For more information, see the Client reference.

const unique = await collection.distinct('category');

Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.

const unique = await collection.distinct(
  'food.allergies',
  { registeredForDinner: true },
);

Parameters:

Name | Type | Summary
key | string | The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels. Examples of acceptable key values: 'field', 'field.subfield', 'field.3', and 'field.3.subfield'. If lists are encountered and no numeric index is specified, all items in the list are visited.
filter? | Filter<Schema> | A filter to select the documents to use. If not provided, all documents are used.

Returns:

Promise<Flatten<(SomeDoc & ToDotNotation<FoundDoc<Schema>>)[Key]>[]> - A promise which resolves to the unique distinct values.

The return type is mostly accurate, but for complex keys you might need to cast the result to the expected type manually.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertOne({ name: 'Marco', food: ['apple', 'orange'], city: 'Helsinki' });
  await collection.insertOne({ name: 'Emma', food: { likes_fruit: true, allergies: [] } });

  // ['Marco', 'Emma']
  await collection.distinct('name')

  // ['Helsinki']
  await collection.distinct('city')

  // ['apple', 'orange', { likes_fruit: true, allergies: [] }]
  await collection.distinct('food')

  // ['orange']
  await collection.distinct('food.1')

  // []
  await collection.distinct('food.allergies')

  // [true]
  await collection.distinct('food.likes_fruit')
})();

Gets the distinct values of the specified field name.

// Synchronous
DistinctIterable<T,F> distinct(String fieldName, Filter filter, Class<F> resultClass);
DistinctIterable<T,F> distinct(String fieldName, Class<F> resultClass);

// Asynchronous
CompletableFuture<DistinctIterable<T,F>> distinctAsync(String fieldName, Filter filter, Class<F> resultClass);
CompletableFuture<DistinctIterable<T,F>> distinctAsync(String fieldName, Class<F> resultClass);

Parameters:

Name | Type | Summary
fieldName | String | The name of the field whose distinct values are returned.
filter | Filter | Criteria used to select the documents to inspect. The filter is a JSON object that can contain any valid Data API filter expression.
resultClass | Class | The class of the field values to return.

Returns:

DistinctIterable<T,F> - An iterable over the distinct values of the specified field name.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DistinctIterable;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;

import static com.datastax.astra.client.model.Filters.lt;

public class Distinct {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Execute the distinct operation, without and with a filter
        DistinctIterable<Document, String> result = collection
                .distinct("field", String.class);
        DistinctIterable<Document, String> result2 = collection
                .distinct("field", filter, String.class);

        // Iterate over the result
        for (String fieldValue : result) {
            System.out.println(fieldValue);
        }
    }
}

This operation has no literal equivalent in HTTP. Instead, you can use the find command with a filter (see Find documents using filter options), and then use jq or another utility to extract _id or other desired values from the response.
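As an alternative to jq, the same post-processing can be done with a short script. The sketch below assumes the find response body has the shape {"data": {"documents": [...], "nextPageState": ...}}; the response text is made up for illustration, and distinct_from_response is a hypothetical helper, not part of any client.

```python
import json
from typing import Any, List

# Made-up response body, assumed to be shaped like a find response.
response_body = """
{
  "data": {
    "documents": [
      {"_id": "a1", "category": "home_appliance"},
      {"_id": "b2", "category": "sports_equipment"},
      {"_id": "c3", "category": "home_appliance"},
      {"_id": "d4"}
    ],
    "nextPageState": null
  }
}
"""


def distinct_from_response(body: str, key: str) -> List[Any]:
    """Return the unique values of a top-level key, in first-seen order."""
    documents = json.loads(body)["data"]["documents"]
    unique: List[Any] = []
    for doc in documents:
        # Skip documents without the key; compare by equality so that
        # unhashable values such as nested objects are also supported.
        if key in doc and doc[key] not in unique:
            unique.append(doc[key])
    return unique


categories = distinct_from_response(response_body, "category")
ids = distinct_from_response(response_body, "_id")
print(categories)
print(ids)
```

This handles a single page of results only. If the response carries a non-null nextPageState, more documents remain, and each additional page would need to be requested and merged before the list of values is complete.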


© 2025 DataStax | Privacy policy | Terms of use
