Find distinct values

This Astra DB Serverless feature is currently in public preview. Development is ongoing, and the features and functionality are subject to change. Astra DB Serverless, and the use of such, is subject to the DataStax Preview Terms.

The Data API tables commands are available through HTTP and the clients.

If you use a client, tables commands are available only in client versions 2.0-preview or later. For more information, see Data API client upgrade guide.

Get a list of the distinct values in a certain column in a table.

A row represents a single record of data in a table in an Astra DB Serverless database.

You use the Table class to work with rows through the Data API clients. For instructions to get a Table object, see Work with tables.

For general information about working with rows, including common operations and operators, see Work with rows.

For more information about the Data API and clients, see Get started with the Data API.

distinct is a client-side operation, which effectively browses all required rows using the logic of the find command, and then collects the unique values found for the given key. There can be performance, latency, and billing implications if there are many matching rows.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the Client reference.

Use a specified filter condition to list distinct values for a column:

my_table.distinct("winner", filter={"match_id": "challenge6"})

Parameters:

Name Type Summary

key

str

The name of the field whose value is inspected across rows. Keys are typically just column names, although they can use dot notation to select particular entries in map columns. For set and list columns, individual entries are unrolled automatically. Particularly for lists, numeric indices can be used in the key dot-notation syntax. Examples of acceptable key values include "a_column", "map_column.map_key", and "list_column.2".

filter

dict

A dictionary expressing which condition the inspected rows must satisfy. You can use filter operators to compare columns with literal values. For example:

  • {"match_no": 123}: Uses the implied equality ($eq) operator. Shorthand for {"match_no": {"$eq": 123}}.

  • {"match_no": 123, "round": "C"}: Uses the implied equality operator and combines the two conditions with an implicit $and.

  • {}: An empty filter (all rows).

general_method_timeout_ms

int | None

A timeout, in milliseconds, to impose on the underlying API request. If not provided, the Table defaults apply. This parameter is aliased as timeout_ms for convenience.

request_timeout_ms

int | None

A timeout, in milliseconds, for each API request performed while scrolling through matches (see find). If not provided, this object’s defaults apply.

Example:

Full example script
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")

from astrapy.constants import SortMode
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
)

my_table = database.create_table(
    "games",
    definition=(
        CreateTableDefinition.builder()
        .add_column("match_id", ColumnType.TEXT)
        .add_column("round", ColumnType.TINYINT)
        .add_vector_column("m_vector", dimension=3)
        .add_column("score", ColumnType.INT)
        .add_column("when", ColumnType.TIMESTAMP)
        .add_column("winner", ColumnType.TEXT)
        .add_set_column("fighters", ColumnType.UUID)
        .add_partition_by(["match_id"])
        .add_partition_sort({"round": SortMode.ASCENDING})
        .build()
    ),
)

from astrapy.data_types import (
    DataAPISet,
    DataAPITimestamp,
    DataAPIVector,
)
from astrapy.ids import UUID

insert_result = my_table.insert_many(
    [
        {
            "match_id": "fight4",
            "round": 1,
            "winner": "Victor",
            "score": 18,
            "when": DataAPITimestamp.from_string(
                "2024-11-28T11:30:00Z",
            ),
            "fighters": DataAPISet([
                UUID("0193539a-2770-8c09-a32a-111111111111"),
                UUID('019353e3-00b4-83f9-a127-222222222222'),
            ]),
            "m_vector": DataAPIVector([0.4, -0.6, 0.2]),
        },
        {
            "match_id": "challenge6",
            "round": 1,
            "winner": "Donna",
            "m_vector": [0.9, -0.1, -0.3],
        },
        {"match_id": "challenge6", "round": 2, "winner": "Erick"},
        {"match_id": "challenge6", "round": 3, "winner": "Fiona"},
    ],
)

# distinct within a partition:
my_table.distinct("winner", filter={"match_id": "challenge6"})
# ['Donna', 'Erick', 'Fiona']

# distinct values across the whole table:
# (not recommended performance-wise)
my_table.distinct("winner")
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# ['Victor', 'Adam Zuul', 'Betta Vigo', 'Caio Gozer', 'Donna', 'Erick', ...

# Over a column containing null values
# (also with composite filter):
my_table.distinct(
    "score",
    filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [18, None]

# distinct over a set column (automatically "unrolled"):
my_table.distinct(
    "fighters",
    filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [UUID('0193539a-2770-8c09-a32a-111111111111'), UUID('019353e3-00b4-...
# distinct within a partition:
my_table.distinct("winner", filter={"match_id": "challenge6"})
# ['Donna', 'Erick', 'Fiona']

# distinct values across the whole table:
# (not recommended performance-wise)
my_table.distinct("winner")
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# ['Victor', 'Adam Zuul', 'Betta Vigo', 'Caio Gozer', 'Donna', 'Erick', ...

# Over a column containing null values
# (also with composite filter):
my_table.distinct(
    "score",
    filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [18, None]

# distinct over a set column (automatically "unrolled"):
my_table.distinct(
    "fighters",
    filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [UUID('0193539a-2770-8c09-a32a-111111111111'), UUID('019353e3-00b4-...

The TypeScript client doesn’t have a distinct operation for tables.

Use a filter condition to find rows and then list distinct values in a given column:

// Find with no options
Filter filter = Filters.eq("match_id", "challenge6");

// Show you the distinct match id in the table
List<String> winners = table
 .distinct("winner", filter, String.class);

// Adding an Options
TableDistinctOptions options = new TableDistinctOptions()
  .timeout(1000);
List<String> winners2 = table
 .distinct("winner", filter, String.class, options);

Parameters:

Name Type Summary

fieldName

String

The name of the field on which perform the distinct

filter

Filter

A filter expressing which condition the inspected rows must satisfy. You can use filter operators to compare columns with literal values. Filters can be instantiated with its constructor and specialized with method where(..) or leverage the class Filters.

columnClass

Class<?>

Represents the expected type of the column returned. For example, if the target column is text or ascii, then the returned value type is String.class.

options

TableDistinctOptions

A wrapper for the different options and specialization of this search.

Example:

package com.datastax.astra.client.tables;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIClients;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Filters;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableDistinctOptions;
import com.datastax.astra.client.tables.commands.options.TableFindOptions;
import com.datastax.astra.client.tables.cursor.TableCursor;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.List;

import static com.datastax.astra.client.core.query.Filters.eq;
import static com.datastax.astra.client.core.query.Projection.include;

public class Distinct {
 public static void main(String[] args) {
   Database db = new DataAPIClient("token").getDatabase("endpoint");

   Table<Row> table = db.getTable("games");

   // Show you all match id in the table
   List<Row> matches = table.find(null, new TableFindOptions()
      .projection(include("match_id"))).toList();
     matches.forEach(System.out::println);

     Filter filter = Filters.eq("match_id", "challenge6");

     TableDistinctOptions options = new TableDistinctOptions()
             .timeout(1000);
    // Show you the distinct match id in the table
     table.distinct("match_id", filter, String.class)
             .forEach(System.out::println);
 }
}

This operation has no literal equivalent in HTTP. Instead, you can use find, and then use projection and a utility like jq to extract specific values from the response.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2025 DataStax | Privacy policy | Terms of use | Manage Privacy Choices

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com