Find distinct values
This Astra DB Serverless feature is currently in public preview. Development is ongoing, and the features and functionality are subject to change. Astra DB Serverless, and the use of such, is subject to the DataStax Preview Terms. The Data API tables commands are available through HTTP and the clients. If you use a client, tables commands are available only in client versions 2.0-preview or later. For more information, see Data API client upgrade guide. |
Get a list of the distinct values in a certain column in a table.
A row represents a single record of data in a table in an Astra DB Serverless database.
You use the Table
class to work with rows through the Data API clients.
For instructions to get a Table
object, see Work with tables.
For general information about working with rows, including common operations and operators, see Work with rows.
For more information about the Data API and clients, see Get started with the Data API.
|
-
Python
-
TypeScript
-
Java
-
curl
For more information, see the Client reference.
Use a specified filter condition to list distinct values for a column:
my_table.distinct("winner", filter={"match_id": "challenge6"})
Parameters:
Name | Type | Summary |
---|---|---|
|
|
The name of the field whose value is inspected across rows.
Keys are typically just column names, although they can use dot notation to select particular entries in |
|
|
A dictionary expressing which condition the inspected rows must satisfy. You can use filter operators to compare columns with literal values. For example:
|
|
|
A timeout, in milliseconds, to impose on the underlying API request.
If not provided, the |
|
|
A timeout, in milliseconds, for each API request performed while scrolling through matches (see |
Example:
Full example script
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
from astrapy.constants import SortMode
from astrapy.info import (
CreateTableDefinition,
ColumnType,
)
my_table = database.create_table(
"games",
definition=(
CreateTableDefinition.builder()
.add_column("match_id", ColumnType.TEXT)
.add_column("round", ColumnType.TINYINT)
.add_vector_column("m_vector", dimension=3)
.add_column("score", ColumnType.INT)
.add_column("when", ColumnType.TIMESTAMP)
.add_column("winner", ColumnType.TEXT)
.add_set_column("fighters", ColumnType.UUID)
.add_partition_by(["match_id"])
.add_partition_sort({"round": SortMode.ASCENDING})
.build()
),
)
from astrapy.data_types import (
DataAPISet,
DataAPITimestamp,
DataAPIVector,
)
from astrapy.ids import UUID
insert_result = my_table.insert_many(
[
{
"match_id": "fight4",
"round": 1,
"winner": "Victor",
"score": 18,
"when": DataAPITimestamp.from_string(
"2024-11-28T11:30:00Z",
),
"fighters": DataAPISet([
UUID("0193539a-2770-8c09-a32a-111111111111"),
UUID('019353e3-00b4-83f9-a127-222222222222'),
]),
"m_vector": DataAPIVector([0.4, -0.6, 0.2]),
},
{
"match_id": "challenge6",
"round": 1,
"winner": "Donna",
"m_vector": [0.9, -0.1, -0.3],
},
{"match_id": "challenge6", "round": 2, "winner": "Erick"},
{"match_id": "challenge6", "round": 3, "winner": "Fiona"},
],
)
# distinct within a partition:
my_table.distinct("winner", filter={"match_id": "challenge6"})
# ['Donna', 'Erick', 'Fiona']
# distinct values across the whole table:
# (not recommended performance-wise)
my_table.distinct("winner")
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# ['Victor', 'Adam Zuul', 'Betta Vigo', 'Caio Gozer', 'Donna', 'Erick', ...
# Over a column containing null values
# (also with composite filter):
my_table.distinct(
"score",
filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [18, None]
# distinct over a set column (automatically "unrolled"):
my_table.distinct(
"fighters",
filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [UUID('0193539a-2770-8c09-a32a-111111111111'), UUID('019353e3-00b4-...
# distinct within a partition:
my_table.distinct("winner", filter={"match_id": "challenge6"})
# ['Donna', 'Erick', 'Fiona']
# distinct values across the whole table:
# (not recommended performance-wise)
my_table.distinct("winner")
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# ['Victor', 'Adam Zuul', 'Betta Vigo', 'Caio Gozer', 'Donna', 'Erick', ...
# Over a column containing null values
# (also with composite filter):
my_table.distinct(
"score",
filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [18, None]
# distinct over a set column (automatically "unrolled"):
my_table.distinct(
"fighters",
filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [UUID('0193539a-2770-8c09-a32a-111111111111'), UUID('019353e3-00b4-...
The TypeScript client doesn’t have a distinct
operation for tables.
Use a filter condition to find rows and then list distinct values in a given column:
// Find with no options
Filter filter = Filters.eq("match_id", "challenge6");
// Show you the distinct match id in the table
List<String> winners = table
.distinct("winner", filter, String.class);
// Adding an Options
TableDistinctOptions options = new TableDistinctOptions()
.timeout(1000);
List<String> winners2 = table
.distinct("winner", filter, String.class, options);
Parameters:
Name | Type | Summary |
---|---|---|
|
|
The name of the field on which perform the distinct |
|
A filter expressing which condition the inspected rows must satisfy.
You can use filter operators to compare columns with literal values.
Filters can be instantiated with its constructor and specialized with method |
|
|
|
Represents the expected type of the column returned.
For example, if the target column is |
|
A wrapper for the different options and specialization of this search. |
Example:
package com.datastax.astra.client.tables;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIClients;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Filters;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableDistinctOptions;
import com.datastax.astra.client.tables.commands.options.TableFindOptions;
import com.datastax.astra.client.tables.cursor.TableCursor;
import com.datastax.astra.client.tables.definition.rows.Row;
import java.util.List;
import static com.datastax.astra.client.core.query.Filters.eq;
import static com.datastax.astra.client.core.query.Projection.include;
public class Distinct {
public static void main(String[] args) {
Database db = new DataAPIClient("token").getDatabase("endpoint");
Table<Row> table = db.getTable("games");
// Show you all match id in the table
List<Row> matches = table.find(null, new TableFindOptions()
.projection(include("match_id"))).toList();
matches.forEach(System.out::println);
Filter filter = Filters.eq("match_id", "challenge6");
TableDistinctOptions options = new TableDistinctOptions()
.timeout(1000);
// Show you the distinct match id in the table
table.distinct("match_id", filter, String.class)
.forEach(System.out::println);
}
}
This operation has no literal equivalent in HTTP.
Instead, you can use find
, and then use projection
and a utility like jq
to extract specific values from the response.