Find rows reference
This Astra DB Serverless feature is currently in public preview. Development is ongoing, and the features and functionality are subject to change. Astra DB Serverless, and the use of such, is subject to the DataStax Preview Terms. The Data API tables commands are available through HTTP and the clients. If you use a client, tables commands are available only in client versions 2.0-preview or later. For more information, see Data API client upgrade guide. |
A row represents a single record of data in a table in an Astra DB Serverless database.
You use the Table
class to work with rows through the Data API clients.
For instructions to get a Table
object, see Work with tables.
For general information about working with rows, including common operations and operators, see Work with rows.
For more information about the Data API and clients, see Get started with the Data API.
Find a row
Retrieve a single row from a table using various filter and query options.
For best performance, filter and sort on indexed columns, partition keys, and clustering keys. Filtering on non-indexed columns can use allow filtering, which is inefficient and resource-intensive, especially for large datasets. With the Data API clients, allow filtering operations can hit the client timeout limit before the underlying HTTP operation is complete. An empty filter ( Additionally, the Data API can perform in-memory sorting, depending on the columns you sort on, the table’s partitioning structure, and whether the sorted columns are indexed. In-memory sorts can have performance implications. |
-
Python
-
TypeScript
-
Java
-
curl
For more information, see the Client reference.
Use find_one
to retrieve a single row matching your filter
and sort
criteria.
Retrieve a single row from a table that matches a given column and value:
my_table.find_one({"winner": "Caio Gozer"})
The previous example is shorthand for the $eq
(equals) operator.
There are other filter operators you can use, such as the $gte
(greater than or equal to) operator:
my_table.find_one({"score": {"$gte": 15}})
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort
clause with an indexed vector
column and a query vector.
If your table has multiple vector
columns, you can only sort
on one vector
column at a time.
You can provide a pre-generated query vector or, if the vector
column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
# Provide a pre-generated query vector
my_table.find_one({}, sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])})
# Generate a query vector with vectorize
my_table.find_one({}, sort={"m_vector": "Text to vectorize"})
Parameters:
Name | Type | Summary |
---|---|---|
|
|
A dictionary expressing which condition the returned row must satisfy. You can use filter operators to compare columns with literal values. For example:
You cannot filter on To perform a vector search, use |
|
|
This dictionary parameter controls the sorting order and, therefore, determines which row is returned if there are multiple matches.
The |
|
|
Select a subset of columns to include in the response for the returned row:
DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as For more information and examples, see Projection clauses. |
|
|
If true, the returned row includes a |
|
|
A timeout, in milliseconds, to impose on the underlying API request.
If not provided, the |
Returns:
dict | None
- A dictionary expressing the result if a row is found, otherwise None
.
Example response
The following example response is reformatted for clarity:
{
'match_id': 'challenge6',
'round': 1,
'fighters': DataAPISet([]),
'm_vector': DataAPIVector([0.9, -0.1, -0.3]),
'score': None,
'when': None,
'winner': 'Donna'
}
Example:
Full script
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
from astrapy.constants import SortMode
from astrapy.info import (
CreateTableDefinition,
ColumnType,
)
my_table = database.create_table(
"games",
definition=(
CreateTableDefinition.builder()
.add_column("match_id", ColumnType.TEXT)
.add_column("round", ColumnType.TINYINT)
.add_vector_column("m_vector", dimension=3)
.add_column("score", ColumnType.INT)
.add_column("when", ColumnType.TIMESTAMP)
.add_column("winner", ColumnType.TEXT)
.add_set_column("fighters", ColumnType.UUID)
.add_partition_by(["match_id"])
.add_partition_sort({"round": SortMode.ASCENDING})
.build()
),
)
from astrapy.constants import VectorMetric
from astrapy.info import TableIndexOptions, TableVectorIndexOptions
my_table.create_index(
"score_index",
column="score",
)
my_table.create_index(
"winner_index",
column="winner",
options=TableIndexOptions(
ascii=False,
normalize=True,
case_sensitive=False,
),
)
my_table.create_vector_index(
"m_vector_index",
column="m_vector",
options=TableVectorIndexOptions(
metric=VectorMetric.DOT_PRODUCT,
),
)
from astrapy.data_types import (
DataAPISet,
DataAPITimestamp,
DataAPIVector,
)
from astrapy.ids import UUID
insert_result = my_table.insert_many(
[
{
"match_id": "fight4",
"round": 1,
"winner": "Victor",
"score": 18,
"when": DataAPITimestamp.from_string(
"2024-11-28T11:30:00Z",
),
"fighters": DataAPISet([
UUID("0193539a-2770-8c09-a32a-111111111111"),
UUID('019353e3-00b4-83f9-a127-222222222222'),
]),
"m_vector": DataAPIVector([0.4, -0.6, 0.2]),
},
{
"match_id": "challenge6",
"round": 1,
"winner": "Donna",
"m_vector": [0.9, -0.1, -0.3],
},
{"match_id": "challenge6", "round": 2, "winner": "Erick"},
{"match_id": "challenge6", "round": 3, "winner": "Fiona"},
{"match_id": "fight5", "round": 3, "winner": "Caio Gozer"},
],
)
from astrapy.constants import SortMode
from astrapy.data_types import DataAPITimestamp, DataAPIVector
# Filter on the partitioning:
my_table.find_one({"match_id": "challenge6"})
# {'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...
# A find with no matches:
str(my_table.find_one({"match_id": "not_real"}))
# 'None'
# Optimize bandwidth using a projection:
my_table.find_one({"match_id": "challenge6"}, projection={"round": True, "winner": True})
# {'round': 1, 'winner': 'Donna'}
# Filter on primary key:
my_table.find_one({"match_id": "challenge6", "round": 1})
# {'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...
# Filter on a regular indexed column:
my_table.find_one({"winner": "Caio Gozer"})
# {'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Non-equality filter on a regular indexed column:
my_table.find_one({"score": {"$gte": 15}})
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Filter on a regular non-indexed column:
# (not recommended performance-wise)
my_table.find_one(
{"when": {
"$gte": DataAPITimestamp.from_string("1999-12-31T01:23:44Z")
}}
)
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Empty filter:
my_table.find_one({})
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Filter on the primary key and a regular non-indexed column:
# (not recommended performance-wise)
my_table.find_one(
{"match_id": "fight5", "round": 3, "winner": "Caio Gozer"}
)
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# {'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Filter on a regular non-indexed column (and incomplete primary key)
# (not recommended performance-wise)
my_table.find_one({"round": 3, "winner": "Caio Gozer"})
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# {'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find_one(
{},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
)
# {'winner': 'Donna'}
# Hybrid search with vector sort and non-vector filtering:
my_table.find_one(
{"match_id": "fight4"},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
)
# {'winner': 'Victor'}
# Return the numeric value of the vector similarity
# (also demonstrating that one can pass a plain list for a vector):
my_table.find_one(
{},
sort={"m_vector": [0.2, 0.3, 0.4]},
projection={"winner": True},
include_similarity=True,
)
# {'winner': 'Donna', '$similarity': 0.515}
# Non-vector sorting on a 'partitionSort' column:
my_table.find_one(
{"match_id": "fight5"},
sort={"round": SortMode.DESCENDING},
projection={"winner": True},
)
# {'winner': 'Caio Gozer'}
# Non-vector sorting on a regular column:
# (not recommended performance-wise)
my_table.find_one(
{"match_id": "fight5"},
sort={"winner": SortMode.ASCENDING},
projection={"winner": True},
)
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# {'winner': 'Adam Zuul'}
from astrapy.constants import SortMode
from astrapy.data_types import DataAPITimestamp, DataAPIVector
# Filter on the partitioning:
my_table.find_one({"match_id": "challenge6"})
# {'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...
# A find with no matches:
str(my_table.find_one({"match_id": "not_real"}))
# 'None'
# Optimize bandwidth using a projection:
my_table.find_one({"match_id": "challenge6"}, projection={"round": True, "winner": True})
# {'round': 1, 'winner': 'Donna'}
# Filter on primary key:
my_table.find_one({"match_id": "challenge6", "round": 1})
# {'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...
# Filter on a regular indexed column:
my_table.find_one({"winner": "Caio Gozer"})
# {'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Non-equality filter on a regular indexed column:
my_table.find_one({"score": {"$gte": 15}})
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Filter on a regular non-indexed column:
# (not recommended performance-wise)
my_table.find_one(
{"when": {
"$gte": DataAPITimestamp.from_string("1999-12-31T01:23:44Z")
}}
)
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Empty filter:
my_table.find_one({})
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Filter on the primary key and a regular non-indexed column:
# (not recommended performance-wise)
my_table.find_one(
{"match_id": "fight5", "round": 3, "winner": "Caio Gozer"}
)
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# {'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Filter on a regular non-indexed column (and incomplete primary key)
# (not recommended performance-wise)
my_table.find_one({"round": 3, "winner": "Caio Gozer"})
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# {'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find_one(
{},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
)
# {'winner': 'Donna'}
# Hybrid search with vector sort and non-vector filtering:
my_table.find_one(
{"match_id": "fight4"},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
)
# {'winner': 'Victor'}
# Return the numeric value of the vector similarity
# (also demonstrating that one can pass a plain list for a vector):
my_table.find_one(
{},
sort={"m_vector": [0.2, 0.3, 0.4]},
projection={"winner": True},
include_similarity=True,
)
# {'winner': 'Donna', '$similarity': 0.515}
# Non-vector sorting on a 'partitionSort' column:
my_table.find_one(
{"match_id": "fight5"},
sort={"round": SortMode.DESCENDING},
projection={"winner": True},
)
# {'winner': 'Caio Gozer'}
# Non-vector sorting on a regular column:
# (not recommended performance-wise)
my_table.find_one(
{"match_id": "fight5"},
sort={"winner": SortMode.ASCENDING},
projection={"winner": True},
)
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# {'winner': 'Adam Zuul'}
For more information, see the Client reference.
Use findOne
to retrieve a single row matching your filter
and sort
criteria.
Retrieve a single row from a table that matches a given column and value:
await table.findOne({ winner: "Gray Tist" });
The previous example is shorthand for the $eq
(equals) operator.
There are other filter operators you can use, such as the $gte
(greater than or equal to) operator:
await table.findOne({ winner: { $gte: 15 } });
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort
clause with an indexed vector
column and a query vector.
If your table has multiple vector
columns, you can only sort
on one vector
column at a time.
You can provide a pre-generated query vector or, if the vector
column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
// Provide a pre-generated query vector
await table.findOne({}, { sort: { mVector: vector([.2, .3, .4]) } });
// Generate a query vector with vectorize
await table.findOne({}, { sort: { mVector: 'text to vectorize' } });
Parameters:
Name | Type | Summary |
---|---|---|
|
|
An object that defines filter criteria using the Data API filter syntax. Use filter operators to compare columns with literal values. For more information and examples, see Data API operators and Find rows using filter clauses. You cannot filter on To perform a vector search, use |
|
|
The options for this operation |
Options (TableFindOneOptions
):
Name | Type | Summary |
---|---|---|
|
|
The |
|
|
Select a subset of columns to include in the response for the returned row:
If empty or unspecified, the default projection (all columns) is used. DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as Additionally, DataStax recommends providing your own type for the returned row because projections can break typing guarantees.
If your query includes For more information and examples, see Projection clauses. |
|
|
If true, the returned row includes a If your query includes |
|
|
The client-side timeout for this operation. |
Returns:
Promise<RSchema | null>
- A promise which resolves to the result if a row is found, otherwise null
.
Example response
{
match_id: "challenge6",
round: 1,
fighters: Set([]),
mVector: DataAPIVector([0.9, -0.1, -0.3]),
score: null,
when: null,
winner: "Donna",
}
Example:
Full script
import { CreateTableDefinition, DataAPIClient, InferTablePrimaryKey, InferTableSchema, timestamp, uuid, vector } from '@datastax/astra-db-ts';
// Instantiate the client and connect to the database
const client = new DataAPIClient();
const db = client.db(process.env.CLIENT_DB_URL!, { token: process.env.CLIENT_DB_TOKEN! });
// Create table schema using bespoke Data API table definition syntax, and then infer the type.
// For information about table typing and definitions, see the documentation for createTable.
const TableDefinition = <const>{
columns: {
matchId: 'text'
round: 'tinyint',
mVector: { type: 'vector', dimension: 3 },
score: 'int',
when: 'timestamp',
winner: 'text',
fighters: { type: 'set', valueType: 'uuid' },
},
primaryKey: {
partitionBy: ['matchId'],
partitionSort: { round: 1 },
},
} satisfies CreateTableDefinition;
type TableSchema = InferTableSchema<typeof TableDefinition>;
(async function () {
// Create a table with the given TableSchema type if a 'games' table doesn't already exist
const table = await db.createTable<TableSchema>('games', { definition: TableDefinition, ifNotExists: true });
// Insert some rows in an unordered fashion.
await table.insertMany([
{ matchId: 'fight4', round: 1, winner: 'Victor', score: 18, when: timestamp('2024-11-28T11:30:00Z'), fighters: new Set([UUID('0193539a-2770-8c09-a32a-111111111111'), UUID('019353e3-00b4-83f9-a127-222222222222')]), mVector: vector([0.4, -0.6, 0.2]) },
{ matchId: 'challenge6', round: 1, winner: 'Donna', mVector: vector([0.9, -0.1, -0.3]) },
{ matchId: 'challenge6', round: 2, winner: 'Erick' },
{ matchId: 'challenge6', round: 3, winner: 'Fiona' },
{ matchId: 'fight5', round: 3, winner: 'Caio Gozer' },
]);
// Create a secondary index on the 'score' column if it doesn't already exist
await table.createIndex('round_idx', 'round', { ifNotExists: true });
// Create a secondary index on the 'winner' column with case-insensitivity if it doesn't already exist
await table.createIndex('winner_idx', 'winner', {
options: {
caseSensitive: false,
},
ifNotExists: true,
});
// Create a vector index on the 'mVector' column if it doesn't already exist
await table.createVectorIndex('m_vector_idx', 'mVector', {
options: {
metric: 'dot_product',
},
});
// Use findOne and find to query rows in a table, including vector search.
// findOne examples
// Find a row with an exact match on one column.
// Inherently uses the equality ($eq) filter operator.
await table.findOne({ 'matchId': 'challenge6' }).then(console.log);
// If there is no match, the response is null.
await table.findOne({ 'matchId': 'not_real' }).then(console.log);
// Projections optimize bandwidth by returning specified columns instead of the entire row.
// Specify the exact return type to prevent accidental type errors.
// This example prints the values for 'round' and 'winner' in the matching row.
await table.findOne<Pick<TableSchema, 'round' | 'winner'>>({ 'matchId': 'challenge6', round: 1 }, {
projection: { round: 1, winner: 1 },
}).then(console.log);
// Find a row using other filter operators on other indexed columns.
await table.findOne({ score: { $gte: 15 } }).then(console.log);
// (Not recommended) You can filter on a non-indexed column, but it is long-running and inefficient.
// The Data API returns a warning, if you are listening for or logging warnings.
await table.findOne({ score: { $gt: timestamp() } });
// To get any row, pass an empty filter.
await table.findOne({}).then(console.log);
// Use sort to perform a vector search on a vector-indexed column.
await table.findOne<{ winner: string }>({}, {
sort: { mVector: vector([.2, .3, .4]) },
projection: { winner: 1 }
}).then(console.log);
// includeSimilarity returns the similarity score for the vector search.
// If you use a projection, you must include $similarity in the projection type.
await table.findOne<{ winner: string, $similarity: number }>({}, {
sort: { mVector: vector([.2, .3, .4]) },
projection: { winner: 1 },
includeSimilarity: true,
}).then(console.log);
// You can also use sort for regular ascending/descending sorts on a given column.
await table.findOne({ matchId: 'round5' }, {
sort: { round: -1 },
}).then(console.log);
// find examples
// Find rows with an exact match on one column, inherently using the equality ($eq) filter operator.
// Lazily iterate over results to print output like:
// - (R:1): winner Donna
// - (R:2): winner Erick
// - (R:3): winner Fiona
for await (const row of table.find({ matchId: 'challenge6' })) {
console.log(`(R:${row.round}): winner ${row.winner}`);
}
// Projections optimize bandwidth by returning specified columns instead of entire rows.
// Specify the exact return type to prevent accidental type errors.
type ProjectedSchema = Pick<TableSchema, 'round' | 'winner'>;
for await (const row of table.find({ matchId: 'challenge6' }).project<ProjectedSchema>({ round: 1, winner: 1 })) {
console.log(`(R:${row.round}): winner ${row.winner}`);
}
// Another example of the implied equality operator.
await table.find({ matchId: 'challenge6' }).toArray().then(console.log);
// Find rows using other filter operators on other indexed columns.
// You can filter on non-indexed columns, but this is long-running and inefficient.
await table.find({ score: { $gte: 15 } }).toArray().then(console.log);
// (Not recommended; long-running and inefficient) To get all rows, pass an empty filter.
await table.find({}).toArray().then(console.log);
// Use sort to perform a vector search on a vector-indexed column.
await table.find({})
.sort({ mVector: vector([.2, .3, .4]) })
.project<{ winner: number }>({ winner: 1 })
.toArray()
.then(console.log);
// Use sort and filter together for a hybrid search.
// This example also use includeSimilarity to return the similarity scores of the vector search.
// If you use a projection, you must include $similarity in the projection type.
await table.find({ matchId: 'fight4' })
.sort({ mVector: vector([.2, .3, .4]) })
.includeSimilarity(true)
.project<{ winner: number, $similarity: number }>({ winner: 1 })
.toArray()
.then(console.log);
// Use sort for regular ascending/descending sorts on a given column.
await table.find({ matchId: 'round5' })
.sort({ round: -1 })
.toArray()
.then(console.log);
// You can also use mapping.
await table.find({ matchId: 'fight5' })
.sort({ round: -1 })
.limit(5)
.map(row => row.winner.toUpperCase())
.toArray()
.then(console.log);
// Uncomment the following line to drop the table and any related indexes.
// await table.drop();
})();
// Find a row with an exact match on one column.
// Inherently uses the equality ($eq) filter operator.
await table.findOne({ 'matchId': 'challenge6' }).then(console.log);
// If there is no match, the response is null.
await table.findOne({ 'matchId': 'not_real' }).then(console.log);
// Projections optimize bandwidth by returning specified columns instead of the entire row.
// Specify the exact return type to prevent accidental type errors.
// This example prints the values for 'round' and 'winner' in the matching row.
await table.findOne<Pick<TableSchema, 'round' | 'winner'>>({ 'matchId': 'challenge6', round: 1 }, {
projection: { round: 1, winner: 1 },
}).then(console.log);
// Find a row using other filter operators on other indexed columns.
await table.findOne({ score: { $gte: 15 } }).then(console.log);
// (Not recommended) You can filter on a non-indexed column, but it is long-running and inefficient.
// The Data API returns a warning, if you are listening for or logging warnings.
await table.findOne({ score: { $gt: timestamp() } });
// To get any row, pass an empty filter.
await table.findOne({}).then(console.log);
// Use sort to perform a vector search on a vector-indexed column.
await table.findOne<{ winner: string }>({}, {
sort: { mVector: vector([.2, .3, .4]) },
projection: { winner: 1 }
}).then(console.log);
// includeSimilarity returns the similarity score for the vector search.
// If you use a projection, you must include $similarity in the projection type.
await table.findOne<{ winner: string, $similarity: number }>({}, {
sort: { mVector: vector([.2, .3, .4]) },
projection: { winner: 1 },
includeSimilarity: true,
}).then(console.log);
// You can also use sort for regular ascending/descending sorts on a given column.
await table.findOne({ matchId: 'round5' }, {
sort: { round: -1 },
}).then(console.log);
For more information, see the Client reference.
Use findOne
to retrieve a single row matching your filter
and sort
criteria.
Find one row based on an equality predicate on a column (shorthand for the $eq
operator):
// Filter can be built with a where clause
Filter f1 = new Filter()
.where("winner")
.isEqualsTo("Caio Gozer");
// Class Filters provides a shorthand for the above
Optional<Row> row = myTable
.findOne(Filters.eq("winner", "Caio Gozer"));
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort
clause with an indexed vector
column and a query vector.
If your table has multiple vector
columns, you can only sort
on one vector
column at a time.
You can provide a pre-generated query vector or, if the vector
column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
// Given a vector
DataAPIVector vector =
new DataAPIVector(new float[] {0.2f, 0.3f, 0.4f});
// Perform a vector search
myTable.findOne(new TableFindOneOptions()
.sort(Sort.vector("m_vector", vector)));
There are other filter operators you can use, such as the $gte
(greater than or equal to) operator, as well as other findOne
options:
// Import to shorten the code
import static com.datastax.astra.client.core.query.Filters.*;
import static com.datastax.astra.client.core.query.Projection.include;
// Filters can combine multiple conditions
Filter filter = and(
eq("match_id", "match_0"),
gt("round", 1),
eq("winner", "Victor"));
// Some options can be added to the findOne
TableFindOneOptions options = new TableFindOneOptions()
// Include only these columns in the response
.projection(include("match_id", "winner", "winner"))
// Perform a vector similarity search
.sort(Sort.vector("m_vector", new float[] {0.4f, -0.6f, 0.2f}))
// Return the similarity score of the vector search
.includeSimilarity(true);
Optional<Row> row = myTable
.findOne(filter, options);
You can use dynamic object mapping instead of the default
You can use the
|
Parameters:
Name | Type | Summary |
---|---|---|
|
A filter expressing which condition the returned row must satisfy.
You can use filter operators to compare columns with literal values.
Filters can be instantiated with its constructor and specialized with method You cannot filter on To perform a vector search, use |
|
|
A wrapper for the different options and specialization of this search. |
|
|
|
This parameter acts a formal specifier for the type checker.
If omitted, the resulting cursor is implicitly an |
Name | Type | Summary |
---|---|---|
|
This parameter controls the sorting order and, therefore, determines which row is returned if there are multiple matches.
The |
|
|
Select a subset of columns to include in the response for the returned row:
If empty or unspecified, the default projection (return all columns) is used. DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as For more information and examples, see Projection clauses. |
|
|
|
If true, the returned row includes a |
|
|
A timeout, in milliseconds (long), to impose on the underlying API request. If not provided, the Table defaults apply. |
Returns:
Optional<R>
- An optional object of type R
that represents the result, if a row is found, otherwise it is Optional.empty()
.
If rowClass
is not specified, then the type R
is the same type as that of the rows in the table.
Example:
package com.datastax.astra.client.tables;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIClients;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Projection;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableFindOneOptions;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.mapping.Column;
import lombok.Data;
import java.util.Optional;
import static com.datastax.astra.client.core.query.Filters.and;
import static com.datastax.astra.client.core.query.Filters.eq;
import static com.datastax.astra.client.core.query.Filters.gt;
public class FindOne {
public static void main(String[] args) {
Database db = new DataAPIClient("token").getDatabase("endpoint");
Table<Row> tableRow = db.getTable("games");
Filter filter = and(
eq("match_id", "mtch_0"),
gt("round", 1),
eq("winner", "Victor"));
TableFindOneOptions options = new TableFindOneOptions()
.sort(Sort.vector("m_vector", new DataAPIVector(new float[] {0.4f, -0.6f, 0.2f})))
.includeSimilarity(true);
// Find a row
Optional<Row> row = tableRow.findOne(filter, options);
row.ifPresent(r -> {
System.out.println("Row: " + r);
DataAPIVector v = r.getVector("m_vector");
System.out.println(r.getInstant("when"));
});
// Find a game by match_id
Table<Game> tableGame = db.getTable("games", Game.class);
Optional<Game> row2 = tableGame.findOne(filter, options);
row2.ifPresent(game -> {
System.out.println("game: " + game.getVector());
System.out.println(game.getFighters());
System.out.println(game.getMatchId());
});
// Returning another type because you project
TableFindOneOptions optionsProjection = new TableFindOneOptions()
.projection(Projection.include("match_id", "winner", "field3"))
.sort(Sort.vector("m_vector", new DataAPIVector(new float[] {0.4f, -0.6f, 0.2f})))
.includeSimilarity(true);
Optional<MiniGame> mini = tableRow.findOne(filter, optionsProjection, MiniGame.class);
mini.ifPresent(m -> {
System.out.println("MiniGame: " + m.getWinner());
System.out.println("MiniGame: " + m.getMatchId());
});
Optional<MiniGame> mini2 = tableGame.findOne(filter, optionsProjection, MiniGame.class);
mini.ifPresent(m -> {
System.out.println("MiniGame 2: " + m.getWinner());
System.out.println("MiniGame 2: " + m.getMatchId());
});
}
@Data
public static class MiniGame {
@Column(name = "match_id")
private String matchId;
private String winner;
}
}
Use findOne
to retrieve a single row matching your filter
and sort
criteria.
Retrieve a single row from a table that matches a given column and value:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_TABLE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"findOne": {
"filter": {
"COLUMN_NAME": "FILTER_VALUE"
}
}
}' | jq
The previous example is shorthand for the $eq
(equals) operator.
There are other filter operators you can use, such as the $gt
(greater than) operator:
"findOne": {
"filter": { "year": { "$gt": 2000 } }
}
For more filter operator examples, see Find rows using filter clauses.
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort
clause with an indexed vector
column and a query vector.
If your table has multiple vector
columns, you can only sort
on one vector
column at a time.
You can provide a pre-generated query vector or, if the vector
column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
# Provide the query vector as an array
"findOne": {
"sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] }
}
# Provide the query vector with $binary
"findOne": {
"sort": { "vect_emb": { "$binary": "PczMzb5MzM0+mZma" } }
}
# Generate a query vector with vectorize
"findOne": {
"sort": { "vect_emb": "Text to vectorize" }
}
# Perform a vector search and include the similarity score in the response
"findOne": {
"sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] },
"options": {
"includeSimilarity": true
}
}
To perform a hybrid search, use both sort
and filter
:
"findOne": {
"sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] },
"filter": { "year": { "$gt": 2000 } }
}
Parameters:
Name | Type | Summary |
---|---|---|
|
|
The Data API command to retrieve a row in a table based on one or more of |
|
|
An object that defines filter criteria using the Data API filter syntax. Use filter operators to compare columns with literal values. For more information and examples, see the preceding syntax examples, Data API operators, and Find rows using filter clauses. You cannot filter on To perform a vector search, use |
|
|
Perform a vector similarity search or set the sequence of matches. For more information and examples, see Sort clauses and Vector type. |
|
|
Select a subset of columns to include in the response for the returned row:
If empty or unset, the default projection is used.
The default projection includes all columns, but it omits Over HTTP, the response always omits DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as For more information and examples, see Projection clauses. |
|
|
If true, the response includes a |
Returns:
A successful response includes a document
object that describes the matching row.
The contents of document
depends on projection
.
The response always omits columns with null
values, regardless of projection
.
Example response
"data": {
"document": {
"name": "Tal Noor",
"email": "tal@example.com",
"graduated": true,
"graduation_year": 2014
}
}
If your query used a non-indexed column, the response can include a warning that the query used allow filtering, which is inefficient and potentially resource-intensive.
Example:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace/students" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"findOne": {
"filter": { "email": "tal@example.com" }
}
}' | jq
Find rows using filter clauses
Find multiple rows that match a query.
For best performance, filter and sort on indexed columns, partition keys, and clustering keys. Filtering on non-indexed columns can use allow filtering, which is inefficient and resource-intensive, especially for large datasets. With the Data API clients, allow filtering operations can hit the client timeout limit before the underlying HTTP operation is complete. An empty filter ( Additionally, the Data API can perform in-memory sorting, depending on the columns you sort on, the table’s partitioning structure, and whether the sorted columns are indexed. In-memory sorts can have performance implications. |
-
Python
-
TypeScript
-
Java
-
curl
For more information, see the Client reference.
Use find
to retrieve rows that match your filter
and sort
criteria.
Run a find with a non-vector filter condition:
my_table.find({"match_id": "challenge6"})
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort
clause with an indexed vector
column and a query vector.
If your table has multiple vector
columns, you can only sort
on one vector
column at a time.
You can provide a pre-generated query vector or, if the vector
column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
Run a vector search and get an iterable over the returned results:
my_table.find(
{},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
limit=3,
)
Run hybrid search with a filter and vector-similarity sorting, apply a projection to the returned results, and then materialize the matches into a list:
my_table.find(
{"match_id": "fight4"},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
).to_list()
Parameters:
Name | Type | Summary |
---|---|---|
|
|
A dictionary expressing which condition the returned row must satisfy. You can use filter operators to compare columns with literal values. For example:
You cannot filter on To perform a vector search, use |
|
|
This dictionary parameter controls the sorting order and, therefore, determines which row is returned if there are multiple matches.
The |
|
|
Select a subset of columns to include in the response for the returned row:
DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as For more information and examples, see Projection clauses. |
|
|
This parameter acts a formal specifier for the type checker.
If omitted, the resulting cursor is implicitly a |
|
|
Optionally specify a number of rows to bypass (skip) before returning rows.
The first This parameter is only valid with |
|
|
Limit the total number of rows returned from the table.
The returned cursor stops yielding rows either when it reaches the |
|
|
If true, the returned rows include a |
|
|
If true, you can call the You can’t use |
|
|
A timeout, in milliseconds, to impose on each individual HTTP request to the Data API to accomplish the operation. If not provided, the Table defaults apply. This parameter is aliased as |
Returns:
TableFindCursor
- An object of type astrapy.cursors.TableFindCursor
, representing the stream of results.
You can manipulate a TableFindCursor
in various ways.
Typically, it is iterated over and yields the search results, managing pagination discretely as needed.
For more information, see FindCursor.
Invoking .to_list()
on a TableFindCursor causes it to consume all rows, and then materialize the entire set of results as the returned list.
This is not recommended for find
operations that can return a large number of results.
While you iterate over a cursor, rows are retrieved in chunks progressively. It is possible for retrieved chunks to reflect real-time changes (inserts, updates, and deletions) on the table. |
Example response
TableFindCursor("games", idle, consumed so far: 0)
Example:
Full script
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
from astrapy.constants import SortMode
from astrapy.info import (
CreateTableDefinition,
ColumnType,
)
my_table = database.create_table(
"games",
definition=(
CreateTableDefinition.builder()
.add_column("match_id", ColumnType.TEXT)
.add_column("round", ColumnType.TINYINT)
.add_vector_column("m_vector", dimension=3)
.add_column("score", ColumnType.INT)
.add_column("when", ColumnType.TIMESTAMP)
.add_column("winner", ColumnType.TEXT)
.add_set_column("fighters", ColumnType.UUID)
.add_partition_by(["match_id"])
.add_partition_sort({"round": SortMode.ASCENDING})
.build()
),
)
from astrapy.constants import VectorMetric
from astrapy.info import TableIndexOptions, TableVectorIndexOptions
my_table.create_index(
"score_index",
column="score",
)
my_table.create_index(
"winner_index",
column="winner",
options=TableIndexOptions(
ascii=False,
normalize=True,
case_sensitive=False,
),
)
my_table.create_vector_index(
"m_vector_index",
column="m_vector",
options=TableVectorIndexOptions(
metric=VectorMetric.DOT_PRODUCT,
),
)
from astrapy.data_types import (
DataAPISet,
DataAPITimestamp,
DataAPIVector,
)
from astrapy.ids import UUID
insert_result = my_table.insert_many(
[
{
"match_id": "fight4",
"round": 1,
"winner": "Victor",
"score": 18,
"when": DataAPITimestamp.from_string(
"2024-11-28T11:30:00Z",
),
"fighters": DataAPISet([
UUID("0193539a-2770-8c09-a32a-111111111111"),
UUID('019353e3-00b4-83f9-a127-222222222222'),
]),
"m_vector": DataAPIVector([0.4, -0.6, 0.2]),
},
{
"match_id": "challenge6",
"round": 1,
"winner": "Donna",
"m_vector": [0.9, -0.1, -0.3],
},
{"match_id": "challenge6", "round": 2, "winner": "Erick"},
{"match_id": "challenge6", "round": 3, "winner": "Fiona"},
{"match_id": "fight5", "round": 3, "winner": "Caio Gozer"},
],
)
from astrapy.constants import SortMode
from astrapy.data_types import DataAPIVector
# Iterate over results:
for row in my_table.find({"match_id": "challenge6"}):
print(f"(R:{row['round']}): winner {row['winner']}")
# will print:
# (R:1): winner Donna
# (R:2): winner Erick
# (R:3): winner Fiona
# Optimize bandwidth using a projection:
proj = {"round": True, "winner": True}
for row in my_table.find({"match_id": "challenge6"}, projection=proj):
print(f"(R:{row['round']}): winner {row['winner']}")
# will print:
# (R:1): winner Donna
# (R:2): winner Erick
# (R:3): winner Fiona
# Filter on the partition key:
my_table.find({"match_id": "challenge6"}).to_list()
# [{'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...
# Filter on primary key:
my_table.find({"match_id": "challenge6", "round": 1}).to_list()
# [{'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...
# Filter on a regular indexed column:
my_table.find({"winner": "Caio Gozer"}).to_list()
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Non-equality filter on a regular indexed column:
my_table.find({"score": {"$gte": 15}}).to_list()
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Filter on a regular non-indexed column:
# (not recommended performance-wise)
my_table.find(
{"when": {
"$gte": DataAPITimestamp.from_string("1999-12-31T01:23:44Z")
}}
).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Empty filter (not recommended performance-wise):
my_table.find({}).to_list()
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Filter on the primary key and a regular non-indexed column:
# (not recommended performance-wise)
my_table.find(
{"match_id": "fight5", "round": 3, "winner": "Caio Gozer"}
).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Filter on a regular non-indexed column (and incomplete primary key)
# (not recommended performance-wise)
my_table.find({"round": 3, "winner": "Caio Gozer"}).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find(
{},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
limit=3,
).to_list()
# [{'winner': 'Donna'}, {'winner': 'Victor'}]
# Hybrid search with vector sort and non-vector filtering:
my_table.find(
{"match_id": "fight4"},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
limit=3,
).to_list()
# [{'winner': 'Victor'}]
# Return the numeric value of the vector similarity
# (also demonstrating that one can pass a plain list for a vector):
my_table.find(
{},
sort={"m_vector": [0.2, 0.3, 0.4]},
projection={"winner": True},
limit=3,
include_similarity=True,
).to_list()
# [{'winner': 'Donna', '$similarity': 0.515}, {'winner': 'Victor', ...
# Non-vector sorting on a 'partitionSort' column:
my_table.find(
{"match_id": "fight5"},
sort={"round": SortMode.DESCENDING},
projection={"winner": True},
).to_list()
# [{'winner': 'Caio Gozer'}, {'winner': 'Betta Vigo'}, ...
# Using skip
and limit
:
my_table.find(
{"match_id": "fight5"},
sort={"round": SortMode.DESCENDING},
projection={"winner": True},
skip=1,
limit=2,
).to_list()
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# [{'winner': 'Betta Vigo'}, {'winner': 'Adam Zuul'}]
# Non-vector sorting on a regular column:
# (not recommended performance-wise)
my_table.find(
{"match_id": "fight5"},
sort={"winner": SortMode.ASCENDING},
projection={"winner": True},
).to_list()
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# [{'winner': 'Adam Zuul'}, {'winner': 'Betta Vigo'}, ...
# Using .map()
on a cursor:
winner_cursor = my_table.find(
{"match_id": "fight5"},
sort={"round": SortMode.DESCENDING},
projection={"winner": True},
limit=5,
)
print("/".join(winner_cursor.map(lambda row: row["winner"].upper())))
# CAIO GOZER/BETTA VIGO/ADAM ZUUL
# Some other examples of cursor manipulation
matches_cursor = my_table.find(
sort={"m_vector": DataAPIVector([-0.1, 0.15, 0.3])}
)
matches_cursor.has_next()
# True
next(matches_cursor)
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
matches_cursor.consumed
# 1
matches_cursor.rewind()
matches_cursor.consumed
# 0
matches_cursor.has_next()
# True
matches_cursor.close()
try:
next(matches_cursor)
except StopIteration:
print("StopIteration triggered.")
# StopIteration triggered.
from astrapy.constants import SortMode
from astrapy.data_types import DataAPIVector
# Iterate over results:
for row in my_table.find({"match_id": "challenge6"}):
print(f"(R:{row['round']}): winner {row['winner']}")
# will print:
# (R:1): winner Donna
# (R:2): winner Erick
# (R:3): winner Fiona
# Optimize bandwidth using a projection:
proj = {"round": True, "winner": True}
for row in my_table.find({"match_id": "challenge6"}, projection=proj):
print(f"(R:{row['round']}): winner {row['winner']}")
# will print:
# (R:1): winner Donna
# (R:2): winner Erick
# (R:3): winner Fiona
# Filter on the partition key:
my_table.find({"match_id": "challenge6"}).to_list()
# [{'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...
# Filter on primary key:
my_table.find({"match_id": "challenge6", "round": 1}).to_list()
# [{'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...
# Filter on a regular indexed column:
my_table.find({"winner": "Caio Gozer"}).to_list()
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Non-equality filter on a regular indexed column:
my_table.find({"score": {"$gte": 15}}).to_list()
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Filter on a regular non-indexed column:
# (not recommended performance-wise)
my_table.find(
{"when": {
"$gte": DataAPITimestamp.from_string("1999-12-31T01:23:44Z")
}}
).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Empty filter (not recommended performance-wise):
my_table.find({}).to_list()
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Filter on the primary key and a regular non-indexed column:
# (not recommended performance-wise)
my_table.find(
{"match_id": "fight5", "round": 3, "winner": "Caio Gozer"}
).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Filter on a regular non-indexed column (and incomplete primary key)
# (not recommended performance-wise)
my_table.find({"round": 3, "winner": "Caio Gozer"}).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find(
{},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
limit=3,
).to_list()
# [{'winner': 'Donna'}, {'winner': 'Victor'}]
# Hybrid search with vector sort and non-vector filtering:
my_table.find(
{"match_id": "fight4"},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
limit=3,
).to_list()
# [{'winner': 'Victor'}]
# Return the numeric value of the vector similarity
# (also demonstrating that one can pass a plain list for a vector):
my_table.find(
{},
sort={"m_vector": [0.2, 0.3, 0.4]},
projection={"winner": True},
limit=3,
include_similarity=True,
).to_list()
# [{'winner': 'Donna', '$similarity': 0.515}, {'winner': 'Victor', ...
# Non-vector sorting on a 'partitionSort' column:
my_table.find(
{"match_id": "fight5"},
sort={"round": SortMode.DESCENDING},
projection={"winner": True},
).to_list()
# [{'winner': 'Caio Gozer'}, {'winner': 'Betta Vigo'}, ...
# Using skip
and limit
:
my_table.find(
{"match_id": "fight5"},
sort={"round": SortMode.DESCENDING},
projection={"winner": True},
skip=1,
limit=2,
).to_list()
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# [{'winner': 'Betta Vigo'}, {'winner': 'Adam Zuul'}]
# Non-vector sorting on a regular column:
# (not recommended performance-wise)
my_table.find(
{"match_id": "fight5"},
sort={"winner": SortMode.ASCENDING},
projection={"winner": True},
).to_list()
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# [{'winner': 'Adam Zuul'}, {'winner': 'Betta Vigo'}, ...
# Using .map()
on a cursor:
winner_cursor = my_table.find(
{"match_id": "fight5"},
sort={"round": SortMode.DESCENDING},
projection={"winner": True},
limit=5,
)
print("/".join(winner_cursor.map(lambda row: row["winner"].upper())))
# CAIO GOZER/BETTA VIGO/ADAM ZUUL
# Some other examples of cursor manipulation
matches_cursor = my_table.find(
sort={"m_vector": DataAPIVector([-0.1, 0.15, 0.3])}
)
matches_cursor.has_next()
# True
next(matches_cursor)
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
matches_cursor.consumed
# 1
matches_cursor.rewind()
matches_cursor.consumed
# 0
matches_cursor.has_next()
# True
matches_cursor.close()
try:
next(matches_cursor)
except StopIteration:
print("StopIteration triggered.")
# StopIteration triggered.
For more information, see the Client reference.
Use find
to retrieve multiple rows matching your filter
and sort
criteria.
Find all rows matching a basic equality filter:
await table.find({ matchId: 'challenge6' });
The previous example is shorthand for the $eq
(equals) operator.
There are other filter operators you can use, such as the $gte
(greater than or equal to) operator:
await table.find({ winner: { $gte: 15 } });
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort
clause with an indexed vector
column and a query vector.
If your table has multiple vector
columns, you can only sort
on one vector
column at a time.
You can provide a pre-generated query vector or, if the vector
column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
// Provide a pre-generated query vector
await table.find({}, { sort: { mVector: vector([.2, .3, .4]) } });
// Generate a query vector with vectorize
await table.find({}, { sort: { mVector: 'text to vectorize' } });
To run a hybrid search, combine a filter with a vector search (this example also uses a projection):
table.find<{ winner: string }>({ matchId: 'fight4' }, {
sort: { mVector: vector([.2, .3, .4]) },
projection: { winner: 1 },
});
Parameters:
Name | Type | Summary |
---|---|---|
|
|
An object that defines filter criteria using the Data API filter syntax. For more information and examples, see Data API operators and Find rows using filter clauses. You cannot filter on To perform a vector search, use |
|
|
The options for this operation |
Options (TableFindOptions
):
Name | Type | Summary |
---|---|---|
|
|
The |
|
|
Select a subset of columns to include in the response for the returned rows:
If empty or unspecified, the default projection (all columns) is used. DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as Additionally, DataStax recommends providing your own type for the returned rows because projections can break typing guarantees.
If your query includes For more information and examples, see Projection clauses. |
|
|
Optionally specify a number of rows to bypass (skip) before returning rows.
The first This parameter is only valid with |
|
|
Limit the total number of rows returned from the table.
The returned cursor stops yielding rows either when it reaches the |
|
|
If true, the returned rows include a If your query includes |
|
|
If true, you can call the You can’t use |
|
|
The client-side timeout for this operation. |
Returns:
TableFindCursor<RSchema>
- The FindCursor
, which you can configure and iterate over lazily.
Example:
Full script
import { CreateTableDefinition, DataAPIClient, InferTablePrimaryKey, InferTableSchema, timestamp, uuid, vector } from '@datastax/astra-db-ts';
// Instantiate the client and connect to the database
const client = new DataAPIClient();
const db = client.db(process.env.CLIENT_DB_URL!, { token: process.env.CLIENT_DB_TOKEN! });
// Create table schema using bespoke Data API table definition syntax, and then infer the type.
// For information about table typing and definitions, see the documentation for createTable.
const TableDefinition = <const>{
columns: {
matchId: 'text'
round: 'tinyint',
mVector: { type: 'vector', dimension: 3 },
score: 'int',
when: 'timestamp',
winner: 'text',
fighters: { type: 'set', valueType: 'uuid' },
},
primaryKey: {
partitionBy: ['matchId'],
partitionSort: { round: 1 },
},
} satisfies CreateTableDefinition;
type TableSchema = InferTableSchema<typeof TableDefinition>;
(async function () {
// Create a table with the given TableSchema type if a 'games' table doesn't already exist
const table = await db.createTable<TableSchema>('games', { definition: TableDefinition, ifNotExists: true });
// Insert some rows in an unordered fashion.
await table.insertMany([
{ matchId: 'fight4', round: 1, winner: 'Victor', score: 18, when: timestamp('2024-11-28T11:30:00Z'), fighters: new Set([UUID('0193539a-2770-8c09-a32a-111111111111'), UUID('019353e3-00b4-83f9-a127-222222222222')]), mVector: vector([0.4, -0.6, 0.2]) },
{ matchId: 'challenge6', round: 1, winner: 'Donna', mVector: vector([0.9, -0.1, -0.3]) },
{ matchId: 'challenge6', round: 2, winner: 'Erick' },
{ matchId: 'challenge6', round: 3, winner: 'Fiona' },
{ matchId: 'fight5', round: 3, winner: 'Caio Gozer' },
]);
// Create a secondary index on the 'score' column if it doesn't already exist
await table.createIndex('round_idx', 'round', { ifNotExists: true });
// Create a secondary index on the 'winner' column with case-insensitivity if it doesn't already exist
await table.createIndex('winner_idx', 'winner', {
options: {
caseSensitive: false,
},
ifNotExists: true,
});
// Create a vector index on the 'mVector' column if it doesn't already exist
await table.createVectorIndex('m_vector_idx', 'mVector', {
options: {
metric: 'dot_product',
},
});
// Use findOne and find to query rows in a table, including vector search.
// findOne examples
// Find a row with an exact match on one column.
// Inherently uses the equality ($eq) filter operator.
await table.findOne({ 'matchId': 'challenge6' }).then(console.log);
// If there is no match, the response is null.
await table.findOne({ 'matchId': 'not_real' }).then(console.log);
// Projections optimize bandwidth by returning specified columns instead of the entire row.
// Specify the exact return type to prevent accidental type errors.
// This example prints the values for 'round' and 'winner' in the matching row.
await table.findOne<Pick<TableSchema, 'round' | 'winner'>>({ 'matchId': 'challenge6', round: 1 }, {
projection: { round: 1, winner: 1 },
}).then(console.log);
// Find a row using other filter operators on other indexed columns.
await table.findOne({ score: { $gte: 15 } }).then(console.log);
// (Not recommended) You can filter on a non-indexed column, but it is long-running and inefficient.
// The Data API returns a warning, if you are listening for or logging warnings.
await table.findOne({ score: { $gt: timestamp() } });
// To get any row, pass an empty filter.
await table.findOne({}).then(console.log);
// Use sort to perform a vector search on a vector-indexed column.
await table.findOne<{ winner: string }>({}, {
sort: { mVector: vector([.2, .3, .4]) },
projection: { winner: 1 }
}).then(console.log);
// includeSimilarity returns the similarity score for the vector search.
// If you use a projection, you must include $similarity in the projection type.
await table.findOne<{ winner: string, $similarity: number }>({}, {
sort: { mVector: vector([.2, .3, .4]) },
projection: { winner: 1 },
includeSimilarity: true,
}).then(console.log);
// You can also use sort for regular ascending/descending sorts on a given column.
await table.findOne({ matchId: 'round5' }, {
sort: { round: -1 },
}).then(console.log);
// find examples
// Find rows with an exact match on one column, inherently using the equality ($eq) filter operator.
// Lazily iterate over results to print output like:
// - (R:1): winner Donna
// - (R:2): winner Erick
// - (R:3): winner Fiona
for await (const row of table.find({ matchId: 'challenge6' })) {
console.log(`(R:${row.round}): winner ${row.winner}`);
}
// Projections optimize bandwidth by returning specified columns instead of entire rows.
// Specify the exact return type to prevent accidental type errors.
type ProjectedSchema = Pick<TableSchema, 'round' | 'winner'>;
for await (const row of table.find({ matchId: 'challenge6' }).project<ProjectedSchema>({ round: 1, winner: 1 })) {
console.log(`(R:${row.round}): winner ${row.winner}`);
}
// Another example of the implied equality operator.
await table.find({ matchId: 'challenge6' }).toArray().then(console.log);
// Find rows using other filter operators on other indexed columns.
// You can filter on non-indexed columns, but this is long-running and inefficient.
await table.find({ score: { $gte: 15 } }).toArray().then(console.log);
// (Not recommended; long-running and inefficient) To get all rows, pass an empty filter.
await table.find({}).toArray().then(console.log);
// Use sort to perform a vector search on a vector-indexed column.
await table.find({})
.sort({ mVector: vector([.2, .3, .4]) })
.project<{ winner: number }>({ winner: 1 })
.toArray()
.then(console.log);
// Use sort and filter together for a hybrid search.
// This example also use includeSimilarity to return the similarity scores of the vector search.
// If you use a projection, you must include $similarity in the projection type.
await table.find({ matchId: 'fight4' })
.sort({ mVector: vector([.2, .3, .4]) })
.includeSimilarity(true)
.project<{ winner: number, $similarity: number }>({ winner: 1 })
.toArray()
.then(console.log);
// Use sort for regular ascending/descending sorts on a given column.
await table.find({ matchId: 'round5' })
.sort({ round: -1 })
.toArray()
.then(console.log);
// You can also use mapping.
await table.find({ matchId: 'fight5' })
.sort({ round: -1 })
.limit(5)
.map(row => row.winner.toUpperCase())
.toArray()
.then(console.log);
// Uncomment the following line to drop the table and any related indexes.
// await table.drop();
})();
// Find rows with an exact match on one column, inherently using the equality ($eq) filter operator.
// Lazily iterate over results to print output like:
// - (R:1): winner Donna
// - (R:2): winner Erick
// - (R:3): winner Fiona
for await (const row of table.find({ matchId: 'challenge6' })) {
console.log(`(R:${row.round}): winner ${row.winner}`);
}
// Projections optimize bandwidth by returning specified columns instead of entire rows.
// Specify the exact return type to prevent accidental type errors.
type ProjectedSchema = Pick<TableSchema, 'round' | 'winner'>;
for await (const row of table.find({ matchId: 'challenge6' }).project<ProjectedSchema>({ round: 1, winner: 1 })) {
console.log(`(R:${row.round}): winner ${row.winner}`);
}
// Another example of the implied equality operator.
await table.find({ matchId: 'challenge6' }).toArray().then(console.log);
// Find rows using other filter operators on other indexed columns.
// You can filter on non-indexed columns, but this is long-running and inefficient.
await table.find({ score: { $gte: 15 } }).toArray().then(console.log);
// (Not recommended; long-running and inefficient) To get all rows, pass an empty filter.
await table.find({}).toArray().then(console.log);
// Use sort to perform a vector search on a vector-indexed column.
await table.find({})
.sort({ mVector: vector([.2, .3, .4]) })
.project<{ winner: number }>({ winner: 1 })
.toArray()
.then(console.log);
// Use sort and filter together for a hybrid search.
// This example also use includeSimilarity to return the similarity scores of the vector search.
// If you use a projection, you must include $similarity in the projection type.
await table.find({ matchId: 'fight4' })
.sort({ mVector: vector([.2, .3, .4]) })
.includeSimilarity(true)
.project<{ winner: number, $similarity: number }>({ winner: 1 })
.toArray()
.then(console.log);
// Use sort for regular ascending/descending sorts on a given column.
await table.find({ matchId: 'round5' })
.sort({ round: -1 })
.toArray()
.then(console.log);
// You can also use mapping.
await table.find({ matchId: 'fight5' })
.sort({ round: -1 })
.limit(5)
.map(row => row.winner.toUpperCase())
.toArray()
.then(console.log);
For more information, see the Client reference.
Run a find with a non-vector filter condition:
TableCursor<Row> row2 = table
.find(eq("match_id", "challenge6"));
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort
clause with an indexed vector
column and a query vector.
If your table has multiple vector
columns, you can only sort
on one vector
column at a time.
You can provide a pre-generated query vector or, if the vector
column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
Run a vector search and get an iterable over the returned results:
TableFindOptions options = new TableFindOptions()
.sort(Sort.vector("m_vector", new float[] {0.4f, -0.6f, 0.2f}))
.limit(3);
TableCursor<Row> row = table.find(foptions);
Run hybrid search with a filter and vector-similarity sorting, apply a projection to the returned results, and then materialize the matches into a list:
Filter filter = eq("match_id", "fight4");
TableFindOptions options = new TableFindOptions()
.projection(include("winner"))
.sort(Sort.vector("m_vector", new float[] {0.2f, 0.3f, 0.4f}));
List<Row> result = table
.find(filter, options)
.toList();
Parameters:
Name | Type | Summary |
---|---|---|
|
A filter expressing which condition the returned rows must satisfy.
You can use filter operators to compare columns with literal values.
Filters can be instantiated with its constructor and specialized with method You cannot filter on To perform a vector search, use |
|
|
A wrapper for the different options and specialization of this search. |
|
|
|
This parameter acts a formal specifier for the type checker.
If omitted, the resulting cursor is implicitly a |
Name | Type | Summary |
---|---|---|
|
The |
|
|
Select a subset of columns to include in the response for the returned rows:
If empty or unspecified, the default projection (all columns) is used. DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as For more information and examples, see Projection clauses. |
|
|
|
If true, the returned rows include a |
|
|
If true, you can call the You can’t use |
|
|
Optionally specify a number of rows to bypass (skip) before returning rows.
The first This parameter is only valid with |
|
|
Limit the total number of rows returned from the table.
The returned cursor stops yielding rows either when it reaches the |
|
|
A timeout, in milliseconds (long), to impose on the underlying API request. If not provided, the Table defaults apply. |
Returns:
TableCursor<T, R>
- An object representing the stream of results.
You can manipulate a TableCursor
in various ways.
Typically, it is iterated over and yields the search results, managing pagination discretely as needed.
Invoking .to_list()
on a TableCursor
causes it to consume all rows, and then materialize the entire set of results as the returned list.
This is not recommended for queries that return a large number of results.
Example:
package com.datastax.astra.client.tables;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableFindOptions;
import com.datastax.astra.client.tables.cursor.TableCursor;
import com.datastax.astra.client.tables.definition.rows.Row;
import java.util.List;
import static com.datastax.astra.client.core.query.Filters.eq;
import static com.datastax.astra.client.core.query.Projection.include;
public class FindMany {
public static void main(String[] args) {
Database db = new DataAPIClient("token").getDatabase("endpoint");
Table<Row> table = db.getTable("games");
Filter filter = eq("match_id", "tournamentA");
TableFindOptions options = new TableFindOptions()
// .projection(include("match_id", "winner", "field3"))
.limit(2)
//.sort(Sort.vector("m_vector", new DataAPIVector(new float[] {0.4f, -0.6f, 0.2f})))
.includeSortVector(true)
.includeSimilarity(true);
TableCursor<Row, Row> row = table.find(filter, options);
row.forEach(r -> {
System.out.println("Row: " + r);
});
TableCursor<Row, Game> gameCursor = table.find(filter, options, Game.class);
gameCursor.forEach(g -> {
System.out.println("Game: " + g.getWinner());
});
TableCursor<Row, Row> row2 = table.find(eq("match_id", "tournamentA"));
row2.getSortVector().ifPresent(v -> {
System.out.println("Sort Vector: " + v);
});
Filter filter3 = eq("match_id", "fight4");
TableFindOptions options3 = new TableFindOptions()
.projection(include("winner"))
.sort(Sort.vector("m_vector", new float[] {0.2f, 0.3f, 0.4f}));
List<Row> result = table.find(filter3, options3).toList();
}
}
Use find
to retrieve multiple rows matching your filter
and sort
criteria.
Retrieve multiple rows by a given filter
clause:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_TABLE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"find": {
"filter": {
"COLUMN_NAME": {
"FILTER_OPERATOR": FILTER_VALUE
}
}
"options": {
"skip": 5,
"limit": 10
}
}
}' | jq
The filter
can include one or more columns, filter operators, and values.
For example, the $eq
(equals) operator finds exact matches.
The following example retrieves rows where the customer
column has the name "Jasmine S." and the city "Jersey City":
"find": {
"filter": {
"customer": {
"$eq": {
"name": "Jasmine S.",
"city": "Jersey City"
}
}
}
}
$eq
is the default operator if you do not specify an operator:
"find": {
"filter": {
"customer": { "name": "Jasmine S.", "city": "Jersey City" }
}
}
For more information about filter operators, expand the following Filter operator examples or see Data API operators.
Filter operator examples
Use $ne
to match rows that do not have the given filter value:
"find": {
"filter": {
"$ne": {
"state": "NJ"
}
}
}
Use $in
to match any of a series of specified values:
"find": {
"filter": {
"city": {
"$in": [ "Jersey City", "Orange" ]
}
}
}
Similarly, use $nin
to match rows that don’t have any of the specified values.
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort
clause with an indexed vector
column and a query vector.
If your table has multiple vector
columns, you can only sort
on one vector
column at a time.
You can provide a pre-generated query vector or, if the vector
column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
# Provide the query vector as an array
"find": {
"sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] }
}
# Provide the query vector with $binary
"find": {
"sort": { "vect_emb": { "$binary": "PczMzb5MzM0+mZma" } }
}
# Generate a query vector with vectorize
"find": {
"sort": { "vect_emb": "Text to vectorize" }
}
# Perform a vector search and include the similarity score in the response
"find": {
"sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] },
"options": {
"includeSimilarity": true
}
}
To perform a hybrid search, use both sort
and filter
:
"find": {
"sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] },
"filter": { "year": { "$gt": 2000 } }
}
Parameters:
Name | Type | Summary |
---|---|---|
|
|
The Data API command to retrieve multiple rows in a table based on one or more of |
|
|
An object that defines filter criteria using the Data API filter syntax. For a list of available operators, see Data API operators. You cannot filter on To perform a vector search, use |
|
|
Perform a vector similarity search or set the order in which rows are returned. For more information and examples, see Sort clauses and Vector type. |
|
|
Select a subset of columns to include in the response for each returned row.
If empty or unset, the default projection is used.
The default projection includes all columns, but it omits The response always omits For more information and examples, see Projection clauses. |
|
|
Specify a number of rows to bypass (skip) before returning rows.
The first This parameter is only valid with |
|
|
Limit the total number of rows returned.
Pagination can occur if more than 20 rows are returned in the current set of matching rows.
Once the |
|
|
If true, the response includes a |
|
|
If true, the response includes the
You can’t use |
Returns:
A successful response can include a data
object that contains documents
and nextPageState
.
-
documents
is an array of objects where each object represents a row matching the given query. The returned values for each row object depend on theprojection
andoptions
. -
nextPageState
can benull
or an ID. If it is an ID, then you can use that ID to fetch the next page of rows that match the filter. If it isnull
or omitted, then there are no more matches or pages available.Some
find
operations don’t paginate, even if there are additional matches. For example:-
Operations that require in-memory sort, such as allow filtering on non-indexed columns. The Data API returns a warning if this happens.
-
Vector searches. Vector searches return a maximum of 1000 rows, unless you specify a lower
limit
or your table does not have 1000 rows. -
Certain combinations of
sort
andfilter
options.
-
{
"data": {
"documents": [
{
"name": "Sami Minh",
"email": "sami@example.com",
"graduated": true,
"graduation_year": 2024
},
{
"name": "Kiran Jay",
"email": "kiran@example.com",
"graduated": true,
"graduation_year": 2024
}
],
"nextPageState": null
}
}
In the event of pagination, you must issue a subsequent request with a pageState
ID to fetch the next page of rows that matched the filter.
As long as there is a subsequent page with matching rows, the transaction returns a nextPageState
ID, which you use as the pageState
for the subsequent request.
Each paginated request is exactly the same as the original request, except for the addition of the pageState
in the options
object:
{
"find": {
"filter": { "graduation_year": { "$eq": 2024 } },
"options": { "pageState": "NEXT_PAGE_STATE_FROM_PRIOR_RESPONSE" }
}
}
Continue issuing requests with the subsequent pageState
ID until you have fetched all matching rows.
Example:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace/students" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"find": {
"filter": {
"graduation_year": 2024
}
}
}' | jq
Find distinct values across rows
Get a list of the distinct values in a certain column in a table.
|
-
Python
-
TypeScript
-
Java
-
curl
For more information, see the Client reference.
Use a specified filter condition to list distinct values for a column:
my_table.distinct("winner", filter={"match_id": "challenge6"})
Parameters:
Name | Type | Summary |
---|---|---|
|
|
The name of the field whose value is inspected across rows.
Keys are typically just column names, although they can use dot notation to select particular entries in |
|
|
A dictionary expressing which condition the inspected rows must satisfy. You can use filter operators to compare columns with literal values. For example:
|
|
|
A timeout, in milliseconds, to impose on the underlying API request.
If not provided, the |
|
|
A timeout, in milliseconds, for each API request performed while scrolling through matches (see |
Example:
Full example script
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
from astrapy.constants import SortMode
from astrapy.info import (
CreateTableDefinition,
ColumnType,
)
my_table = database.create_table(
"games",
definition=(
CreateTableDefinition.builder()
.add_column("match_id", ColumnType.TEXT)
.add_column("round", ColumnType.TINYINT)
.add_vector_column("m_vector", dimension=3)
.add_column("score", ColumnType.INT)
.add_column("when", ColumnType.TIMESTAMP)
.add_column("winner", ColumnType.TEXT)
.add_set_column("fighters", ColumnType.UUID)
.add_partition_by(["match_id"])
.add_partition_sort({"round": SortMode.ASCENDING})
.build()
),
)
from astrapy.data_types import (
DataAPISet,
DataAPITimestamp,
DataAPIVector,
)
from astrapy.ids import UUID
insert_result = my_table.insert_many(
[
{
"match_id": "fight4",
"round": 1,
"winner": "Victor",
"score": 18,
"when": DataAPITimestamp.from_string(
"2024-11-28T11:30:00Z",
),
"fighters": DataAPISet([
UUID("0193539a-2770-8c09-a32a-111111111111"),
UUID('019353e3-00b4-83f9-a127-222222222222'),
]),
"m_vector": DataAPIVector([0.4, -0.6, 0.2]),
},
{
"match_id": "challenge6",
"round": 1,
"winner": "Donna",
"m_vector": [0.9, -0.1, -0.3],
},
{"match_id": "challenge6", "round": 2, "winner": "Erick"},
{"match_id": "challenge6", "round": 3, "winner": "Fiona"},
],
)
# distinct within a partition:
my_table.distinct("winner", filter={"match_id": "challenge6"})
# ['Donna', 'Erick', 'Fiona']
# distinct values across the whole table:
# (not recommended performance-wise)
my_table.distinct("winner")
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# ['Victor', 'Adam Zuul', 'Betta Vigo', 'Caio Gozer', 'Donna', 'Erick', ...
# Over a column containing null values
# (also with composite filter):
my_table.distinct(
"score",
filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [18, None]
# distinct over a set column (automatically "unrolled"):
my_table.distinct(
"fighters",
filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [UUID('0193539a-2770-8c09-a32a-111111111111'), UUID('019353e3-00b4-...
# distinct within a partition:
my_table.distinct("winner", filter={"match_id": "challenge6"})
# ['Donna', 'Erick', 'Fiona']
# distinct values across the whole table:
# (not recommended performance-wise)
my_table.distinct("winner")
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# ['Victor', 'Adam Zuul', 'Betta Vigo', 'Caio Gozer', 'Donna', 'Erick', ...
# Over a column containing null values
# (also with composite filter):
my_table.distinct(
"score",
filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [18, None]
# distinct over a set column (automatically "unrolled"):
my_table.distinct(
"fighters",
filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [UUID('0193539a-2770-8c09-a32a-111111111111'), UUID('019353e3-00b4-...
The TypeScript client doesn’t have a distinct
operation for tables.
Use a filter condition to find rows and then list distinct values in a given column:
// Find with no options
Filter filter = Filters.eq("match_id", "challenge6");
// Show you the distinct match id in the table
List<String> winners = table
.distinct("winner", filter, String.class);
// Adding an Options
TableDistinctOptions options = new TableDistinctOptions()
.timeout(1000);
List<String> winners2 = table
.distinct("winner", filter, String.class, options);
Parameters:
Name | Type | Summary |
---|---|---|
|
|
The name of the field on which perform the distinct |
|
A filter expressing which condition the inspected rows must satisfy.
You can use filter operators to compare columns with literal values.
Filters can be instantiated with its constructor and specialized with method |
|
|
|
Represents the expected type of the column returned.
For example, if the target column is |
|
A wrapper for the different options and specialization of this search. |
Example:
package com.datastax.astra.client.tables;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIClients;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Filters;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableDistinctOptions;
import com.datastax.astra.client.tables.commands.options.TableFindOptions;
import com.datastax.astra.client.tables.cursor.TableCursor;
import com.datastax.astra.client.tables.definition.rows.Row;
import java.util.List;
import static com.datastax.astra.client.core.query.Filters.eq;
import static com.datastax.astra.client.core.query.Projection.include;
public class Distinct {
public static void main(String[] args) {
Database db = new DataAPIClient("token").getDatabase("endpoint");
Table<Row> table = db.getTable("games");
// Show you all match id in the table
List<Row> matches = table.find(null, new TableFindOptions()
.projection(include("match_id"))).toList();
matches.forEach(System.out::println);
Filter filter = Filters.eq("match_id", "challenge6");
TableDistinctOptions options = new TableDistinctOptions()
.timeout(1000);
// Show you the distinct match id in the table
table.distinct("match_id", filter, String.class)
.forEach(System.out::println);
}
}
This operation has no literal equivalent in HTTP.
Instead, you can use Find rows using filter clauses, and then use projection
and a utility like jq
to extract specific values from the response.
Sort clauses
You can use sort
clauses to organize results by similarity to a given vector or by regular ascending or descending order.
For regular sorts, sort
can include one or more non-vector
columns.
If you specify multiple columns, the sorting applies hierarchically to the rows.
For vector search, sort
can include only one vector
column.
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort
clause with an indexed vector
column and a query vector.
If your table has multiple vector
columns, you can only sort
on one vector
column at a time.
You can provide a pre-generated query vector or, if the vector
column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
-
Python
-
TypeScript
-
Java
-
curl
The Python client uses the sort
dictionary parameter for vector search or regular sorting.
When no particular order is required:
sort={} # (default when parameter not provided)
For regular (non-vector) sorts, you must specify the columns to sort on and ascending/descending order:
from astrapy.constants import SortMode
# Ascending sort (two equivalent forms)
sort={"round": SortMode.ASCENDING}
sort={"round": +1}
# Descending sort (two equivalent forms)
sort={"round": SortMode.DESCENDING}
sort={"round": -1}
Be aware of the order when chaining sorts, as the sorting applies hierarchically:
# Multi-column sort
sort={
"week_no": SortMode.ASCENDING,
"day_of_week": SortMode.ASCENDING,
}
# Equivalent form for multi-column sort
sort={"week_no": +1, "day_of_week": +1}
You can use sort
to perform a vector similarity (ANN) search.
For a vector search, you can provide a pre-generated query vector in the form {"vector_column": QUERY_VECTOR}
, where QUERY_VECTOR
is a list of floats or DataAPIVector
.
If the specified vector
column has a vectorize embedding provider integration, you can generate a query vector from a string using the form {"vector_column": "query text"}
.
# Provide a pre-generated query vector as array,
# And then sort by similarity to the given vector.
sort={"vector_column": [0.1, -0.2, 0.3]}
# Provide a binary-encoded pre-generated query vector,
# And then sort by similarity to the given vector.
# Binary encoding is preferrable for performance.
sort={"vector_column": DataAPIVector([0.1, -0.2, 0.3])}
# Generate a vector from a string,
# Run a similarity search,
# And then sort by similarity to the given vector.
# Requires a valid vectorize integration.
sort={"vector_column": "Query text to vectorize"}
Sort example (full script)
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
from astrapy.constants import SortMode
from astrapy.info import (
CreateTableDefinition,
ColumnType,
)
my_table = database.create_table(
"games",
definition=(
CreateTableDefinition.builder()
.add_column("match_id", ColumnType.TEXT)
.add_column("round", ColumnType.TINYINT)
.add_vector_column("m_vector", dimension=3)
.add_column("score", ColumnType.INT)
.add_column("when", ColumnType.TIMESTAMP)
.add_column("winner", ColumnType.TEXT)
.add_set_column("fighters", ColumnType.UUID)
.add_partition_by(["match_id"])
.add_partition_sort({"round": SortMode.ASCENDING})
.build()
),
)
from astrapy.constants import VectorMetric
from astrapy.info import TableIndexOptions, TableVectorIndexOptions
my_table.create_index(
"score_index",
column="score",
)
my_table.create_index(
"winner_index",
column="winner",
options=TableIndexOptions(
ascii=False,
normalize=True,
case_sensitive=False,
),
)
my_table.create_vector_index(
"m_vector_index",
column="m_vector",
options=TableVectorIndexOptions(
metric=VectorMetric.DOT_PRODUCT,
),
)
from astrapy.data_types import (
DataAPISet,
DataAPITimestamp,
DataAPIVector,
)
from astrapy.ids import UUID
insert_result = my_table.insert_many(
[
{
"match_id": "fight4",
"round": 1,
"winner": "Victor",
"score": 18,
"when": DataAPITimestamp.from_string(
"2024-11-28T11:30:00Z",
),
"fighters": DataAPISet([
UUID("0193539a-2770-8c09-a32a-111111111111"),
UUID('019353e3-00b4-83f9-a127-222222222222'),
]),
"m_vector": DataAPIVector([0.4, -0.6, 0.2]),
},
{
"match_id": "challenge6",
"round": 1,
"winner": "Donna",
"m_vector": [0.9, -0.1, -0.3],
},
{"match_id": "challenge6", "round": 2, "winner": "Erick"},
{"match_id": "challenge6", "round": 3, "winner": "Fiona"},
{"match_id": "fight5", "round": 3, "winner": "Caio Gozer"},
],
)
from astrapy.constants import SortMode
from astrapy.data_types import DataAPIVector
# Top vector result using "sort" (on an indexed vector column):
my_table.find_one(
{},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
)
# {'winner': 'Donna'}
# Non-vector sorting on a 'partitionSort' column:
my_table.find_one(
{"match_id": "fight5"},
sort={"round": SortMode.DESCENDING},
projection={"winner": True},
)
# {'winner': 'Caio Gozer'}
# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find(
{},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
limit=3,
).to_list()
# [{'winner': 'Donna'}, {'winner': 'Victor'}]
# Hybrid search with vector sort and non-vector filtering:
my_table.find(
{"match_id": "fight4"},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
limit=3,
).to_list()
# [{'winner': 'Victor'}]
When no particular order is required:
{ sort: {} } // (default when parameter not provided)
When sorting by a certain value in ascending/descending order:
{ sort: { column: +1 } } // ascending
{ sort: { column: -1 } } // descending
Be aware of the order when chaining multiple sorts because ES2015+ guarantees string keys in order of insertion
For example, when sorting first by a column1
and then by column2
:
{ sort: { column1: 1, column2: 1 } }
You can use sort
to perform a vector similarity (ANN) search.
You can supply a pre-generated query vector or, if the vector
column has a vectorize integration, you can generate a query vector from a string.
For more information, see Vector type.
// Provide a pre-generated query vector as array,
// And then sort by similarity to the given vector.
{ sort: { vector_column: vector([0.4, 0.15, -0.5]) } }
// Generate a vector from a string,
// Run a similarity search,
// And then sort by similarity to the given vector.
// Requires a valid vectorize integration.
{ sort: { vector_column: "Text to vectorize" } }
If your vector search includes projection
and includeSimilarity: true
, then you must manually include $similarity
in the type of the returned rows.
For example:
await table.findOne<{ winner: string, $similarity: number }>({}, {
sort: { mVector: vector([.2, .3, .4]) },
projection: { winner: 1 },
includeSimilarity: true,
}).then(console.log);
If you don’t include projection
, then $similarity
is inferred to be a part of the returned rows.
Example:
import { CreateTableDefinition, DataAPIClient, InferTableSchema, vector } from '@datastax/astra-db-ts';
// Instantiate the client and connect to the database
const client = new DataAPIClient();
const db = client.db(process.env.CLIENT_DB_URL!, { token: process.env.CLIENT_DB_TOKEN! });
// Create table schema using bespoke Data API table definition syntax
const TableDefinition = <const>{
columns: {
matchId: 'text'
round: 'tinyint',
mVector: { type: 'vector', dimension: 3 },
score: 'int',
when: 'timestamp',
winner: 'text',
fighters: { type: 'set', valueType: 'uuid' },
},
primaryKey: {
partitionBy: ['matchId'],
partitionSort: { round: 1 },
},
} satisfies CreateTableDefinition;
type TableSchema = InferTableSchema<typeof TableDefinition>;
(async function () {
// Create the table if it doesn't already exist
const table = await db.createTable<TableSchema>('games', { definition: TableDefinition, ifNotExists: true });
// Create the necessary indexes
await table.createIndex('winner_idx', 'winner', { ifNotExists: true });
await table.createVectorIndex('m_vector_idx', 'mVector', { ifNotExists: true });
// Inserts a few rows for demonstration purposes
await table.insertMany([
{ matchId: 'fight4', round: 1, winner: 'Victor', mVector: vector([.1, .2, .3]) },
{ matchId: 'fight5', round: 1, winner: 'Gray Tist', mVector: vector([.1, .2, .3]) },
{ matchId: 'fight5', round: 2, winner: 'Cham P. Yun', mVector: vector([.9, .8, .7]) },
{ matchId: 'fight5', round: 3, winner: 'Winona', mVector: vector([.1, .2, .3]) },
{ matchId: 'fight7', round: 1, winner: 'Wynn Uhr', mVector: vector([.1, .2, .3]) },
{ matchId: 'fight7', round: 2, winner: 'Zeeb Est', mVector: vector([.1, .2, .3]) },
{ matchId: 'fight7', round: 3, winner: 'Theeg Oat', mVector: vector([.1, .2, .3]) },
]);
// Random match
const row0 = await table.findOne({});
console.log('Match winner:', row0?.winner);
// Simple single-field sort ('Victor')
const row1 = await table.findOne({}, { sort: { matchId: 1 } });
console.log('First match winner:', row1?.winner);
// Simple multi-field sort ('Wynn Uhr')
const row3 = await table.findOne({}, { sort: { matchId: -1, round: 1 } });
console.log('Last match, first round winner:', row3?.winner);
// Vector sort ('Cham P. Yun')
const row4 = await table.findOne({}, { sort: { mVector: vector([.9, .8, .7]) } });
console.log('Match with the closest matching vector:', row4?.winner);
// Uncomment the following line to drop the table and any related indexes.
// await table.drop();
})();
The Java client uses sort()
for vector search or regular sorting.
The sort()
operations are optional.
Use them only when needed.
For regular (non-vector) sorts, you must specify the columns to sort on and ascending/descending order. Be aware of the order when chaining sorts, as the sorting applies hierarchically.
import com.datastax.astra.client.core.query.Sort;
Sort s1 = Sort.ascending("field1");
Sort s2 = Sort.descending("field2");
TableFindOptions options =
new TableFindOptions().sort(s1, s2);
To perform a vector search, use the Sort.vector(..,..)
and Sort.vectorize(..,..)
methods.
These methods take two arguments: a column name and a query vector.
If you omit the column name, the client uses the default of $vector
or $vectorize
, depending on the method.
These are reserved fields for collections.
With tables, it is important that you specify the name of your vector
column in the first argument.
This is because a table can contain multiple vector
columns, and vector
columns can have any valid name.
You can provide sort vectors to these methods in the same way that you write vectors:
// This example uses TableFindOptions, but you can use any sortable object
// Sort with DataAPIVector
DataAPIVector vector = new DataAPIVector(new float[] {0.1f, 0.2f, 0.3f});
new TableFindOptions()
.sort(Sort.vector("m_vector", vector));
// Sort with float[]
new TableFindOptions()
.sort(Sort.vector("m_vector", new float[] {0.1f, 0.2f, 0.3f}));
// Sort with vectorize
new TableFindOptions()
.sort(Sort.vectorize("m_vector", "Text To Serialize"));
Example:
package com.datastax.astra.client.tables;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIClients;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.options.CollectionFindOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.query.Projection;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableFindOneOptions;
import com.datastax.astra.client.tables.commands.options.TableFindOptions;
import com.datastax.astra.client.tables.definition.rows.Row;
public class WorkingWithSorts {
public static void main(String[] args) {
Database db = new DataAPIClient("token").getDatabase("endpoint");
Table<Row> tableRow = db.getTable("TABLE_NAME");
// Sort Clause for a vector
Sort.vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f});;
// Sort Clause for other fields
Sort s1 = Sort.ascending("field1");
Sort s2 = Sort.descending("field2");
// Build the sort clause
new TableFindOptions().sort(s1, s2);
// Adding vector
new TableFindOptions().sort(
Sort.vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f}), s1, s2);
}
}
# Provide the query vector as an array
"sort": { "vector_column": [ 0.1, -0.2, 0.3 ] }
# Provide the query vector with $binary
"sort": { "vector_column": { "$binary": "PczMzb5MzM0+mZma" } }
# Generate a query vector with vectorize
"sort": { "vector_column": "Text to vectorize" }
For HTTP commands with sort
, see Find a row and Find rows using filter clauses.
Sort results and optimization
For best performance, filter and sort on indexed columns, partition keys, and clustering keys. Filtering on non-indexed columns can use allow filtering, which is inefficient and resource-intensive, especially for large datasets. With the Data API clients, allow filtering operations can hit the client timeout limit before the underlying HTTP operation is complete. An empty filter ( Additionally, the Data API can perform in-memory sorting, depending on the columns you sort on, the table’s partitioning structure, and whether the sorted columns are indexed. In-memory sorts can have performance implications. |
Some combinations of arguments impose an implicit upper bound on the number of rows returned by the Data API:
-
Vector ANN search returns up to 1000 rows per search operation, unless you set a lower
limit
. -
When using an ascending or descending sort criterion, the Data API returns up to 20 rows at once. The returned rows are the top results across the whole collection based on the
filter
criteria.These provisions can also apply when running subsequent commands on client cursors, such as
.distinct()
.
Over HTTP, pagination can occur if there are more than 20 matching rows, but, in some cases, the nextPageState
is null
regardless of the actual presence of additional results.
This can occur with or without sort
.
For ascending or descending sort
clauses that do not automatically paginate, it is sometimes possible to use the limit
and skip
options to control the number of rows returned and the starting point of the results, as a form of manual pagination.
When you don’t specify sorting criteria (by vector or otherwise), the following can occur:
-
Client cursors can scroll through an arbitrary number of rows because the Data API and the client periodically exchange new chunks of rows.
-
If rows are added or removed after starting a
find
operation, pagination and client cursor behavior depends on database internals. There is no guarantee as to whether or not the response will pick up such "real-time" changes in the data.
For more information about handling pagination and cursors, see Find rows using filter clauses.
Projection clauses
In findOne
and find
, you can use the projection
parameter to select specific columns to include or exclude from the response.
The projection can’t mix inclusions and exclusions.
If projection
is empty or unspecified, the Data API applies the default projection.
For rows, the default projection includes all columns.
Over HTTP, the Data API always excludes null
values in responses, regardless of projection
For example, if find
matches multiple rows, the response can include different columns for each row, depending on which columns are null
in each row.
You cannot forcibly include null
values in the response.
The Python, TypeScript and Java clients reconstruct returned rows by adding the columns omitted
in the API response.
Therefore, when using a client, rows from findOne
and find
operations always have
the same columns, as specified by the projection
, regardless of the presence of null
values.
In order to optimize the response size and improve performance, DataStax recommends always providing an explicit projection tailored to the needs of the application. |
Projection syntax
A projection is expressed as a mapping of column names to boolean values.
Use true
mapping to include only the specified columns.
For example, the following true
mapping returns the columns name
and email
:
{ "name": true, "email": true }
Alternatively, use a false
mapping to exclude the specified columns.
All other non-excluded columns are returned.
For example, the following false
mapping returns all columns except name
and email
:
{ "name": false, "email": false }
The values in a projection map can be objects, booleans, decimals, or integers, but the Data API ultimately evaluates all of these as booleans.
For example, the following projection evaluates to true
(include) for all four columns:
{ "column1": true, "column2": 1, "column3": 90.0, "column4": { "keep": "yes!" } }
Whereas this projection evaluates to false
(exclude) for all four columns:
{ "column1": false, "column2": 0, "column3": 0.0, "column4": {} }
Passing null-like types (such as {}
, null
or 0
) for the whole projection
mapping is equivalent to not specifying a projection
, causing the Data API to use the default projection.
A projection can’t mix include and exclude projections.
It can contain only true
or only false
values for the given columns.
For example, {"column1": true, "column2": false}
is an invalid projection that results in an API error.
A projection can’t include or exclude sub-column values, such as keys in map
columns.
To include all columns, you can use the wildcard projection { "*": true }
.
This is equivalent to the default projection.
The Data API doesn’t support false wildcard projections for tables.
Projection examples by language
-
Python
-
TypeScript
-
Java
-
curl
For the Python client, the projection can be any of the following:
-
A dictionary (
dict[str, Any]
) to include specific columns in the response, like{column_name: True}
. -
A dictionary (
dict[str, Any]
) to exclude specific columns from the response, like{column_name: False}
. -
A list or other iterable over key names that are implied to be included in the projection.
# The following two statements are equivalent:
document = table.find_one(
{"user_id": 101},
projection={"name": True, "city": True},
)
document = table.find_one(
{"user_id": 101},
projection=["name", "city"],
)
# Return the whole rows (default):
table.find({}, projection={'*': True}).to_list()
# Exclude certain columns:
table.find({}, projection={'name': False}).to_list()
For information about default projections and projection syntax, see the preceding explanation of projection clauses.
The TypeScript client takes in an untyped JS object for the projection
parameter.
When specifying a projection, you must handle the return type carefully.
To avoid some false typing claims, when it cannot be inferred that a projection is not passed, Partial<Row>
is used as the return type for find
& findOne
operations.
This is only Partial
, not DeepPartial
, so it only makes the top-level keys optional.
DataStax recommends that you provide your own explicit type override on find
/findOne
operations when using projections.
interface TableSchema {
matchId: string,
round: number,
score?: number | null,
winner?: string | null,
}
// row1 is of type TableSchema
const row1 = await table.findOne({});
// row2 is of type Partial<TableSchema>
const row2 = await table.findOne({}, {
projection: { score: 1, winner: 1 }
});
// row3 is of type { score?: string | null, winner?: number | null }
const row3 = await table.findOne<Pick<TableSchema, 'score' | 'winner'>>({}, {
projection: { score: 1, winner: 1 }
});
// Projections also work with cursors
// rows are of type { score: number, winner: string, $similarity: number }
const cursor = await table.find({})
.project<{ score: string, winner: number }>({ score: 1, winner: 1 })
.includeSimilarity();
For information about default projections and projection syntax, see the preceding explanation of projection clauses.
If you perform a vector search that includes projection
and includeSimilarity: true
, then you must manually include $similarity
in the type of the returned rows.
For example:
await table.findOne<{ winner: string, $similarity: number }>({}, {
sort: { mVector: vector([.2, .3, .4]) },
projection: { winner: 1 },
includeSimilarity: true,
}).then(console.log);
If you don’t include projection
, then $similarity
is inferred to be a part of the returned rows.
To support the projection mechanism, the Java client has different Options
classes that provide the projection
method in the helpers.
This method takes an array of Projection
classes with the column name and a boolean flag indicating inclusion or exclusion:
Projection p1 = new Projection("column1", true);
Projection p2 = new Projection("column2", true);
TableFindOneOptions options1 = new TableFindOneOptions().projection(p1, p2);
To simplify this syntax, you can use the Projections
syntactic sugar:
TableFindOneOptions options2 = new TableFindOneOptions()
.projection(Projections.include("column1", "column2"));
TableFindOneOptions options3 =new TableFindOneOptions()
.projection(Projections.exclude("column1", "column2"));
For information about default projections and projection syntax, see the preceding explanation of projection clauses.
projection
is an optional parameter on find
and findOne
.
The following example returns the first 100 rows matching the given query, after skipping the first 20 matches.
The response contains only the name
and email
value for each returned row, if they are not null
.
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_TABLE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"find": {
"filter": { "email": "tal@example.com" },
"projection": { "name": true, "email": true },
"options": {
"skip": 20,
"limit": 100
}
}
}' | jq
For information about default projections and projection syntax, see the preceding explanation of projection clauses.