Find rows
This Astra DB Serverless feature is currently in public preview. Development is ongoing, and the features and functionality are subject to change. Astra DB Serverless, and the use of such, is subject to the DataStax Preview Terms.
The Data API tables commands are available through HTTP and the clients. If you use a client, tables commands are available only in client versions 2.0-preview or later. For more information, see the Data API client upgrade guide.
Find multiple rows that match a query.
For best performance, filter and sort on indexed columns, partition keys, and clustering keys.
Filtering on non-indexed columns can use allow filtering, which is inefficient and resource-intensive, especially for large datasets. With the Data API clients, allow filtering operations can hit the client timeout limit before the underlying HTTP operation is complete.
An empty filter ({}) is also not recommended for performance reasons.
Additionally, the Data API can perform in-memory sorting, depending on the columns you sort on, the table's partitioning structure, and whether the sorted columns are indexed. In-memory sorts can have performance implications.
A row represents a single record of data in a table in an Astra DB Serverless database.
You use the Table class to work with rows through the Data API clients.
For instructions to get a Table object, see Work with tables.
For general information about working with rows, including common operations and operators, see Work with rows.
For more information about the Data API and clients, see Get started with the Data API.
- Python
- TypeScript
- Java
- curl
For more information, see the Client reference.
Use find to retrieve rows that match your filter and sort criteria.
Run a find with a non-vector filter condition:
my_table.find({"match_id": "challenge6"})
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort clause with an indexed vector column and a query vector.
If your table has multiple vector columns, you can only sort on one vector column at a time.
You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
Run a vector search and get an iterable over the returned results:
my_table.find(
{},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
limit=3,
)
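To make the ranking concrete, the following standalone sketch reproduces the ordering locally in plain Python, without calling the Data API. It assumes the dot-product metric and uses the vectors from the example rows. The rank_by_dot_product helper is hypothetical, and the (1 + dot) / 2 scaling is an assumption chosen because it reproduces the 0.515 similarity shown in the example output. A real vector search relies on the vector index rather than scanning rows like this.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def rank_by_dot_product(
    rows: List[Dict], query: Sequence[float], limit: int
) -> List[Dict]:
    """Hypothetical helper: order rows by similarity to `query`, best first."""
    scored = []
    for row in rows:
        vec = row.get("m_vector")
        if vec is None:
            # Assumption: rows without a vector are not candidates.
            continue
        # Assumed scaling of the dot product into a similarity score.
        similarity = (1 + dot(vec, query)) / 2
        scored.append(
            {"winner": row["winner"], "$similarity": round(similarity, 3)}
        )
    scored.sort(key=lambda r: r["$similarity"], reverse=True)
    return scored[:limit]


# Rows mirroring the example data; only two of them carry a vector.
rows = [
    {"winner": "Victor", "m_vector": [0.4, -0.6, 0.2]},
    {"winner": "Donna", "m_vector": [0.9, -0.1, -0.3]},
    {"winner": "Erick"},
    {"winner": "Fiona"},
    {"winner": "Caio Gozer"},
]

ranked = rank_by_dot_product(rows, [0.2, 0.3, 0.4], limit=3)
print([r["winner"] for r in ranked])  # ['Donna', 'Victor']
```

Although the limit is 3, only two rows come back because only two rows have a vector to compare against.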
Run hybrid search with a filter and vector-similarity sorting, apply a projection to the returned results, and then materialize the matches into a list:
my_table.find(
{"match_id": "fight4"},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
).to_list()
Parameters:
Name | Type | Summary |
---|---|---|
|
|
A dictionary expressing which condition the returned row must satisfy. You can use filter operators to compare columns with literal values. For example:
You cannot filter on To perform a vector search, use |
|
|
This dictionary parameter controls the sorting order and, therefore, determines which row is returned if there are multiple matches.
The |
|
|
Select a subset of columns to include in the response for the returned row:
DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as For more information and examples, see projection clauses. |
|
|
This parameter acts as a formal specifier for the type checker.
If omitted, the resulting cursor is implicitly a |
|
|
Optionally specify a number of rows to bypass (skip) before returning rows.
The first This parameter is only valid with |
|
|
Limit the total number of rows returned from the table.
The returned cursor stops yielding rows either when it reaches the |
|
|
If true, the returned rows include a |
|
|
If true, you can call the You can’t use |
|
|
A timeout, in milliseconds, to impose on each individual HTTP request to the Data API to accomplish the operation. If not provided, the Table defaults apply. This parameter is aliased as |
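The interplay of sort, skip, and limit can be pictured with a small local sketch in plain Python. It does not show how the Data API executes the query; it only mirrors the visible result. The page helper is hypothetical, and the round values for Betta Vigo and Adam Zuul are assumed for illustration.

```python
from typing import Dict, List, Optional


def page(
    rows: List[Dict],
    sort_key: str,
    descending: bool,
    skip: int = 0,
    limit: Optional[int] = None,
) -> List[Dict]:
    """Hypothetical helper: sort first, then drop `skip` rows, then cap at `limit`."""
    ordered = sorted(rows, key=lambda r: r[sort_key], reverse=descending)
    remaining = ordered[skip:]
    return remaining if limit is None else remaining[:limit]


# Assumed rows for the "fight5" partition (rounds 1 and 2 are illustrative).
fight5_rows = [
    {"round": 1, "winner": "Adam Zuul"},
    {"round": 2, "winner": "Betta Vigo"},
    {"round": 3, "winner": "Caio Gozer"},
]

selected = page(fight5_rows, "round", descending=True, skip=1, limit=2)
print([r["winner"] for r in selected])  # ['Betta Vigo', 'Adam Zuul']
```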
Returns:
TableFindCursor - An object of type astrapy.cursors.TableFindCursor, representing the stream of results.
You can manipulate a TableFindCursor in various ways.
Typically, it is iterated over and yields the search results, handling pagination behind the scenes as needed.
For more information, see FindCursor.
Invoking .to_list() on a TableFindCursor causes it to consume all rows, and then materialize the entire set of results as the returned list.
This is not recommended for find operations that can return a large number of results.
While you iterate over a cursor, rows are retrieved in chunks progressively. It is possible for retrieved chunks to reflect real-time changes (inserts, updates, and deletions) on the table.
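The chunked retrieval described above can be illustrated with a plain Python generator. This is a conceptual sketch, not astrapy's implementation: fetch_page, PAGE_SIZE, and the in-memory FAKE_TABLE are hypothetical stand-ins for the HTTP requests a cursor would issue.

```python
from typing import Dict, Iterator, List, Optional, Tuple

FAKE_TABLE: List[Dict] = [{"round": n} for n in range(1, 8)]  # 7 rows
PAGE_SIZE = 3
requests_made = 0  # counts simulated round trips


def fetch_page(page_state: Optional[int]) -> Tuple[List[Dict], Optional[int]]:
    """Hypothetical stand-in for one request: returns a chunk and the next page state."""
    global requests_made
    requests_made += 1
    start = page_state or 0
    chunk = FAKE_TABLE[start:start + PAGE_SIZE]
    next_start = start + PAGE_SIZE
    next_state = next_start if next_start < len(FAKE_TABLE) else None
    return chunk, next_state


def iterate_rows() -> Iterator[Dict]:
    """Yield rows one by one, fetching the next chunk only when needed."""
    page_state: Optional[int] = None
    while True:
        chunk, page_state = fetch_page(page_state)
        yield from chunk
        if page_state is None:
            return


cursor = iterate_rows()
first = next(cursor)
requests_after_first_row = requests_made  # only one chunk fetched so far

# Draining the iterator (the analogue of materializing a full list)
# forces every remaining chunk to be fetched.
all_rows = [first] + list(cursor)
print(requests_after_first_row, len(all_rows), requests_made)  # 1 7 3
```

Because each chunk is fetched at a different moment, a chunk fetched later can see rows that changed after the first chunk was read, which is why results may reflect real-time changes.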
Example response
TableFindCursor("games", idle, consumed so far: 0)
Example:
Full script
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
from astrapy.constants import SortMode
from astrapy.info import (
CreateTableDefinition,
ColumnType,
)
my_table = database.create_table(
"games",
definition=(
CreateTableDefinition.builder()
.add_column("match_id", ColumnType.TEXT)
.add_column("round", ColumnType.TINYINT)
.add_vector_column("m_vector", dimension=3)
.add_column("score", ColumnType.INT)
.add_column("when", ColumnType.TIMESTAMP)
.add_column("winner", ColumnType.TEXT)
.add_set_column("fighters", ColumnType.UUID)
.add_partition_by(["match_id"])
.add_partition_sort({"round": SortMode.ASCENDING})
.build()
),
)
from astrapy.constants import VectorMetric
from astrapy.info import TableIndexOptions, TableVectorIndexOptions
my_table.create_index(
"score_index",
column="score",
)
my_table.create_index(
"winner_index",
column="winner",
options=TableIndexOptions(
ascii=False,
normalize=True,
case_sensitive=False,
),
)
my_table.create_vector_index(
"m_vector_index",
column="m_vector",
options=TableVectorIndexOptions(
metric=VectorMetric.DOT_PRODUCT,
),
)
from astrapy.data_types import (
DataAPISet,
DataAPITimestamp,
DataAPIVector,
)
from astrapy.ids import UUID
insert_result = my_table.insert_many(
[
{
"match_id": "fight4",
"round": 1,
"winner": "Victor",
"score": 18,
"when": DataAPITimestamp.from_string(
"2024-11-28T11:30:00Z",
),
"fighters": DataAPISet([
UUID("0193539a-2770-8c09-a32a-111111111111"),
UUID('019353e3-00b4-83f9-a127-222222222222'),
]),
"m_vector": DataAPIVector([0.4, -0.6, 0.2]),
},
{
"match_id": "challenge6",
"round": 1,
"winner": "Donna",
"m_vector": [0.9, -0.1, -0.3],
},
{"match_id": "challenge6", "round": 2, "winner": "Erick"},
{"match_id": "challenge6", "round": 3, "winner": "Fiona"},
{"match_id": "fight5", "round": 3, "winner": "Caio Gozer"},
],
)
from astrapy.constants import SortMode
from astrapy.data_types import DataAPIVector
# Iterate over results:
for row in my_table.find({"match_id": "challenge6"}):
    print(f"(R:{row['round']}): winner {row['winner']}")
# will print:
# (R:1): winner Donna
# (R:2): winner Erick
# (R:3): winner Fiona
# Optimize bandwidth using a projection:
proj = {"round": True, "winner": True}
for row in my_table.find({"match_id": "challenge6"}, projection=proj):
    print(f"(R:{row['round']}): winner {row['winner']}")
# will print:
# (R:1): winner Donna
# (R:2): winner Erick
# (R:3): winner Fiona
# Filter on the partition key:
my_table.find({"match_id": "challenge6"}).to_list()
# [{'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...
# Filter on primary key:
my_table.find({"match_id": "challenge6", "round": 1}).to_list()
# [{'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...
# Filter on a regular indexed column:
my_table.find({"winner": "Caio Gozer"}).to_list()
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Non-equality filter on a regular indexed column:
my_table.find({"score": {"$gte": 15}}).to_list()
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Filter on a regular non-indexed column:
# (not recommended performance-wise)
my_table.find(
{"when": {
"$gte": DataAPITimestamp.from_string("1999-12-31T01:23:44Z")
}}
).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Empty filter (not recommended performance-wise):
my_table.find({}).to_list()
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
# Filter on the primary key and a regular non-indexed column:
# (not recommended performance-wise)
my_table.find(
{"match_id": "fight5", "round": 3, "winner": "Caio Gozer"}
).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Filter on a regular non-indexed column (and incomplete primary key)
# (not recommended performance-wise)
my_table.find({"round": 3, "winner": "Caio Gozer"}).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...
# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find(
{},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
limit=3,
).to_list()
# [{'winner': 'Donna'}, {'winner': 'Victor'}]
# Hybrid search with vector sort and non-vector filtering:
my_table.find(
{"match_id": "fight4"},
sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
projection={"winner": True},
limit=3,
).to_list()
# [{'winner': 'Victor'}]
# Return the numeric value of the vector similarity
# (also demonstrating that one can pass a plain list for a vector):
my_table.find(
{},
sort={"m_vector": [0.2, 0.3, 0.4]},
projection={"winner": True},
limit=3,
include_similarity=True,
).to_list()
# [{'winner': 'Donna', '$similarity': 0.515}, {'winner': 'Victor', ...
# Non-vector sorting on a 'partitionSort' column:
my_table.find(
{"match_id": "fight5"},
sort={"round": SortMode.DESCENDING},
projection={"winner": True},
).to_list()
# [{'winner': 'Caio Gozer'}, {'winner': 'Betta Vigo'}, ...
# Using skip and limit:
my_table.find(
{"match_id": "fight5"},
sort={"round": SortMode.DESCENDING},
projection={"winner": True},
skip=1,
limit=2,
).to_list()
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# [{'winner': 'Betta Vigo'}, {'winner': 'Adam Zuul'}]
# Non-vector sorting on a regular column:
# (not recommended performance-wise)
my_table.find(
{"match_id": "fight5"},
sort={"winner": SortMode.ASCENDING},
projection={"winner": True},
).to_list()
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# [{'winner': 'Adam Zuul'}, {'winner': 'Betta Vigo'}, ...
# Using .map() on a cursor:
winner_cursor = my_table.find(
{"match_id": "fight5"},
sort={"round": SortMode.DESCENDING},
projection={"winner": True},
limit=5,
)
print("/".join(winner_cursor.map(lambda row: row["winner"].upper())))
# CAIO GOZER/BETTA VIGO/ADAM ZUUL
# Some other examples of cursor manipulation
matches_cursor = my_table.find(
sort={"m_vector": DataAPIVector([-0.1, 0.15, 0.3])}
)
matches_cursor.has_next()
# True
next(matches_cursor)
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
matches_cursor.consumed
# 1
matches_cursor.rewind()
matches_cursor.consumed
# 0
matches_cursor.has_next()
# True
matches_cursor.close()
try:
    next(matches_cursor)
except StopIteration:
    print("StopIteration triggered.")
# StopIteration triggered.
For more information, see the Client reference.
Use find to retrieve multiple rows matching your filter and sort criteria.
Find all rows matching a basic equality filter:
await table.find({ matchId: 'challenge6' });
The previous example is shorthand for the $eq (equals) operator.
There are other filter operators you can use, such as the $gte (greater than or equal to) operator:
await table.find({ score: { $gte: 15 } });
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort clause with an indexed vector column and a query vector.
If your table has multiple vector columns, you can only sort on one vector column at a time.
You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
// Provide a pre-generated query vector
await table.find({}, { sort: { mVector: vector([.2, .3, .4]) } });
// Generate a query vector with vectorize
await table.find({}, { sort: { mVector: 'text to vectorize' } });
To run a hybrid search, combine a filter with a vector search (this example also uses a projection):
table.find<{ winner: string }>({ matchId: 'fight4' }, {
sort: { mVector: vector([.2, .3, .4]) },
projection: { winner: 1 },
});
Parameters:
Name | Type | Summary |
---|---|---|
|
|
An object that defines filter criteria using the Data API filter syntax. For more information and examples, see Data API operators. You cannot filter on To perform a vector search, use |
|
|
The options for this operation |
Options (TableFindOptions):
Name | Type | Summary |
---|---|---|
|
|
The |
|
|
Select a subset of columns to include in the response for the returned rows:
If empty or unspecified, the default projection (all columns) is used. DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as Additionally, DataStax recommends providing your own type for the returned rows because projections can break typing guarantees.
If your query includes For more information and examples, see projection clauses. |
|
|
Optionally specify a number of rows to bypass (skip) before returning rows.
The first This parameter is only valid with |
|
|
Limit the total number of rows returned from the table.
The returned cursor stops yielding rows either when it reaches the |
|
|
If true, the returned rows include a If your query includes |
|
|
If true, you can call the You can’t use |
|
|
The client-side timeout for this operation. |
Returns:
TableFindCursor<RSchema> - The FindCursor, which you can configure and iterate over lazily.
Example:
Full script
import { CreateTableDefinition, DataAPIClient, InferTablePrimaryKey, InferTableSchema, timestamp, uuid, vector } from '@datastax/astra-db-ts';
// Instantiate the client and connect to the database
const client = new DataAPIClient();
const db = client.db(process.env.CLIENT_DB_URL!, { token: process.env.CLIENT_DB_TOKEN! });
// Create table schema using bespoke Data API table definition syntax, and then infer the type.
// For information about table typing and definitions, see the documentation for createTable.
const TableDefinition = <const>{
columns: {
matchId: 'text',
round: 'tinyint',
mVector: { type: 'vector', dimension: 3 },
score: 'int',
when: 'timestamp',
winner: 'text',
fighters: { type: 'set', valueType: 'uuid' },
},
primaryKey: {
partitionBy: ['matchId'],
partitionSort: { round: 1 },
},
} satisfies CreateTableDefinition;
type TableSchema = InferTableSchema<typeof TableDefinition>;
(async function () {
// Create a table with the given TableSchema type if a 'games' table doesn't already exist
const table = await db.createTable<TableSchema>('games', { definition: TableDefinition, ifNotExists: true });
// Insert some rows in an unordered fashion.
await table.insertMany([
{ matchId: 'fight4', round: 1, winner: 'Victor', score: 18, when: timestamp('2024-11-28T11:30:00Z'), fighters: new Set([uuid('0193539a-2770-8c09-a32a-111111111111'), uuid('019353e3-00b4-83f9-a127-222222222222')]), mVector: vector([0.4, -0.6, 0.2]) },
{ matchId: 'challenge6', round: 1, winner: 'Donna', mVector: vector([0.9, -0.1, -0.3]) },
{ matchId: 'challenge6', round: 2, winner: 'Erick' },
{ matchId: 'challenge6', round: 3, winner: 'Fiona' },
{ matchId: 'fight5', round: 3, winner: 'Caio Gozer' },
]);
// Create a secondary index on the 'score' column if it doesn't already exist
await table.createIndex('score_idx', 'score', { ifNotExists: true });
// Create a secondary index on the 'winner' column with case-insensitivity if it doesn't already exist
await table.createIndex('winner_idx', 'winner', {
options: {
caseSensitive: false,
},
ifNotExists: true,
});
// Create a vector index on the 'mVector' column if it doesn't already exist
await table.createVectorIndex('m_vector_idx', 'mVector', {
options: {
metric: 'dot_product',
},
ifNotExists: true,
});
// Use findOne and find to query rows in a table, including vector search.
// findOne examples
// Find a row with an exact match on one column.
// Inherently uses the equality ($eq) filter operator.
await table.findOne({ 'matchId': 'challenge6' }).then(console.log);
// If there is no match, the response is null.
await table.findOne({ 'matchId': 'not_real' }).then(console.log);
// Projections optimize bandwidth by returning specified columns instead of the entire row.
// Specify the exact return type to prevent accidental type errors.
// This example prints the values for 'round' and 'winner' in the matching row.
await table.findOne<Pick<TableSchema, 'round' | 'winner'>>({ 'matchId': 'challenge6', round: 1 }, {
projection: { round: 1, winner: 1 },
}).then(console.log);
// Find a row using other filter operators on other indexed columns.
await table.findOne({ score: { $gte: 15 } }).then(console.log);
// (Not recommended) You can filter on a non-indexed column, but it is long-running and inefficient.
// The Data API returns a warning, if you are listening for or logging warnings.
await table.findOne({ when: { $gt: timestamp() } });
// To get any row, pass an empty filter.
await table.findOne({}).then(console.log);
// Use sort to perform a vector search on a vector-indexed column.
await table.findOne<{ winner: string }>({}, {
sort: { mVector: vector([.2, .3, .4]) },
projection: { winner: 1 }
}).then(console.log);
// includeSimilarity returns the similarity score for the vector search.
// If you use a projection, you must include $similarity in the projection type.
await table.findOne<{ winner: string, $similarity: number }>({}, {
sort: { mVector: vector([.2, .3, .4]) },
projection: { winner: 1 },
includeSimilarity: true,
}).then(console.log);
// You can also use sort for regular ascending/descending sorts on a given column.
await table.findOne({ matchId: 'fight5' }, {
sort: { round: -1 },
}).then(console.log);
// find examples
// Find rows with an exact match on one column, inherently using the equality ($eq) filter operator.
// Lazily iterate over results to print output like:
// - (R:1): winner Donna
// - (R:2): winner Erick
// - (R:3): winner Fiona
for await (const row of table.find({ matchId: 'challenge6' })) {
console.log(`(R:${row.round}): winner ${row.winner}`);
}
// Projections optimize bandwidth by returning specified columns instead of entire rows.
// Specify the exact return type to prevent accidental type errors.
type ProjectedSchema = Pick<TableSchema, 'round' | 'winner'>;
for await (const row of table.find({ matchId: 'challenge6' }).project<ProjectedSchema>({ round: 1, winner: 1 })) {
console.log(`(R:${row.round}): winner ${row.winner}`);
}
// Another example of the implied equality operator.
await table.find({ matchId: 'challenge6' }).toArray().then(console.log);
// Find rows using other filter operators on other indexed columns.
// You can filter on non-indexed columns, but this is long-running and inefficient.
await table.find({ score: { $gte: 15 } }).toArray().then(console.log);
// (Not recommended; long-running and inefficient) To get all rows, pass an empty filter.
await table.find({}).toArray().then(console.log);
// Use sort to perform a vector search on a vector-indexed column.
await table.find({})
.sort({ mVector: vector([.2, .3, .4]) })
.project<{ winner: string }>({ winner: 1 })
.toArray()
.then(console.log);
// Use sort and filter together for a hybrid search.
// This example also uses includeSimilarity to return the similarity scores of the vector search.
// If you use a projection, you must include $similarity in the projection type.
await table.find({ matchId: 'fight4' })
.sort({ mVector: vector([.2, .3, .4]) })
.includeSimilarity(true)
.project<{ winner: string, $similarity: number }>({ winner: 1 })
.toArray()
.then(console.log);
// Use sort for regular ascending/descending sorts on a given column.
await table.find({ matchId: 'fight5' })
.sort({ round: -1 })
.toArray()
.then(console.log);
// You can also use mapping.
await table.find({ matchId: 'fight5' })
.sort({ round: -1 })
.limit(5)
.map(row => row.winner.toUpperCase())
.toArray()
.then(console.log);
// Uncomment the following line to drop the table and any related indexes.
// await table.drop();
})();
For more information, see the Client reference.
Run a find with a non-vector filter condition:
TableCursor<Row, Row> row2 = table
.find(eq("match_id", "challenge6"));
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort clause with an indexed vector column and a query vector.
If your table has multiple vector columns, you can only sort on one vector column at a time.
You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
Run a vector search and get an iterable over the returned results:
TableFindOptions options = new TableFindOptions()
.sort(Sort.vector("m_vector", new float[] {0.4f, -0.6f, 0.2f}))
.limit(3);
TableCursor<Row, Row> row = table.find(options);
Run hybrid search with a filter and vector-similarity sorting, apply a projection to the returned results, and then materialize the matches into a list:
Filter filter = eq("match_id", "fight4");
TableFindOptions options = new TableFindOptions()
.projection(include("winner"))
.sort(Sort.vector("m_vector", new float[] {0.2f, 0.3f, 0.4f}));
List<Row> result = table
.find(filter, options)
.toList();
Parameters:
Name | Type | Summary |
---|---|---|
|
A filter expressing which condition the returned rows must satisfy.
You can use filter operators to compare columns with literal values.
Filters can be instantiated with its constructor and specialized with method You cannot filter on To perform a vector search, use |
|
|
A wrapper for the different options and specialization of this search. |
|
|
|
This parameter acts as a formal specifier for the type checker.
If omitted, the resulting cursor is implicitly a |
Name | Type | Summary |
---|---|---|
|
The |
|
|
Select a subset of columns to include in the response for the returned rows:
If empty or unspecified, the default projection (all columns) is used. DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as For more information and examples, see projection clauses. |
|
|
|
If true, the returned rows include a |
|
|
If true, you can call the You can’t use |
|
|
Optionally specify a number of rows to bypass (skip) before returning rows.
The first This parameter is only valid with |
|
|
Limit the total number of rows returned from the table.
The returned cursor stops yielding rows either when it reaches the |
|
|
A timeout, in milliseconds (long), to impose on the underlying API request. If not provided, the Table defaults apply. |
Returns:
TableCursor<T, R> - An object representing the stream of results.
You can manipulate a TableCursor in various ways.
Typically, it is iterated over and yields the search results, handling pagination behind the scenes as needed.
Invoking .toList() on a TableCursor causes it to consume all rows, and then materialize the entire set of results as the returned list.
This is not recommended for queries that return a large number of results.
Example:
package com.datastax.astra.client.tables;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableFindOptions;
import com.datastax.astra.client.tables.cursor.TableCursor;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.List;

import static com.datastax.astra.client.core.query.Filters.eq;
import static com.datastax.astra.client.core.query.Projection.include;

public class FindMany {
  public static void main(String[] args) {
    Database db = new DataAPIClient("token").getDatabase("endpoint");
    Table<Row> table = db.getTable("games");

    // Find with a filter and options, returning generic Row objects
    Filter filter = eq("match_id", "tournamentA");
    TableFindOptions options = new TableFindOptions()
        // .projection(include("match_id", "winner", "field3"))
        .limit(2)
        //.sort(Sort.vector("m_vector", new DataAPIVector(new float[] {0.4f, -0.6f, 0.2f})))
        .includeSortVector(true)
        .includeSimilarity(true);
    TableCursor<Row, Row> rowCursor = table.find(filter, options);
    rowCursor.forEach(r -> {
      System.out.println("Row: " + r);
    });

    // Same query, mapping each row to a Game object.
    // Game is a user-defined class (not shown here) that exposes getWinner().
    TableCursor<Row, Game> gameCursor = table.find(filter, options, Game.class);
    gameCursor.forEach(g -> {
      System.out.println("Game: " + g.getWinner());
    });

    // Find with a filter only, then read the sort vector if one is present
    TableCursor<Row, Row> rowCursor2 = table.find(eq("match_id", "tournamentA"));
    rowCursor2.getSortVector().ifPresent(v -> {
      System.out.println("Sort Vector: " + v);
    });

    // Vector search with a projection; toList() loads every result into memory
    Filter filter3 = eq("match_id", "fight4");
    TableFindOptions options3 = new TableFindOptions()
        .projection(include("winner"))
        .sort(Sort.vector("m_vector", new float[] {0.2f, 0.3f, 0.4f}));
    List<Row> result = table.find(filter3, options3).toList();
  }
}
Use find
to retrieve multiple rows matching your filter
and sort
criteria.
Retrieve multiple rows by a given filter
clause:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_TABLE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"find": {
"filter": {
"COLUMN_NAME": {
"FILTER_OPERATOR": FILTER_VALUE
}
},
"options": {
"skip": 5,
"limit": 10
}
}
}' | jq
The filter
can include one or more columns, filter operators, and values.
For example, the $eq
(equals) operator finds exact matches.
The following example retrieves rows where the customer
column has the name "Jasmine S." and the city "Jersey City":
"find": {
"filter": {
"customer": {
"$eq": {
"name": "Jasmine S.",
"city": "Jersey City"
}
}
}
}
$eq
is the default operator if you do not specify an operator:
"find": {
"filter": {
"customer": { "name": "Jasmine S.", "city": "Jersey City" }
}
}
For more information about filter operators, expand the following Filter operator examples or see Data API operators.
Filter operator examples
Use $ne
to match rows that do not have the given filter value:
"find": {
"filter": {
"state": {
"$ne": "NJ"
}
}
}
Use $in
to match any of a series of specified values:
"find": {
"filter": {
"city": {
"$in": [ "Jersey City", "Orange" ]
}
}
}
Similarly, use $nin
to match rows that don’t have any of the specified values.
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort
clause with an indexed vector
column and a query vector.
If your table has multiple vector
columns, you can only sort
on one vector
column at a time.
You can provide a pre-generated query vector or, if the vector
column has a vectorize integration, you can automatically generate a query vector from a string.
For more information, see Vector type.
# Provide the query vector as an array
"find": {
"sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] }
}
# Provide the query vector with $binary
"find": {
"sort": { "vect_emb": { "$binary": "PczMzb5MzM0+mZma" } }
}
# Generate a query vector with vectorize
"find": {
"sort": { "vect_emb": "Text to vectorize" }
}
# Perform a vector search and include the similarity score in the response
"find": {
"sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] },
"options": {
"includeSimilarity": true
}
}
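The $binary string in the example above decodes to the same three values as the array form, which suggests it is the base64 encoding of the components packed as big-endian 32-bit floats. The following Python sketch illustrates that reading. The helper names encode_vector and decode_vector are hypothetical, and the byte layout is inferred from the example values rather than from a stated specification.

```python
import base64
import struct
from typing import List, Sequence


def encode_vector(values: Sequence[float]) -> str:
    """Pack floats as big-endian float32 and return the base64 text (assumed layout)."""
    packed = struct.pack(f">{len(values)}f", *values)
    return base64.b64encode(packed).decode("ascii")


def decode_vector(encoded: str) -> List[float]:
    """Reverse of encode_vector: base64 text back to a list of floats."""
    raw = base64.b64decode(encoded)
    return list(struct.unpack(f">{len(raw) // 4}f", raw))


# Build a sort clause equivalent to the array form [0.1, -0.2, 0.3]
sort_clause = {"vect_emb": {"$binary": encode_vector([0.1, -0.2, 0.3])}}
```

Decoded values are float32 approximations, so they match the original numbers only to about seven significant digits.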
To perform a hybrid search, use both sort and filter:
"find": {
"sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] },
"filter": { "year": { "$gt": 2000 } }
}
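When you assemble these request bodies in a script rather than by hand, a small builder keeps the filter, sort, and options parts separate. The following Python sketch only constructs the JSON body. The function name build_find_command is hypothetical, and sending the request (endpoint, Token header) is left out.

```python
import json
from typing import Any, Dict, Optional


def build_find_command(
    filter_clause: Optional[Dict[str, Any]] = None,
    sort_clause: Optional[Dict[str, Any]] = None,
    options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a find command body, omitting any clause that was not supplied."""
    find: Dict[str, Any] = {}
    if filter_clause:
        find["filter"] = filter_clause
    if sort_clause:
        find["sort"] = sort_clause
    if options:
        find["options"] = options
    return {"find": find}


# Hybrid search: vector sort combined with a filter on the year column
hybrid = build_find_command(
    filter_clause={"year": {"$gt": 2000}},
    sort_clause={"vect_emb": [0.1, -0.2, 0.3]},
    options={"limit": 10, "includeSimilarity": True},
)
body = json.dumps(hybrid)  # string to pass as the curl --data payload
```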
Parameters:
Name | Type | Summary |
---|---|---|
|
|
The Data API command to retrieve multiple rows in a table based on one or more of |
|
|
An object that defines filter criteria using the Data API filter syntax. For a list of available operators, see Data API operators. You cannot filter on To perform a vector search, use |
|
|
Perform a vector search or set the order in which rows are returned. For more information and examples, see sort clauses and Vector type. |
|
|
Select a subset of columns to include in the response for each returned row.
If empty or unset, the default projection is used.
The default projection includes all columns, but it omits The response always omits For more information and examples, see projection clauses. |
|
|
Specify a number of rows to bypass (skip) before returning rows.
The first This parameter is only valid with |
|
|
Limit the total number of rows returned.
Pagination can occur if more than 20 rows are returned in the current set of matching rows.
Once the |
|
|
If true, the response includes a |
|
|
If true, the response includes the
You can’t use |
Returns:
A successful response can include a data object that contains documents and nextPageState.
- documents is an array of objects where each object represents a row matching the given query. The returned values for each row object depend on the projection and options.
- nextPageState can be null or an ID. If it is an ID, then you can use that ID to fetch the next page of rows that match the filter. If it is null or omitted, then there are no more matches or pages available.
  Some find operations don't paginate, even if there are additional matches. For example:
  - Operations that require in-memory sort, such as allow filtering on non-indexed columns. The Data API returns a warning if this happens.
  - Vector searches. Vector searches return a maximum of 1000 rows, unless you specify a lower limit or your table does not have 1000 rows.
  - Certain combinations of sort and filter options.
{
"data": {
"documents": [
{
"name": "Sami Minh",
"email": "sami@example.com",
"graduated": true,
"graduation_year": 2024
},
{
"name": "Kiran Jay",
"email": "kiran@example.com",
"graduated": true,
"graduation_year": 2024
}
],
"nextPageState": null
}
}
In the event of pagination, you must issue a subsequent request with a pageState
ID to fetch the next page of rows that matched the filter.
As long as there is a subsequent page with matching rows, the response includes a nextPageState
ID, which you use as the pageState
for the subsequent request.
Each paginated request is exactly the same as the original request, except for the addition of the pageState
in the options
object:
{
"find": {
"filter": { "graduation_year": { "$eq": 2024 } },
"options": { "pageState": "NEXT_PAGE_STATE_FROM_PRIOR_RESPONSE" }
}
}
Continue issuing requests with the subsequent pageState
ID until you have fetched all matching rows.
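The pagination loop described above can be sketched in Python as follows. This is a minimal illustration, not a client implementation: post_command is a hypothetical callable that sends one command body to the table endpoint and returns the parsed JSON response, and fake_post is a stand-in that serves two canned pages so the loop can run without a database.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_all_rows(
    post_command: Callable[[Dict[str, Any]], Dict[str, Any]],
    filter_clause: Dict[str, Any],
) -> Iterator[Dict[str, Any]]:
    """Yield every matching row, following nextPageState until it is null or missing."""
    page_state: Optional[str] = None
    while True:
        find: Dict[str, Any] = {"filter": filter_clause}
        if page_state is not None:
            # Same request as before, plus the pageState from the prior response
            find["options"] = {"pageState": page_state}
        data = post_command({"find": find}).get("data", {})
        yield from data.get("documents", [])
        page_state = data.get("nextPageState")
        if not page_state:
            break


def fake_post(body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical transport: returns two canned pages keyed by pageState."""
    state = body["find"].get("options", {}).get("pageState")
    if state is None:
        return {"data": {"documents": [{"name": "Sami Minh"}], "nextPageState": "page-2"}}
    return {"data": {"documents": [{"name": "Kiran Jay"}], "nextPageState": None}}


rows = list(iter_all_rows(fake_post, {"graduation_year": {"$eq": 2024}}))
```

In real use, post_command would wrap an HTTP POST with the Token header, as in the curl examples.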
Example:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace/students" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"find": {
"filter": {
"graduation_year": 2024
}
}
}' | jq