Find rows

This Astra DB Serverless feature is currently in public preview. Development is ongoing, and the features and functionality are subject to change. Astra DB Serverless, and the use of such, is subject to the DataStax Preview Terms.

The Data API tables commands are available through HTTP and the clients.

If you use a client, tables commands are available only in client versions 2.0-preview or later. For more information, see the Data API client upgrade guide.

Find multiple rows that match a query.

For best performance, filter and sort on indexed columns, partition keys, and clustering keys.

Filtering on non-indexed columns can use allow filtering, which is inefficient and resource-intensive, especially for large datasets. With the Data API clients, allow filtering operations can hit the client timeout limit before the underlying HTTP operation is complete.

An empty filter ("filter": {}) does not use allow filtering, but it can still be an inefficient and long-running operation.

Additionally, the Data API can perform in-memory sorting, depending on the columns you sort on, the table’s partitioning structure, and whether the sorted columns are indexed. In-memory sorts can have performance implications.

A row represents a single record of data in a table in an Astra DB Serverless database.

You use the Table class to work with rows through the Data API clients. For instructions to get a Table object, see Work with tables.

For general information about working with rows, including common operations and operators, see Work with rows.

For more information about the Data API and clients, see Get started with the Data API.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the Client reference.

Use find to retrieve rows that match your filter and sort criteria.

Run a find with a non-vector filter condition:

my_table.find({"match_id": "challenge6"})
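The filter above relies on implied equality. As a minimal sketch of what that shorthand means, the following pure-Python helper rewrites shorthand conditions into explicit operator form. The name expand_filter is hypothetical and is not part of astrapy; the Data API interprets the shorthand itself, so you never need to call anything like this.

```python
from typing import Any


def expand_filter(filter_doc: dict[str, Any]) -> dict[str, Any]:
    """Rewrite shorthand equality conditions into explicit operator form.

    Hypothetical helper for illustration only; not part of astrapy.
    """
    conditions = []
    for column, condition in filter_doc.items():
        is_operator_doc = isinstance(condition, dict) and any(
            key.startswith("$") for key in condition
        )
        # A bare literal is shorthand for {"$eq": literal}.
        conditions.append(
            {column: condition if is_operator_doc else {"$eq": condition}}
        )
    if not conditions:
        return {}  # an empty filter stays empty and matches every row
    if len(conditions) == 1:
        return conditions[0]
    # Several top-level conditions are combined with an implicit $and.
    return {"$and": conditions}


print(expand_filter({"match_id": "challenge6"}))
# {'match_id': {'$eq': 'challenge6'}}
print(expand_filter({"match_no": 123, "round": "C"}))
# {'$and': [{'match_no': {'$eq': 123}}, {'round': {'$eq': 'C'}}]}
```

Conditions that already use an operator, such as {"score": {"$gte": 15}}, pass through unchanged.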

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string. For more information, see Vector type.
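To make "most similar" concrete, the following sketch ranks two sample vectors against a query vector by hand. It assumes the dot-product metric and assumes that the similarity score is computed as (1 + dot product) / 2; that mapping is an assumption here, although it is consistent with the 0.515 score shown for Donna in the full script on this page. A real vector search is approximate and runs server-side, so treat this only as a model of the ranking.

```python
def dot_product_similarity(query: list[float], stored: list[float]) -> float:
    """Map a raw dot product into a similarity score.

    Assumption: the (1 + dot) / 2 mapping for the dot-product metric.
    """
    dot = sum(q * s for q, s in zip(query, stored))
    return (1 + dot) / 2


# Rows from the sample "games" table that have a vector value.
rows = [
    {"winner": "Victor", "m_vector": [0.4, -0.6, 0.2]},
    {"winner": "Donna", "m_vector": [0.9, -0.1, -0.3]},
]
query_vector = [0.2, 0.3, 0.4]

# Highest similarity first, as a vector sort would return them.
ranked = sorted(
    rows,
    key=lambda row: dot_product_similarity(query_vector, row["m_vector"]),
    reverse=True,
)
for row in ranked:
    score = dot_product_similarity(query_vector, row["m_vector"])
    print(row["winner"], round(score, 3))
# prints:
#   Donna 0.515
#   Victor 0.49
```

Rows without a value in the vector column have nothing to compare, which is why the sample searches return only two rows even with limit=3.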

Run a vector search and get an iterable over the returned results:

my_table.find(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    limit=3,
)

Run hybrid search with a filter and vector-similarity sorting, apply a projection to the returned results, and then materialize the matches into a list:

my_table.find(
    {"match_id": "fight4"},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
).to_list()
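The projection in the hybrid search above keeps only the winner column. The following sketch models the inclusion, exclusion, and all-columns forms of a projection on a plain dictionary. The helper apply_projection is hypothetical and runs client-side purely for illustration; the Data API applies projections on the server, which is what saves bandwidth.

```python
from typing import Any


def apply_projection(row: dict[str, Any], projection: dict[str, bool]) -> dict[str, Any]:
    """Return the columns of `row` selected by `projection`.

    Hypothetical client-side model; not part of astrapy.
    """
    if not projection or projection.get("*") is True:
        return dict(row)  # empty projection or {"*": True}: all columns
    if all(projection.values()):
        # Inclusion form: keep only the listed columns.
        return {col: val for col, val in row.items() if col in projection}
    if not any(projection.values()):
        # Exclusion form: keep everything except the listed columns.
        return {col: val for col, val in row.items() if col not in projection}
    raise ValueError("This sketch does not mix inclusions and exclusions.")


row = {"match_id": "fight4", "round": 1, "winner": "Victor", "score": 18}
print(apply_projection(row, {"winner": True}))
# {'winner': 'Victor'}
print(apply_projection(row, {"match_id": False, "round": False}))
# {'winner': 'Victor', 'score': 18}
```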

Parameters:

Name Type Summary

filter

dict | None

A dictionary expressing the conditions that the returned rows must satisfy. You can use filter operators to compare columns with literal values. For example:

  • {"match_no": 123}: Uses the implied equality ($eq) operator. Shorthand for {"match_no": {"$eq": 123}}.

  • {"match_no": 123, "round": "C"}: Uses the implied equality operator and combines the two conditions with an implicit $and.

  • (Not recommended) {}: An empty filter returns all rows. This is slow and inefficient.

You cannot filter on map, list, or set columns.

To perform a vector search, use sort instead of filter.

sort

dict | None

This dictionary parameter controls the sorting order and, therefore, the order in which matching rows are returned. The sort parameter can express either a vector search or regular ascending/descending sorting. For more information, see Sort clauses for rows.

projection

dict | None

Select a subset of columns to include in the response for the returned rows:

  • Include only the given columns: {"column1": True, "column2": True}

  • Include all columns except the given columns: {"column1": False, "column2": False}

  • Include all columns (default if empty or unspecified): {"*": True}

DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as vector columns with highly dimensional embeddings.

For more information and examples, see projection clauses.

row_type

type

This parameter acts as a formal specifier for the type checker. If omitted, the resulting cursor is implicitly a TableFindCursor[ROW, ROW], meaning that it maintains the same type for the returned rows as that of the rows in the table. Strictly typed code may want to specify this parameter, especially when a projection is given. For more information, see Typing support.

skip

int | None

Optionally specify a number of rows to bypass (skip) before returning rows. The first n rows matching the query are discarded from the results, and the results begin at the skip+1 row. For example, if skip=5, the first 5 rows are discarded, and the results begin at the 6th row.

This parameter is only valid with sort.

limit

int | None

Limit the total number of rows returned from the table. The returned cursor stops yielding rows either when it reaches the limit or there are no more rows to return.

include_similarity

bool | None

If true, the returned rows include a $similarity key with the numeric similarity score representing the closeness of the sort vector and the row’s vector. This is only valid for vector search (sort on a vector column).

include_sort_vector

bool | None

If true, you can call the get_sort_vector method on the returned cursor to get the vector used for the vector search. The default is false. This is only relevant for vector search (sort on a vector column) when you want to get the sort vector from the returned cursor. This can be useful with vectorize because you don’t know the sort vector in advance.

You can’t use include_sort_vector with find_one, but you can use include_sort_vector and limit=1 with find. However, because vector search is approximate (as in approximate nearest neighbor), the lower your limit, the more likely you are to find an approximate, but not maximal, match.

request_timeout_ms

int

A timeout, in milliseconds, to impose on each individual HTTP request to the Data API to accomplish the operation. If not provided, the Table defaults apply. This parameter is aliased as timeout_ms for convenience.
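The skip and limit parameters described above select a window from the sorted matches. As a minimal sketch, the hypothetical helper below applies the same arithmetic to a list that is already sorted; it is an illustration of the semantics, not how the Data API or astrapy implements them.

```python
from typing import Any, Optional


def window(
    sorted_rows: list[dict[str, Any]],
    skip: Optional[int] = None,
    limit: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Discard the first `skip` rows, then keep at most `limit` rows."""
    start = skip or 0
    end = None if limit is None else start + limit
    return sorted_rows[start:end]


# Hypothetical rows, already sorted by "round" in ascending order.
rows = [{"round": n} for n in range(1, 9)]

print(window(rows, skip=5, limit=2))
# [{'round': 6}, {'round': 7}]
```

With skip=5, the first five rows are discarded and the results begin at the sixth row; limit=2 then stops after two rows.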

Returns:

TableFindCursor - An object of type astrapy.cursors.TableFindCursor, representing the stream of results.

You can manipulate a TableFindCursor in various ways. Typically, it is iterated over and yields the search results, managing pagination transparently as needed. For more information, see FindCursor.

Invoking .to_list() on a TableFindCursor causes it to consume all rows, and then materialize the entire set of results as the returned list. This is not recommended for find operations that can return a large number of results.

While you iterate over a cursor, rows are retrieved in chunks progressively. It is possible for retrieved chunks to reflect real-time changes (inserts, updates, and deletions) on the table.
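The following toy model sketches why iterating lazily can be cheaper than materializing everything. The names FAKE_TABLE, PAGE_SIZE, fetch_page, and iterate_rows are all hypothetical stand-ins; a real TableFindCursor fetches its chunks over HTTP, and its actual chunk size is not assumed here.

```python
from itertools import islice
from typing import Any, Iterator, Optional

# Stand-in for a table; a real cursor would fetch pages over HTTP.
FAKE_TABLE = [{"round": n} for n in range(1, 8)]
PAGE_SIZE = 3  # assumed chunk size, chosen only for this illustration
pages_fetched = 0


def fetch_page(page_state: Optional[int]) -> tuple[list[dict[str, Any]], Optional[int]]:
    """Return one chunk of rows plus the state needed to fetch the next chunk."""
    global pages_fetched
    pages_fetched += 1
    start = page_state or 0
    end = start + PAGE_SIZE
    next_state = end if end < len(FAKE_TABLE) else None
    return FAKE_TABLE[start:end], next_state


def iterate_rows() -> Iterator[dict[str, Any]]:
    """Yield rows one at a time, requesting a new chunk only when needed."""
    page_state: Optional[int] = None
    while True:
        rows, page_state = fetch_page(page_state)
        yield from rows
        if page_state is None:
            return


# Stopping early only costs the chunks that were actually needed.
first_two = list(islice(iterate_rows(), 2))
print(first_two, pages_fetched)
# [{'round': 1}, {'round': 2}] 1

# Materializing everything walks through every chunk.
all_rows = list(iterate_rows())
print(len(all_rows), pages_fetched)
# 7 4
```

In this model, taking two rows triggers one fetch, while building the full list triggers three more. The same reasoning is why to_list() is discouraged for queries that can match many rows.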

Example response
TableFindCursor("games", idle, consumed so far: 0)

Example:

Full script
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")

from astrapy.constants import SortMode
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
)

my_table = database.create_table(
    "games",
    definition=(
        CreateTableDefinition.builder()
        .add_column("match_id", ColumnType.TEXT)
        .add_column("round", ColumnType.TINYINT)
        .add_vector_column("m_vector", dimension=3)
        .add_column("score", ColumnType.INT)
        .add_column("when", ColumnType.TIMESTAMP)
        .add_column("winner", ColumnType.TEXT)
        .add_set_column("fighters", ColumnType.UUID)
        .add_partition_by(["match_id"])
        .add_partition_sort({"round": SortMode.ASCENDING})
        .build()
    ),
)

from astrapy.constants import VectorMetric
from astrapy.info import TableIndexOptions, TableVectorIndexOptions

my_table.create_index(
    "score_index",
    column="score",
)

my_table.create_index(
    "winner_index",
    column="winner",
    options=TableIndexOptions(
        ascii=False,
        normalize=True,
        case_sensitive=False,
    ),
)

my_table.create_vector_index(
    "m_vector_index",
    column="m_vector",
    options=TableVectorIndexOptions(
        metric=VectorMetric.DOT_PRODUCT,
    ),
)

from astrapy.data_types import (
    DataAPISet,
    DataAPITimestamp,
    DataAPIVector,
)
from astrapy.ids import UUID

insert_result = my_table.insert_many(
    [
        {
            "match_id": "fight4",
            "round": 1,
            "winner": "Victor",
            "score": 18,
            "when": DataAPITimestamp.from_string(
                "2024-11-28T11:30:00Z",
            ),
            "fighters": DataAPISet([
                UUID("0193539a-2770-8c09-a32a-111111111111"),
                UUID('019353e3-00b4-83f9-a127-222222222222'),
            ]),
            "m_vector": DataAPIVector([0.4, -0.6, 0.2]),
        },
        {
            "match_id": "challenge6",
            "round": 1,
            "winner": "Donna",
            "m_vector": [0.9, -0.1, -0.3],
        },
        {"match_id": "challenge6", "round": 2, "winner": "Erick"},
        {"match_id": "challenge6", "round": 3, "winner": "Fiona"},
        {"match_id": "fight5", "round": 3, "winner": "Caio Gozer"},
    ],
)

from astrapy.constants import SortMode
from astrapy.data_types import DataAPIVector

# Iterate over results:
for row in my_table.find({"match_id": "challenge6"}):
    print(f"(R:{row['round']}): winner {row['winner']}")
# will print:
#   (R:1): winner Donna
#   (R:2): winner Erick
#   (R:3): winner Fiona

# Optimize bandwidth using a projection:
proj = {"round": True, "winner": True}
for row in my_table.find({"match_id": "challenge6"}, projection=proj):
    print(f"(R:{row['round']}): winner {row['winner']}")
# will print:
#   (R:1): winner Donna
#   (R:2): winner Erick
#   (R:3): winner Fiona

# Filter on the partition key:
my_table.find({"match_id": "challenge6"}).to_list()
# [{'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...

# Filter on primary key:
my_table.find({"match_id": "challenge6", "round": 1}).to_list()
# [{'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...

# Filter on a regular indexed column:
my_table.find({"winner": "Caio Gozer"}).to_list()
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Non-equality filter on a regular indexed column:
my_table.find({"score": {"$gte": 15}}).to_list()
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Filter on a regular non-indexed column:
# (not recommended performance-wise)
my_table.find(
    {"when": {
        "$gte": DataAPITimestamp.from_string("1999-12-31T01:23:44Z")
    }}
).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Empty filter (not recommended performance-wise):
my_table.find({}).to_list()
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Filter on the primary key and a regular non-indexed column:
# (not recommended performance-wise)
my_table.find(
    {"match_id": "fight5", "round": 3, "winner": "Caio Gozer"}
).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Filter on a regular non-indexed column (and incomplete primary key)
# (not recommended performance-wise)
my_table.find({"round": 3, "winner": "Caio Gozer"}).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
    limit=3,
).to_list()
# [{'winner': 'Donna'}, {'winner': 'Victor'}]

# Hybrid search with vector sort and non-vector filtering:
my_table.find(
    {"match_id": "fight4"},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
    limit=3,
).to_list()
# [{'winner': 'Victor'}]

# Return the numeric value of the vector similarity
# (also demonstrating that one can pass a plain list for a vector):
my_table.find(
    {},
    sort={"m_vector": [0.2, 0.3, 0.4]},
    projection={"winner": True},
    limit=3,
    include_similarity=True,
).to_list()
# [{'winner': 'Donna', '$similarity': 0.515}, {'winner': 'Victor', ...

# Non-vector sorting on a 'partitionSort' column:
my_table.find(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
).to_list()
# [{'winner': 'Caio Gozer'}, {'winner': 'Betta Vigo'}, ...

# Using skip and limit:
my_table.find(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
    skip=1,
    limit=2,
).to_list()
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# [{'winner': 'Betta Vigo'}, {'winner': 'Adam Zuul'}]

# Non-vector sorting on a regular column:
# (not recommended performance-wise)
my_table.find(
    {"match_id": "fight5"},
    sort={"winner": SortMode.ASCENDING},
    projection={"winner": True},
).to_list()
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# [{'winner': 'Adam Zuul'}, {'winner': 'Betta Vigo'}, ...

# Using .map() on a cursor:
winner_cursor = my_table.find(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
    limit=5,
)
print("/".join(winner_cursor.map(lambda row: row["winner"].upper())))
# CAIO GOZER/BETTA VIGO/ADAM ZUUL

# Some other examples of cursor manipulation
matches_cursor = my_table.find(
    sort={"m_vector": DataAPIVector([-0.1, 0.15, 0.3])}
)
matches_cursor.has_next()
# True
next(matches_cursor)
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
matches_cursor.consumed
# 1
matches_cursor.rewind()
matches_cursor.consumed
# 0
matches_cursor.has_next()
# True
matches_cursor.close()
try:
    next(matches_cursor)
except StopIteration:
    print("StopIteration triggered.")
# StopIteration triggered.

For more information, see the Client reference.

Use find to retrieve multiple rows matching your filter and sort criteria.

Find all rows matching a basic equality filter:

await table.find({ matchId: 'challenge6' });

The previous example is shorthand for the $eq (equals) operator. There are other filter operators you can use, such as the $gte (greater than or equal to) operator:

await table.find({ score: { $gte: 15 } });

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string. For more information, see Vector type.

// Provide a pre-generated query vector
await table.find({}, { sort: { mVector: vector([.2, .3, .4]) } });

// Generate a query vector with vectorize
await table.find({}, { sort: { mVector: 'text to vectorize' } });

To run a hybrid search, combine a filter with a vector search (this example also uses a projection):

table.find<{ winner: string }>({ matchId: 'fight4' }, {
  sort: { mVector: vector([.2, .3, .4]) },
  projection: { winner: 1 },
});

Parameters:

Name Type Summary

filter

TableFilter

An object that defines filter criteria using the Data API filter syntax. For more information and examples, see Data API operators.

You cannot filter on map, list, or set columns.

To perform a vector search, use sort instead of filter.

options?

TableFindOptions

The options for this operation.

Options (TableFindOptions):

Name Type Summary

sort?

Sort

The sort parameter can express either a vector search or regular ascending/descending sorting. For more information, see Sort clauses for rows.

projection?

Projection

Select a subset of columns to include in the response for the returned rows:

  • Include only the given columns: {column1: 1, column2: 1}

  • Include all columns except the given columns: {column1: 0, column2: 0}

If empty or unspecified, the default projection (all columns) is used.

DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as vector columns with highly dimensional embeddings.

Additionally, DataStax recommends providing your own type for the returned rows because projections can break typing guarantees. If your query uses both projection and includeSimilarity, then you must include $similarity in the type of the returned rows.

For more information and examples, see projection clauses.

skip?

number

Optionally specify a number of rows to bypass (skip) before returning rows. The first n rows matching the query are discarded from the results, and the results begin at the skip+1 row. For example, if skip=5, the first 5 rows are discarded, and the results begin at the 6th row.

This parameter is only valid with sort.

limit?

number

Limit the total number of rows returned from the table. The returned cursor stops yielding rows either when it reaches the limit or there are no more rows to return.

includeSimilarity?

boolean

If true, the returned rows include a $similarity key with the numeric similarity score representing the closeness of the sort vector and the row’s vector. This is only valid for vector search (sort on a vector column).

If your query includes projection, then you must manually include $similarity in the type of the returned row. If you don’t include projection, then $similarity is inferred to be a part of the returned row. For more information, see sort clauses and projection clauses.

includeSortVector?

boolean

If true, you can call the cursor.getSortVector() method to get the vector used for the vector search. The default is false. This is only relevant for vector search (sort on a vector column) when you want to get the sort vector from the returned cursor. This can be useful with vectorize because you don’t know the sort vector in advance.

You can’t use includeSortVector with findOne, but you can use includeSortVector and limit(1) with find. However, because vector search is approximate (as in approximate nearest neighbor), the lower your limit, the more likely you are to find an approximate, but not maximal, match.

timeout?

number | TimeoutDescriptor

The client-side timeout for this operation.

Returns:

TableFindCursor<RSchema> - The FindCursor, which you can configure and iterate over lazily.

Example:

Full script
import { CreateTableDefinition, DataAPIClient, InferTablePrimaryKey, InferTableSchema, timestamp, uuid, vector } from '@datastax/astra-db-ts';

// Instantiate the client and connect to the database
const client = new DataAPIClient();
const db = client.db(process.env.CLIENT_DB_URL!, { token: process.env.CLIENT_DB_TOKEN! });

// Create table schema using bespoke Data API table definition syntax, and then infer the type.
// For information about table typing and definitions, see the documentation for createTable.
const TableDefinition = <const>{
  columns: {
    matchId: 'text',
    round: 'tinyint',
    mVector: { type: 'vector', dimension: 3 },
    score: 'int',
    when: 'timestamp',
    winner: 'text',
    fighters: { type: 'set', valueType: 'uuid' },
  },
  primaryKey: {
    partitionBy: ['matchId'],
    partitionSort: { round: 1 },
  },
} satisfies CreateTableDefinition;

type TableSchema = InferTableSchema<typeof TableDefinition>;

(async function () {
  // Create a table with the given TableSchema type if a 'games' table doesn't already exist
  const table = await db.createTable<TableSchema>('games', { definition: TableDefinition, ifNotExists: true });

  // Insert some rows in an unordered fashion.
  await table.insertMany([
    { matchId: 'fight4', round: 1, winner: 'Victor', score: 18, when: timestamp('2024-11-28T11:30:00Z'), fighters: new Set([uuid('0193539a-2770-8c09-a32a-111111111111'), uuid('019353e3-00b4-83f9-a127-222222222222')]), mVector: vector([0.4, -0.6, 0.2]) },
    { matchId: 'challenge6', round: 1, winner: 'Donna', mVector: vector([0.9, -0.1, -0.3]) },
    { matchId: 'challenge6', round: 2, winner: 'Erick' },
    { matchId: 'challenge6', round: 3, winner: 'Fiona' },
    { matchId: 'fight5', round: 3, winner: 'Caio Gozer' },
  ]);

  // Create a secondary index on the 'score' column if it doesn't already exist
  await table.createIndex('score_idx', 'score', { ifNotExists: true });

  // Create a secondary index on the 'winner' column with case-insensitivity if it doesn't already exist
  await table.createIndex('winner_idx', 'winner', {
    options: {
      caseSensitive: false,
    },
    ifNotExists: true,
  });

  // Create a vector index on the 'mVector' column if it doesn't already exist
  await table.createVectorIndex('m_vector_idx', 'mVector', {
    options: {
      metric: 'dot_product',
    },
    ifNotExists: true,
  });

// Use findOne and find to query rows in a table, including vector search.

// findOne examples

  // Find a row with an exact match on one column.
  // Inherently uses the equality ($eq) filter operator.
  await table.findOne({ 'matchId': 'challenge6' }).then(console.log);

  // If there is no match, the response is null.
  await table.findOne({ 'matchId': 'not_real' }).then(console.log);

  // Projections optimize bandwidth by returning specified columns instead of the entire row.
  // Specify the exact return type to prevent accidental type errors.
  // This example prints the values for 'round' and 'winner' in the matching row.
  await table.findOne<Pick<TableSchema, 'round' | 'winner'>>({ 'matchId': 'challenge6', round: 1 }, {
    projection: { round: 1, winner: 1 },
  }).then(console.log);

  // Find a row using other filter operators on other indexed columns.
  await table.findOne({ score: { $gte: 15 } }).then(console.log);

  // (Not recommended) You can filter on a non-indexed column, but it is long-running and inefficient.
  // The Data API returns a warning, if you are listening for or logging warnings.
  await table.findOne({ when: { $gt: timestamp() } });

  // To get any row, pass an empty filter.
  await table.findOne({}).then(console.log);

  // Use sort to perform a vector search on a vector-indexed column.
  await table.findOne<{ winner: string }>({}, {
    sort: { mVector: vector([.2, .3, .4]) },
    projection: { winner: 1 }
  }).then(console.log);

  // includeSimilarity returns the similarity score for the vector search.
  // If you use a projection, you must include $similarity in the projection type.
  await table.findOne<{ winner: string, $similarity: number }>({}, {
    sort: { mVector: vector([.2, .3, .4]) },
    projection: { winner: 1 },
    includeSimilarity: true,
  }).then(console.log);

  // You can also use sort for regular ascending/descending sorts on a given column.
  await table.findOne({ matchId: 'fight5' }, {
    sort: { round: -1 },
  }).then(console.log);

// find examples

  // Find rows with an exact match on one column, inherently using the equality ($eq) filter operator.
  // Lazily iterate over results to print output like:
  // - (R:1): winner Donna
  // - (R:2): winner Erick
  // - (R:3): winner Fiona
  for await (const row of table.find({ matchId: 'challenge6' })) {
    console.log(`(R:${row.round}): winner ${row.winner}`);
  }

  // Projections optimize bandwidth by returning specified columns instead of entire rows.
  // Specify the exact return type to prevent accidental type errors.
  type ProjectedSchema = Pick<TableSchema, 'round' | 'winner'>;

  for await (const row of table.find({ matchId: 'challenge6' }).project<ProjectedSchema>({ round: 1, winner: 1 })) {
    console.log(`(R:${row.round}): winner ${row.winner}`);
  }

  // Another example of the implied equality operator.
  await table.find({ matchId: 'challenge6' }).toArray().then(console.log);

  // Find rows using other filter operators on other indexed columns.
  // You can filter on non-indexed columns, but this is long-running and inefficient.
  await table.find({ score: { $gte: 15 } }).toArray().then(console.log);

  // (Not recommended; long-running and inefficient) To get all rows, pass an empty filter.
  await table.find({}).toArray().then(console.log);

  // Use sort to perform a vector search on a vector-indexed column.
  await table.find({})
    .sort({ mVector: vector([.2, .3, .4]) })
    .project<{ winner: string }>({ winner: 1 })
    .toArray()
    .then(console.log);

  // Use sort and filter together for a hybrid search.
  // This example also uses includeSimilarity to return the similarity scores of the vector search.
  // If you use a projection, you must include $similarity in the projection type.
  await table.find({ matchId: 'fight4' })
    .sort({ mVector: vector([.2, .3, .4]) })
    .includeSimilarity(true)
    .project<{ winner: string, $similarity: number }>({ winner: 1 })
    .toArray()
    .then(console.log);

  // Use sort for regular ascending/descending sorts on a given column.
  await table.find({ matchId: 'fight5' })
    .sort({ round: -1 })
    .toArray()
    .then(console.log);

  // You can also use mapping.
  await table.find({ matchId: 'fight5' })
    .sort({ round: -1 })
    .limit(5)
    .map(row => row.winner.toUpperCase())
    .toArray()
    .then(console.log);

  // Uncomment the following line to drop the table and any related indexes.
  // await table.drop();
})();

For more information, see the Client reference.

Run a find with a non-vector filter condition:

TableCursor<Row, Row> row2 = table
  .find(eq("match_id", "challenge6"));

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string. For more information, see Vector type.

Run a vector search and get an iterable over the returned results:

TableFindOptions options = new TableFindOptions()
 .sort(Sort.vector("m_vector", new float[] {0.4f, -0.6f, 0.2f}))
 .limit(3);

TableCursor<Row, Row> row = table.find(options);

Run hybrid search with a filter and vector-similarity sorting, apply a projection to the returned results, and then materialize the matches into a list:

Filter filter = eq("match_id", "fight4");

TableFindOptions options = new TableFindOptions()
 .projection(include("winner"))
 .sort(Sort.vector("m_vector", new float[] {0.2f, 0.3f, 0.4f}));

List<Row> result = table
  .find(filter, options)
  .toList();

Parameters:

Name Type Summary

filter

Filter

A filter expressing the conditions that the returned rows must satisfy. You can use filter operators to compare columns with literal values. You can instantiate a Filter with its constructor and refine it with the where(..) method, or you can build one with the Filters helper class.

You cannot filter on map, list, or set columns.

To perform a vector search, use sort instead of filter.

options

TableFindOptions

A wrapper for the different options and specialization of this search.

rowClass

Class<?>

This parameter acts as a formal specifier for the type checker. If omitted, the resulting cursor is implicitly a TableCursor<T, T>, meaning that the returned rows have the same type as the rows in the table itself. Strictly typed code may want to specify this parameter, especially when a projection is given. For related information, albeit in the context of the Python client, see Typing support.

Name Type Summary

sort

Sort

The sort parameter can express either a vector search or regular ascending/descending sorting. For more information, see Sort clauses for rows.

projection

Projection

Select a subset of columns to include in the response for the returned rows:

  • Include only the given columns: Projection.include("column1","column2")

  • Include all columns except the given columns: Projection.exclude("column1","column2")

If empty or unspecified, the default projection (all columns) is used.

DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as vector columns with highly dimensional embeddings.

For more information and examples, see projection clauses.

includeSimilarity

boolean

If true, the returned rows include a $similarity key with the numeric similarity score representing the closeness of the sort vector and the row’s vector. This is only valid for vector search (sort on a vector column).

includeSortVector

boolean

If true, you can call the getSortVector() method on the returned cursor to get the vector used for the vector search. The default is false. This is only relevant for vector search (sort on a vector column) when you want to get the sort vector from the returned cursor. This can be useful with vectorize because you don’t know the sort vector in advance.

You can’t use includeSortVector with findOne, but you can use includeSortVector and limit(1) with find. However, because vector search is approximate (as in approximate nearest neighbor), the lower your limit, the more likely you are to find an approximate, but not maximal, match.

skip

int

Optionally specify a number of rows to bypass (skip) before returning rows. The first n rows matching the query are discarded from the results, and the results begin at the skip+1 row. For example, if skip(5), the first 5 rows are discarded, and the results begin at the 6th row.

This parameter is only valid with sort.

limit

int

Limit the total number of rows returned from the table. The returned cursor stops yielding rows either when it reaches the limit or there are no more rows to return.

timeout

long or Duration

A timeout to impose on the underlying API request, given either as a number of milliseconds (long) or as a Duration. If not provided, the Table defaults apply.

Returns:

TableCursor<T, R> - An object representing the stream of results.

You can manipulate a TableCursor in various ways. Typically, you iterate over it to get the search results, and the cursor handles pagination transparently as needed.

Invoking .toList() on a TableCursor causes it to consume all rows and then materialize the entire set of results as the returned list. This is not recommended for queries that return a large number of results.

Example:

package com.datastax.astra.client.tables;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableFindOptions;
import com.datastax.astra.client.tables.cursor.TableCursor;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.List;

import static com.datastax.astra.client.core.query.Filters.eq;
import static com.datastax.astra.client.core.query.Projection.include;

public class FindMany {
  public static void main(String[] args) {
    // Replace "token" and "endpoint" with your application token and API endpoint.
    Database db = new DataAPIClient("token").getDatabase("endpoint");

    Table<Row> table = db.getTable("games");

    Filter filter = eq("match_id", "tournamentA");

    // Hybrid search: filter on match_id, sort by vector similarity, and return at most 2 rows.
    // includeSimilarity and includeSortVector are only meaningful with a vector sort.
    TableFindOptions options = new TableFindOptions()
      // Optionally restrict the returned columns:
      // .projection(include("match_id", "winner", "field3"))
      .limit(2)
      .sort(Sort.vector("m_vector", new float[] {0.4f, -0.6f, 0.2f}))
      .includeSortVector(true)
      .includeSimilarity(true);

    // Iterate over the results as untyped rows.
    TableCursor<Row, Row> row = table.find(filter, options);
    row.forEach(r -> {
      System.out.println("Row: " + r);
    });

    // Map the results to a typed object.
    // Game is assumed to be a user-defined class (not shown) whose fields map to the table's columns.
    TableCursor<Row, Game> gameCursor = table.find(filter, options, Game.class);
    gameCursor.forEach(g -> {
      System.out.println("Game: " + g.getWinner());
    });

    // Get the sort vector, if any, from a cursor.
    // A sort vector is only available for vector searches that set includeSortVector(true).
    TableCursor<Row, Row> row2 = table.find(eq("match_id", "tournamentA"));
    row2.getSortVector().ifPresent(v -> {
      System.out.println("Sort Vector: " + v);
    });

    // Hybrid search with a projection, materialized into a list.
    Filter filter3 = eq("match_id", "fight4");
    TableFindOptions options3 = new TableFindOptions()
      .projection(include("winner"))
      .sort(Sort.vector("m_vector", new float[] {0.2f, 0.3f, 0.4f}));
    List<Row> result = table.find(filter3, options3).toList();
  }
}

Use find to retrieve multiple rows matching your filter and sort criteria.

Retrieve multiple rows by a given filter clause:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_TABLE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": {
      "COLUMN_NAME": {
        "FILTER_OPERATOR": FILTER_VALUE
      }
    },
    "options": {
      "skip": 5,
      "limit": 10
    }
  }
}' | jq

The filter can include one or more columns, filter operators, and values.

For example, the $eq (equals) operator finds exact matches. The following example retrieves rows where the customer column has the name "Jasmine S." and the city "Jersey City":

"find": {
  "filter": {
    "customer": {
      "$eq": {
        "name": "Jasmine S.",
        "city": "Jersey City"
      }
    }
  }
}

$eq is the default operator if you do not specify an operator:

"find": {
  "filter": {
    "customer": { "name": "Jasmine S.", "city": "Jersey City" }
  }
}

For more information about filter operators, expand the following Filter operator examples or see Data API operators.

Filter operator examples

Use $ne to match rows that do not have the given filter value:

"find": {
  "filter": {
    "state": {
      "$ne": "NJ"
    }
  }
}

Use $in to match any of a series of specified values:

"find": {
  "filter": {
    "city": {
      "$in": [ "Jersey City", "Orange" ]
    }
  }
}

Similarly, use $nin to match rows that don’t have any of the specified values.
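
To make the semantics of these operators concrete, the following minimal sketch evaluates $eq, $ne, $in, and $nin against rows held in memory. It is an illustration only and says nothing about how the Data API evaluates filters on the server. The matches helper and the sample rows are hypothetical.

```python
from typing import Any

# Hypothetical in-memory model of four Data API filter operators.
OPERATORS = {
    "$eq": lambda value, operand: value == operand,
    "$ne": lambda value, operand: value != operand,
    "$in": lambda value, operand: value in operand,
    "$nin": lambda value, operand: value not in operand,
}


def matches(row: dict[str, Any], filter_clause: dict[str, Any]) -> bool:
    """Return True if the row satisfies every column condition in the filter."""
    for column, condition in filter_clause.items():
        # A bare value is shorthand for {"$eq": value}.
        is_operator_object = isinstance(condition, dict) and any(
            key.startswith("$") for key in condition
        )
        if not is_operator_object:
            condition = {"$eq": condition}
        for operator, operand in condition.items():
            if not OPERATORS[operator](row.get(column), operand):
                return False
    return True


# Sample data, invented for this sketch.
rows = [
    {"name": "A", "city": "Jersey City", "state": "NJ"},
    {"name": "B", "city": "Orange", "state": "NJ"},
    {"name": "C", "city": "Albany", "state": "NY"},
]

not_nj = [r["name"] for r in rows if matches(r, {"state": {"$ne": "NJ"}})]
in_cities = [
    r["name"] for r in rows if matches(r, {"city": {"$in": ["Jersey City", "Orange"]}})
]
print(not_nj)     # ['C']
print(in_cities)  # ['A', 'B']
```

Each column maps to either a bare value (implied $eq) or an object keyed by operator, which is the same shape the request body uses.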

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string. For more information, see Vector type.

# Provide the query vector as an array
"find": {
  "sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] }
}

# Provide the query vector with $binary
"find": {
  "sort": { "vect_emb": { "$binary": "PczMzb5MzM0+mZma" } }
}

# Generate a query vector with vectorize
"find": {
  "sort": { "vect_emb": "Text to vectorize" }
}

# Perform a vector search and include the similarity score in the response
"find": {
  "sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] },
  "options": {
    "includeSimilarity": true
  }
}
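
The $binary string in the example above decodes to the same three values as the array form when its bytes are read as big-endian 32-bit floats. Assuming that encoding, the following sketch shows one way to produce and check such a string with the Python standard library. The encode_vector and decode_vector helpers are hypothetical names for illustration and are not part of any client.

```python
import base64
import struct


def encode_vector(values: list[float]) -> dict[str, str]:
    """Pack floats as big-endian float32 and wrap the base64 text in a $binary object."""
    packed = struct.pack(f">{len(values)}f", *values)
    return {"$binary": base64.b64encode(packed).decode("ascii")}


def decode_vector(binary: dict[str, str]) -> list[float]:
    """Reverse encode_vector: base64-decode, then unpack big-endian float32 values."""
    raw = base64.b64decode(binary["$binary"])
    return list(struct.unpack(f">{len(raw) // 4}f", raw))


encoded = encode_vector([0.1, -0.2, 0.3])
print(encoded)  # {'$binary': 'PczMzb5MzM0+mZma'}

# Values come back with float32 precision, so round before comparing.
print([round(x, 6) for x in decode_vector(encoded)])  # [0.1, -0.2, 0.3]
```

The binary form is more compact than a JSON array of numbers, which can matter for vectors with many dimensions.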

To perform a hybrid search, use both sort and filter:

"find": {
  "sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] },
  "filter": { "year": { "$gt": 2000 } }
}
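
Conceptually, a hybrid search narrows the candidate rows with the filter and then ranks the remaining rows by similarity to the query vector. The following sketch models that idea with an exact, brute-force ranking over rows in memory. It assumes cosine similarity as the metric, whereas the database performs an approximate search with whatever metric the vector index defines. The hybrid_search helper and the sample rows are hypothetical.

```python
import math
from typing import Any, Callable


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def hybrid_search(
    rows: list[dict[str, Any]],
    predicate: Callable[[dict[str, Any]], bool],
    vector_column: str,
    query_vector: list[float],
    limit: int,
) -> list[dict[str, Any]]:
    """Keep rows that pass the filter, then return the most similar ones first."""
    candidates = [row for row in rows if predicate(row)]
    candidates.sort(
        key=lambda row: cosine_similarity(row[vector_column], query_vector),
        reverse=True,
    )
    return candidates[:limit]


# Sample data, invented for this sketch.
rows = [
    {"title": "a", "year": 1999, "vect_emb": [0.1, -0.2, 0.3]},
    {"title": "b", "year": 2005, "vect_emb": [0.2, -0.4, 0.6]},
    {"title": "c", "year": 2010, "vect_emb": [-0.1, 0.2, -0.3]},
    {"title": "d", "year": 2020, "vect_emb": [0.3, 0.0, 0.1]},
]

top = hybrid_search(rows, lambda r: r["year"] > 2000, "vect_emb", [0.1, -0.2, 0.3], limit=2)
print([row["title"] for row in top])  # ['b', 'd']
```

Row "a" matches the query vector exactly, but the filter excludes it, so it never reaches the ranking step.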

Parameters:

Name Type Summary

find

command

The Data API command to retrieve multiple rows in a table based on one or more of filter, sort, projection, and options.

filter

object

An object that defines filter criteria using the Data API filter syntax. For a list of available operators, see Data API operators.

You cannot filter on map, list, or set columns.

To perform a vector search, use sort instead of filter.

sort

object

Perform a vector search or set the order in which rows are returned. For more information and examples, see sort clauses and Vector type.

projection

object

Select a subset of columns to include in the response for each returned row. If empty or unset, the default projection is used. The default projection includes all columns, but it omits null values.

The response always omits null values, even if you include them in projection.

For more information and examples, see projection clauses.

skip

integer

Specify a number of rows to bypass (skip) before returning rows. The first n rows matching the query are discarded from the results, and the results begin at the skip+1 row. For example, if "skip": 5, the first 5 rows are discarded, and the results begin at the 6th row.

This parameter is only valid with sort.

limit

integer

Limit the total number of rows returned. Pagination can occur if more than 20 rows are returned in the current set of matching rows. Once the limit is reached, either in a single response or the last page of a paginated response, nothing more is returned.

options.includeSimilarity

boolean

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the row’s vector. This is only valid for vector search (sort on a vector column).

options.includeSortVector

boolean

If true, the response includes the sortVector. The default is false. This is only relevant for vector search (sort on a vector column) when you want the response to include the sort vector. This can be useful with vectorize because you don’t know the sort vector in advance.

"find": {
  "sort": { "vect_emb": "some string" },
  "options": {
    "includeSortVector": true
  }
}

You can’t use includeSortVector with findOne, but you can use includeSortVector and limit: 1 with find. However, because vector search is approximate (as in approximate nearest neighbor), the lower your limit, the more likely you are to find an approximate, but not maximal, match.
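
The parameters above sit at two levels of the request body: filter, sort, and projection are direct members of find, whereas skip, limit, includeSimilarity, and includeSortVector belong in the options object. The following sketch assembles a request body with that layout and rejects skip without sort, per the rule above. The build_find_command helper is a hypothetical convenience for illustration and is not part of any client.

```python
import json
from typing import Any, Optional


def build_find_command(
    filter_clause: Optional[dict[str, Any]] = None,
    sort: Optional[dict[str, Any]] = None,
    projection: Optional[dict[str, Any]] = None,
    skip: Optional[int] = None,
    limit: Optional[int] = None,
    include_similarity: bool = False,
    include_sort_vector: bool = False,
) -> dict[str, Any]:
    """Assemble a find request body, placing each parameter at its expected level."""
    if skip is not None and not sort:
        raise ValueError("skip is only valid with sort")

    command: dict[str, Any] = {}
    if filter_clause is not None:
        command["filter"] = filter_clause
    if sort:
        command["sort"] = sort
    if projection:
        command["projection"] = projection

    # Paging and vector search flags go in the nested options object.
    options: dict[str, Any] = {}
    if skip is not None:
        options["skip"] = skip
    if limit is not None:
        options["limit"] = limit
    if include_similarity:
        options["includeSimilarity"] = True
    if include_sort_vector:
        options["includeSortVector"] = True
    if options:
        command["options"] = options

    return {"find": command}


command = build_find_command(
    filter_clause={"year": {"$gt": 2000}},
    sort={"vect_emb": [0.1, -0.2, 0.3]},
    limit=10,
    include_similarity=True,
)
print(json.dumps(command, indent=2))
```

You can pass the printed JSON as the --data payload of the curl command.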

Returns:

A successful response can include a data object that contains documents and nextPageState.

  • documents is an array of objects where each object represents a row matching the given query. The returned values for each row object depend on the projection and options.

  • nextPageState can be null or an ID. If it is an ID, then you can use that ID to fetch the next page of rows that match the filter. If it is null or omitted, then there are no more matches or pages available.

    Some find operations don’t paginate, even if there are additional matches. For example:

    • Operations that require in-memory sort, such as allow filtering on non-indexed columns. The Data API returns a warning if this happens.

    • Vector searches. Vector searches return a maximum of 1000 rows, unless you specify a lower limit or your table does not have 1000 rows.

    • Certain combinations of sort and filter options.

{
  "data": {
    "documents": [
      {
        "name": "Sami Minh",
        "email": "sami@example.com",
        "graduated": true,
        "graduation_year": 2024
      },
      {
        "name": "Kiran Jay",
        "email": "kiran@example.com",
        "graduated": true,
        "graduation_year": 2024
      }
    ],
    "nextPageState": null
  }
}

If the results are paginated, you must issue a subsequent request with a pageState ID to fetch the next page of rows that match the filter. As long as there is a subsequent page with matching rows, the response includes a nextPageState ID, which you use as the pageState for the subsequent request.

Each paginated request is exactly the same as the original request, except for the addition of the pageState in the options object:

{
  "find": {
    "filter": { "graduation_year": { "$eq": 2024 } },
    "options": { "pageState": "NEXT_PAGE_STATE_FROM_PRIOR_RESPONSE" }
  }
}

Continue issuing requests with the subsequent pageState ID until you have fetched all matching rows.
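
The pagination loop described above can be sketched as a generator that repeats the same find body, copying each nextPageState into options.pageState until the response returns none. The send argument is a hypothetical stand-in for whatever function posts the JSON body to the Data API and returns the parsed response. Here, fake_send simulates two pages so that the sketch runs without a database.

```python
from typing import Any, Callable, Iterator


def find_all_pages(
    send: Callable[[dict[str, Any]], dict[str, Any]],
    find_body: dict[str, Any],
) -> Iterator[dict[str, Any]]:
    """Yield every row for a find command, following nextPageState across pages."""
    page_state = None
    while True:
        # Repeat the original request, adding pageState for every page after the first.
        body = dict(find_body)
        options = dict(body.get("options", {}))
        if page_state is not None:
            options["pageState"] = page_state
        if options:
            body["options"] = options

        data = send({"find": body}).get("data", {})
        yield from data.get("documents", [])

        page_state = data.get("nextPageState")
        if not page_state:  # null or omitted: no more pages
            return


def fake_send(payload: dict[str, Any]) -> dict[str, Any]:
    """Simulated endpoint with two pages of results (invented data)."""
    pages = {None: (["r1", "r2"], "p2"), "p2": (["r3"], None)}
    state = payload["find"].get("options", {}).get("pageState")
    names, next_state = pages[state]
    return {
        "data": {
            "documents": [{"name": name} for name in names],
            "nextPageState": next_state,
        }
    }


names = [row["name"] for row in find_all_pages(fake_send, {"filter": {"graduation_year": 2024}})]
print(names)  # ['r1', 'r2', 'r3']
```

Because the function is a generator, a caller that stops iterating early does not trigger requests for the remaining pages.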

Example:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace/students" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": {
      "graduation_year": 2024
    }
  }
}' | jq
