Find rows reference

This Astra DB Serverless feature is currently in public preview. Development is ongoing, and the features and functionality are subject to change. Astra DB Serverless, and the use of such, is subject to the DataStax Preview Terms.

A row represents a single record of data in a table in a Astra DB Serverless database. You use the Table class to work with rows through the Data API clients. For instructions to get a Table object, see Work with tables.

For general information about working with rows, including common operations and operators, see Work with rows.

Prerequisites

Required client version

The Data API tables commands are available through HTTP and the clients. If you use a client, tables commands are available only in client versions 2.0-preview or later. For more information, see Data API client upgrade guide.

Find a row

Retrieve a single row from a table using various filter and query options.

For best performance, filter and sort on indexed columns, partition keys, and clustering keys.

Filtering on non-indexed columns can use allow filtering, which is inefficient and resource-intensive, especially for large datasets. With the Data API clients, allow filtering operations can hit the client timeout limit before the underlying HTTP operation is complete.

An empty filter ("filter": {}) does not use allow filtering, but it can still be an inefficient and long-running operation.

Additionally, the Data API can perform in-memory sorting, depending on the columns you sort on, the table’s partitioning structure, and whether the sorted columns are indexed. In-memory sorts can have performance implications.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the Client reference.

Use find_one to retrieve a single row matching your filter and sort criteria.

Retrieve a single row from a table that matches a given column and value:

my_table.find_one({"winner": "Caio Gozer"})

The previous example is shorthand for the $eq (equals) operator. There are other filter operators you can use, such as the $gte (greater than or equal to) operator:

my_table.find_one({"score": {"$gte": 15}})

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string. For more information, see Vector type.

# Provide a pre-generated query vector
my_table.find_one({}, sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])})

# Generate a query vector with vectorize
my_table.find_one({}, sort={"m_vector": "Text to vectorize"})

Parameters:

Name Type Summary

filter

dict | None

A dictionary expressing which condition the returned row must satisfy. You can use filter operators to compare columns with literal values. For example:

  • {}: An empty filter arbitrarily returns any one row.

  • {"match_no": 123}: Uses the implied equality ($eq) operator. Shorthand for {"match_no": {"$eq": 123}}.

  • {"match_no": 123, "round": "C"}: Uses the implied equality operator and combines the two conditions with an implicit $and.

You cannot filter on map, list, or set columns.

To perform a vector search, use sort instead of filter.

sort

dict | None

This dictionary parameter controls the sorting order and, therefore, determines which row is returned if there are multiple matches. The sort parameter can express either a vector search or regular ascending/descending sorting. For more information, see Sort clauses for rows.

projection

dict | None

Select a subset of columns to include in the response for the returned row:

  • Include only the given columns: {"column1": True, "column2": True}

  • Include all columns except the given columns: {"column1": False, "column2": False}

  • Include all columns (default if empty or unspecified): {"*": True}

DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as vector columns with highly dimensional embeddings.

For more information and examples, see Projection clauses.

include_similarity

bool | None

If true, the returned row includes a $similarity key with the numeric similarity score representing the closeness of the sort vector and the row’s vector. This is only valid for vector search (sort on a vector column).

general_method_timeout_ms

int | None

A timeout, in milliseconds, to impose on the underlying API request. If not provided, the Table defaults apply. This parameter is aliased as request_timeout_ms and timeout_ms for convenience.

Returns:

dict | None - A dictionary expressing the result if a row is found, otherwise None.

Example response

The following example response is reformatted for clarity:

{
  'match_id': 'challenge6',
  'round': 1,
  'fighters': DataAPISet([]),
  'm_vector': DataAPIVector([0.9, -0.1, -0.3]),
  'score': None,
  'when': None,
  'winner': 'Donna'
}

Example:

Full script
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")

from astrapy.constants import SortMode
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
)

my_table = database.create_table(
    "games",
    definition=(
        CreateTableDefinition.builder()
        .add_column("match_id", ColumnType.TEXT)
        .add_column("round", ColumnType.TINYINT)
        .add_vector_column("m_vector", dimension=3)
        .add_column("score", ColumnType.INT)
        .add_column("when", ColumnType.TIMESTAMP)
        .add_column("winner", ColumnType.TEXT)
        .add_set_column("fighters", ColumnType.UUID)
        .add_partition_by(["match_id"])
        .add_partition_sort({"round": SortMode.ASCENDING})
        .build()
    ),
)

from astrapy.constants import VectorMetric
from astrapy.info import TableIndexOptions, TableVectorIndexOptions

my_table.create_index(
    "score_index",
    column="score",
)

my_table.create_index(
    "winner_index",
    column="winner",
    options=TableIndexOptions(
        ascii=False,
        normalize=True,
        case_sensitive=False,
    ),
)

my_table.create_vector_index(
    "m_vector_index",
    column="m_vector",
    options=TableVectorIndexOptions(
        metric=VectorMetric.DOT_PRODUCT,
    ),
)

from astrapy.data_types import (
    DataAPISet,
    DataAPITimestamp,
    DataAPIVector,
)
from astrapy.ids import UUID

insert_result = my_table.insert_many(
    [
        {
            "match_id": "fight4",
            "round": 1,
            "winner": "Victor",
            "score": 18,
            "when": DataAPITimestamp.from_string(
                "2024-11-28T11:30:00Z",
            ),
            "fighters": DataAPISet([
                UUID("0193539a-2770-8c09-a32a-111111111111"),
                UUID('019353e3-00b4-83f9-a127-222222222222'),
            ]),
            "m_vector": DataAPIVector([0.4, -0.6, 0.2]),
        },
        {
            "match_id": "challenge6",
            "round": 1,
            "winner": "Donna",
            "m_vector": [0.9, -0.1, -0.3],
        },
        {"match_id": "challenge6", "round": 2, "winner": "Erick"},
        {"match_id": "challenge6", "round": 3, "winner": "Fiona"},
        {"match_id": "fight5", "round": 3, "winner": "Caio Gozer"},
    ],
)

from astrapy.constants import SortMode
from astrapy.data_types import DataAPITimestamp, DataAPIVector

# Filter on the partitioning:
my_table.find_one({"match_id": "challenge6"})
# {'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...

# A find with no matches:
str(my_table.find_one({"match_id": "not_real"}))
# 'None'

# Optimize bandwidth using a projection:
my_table.find_one({"match_id": "challenge6"}, projection={"round": True, "winner": True})
# {'round': 1, 'winner': 'Donna'}

# Filter on primary key:
my_table.find_one({"match_id": "challenge6", "round": 1})
# {'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...

# Filter on a regular indexed column:
my_table.find_one({"winner": "Caio Gozer"})
# {'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Non-equality filter on a regular indexed column:
my_table.find_one({"score": {"$gte": 15}})
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Filter on a regular non-indexed column:
# (not recommended performance-wise)
my_table.find_one(
    {"when": {
        "$gte": DataAPITimestamp.from_string("1999-12-31T01:23:44Z")
    }}
)
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Empty filter:
my_table.find_one({})
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Filter on the primary key and a regular non-indexed column:
# (not recommended performance-wise)
my_table.find_one(
    {"match_id": "fight5", "round": 3, "winner": "Caio Gozer"}
)
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# {'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Filter on a regular non-indexed column (and incomplete primary key)
# (not recommended performance-wise)
my_table.find_one({"round": 3, "winner": "Caio Gozer"})
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# {'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find_one(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
)
# {'winner': 'Donna'}

# Hybrid search with vector sort and non-vector filtering:
my_table.find_one(
    {"match_id": "fight4"},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
)
# {'winner': 'Victor'}

# Return the numeric value of the vector similarity
# (also demonstrating that one can pass a plain list for a vector):
my_table.find_one(
    {},
    sort={"m_vector": [0.2, 0.3, 0.4]},
    projection={"winner": True},
    include_similarity=True,
)
# {'winner': 'Donna', '$similarity': 0.515}

# Non-vector sorting on a 'partitionSort' column:
my_table.find_one(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
)
# {'winner': 'Caio Gozer'}

# Non-vector sorting on a regular column:
# (not recommended performance-wise)
my_table.find_one(
    {"match_id": "fight5"},
    sort={"winner": SortMode.ASCENDING},
    projection={"winner": True},
)
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# {'winner': 'Adam Zuul'}
from astrapy.constants import SortMode
from astrapy.data_types import DataAPITimestamp, DataAPIVector

# Filter on the partitioning:
my_table.find_one({"match_id": "challenge6"})
# {'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...

# A find with no matches:
str(my_table.find_one({"match_id": "not_real"}))
# 'None'

# Optimize bandwidth using a projection:
my_table.find_one({"match_id": "challenge6"}, projection={"round": True, "winner": True})
# {'round': 1, 'winner': 'Donna'}

# Filter on primary key:
my_table.find_one({"match_id": "challenge6", "round": 1})
# {'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...

# Filter on a regular indexed column:
my_table.find_one({"winner": "Caio Gozer"})
# {'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Non-equality filter on a regular indexed column:
my_table.find_one({"score": {"$gte": 15}})
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Filter on a regular non-indexed column:
# (not recommended performance-wise)
my_table.find_one(
    {"when": {
        "$gte": DataAPITimestamp.from_string("1999-12-31T01:23:44Z")
    }}
)
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Empty filter:
my_table.find_one({})
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Filter on the primary key and a regular non-indexed column:
# (not recommended performance-wise)
my_table.find_one(
    {"match_id": "fight5", "round": 3, "winner": "Caio Gozer"}
)
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# {'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Filter on a regular non-indexed column (and incomplete primary key)
# (not recommended performance-wise)
my_table.find_one({"round": 3, "winner": "Caio Gozer"})
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# {'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find_one(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
)
# {'winner': 'Donna'}

# Hybrid search with vector sort and non-vector filtering:
my_table.find_one(
    {"match_id": "fight4"},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
)
# {'winner': 'Victor'}

# Return the numeric value of the vector similarity
# (also demonstrating that one can pass a plain list for a vector):
my_table.find_one(
    {},
    sort={"m_vector": [0.2, 0.3, 0.4]},
    projection={"winner": True},
    include_similarity=True,
)
# {'winner': 'Donna', '$similarity': 0.515}

# Non-vector sorting on a 'partitionSort' column:
my_table.find_one(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
)
# {'winner': 'Caio Gozer'}

# Non-vector sorting on a regular column:
# (not recommended performance-wise)
my_table.find_one(
    {"match_id": "fight5"},
    sort={"winner": SortMode.ASCENDING},
    projection={"winner": True},
)
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# {'winner': 'Adam Zuul'}

For more information, see the Client reference.

Use findOne to retrieve a single row matching your filter and sort criteria.

Retrieve a single row from a table that matches a given column and value:

await table.findOne({ winner: "Gray Tist" });

The previous example is shorthand for the $eq (equals) operator. There are other filter operators you can use, such as the $gte (greater than or equal to) operator:

await table.findOne({ winner: { $gte: 15 } });

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string. For more information, see Vector type.

// Provide a pre-generated query vector
await table.findOne({}, { sort: { mVector: vector([.2, .3, .4]) } });

// Generate a query vector with vectorize
await table.findOne({}, { sort: { mVector: 'text to vectorize' } });

Parameters:

Name Type Summary

filter

TableFilter

An object that defines filter criteria using the Data API filter syntax. Use filter operators to compare columns with literal values. For more information and examples, see Data API operators and Find rows using filter clauses.

You cannot filter on map, list, or set columns.

To perform a vector search, use sort instead of filter.

options?

TableFindOneOptions

The options for this operation

Options (TableFindOneOptions):

Name Type Summary

sort?

Sort

The sort parameter can express either a vector search or regular ascending/descending sorting. For more information, see Sort clauses for rows.

projection?

Projection

Select a subset of columns to include in the response for the returned row:

  • Include only the given columns: {column1: 1, column2: 1}

  • Include all columns except the given columns: {column1: 0, column2: 0}

If empty or unspecified, the default projection (all columns) is used.

DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as vector columns with highly dimensional embeddings.

Additionally, DataStax recommends providing your own type for the returned row because projections can break typing guarantees. If your query includes projection, then you must include $similarity in the type of the returned row.

For more information and examples, see Projection clauses.

includeSimilarity?

boolean

If true, the returned row includes a $similarity key with the numeric similarity score representing the closeness of the sort vector and the row’s vector. This is only valid for vector search (sort on a vector column).

If your query includes projection, then you must manually include $similarity in the type of the returned row. If you don’t include projection, then $similarity is inferred to be a part of the returned row. For more information, see Sort clauses and Projection clauses.

timeout?

number | TimeoutDescriptor

The client-side timeout for this operation.

Returns:

Promise<RSchema | null> - A promise which resolves to the result if a row is found, otherwise null.

Example response
{
  match_id: "challenge6",
  round: 1,
  fighters: Set([]),
  mVector: DataAPIVector([0.9, -0.1, -0.3]),
  score: null,
  when: null,
  winner: "Donna",
}

Example:

Full script
import { CreateTableDefinition, DataAPIClient, InferTablePrimaryKey, InferTableSchema, timestamp, uuid, vector } from '@datastax/astra-db-ts';

// Instantiate the client and connect to the database
const client = new DataAPIClient();
const db = client.db(process.env.CLIENT_DB_URL!, { token: process.env.CLIENT_DB_TOKEN! });

// Create table schema using bespoke Data API table definition syntax, and then infer the type.
// For information about table typing and definitions, see the documentation for createTable.
const TableDefinition = <const>{
  columns: {
    matchId: 'text'
    round: 'tinyint',
    mVector: { type: 'vector', dimension: 3 },
    score: 'int',
    when: 'timestamp',
    winner: 'text',
    fighters: { type: 'set', valueType: 'uuid' },
  },
  primaryKey: {
    partitionBy: ['matchId'],
    partitionSort: { round: 1 },
  },
} satisfies CreateTableDefinition;

type TableSchema = InferTableSchema<typeof TableDefinition>;

(async function () {
  // Create a table with the given TableSchema type if a 'games' table doesn't already exist
  const table = await db.createTable<TableSchema>('games', { definition: TableDefinition, ifNotExists: true });

  // Insert some rows in an unordered fashion.
  await table.insertMany([
    { matchId: 'fight4', round: 1, winner: 'Victor', score: 18, when: timestamp('2024-11-28T11:30:00Z'), fighters: new Set([UUID('0193539a-2770-8c09-a32a-111111111111'), UUID('019353e3-00b4-83f9-a127-222222222222')]), mVector: vector([0.4, -0.6, 0.2]) },
    { matchId: 'challenge6', round: 1, winner: 'Donna', mVector: vector([0.9, -0.1, -0.3]) },
    { matchId: 'challenge6', round: 2, winner: 'Erick' },
    { matchId: 'challenge6', round: 3, winner: 'Fiona' },
    { matchId: 'fight5', round: 3, winner: 'Caio Gozer' },
  ]);

  // Create a secondary index on the 'score' column if it doesn't already exist
    await table.createIndex('round_idx', 'round', { ifNotExists: true });

  // Create a secondary index on the 'winner' column with case-insensitivity if it doesn't already exist
  await table.createIndex('winner_idx', 'winner', {
    options: {
      caseSensitive: false,
    },
    ifNotExists: true,
  });

  // Create a vector index on the 'mVector' column if it doesn't already exist
  await table.createVectorIndex('m_vector_idx', 'mVector', {
    options: {
      metric: 'dot_product',
    },
  });

// Use findOne and find to query rows in a table, including vector search.

// findOne examples

  // Find a row with an exact match on one column.
  // Inherently uses the equality ($eq) filter operator.
  await table.findOne({ 'matchId': 'challenge6' }).then(console.log);

  // If there is no match, the response is null.
  await table.findOne({ 'matchId': 'not_real' }).then(console.log);

  // Projections optimize bandwidth by returning specified columns instead of the entire row.
  // Specify the exact return type to prevent accidental type errors.
  // This example prints the values for 'round' and 'winner' in the matching row.
  await table.findOne<Pick<TableSchema, 'round' | 'winner'>>({ 'matchId': 'challenge6', round: 1 }, {
    projection: { round: 1, winner: 1 },
  }).then(console.log);

  // Find a row using other filter operators on other indexed columns.
  await table.findOne({ score: { $gte: 15 } }).then(console.log);

  // (Not recommended) You can filter on a non-indexed column, but it is long-running and inefficient.
  // The Data API returns a warning, if you are listening for or logging warnings.
  await table.findOne({ score: { $gt: timestamp() } });

  // To get any row, pass an empty filter.
  await table.findOne({}).then(console.log);

  // Use sort to perform a vector search on a vector-indexed column.
  await table.findOne<{ winner: string }>({}, {
    sort: { mVector: vector([.2, .3, .4]) },
    projection: { winner: 1 }
  }).then(console.log);

  // includeSimilarity returns the similarity score for the vector search.
  // If you use a projection, you must include $similarity in the projection type.
  await table.findOne<{ winner: string, $similarity: number }>({}, {
    sort: { mVector: vector([.2, .3, .4]) },
    projection: { winner: 1 },
    includeSimilarity: true,
  }).then(console.log);

  // You can also use sort for regular ascending/descending sorts on a given column.
  await table.findOne({ matchId: 'round5' }, {
    sort: { round: -1 },
  }).then(console.log);

// find examples

  // Find rows with an exact match on one column, inherently using the equality ($eq) filter operator.
  // Lazily iterate over results to print output like:
  // - (R:1): winner Donna
  // - (R:2): winner Erick
  // - (R:3): winner Fiona
  for await (const row of table.find({ matchId: 'challenge6' })) {
    console.log(`(R:${row.round}): winner ${row.winner}`);
  }

  // Projections optimize bandwidth by returning specified columns instead of entire rows.
  // Specify the exact return type to prevent accidental type errors.
  type ProjectedSchema = Pick<TableSchema, 'round' | 'winner'>;

  for await (const row of table.find({ matchId: 'challenge6' }).project<ProjectedSchema>({ round: 1, winner: 1 })) {
    console.log(`(R:${row.round}): winner ${row.winner}`);
  }

  // Another example of the implied equality operator.
  await table.find({ matchId: 'challenge6' }).toArray().then(console.log);

  // Find rows using other filter operators on other indexed columns.
  // You can filter on non-indexed columns, but this is long-running and inefficient.
  await table.find({ score: { $gte: 15 } }).toArray().then(console.log);

  // (Not recommended; long-running and inefficient) To get all rows, pass an empty filter.
  await table.find({}).toArray().then(console.log);

  // Use sort to perform a vector search on a vector-indexed column.
  await table.find({})
    .sort({ mVector: vector([.2, .3, .4]) })
    .project<{ winner: number }>({ winner: 1 })
    .toArray()
    .then(console.log);

  // Use sort and filter together for a hybrid search.
  // This example also use includeSimilarity to return the similarity scores of the vector search.
  // If you use a projection, you must include $similarity in the projection type.
  await table.find({ matchId: 'fight4' })
    .sort({ mVector: vector([.2, .3, .4]) })
    .includeSimilarity(true)
    .project<{ winner: number, $similarity: number }>({ winner: 1 })
    .toArray()
    .then(console.log);

  // Use sort for regular ascending/descending sorts on a given column.
  await table.find({ matchId: 'round5' })
    .sort({ round: -1 })
    .toArray()
    .then(console.log);

  // You can also use mapping.
  await table.find({ matchId: 'fight5' })
    .sort({ round: -1 })
    .limit(5)
    .map(row => row.winner.toUpperCase())
    .toArray()
    .then(console.log);

  // Uncomment the following line to drop the table and any related indexes.
  // await table.drop();
})();
// Find a row with an exact match on one column.
// Inherently uses the equality ($eq) filter operator.
await table.findOne({ 'matchId': 'challenge6' }).then(console.log);

// If there is no match, the response is null.
await table.findOne({ 'matchId': 'not_real' }).then(console.log);

// Projections optimize bandwidth by returning specified columns instead of the entire row.
// Specify the exact return type to prevent accidental type errors.
// This example prints the values for 'round' and 'winner' in the matching row.
await table.findOne<Pick<TableSchema, 'round' | 'winner'>>({ 'matchId': 'challenge6', round: 1 }, {
  projection: { round: 1, winner: 1 },
}).then(console.log);

// Find a row using other filter operators on other indexed columns.
await table.findOne({ score: { $gte: 15 } }).then(console.log);

// (Not recommended) You can filter on a non-indexed column, but it is long-running and inefficient.
// The Data API returns a warning, if you are listening for or logging warnings.
await table.findOne({ score: { $gt: timestamp() } });

// To get any row, pass an empty filter.
await table.findOne({}).then(console.log);

// Use sort to perform a vector search on a vector-indexed column.
await table.findOne<{ winner: string }>({}, {
  sort: { mVector: vector([.2, .3, .4]) },
  projection: { winner: 1 }
}).then(console.log);

// includeSimilarity returns the similarity score for the vector search.
// If you use a projection, you must include $similarity in the projection type.
await table.findOne<{ winner: string, $similarity: number }>({}, {
  sort: { mVector: vector([.2, .3, .4]) },
  projection: { winner: 1 },
  includeSimilarity: true,
}).then(console.log);

// You can also use sort for regular ascending/descending sorts on a given column.
await table.findOne({ matchId: 'round5' }, {
  sort: { round: -1 },
}).then(console.log);

For more information, see the Client reference.

Use findOne to retrieve a single row matching your filter and sort criteria.

Find one row based on an equality predicate on a column (shorthand for the $eq operator):

// Filter can be built with a where clause
Filter f1 = new Filter()
 .where("winner")
 .isEqualsTo("Caio Gozer");

// Class Filters provides a shorthand for the above
Optional<Row> row = myTable
  .findOne(Filters.eq("winner", "Caio Gozer"));

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string. For more information, see Vector type.

// Given a vector
DataAPIVector vector =
  new DataAPIVector(new float[] {0.2f, 0.3f, 0.4f});

// Perform a vector search
myTable.findOne(new TableFindOneOptions()
  .sort(Sort.vector("m_vector", vector)));

There are other filter operators you can use, such as the $gte (greater than or equal to) operator, as well as other findOne options:

// Import to shorten the code
import static com.datastax.astra.client.core.query.Filters.*;
import static com.datastax.astra.client.core.query.Projection.include;

// Filters can combine multiple conditions
Filter filter = and(
 eq("match_id", "match_0"),
 gt("round", 1),
 eq("winner", "Victor"));

// Some options can be added to the findOne
TableFindOneOptions options = new TableFindOneOptions()
  // Include only these columns in the response
  .projection(include("match_id", "winner", "winner"))
  // Perform a vector similarity search
  .sort(Sort.vector("m_vector", new float[] {0.4f, -0.6f, 0.2f}))
  // Return the similarity score of the vector search
  .includeSimilarity(true);

Optional<Row> row = myTable
  .findOne(filter, options);

You can use dynamic object mapping instead of the default Row object. For example, assume the following entity mapping for a table:

@EntityTable("games")
public class Game {
    @Column(name ="match_id")
    private String matchId;
    @Column(name ="m_vector")
    private DataAPIVector vector;
    private Integer round;
    private Integer score;
    private Instant when;
    private String winner;
    private Set<java.util.UUID> fighters;
}

You can use the findOne method to directly map the result to the entity:

Filter filter = and(
 eq("match_id", "match_0"),
 gt("round", 1),
 eq("winner", "Victor"));
Table<Game> tableGame = db.getTable("games", Game.class);
Optional<Game> myGame = tableGame.findOne(filter);

// Providing a new rowClass for the result (projection?)
Optional<MiniGame> myGame = tableGame.findOne(filter, MiniGame.class);

Parameters:

Name Type Summary

filter

Filter

A filter expressing which condition the returned row must satisfy. You can use filter operators to compare columns with literal values. Filters can be instantiated with its constructor and specialized with method where(..) or leverage the class Filters.

You cannot filter on map, list, or set columns.

To perform a vector search, use sort instead of filter.

options

TableFindOneOptions

A wrapper for the different options and specialization of this search.

rowClass

Class<?>

This parameter acts a formal specifier for the type checker. If omitted, the resulting cursor is implicitly an Optional<T>, meaning that the response maintains the same type for the returned rows as the rows in the table itself. Strictly typed code may want to specify this parameter, especially when a projection is given. For related information, albeit in the context of the Python client, see Typing support.

Name Type Summary

sort

Sort

This parameter controls the sorting order and, therefore, determines which row is returned if there are multiple matches. The sort parameter can express either a vector search or regular ascending/descending sorting. For more information, see Sort clauses for rows.

projection

Projection

Select a subset of columns to include in the response for the returned row:

  • Include only the given columns: Projection.include("column1","column2")

  • Include all columns except the given columns: Projection.exclude("column1","column2")

If empty or unspecified, the default projection (return all columns) is used.

DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as vector columns with highly dimensional embeddings.

For more information and examples, see Projection clauses.

includeSimilarity

boolean

If true, the returned row includes a $similarity key with the numeric similarity score representing the closeness of the sort vector and the row’s vector. This is only valid for vector search (sort on a vector column).

timeout

long or Duration

A timeout, in milliseconds (long), to impose on the underlying API request. If not provided, the Table defaults apply.

Returns:

Optional<R> - An optional object of type R that represents the result, if a row is found, otherwise it is Optional.empty(). If rowClass is not specified, then the type R is the same type as that of the rows in the table.

Example:

package com.datastax.astra.client.tables;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIClients;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Projection;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableFindOneOptions;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.mapping.Column;
import lombok.Data;

import java.util.Optional;

import static com.datastax.astra.client.core.query.Filters.and;
import static com.datastax.astra.client.core.query.Filters.eq;
import static com.datastax.astra.client.core.query.Filters.gt;

public class FindOne {
 public static void main(String[] args) {
   Database db = new DataAPIClient("token").getDatabase("endpoint");

   Table<Row> tableRow = db.getTable("games");

   Filter filter = and(
     eq("match_id", "mtch_0"),
     gt("round", 1),
     eq("winner", "Victor"));

   TableFindOneOptions options = new TableFindOneOptions()
     .sort(Sort.vector("m_vector", new DataAPIVector(new float[] {0.4f, -0.6f, 0.2f})))
     .includeSimilarity(true);

     // Find a row
   Optional<Row> row = tableRow.findOne(filter, options);
   row.ifPresent(r -> {
     System.out.println("Row: " + r);
     DataAPIVector v = r.getVector("m_vector");
     System.out.println(r.getInstant("when"));
   });

   // Find a game by match_id
   Table<Game> tableGame = db.getTable("games", Game.class);
   Optional<Game> row2 = tableGame.findOne(filter, options);
     row2.ifPresent(game -> {
         System.out.println("game: " + game.getVector());
         System.out.println(game.getFighters());
         System.out.println(game.getMatchId());
   });

   // Returning another type because you project
   TableFindOneOptions optionsProjection = new TableFindOneOptions()
     .projection(Projection.include("match_id", "winner", "field3"))
     .sort(Sort.vector("m_vector", new DataAPIVector(new float[] {0.4f, -0.6f, 0.2f})))
     .includeSimilarity(true);

   Optional<MiniGame> mini = tableRow.findOne(filter, optionsProjection, MiniGame.class);
   mini.ifPresent(m -> {
    System.out.println("MiniGame: " + m.getWinner());
    System.out.println("MiniGame: " + m.getMatchId());
   });

   Optional<MiniGame> mini2 = tableGame.findOne(filter, optionsProjection, MiniGame.class);
   mini.ifPresent(m -> {
    System.out.println("MiniGame 2: " + m.getWinner());
    System.out.println("MiniGame 2: " + m.getMatchId());
   });

 }

 @Data
 public static class MiniGame {
    @Column(name = "match_id")
    private String matchId;
    private String winner;
 }
}

Use findOne to retrieve a single row matching your filter and sort criteria.

Retrieve a single row from a table that matches a given column and value:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_TABLE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findOne": {
    "filter": {
      "COLUMN_NAME": "FILTER_VALUE"
    }
  }
}' | jq

The previous example is shorthand for the $eq (equals) operator. There are other filter operators you can use, such as the $gt (greater than) operator:

"findOne": {
  "filter": { "year": { "$gt": 2000 } }
}

For more filter operator examples, see Find rows using filter clauses.

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string. For more information, see Vector type.

# Provide the query vector as an array
"findOne": {
  "sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] }
}

# Provide the query vector with $binary
"findOne": {
  "sort": { "vect_emb": { "$binary": "PczMzb5MzM0+mZma" } }
}

# Generate a query vector with vectorize
"findOne": {
  "sort": { "vect_emb": "Text to vectorize" }
}

# Perform a vector search and include the similarity score in the response
"findOne": {
  "sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] },
  "options": {
    "includeSimilarity": true
  }
}

To perform a hybrid search, use both sort and filter:

"findOne": {
  "sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] },
  "filter": { "year": { "$gt": 2000 } }
}

Parameters:

Name Type Summary

findOne

command

The Data API command to retrieve a row in a table based on one or more of filter, sort, projection, and options.

filter

object

An object that defines filter criteria using the Data API filter syntax. Use filter operators to compare columns with literal values. For more information and examples, see the preceding syntax examples, Data API operators, and Find rows using filter clauses.

You cannot filter on map, list, or set columns.

To perform a vector search, use sort instead of filter.

sort

object

Perform a vector similarity search or set the sequence of matches. For more information and examples, see Sort clauses and Vector type.

projection

object

Select a subset of columns to include in the response for the returned row:

  • Include only the given columns: {"column1": True, "column2": True}

  • Include all columns except the given columns: {"column1": False, "column2": False}

If empty or unset, the default projection is used. The default projection includes all columns, but it omits null values.

Over HTTP, the response always omits null values, even if you include them in projection.

DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as vector columns with highly dimensional embeddings.

For more information and examples, see Projection clauses.

options.includeSimilarity

boolean

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the row’s vector. This is only valid for vector search (sort on a vector column).

Returns:

A successful response includes a document object that describes the matching row. The contents of document depends on projection. The response always omits columns with null values, regardless of projection.

Example response
"data": {
  "document": {
    "name": "Tal Noor",
    "email": "tal@example.com",
    "graduated": true,
    "graduation_year": 2014
  }
}

If your query used a non-indexed column, the response can include a warning that the query used allow filtering, which is inefficient and potentially resource-intensive.

Example:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace/students" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findOne": {
    "filter": { "email": "tal@example.com" }
  }
}' | jq

Find rows using filter clauses

Find multiple rows that match a query.

For best performance, filter and sort on indexed columns, partition keys, and clustering keys.

Filtering on non-indexed columns can use allow filtering, which is inefficient and resource-intensive, especially for large datasets. With the Data API clients, allow filtering operations can hit the client timeout limit before the underlying HTTP operation is complete.

An empty filter ("filter": {}) does not use allow filtering, but it can still be an inefficient and long-running operation.

Additionally, the Data API can perform in-memory sorting, depending on the columns you sort on, the table’s partitioning structure, and whether the sorted columns are indexed. In-memory sorts can have performance implications.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the Client reference.

Use find to retrieve rows that match your filter and sort criteria.

Run a find with a non-vector filter condition:

my_table.find({"match_id": "challenge6"})

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string. For more information, see Vector type.

Run a vector search and get an iterable over the returned results:

my_table.find(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    limit=3,
)

Run hybrid search with a filter and vector-similarity sorting, apply a projection to the returned results, and then materialize the matches into a list:

my_table.find(
    {"match_id": "fight4"},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
).to_list()

Parameters:

Name Type Summary

filter

dict | None

A dictionary expressing which condition the returned row must satisfy. You can use filter operators to compare columns with literal values. For example:

  • {"match_no": 123}: Uses the implied equality ($eq) operator. Shorthand for {"match_no": {"$eq": 123}}.

  • {"match_no": 123, "round": "C"}: Uses the implied equality operator and combines the two conditions with an implicit $and.

  • (Not recommended) {}: An empty filter returns all rows. This is slow and inefficient.

You cannot filter on map, list, or set columns.

To perform a vector search, use sort instead of filter.

sort

dict | None

This dictionary parameter controls the sorting order and, therefore, determines which row is returned if there are multiple matches. The sort parameter can express either a vector search or regular ascending/descending sorting. For more information, see Sort clauses for rows.

projection

dict | None

Select a subset of columns to include in the response for the returned row:

  • Include only the given columns: {"column1": True, "column2": True}

  • Include all columns except the given columns: {"column1": False, "column2": False}

  • Include all columns (default if empty or unspecified): {"*": True}

DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as vector columns with highly dimensional embeddings.

For more information and examples, see Projection clauses.

row_type

type

This parameter acts a formal specifier for the type checker. If omitted, the resulting cursor is implicitly a TableFindCursor[ROW, ROW], meaning that it maintains the same type for the returned rows as that of the rows in the table. Strictly typed code may want to specify this parameter, especially when a projection is given. For more information, see Typing support.

skip

int | None

Optionally specify a number of rows to bypass (skip) before returning rows. The first n rows matching the query are discarded from the results, and the results begin at the skip+1 row. For example, if skip=5, the first 5 rows are discarded, and the results begin at the 6th row.

This parameter is only valid with sort.

limit

int | None

Limit the total number of rows returned from the table. The returned cursor stops yielding rows either when it reaches the limit or there are no more rows to return.

include_similarity

bool | None

If true, the returned rows include a $similarity key with the numeric similarity score representing the closeness of the sort vector and the row’s vector. This is only valid for vector search (sort on a vector column).

include_sort_vector

bool | None

If true, you can call the get_sort_vector method on the returned cursor to get the vector used for the vector search. The default is false. This is only relevant for vector search (sort on a vector column) when you want to get the sort vector from the returned cursor. This can be useful with vectorize because you don’t know the sort vector in advance.

You can’t use include_sort_vector with find_one, but you can use include_sort_vector and limit=1 with find. However, because vector search is approximate (as in approximate nearest neighbor), the lower your limit, the more likely you are to find an approximate, but not maximal, match.

request_timeout_ms

int

A timeout, in milliseconds, to impose on each individual HTTP request to the Data API to accomplish the operation. If not provided, the Table defaults apply. This parameter is aliased as timeout_ms for convenience.

Returns:

TableFindCursor - An object of type astrapy.cursors.TableFindCursor, representing the stream of results.

You can manipulate a TableFindCursor in various ways. Typically, it is iterated over and yields the search results, managing pagination discretely as needed. For more information, see FindCursor.

Invoking .to_list() on a TableFindCursor causes it to consume all rows, and then materialize the entire set of results as the returned list. This is not recommended for find operations that can return a large number of results.

While you iterate over a cursor, rows are retrieved in chunks progressively. It is possible for retrieved chunks to reflect real-time changes (inserts, updates, and deletions) on the table.

Example response
TableFindCursor("games", idle, consumed so far: 0)

Example:

Full script
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")

from astrapy.constants import SortMode
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
)

my_table = database.create_table(
    "games",
    definition=(
        CreateTableDefinition.builder()
        .add_column("match_id", ColumnType.TEXT)
        .add_column("round", ColumnType.TINYINT)
        .add_vector_column("m_vector", dimension=3)
        .add_column("score", ColumnType.INT)
        .add_column("when", ColumnType.TIMESTAMP)
        .add_column("winner", ColumnType.TEXT)
        .add_set_column("fighters", ColumnType.UUID)
        .add_partition_by(["match_id"])
        .add_partition_sort({"round": SortMode.ASCENDING})
        .build()
    ),
)

from astrapy.constants import VectorMetric
from astrapy.info import TableIndexOptions, TableVectorIndexOptions

my_table.create_index(
    "score_index",
    column="score",
)

my_table.create_index(
    "winner_index",
    column="winner",
    options=TableIndexOptions(
        ascii=False,
        normalize=True,
        case_sensitive=False,
    ),
)

my_table.create_vector_index(
    "m_vector_index",
    column="m_vector",
    options=TableVectorIndexOptions(
        metric=VectorMetric.DOT_PRODUCT,
    ),
)

from astrapy.data_types import (
    DataAPISet,
    DataAPITimestamp,
    DataAPIVector,
)
from astrapy.ids import UUID

insert_result = my_table.insert_many(
    [
        {
            "match_id": "fight4",
            "round": 1,
            "winner": "Victor",
            "score": 18,
            "when": DataAPITimestamp.from_string(
                "2024-11-28T11:30:00Z",
            ),
            "fighters": DataAPISet([
                UUID("0193539a-2770-8c09-a32a-111111111111"),
                UUID('019353e3-00b4-83f9-a127-222222222222'),
            ]),
            "m_vector": DataAPIVector([0.4, -0.6, 0.2]),
        },
        {
            "match_id": "challenge6",
            "round": 1,
            "winner": "Donna",
            "m_vector": [0.9, -0.1, -0.3],
        },
        {"match_id": "challenge6", "round": 2, "winner": "Erick"},
        {"match_id": "challenge6", "round": 3, "winner": "Fiona"},
        {"match_id": "fight5", "round": 3, "winner": "Caio Gozer"},
    ],
)

from astrapy.constants import SortMode
from astrapy.data_types import DataAPIVector

# Iterate over results:
for row in my_table.find({"match_id": "challenge6"}):
    print(f"(R:{row['round']}): winner {row['winner']}")
# will print:
#   (R:1): winner Donna
#   (R:2): winner Erick
#   (R:3): winner Fiona

# Optimize bandwidth using a projection:
proj = {"round": True, "winner": True}
for row in my_table.find({"match_id": "challenge6"}, projection=proj):
    print(f"(R:{row['round']}): winner {row['winner']}")
# will print:
#   (R:1): winner Donna
#   (R:2): winner Erick
#   (R:3): winner Fiona

# Filter on the partition key:
my_table.find({"match_id": "challenge6"}).to_list()
# [{'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...

# Filter on primary key:
my_table.find({"match_id": "challenge6", "round": 1}).to_list()
# [{'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...

# Filter on a regular indexed column:
my_table.find({"winner": "Caio Gozer"}).to_list()
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Non-equality filter on a regular indexed column:
my_table.find({"score": {"$gte": 15}}).to_list()
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Filter on a regular non-indexed column:
# (not recommended performance-wise)
my_table.find(
    {"when": {
        "$gte": DataAPITimestamp.from_string("1999-12-31T01:23:44Z")
    }}
).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Empty filter (not recommended performance-wise):
my_table.find({}).to_list()
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Filter on the primary key and a regular non-indexed column:
# (not recommended performance-wise)
my_table.find(
    {"match_id": "fight5", "round": 3, "winner": "Caio Gozer"}
).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Filter on a regular non-indexed column (and incomplete primary key)
# (not recommended performance-wise)
my_table.find({"round": 3, "winner": "Caio Gozer"}).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
    limit=3,
).to_list()
# [{'winner': 'Donna'}, {'winner': 'Victor'}]

# Hybrid search with vector sort and non-vector filtering:
my_table.find(
    {"match_id": "fight4"},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
    limit=3,
).to_list()
# [{'winner': 'Victor'}]

# Return the numeric value of the vector similarity
# (also demonstrating that one can pass a plain list for a vector):
my_table.find(
    {},
    sort={"m_vector": [0.2, 0.3, 0.4]},
    projection={"winner": True},
    limit=3,
    include_similarity=True,
).to_list()
# [{'winner': 'Donna', '$similarity': 0.515}, {'winner': 'Victor', ...

# Non-vector sorting on a 'partitionSort' column:
my_table.find(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
).to_list()
# [{'winner': 'Caio Gozer'}, {'winner': 'Betta Vigo'}, ...

# Using skip and limit:
my_table.find(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
    skip=1,
    limit=2,
).to_list()
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# [{'winner': 'Betta Vigo'}, {'winner': 'Adam Zuul'}]

# Non-vector sorting on a regular column:
# (not recommended performance-wise)
my_table.find(
    {"match_id": "fight5"},
    sort={"winner": SortMode.ASCENDING},
    projection={"winner": True},
).to_list()
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# [{'winner': 'Adam Zuul'}, {'winner': 'Betta Vigo'}, ...

# Using .map() on a cursor:
winner_cursor = my_table.find(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
    limit=5,
)
print("/".join(winner_cursor.map(lambda row: row["winner"].upper())))
# CAIO GOZER/BETTA VIGO/ADAM ZUUL

# Some other examples of cursor manipulation
matches_cursor = my_table.find(
    sort={"m_vector": DataAPIVector([-0.1, 0.15, 0.3])}
)
matches_cursor.has_next()
# True
next(matches_cursor)
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
matches_cursor.consumed
# 1
matches_cursor.rewind()
matches_cursor.consumed
# 0
matches_cursor.has_next()
# True
matches_cursor.close()
try:
    next(matches_cursor)
except StopIteration:
    print("StopIteration triggered.")
# StopIteration triggered.
from astrapy.constants import SortMode
from astrapy.data_types import DataAPIVector

# Iterate over results:
for row in my_table.find({"match_id": "challenge6"}):
    print(f"(R:{row['round']}): winner {row['winner']}")
# will print:
#   (R:1): winner Donna
#   (R:2): winner Erick
#   (R:3): winner Fiona

# Optimize bandwidth using a projection:
proj = {"round": True, "winner": True}
for row in my_table.find({"match_id": "challenge6"}, projection=proj):
    print(f"(R:{row['round']}): winner {row['winner']}")
# will print:
#   (R:1): winner Donna
#   (R:2): winner Erick
#   (R:3): winner Fiona

# Filter on the partition key:
my_table.find({"match_id": "challenge6"}).to_list()
# [{'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...

# Filter on primary key:
my_table.find({"match_id": "challenge6", "round": 1}).to_list()
# [{'match_id': 'challenge6', 'round': 1, 'fighters': DataAPISet([]), ...

# Filter on a regular indexed column:
my_table.find({"winner": "Caio Gozer"}).to_list()
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Non-equality filter on a regular indexed column:
my_table.find({"score": {"$gte": 15}}).to_list()
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Filter on a regular non-indexed column:
# (not recommended performance-wise)
my_table.find(
    {"when": {
        "$gte": DataAPITimestamp.from_string("1999-12-31T01:23:44Z")
    }}
).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Empty filter (not recommended performance-wise):
my_table.find({}).to_list()
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# [{'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...

# Filter on the primary key and a regular non-indexed column:
# (not recommended performance-wise)
my_table.find(
    {"match_id": "fight5", "round": 3, "winner": "Caio Gozer"}
).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Filter on a regular non-indexed column (and incomplete primary key)
# (not recommended performance-wise)
my_table.find({"round": 3, "winner": "Caio Gozer"}).to_list()
# The Data API returned a warning: {'errorCode': 'MISSING_INDEX', ...
# [{'match_id': 'fight5', 'round': 3, 'fighters': DataAPISet([]), ...

# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
    limit=3,
).to_list()
# [{'winner': 'Donna'}, {'winner': 'Victor'}]

# Hybrid search with vector sort and non-vector filtering:
my_table.find(
    {"match_id": "fight4"},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
    limit=3,
).to_list()
# [{'winner': 'Victor'}]

# Return the numeric value of the vector similarity
# (also demonstrating that one can pass a plain list for a vector):
my_table.find(
    {},
    sort={"m_vector": [0.2, 0.3, 0.4]},
    projection={"winner": True},
    limit=3,
    include_similarity=True,
).to_list()
# [{'winner': 'Donna', '$similarity': 0.515}, {'winner': 'Victor', ...

# Non-vector sorting on a 'partitionSort' column:
my_table.find(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
).to_list()
# [{'winner': 'Caio Gozer'}, {'winner': 'Betta Vigo'}, ...

# Using skip and limit:
my_table.find(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
    skip=1,
    limit=2,
).to_list()
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# [{'winner': 'Betta Vigo'}, {'winner': 'Adam Zuul'}]

# Non-vector sorting on a regular column:
# (not recommended performance-wise)
my_table.find(
    {"match_id": "fight5"},
    sort={"winner": SortMode.ASCENDING},
    projection={"winner": True},
).to_list()
# The Data API returned a warning: {'errorCode': 'IN_MEMORY_SORTING...
# [{'winner': 'Adam Zuul'}, {'winner': 'Betta Vigo'}, ...

# Using .map() on a cursor:
winner_cursor = my_table.find(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
    limit=5,
)
print("/".join(winner_cursor.map(lambda row: row["winner"].upper())))
# CAIO GOZER/BETTA VIGO/ADAM ZUUL

# Some other examples of cursor manipulation
matches_cursor = my_table.find(
    sort={"m_vector": DataAPIVector([-0.1, 0.15, 0.3])}
)
matches_cursor.has_next()
# True
next(matches_cursor)
# {'match_id': 'fight4', 'round': 1, 'fighters': DataAPISet([UUID('0193...
matches_cursor.consumed
# 1
matches_cursor.rewind()
matches_cursor.consumed
# 0
matches_cursor.has_next()
# True
matches_cursor.close()
try:
    next(matches_cursor)
except StopIteration:
    print("StopIteration triggered.")
# StopIteration triggered.

For more information, see the Client reference.

Use find to retrieve multiple rows matching your filter and sort criteria.

Find all rows matching a basic equality filter:

await table.find({ matchId: 'challenge6' });

The previous example is shorthand for the $eq (equals) operator. There are other filter operators you can use, such as the $gte (greater than or equal to) operator:

await table.find({ winner: { $gte: 15 } });

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string. For more information, see Vector type.

// Provide a pre-generated query vector
await table.find({}, { sort: { mVector: vector([.2, .3, .4]) } });

// Generate a query vector with vectorize
await table.find({}, { sort: { mVector: 'text to vectorize' } });

To run a hybrid search, combine a filter with a vector search (this example also uses a projection):

table.find<{ winner: string }>({ matchId: 'fight4' }, {
  sort: { mVector: vector([.2, .3, .4]) },
  projection: { winner: 1 },
});

Parameters:

Name Type Summary

filter

TableFilter

An object that defines filter criteria using the Data API filter syntax. For more information and examples, see Data API operators and Find rows using filter clauses.

You cannot filter on map, list, or set columns.

To perform a vector search, use sort instead of filter.

option?

TableFindOptions

The options for this operation

Options (TableFindOptions):

Name Type Summary

sort?

Sort

The sort parameter can express either a vector search or regular ascending/descending sorting. For more information, see Sort clauses for rows.

projection?

Projection

Select a subset of columns to include in the response for the returned rows:

  • Include only the given columns: {column1: 1, column2: 1}

  • Include all columns except the given columns: {column1: 0, column2: 0}

If empty or unspecified, the default projection (all columns) is used.

DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as vector columns with highly dimensional embeddings.

Additionally, DataStax recommends providing your own type for the returned rows because projections can break typing guarantees. If your query includes projection, then you must include $similarity in the type of the returned rows.

For more information and examples, see Projection clauses.

skip?

number

Optionally specify a number of rows to bypass (skip) before returning rows. The first n rows matching the query are discarded from the results, and the results begin at the skip+1 row. For example, if skip=5, the first 5 rows are discarded, and the results begin at the 6th row.

This parameter is only valid with sort.

limit?

number

Limit the total number of rows returned from the table. The returned cursor stops yielding rows either when it reaches the limit or there are no more rows to return.

includeSimilarity?

boolean

If true, the returned rows include a $similarity key with the numeric similarity score representing the closeness of the sort vector and the row’s vector. This is only valid for vector search (sort on a vector column).

If your query includes projection, then you must manually include $similarity in the type of the returned row. If you don’t include projection, then $similarity is inferred to be a part of the returned row. For more information, see Sort clauses and Projection clauses.

includeSortVector?

boolean

If true, you can call the cursor.getSortVector() method to get the vector used for the vector search. The default is false. This is only relevant for vector search (sort on a vector column) when you want to get the sort vector from the returned cursor. This can be useful with vectorize because you don’t know the sort vector in advance.

You can’t use includeSortVector with findOne, but you can use includeSortVector and limit(1) with find. However, because vector search is approximate (as in approximate nearest neighbor), the lower your limit, the more likely you are to find an approximate, but not maximal, match.

timeout?

number | TimeoutDescriptor

The client-side timeout for this operation.

Returns:

TableFindCursor<RSchema> - The FindCursor, which you can configure and iterate over lazily.

Example:

Full script
import { CreateTableDefinition, DataAPIClient, InferTablePrimaryKey, InferTableSchema, timestamp, uuid, vector } from '@datastax/astra-db-ts';

// Instantiate the client and connect to the database
const client = new DataAPIClient();
const db = client.db(process.env.CLIENT_DB_URL!, { token: process.env.CLIENT_DB_TOKEN! });

// Create table schema using bespoke Data API table definition syntax, and then infer the type.
// For information about table typing and definitions, see the documentation for createTable.
const TableDefinition = <const>{
  columns: {
    matchId: 'text'
    round: 'tinyint',
    mVector: { type: 'vector', dimension: 3 },
    score: 'int',
    when: 'timestamp',
    winner: 'text',
    fighters: { type: 'set', valueType: 'uuid' },
  },
  primaryKey: {
    partitionBy: ['matchId'],
    partitionSort: { round: 1 },
  },
} satisfies CreateTableDefinition;

type TableSchema = InferTableSchema<typeof TableDefinition>;

(async function () {
  // Create a table with the given TableSchema type if a 'games' table doesn't already exist
  const table = await db.createTable<TableSchema>('games', { definition: TableDefinition, ifNotExists: true });

  // Insert some rows in an unordered fashion.
  await table.insertMany([
    { matchId: 'fight4', round: 1, winner: 'Victor', score: 18, when: timestamp('2024-11-28T11:30:00Z'), fighters: new Set([UUID('0193539a-2770-8c09-a32a-111111111111'), UUID('019353e3-00b4-83f9-a127-222222222222')]), mVector: vector([0.4, -0.6, 0.2]) },
    { matchId: 'challenge6', round: 1, winner: 'Donna', mVector: vector([0.9, -0.1, -0.3]) },
    { matchId: 'challenge6', round: 2, winner: 'Erick' },
    { matchId: 'challenge6', round: 3, winner: 'Fiona' },
    { matchId: 'fight5', round: 3, winner: 'Caio Gozer' },
  ]);

  // Create a secondary index on the 'score' column if it doesn't already exist
    await table.createIndex('round_idx', 'round', { ifNotExists: true });

  // Create a secondary index on the 'winner' column with case-insensitivity if it doesn't already exist
  await table.createIndex('winner_idx', 'winner', {
    options: {
      caseSensitive: false,
    },
    ifNotExists: true,
  });

  // Create a vector index on the 'mVector' column if it doesn't already exist
  await table.createVectorIndex('m_vector_idx', 'mVector', {
    options: {
      metric: 'dot_product',
    },
  });

// Use findOne and find to query rows in a table, including vector search.

// findOne examples

  // Find a row with an exact match on one column.
  // Inherently uses the equality ($eq) filter operator.
  await table.findOne({ 'matchId': 'challenge6' }).then(console.log);

  // If there is no match, the response is null.
  await table.findOne({ 'matchId': 'not_real' }).then(console.log);

  // Projections optimize bandwidth by returning specified columns instead of the entire row.
  // Specify the exact return type to prevent accidental type errors.
  // This example prints the values for 'round' and 'winner' in the matching row.
  await table.findOne<Pick<TableSchema, 'round' | 'winner'>>({ 'matchId': 'challenge6', round: 1 }, {
    projection: { round: 1, winner: 1 },
  }).then(console.log);

  // Find a row using other filter operators on other indexed columns.
  await table.findOne({ score: { $gte: 15 } }).then(console.log);

  // (Not recommended) You can filter on a non-indexed column, but it is long-running and inefficient.
  // The Data API returns a warning, if you are listening for or logging warnings.
  await table.findOne({ score: { $gt: timestamp() } });

  // To get any row, pass an empty filter.
  await table.findOne({}).then(console.log);

  // Use sort to perform a vector search on a vector-indexed column.
  await table.findOne<{ winner: string }>({}, {
    sort: { mVector: vector([.2, .3, .4]) },
    projection: { winner: 1 }
  }).then(console.log);

  // includeSimilarity returns the similarity score for the vector search.
  // If you use a projection, you must include $similarity in the projection type.
  await table.findOne<{ winner: string, $similarity: number }>({}, {
    sort: { mVector: vector([.2, .3, .4]) },
    projection: { winner: 1 },
    includeSimilarity: true,
  }).then(console.log);

  // You can also use sort for regular ascending/descending sorts on a given column.
  await table.findOne({ matchId: 'round5' }, {
    sort: { round: -1 },
  }).then(console.log);

// find examples

  // Find rows with an exact match on one column, inherently using the equality ($eq) filter operator.
  // Lazily iterate over results to print output like:
  // - (R:1): winner Donna
  // - (R:2): winner Erick
  // - (R:3): winner Fiona
  for await (const row of table.find({ matchId: 'challenge6' })) {
    console.log(`(R:${row.round}): winner ${row.winner}`);
  }

  // Projections optimize bandwidth by returning specified columns instead of entire rows.
  // Specify the exact return type to prevent accidental type errors.
  type ProjectedSchema = Pick<TableSchema, 'round' | 'winner'>;

  for await (const row of table.find({ matchId: 'challenge6' }).project<ProjectedSchema>({ round: 1, winner: 1 })) {
    console.log(`(R:${row.round}): winner ${row.winner}`);
  }

  // Another example of the implied equality operator.
  await table.find({ matchId: 'challenge6' }).toArray().then(console.log);

  // Find rows using other filter operators on other indexed columns.
  // You can filter on non-indexed columns, but this is long-running and inefficient.
  await table.find({ score: { $gte: 15 } }).toArray().then(console.log);

  // (Not recommended; long-running and inefficient) To get all rows, pass an empty filter.
  await table.find({}).toArray().then(console.log);

  // Use sort to perform a vector search on a vector-indexed column.
  await table.find({})
    .sort({ mVector: vector([.2, .3, .4]) })
    .project<{ winner: number }>({ winner: 1 })
    .toArray()
    .then(console.log);

  // Use sort and filter together for a hybrid search.
  // This example also use includeSimilarity to return the similarity scores of the vector search.
  // If you use a projection, you must include $similarity in the projection type.
  await table.find({ matchId: 'fight4' })
    .sort({ mVector: vector([.2, .3, .4]) })
    .includeSimilarity(true)
    .project<{ winner: number, $similarity: number }>({ winner: 1 })
    .toArray()
    .then(console.log);

  // Use sort for regular ascending/descending sorts on a given column.
  await table.find({ matchId: 'round5' })
    .sort({ round: -1 })
    .toArray()
    .then(console.log);

  // You can also use mapping.
  await table.find({ matchId: 'fight5' })
    .sort({ round: -1 })
    .limit(5)
    .map(row => row.winner.toUpperCase())
    .toArray()
    .then(console.log);

  // Uncomment the following line to drop the table and any related indexes.
  // await table.drop();
})();
// Find rows with an exact match on one column, inherently using the equality ($eq) filter operator.
// Lazily iterate over results to print output like:
// - (R:1): winner Donna
// - (R:2): winner Erick
// - (R:3): winner Fiona
for await (const row of table.find({ matchId: 'challenge6' })) {
  console.log(`(R:${row.round}): winner ${row.winner}`);
}

// Projections optimize bandwidth by returning specified columns instead of entire rows.
// Specify the exact return type to prevent accidental type errors.
type ProjectedSchema = Pick<TableSchema, 'round' | 'winner'>;

for await (const row of table.find({ matchId: 'challenge6' }).project<ProjectedSchema>({ round: 1, winner: 1 })) {
  console.log(`(R:${row.round}): winner ${row.winner}`);
}

// Another example of the implied equality operator.
await table.find({ matchId: 'challenge6' }).toArray().then(console.log);

// Find rows using other filter operators on other indexed columns.
// You can filter on non-indexed columns, but this is long-running and inefficient.
await table.find({ score: { $gte: 15 } }).toArray().then(console.log);

// (Not recommended; long-running and inefficient) To get all rows, pass an empty filter.
await table.find({}).toArray().then(console.log);

// Use sort to perform a vector search on a vector-indexed column.
await table.find({})
  .sort({ mVector: vector([.2, .3, .4]) })
  .project<{ winner: number }>({ winner: 1 })
  .toArray()
  .then(console.log);

// Use sort and filter together for a hybrid search.
// This example also use includeSimilarity to return the similarity scores of the vector search.
// If you use a projection, you must include $similarity in the projection type.
await table.find({ matchId: 'fight4' })
  .sort({ mVector: vector([.2, .3, .4]) })
  .includeSimilarity(true)
  .project<{ winner: number, $similarity: number }>({ winner: 1 })
  .toArray()
  .then(console.log);

// Use sort for regular ascending/descending sorts on a given column.
await table.find({ matchId: 'round5' })
  .sort({ round: -1 })
  .toArray()
  .then(console.log);

// You can also use mapping.
await table.find({ matchId: 'fight5' })
  .sort({ round: -1 })
  .limit(5)
  .map(row => row.winner.toUpperCase())
  .toArray()
  .then(console.log);

For more information, see the Client reference.

Run a find with a non-vector filter condition:

TableCursor<Row> row2 = table
  .find(eq("match_id", "challenge6"));

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string. For more information, see Vector type.

Run a vector search and get an iterable over the returned results:

TableFindOptions options = new TableFindOptions()
 .sort(Sort.vector("m_vector", new float[] {0.4f, -0.6f, 0.2f}))
 .limit(3);

TableCursor<Row> row = table.find(foptions);

Run hybrid search with a filter and vector-similarity sorting, apply a projection to the returned results, and then materialize the matches into a list:

Filter filter = eq("match_id", "fight4");

TableFindOptions options = new TableFindOptions()
 .projection(include("winner"))
 .sort(Sort.vector("m_vector", new float[] {0.2f, 0.3f, 0.4f}));

List<Row> result = table
  .find(filter, options)
  .toList();

Parameters:

Name Type Summary

filter

Filter

A filter expressing which condition the returned rows must satisfy. You can use filter operators to compare columns with literal values. Filters can be instantiated with its constructor and specialized with method where(..) or leverage the class Filters.

You cannot filter on map, list, or set columns.

To perform a vector search, use sort instead of filter.

options

TableFindOptions

A wrapper for the different options and specialization of this search.

rowClass

Class<?>

This parameter acts a formal specifier for the type checker. If omitted, the resulting cursor is implicitly a TableFindCursor<T>, meaning that the response maintains the same type for the returned rows as the rows in the table itself. Strictly typed code may want to specify this parameter, especially when a projection is given. For related information, albeit in the context of the Python client, see Typing support.

Name Type Summary

sort

Sort

The sort parameter can express either a vector search or regular ascending/descending sorting. For more information, see Sort clauses for rows.

projection

Projection

Select a subset of columns to include in the response for the returned rows:

  • Include only the given columns: Projection.include("column1","column2")

  • Include all columns except the given columns: Projection.exclude("column1","column2")

If empty or unspecified, the default projection (all columns) is used.

DataStax recommends using projections to optimize bandwidth, especially to avoid unnecessarily returning large columns, such as vector columns with highly dimensional embeddings.

For more information and examples, see Projection clauses.

includeSimilarity

boolean

If true, the returned rows include a $similarity key with the numeric similarity score representing the closeness of the sort vector and the row’s vector. This is only valid for vector search (sort on a vector column).

includeSortVector

boolean

If true, you can call the getSortVector() method on the returned cursor to get the vector used for the vector search. The default is false. This is only relevant for vector search (sort on a vector column) when you want to get the sort vector from the returned cursor. This can be useful with vectorize because you don’t know the sort vector in advance.

You can’t use includeSortVector with findOne, but you can use includeSortVector and limit(1) with find. However, because vector search is approximate (as in approximate nearest neighbor), the lower your limit, the more likely you are to find an approximate, but not maximal, match.

skip

int

Optionally specify a number of rows to bypass (skip) before returning rows. The first n rows matching the query are discarded from the results, and the results begin at the skip+1 row. For example, if skip(5), the first 5 rows are discarded, and the results begin at the 6th row.

This parameter is only valid with sort.

limit

int

Limit the total number of rows returned from the table. The returned cursor stops yielding rows either when it reaches the limit or there are no more rows to return.

timeout

long or Duration

A timeout, in milliseconds (long), to impose on the underlying API request. If not provided, the Table defaults apply.

Returns:

TableCursor<T, R> - An object representing the stream of results.

You can manipulate a TableCursor in various ways. Typically, it is iterated over and yields the search results, managing pagination discretely as needed.

Invoking .to_list() on a TableCursor causes it to consume all rows, and then materialize the entire set of results as the returned list. This is not recommended for queries that return a large number of results.

Example:

package com.datastax.astra.client.tables;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableFindOptions;
import com.datastax.astra.client.tables.cursor.TableCursor;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.List;

import static com.datastax.astra.client.core.query.Filters.eq;
import static com.datastax.astra.client.core.query.Projection.include;

public class FindMany {
 public static void main(String[] args) {
   Database db = new DataAPIClient("token").getDatabase("endpoint");

   Table<Row> table = db.getTable("games");

   Filter filter = eq("match_id", "tournamentA");

   TableFindOptions options = new TableFindOptions()
     // .projection(include("match_id", "winner", "field3"))
       .limit(2)
     //.sort(Sort.vector("m_vector", new DataAPIVector(new float[] {0.4f, -0.6f, 0.2f})))
     .includeSortVector(true)
     .includeSimilarity(true);

   TableCursor<Row, Row> row = table.find(filter, options);
    row.forEach(r -> {
      System.out.println("Row: " + r);
    });

     TableCursor<Row, Game> gameCursor = table.find(filter, options, Game.class);
     gameCursor.forEach(g -> {
         System.out.println("Game: " + g.getWinner());
     });


   TableCursor<Row, Row> row2 = table.find(eq("match_id", "tournamentA"));
   row2.getSortVector().ifPresent(v -> {
       System.out.println("Sort Vector: " + v);
   });

   Filter filter3 = eq("match_id", "fight4");
   TableFindOptions options3 = new TableFindOptions()
   .projection(include("winner"))
   .sort(Sort.vector("m_vector", new float[] {0.2f, 0.3f, 0.4f}));
    List<Row> result = table.find(filter3, options3).toList();





 }
}

Use find to retrieve multiple rows matching your filter and sort criteria.

Retrieve multiple rows by a given filter clause:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_TABLE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": {
      "COLUMN_NAME": {
        "FILTER_OPERATOR": FILTER_VALUE
      }
    }
    "options": {
      "skip": 5,
      "limit": 10
    }
  }
}' | jq

The filter can include one or more columns, filter operators, and values.

For example, the $eq (equals) operator finds exact matches. The following example retrieves rows where the customer column has the name "Jasmine S." and the city "Jersey City":

"find": {
  "filter": {
    "customer": {
      "$eq": {
        "name": "Jasmine S.",
        "city": "Jersey City"
      }
    }
  }
}

$eq is the default operator if you do not specify an operator:

"find": {
  "filter": {
    "customer": { "name": "Jasmine S.", "city": "Jersey City" }
  }
}

For more information about filter operators, expand the following Filter operator examples or see Data API operators.

Filter operator examples

Use $ne to match rows that do not have the given filter value:

"find": {
  "filter": {
    "$ne": {
      "state": "NJ"
    }
  }
}

Use $in to match any of a series of specified values:

"find": {
  "filter": {
    "city": {
      "$in": [ "Jersey City", "Orange" ]
    }
  }
}

Similarly, use $nin to match rows that don’t have any of the specified values.

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string. For more information, see Vector type.

# Provide the query vector as an array
"find": {
  "sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] }
}

# Provide the query vector with $binary
"find": {
  "sort": { "vect_emb": { "$binary": "PczMzb5MzM0+mZma" } }
}

# Generate a query vector with vectorize
"find": {
  "sort": { "vect_emb": "Text to vectorize" }
}

# Perform a vector search and include the similarity score in the response
"find": {
  "sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] },
  "options": {
    "includeSimilarity": true
  }
}

To perform a hybrid search, use both sort and filter:

"find": {
  "sort": { "vect_emb": [ 0.1, -0.2, 0.3 ] },
  "filter": { "year": { "$gt": 2000 } }
}

Parameters:

Name Type Summary

find

command

The Data API command to retrieve multiple rows in a table based on one or more of filter, sort, projection, and options.

filter

object

An object that defines filter criteria using the Data API filter syntax. For a list of available operators, see Data API operators.

You cannot filter on map, list, or set columns.

To perform a vector search, use sort instead of filter.

sort

object

Perform a vector similarity search or set the order in which rows are returned. For more information and examples, see Sort clauses and Vector type.

projection

object

Select a subset of columns to include in the response for each returned row. If empty or unset, the default projection is used. The default projection includes all columns, but it omits null values.

The response always omits null values, even if you include them in projection.

For more information and examples, see Projection clauses.

skip

integer

Specify a number of rows to bypass (skip) before returning rows. The first n rows matching the query are discarded from the results, and the results begin at the skip+1 row. For example, if "skip": 5, the first 5 rows are discarded, and the results begin at the 6th row.

This parameter is only valid with sort.

limit

integer

Limit the total number of rows returned. Pagination can occur if more than 20 rows are returned in the current set of matching rows. Once the limit is reached, either in a single response or the last page of a paginated response, nothing more is returned.

options.includeSimilarity

boolean

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the row’s vector. This is only valid for vector search (sort on a vector column).

options.includeSortVector

boolean

If true, the response includes the sortVector. The default is false. This is only relevant for vector search (sort on a vector column) when you want the response to include the sort vector. This can be useful with vectorize because you don’t know the sort vector in advance.

"find": {
  "sort": { "vect_emb": [ "some string" ] },
  "options": {
    "includeSortVector": true
  }
}

You can’t use includeSortVector with findOne, but you can use includeSortVector and limit: 1 with find. However, because vector search is approximate (as in approximate nearest neighbor), the lower your limit, the more likely you are to find an approximate, but not maximal, match.

Returns:

A successful response can include a data object that contains documents and nextPageState.

  • documents is an array of objects where each object represents a row matching the given query. The returned values for each row object depend on the projection and options.

  • nextPageState can be null or an ID. If it is an ID, then you can use that ID to fetch the next page of rows that match the filter. If it is null or omitted, then there are no more matches or pages available.

    Some find operations don’t paginate, even if there are additional matches. For example:

    • Operations that require in-memory sort, such as allow filtering on non-indexed columns. The Data API returns a warning if this happens.

    • Vector searches. Vector searches return a maximum of 1000 rows, unless you specify a lower limit or your table does not have 1000 rows.

    • Certain combinations of sort and filter options.

{
  "data": {
    "documents": [
      {
        "name": "Sami Minh",
        "email": "sami@example.com",
        "graduated": true,
        "graduation_year": 2024
      },
      {
        "name": "Kiran Jay",
        "email": "kiran@example.com",
        "graduated": true,
        "graduation_year": 2024
      }
    ],
    "nextPageState": null
  }
}

In the event of pagination, you must issue a subsequent request with a pageState ID to fetch the next page of rows that matched the filter. As long as there is a subsequent page with matching rows, the transaction returns a nextPageState ID, which you use as the pageState for the subsequent request.

Each paginated request is exactly the same as the original request, except for the addition of the pageState in the options object:

{
  "find": {
    "filter": { "graduation_year": { "$eq": 2024 } },
    "options": { "pageState": "NEXT_PAGE_STATE_FROM_PRIOR_RESPONSE" }
  }
}

Continue issuing requests with the subsequent pageState ID until you have fetched all matching rows.

Example:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace/students" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": {
      "graduation_year": 2024
    }
  }
}' | jq

Find distinct values across rows

Get a list of the distinct values in a certain column in a table.

distinct is a client-side operation, which effectively browses all required rows using the logic of the find command, and then collects the unique values found for the given key. There can be performance, latency, and billing implications if there are many matching rows.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the Client reference.

Use a specified filter condition to list distinct values for a column:

my_table.distinct("winner", filter={"match_id": "challenge6"})

Parameters:

Name Type Summary

key

str

The name of the field whose value is inspected across rows. Keys are typically just column names, although they can use dot notation to select particular entries in map columns. For set and list columns, individual entries are unrolled automatically. Particularly for lists, numeric indices can be used in the key dot-notation syntax. Examples of acceptable key values include "a_column", "map_column.map_key", and "list_column.2".

filter

dict

A dictionary expressing which condition the inspected rows must satisfy. You can use filter operators to compare columns with literal values. For example:

  • {"match_no": 123}: Uses the implied equality ($eq) operator. Shorthand for {"match_no": {"$eq": 123}}.

  • {"match_no": 123, "round": "C"}: Uses the implied equality operator and combines the two conditions with an implicit $and.

  • {}: An empty filter (all rows).

general_method_timeout_ms

int | None

A timeout, in milliseconds, to impose on the underlying API request. If not provided, the Table defaults apply. This parameter is aliased as timeout_ms for convenience.

request_timeout_ms

int | None

A timeout, in milliseconds, for each API request performed while scrolling through matches (see find). If not provided, this object’s defaults apply.

Example:

Full example script
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")

from astrapy.constants import SortMode
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
)

my_table = database.create_table(
    "games",
    definition=(
        CreateTableDefinition.builder()
        .add_column("match_id", ColumnType.TEXT)
        .add_column("round", ColumnType.TINYINT)
        .add_vector_column("m_vector", dimension=3)
        .add_column("score", ColumnType.INT)
        .add_column("when", ColumnType.TIMESTAMP)
        .add_column("winner", ColumnType.TEXT)
        .add_set_column("fighters", ColumnType.UUID)
        .add_partition_by(["match_id"])
        .add_partition_sort({"round": SortMode.ASCENDING})
        .build()
    ),
)

from astrapy.data_types import (
    DataAPISet,
    DataAPITimestamp,
    DataAPIVector,
)
from astrapy.ids import UUID

insert_result = my_table.insert_many(
    [
        {
            "match_id": "fight4",
            "round": 1,
            "winner": "Victor",
            "score": 18,
            "when": DataAPITimestamp.from_string(
                "2024-11-28T11:30:00Z",
            ),
            "fighters": DataAPISet([
                UUID("0193539a-2770-8c09-a32a-111111111111"),
                UUID('019353e3-00b4-83f9-a127-222222222222'),
            ]),
            "m_vector": DataAPIVector([0.4, -0.6, 0.2]),
        },
        {
            "match_id": "challenge6",
            "round": 1,
            "winner": "Donna",
            "m_vector": [0.9, -0.1, -0.3],
        },
        {"match_id": "challenge6", "round": 2, "winner": "Erick"},
        {"match_id": "challenge6", "round": 3, "winner": "Fiona"},
    ],
)

# distinct within a partition:
my_table.distinct("winner", filter={"match_id": "challenge6"})
# ['Donna', 'Erick', 'Fiona']

# distinct values across the whole table:
# (not recommended performance-wise)
my_table.distinct("winner")
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# ['Victor', 'Adam Zuul', 'Betta Vigo', 'Caio Gozer', 'Donna', 'Erick', ...

# Over a column containing null values
# (also with composite filter):
my_table.distinct(
    "score",
    filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [18, None]

# distinct over a set column (automatically "unrolled"):
my_table.distinct(
    "fighters",
    filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [UUID('0193539a-2770-8c09-a32a-111111111111'), UUID('019353e3-00b4-...
# distinct within a partition:
my_table.distinct("winner", filter={"match_id": "challenge6"})
# ['Donna', 'Erick', 'Fiona']

# distinct values across the whole table:
# (not recommended performance-wise)
my_table.distinct("winner")
# The Data API returned a warning: {'errorCode': 'ZERO_FILTER_OPERATIONS', ...
# ['Victor', 'Adam Zuul', 'Betta Vigo', 'Caio Gozer', 'Donna', 'Erick', ...

# Over a column containing null values
# (also with composite filter):
my_table.distinct(
    "score",
    filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [18, None]

# distinct over a set column (automatically "unrolled"):
my_table.distinct(
    "fighters",
    filter={"match_id": {"$in": ["fight4", "tournamentA"]}},
)
# [UUID('0193539a-2770-8c09-a32a-111111111111'), UUID('019353e3-00b4-...

The TypeScript client doesn’t have a distinct operation for tables.

Use a filter condition to find rows and then list distinct values in a given column:

// Find with no options
Filter filter = Filters.eq("match_id", "challenge6");

// Show you the distinct match id in the table
List<String> winners = table
 .distinct("winner", filter, String.class);

// Adding an Options
TableDistinctOptions options = new TableDistinctOptions()
  .timeout(1000);
List<String> winners2 = table
 .distinct("winner", filter, String.class, options);

Parameters:

Name Type Summary

fieldName

String

The name of the field on which perform the distinct

filter

Filter

A filter expressing which condition the inspected rows must satisfy. You can use filter operators to compare columns with literal values. Filters can be instantiated with its constructor and specialized with method where(..) or leverage the class Filters.

columnClass

Class<?>

Represents the expected type of the column returned. For example, if the target column is text or ascii, then the returned value type is String.class.

options

TableDistinctOptions

A wrapper for the different options and specialization of this search.

Example:

package com.datastax.astra.client.tables;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIClients;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Filters;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableDistinctOptions;
import com.datastax.astra.client.tables.commands.options.TableFindOptions;
import com.datastax.astra.client.tables.cursor.TableCursor;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.List;

import static com.datastax.astra.client.core.query.Filters.eq;
import static com.datastax.astra.client.core.query.Projection.include;

public class Distinct {
 public static void main(String[] args) {
   Database db = new DataAPIClient("token").getDatabase("endpoint");

   Table<Row> table = db.getTable("games");

   // Show you all match id in the table
   List<Row> matches = table.find(null, new TableFindOptions()
      .projection(include("match_id"))).toList();
     matches.forEach(System.out::println);

     Filter filter = Filters.eq("match_id", "challenge6");

     TableDistinctOptions options = new TableDistinctOptions()
             .timeout(1000);
    // Show you the distinct match id in the table
     table.distinct("match_id", filter, String.class)
             .forEach(System.out::println);
 }
}

This operation has no literal equivalent in HTTP. Instead, you can use Find rows using filter clauses, and then use projection and a utility like jq to extract specific values from the response.

Sort clauses

You can use sort clauses to organize results by similarity to a given vector or by regular ascending or descending order.

For regular sorts, sort can include one or more non-vector columns. If you specify multiple columns, the sorting applies hierarchically to the rows.

For vector search, sort can include only one vector column.

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string.

For more information, see Vector type.

  • Python

  • TypeScript

  • Java

  • curl

The Python client uses the sort dictionary parameter for vector search or regular sorting.

When no particular order is required:

sort={}  # (default when parameter not provided)

For regular (non-vector) sorts, you must specify the columns to sort on and ascending/descending order:

from astrapy.constants import SortMode

# Ascending sort (two equivalent forms)
sort={"round": SortMode.ASCENDING}
sort={"round": +1}

# Descending sort (two equivalent forms)
sort={"round": SortMode.DESCENDING}
sort={"round": -1}

Be aware of the order when chaining sorts, as the sorting applies hierarchically:

# Multi-column sort
sort={
    "week_no": SortMode.ASCENDING,
    "day_of_week": SortMode.ASCENDING,
}

# Equivalent form for multi-column sort
sort={"week_no": +1, "day_of_week": +1}

You can use sort to perform a vector similarity (ANN) search.

For a vector search, you can provide a pre-generated query vector in the form {"vector_column": QUERY_VECTOR}, where QUERY_VECTOR is a list of floats or DataAPIVector. If the specified vector column has a vectorize embedding provider integration, you can generate a query vector from a string using the form {"vector_column": "query text"}.

# Provide a pre-generated query vector as array,
# And then sort by similarity to the given vector.
sort={"vector_column": [0.1, -0.2, 0.3]}

# Provide a binary-encoded pre-generated query vector,
# And then sort by similarity to the given vector.
# Binary encoding is preferrable for performance.
sort={"vector_column": DataAPIVector([0.1, -0.2, 0.3])}

# Generate a vector from a string,
# Run a similarity search,
# And then sort by similarity to the given vector.
# Requires a valid vectorize integration.
sort={"vector_column": "Query text to vectorize"}
Sort example (full script)
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")

from astrapy.constants import SortMode
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
)

my_table = database.create_table(
    "games",
    definition=(
        CreateTableDefinition.builder()
        .add_column("match_id", ColumnType.TEXT)
        .add_column("round", ColumnType.TINYINT)
        .add_vector_column("m_vector", dimension=3)
        .add_column("score", ColumnType.INT)
        .add_column("when", ColumnType.TIMESTAMP)
        .add_column("winner", ColumnType.TEXT)
        .add_set_column("fighters", ColumnType.UUID)
        .add_partition_by(["match_id"])
        .add_partition_sort({"round": SortMode.ASCENDING})
        .build()
    ),
)

from astrapy.constants import VectorMetric
from astrapy.info import TableIndexOptions, TableVectorIndexOptions

my_table.create_index(
    "score_index",
    column="score",
)

my_table.create_index(
    "winner_index",
    column="winner",
    options=TableIndexOptions(
        ascii=False,
        normalize=True,
        case_sensitive=False,
    ),
)

my_table.create_vector_index(
    "m_vector_index",
    column="m_vector",
    options=TableVectorIndexOptions(
        metric=VectorMetric.DOT_PRODUCT,
    ),
)

from astrapy.data_types import (
    DataAPISet,
    DataAPITimestamp,
    DataAPIVector,
)
from astrapy.ids import UUID

insert_result = my_table.insert_many(
    [
        {
            "match_id": "fight4",
            "round": 1,
            "winner": "Victor",
            "score": 18,
            "when": DataAPITimestamp.from_string(
                "2024-11-28T11:30:00Z",
            ),
            "fighters": DataAPISet([
                UUID("0193539a-2770-8c09-a32a-111111111111"),
                UUID('019353e3-00b4-83f9-a127-222222222222'),
            ]),
            "m_vector": DataAPIVector([0.4, -0.6, 0.2]),
        },
        {
            "match_id": "challenge6",
            "round": 1,
            "winner": "Donna",
            "m_vector": [0.9, -0.1, -0.3],
        },
        {"match_id": "challenge6", "round": 2, "winner": "Erick"},
        {"match_id": "challenge6", "round": 3, "winner": "Fiona"},
        {"match_id": "fight5", "round": 3, "winner": "Caio Gozer"},
    ],
)

from astrapy.constants import SortMode
from astrapy.data_types import DataAPIVector

# Top vector result using "sort" (on an indexed vector column):
my_table.find_one(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
)
# {'winner': 'Donna'}

# Non-vector sorting on a 'partitionSort' column:
my_table.find_one(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
)
# {'winner': 'Caio Gozer'}

# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
    limit=3,
).to_list()
# [{'winner': 'Donna'}, {'winner': 'Victor'}]

# Hybrid search with vector sort and non-vector filtering:
my_table.find(
    {"match_id": "fight4"},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
    limit=3,
).to_list()
# [{'winner': 'Victor'}]

When no particular order is required:

{ sort: {} }  // (default when parameter not provided)

When sorting by a certain value in ascending/descending order:

{ sort: { column: +1 } }  // ascending
{ sort: { column: -1 } }  // descending

Be aware of the order when chaining multiple sorts because ES2015+ guarantees string keys in order of insertion For example, when sorting first by a column1 and then by column2:

{ sort: { column1: 1, column2: 1 } }

You can use sort to perform a vector similarity (ANN) search. You can supply a pre-generated query vector or, if the vector column has a vectorize integration, you can generate a query vector from a string. For more information, see Vector type.

// Provide a pre-generated query vector as array,
// And then sort by similarity to the given vector.
{ sort: { vector_column: vector([0.4, 0.15, -0.5]) } }

// Generate a vector from a string,
// Run a similarity search,
// And then sort by similarity to the given vector.
// Requires a valid vectorize integration.
{ sort: { vector_column: "Text to vectorize" } }

If your vector search includes projection and includeSimilarity: true, then you must manually include $similarity in the type of the returned rows. For example:

  await table.findOne<{ winner: string, $similarity: number }>({}, {
    sort: { mVector: vector([.2, .3, .4]) },
    projection: { winner: 1 },
    includeSimilarity: true,
  }).then(console.log);

If you don’t include projection, then $similarity is inferred to be a part of the returned rows.

Example:

import { CreateTableDefinition, DataAPIClient, InferTableSchema, vector } from '@datastax/astra-db-ts';

// Instantiate the client and connect to the database
const client = new DataAPIClient();
const db = client.db(process.env.CLIENT_DB_URL!, { token: process.env.CLIENT_DB_TOKEN! });

// Create table schema using bespoke Data API table definition syntax
const TableDefinition = <const>{
  columns: {
    matchId: 'text'
    round: 'tinyint',
    mVector: { type: 'vector', dimension: 3 },
    score: 'int',
    when: 'timestamp',
    winner: 'text',
    fighters: { type: 'set', valueType: 'uuid' },
  },
  primaryKey: {
    partitionBy: ['matchId'],
    partitionSort: { round: 1 },
  },
} satisfies CreateTableDefinition;

type TableSchema = InferTableSchema<typeof TableDefinition>;

(async function () {
  // Create the table if it doesn't already exist
  const table = await db.createTable<TableSchema>('games', { definition: TableDefinition, ifNotExists: true });

  // Create the necessary indexes
  await table.createIndex('winner_idx', 'winner', { ifNotExists: true });
  await table.createVectorIndex('m_vector_idx', 'mVector', { ifNotExists: true });

  // Inserts a few rows for demonstration purposes
  await table.insertMany([
    { matchId: 'fight4', round: 1, winner: 'Victor',      mVector: vector([.1, .2, .3]) },
    { matchId: 'fight5', round: 1, winner: 'Gray Tist',   mVector: vector([.1, .2, .3]) },
    { matchId: 'fight5', round: 2, winner: 'Cham P. Yun', mVector: vector([.9, .8, .7]) },
    { matchId: 'fight5', round: 3, winner: 'Winona',      mVector: vector([.1, .2, .3]) },
    { matchId: 'fight7', round: 1, winner: 'Wynn Uhr',    mVector: vector([.1, .2, .3]) },
    { matchId: 'fight7', round: 2, winner: 'Zeeb Est',    mVector: vector([.1, .2, .3]) },
    { matchId: 'fight7', round: 3, winner: 'Theeg Oat',   mVector: vector([.1, .2, .3]) },
  ]);

  // Random match
  const row0 = await table.findOne({});
  console.log('Match winner:', row0?.winner);

  // Simple single-field sort ('Victor')
  const row1 = await table.findOne({}, { sort: { matchId: 1 } });
  console.log('First match winner:', row1?.winner);

  // Simple multi-field sort ('Wynn Uhr')
  const row3 = await table.findOne({}, { sort: { matchId: -1, round: 1 } });
  console.log('Last match, first round winner:', row3?.winner);

  // Vector sort ('Cham P. Yun')
  const row4 = await table.findOne({}, { sort: { mVector: vector([.9, .8, .7]) } });
  console.log('Match with the closest matching vector:', row4?.winner);

  // Uncomment the following line to drop the table and any related indexes.
  // await table.drop();
})();

The Java client uses sort() for vector search or regular sorting.

The sort() operations are optional. Use them only when needed.

For regular (non-vector) sorts, you must specify the columns to sort on and ascending/descending order. Be aware of the order when chaining sorts, as the sorting applies hierarchically.

import com.datastax.astra.client.core.query.Sort;

Sort s1 = Sort.ascending("field1");
Sort s2 = Sort.descending("field2");
TableFindOptions options =
  new TableFindOptions().sort(s1, s2);

To perform a vector search, use the Sort.vector(..,..) and Sort.vectorize(..,..) methods.

These methods take two arguments: a column name and a query vector.

If you omit the column name, the client uses the default of $vector or $vectorize, depending on the method. These are reserved fields for collections. With tables, it is important that you specify the name of your vector column in the first argument. This is because a table can contain multiple vector columns, and vector columns can have any valid name.

You can provide sort vectors to these methods in the same way that you write vectors:

// This example uses TableFindOptions, but you can use any sortable object

// Sort with DataAPIVector
DataAPIVector vector = new DataAPIVector(new float[] {0.1f, 0.2f, 0.3f});
new TableFindOptions()
  .sort(Sort.vector("m_vector", vector));

// Sort with float[]
new TableFindOptions()
  .sort(Sort.vector("m_vector", new float[] {0.1f, 0.2f, 0.3f}));

// Sort with vectorize
new TableFindOptions()
  .sort(Sort.vectorize("m_vector", "Text To Serialize"));

Example:

package com.datastax.astra.client.tables;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIClients;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.options.CollectionFindOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.query.Projection;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableFindOneOptions;
import com.datastax.astra.client.tables.commands.options.TableFindOptions;
import com.datastax.astra.client.tables.definition.rows.Row;

public class WorkingWithSorts {
    public static void main(String[] args) {
        Database db = new DataAPIClient("token").getDatabase("endpoint");

        Table<Row> tableRow = db.getTable("TABLE_NAME");

        // Sort Clause for a vector
        Sort.vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f});;

        // Sort Clause for other fields
        Sort s1 = Sort.ascending("field1");
        Sort s2 = Sort.descending("field2");

        // Build the sort clause
        new TableFindOptions().sort(s1, s2);

        // Adding vector
        new TableFindOptions().sort(
                Sort.vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f}), s1, s2);

    }
}
# Provide the query vector as an array
"sort": { "vector_column": [ 0.1, -0.2, 0.3 ] }

# Provide the query vector with $binary
"sort": { "vector_column": { "$binary": "PczMzb5MzM0+mZma" } }

# Generate a query vector with vectorize
"sort": { "vector_column": "Text to vectorize" }

For HTTP commands with sort, see Find a row and Find rows using filter clauses.

Sort results and optimization

For best performance, filter and sort on indexed columns, partition keys, and clustering keys.

Filtering on non-indexed columns can use allow filtering, which is inefficient and resource-intensive, especially for large datasets. With the Data API clients, allow filtering operations can hit the client timeout limit before the underlying HTTP operation is complete.

An empty filter ("filter": {}) does not use allow filtering, but it can still be an inefficient and long-running operation.

Additionally, the Data API can perform in-memory sorting, depending on the columns you sort on, the table’s partitioning structure, and whether the sorted columns are indexed. In-memory sorts can have performance implications.

Some combinations of arguments impose an implicit upper bound on the number of rows returned by the Data API:

  • Vector ANN search returns up to 1000 rows per search operation, unless you set a lower limit.

  • When using an ascending or descending sort criterion, the Data API returns up to 20 rows at once. The returned rows are the top results across the whole collection based on the filter criteria.

    These provisions can also apply when running subsequent commands on client cursors, such as .distinct().

Over HTTP, pagination can occur if there are more than 20 matching rows, but, in some cases, the nextPageState is null regardless of the actual presence of additional results. This can occur with or without sort.

For ascending or descending sort clauses that do not automatically paginate, it is sometimes possible to use the limit and skip options to control the number of rows returned and the starting point of the results, as a form of manual pagination.

When you don’t specify sorting criteria (by vector or otherwise), the following can occur:

  • Client cursors can scroll through an arbitrary number of rows because the Data API and the client periodically exchange new chunks of rows.

  • If rows are added or removed after starting a find operation, pagination and client cursor behavior depends on database internals. There is no guarantee as to whether or not the response will pick up such "real-time" changes in the data.

For more information about handling pagination and cursors, see Find rows using filter clauses.

Projection clauses

In findOne and find, you can use the projection parameter to select specific columns to include or exclude from the response. The projection can’t mix inclusions and exclusions.

If projection is empty or unspecified, the Data API applies the default projection. For rows, the default projection includes all columns.

Over HTTP, the Data API always excludes null values in responses, regardless of projection For example, if find matches multiple rows, the response can include different columns for each row, depending on which columns are null in each row. You cannot forcibly include null values in the response.

The Python, TypeScript and Java clients reconstruct returned rows by adding the columns omitted in the API response. Therefore, when using a client, rows from findOne and find operations always have the same columns, as specified by the projection, regardless of the presence of null values.

In order to optimize the response size and improve performance, DataStax recommends always providing an explicit projection tailored to the needs of the application.

Projection syntax

A projection is expressed as a mapping of column names to boolean values.

Use true mapping to include only the specified columns. For example, the following true mapping returns the columns name and email:

{ "name": true, "email": true }

Alternatively, use a false mapping to exclude the specified columns. All other non-excluded columns are returned. For example, the following false mapping returns all columns except name and email:

{ "name": false, "email": false }

The values in a projection map can be objects, booleans, decimals, or integers, but the Data API ultimately evaluates all of these as booleans.

For example, the following projection evaluates to true (include) for all four columns:

{ "column1": true, "column2": 1, "column3": 90.0, "column4": { "keep": "yes!" } }

Whereas this projection evaluates to false (exclude) for all four columns:

{ "column1": false, "column2": 0, "column3": 0.0, "column4": {} }

Passing null-like types (such as {}, null or 0) for the whole projection mapping is equivalent to not specifying a projection, causing the Data API to use the default projection.

A projection can’t mix include and exclude projections. It can contain only true or only false values for the given columns. For example, {"column1": true, "column2": false} is an invalid projection that results in an API error.

A projection can’t include or exclude sub-column values, such as keys in map columns.

To include all columns, you can use the wildcard projection { "*": true }. This is equivalent to the default projection. The Data API doesn’t support false wildcard projections for tables.

Projection examples by language

  • Python

  • TypeScript

  • Java

  • curl

For the Python client, the projection can be any of the following:

  • A dictionary (dict[str, Any]) to include specific columns in the response, like {column_name: True}.

  • A dictionary (dict[str, Any]) to exclude specific columns from the response, like {column_name: False}.

  • A list or other iterable over key names that are implied to be included in the projection.

# The following two statements are equivalent:
document = table.find_one(
   {"user_id": 101},
   projection={"name": True, "city": True},
)
document = table.find_one(
   {"user_id": 101},
   projection=["name", "city"],
)

# Return the whole rows (default):
table.find({}, projection={'*': True}).to_list()

# Exclude certain columns:
table.find({}, projection={'name': False}).to_list()

For information about default projections and projection syntax, see the preceding explanation of projection clauses.

The TypeScript client takes in an untyped JS object for the projection parameter.

When specifying a projection, you must handle the return type carefully.

To avoid some false typing claims, when it cannot be inferred that a projection is not passed, Partial<Row> is used as the return type for find & findOne operations. This is only Partial, not DeepPartial, so it only makes the top-level keys optional.

DataStax recommends that you provide your own explicit type override on find/findOne operations when using projections.

interface TableSchema {
  matchId: string,
  round: number,
  score?: number | null,
  winner?: string | null,
}

// row1 is of type TableSchema
const row1 = await table.findOne({});

// row2 is of type Partial<TableSchema>
const row2 = await table.findOne({}, {
  projection: { score: 1, winner: 1 }
});

// row3 is of type { score?: string | null, winner?: number | null }
const row3 = await table.findOne<Pick<TableSchema, 'score' | 'winner'>>({}, {
  projection: { score: 1, winner: 1 }
});

// Projections also work with cursors
// rows are of type { score: number, winner: string, $similarity: number }
const cursor = await table.find({})
  .project<{ score: string, winner: number }>({ score: 1, winner: 1 })
  .includeSimilarity();

For information about default projections and projection syntax, see the preceding explanation of projection clauses.

If you perform a vector search that includes projection and includeSimilarity: true, then you must manually include $similarity in the type of the returned rows. For example:

  await table.findOne<{ winner: string, $similarity: number }>({}, {
    sort: { mVector: vector([.2, .3, .4]) },
    projection: { winner: 1 },
    includeSimilarity: true,
  }).then(console.log);

If you don’t include projection, then $similarity is inferred to be a part of the returned rows.

To support the projection mechanism, the Java client has different Options classes that provide the projection method in the helpers. This method takes an array of Projection classes with the column name and a boolean flag indicating inclusion or exclusion:

Projection p1 = new Projection("column1", true);
Projection p2 = new Projection("column2", true);
TableFindOneOptions options1 = new TableFindOneOptions().projection(p1, p2);

To simplify this syntax, you can use the Projections syntactic sugar:

TableFindOneOptions options2 = new TableFindOneOptions()
  .projection(Projections.include("column1", "column2"));

TableFindOneOptions options3 =new TableFindOneOptions()
  .projection(Projections.exclude("column1", "column2"));

For information about default projections and projection syntax, see the preceding explanation of projection clauses.

projection is an optional parameter on find and findOne.

The following example returns the first 100 rows matching the given query, after skipping the first 20 matches. The response contains only the name and email value for each returned row, if they are not null.

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_TABLE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": { "email": "tal@example.com" },
    "projection": { "name": true, "email": true },
    "options": {
      "skip": 20,
      "limit": 100
    }
  }
}' | jq

For information about default projections and projection syntax, see the preceding explanation of projection clauses.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com