Sort clauses for tables

You can use sort clauses to organize results by similarity to a given vector or by regular ascending or descending order.

For regular sorts, sort can include one or more non-vector columns. If you specify multiple columns, the sorting applies hierarchically to the rows.

For vector search, sort can include only one vector column.

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string.

For more information, see Vector type.

Python
TypeScript
Java
curl

The Python client uses the sort dictionary parameter for vector search or regular sorting.

When no particular order is required:

sort={}  # (default when parameter not provided)

For regular (non-vector) sorts, you must specify the columns to sort on and ascending/descending order:

from astrapy.constants import SortMode

# Ascending sort (two equivalent forms)
sort={"round": SortMode.ASCENDING}
sort={"round": +1}

# Descending sort (two equivalent forms)
sort={"round": SortMode.DESCENDING}
sort={"round": -1}

Be aware of the order when chaining sorts, as the sorting applies hierarchically:

# Multi-column sort
sort={
    "week_no": SortMode.ASCENDING,
    "day_of_week": SortMode.ASCENDING,
}

# Equivalent form for multi-column sort
sort={"week_no": +1, "day_of_week": +1}

You can use sort to perform a vector search.

For a vector search, you can provide a pre-generated query vector in the form {"vector_column": QUERY_VECTOR}, where QUERY_VECTOR is a list of floats or DataAPIVector. If the specified vector column has a vectorize embedding provider integration, you can generate a query vector from a string using the form {"vector_column": "query text"}.

# Provide a pre-generated query vector as array,
# And then sort by similarity to the given vector.
sort={"vector_column": [0.1, -0.2, 0.3]}

# Provide a binary-encoded pre-generated query vector,
# And then sort by similarity to the given vector.
# Binary encoding is preferrable for performance.
sort={"vector_column": DataAPIVector([0.1, -0.2, 0.3])}

# Generate a vector from a string,
# Run a vector search,
# And then sort by similarity to the given vector.
# Requires a valid vectorize integration.
sort={"vector_column": "Query text to vectorize"}

Sort example (full script)

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")

from astrapy.constants import SortMode
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
)

my_table = database.create_table(
    "games",
    definition=(
        CreateTableDefinition.builder()
        .add_column("match_id", ColumnType.TEXT)
        .add_column("round", ColumnType.TINYINT)
        .add_vector_column("m_vector", dimension=3)
        .add_column("score", ColumnType.INT)
        .add_column("when", ColumnType.TIMESTAMP)
        .add_column("winner", ColumnType.TEXT)
        .add_set_column("fighters", ColumnType.UUID)
        .add_partition_by(["match_id"])
        .add_partition_sort({"round": SortMode.ASCENDING})
        .build()
    ),
)

from astrapy.constants import VectorMetric
from astrapy.info import TableIndexOptions, TableVectorIndexOptions

my_table.create_index(
    "score_index",
    column="score",
)

my_table.create_index(
    "winner_index",
    column="winner",
    options=TableIndexOptions(
        ascii=False,
        normalize=True,
        case_sensitive=False,
    ),
)

my_table.create_vector_index(
    "m_vector_index",
    column="m_vector",
    options=TableVectorIndexOptions(
        metric=VectorMetric.DOT_PRODUCT,
    ),
)

from astrapy.data_types import (
    DataAPISet,
    DataAPITimestamp,
    DataAPIVector,
)
from astrapy.ids import UUID

insert_result = my_table.insert_many(
    [
        {
            "match_id": "fight4",
            "round": 1,
            "winner": "Victor",
            "score": 18,
            "when": DataAPITimestamp.from_string(
                "2024-11-28T11:30:00Z",
            ),
            "fighters": DataAPISet([
                UUID("0193539a-2770-8c09-a32a-111111111111"),
                UUID('019353e3-00b4-83f9-a127-222222222222'),
            ]),
            "m_vector": DataAPIVector([0.4, -0.6, 0.2]),
        },
        {
            "match_id": "challenge6",
            "round": 1,
            "winner": "Donna",
            "m_vector": [0.9, -0.1, -0.3],
        },
        {"match_id": "challenge6", "round": 2, "winner": "Erick"},
        {"match_id": "challenge6", "round": 3, "winner": "Fiona"},
        {"match_id": "fight5", "round": 3, "winner": "Caio Gozer"},
    ],
)

from astrapy.constants import SortMode
from astrapy.data_types import DataAPIVector

# Top vector result using "sort" (on an indexed vector column):
my_table.find_one(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
)
# {'winner': 'Donna'}

# Non-vector sorting on a 'partitionSort' column:
my_table.find_one(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
)
# {'winner': 'Caio Gozer'}

# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
    limit=3,
).to_list()
# [{'winner': 'Donna'}, {'winner': 'Victor'}]

# Hybrid search with vector sort and non-vector filtering:
my_table.find(
    {"match_id": "fight4"},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
    limit=3,
).to_list()
# [{'winner': 'Victor'}]

When no particular order is required:

{ sort: {} }  // (default when parameter not provided)

When sorting by a certain value in ascending/descending order:

{ sort: { column: +1 } }  // ascending
{ sort: { column: -1 } }  // descending

Be aware of the order when chaining multiple sorts because ES2015+ guarantees string keys in order of insertion For example, when sorting first by a column1 and then by column2:

{ sort: { column1: 1, column2: 1 } }

You can use sort to perform a vector search. You can supply a pre-generated query vector or, if the vector column has a vectorize integration, you can generate a query vector from a string. For more information, see Vector type.

// Provide a pre-generated query vector as array,
// And then sort by similarity to the given vector.
{ sort: { vector_column: vector([0.4, 0.15, -0.5]) } }

// Generate a vector from a string,
// Run a vector search,
// And then sort by similarity to the given vector.
// Requires a valid vectorize integration.
{ sort: { vector_column: "Text to vectorize" } }

If your vector search includes projection and includeSimilarity: true, then you must manually include $similarity in the type of the returned rows. For example:

  await table.findOne<{ winner: string, $similarity: number }>({}, {
    sort: { mVector: vector([.2, .3, .4]) },
    projection: { winner: 1 },
    includeSimilarity: true,
  }).then(console.log);

If you don’t include projection, then $similarity is inferred to be a part of the returned rows.

Example:

import { CreateTableDefinition, DataAPIClient, InferTableSchema, vector } from '@datastax/astra-db-ts';

// Instantiate the client and connect to the database
const client = new DataAPIClient();
const db = client.db(process.env.CLIENT_DB_URL!, { token: process.env.CLIENT_DB_TOKEN! });

// Create table schema using bespoke Data API table definition syntax
const TableDefinition = <const>{
  columns: {
    matchId: 'text'
    round: 'tinyint',
    mVector: { type: 'vector', dimension: 3 },
    score: 'int',
    when: 'timestamp',
    winner: 'text',
    fighters: { type: 'set', valueType: 'uuid' },
  },
  primaryKey: {
    partitionBy: ['matchId'],
    partitionSort: { round: 1 },
  },
} satisfies CreateTableDefinition;

type TableSchema = InferTableSchema<typeof TableDefinition>;

(async function () {
  // Create the table if it doesn't already exist
  const table = await db.createTable<TableSchema>('games', { definition: TableDefinition, ifNotExists: true });

  // Create the necessary indexes
  await table.createIndex('winner_idx', 'winner', { ifNotExists: true });
  await table.createVectorIndex('m_vector_idx', 'mVector', { ifNotExists: true });

  // Inserts a few rows for demonstration purposes
  await table.insertMany([
    { matchId: 'fight4', round: 1, winner: 'Victor',      mVector: vector([.1, .2, .3]) },
    { matchId: 'fight5', round: 1, winner: 'Gray Tist',   mVector: vector([.1, .2, .3]) },
    { matchId: 'fight5', round: 2, winner: 'Cham P. Yun', mVector: vector([.9, .8, .7]) },
    { matchId: 'fight5', round: 3, winner: 'Winona',      mVector: vector([.1, .2, .3]) },
    { matchId: 'fight7', round: 1, winner: 'Wynn Uhr',    mVector: vector([.1, .2, .3]) },
    { matchId: 'fight7', round: 2, winner: 'Zeeb Est',    mVector: vector([.1, .2, .3]) },
    { matchId: 'fight7', round: 3, winner: 'Theeg Oat',   mVector: vector([.1, .2, .3]) },
  ]);

  // Random match
  const row0 = await table.findOne({});
  console.log('Match winner:', row0?.winner);

  // Simple single-field sort ('Victor')
  const row1 = await table.findOne({}, { sort: { matchId: 1 } });
  console.log('First match winner:', row1?.winner);

  // Simple multi-field sort ('Wynn Uhr')
  const row3 = await table.findOne({}, { sort: { matchId: -1, round: 1 } });
  console.log('Last match, first round winner:', row3?.winner);

  // Vector sort ('Cham P. Yun')
  const row4 = await table.findOne({}, { sort: { mVector: vector([.9, .8, .7]) } });
  console.log('Match with the closest matching vector:', row4?.winner);

  // Uncomment the following line to drop the table and any related indexes.
  // await table.drop();
})();

The Java client uses sort() for vector search or regular sorting.

The sort() operations are optional. Use them only when needed.

For regular (non-vector) sorts, you must specify the columns to sort on and ascending/descending order. Be aware of the order when chaining sorts, as the sorting applies hierarchically.

import com.datastax.astra.client.core.query.Sort;

Sort s1 = Sort.ascending("field1");
Sort s2 = Sort.descending("field2");
TableFindOptions options =
  new TableFindOptions().sort(s1, s2);

To perform a vector search, use the Sort.vector(..,..) and Sort.vectorize(..,..) methods.

These methods take two arguments: a column name and a query vector.

If you omit the column name, the client uses the default of $vector or $vectorize, depending on the method. These are reserved fields for collections. With tables, it is important that you specify the name of your vector column in the first argument. This is because a table can contain multiple vector columns, and vector columns can have any valid name.

You can provide sort vectors to these methods in the same way that you write vectors:

// This example uses TableFindOptions, but you can use any sortable object

// Sort with DataAPIVector
DataAPIVector vector = new DataAPIVector(new float[] {0.1f, 0.2f, 0.3f});
new TableFindOptions()
  .sort(Sort.vector("m_vector", vector));

// Sort with float[]
new TableFindOptions()
  .sort(Sort.vector("m_vector", new float[] {0.1f, 0.2f, 0.3f}));

// Sort with vectorize
new TableFindOptions()
  .sort(Sort.vectorize("m_vector", "Text To Serialize"));

Example:

package com.datastax.astra.client.tables;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIClients;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.options.CollectionFindOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.query.Projection;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableFindOneOptions;
import com.datastax.astra.client.tables.commands.options.TableFindOptions;
import com.datastax.astra.client.tables.definition.rows.Row;

public class WorkingWithSorts {
    public static void main(String[] args) {
        Database db = new DataAPIClient("token").getDatabase("endpoint");

        Table<Row> tableRow = db.getTable("TABLE_NAME");

        // Sort Clause for a vector
        Sort.vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f});;

        // Sort Clause for other fields
        Sort s1 = Sort.ascending("field1");
        Sort s2 = Sort.descending("field2");

        // Build the sort clause
        new TableFindOptions().sort(s1, s2);

        // Adding vector
        new TableFindOptions().sort(
                Sort.vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f}), s1, s2);

    }
}

# Provide the query vector as an array
"sort": { "vector_column": [ 0.1, -0.2, 0.3 ] }

# Provide the query vector with $binary
"sort": { "vector_column": { "$binary": "PczMzb5MzM0+mZma" } }

# Generate a query vector with vectorize
"sort": { "vector_column": "Text to vectorize" }

For HTTP commands with sort, see Find a row and Find rows.

Sort results and optimization

For best performance, filter and sort on indexed columns, partition keys, and clustering keys.

Filtering on non-indexed columns can use allow filtering, which is inefficient and resource-intensive, especially for large datasets. With the Data API clients, allow filtering operations can hit the client timeout limit before the underlying HTTP operation is complete.

An empty filter ("filter": {}) does not use allow filtering, but it can still be an inefficient and long-running operation.

Additionally, the Data API can perform in-memory sorting, depending on the columns you sort on, the table’s partitioning structure, and whether the sorted columns are indexed. In-memory sorts can have performance implications.

Some combinations of arguments impose an implicit upper bound on the number of rows returned by the Data API:

Vector search returns up to 1000 rows per search operation, unless you set a lower limit.
When using an ascending or descending sort criterion, the Data API returns up to 20 rows at once. The returned rows are the top results across the whole collection based on the filter criteria.

These provisions can also apply when running subsequent commands on client cursors, such as .distinct().

Over HTTP, pagination can occur if there are more than 20 matching rows, but, in some cases, the nextPageState is null regardless of the actual presence of additional results. This can occur with or without sort.

For ascending or descending sort clauses that do not automatically paginate, it is sometimes possible to use the limit and skip options to control the number of rows returned and the starting point of the results, as a form of manual pagination.

When you don’t specify sorting criteria (by vector or otherwise), the following can occur:

Client cursors can scroll through an arbitrary number of rows because the Data API and the client periodically exchange new chunks of rows.
If rows are added or removed after starting a find operation, pagination and client cursor behavior depends on database internals. There is no guarantee as to whether or not the response will pick up such "real-time" changes in the data.

For more information about handling pagination and cursors, see Find rows.

Sort clauses for tables

Sort results and optimization

Was this helpful?

Give Feedback