Work with rows

This Astra DB Serverless feature is currently in public preview. Development is ongoing, and the features and functionality are subject to change. Astra DB Serverless, and the use of such, is subject to the DataStax Preview Terms.

The Data API tables commands are available through HTTP and the clients.

If you use a client, tables commands are available only in client versions 2.0-preview or later. For more information, see Data API client upgrade guide.

A row represents a single record of data in a table in an Astra DB Serverless database.

You use the Table class to work with rows through the Data API clients. For instructions to get a Table object, see Get a table.

For more information about the Data API and clients, see Get started with the Data API.

Row commands

Column data types

The Data API doesn’t support all CQL type literals and formats. For example, it doesn’t support all methods for representing dates, times, and timestamps with numerical values. Only specific formats are accepted and returned.

The following sections describe the data types that the Data API supports for table columns as well as guidance for working with certain data types.

Number types

The Data API supports the following column types for numbers that are not reserved for representing dates and times.

The Java client accepts and returns native Java types. The TypeScript client accepts and returns JavaScript numbers for all number types except varint and decimal. For HTTP and the Python client, use the formats noted in the following list.

bigint: A long number that accepts values containing digits and - ranging from -9223372036854775808 to 9223372036854775807 (-2^63 to 2^63-1). Only numbers that can be converted to an integer value without loss of information are valid.

The Python client accepts and returns int. The TypeScript client accepts and returns JavaScript numbers.
decimal: A variable-precision decimal returned as a number, including optional - and exponent.

You can store non-decimal values, and those values can be returned without decimal notation. For example, 23 is a valid decimal, and it can be returned as 23 or 23.0.

decimal does not support NaN, Infinity, or -Infinity values.

The Python client accepts decimal.DECIMAL, int, and float, and it returns decimal.DECIMAL.
double: A 64-bit IEEE-754 floating point value that can be a number (including an exponent and +/- 0) or a quoted string containing NaN, Infinity, or -Infinity.

The Python client accepts int and float, including float("NaN"), float("Infinity"), and float("-Infinity"), and it returns float. For more information, see Python client usage: Serdes options and custom data types.
float: A 32-bit IEEE-754 floating point value that can be a number (including E and +/- 0) or a quoted string containing NaN, Infinity, or -Infinity.

The Python client accepts int and float, including float("NaN"), float("Infinity"), and float("-Infinity"), and it returns float. For more information, see Python client usage: Serdes options and custom data types.
int: A 32-bit signed integer returned and accepted as a number, including optional - and exponent. Only numbers that can be converted to a 4-byte integer without loss of information are valid. Exponents must express a whole number.

The Python client accepts and returns int.
smallint: A 2-byte integer returned and accepted as a number, including optional - and exponent. Only numbers that can be converted to a 2-byte integer without loss of information are valid. Exponents must express a whole number.

The Python client accepts and returns int.
tinyint: A 1-byte integer returned and accepted as a number, including optional - and exponent. Only numbers that can be converted to a 1-byte integer without loss of information are valid. Exponents must express a whole number.

The Python client accepts and returns int.
varint: An arbitrary-precision integer of Java type returned and accepted as a number, including optional - and exponent.

The Python client accepts and returns int.

In Data API requests, the limit on maximum number length applies to the literal number of characters sent in the command’s JSON payload. A value’s literal representation in JSON can’t exceed 100 characters, such as a number that is 100 digits long. However, you can use exponents to pass values that represent more than 100 digits, such as 1e120000. In Data API responses, the Data API may return the fully-expressed number, beyond 100 characters.

Date and time types

The Data API supports date, duration, time, and timestamp types for representing dates and times.

For more information about reading and writing these types in the Python client, see Python client usage: Serdes options and custom data types.

Date

A date is a date with no corresponding time value represented in RFC 3339 full-date format (YYYY-MM-DD) where the year is at least 4 digits. If the year is positive and greater than 4 digits, it must be prefixed by +, such as +YYYYYY-MM-DD.

Duration

A duration is a value representing a duration. The preferred format is ISO-8601 interval format, such as P3Y6M4DT12H30M5S.

ISO-8601 duration format rules

Prefixed by P (positive duration) or -P (negative duration).
Contains a non-empty sequence of one or more sets of quantities and units in strict decreasing order.
Quantities are non-negative integers, with the exception of S (seconds), which can be a non-negative decimal.
Units must use the following values in the order given: Y, M, D, H, M, S. Note that the first M is months and the second M is minutes.
If the duration includes sub-day units (hours, minutes, seconds), you must include T before the first sub-day quantity, such as P10DT2H30M or PT2H30M.
Can skip units, but the units must still follow the given order, such as P1Y22DT45M30S.
Cannot repeat units.
Cannot be empty. For zero duration, you must specify at least one unit with a 0 quantity, such as P0Y.

For lossless inserts down to nanosecond precision, use the Apache Cassandra® notation for durations, such as 12y3mo1d or 12y3mo1d12h30m5s.

Cassandra duration notation rules

A non-empty sequence of one or more sets of quantities and units in strict decreasing order.
Optionally prefixed by - (negative duration).
Quantities are non-negative integers of at least one digit.
Units must use the following case-insensitive values in the order given: y, mo, w, d, h, m, s, ms, us (or µs), ns.
Can skip units, but the units must still follow the given order, such as 1y22d45m30s.
Cannot repeat units.
Cannot be empty. For zero duration, you must specify at least one unit with a 0 quantity, such as 0s.

Time

A time is a time of day without an associated date represented in RFC 3339 partial-time format including HH:MM:SS and optional nanoseconds (.nnnnnnnnn).

In responses, sub-seconds are expanded in groups of three digits as needed. For example, a time written as "12:34:56.7887" is subsequently returned as "12:34:56.788700".

Timestamp

A timestamp is a date and time with millisecond precision as a string in complete RFC 3339 date-time format.

RFC 3339 date-time format rules

Include the following components in order given: Year, month, day, T, hour, minute, second, sub-seconds, offset. For example, 1984-01-10T12:01Z.

Only the sub-seconds are optional. All other components are required. If a required component has no value, use zero-padding to achieve the required format.
The year must be at least 4 digits, optionally prefixed by -. If the year is positive and more than 4 digits, it must be prefixed by +.
Month, day, hour, and minute must be 2 digits. For values 0-9, you must include a leading 0, such as 01 for January.
T must separate the date from the time, as in 1984-01-10T12:01Z
The offset can be Z (shorthand for +00:00) or an ISO-formatted offset consisting of + or - followed by HH:MM, such as +05:00 or -08:00. The offset hour and minute must be 2 digits.
Sub-seconds are optional and can include up to 3 digits (millisecond precision), defined as .nnn, such as 1984-01-10T12:01:23.456Z or 1984-01-10T12:01:23.4Z. Sub-seconds are always returned as 3 digits. For example, 1984-01-10T12:01:23.4Z is returned as 1984-01-10T12:01:23.400Z.

timestamp format examples

# Minimum expression
YYYY-MM-DDTHH:MM:SSZ
YYYY-MM-DDTHH:MM:SS+HH:MM
YYYY-MM-DDTHH:MM:SS-HH:MM

# Optional sub-seconds
YYYY-MM-DDTHH:MM:SS.nnnZ
YYYY-MM-DDTHH:MM:SS.nnn+HH:MM
YYYY-MM-DDTHH:MM:SS.nnn-HH:MM

# Optional negative sign before the year
-YYYY-MM-DDTHH:MM:SS.nnnZ
-YYYY-MM-DDTHH:MM:SS.nnn+HH:MM
-YYYY-MM-DDTHH:MM:SS.nnn-HH:MM

# Positive sign before the year if the year is positive and greater than 4 digits
+YYYYYY-MM-DDTHH:MM:SS.nnnZ
+YYYYYY-MM-DDTHH:MM:SS.nnn+HH:MM
+YYYYYY-MM-DDTHH:MM:SS.nnn-HH:MM

Limited timestamp access

The Data API supports the timestamp column type, but it doesn’t provide access to column timestamps. Additionally, the Data API doesn’t set timestamps for any mutation. If you require this timestamp information, DataStax recommends that use you CQL instead of the Data API.

UUID types

Unlike collections, which assign a default _id value to each document, the Data API does not assign default identifiers or default values to rows in tables. Each table’s primary key is the unique identifier for that table’s rows.

To store explicitly-typed UUID data in a table, the Data API fully supports the uuid type and partially supports the timeuuid type:

uuid: Fully supported by the Data API, and it accepts all UUID versions.

uuid data is generally returned and accepted as quoted strings in UUID format, including dashes, such as "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx". The Python client accepts str and UUID (all of astrapy.ids.uuid*), and it returns astrapy.ids.UUID (alias for stdlib uuid.UUID).

objectId is not supported in tables. You can store such hexidecimal strings in tables, but they are not recognized or handled as a true objectId.

timeuuid: Partially supported by the Data API. It is a specific type of UUID that includes timestamp information, and timeuuid data is returned and accepted as a JSON string.

With the Data API, you can read, write, update, and delete values in existing timeuuid columns, but you can’t use the Data API to create tables with timeuuid columns or add columns of type timeuuid to tables.

If you need a timeuuid column in a table, then you must create or alter the table with CQL before using the Data API to manage the values in the timeuuid column. Alternatively, consider using the uuid type.

Vector type

The Data API supports the special vector type (vector<float,dimensions>) for storing vectors, such as embeddings, which you can then index and use for vector searches.

In vector columns, you can store pre-generated vectors. You can also automatically generate vector embeddings from text strings.

When you create or alter a table, you can define multiple vector columns to store vectors with different dimensions or from different embedding models. However, you can run a vector search on only one column at a time.

Define a column to store pre-generated vectors

To store pre-generated vectors in a table, define a vector column with the dimension of the vectors. If you’re not sure what dimension to set, use the vector dimension your embeddings model produces.

The following examples are truncated to focus on the vector column.

Python
TypeScript
Java
curl

Define the vector column in your table definition using a fluent interface, CreateTableDefinition, or a plain dictionary.

With fluent interface
With CreateTableDefinition
With plain dictionary

table_definition = (
    CreateTableDefinition.builder()
    ...
    .add_vector_column("VECTOR_COLUMN_NAME", dimension=NUM_DIMENSIONS)
    ...
    .build()
)
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

table_definition = CreateTableDefinition(
    columns={
        ...
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=NUM_DIMENSIONS
        ),
        ...
    },
    ...
)
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

table_definition = {
    "columns": {
        ...
        "VECTOR_COLUMN_NAME": {"type": "vector", "dimension": NUM_DIMENSIONS},
        ...
    },
    ...
}
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

const table = await db.createTable('TABLE_NAME', {
  definition: {
    columns: {
      VECTOR_COLUMN_NAME: {
        type: 'vector',
        dimension: NUM_DIMENSIONS,
      },
      ...
    },
    ...
  },
});

import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;

int NUM_DIMENSIONS = 3;

TableDefinition tableDefinition = new TableDefinition()
 //...
 .addColumnVector("VECTOR_COLUMN_NAME", new ColumnDefinitionVector()
   .dimension(NUM_DIMENSIONS)
   .metric(COSINE))
 //...
;
Table myTable = database.createTable("TABLE_NAME", tableDefinition);

"columns": {
  "VECTOR_COLUMN_NAME": {
    "type": "vector",
    "dimension": NUM_DIMENSIONS
  }
}

Define a column to automatically generate vector embeddings

To automatically generate embeddings with vectorize, define a vector column with embedding provider service options. Astra DB stores the automatically-generated embeddings in this column.

Python
TypeScript
Java
curl

With fluent interface
With CreateTableDefinition
With plain dictionary

table_definition = (
    CreateTableDefinition.builder()
    ...
    .add_vector_column(
        "VECTOR_COLUMN_NAME",
        dimension=NUM_DIMENSIONS,
        service=VectorServiceOptions(
            provider="EMBEDDINGS_PROVIDER_NAME",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "ASTRA_KMS_API_KEY_NAME",
            },
        )
    )
    ...
    .build()
)
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

# CreateTableDefinition object:
table_definition = CreateTableDefinition(
    columns={
        ...
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=NUM_DIMENSIONS,
            service=VectorServiceOptions(
                provider="EMBEDDINGS_PROVIDER_NAME",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "ASTRA_KMS_API_KEY_NAME",
                },
            )
        ),
        ...
    },
    ...
)
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

table_definition = {
    "columns": {
        ...
        "VECTOR_COLUMN_NAME": {
            "type": "vector",
            "dimension": NUM_DIMENSIONS,
            "service": {
                "provider": "EMBEDDINGS_PROVIDER_NAME",
                "modelName": "MODEL_NAME",
                "authentication": {
                    "providerKey": "ASTRA_KMS_API_KEY_NAME",
                },
            },
        },
        ...
    },
    ...
}
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

const table = await db.createTable('TABLE_NAME', {
  definition: {
    columns: {
      VECTOR_COLUMN_NAME: {
        type: 'vector',
        dimension: NUM_DIMENSIONS,
        service: {
          provider: 'EMBEDDINGS_PROVIDER_NAME',
          modelName: 'MODEL_NAME',
          authentication: {
            providerKey: 'ASTRA_KMS_API_KEY_NAME',
          },
        },
      },
      ...
    },
    ...
  },
});

VectorServiceOptions vectorServiceOptions = new VectorServiceOptions()
  .provider("EMBEDDINGS_PROVIDER_NAME")
  .modelName("MODEL_NAME")
  .authentication(Map.of("providerKey", "ASTRA_KMS_API_KEY_NAME"));

ColumnDefinitionVector columnDefinition = new ColumnDefinitionVector()
  .dimension(3)
  .metric(COSINE)
  .service(vectorServiceOptions);

TableDefinition tableDefinition = new TableDefinition()
  //...
  .addColumnVector("VECTOR_COLUMN_NAME", columnDefinition);

Table myTable = database.createTable("TABLE_NAME", tableDefinition);

"columns": {
  "COLUMN_NAME": "DATA_TYPE",
  "COLUMN_NAME": "DATA_TYPE",
  "VECTOR_COLUMN_NAME": {
    "type": "vector",
    "dimension": NUM_DIMENSIONS,
    "service": {
      "provider": "EMBEDDINGS_PROVIDER_NAME",
      "modelName": "MODEL_NAME",
      "authentication": {
        "providerKey": "ASTRA_KMS_API_KEY_NAME"
      }
    }
  }
}

Vectorize parameters

At minimum, you must specify the embedding provider and model name. However, dimension is optional if the specified model name has a default dimension value, and some embedding providers require additional parameters. To find required parameters and accepted values, use findEmbeddingProviders or see the documentation for your embedding provider integration.

If a table has multiple vector columns with vectorize integrations, all columns must use the same embedding provider integration, including the same dimension, model name, credential, and other settings.

Vectorize authentication

Most embedding provider integrations require authentication through Astra DB KMS or header authentication:

Astra DB KMS (providerKey): Use credentials stored in Astra DB KMS to authenticate with your vectorize embedding provider. In your vector column definition, include an authentication object with providerKey set to the credential’s API Key name as given in Astra DB KMS. For more information, see Embedding provider authentication.
Header authentication: As an alternative to providerKey, you can omit the authentication object, and then provide the authentication key in an x-embedding-api-key header instead. Header authentication overrides Astra DB KMS if you set both.

Over HTTP, specify this header literally, such as --header "x-embedding-api-key: $EMBEDDING_PROVIDER_API_KEY", and include it with every command that uses vectorize, including writes and vector search.

With the clients, you include the respective header authentication parameter (embedding-api-key, embeddingApiKey, or embeddingAuthProvider) in the top-level create table or get table command parameters (not in the vector column definition). This attaches the embedding provider API key to the resulting Table instance, and then the client automatically attaches the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication.
No authentication: Astra-hosted embedding provider integrations don’t require authentication. However, your database must be in a supported region. For more information, see the documentation for the Astra-hosted embedding provider integrations.

You can use alterTable to add and drop entire vector columns or enable and disable embedding provider integrations on tables.

If you drop the embedding provider integration, the column retains the existing embeddings. You cannot automatically generate new embeddings unless you re-enable the embedding provider integration.

If you add an embedding provider integration to a vector column with existing embeddings, make sure the service options are compatible with the existing embeddings. This ensures accurate vector search results.

Write vector data

vector columns are required to perform vector searches or create vector indexes.

Python
TypeScript
Java
curl

You can insert pre-generated vector data into a vector column by supplying either a list of float values or a DataAPIVector object. The end result on the database will be identical. However, the DataAPIVector values are binary-encoded before sending, which results in improved insertion performance, especially when writing multiple rows.

If the column has a vectorize integration, you can automatically generate an embedding from a string by having the string in the row directly.

# Insert a list of floats sending it as array:
my_table.insert_one({"vector_column": [0.1, -0.2, 0.3], ...})

# Insert a list of floats sending it as binary-encoded:
my_table.insert_one({"vector_column": DataAPIVector([0.1, -0.2, 0.3]), ...})

# Generate the embedding with vectorize
my_table.insert_one({"vector_column": "Text to vectorize", ...})

For additional control over the behavior of vector encoding, you can set the following attributes of the table APIOptions.serdes_options:

Setting unroll_iterables_to_lists to True means that any iterable over numbers is unrolled into a list prior to insertion. That means you can directly use numpy objects, generators, and similar in all cases where a vector is expected. This setting defaults to False as it incurs a slight performance cost.
Setting binary_encode_vectors to False disables binary-encoding of vectors altogether: all vectors are sent to the API as plain arrays of floats. This defaults to True for an improved insertion performance.

For more information, see Python client usage: Serdes options and custom data types.

You can insert pre-generated vector data into a vector column by supplying either a list of numbers or a DataAPIVector object. The end result on the database will be identical. However, the DataAPIVector values are binary-encoded before sending, which results in improved insertion performance, especially when writing multiple rows.

Example using DataAPIVector (recommended):

// Using the explicit class constructor
await table.insertOne({ vectorColumn: new DataAPIVector([0.1, -0.2, 0.3]), ... });

// Using the vector function shorthand
await table.insertOne({ vectorColumn: vector([0.1, -0.2, 0.3]), ... });

You can create a vector from any DataAPIVectorLike object.

export type DataAPIVectorLike = DataAPIVector | number[] | { $binary: string } | Float32Array;

// Creating a DataAPIVector from a Float32Array
const vec = vector(Float32Array.of(.1, -.2, .3));

You can insert a number[] directly, but insertion performance will be decreased. You also will need to provide your own typing for the Table, since InferTableSchema assumes that you are using DataAPIVector objects.

When reading back the vector data, it will be returned as a DataAPIVector object, unless you provide your own deserialization logic to convert it back to a number[].

// Define the table schema manually
interface TableSchema {
  vectorColumn?: number[] | null,
  ...
}
const table = await db.createTable<TableSchema>('TABLE_NAME', { ... });

// Insert as a list of floats
await table.insertOne({ vectorColumn: [0.1, -0.2, 0.3], ... });

If the column has a vectorize integration, you can automatically generate an embedding from a string by having the string in the row directly.

The ability to be able to pass a string will be picked up by InferTableSchema if you provide a service block in the vector column definition.

// Insert using the vectorize feature
await table.insertOne({ vectorColumn: 'Text to vectorize', ... });

// You can still provide an explicit vector if you want
await table.insertOne({ vectorColumn: vector([0.1, -0.2, 0.3]), ... });

You can insert pre-generated vector data into a vector column by supplying either a list of floats or a DataAPIVector object. The end result on the database is the same. However, with the DataAPIVector object, the values are sent in binary-encoded format, which is more performant, especially when writing multiple rows.

Use a list of floats:

tableGames.insertOne(new Row()
  // primary key of the table
 .addText("match_id", "game2").addInt("round", 1)
 // add a vector
 .addVector("m_vector", new float[] {0.1f, 0.2f, 0.3f}));

(Recommended) Use DataAPIVector:

DataAPIVector vector
  = new DataAPIVector(new float[] {0.1f, 0.2f, 0.3f});

tableGames.insertOne(new Row()
 // primary key of the table
 .addText("match_id", "game1").addInt("round", 1)
 // add a vector with the DataAPIVector wrapper
 .addVector("m_vector", vector));

Underlying JSON representation for DataAPIVector

Because the DataAPIVector class uses binary-formatted serialization, the underlying HTTP request sends $binary in the JSON body. For example:

{
  "insertOne": {
    "document": {
      "match_id": "game1",
      "round": 1,
      "m_vector": {
        "$binary": "PczMzT5MzM0+mZma"
      }
    }
  }
}

For more information about DataAPIVector, see Java client usage.

If the column has a vectorize integration, you can automatically generate an embedding from a String by adding the string to the row directly:

// Assuming m_vector is a vector column with vectorize integration
tableGames.insertOne(new Row()
  // primary key of the table
  .addText("match_id", "game2").addInt("round", 1)
  // Add a vector as a string to generate vector embeddings from
  .addVectorize("m_vector", "Query text to vectorize"));

// Even if the column has a vectorize integration
// it is still valid to add embeddings directly
tableGames.insertOne(new Row()
  // We need a primary Key
  .addText("match_id", "game3").addInt("round", 1)
 // add a vector
  .addVector("m_vector", new float[] {0.1f, 0.2f, 0.3f}));

You can insert pre-generated vector data into a vector column as an array or as a binary encoded value. Alternatively, if the column has a vectorize integration, you can automatically generate an embedding from a string.

# Insert vector data as an array
"vector_column": [ 0.1, -0.2, 0.3 ]

# Insert vector data with $binary
"vector_column": { "$binary": "PczMzb5MzM0+mZma" }

# Generate an embedding with vectorize
"vector_column": "Text to vectorize"

The $binary form is for writes only. Binary encoding can be more economical, in terms of space and resources, when sending vectors in JSON payloads. Once received, the Data API decodes the binary format, and then stores the resulting list of numbers in the table. When you read data from a vector column, the Data API returns the vector as a list of numbers, regardless of the initial write format.

Vector binary encodings specification

A d-dimensional vector is a list of d floating-point numbers that can be binary encoded.

To prepare for encoding, the list must be transformed into a sequence of bytes where each float is represented as four bytes in big-endian format. Then, the byte sequence is Base64-encoded, with = padding, if needed. For example, here are some vectors and their resulting Base64 encoded strings:

[0.1, -0.2, 0.3] = "PczMzb5MzM0+mZma"
[0.1, 0.2] = "PczMzT5MzM0="
[10, 10.5, 100, -91.19] = "QSAAAEEoAABCyAAAwrZhSA=="

Once encoded, you use $binary to pass the Base64 string to the Data API:

{ "$binary": "BASE64_STRING" }

You can use a script to encode your vectors, for example:

python
import base64
import struct

input_vector = [0.1, -0.2, 0.3]
d = len(input_vector)
pack_format = ">" + "f" * d
binary_encode = base64.b64encode(struct.pack(pack_format, *input_vector)).decode()

With vectorize, the table stores the resulting embedding only. The original text string is discarded, unless you manually write the value in another column in the same insert or update command.

If you enable vectorize on a vector column, you are not prevented from using pre-generated embeddings. When you write data to that column, you can use pre-generated embeddings or automatically generate embeddings. Likewise, when you run a vector search on that column, you can either provide a pre-generated query vector or automatically generate a query vector from a string.

Query and read vector data

A vector search retrieves rows that are most similar to a given vector. To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can only sort on one vector column at a time.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string.

Python
TypeScript
Java
curl

# Run a vector search with the query vector encoded as array
my_table.find(sort={"vector_column": [0.1, -0.2, 0.3]}, ...)

# Run a vector search with the query vector binary-encoded
my_table.find(sort={"vector_column": DataAPIVector([0.1, -0.2, 0.3])}, ...)

# Run a vector search by generating the query embedding with vectorize:
my_table.find(sort={"vector_column": "Query text to vectorize"}, ...)

By default, the results are returned with DataAPIVector objects in their vector columns. If you set the table APIOptions.serdes_options.custom_datatypes_in_reading to False, all vectors are returned as plain lists of floats, regardless of the format in the underlying HTTP response. This setting also controls how the query vector itself is returned when you invoke find with include_sort_vector=True. For more information, see Python client usage: Serdes options and custom data types.

You can provide sort vectors in the same way that you write vectors:

// Run a vector search using a DataAPIVector
await table.find({}, { sort: { vectorColumn: vector([0.1, -0.2, 0.3]) } });

// Run a vector search using a plain list of floats
// You must provide your own typing or use an untyped table
await table.find({}, { sort: { vectorColumn: [0.1, -0.2, 0.3] } });

// Run a vector search using vectorize
await table.find({}, { sort: { vectorColumn: 'Query text to vectorize' } });

Regardless of how you provide vectors, the vector results are always returned as DataAPIVector objects. The can be converted to arrays lazily using vector.asArray().

If you want to avoid using DataAPIVector objects, you must provide your own custom serialization/deserialization logic for the Table instance. See Custom Ser/Des for more information.

This also applies to find operations that include the includeSortVector option:

// Run a vector search using vectorize
const cursor = await table.find({})
  .sort({ vectorColumn: 'Query text to vectorize' })
  .includeSortVector();

// The query vector is returned as a DataAPIVector object
const queryVector = await cursor.getSortVector();

To perform a vector search, use the Sort.vector(..,..) and Sort.vectorize(..,..) methods.

These methods take two arguments: a column name and a query vector.

If you omit the column name, the client uses the default of $vector or $vectorize, depending on the method. These are reserved fields for collections. With tables, it is important that you specify the name of your vector column in the first argument. This is because a table can contain multiple vector columns, and vector columns can have any valid name.

You can provide sort vectors to these methods in the same way that you write vectors:

// This example uses TableFindOptions, but you can use any sortable object

// Sort with DataAPIVector
DataAPIVector vector = new DataAPIVector(new float[] {0.1f, 0.2f, 0.3f});
new TableFindOptions()
  .sort(Sort.vector("m_vector", vector));

// Sort with float[]
new TableFindOptions()
  .sort(Sort.vector("m_vector", new float[] {0.1f, 0.2f, 0.3f}));

// Sort with vectorize
new TableFindOptions()
  .sort(Sort.vectorize("m_vector", "Text To Serialize"));

# Provide the query vector as an array
"sort": { "vector_column": [ 0.1, -0.2, 0.3 ] }

# Provide the query vector with $binary
"sort": { "vector_column": { "$binary": "PczMzb5MzM0+mZma" } }

# Generate a query vector with vectorize
"sort": { "vector_column": "Text to vectorize" }

The $binary form can be more economical, in terms of space and resources, when sending vectors in JSON payloads. Once received, the Data API decodes the binary format, and then uses the resulting list of numbers for the vector search.

The includeSortVector option returns the query vector as a list of numbers, regardless of the format supplied in the original command.

For more information and examples, see Find rows.

BLOB type

The Data API supports blob type.

Due to a known issue with filtering on blob, DataStax does not recommend using blob in primary keys.

Python
TypeScript
Java
curl

The Python client automatically handles binary data for tables.

Any bytes values sent to the Data API, such as rows to insert or search filters values, are automatically converted to the format expected by the Data API. Likewise, binary data received from the Data API is returned as a bytes value.

# Assume the table has primary key "primary_column" and a "bytes_column" BLOB column
bytes_val = b"=\xfb\xe7m>\xe9x\xd5?I\xfb\xe7"

my_table.insert_one({"primary_column": "row0", "bytes_column": bytes_val})
# Return TableInsertOneResult(inserted_id={'primary_column': 'row0'}, ...  (shortened)

my_table.find_one({"primary_column": "row0"})["bytes_column"]
# Returns b'=\xfb\xe7m>\xe9x\xd5?I\xfb\xe7'

In the previous example, bytes_val, being bytes_val.hex() == "3dfbe76d3ee978d53f49fbe7", translates to the JSON representation {"$binary": "PfvnbT7peNU/Sfvn"}. For more information about the JSON representation, see the curl tab for BLOB.

The TypeScript client automatically handles binary data in a runtime-independent way, through the DataAPIBlob class:

// Using the explicit class constructor
await table.insertOne({ blobColumn: new DataAPIBlob([0x0, 0x1, 0x2]), ... });

// Using the `blob` function shorthand
await table.insertOne({ blobColumn: blob(Buffer.from[0x0, 0x1, 0x2]), ... });

You can create a vector from any DataAPIBlobLike object:

export type DataAPIBlobLike = DataAPIBlob | ArrayBuffer | Buffer | { $binary: string };

// Creating a DataAPIBlob from a base64 encoded string
const binary = blob({ $binary: 'ZXZlciBkcmVhbQ==' });

The same class is also returned when reading back the BLOB data:

const row = await table.findOne({ pkey: 'row0' });
console.log(row.blobColumn.asArrayBuffer());

The Java client automatically handles binary data for tables.

Arrays of bytes (byte[]) sent to the Data API, such as rows to insert or search filters, are automatically converted to the format expected by the Data API. Likewise, binary data received from the Data API is returned as a byte[].

byte[] bytesVal = {
 (byte) 0x3D, (byte) 0xFB, (byte) 0xE7, (byte) 0x6D,
 (byte) 0x3E, (byte) 0xE9, (byte) 0x78, (byte) 0xD5,
 (byte) 0x3F, (byte) 0x49, (byte) 0xFB, (byte) 0xE7
};

// Assume 'myTable' has TEXT-type primary key "primary_column" and "bytes_column" BLOB column
myTable.insertOne(new Row()
 .addText("primary_column", "row0")
 .addBlob("bytes_column", bytesVal));

myTable
 .findOne(Filters.eq("primary_column", "row0"))
 .ifPresent(row -> {
   byte[] bytesColumn = row.getBlob("bytes_column");
   // prints: [61, -5, -25, 109, 62, -23, 120, -43, 63, 73, -5, -25]
});

In the previous example, the array bytes_column translates to the JSON representation {"$binary": "PfvnbT7peNU/Sfvn"}. For more information about the JSON representation, see the curl tab for BLOB.

To write data to a blob column, use $binary with a Base64-encoded string representing the byte sequence you want to store:

"blob_column": { "$binary": "PfvnbT7peNU/Sfvn" }

Over HTTP, you can use $binary to write binary-encoded vectors to vector columns. However, writing binary-encoded vectors to blob columns does not treat that data as true vectors, such as for vector search. For more information about storing vector data and running vector search on tables, see Vector type.

For information and examples for encoding arrays and writing vectors with $binary, see the curl tab for Write vector data.

Other scalar types

ascii: A string of US-ASCII characters.

The Python client accepts and returns str.
boolean

The Python client accepts and returns bool.
inet: Accepts literal IPV4 and IPV6 values. Domain names are rejected for security reasons.

The Python client accepts and returns ipaddress.IPV4Addres and ipaddress.IPV6Address.
text: A UTF-8 encoded string.

The Python client accepts and returns str.

varchar: A UTF-8 encoded string. This is an alias for text. However, varchar is partially supported by the Data API, whereas text is fully supported.

With the Data API, you can read, write, update, and delete values in existing varchar columns, but you can’t use the Data API to create tables with varchar columns or add columns of type varchar to tables.

If you need a varchar column in a table, then you must create or alter the table with CQL before using the Data API to manage the values in the varchar column. Alternatively, consider using the text type.

Map, list, and set types

The Data API supports map, list, and set types for table columns with the following limitations:

No nested maps, lists, or sets, such as map<string,map<string,string>>.
Maps, lists, and sets are not supported in primary keys, indexes, or filter clauses.
Maps can have only text or ascii keys.
Frozen maps, lists, and sets created in CQL (outside the Data API) aren’t accessible through the Data API. For more information, see Support for existing tables.

Map
List
Set

map<keyType,valueType> is a typed set of key-value pairs represented as a JSON object where the keys are unique. When written to a table, maps are internally ordered by their keys, and they are returned in that order.

To define a map column in a table, you specify the map type, as well as the keyType and valueType of the items in the map. The keyType can be only text or ascii. Other key types are not supported.

Python
TypeScript
Java
curl

Example using the fluent interface:

table_definition = (
    CreateTableDefinition.builder()
    ...
    .add_map_column(
        "map_column",
        ColumnType.TEXT,  # or "text"
        ColumnType.INT,  # or "int"
    )
    ...
    .build()
)
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

Example using CreateTableDefinition:

table_definition = CreateTableDefinition(
    columns={
        ...
        "map_column": TableKeyValuedColumnTypeDescriptor(
            column_type="map",  # or TableKeyValuedColumnType.MAP
            key_type=ColumnType.TEXT,  # or "text"
            value_type=ColumnType.INT,  # or "int"
        ),
        ...
    },
    ...
)
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

Example using a plain dictionary:

table_definition = {
    "columns": {
        ...
        "map_column": {
            "type": "map", "keyType": "text", "valueType": "int"
        },
        ...
    },
    ...
}
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

const table = await db.createTable('TABLE_NAME', {
  definition: {
    columns: {
      MAP_COLUMN_NAME: {
        type: 'map',
        keyType: 'text',
        valueType: 'int',
      },
      ...
    },
    ...
  },
});

TableDefinition def = new TableDefinition()
  .addColumnMap("map_column", ColumnTypes.TEXT, ColumnTypes.INT);
database
  .createTable("TABLE_NAME", def);

"map_column": {
  "type": "map",
  "keyType": "text",
  "valueType": "int"
}

When you write to a map column, you provide key-value pairs representing items in the map. Keys must match the defined keyType format, and values must use the defined valueType format.

Python
TypeScript
Java
curl

The Python client accepts both simple dictionaries and instances of DataAPIMap for writes.

# Insert a map as a plain dictionary
my_table.insert_one({
    "map_column": {"key1": 42, "key2": 298},
    ...
})

# Insert a map using a DataAPIMap, initialized with a dictionary
my_table.insert_one({
    "map_column": DataAPIMap({"key1": 42, "key2": 298}),
    ...
})

# Insert a map using a DataAPIMap, initialized with a list of pairs
my_table.insert_one({
    "map_column": DataAPIMap([("key1", 42), ("key2", 298)]),
    ...
})

When reading rows, the Python client returns maps as DataAPIMap objects. To return standard Python dictionaries, set the table APIOptions.serdes_options.custom_datatypes_in_reading to False. The default for this setting is True.

Map data is represented by native JavaScript Maps.

await table.insertOne({ mapColumn: new Map([['key1', 42], ['key2', 298]]), ... });

While you can use plain JavaScript objects when inserting map data, it will still be read back as a native Map, unless custom serialization/deserialization logic is used.

Map<String, Integer> myMap = new HashMap<>();
myMap.put("key1", 42);
myMap.put("key2", 298);
table.insertOne(new Row().addMap("map_column", myMap));

"map_column": {
  "key1": 42,
  "key2": 298
}

list<valueType> is a typed collection of ordered, non-unique values represented as a JSON array. When written to a table, lists elements are ordered by their position in the list.

To define a list column in a table, you specify the list type and the valueType of the items in the list.

Python
TypeScript
Java
curl

Example using the fluent interface:

table_definition = (
    CreateTableDefinition.builder()
    ...
    .add_list_column(
        "list_column",
        ColumnType.TEXT,  # or "text"
    )
    ...
    .build()
)
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

Example using CreateTableDefinition:

table_definition = CreateTableDefinition(
    columns={
        ...
        "list_column": TableValuedColumnTypeDescriptor(
            column_type="list",  # or TableValuedColumnType.LIST
            value_type=ColumnType.TEXT,  # or "text"
        ),
        ...
    },
    ...
)
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

Example using a plain dictionary:

table_definition = {
    "columns": {
        ...
        "list_column": {
            "type": "list", "valueType": "text"
        },
        ...
    },
    ...
}
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

const table = await db.createTable('TABLE_NAME', {
  definition: {
    columns: {
      LIST_COLUMN_NAME: {
        type: 'list',
        valueType: 'text',
      },
      ...
    },
    ...
  },
});

TableDefinition def = new TableDefinition()
  .addColumnList("list_column", ColumnTypes.TEXT);
Table<Row> table = db.createTable("TABLE_NAME", def);

"list_column": {
  "type": "list",
  "valueType": "text"
}

When you write to a list column, you provide an array of values in the defined valueType format.

Python
TypeScript
Java
curl

my_table.insert_one({
    "list_column": ["Hello", "World", "!"],
    ...
})

List data is represented by native JavaScript arrays.

await table.insertOne({ listColumn: ['Hello', 'World', '!'], ... });

table.insertOne(new Row()
  .addList("list_column", List.of("Hello", "World", "!"))
);

"list_column": [ "Hello", "World", "!" ]

set<valueType> is a typed collection of unique values represented as a JSON array. When written to a table, sets are ordered by their values.

To define a set column in a table, you specify the set type and the valueType of the items in the set.

Python
TypeScript
Java
curl

Example using the fluent interface:

table_definition = (
    CreateTableDefinition.builder()
    ...
    .add_set_column(
        "set_column",
        ColumnType.INT,  # or "int"
    )
    ...
    .build()
)
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

Example using CreateTableDefinition:

table_definition = CreateTableDefinition(
    columns={
        ...
        "set_column": TableValuedColumnTypeDescriptor(
            column_type="set",  # or TableValuedColumnType.SET
            value_type=ColumnType.INT,  # or "int"
        ),
        ...
    },
    ...
)
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

Example using a plain dictionary:

table_definition = {
    "columns": {
        ...
        "set_column": {
            "type": "set", "valueType": "int"
        },
        ...
    },
    ...
}
my_table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
    ...
)

const table = await db.createTable('TABLE_NAME', {
  definition: {
    columns: {
      SET_COLUMN_NAME: {
        type: 'set',
        valueType: 'int',
      },
      ...
    },
    ...
  },
});
type TableSchema = InferTableSchema<typeof table>;

TableDefinition def = new TableDefinition()
  .addColumnSet("set_column", ColumnTypes.INT);
Table<Row> table = db.createTable("TABLE_NAME", def);

"set_column": {
  "type": "set",
  "valueType": "int"
}

When you write to a set column, you provide an set of values in the defined valueType format.

Python
TypeScript
Java
curl

The Python client accepts both Python sets and instances of DataAPISet.

# Insert a set as a plain Python set
my_table.insert_one({
    "set_column": {9, 8, 7},
    ...
})

# Insert a set using a DataAPISet, initialized with a plain set
my_table.insert_one({
    "set_column": DataAPISet({9, 8, 7}),
    ...
})

# Insert a set using a DataAPISet, initialized with a list of elements
my_table.insert_one({
    "set_column": DataAPISet([9, 8, 7]),
    ...
})

When reading rows, the Python client returns sets as DataAPISet objects, unless the table APIOptions.serdes_options.custom_datatypes_in_reading is False, in which case standard Python sets are used. The default for this setting is True.

Set data is represented by native JavaScript Sets.

await table.insertOne({ setColumn: new Set([9, 8, 8, 7]), ... });

While you can use plain JavaScript arrays when you insert set data, the data is read back as a native Set, unless you use custom serialization/deserialization logic.

table.insertOne(new Row()
  .addSet("set_column", Set.of(9,8,7))
);

"set_column": [ 9, 8, 7 ]

Unsupported types

Unsupported types, formats, and modifiers include:

User-defined types (UDTs)
Counters
Time To Live (TTL)
Column timestamps
Frozen
Static
Tuple

The following types are partially supported or supported with limitations:

timeuuid
varchar
map, list, and set

When working with existing CQL tables that you created outside the Data API, some of these types are partially supported. For more information, see Migrate from CQL.

Mismatched and unsupported type errors

The Data API returns warnings and errors if you use an unsupported type or attempt to insert a value that doesn’t match the column’s defined type.

For example, if you attempt to insert a value that doesn’t match the column’s type, the Data API returns "INVALID_COLUMN_VALUES".

Similarly, the Data API returns an error if you attempt to perform an operation on a column that isn’t compatible with the specified operation. For example, if you attempt to filter on a vector column, the Data API returns "INVALID_FILTER_EXPRESSION".

Operators

Data API provides query and update operators that you can use in filters to find and insert rows.

For examples, see the references for commands that use these operators, such as Find a row, Find rows, Find distinct values, and Update a row.

Query (filter) operators

You can use filter operators on all supported types except map, list, set, and vector.

Operator type Name Purpose

Operator type	Name	Purpose
Comparison query (Range)	`$gt`	Matches rows where the given column’s value is greater than the specified value.
	`$gte`	Matches rows where the given column’s value is greater than or equal to the specified value.
	`$lt`	Matches rows where the given column’s value is less than the specified value.
	`$lte`	Matches rows where the given column’s value is less than or equal to the specified value.
Comparison query (Exact)	`$eq`	Matches rows where a given column’s value equals the specified value. This is the default when you do not specify an operator. This is the only filter operator allowed in `updateOne` and `deleteOne`.
	`$ne`	Matches rows where a given column’s value does not equal the specified value.
	`$in`	Match one or more of an array of specified values. For example, `"filter": { "COLUMN_NAME": { "$in": [ "VALUE", "VALUE" ] } }`. If you have only one value to match, an array is not necessary, such as `{ "$in": "VALUE" }`.
	`$nin`	Matches any of the values that are NOT IN a given array.

Comparison query (Range)

$gt

Matches rows where the given column’s value is greater than the specified value.

$gte

Matches rows where the given column’s value is greater than or equal to the specified value.

$lt

Matches rows where the given column’s value is less than the specified value.

$lte

Matches rows where the given column’s value is less than or equal to the specified value.

Comparison query (Exact)

$eq

Matches rows where a given column’s value equals the specified value. This is the default when you do not specify an operator.

This is the only filter operator allowed in updateOne and deleteOne.

$ne

Matches rows where a given column’s value does not equal the specified value.

$in

Match one or more of an array of specified values. For example, "filter": { "COLUMN_NAME": { "$in": [ "VALUE", "VALUE" ] } }.

If you have only one value to match, an array is not necessary, such as { "$in": "VALUE" }.

$nin

Matches any of the values that are NOT IN a given array.

Update operators

You can use update operators on all supported types.

Name Purpose

Name	Purpose
`$set`	Sets a given column’s value to the specified value.
`$unset`	Sets a given column’s value to `null` or the equivalent empty form, such as `[]` or `{}` for `map`, `list`, and `set` types.

$set

Sets a given column’s value to the specified value.

$unset

Sets a given column’s value to null or the equivalent empty form, such as [] or {} for map, list, and set types.

$rename isn’t supported. Instead, use alterTable to add and drop columns.

Sort clauses

You can use sort clauses to organize results by similarity to a given vector or by regular ascending or descending order.

For regular sorts, sort can include one or more non-vector columns. If you specify multiple columns, the sorting applies hierarchically to the rows.

For vector search, sort can include only one vector column.

You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string.

For more information, see Vector type.

Python
TypeScript
Java
curl

The Python client uses the sort dictionary parameter for vector search or regular sorting.

When no particular order is required:

sort={}  # (default when parameter not provided)

For regular (non-vector) sorts, you must specify the columns to sort on and ascending/descending order:

from astrapy.constants import SortMode

# Ascending sort (two equivalent forms)
sort={"round": SortMode.ASCENDING}
sort={"round": +1}

# Descending sort (two equivalent forms)
sort={"round": SortMode.DESCENDING}
sort={"round": -1}

Be aware of the order when chaining sorts, as the sorting applies hierarchically:

# Multi-column sort
sort={
    "week_no": SortMode.ASCENDING,
    "day_of_week": SortMode.ASCENDING,
}

# Equivalent form for multi-column sort
sort={"week_no": +1, "day_of_week": +1}

You can use sort to perform a vector search.

For a vector search, you can provide a pre-generated query vector in the form {"vector_column": QUERY_VECTOR}, where QUERY_VECTOR is a list of floats or DataAPIVector. If the specified vector column has a vectorize embedding provider integration, you can generate a query vector from a string using the form {"vector_column": "query text"}.

# Provide a pre-generated query vector as array,
# And then sort by similarity to the given vector.
sort={"vector_column": [0.1, -0.2, 0.3]}

# Provide a binary-encoded pre-generated query vector,
# And then sort by similarity to the given vector.
# Binary encoding is preferrable for performance.
sort={"vector_column": DataAPIVector([0.1, -0.2, 0.3])}

# Generate a vector from a string,
# Run a vector search,
# And then sort by similarity to the given vector.
# Requires a valid vectorize integration.
sort={"vector_column": "Query text to vectorize"}

Sort example (full script)

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")

from astrapy.constants import SortMode
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
)

my_table = database.create_table(
    "games",
    definition=(
        CreateTableDefinition.builder()
        .add_column("match_id", ColumnType.TEXT)
        .add_column("round", ColumnType.TINYINT)
        .add_vector_column("m_vector", dimension=3)
        .add_column("score", ColumnType.INT)
        .add_column("when", ColumnType.TIMESTAMP)
        .add_column("winner", ColumnType.TEXT)
        .add_set_column("fighters", ColumnType.UUID)
        .add_partition_by(["match_id"])
        .add_partition_sort({"round": SortMode.ASCENDING})
        .build()
    ),
)

from astrapy.constants import VectorMetric
from astrapy.info import TableIndexOptions, TableVectorIndexOptions

my_table.create_index(
    "score_index",
    column="score",
)

my_table.create_index(
    "winner_index",
    column="winner",
    options=TableIndexOptions(
        ascii=False,
        normalize=True,
        case_sensitive=False,
    ),
)

my_table.create_vector_index(
    "m_vector_index",
    column="m_vector",
    options=TableVectorIndexOptions(
        metric=VectorMetric.DOT_PRODUCT,
    ),
)

from astrapy.data_types import (
    DataAPISet,
    DataAPITimestamp,
    DataAPIVector,
)
from astrapy.ids import UUID

insert_result = my_table.insert_many(
    [
        {
            "match_id": "fight4",
            "round": 1,
            "winner": "Victor",
            "score": 18,
            "when": DataAPITimestamp.from_string(
                "2024-11-28T11:30:00Z",
            ),
            "fighters": DataAPISet([
                UUID("0193539a-2770-8c09-a32a-111111111111"),
                UUID('019353e3-00b4-83f9-a127-222222222222'),
            ]),
            "m_vector": DataAPIVector([0.4, -0.6, 0.2]),
        },
        {
            "match_id": "challenge6",
            "round": 1,
            "winner": "Donna",
            "m_vector": [0.9, -0.1, -0.3],
        },
        {"match_id": "challenge6", "round": 2, "winner": "Erick"},
        {"match_id": "challenge6", "round": 3, "winner": "Fiona"},
        {"match_id": "fight5", "round": 3, "winner": "Caio Gozer"},
    ],
)

from astrapy.constants import SortMode
from astrapy.data_types import DataAPIVector

# Top vector result using "sort" (on an indexed vector column):
my_table.find_one(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
)
# {'winner': 'Donna'}

# Non-vector sorting on a 'partitionSort' column:
my_table.find_one(
    {"match_id": "fight5"},
    sort={"round": SortMode.DESCENDING},
    projection={"winner": True},
)
# {'winner': 'Caio Gozer'}

# Vector search with "sort" (on an appropriately-indexed vector column):
my_table.find(
    {},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
    limit=3,
).to_list()
# [{'winner': 'Donna'}, {'winner': 'Victor'}]

# Hybrid search with vector sort and non-vector filtering:
my_table.find(
    {"match_id": "fight4"},
    sort={"m_vector": DataAPIVector([0.2, 0.3, 0.4])},
    projection={"winner": True},
    limit=3,
).to_list()
# [{'winner': 'Victor'}]

When no particular order is required:

{ sort: {} }  // (default when parameter not provided)

When sorting by a certain value in ascending/descending order:

{ sort: { column: +1 } }  // ascending
{ sort: { column: -1 } }  // descending

Be aware of the order when chaining multiple sorts because ES2015+ guarantees string keys in order of insertion For example, when sorting first by a column1 and then by column2:

{ sort: { column1: 1, column2: 1 } }

You can use sort to perform a vector search. You can supply a pre-generated query vector or, if the vector column has a vectorize integration, you can generate a query vector from a string. For more information, see Vector type.

// Provide a pre-generated query vector as array,
// And then sort by similarity to the given vector.
{ sort: { vector_column: vector([0.4, 0.15, -0.5]) } }

// Generate a vector from a string,
// Run a vector search,
// And then sort by similarity to the given vector.
// Requires a valid vectorize integration.
{ sort: { vector_column: "Text to vectorize" } }

If your vector search includes projection and includeSimilarity: true, then you must manually include $similarity in the type of the returned rows. For example:

  await table.findOne<{ winner: string, $similarity: number }>({}, {
    sort: { mVector: vector([.2, .3, .4]) },
    projection: { winner: 1 },
    includeSimilarity: true,
  }).then(console.log);

If you don’t include projection, then $similarity is inferred to be a part of the returned rows.

Example:

import { CreateTableDefinition, DataAPIClient, InferTableSchema, vector } from '@datastax/astra-db-ts';

// Instantiate the client and connect to the database
const client = new DataAPIClient();
const db = client.db(process.env.CLIENT_DB_URL!, { token: process.env.CLIENT_DB_TOKEN! });

// Create table schema using bespoke Data API table definition syntax
const TableDefinition = <const>{
  columns: {
    matchId: 'text'
    round: 'tinyint',
    mVector: { type: 'vector', dimension: 3 },
    score: 'int',
    when: 'timestamp',
    winner: 'text',
    fighters: { type: 'set', valueType: 'uuid' },
  },
  primaryKey: {
    partitionBy: ['matchId'],
    partitionSort: { round: 1 },
  },
} satisfies CreateTableDefinition;

type TableSchema = InferTableSchema<typeof TableDefinition>;

(async function () {
  // Create the table if it doesn't already exist
  const table = await db.createTable<TableSchema>('games', { definition: TableDefinition, ifNotExists: true });

  // Create the necessary indexes
  await table.createIndex('winner_idx', 'winner', { ifNotExists: true });
  await table.createVectorIndex('m_vector_idx', 'mVector', { ifNotExists: true });

  // Inserts a few rows for demonstration purposes
  await table.insertMany([
    { matchId: 'fight4', round: 1, winner: 'Victor',      mVector: vector([.1, .2, .3]) },
    { matchId: 'fight5', round: 1, winner: 'Gray Tist',   mVector: vector([.1, .2, .3]) },
    { matchId: 'fight5', round: 2, winner: 'Cham P. Yun', mVector: vector([.9, .8, .7]) },
    { matchId: 'fight5', round: 3, winner: 'Winona',      mVector: vector([.1, .2, .3]) },
    { matchId: 'fight7', round: 1, winner: 'Wynn Uhr',    mVector: vector([.1, .2, .3]) },
    { matchId: 'fight7', round: 2, winner: 'Zeeb Est',    mVector: vector([.1, .2, .3]) },
    { matchId: 'fight7', round: 3, winner: 'Theeg Oat',   mVector: vector([.1, .2, .3]) },
  ]);

  // Random match
  const row0 = await table.findOne({});
  console.log('Match winner:', row0?.winner);

  // Simple single-field sort ('Victor')
  const row1 = await table.findOne({}, { sort: { matchId: 1 } });
  console.log('First match winner:', row1?.winner);

  // Simple multi-field sort ('Wynn Uhr')
  const row3 = await table.findOne({}, { sort: { matchId: -1, round: 1 } });
  console.log('Last match, first round winner:', row3?.winner);

  // Vector sort ('Cham P. Yun')
  const row4 = await table.findOne({}, { sort: { mVector: vector([.9, .8, .7]) } });
  console.log('Match with the closest matching vector:', row4?.winner);

  // Uncomment the following line to drop the table and any related indexes.
  // await table.drop();
})();

The Java client uses sort() for vector search or regular sorting.

The sort() operations are optional. Use them only when needed.

For regular (non-vector) sorts, you must specify the columns to sort on and ascending/descending order. Be aware of the order when chaining sorts, as the sorting applies hierarchically.

import com.datastax.astra.client.core.query.Sort;

Sort s1 = Sort.ascending("field1");
Sort s2 = Sort.descending("field2");
TableFindOptions options =
  new TableFindOptions().sort(s1, s2);

To perform a vector search, use the Sort.vector(..,..) and Sort.vectorize(..,..) methods.

These methods take two arguments: a column name and a query vector.

You can provide sort vectors to these methods in the same way that you write vectors:

// This example uses TableFindOptions, but you can use any sortable object

// Sort with DataAPIVector
DataAPIVector vector = new DataAPIVector(new float[] {0.1f, 0.2f, 0.3f});
new TableFindOptions()
  .sort(Sort.vector("m_vector", vector));

// Sort with float[]
new TableFindOptions()
  .sort(Sort.vector("m_vector", new float[] {0.1f, 0.2f, 0.3f}));

// Sort with vectorize
new TableFindOptions()
  .sort(Sort.vectorize("m_vector", "Text To Serialize"));

Example:

package com.datastax.astra.client.tables;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIClients;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.options.CollectionFindOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.query.Projection;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.commands.options.TableFindOneOptions;
import com.datastax.astra.client.tables.commands.options.TableFindOptions;
import com.datastax.astra.client.tables.definition.rows.Row;

public class WorkingWithSorts {
    public static void main(String[] args) {
        Database db = new DataAPIClient("token").getDatabase("endpoint");

        Table<Row> tableRow = db.getTable("TABLE_NAME");

        // Sort Clause for a vector
        Sort.vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f});;

        // Sort Clause for other fields
        Sort s1 = Sort.ascending("field1");
        Sort s2 = Sort.descending("field2");

        // Build the sort clause
        new TableFindOptions().sort(s1, s2);

        // Adding vector
        new TableFindOptions().sort(
                Sort.vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f}), s1, s2);

    }
}

# Provide the query vector as an array
"sort": { "vector_column": [ 0.1, -0.2, 0.3 ] }

# Provide the query vector with $binary
"sort": { "vector_column": { "$binary": "PczMzb5MzM0+mZma" } }

# Generate a query vector with vectorize
"sort": { "vector_column": "Text to vectorize" }

For HTTP commands with sort, see Find a row and Find rows.

Sort results and optimization

For best performance, filter and sort on indexed columns, partition keys, and clustering keys.

Filtering on non-indexed columns can use allow filtering, which is inefficient and resource-intensive, especially for large datasets. With the Data API clients, allow filtering operations can hit the client timeout limit before the underlying HTTP operation is complete.

An empty filter ("filter": {}) does not use allow filtering, but it can still be an inefficient and long-running operation.

Additionally, the Data API can perform in-memory sorting, depending on the columns you sort on, the table’s partitioning structure, and whether the sorted columns are indexed. In-memory sorts can have performance implications.

Some combinations of arguments impose an implicit upper bound on the number of rows returned by the Data API:

Vector search returns up to 1000 rows per search operation, unless you set a lower limit.
When using an ascending or descending sort criterion, the Data API returns up to 20 rows at once. The returned rows are the top results across the whole collection based on the filter criteria.

These provisions can also apply when running subsequent commands on client cursors, such as .distinct().

Over HTTP, pagination can occur if there are more than 20 matching rows, but, in some cases, the nextPageState is null regardless of the actual presence of additional results. This can occur with or without sort.

For ascending or descending sort clauses that do not automatically paginate, it is sometimes possible to use the limit and skip options to control the number of rows returned and the starting point of the results, as a form of manual pagination.

When you don’t specify sorting criteria (by vector or otherwise), the following can occur:

Client cursors can scroll through an arbitrary number of rows because the Data API and the client periodically exchange new chunks of rows.
If rows are added or removed after starting a find operation, pagination and client cursor behavior depends on database internals. There is no guarantee as to whether or not the response will pick up such "real-time" changes in the data.

For more information about handling pagination and cursors, see Find rows.

Projection clauses

In findOne and find, you can use the projection parameter to select specific columns to include or exclude from the response. The projection can’t mix inclusions and exclusions.

If projection is empty or unspecified, the Data API applies the default projection. For rows, the default projection includes all columns.

Over HTTP, the Data API always excludes null values in responses, regardless of projection For example, if find matches multiple rows, the response can include different columns for each row, depending on which columns are null in each row. You cannot forcibly include null values in the response.

The Python, TypeScript and Java clients reconstruct returned rows by adding the columns omitted in the API response. Therefore, when using a client, rows from findOne and find operations always have the same columns, as specified by the projection, regardless of the presence of null values.

In order to optimize the response size and improve performance, DataStax recommends always providing an explicit projection tailored to the needs of the application.

Projection syntax

A projection is expressed as a mapping of column names to boolean values.

Use true mapping to include only the specified columns. For example, the following true mapping returns the columns name and email:

{ "name": true, "email": true }

Alternatively, use a false mapping to exclude the specified columns. All other non-excluded columns are returned. For example, the following false mapping returns all columns except name and email:

{ "name": false, "email": false }

The values in a projection map can be objects, booleans, decimals, or integers, but the Data API ultimately evaluates all of these as booleans.

For example, the following projection evaluates to true (include) for all four columns:

{ "column1": true, "column2": 1, "column3": 90.0, "column4": { "keep": "yes!" } }

Whereas this projection evaluates to false (exclude) for all four columns:

{ "column1": false, "column2": 0, "column3": 0.0, "column4": {} }

Passing null-like types (such as {}, null or 0) for the whole projection mapping is equivalent to not specifying a projection, causing the Data API to use the default projection.

A projection can’t mix include and exclude projections. It can contain only true or only false values for the given columns. For example, {"column1": true, "column2": false} is an invalid projection that results in an API error.

A projection can’t include or exclude sub-column values, such as keys in map columns.

To include all columns, you can use the wildcard projection { "*": true }. This is equivalent to the default projection. The Data API doesn’t support false wildcard projections for tables.

Projection examples by language

Python
TypeScript
Java
curl

For the Python client, the projection can be any of the following:

A dictionary (dict[str, Any]) to include specific columns in the response, like {column_name: True}.
A dictionary (dict[str, Any]) to exclude specific columns from the response, like {column_name: False}.
A list or other iterable over key names that are implied to be included in the projection.

# The following two statements are equivalent:
document = table.find_one(
   {"user_id": 101},
   projection={"name": True, "city": True},
)
document = table.find_one(
   {"user_id": 101},
   projection=["name", "city"],
)

# Return the whole rows (default):
table.find({}, projection={'*': True}).to_list()

# Exclude certain columns:
table.find({}, projection={'name': False}).to_list()

For information about default projections and projection syntax, see the preceding explanation of projection clauses.

The TypeScript client takes in an untyped JS object for the projection parameter.

When specifying a projection, you must handle the return type carefully.

To avoid some false typing claims, when it cannot be inferred that a projection is not passed, Partial<Row> is used as the return type for find & findOne operations. This is only Partial, not DeepPartial, so it only makes the top-level keys optional.

DataStax recommends that you provide your own explicit type override on find/findOne operations when using projections.

interface TableSchema {
  matchId: string,
  round: number,
  score?: number | null,
  winner?: string | null,
}

// row1 is of type TableSchema
const row1 = await table.findOne({});

// row2 is of type Partial<TableSchema>
const row2 = await table.findOne({}, {
  projection: { score: 1, winner: 1 }
});

// row3 is of type { score?: string | null, winner?: number | null }
const row3 = await table.findOne<Pick<TableSchema, 'score' | 'winner'>>({}, {
  projection: { score: 1, winner: 1 }
});

// Projections also work with cursors
// rows are of type { score: number, winner: string, $similarity: number }
const cursor = await table.find({})
  .project<{ score: string, winner: number }>({ score: 1, winner: 1 })
  .includeSimilarity();

For information about default projections and projection syntax, see the preceding explanation of projection clauses.

If you perform a vector search that includes projection and includeSimilarity: true, then you must manually include $similarity in the type of the returned rows. For example:

  await table.findOne<{ winner: string, $similarity: number }>({}, {
    sort: { mVector: vector([.2, .3, .4]) },
    projection: { winner: 1 },
    includeSimilarity: true,
  }).then(console.log);

If you don’t include projection, then $similarity is inferred to be a part of the returned rows.

To support the projection mechanism, the Java client has different Options classes that provide the projection method in the helpers. This method takes an array of Projection classes with the column name and a boolean flag indicating inclusion or exclusion:

Projection p1 = new Projection("column1", true);
Projection p2 = new Projection("column2", true);
TableFindOneOptions options1 = new TableFindOneOptions().projection(p1, p2);

To simplify this syntax, you can use the Projections syntactic sugar:

TableFindOneOptions options2 = new TableFindOneOptions()
  .projection(Projections.include("column1", "column2"));

TableFindOneOptions options3 =new TableFindOneOptions()
  .projection(Projections.exclude("column1", "column2"));

For information about default projections and projection syntax, see the preceding explanation of projection clauses.

projection is an optional parameter on find and findOne.

The following example returns the first 100 rows matching the given query, after skipping the first 20 matches. The response contains only the name and email value for each returned row, if they are not null.

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_TABLE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": { "email": "tal@example.com" },
    "projection": { "name": true, "email": true },
    "options": {
      "skip": 20,
      "limit": 100
    }
  }
}' | jq

For information about default projections and projection syntax, see the preceding explanation of projection clauses.

Work with rows

Row commands

Column data types

Number types

Date and time types

Date

Duration

Time

Timestamp

Limited timestamp access

UUID types

Vector type

Define a column to store pre-generated vectors

Define a column to automatically generate vector embeddings

Vectorize parameters

Vectorize authentication

Write vector data

Query and read vector data

BLOB type

Other scalar types

Map, list, and set types

Unsupported types

Mismatched and unsupported type errors

Operators

Query (filter) operators

Update operators

Sort clauses

Sort results and optimization

Projection clauses

Projection syntax

Projection examples by language

See also

Was this helpful?

Give Feedback