Work with rows
This Astra DB Serverless feature is currently in public preview. Development is ongoing, and the features and functionality are subject to change. Astra DB Serverless, and the use of such, is subject to the DataStax Preview Terms.
A row represents a single record of data in a table in an Astra DB Serverless database.
You use the Table class to work with rows through the Data API clients. For instructions to get a Table object, see Work with tables.
Prerequisites
- Review the prerequisites and other information in Intro to Astra DB APIs.
- Create a Serverless (Vector) database. The Data API supports collections and tables in Serverless (Vector) databases. This includes semi-structured collections and structured table data that you would otherwise interact with through the CQL shell or a driver. The Data API does not support Serverless (Non-Vector) databases.
- Learn how to instantiate a DataAPIClient object and connect to your database.
Required client version
The Data API tables commands are available through HTTP and the clients. If you use a client, tables commands are available only in client versions 2.0-preview or later. For more information, see Data API client upgrade guide.
Column data types
The Data API doesn’t support all CQL type literals and formats. For example, it doesn’t support all methods for representing dates, times, and timestamps with numerical values. Only specific formats are accepted and returned.
The following sections describe the data types that the Data API supports for table columns as well as guidance for working with certain data types.
Number types
The Data API supports the following column types for numbers that are not reserved for representing dates and times.
The Java client accepts and returns native Java types.
The TypeScript client accepts and returns JavaScript numbers for all number types except where noted below.
- bigint: A long number that accepts values containing digits and -, ranging from -9223372036854775808 to 9223372036854775807 (-2^63 to 2^63-1). Only numbers that can be converted to an integer value without loss of information are valid. The Python client accepts and returns int. The TypeScript client accepts and returns JavaScript numbers.
- decimal: A variable-precision decimal returned as a number, including optional - and exponent. You can store non-decimal values, and those values can be returned without decimal notation. For example, 23 is a valid decimal, and it can be returned as 23 or 23.0. decimal does not support NaN, Infinity, or -Infinity values. The Python client accepts decimal.Decimal, int, and float, and it returns decimal.Decimal.
- double: A 64-bit IEEE-754 floating point value that can be a number (including an exponent and +/- 0) or a quoted string containing NaN, Infinity, or -Infinity. The Python client accepts int and float, including float("NaN"), float("Infinity"), and float("-Infinity"), and it returns float. For more information, see Python client usage: Serdes options and custom data types.
- float: A 32-bit IEEE-754 floating point value that can be a number (including E and +/- 0) or a quoted string containing NaN, Infinity, or -Infinity. The Python client accepts int and float, including float("NaN"), float("Infinity"), and float("-Infinity"), and it returns float. For more information, see Python client usage: Serdes options and custom data types.
- int: A 32-bit signed integer returned and accepted as a number, including optional - and exponent. Only numbers that can be converted to a 4-byte integer without loss of information are valid. Exponents must express a whole number. The Python client accepts and returns int.
- smallint: A 2-byte integer returned and accepted as a number, including optional - and exponent. Only numbers that can be converted to a 2-byte integer without loss of information are valid. Exponents must express a whole number. The Python client accepts and returns int.
- tinyint: A 1-byte integer returned and accepted as a number, including optional - and exponent. Only numbers that can be converted to a 1-byte integer without loss of information are valid. Exponents must express a whole number. The Python client accepts and returns int.
- varint: An arbitrary-precision integer returned and accepted as a number, including optional - and exponent. The Python client accepts and returns int.
In Data API requests, the limit on maximum number length applies to the literal number of characters sent in the command's JSON payload. A value's literal representation in JSON can't exceed 100 characters; for example, a number that is more than 100 digits long is rejected. However, you can use exponents to pass values that represent more than 100 digits, such as 1e120000. In Data API responses, the Data API may return the fully expressed number, beyond 100 characters.
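For example, here is a minimal Python sketch of writing several number types in one row. The table and column names are hypothetical:
from decimal import Decimal

# Hypothetical table with columns "id" (text), "big_col" (bigint),
# "dec_col" (decimal), "dbl_col" (double), and "small_col" (smallint)
my_table.insert_one({
    "id": "row1",
    "big_col": 9007199254740993,   # bigint: whole numbers only
    "dec_col": Decimal("23.5"),    # decimal: Decimal, int, or float accepted
    "dbl_col": float("NaN"),       # double: NaN, Infinity, -Infinity allowed
    "small_col": 42,               # smallint: must fit in 2 bytes
})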
Date and time types
The Data API supports date, duration, time, and timestamp types for representing dates and times.
For more information about reading and writing these types in the Python client, see Python client usage: Serdes options and custom data types.
Date
A date is a date with no corresponding time value, represented in RFC 3339 full-date format (YYYY-MM-DD) where the year is at least 4 digits. If the year is positive and greater than 4 digits, it must be prefixed by +, such as +YYYYYY-MM-DD.
Duration
A duration is a value representing a duration. The preferred format is ISO-8601 interval format, such as P3Y6M4DT12H30M5S.
ISO-8601 duration format rules
- Prefixed by P (positive duration) or -P (negative duration).
- Contains a non-empty sequence of one or more sets of quantities and units in strict decreasing order.
- Quantities are non-negative integers, with the exception of S (seconds), which can be a non-negative decimal.
- Units must use the following values in the order given: Y, M, D, H, M, S. Note that the first M is months and the second M is minutes.
- If the duration includes sub-day units (hours, minutes, seconds), you must include T before the first sub-day quantity, such as P10DT2H30M or PT2H30M.
- Can skip units, but the units must still follow the given order, such as P1Y22DT45M30S.
- Cannot repeat units.
- Cannot be empty. For zero duration, you must specify at least one unit with a 0 quantity, such as P0Y.
For lossless inserts down to nanosecond precision, use the Apache Cassandra® notation for durations, such as 12y3mo1d or 12y3mo1d12h30m5s.
Cassandra duration notation rules
- A non-empty sequence of one or more sets of quantities and units in strict decreasing order.
- Optionally prefixed by - (negative duration).
- Quantities are non-negative integers of at least one digit.
- Units must use the following case-insensitive values in the order given: y, mo, w, d, h, m, s, ms, us (or µs), ns.
- Can skip units, but the units must still follow the given order, such as 1y22d45m30s.
- Cannot repeat units.
- Cannot be empty. For zero duration, you must specify at least one unit with a 0 quantity, such as 0s.
Time
A time is a time of day without an associated date, represented in RFC 3339 partial-time format, including HH:MM:SS and optional nanoseconds (.nnnnnnnnn). In responses, sub-seconds are expanded in groups of three digits as needed. For example, a time written as "12:34:56.7887" is subsequently returned as "12:34:56.788700".
Timestamp
A timestamp is a date and time with millisecond precision, represented as a string in complete RFC 3339 date-time format.
RFC 3339 date-time format rules
- Include the following components in the order given: year, month, day, T, hour, minute, second, sub-seconds, offset. For example, 1984-01-10T12:01Z. Only the sub-seconds are optional. All other components are required. If a required component has no value, use zero-padding to achieve the required format.
- The year must be at least 4 digits, optionally prefixed by -. If the year is positive and more than 4 digits, it must be prefixed by +.
- Month, day, hour, and minute must be 2 digits. For values 0-9, you must include a leading 0, such as 01 for January.
- T must separate the date from the time, as in 1984-01-10T12:01Z.
- The offset can be Z (shorthand for +00:00) or an ISO-formatted offset consisting of + or - followed by HH:MM, such as +05:00 or -08:00. The offset hour and minute must be 2 digits.
- Sub-seconds are optional and can include up to 3 digits (millisecond precision), defined as .nnn, such as 1984-01-10T12:01:23.456Z or 1984-01-10T12:01:23.4Z. Sub-seconds are always returned as 3 digits. For example, 1984-01-10T12:01:23.4Z is returned as 1984-01-10T12:01:23.400Z.
timestamp format examples
# Minimum expression
YYYY-MM-DDTHH:MM:SSZ
YYYY-MM-DDTHH:MM:SS+HH:MM
YYYY-MM-DDTHH:MM:SS-HH:MM
# Optional sub-seconds
YYYY-MM-DDTHH:MM:SS.nnnZ
YYYY-MM-DDTHH:MM:SS.nnn+HH:MM
YYYY-MM-DDTHH:MM:SS.nnn-HH:MM
# Optional negative sign before the year
-YYYY-MM-DDTHH:MM:SS.nnnZ
-YYYY-MM-DDTHH:MM:SS.nnn+HH:MM
-YYYY-MM-DDTHH:MM:SS.nnn-HH:MM
# Positive sign before the year if the year is positive and greater than 4 digits
+YYYYYY-MM-DDTHH:MM:SS.nnnZ
+YYYYYY-MM-DDTHH:MM:SS.nnn+HH:MM
+YYYYYY-MM-DDTHH:MM:SS.nnn-HH:MM
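Putting these formats together, here is an illustrative JSON fragment that writes one value of each date and time type. The column names are hypothetical:
"date_column": "2024-01-10",
"duration_column": "P3Y6M4DT12H30M5S",
"time_column": "12:34:56.789",
"timestamp_column": "2024-01-10T12:01:23.456Z"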
Limited timestamp access
The Data API supports the timestamp column type, but it doesn't provide access to column timestamps. Additionally, the Data API doesn't set timestamps for any mutation. If you require this timestamp information, DataStax recommends that you use CQL instead of the Data API.
UUID types
Unlike collections, which assign a default _id value to each document, the Data API does not assign default identifiers or default values to rows in tables. Each table's primary key is the unique identifier for that table's rows.
To store explicitly-typed UUID data in a table, the Data API fully supports the uuid type and partially supports the timeuuid type:
- uuid: Fully supported by the Data API, and it accepts all UUID versions. uuid data is generally returned and accepted as quoted strings in UUID format, including dashes, such as "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx". The Python client accepts str and UUID (all of astrapy.ids.uuid*), and it returns astrapy.ids.UUID (alias for stdlib uuid.UUID). objectId is not supported in tables. You can store such hexadecimal strings in tables, but they are not recognized or handled as a true objectId.
- timeuuid: Partially supported by the Data API. It is a specific type of UUID that includes timestamp information, and timeuuid data is returned and accepted as a JSON string. With the Data API, you can read, write, update, and delete values in existing timeuuid columns, but you can't use the Data API to create tables with timeuuid columns or add columns of type timeuuid to tables. If you need a timeuuid column in a table, then you must create or alter the table with CQL before using the Data API to manage the values in the timeuuid column. Alternatively, consider using the uuid type.
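For instance, here is a brief Python sketch of writing uuid data. The table and column names are hypothetical:
import uuid

# Hypothetical table with a text primary key and a uuid column
my_table.insert_one({"primary_column": "row1", "uuid_column": uuid.uuid4()})
# A quoted string in UUID format is also accepted
my_table.insert_one({
    "primary_column": "row2",
    "uuid_column": "123e4567-e89b-12d3-a456-426614174000",
})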
Vector type
The Data API supports the special vector
type (vector<float,dimensions>
) for storing vectors, such as embeddings, which you can then index and use for vector ANN searches.
In vector
columns, you can store pre-generated vectors. You can also automatically generate vector embeddings from text strings.
When you create or alter a table, you can define multiple vector
columns to store vectors with different dimensions or from different embedding models.
However, you can run a vector search on only one column at a time.
Define a column to store pre-generated vectors
To store pre-generated vectors in a table, define a vector
column with the dimension
of the vectors.
If you’re not sure what dimension to set, use the vector dimension your embeddings model produces.
The following examples are truncated to focus on the vector column.
- Python
- TypeScript
- Java
- curl
Define the vector column in your table definition using a fluent interface, CreateTableDefinition, or a plain dictionary.
- With fluent interface
- With CreateTableDefinition
- With plain dictionary
table_definition = (
CreateTableDefinition.builder()
...
.add_vector_column("VECTOR_COLUMN_NAME", dimension=NUM_DIMENSIONS)
...
.build()
)
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
table_definition = CreateTableDefinition(
columns={
...
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=NUM_DIMENSIONS
),
...
},
...
)
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
table_definition = {
"columns": {
...
"VECTOR_COLUMN_NAME": {"type": "vector", "dimension": NUM_DIMENSIONS},
...
},
...
}
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
const table = await db.createTable('TABLE_NAME', {
definition: {
columns: {
VECTOR_COLUMN_NAME: {
type: 'vector',
dimension: NUM_DIMENSIONS,
},
...
},
...
},
});
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
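// COSINE below is assumed to come from a static import of the client's SimilarityMetric enum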
int NUM_DIMENSIONS = 3;
TableDefinition tableDefinition = new TableDefinition()
//...
.addColumnVector("VECTOR_COLUMN_NAME", new ColumnDefinitionVector()
.dimension(NUM_DIMENSIONS)
.metric(COSINE))
//...
;
Table myTable = database.createTable("TABLE_NAME", tableDefinition);
"columns": {
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": NUM_DIMENSIONS
}
}
Define a column to automatically generate vector embeddings
To automatically generate embeddings with vectorize, define a vector column with embedding provider service options. Astra DB stores the automatically-generated embeddings in this column.
- Python
- TypeScript
- Java
- curl
- With fluent interface
- With CreateTableDefinition
- With plain dictionary
table_definition = (
CreateTableDefinition.builder()
...
.add_vector_column(
"VECTOR_COLUMN_NAME",
dimension=NUM_DIMENSIONS,
service=VectorServiceOptions(
provider="EMBEDDINGS_PROVIDER_NAME",
model_name="MODEL_NAME",
authentication={
"providerKey": "ASTRA_KMS_API_KEY_NAME",
},
)
)
...
.build()
)
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
# CreateTableDefinition object:
table_definition = CreateTableDefinition(
columns={
...
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=NUM_DIMENSIONS,
service=VectorServiceOptions(
provider="EMBEDDINGS_PROVIDER_NAME",
model_name="MODEL_NAME",
authentication={
"providerKey": "ASTRA_KMS_API_KEY_NAME",
},
)
),
...
},
...
)
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
table_definition = {
"columns": {
...
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": NUM_DIMENSIONS,
"service": {
"provider": "EMBEDDINGS_PROVIDER_NAME",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "ASTRA_KMS_API_KEY_NAME",
},
},
},
...
},
...
}
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
const table = await db.createTable('TABLE_NAME', {
definition: {
columns: {
VECTOR_COLUMN_NAME: {
type: 'vector',
dimension: NUM_DIMENSIONS,
service: {
provider: 'EMBEDDINGS_PROVIDER_NAME',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'ASTRA_KMS_API_KEY_NAME',
},
},
},
...
},
...
},
});
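// Assumes java.util.Map is imported and COSINE is statically imported from the client's SimilarityMetric enum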
VectorServiceOptions vectorServiceOptions = new VectorServiceOptions()
.provider("EMBEDDINGS_PROVIDER_NAME")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "ASTRA_KMS_API_KEY_NAME"));
ColumnDefinitionVector columnDefinition = new ColumnDefinitionVector()
.dimension(3)
.metric(COSINE)
.service(vectorServiceOptions);
TableDefinition tableDefinition = new TableDefinition()
//...
.addColumnVector("VECTOR_COLUMN_NAME", columnDefinition);
Table myTable = database.createTable("TABLE_NAME", tableDefinition);
"columns": {
"COLUMN_NAME": "DATA_TYPE",
"COLUMN_NAME": "DATA_TYPE",
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": NUM_DIMENSIONS,
"service": {
"provider": "EMBEDDINGS_PROVIDER_NAME",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "ASTRA_KMS_API_KEY_NAME"
}
}
}
}
Vectorize parameters
At minimum, you must specify the embedding provider and model name. However, dimension is optional if the specified model name has a default dimension value, and some embedding providers require additional parameters. To find required parameters and accepted values, use findEmbeddingProviders or see the documentation for your embedding provider integration.
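For instance, here is a sketch of calling findEmbeddingProviders over HTTP. The endpoint path and token placeholders are assumptions based on the general Data API request pattern:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1" \
  --header "Token: ASTRA_DB_APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{"findEmbeddingProviders": {}}'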
If a table has multiple vector columns with vectorize integrations, all columns must use the same embedding provider integration, including the same dimension, model name, credential, and other settings.
Vectorize authentication
Most embedding provider integrations require authentication through Astra DB KMS or header authentication:
- Astra DB KMS (providerKey): Use credentials stored in Astra DB KMS to authenticate with your vectorize embedding provider. In your vector column definition, include an authentication object with providerKey set to the credential's API key name as given in Astra DB KMS. For more information, see Embedding provider authentication.
- Header authentication: As an alternative to providerKey, you can omit the authentication object, and then provide the authentication key in an x-embedding-api-key header instead. Header authentication overrides Astra DB KMS if you set both. Over HTTP, specify this header literally, such as --header "x-embedding-api-key: $EMBEDDING_PROVIDER_API_KEY", and include it with every command that uses vectorize, including writes and vector search. With the clients, you include the respective header authentication parameter (embedding_api_key, embeddingApiKey, or embeddingAuthProvider) in the top-level create table or get table command parameters (not in the vector column definition), as shown in the sketch after this list. This attaches the embedding provider API key to the resulting Table instance, and then the client automatically attaches the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication.
- No authentication: Astra-hosted embedding provider integrations don't require authentication. However, your database must be in a supported region. For more information, see the documentation for the Astra-hosted embedding provider integrations.
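For example, here is a minimal Python sketch of header authentication. The table name and key value are hypothetical:
# Attach the embedding provider API key when getting the Table object
my_table = database.get_table(
    "TABLE_NAME",
    embedding_api_key="EMBEDDING_PROVIDER_API_KEY",
)
# Subsequent vectorize operations on my_table send the x-embedding-api-key header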
If you drop the embedding provider integration, the column retains the existing embeddings. You cannot automatically generate new embeddings unless you re-enable the embedding provider integration.
Write vector data
- Python
- TypeScript
- Java
- curl
You can insert pre-generated vector data into a vector column by supplying either a list of float values or a DataAPIVector object. The end result on the database will be identical. However, the DataAPIVector values are binary-encoded before sending, which results in improved insertion performance, especially when writing multiple rows.
If the column has a vectorize integration, you can automatically generate an embedding from a string by placing the string in the row directly.
# Insert a list of floats sending it as array:
my_table.insert_one({"vector_column": [0.1, -0.2, 0.3], ...})
# Insert a list of floats sending it as binary-encoded:
my_table.insert_one({"vector_column": DataAPIVector([0.1, -0.2, 0.3]), ...})
# Generate the embedding with vectorize
my_table.insert_one({"vector_column": "Text to vectorize", ...})
For additional control over the behavior of vector encoding, you can set the following attributes of the table APIOptions.serdes_options:
- Setting unroll_iterables_to_lists to True means that any iterable over numbers is unrolled into a list prior to insertion. That means you can directly use numpy objects, generators, and similar in all cases where a vector is expected. This setting defaults to False because it incurs a slight performance cost.
- Setting binary_encode_vectors to False disables binary encoding of vectors altogether: all vectors are sent to the API as plain arrays of floats. This defaults to True for improved insertion performance.
For more information, see Python client usage: Serdes options and custom data types.
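As an illustration, here is a sketch of setting these options when spawning a Table object. The exact import path and the spawn_api_options parameter name are assumptions about the astrapy interface:
from astrapy.api_options import APIOptions, SerdesOptions

my_table = database.get_table(
    "TABLE_NAME",
    spawn_api_options=APIOptions(
        serdes_options=SerdesOptions(
            unroll_iterables_to_lists=True,  # accept numpy arrays, generators, etc.
            binary_encode_vectors=True,      # send vectors binary-encoded (default)
        ),
    ),
)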
You can insert pre-generated vector data into a vector column by supplying either a list of numbers or a DataAPIVector object. The end result on the database will be identical. However, the DataAPIVector values are binary-encoded before sending, which results in improved insertion performance, especially when writing multiple rows.
Example using DataAPIVector (recommended):
// Using the explicit class constructor
await table.insertOne({ vectorColumn: new DataAPIVector([0.1, -0.2, 0.3]), ... });
// Using the vector function shorthand
await table.insertOne({ vectorColumn: vector([0.1, -0.2, 0.3]), ... });
You can create a vector from any DataAPIVectorLike object.
export type DataAPIVectorLike = DataAPIVector | number[] | { $binary: string } | Float32Array;
// Creating a DataAPIVector from a Float32Array
const vec = vector(Float32Array.of(.1, -.2, .3));
You can insert a number[] directly, but insertion performance will be decreased. You also need to provide your own typing for the Table, since InferTableSchema assumes that you are using DataAPIVector objects. When reading back the vector data, it is returned as a DataAPIVector object, unless you provide your own deserialization logic to convert it back to a number[].
// Define the table schema manually
interface TableSchema {
vectorColumn?: number[] | null,
...
}
const table = await db.createTable<TableSchema>('TABLE_NAME', { ... });
// Insert as a list of floats
await table.insertOne({ vectorColumn: [0.1, -0.2, 0.3], ... });
If the column has a vectorize integration, you can automatically generate an embedding from a string by placing the string in the row directly. The ability to pass a string is picked up by InferTableSchema if you provide a service block in the vector column definition.
// Insert using the vectorize feature
await table.insertOne({ vectorColumn: 'Text to vectorize', ... });
// You can still provide an explicit vector if you want
await table.insertOne({ vectorColumn: vector([0.1, -0.2, 0.3]), ... });
You can insert pre-generated vector data into a vector column by supplying either a list of floats or a DataAPIVector object. The end result on the database is the same. However, with the DataAPIVector object, the values are sent in binary-encoded format, which is more performant, especially when writing multiple rows.
Use a list of floats:
tableGames.insertOne(new Row()
// primary key of the table
.addText("match_id", "game2").addInt("round", 1)
// add a vector
.addVector("m_vector", new float[] {0.1f, 0.2f, 0.3f}));
(Recommended) Use DataAPIVector:
DataAPIVector vector = new DataAPIVector(new float[] {0.1f, 0.2f, 0.3f});
tableGames.insertOne(new Row()
// primary key of the table
.addText("match_id", "game1").addInt("round", 1)
// add a vector with the DataAPIVector wrapper
.addVector("m_vector", vector));
Underlying JSON representation for DataAPIVector
Because the DataAPIVector class uses binary-formatted serialization, the underlying HTTP request sends $binary in the JSON body. For example:
{
"insertOne": {
"document": {
"match_id": "game1",
"round": 1,
"m_vector": {
"$binary": "PczMzT5MzM0+mZma"
}
}
}
}
For more information about DataAPIVector, see Java client usage.
If the column has a vectorize integration, you can automatically generate an embedding from a String by adding the string to the row directly:
// Assuming m_vector is a vector column with vectorize integration
tableGames.insertOne(new Row()
// primary key of the table
.addText("match_id", "game2").addInt("round", 1)
// Add a vector as a string to generate vector embeddings from
.addVectorize("m_vector", "Query text to vectorize"));
// Even if the column has a vectorize integration
// it is still valid to add embeddings directly
tableGames.insertOne(new Row()
// primary key of the table
.addText("match_id", "game3").addInt("round", 1)
// add a vector
.addVector("m_vector", new float[] {0.1f, 0.2f, 0.3f}));
You can insert pre-generated vector data into a vector column as an array or as a binary-encoded value. Alternatively, if the column has a vectorize integration, you can automatically generate an embedding from a string.
# Insert vector data as an array
"vector_column": [ 0.1, -0.2, 0.3 ]
# Insert vector data with $binary
"vector_column": { "$binary": "PczMzb5MzM0+mZma" }
# Generate an embedding with vectorize
"vector_column": "Text to vectorize"
The $binary form is for writes only. Binary encoding can be more economical, in terms of space and resources, when sending vectors in JSON payloads. Once received, the Data API decodes the binary format, and then stores the resulting list of numbers in the table. When you read data from a vector column, the Data API returns the vector as a list of numbers, regardless of the initial write format.
Vector binary encodings specification
A d-dimensional vector is a list of d floating-point numbers that can be binary encoded. To prepare for encoding, the list must be transformed into a sequence of bytes where each float is represented as four bytes in big-endian format. Then, the byte sequence is Base64-encoded, with = padding, if needed.
For example, here are some vectors and their resulting Base64 encoded strings:
[0.1, -0.2, 0.3] = "PczMzb5MzM0+mZma"
[0.1, 0.2] = "PczMzT5MzM0="
[10, 10.5, 100, -91.19] = "QSAAAEEoAABCyAAAwrZhSA=="
Once encoded, you use $binary to pass the Base64 string to the Data API:
{ "$binary": "BASE64_STRING" }
You can use a script to encode your vectors, for example:
import base64
import struct

input_vector = [0.1, -0.2, 0.3]
d = len(input_vector)
pack_format = ">" + "f" * d  # big-endian 4-byte floats
binary_encode = base64.b64encode(struct.pack(pack_format, *input_vector)).decode()
print(binary_encode)  # PczMzb5MzM0+mZma
With vectorize, the table stores the resulting embedding only. The original text string is discarded, unless you manually write the value in another column in the same insert or update command.
Query and read vector data
A vector search retrieves rows that are most similar to a given vector.
To run a vector search on a table, use a sort clause with an indexed vector column and a query vector. If your table has multiple vector columns, you can sort on only one vector column at a time. You can provide a pre-generated query vector or, if the vector column has a vectorize integration, you can automatically generate a query vector from a string.
- Python
- TypeScript
- Java
- curl
# Run a vector search with the query vector encoded as array
my_table.find(sort={"vector_column": [0.1, -0.2, 0.3]}, ...)
# Run a vector search with the query vector binary-encoded
my_table.find(sort={"vector_column": DataAPIVector([0.1, -0.2, 0.3])}, ...)
# Run a vector search by generating the query embedding with vectorize:
my_table.find(sort={"vector_column": "Query text to vectorize"}, ...)
By default, the results are returned with DataAPIVector objects in their vector columns. If you set the table APIOptions.serdes_options.custom_datatypes_in_reading to False, all vectors are returned as plain lists of floats, regardless of the format in the underlying HTTP response. This setting also controls how the query vector itself is returned when you invoke find with include_sort_vector=True.
For more information, see Python client usage: Serdes options and custom data types.
You can provide sort vectors in the same way that you write vectors:
// Run a vector search using a DataAPIVector
await table.find({}, { sort: { vectorColumn: vector([0.1, -0.2, 0.3]) } });
// Run a vector search using a plain list of floats
// You must provide your own typing or use an untyped table
await table.find({}, { sort: { vectorColumn: [0.1, -0.2, 0.3] } });
// Run a vector search using vectorize
await table.find({}, { sort: { vectorColumn: 'Query text to vectorize' } });
Regardless of how you provide vectors, the vector results are always returned as DataAPIVector objects. They can be converted to arrays lazily using vector.asArray(). If you want to avoid using DataAPIVector objects, you must provide your own custom serialization/deserialization logic for the Table instance. See Custom Ser/Des for more information.
This also applies to find operations that include the includeSortVector option:
// Run a vector search using vectorize
const cursor = await table.find({})
.sort({ vectorColumn: 'Query text to vectorize' })
.includeSortVector();
// The query vector is returned as a DataAPIVector object
const queryVector = await cursor.getSortVector();
To perform a vector search, use the Sort.vector(..,..) and Sort.vectorize(..,..) methods. These methods take two arguments: a column name and a query vector. If you omit the column name, the client uses the default of $vector or $vectorize, depending on the method. These are reserved fields for collections. With tables, it is important that you specify the name of your vector column in the first argument. This is because a table can contain multiple vector columns, and vector columns can have any valid name.
You can provide sort vectors to these methods in the same way that you write vectors:
// This example uses TableFindOptions, but you can use any sortable object
// Sort with DataAPIVector
DataAPIVector vector = new DataAPIVector(new float[] {0.1f, 0.2f, 0.3f});
new TableFindOptions()
.sort(Sort.vector("m_vector", vector));
// Sort with float[]
new TableFindOptions()
.sort(Sort.vector("m_vector", new float[] {0.1f, 0.2f, 0.3f}));
// Sort with vectorize
new TableFindOptions()
.sort(Sort.vectorize("m_vector", "Text To Serialize"));
# Provide the query vector as an array
"sort": { "vector_column": [ 0.1, -0.2, 0.3 ] }
# Provide the query vector with $binary
"sort": { "vector_column": { "$binary": "PczMzb5MzM0+mZma" } }
# Generate a query vector with vectorize
"sort": { "vector_column": "Text to vectorize" }
The $binary form can be more economical, in terms of space and resources, when sending vectors in JSON payloads. Once received, the Data API decodes the binary format, and then uses the resulting list of numbers for the vector search. The includeSortVector option returns the query vector as a list of numbers, regardless of the format supplied in the original command.
For more information and examples, see Find rows reference.
BLOB type
The Data API supports the blob type.
There is a known issue with filtering on blob columns.
- Python
- TypeScript
- Java
- curl
The Python client automatically handles binary data for tables. Any bytes values sent to the Data API, such as rows to insert or search filter values, are automatically converted to the format expected by the Data API. Likewise, binary data received from the Data API is returned as a bytes value.
# Assume the table has primary key "primary_column" and a "bytes_column" BLOB column
bytes_val = b"=\xfb\xe7m>\xe9x\xd5?I\xfb\xe7"
my_table.insert_one({"primary_column": "row0", "bytes_column": bytes_val})
# Returns TableInsertOneResult(inserted_id={'primary_column': 'row0'}, ... (shortened)
my_table.find_one({"primary_column": "row0"})["bytes_column"]
# Returns b'=\xfb\xe7m>\xe9x\xd5?I\xfb\xe7'
In the previous example, bytes_val, where bytes_val.hex() == "3dfbe76d3ee978d53f49fbe7", translates to the JSON representation {"$binary": "PfvnbT7peNU/Sfvn"}. For more information about the JSON representation, see the curl tab for BLOB.
The TypeScript client automatically handles binary data in a runtime-independent way, through the DataAPIBlob class:
// Using the explicit class constructor
await table.insertOne({ blobColumn: new DataAPIBlob([0x0, 0x1, 0x2]), ... });
// Using the `blob` function shorthand
await table.insertOne({ blobColumn: blob(Buffer.from([0x0, 0x1, 0x2])), ... });
You can create a blob from any DataAPIBlobLike object:
export type DataAPIBlobLike = DataAPIBlob | ArrayBuffer | Buffer | { $binary: string };
// Creating a DataAPIBlob from a base64 encoded string
const binary = blob({ $binary: 'ZXZlciBkcmVhbQ==' });
The same class is also returned when reading back the BLOB data:
const row = await table.findOne({ pkey: 'row0' });
console.log(row.blobColumn.asArrayBuffer());
The Java client automatically handles binary data for tables. Arrays of bytes (byte[]) sent to the Data API, such as rows to insert or search filters, are automatically converted to the format expected by the Data API. Likewise, binary data received from the Data API is returned as a byte[].
byte[] bytesVal = {
(byte) 0x3D, (byte) 0xFB, (byte) 0xE7, (byte) 0x6D,
(byte) 0x3E, (byte) 0xE9, (byte) 0x78, (byte) 0xD5,
(byte) 0x3F, (byte) 0x49, (byte) 0xFB, (byte) 0xE7
};
// Assume 'myTable' has TEXT-type primary key "primary_column" and "bytes_column" BLOB column
myTable.insertOne(new Row()
.addText("primary_column", "row0")
.addBlob("bytes_column", bytesVal));
myTable
  .findOne(Filters.eq("primary_column", "row0"))
  .ifPresent(row -> {
    byte[] bytesColumn = row.getBlob("bytes_column");
    // Print the bytes; Arrays is java.util.Arrays
    System.out.println(Arrays.toString(bytesColumn));
    // prints: [61, -5, -25, 109, 62, -23, 120, -43, 63, 73, -5, -25]
  });
In the previous example, the byte array written to bytes_column translates to the JSON representation {"$binary": "PfvnbT7peNU/Sfvn"}. For more information about the JSON representation, see the curl tab for BLOB.
To write data to a blob column, use $binary with a Base64-encoded string representing the byte sequence you want to store:
"blob_column": { "$binary": "PfvnbT7peNU/Sfvn" }
Over HTTP, you can use $binary to write binary-encoded vectors to vector columns. However, writing binary-encoded vectors to blob columns does not treat that data as true vectors, such as for vector search. For more information about storing vector data and running vector search on tables, see Vector type. For information and examples for encoding arrays and writing vectors with $binary, see the curl tab for Write vector data.
Other scalar types
- ascii: A string of US-ASCII characters. The Python client accepts and returns str.
- boolean: A true or false value. The Python client accepts and returns bool.
- inet: Accepts literal IPv4 and IPv6 values. Domain names are rejected for security reasons. The Python client accepts and returns ipaddress.IPv4Address and ipaddress.IPv6Address.
- text: A UTF-8 encoded string. The Python client accepts and returns str.
- varchar: A UTF-8 encoded string. This is an alias for text. However, varchar is partially supported by the Data API, whereas text is fully supported. With the Data API, you can read, write, update, and delete values in existing varchar columns, but you can't use the Data API to create tables with varchar columns or add columns of type varchar to tables. If you need a varchar column in a table, then you must create or alter the table with CQL before using the Data API to manage the values in the varchar column. Alternatively, consider using the text type.
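For instance, here is a short Python sketch of writing an inet value. The table and column names are hypothetical:
import ipaddress

# Hypothetical table with a text primary key and an inet column
my_table.insert_one({
    "primary_column": "row1",
    "inet_column": ipaddress.ip_address("192.168.1.1"),  # IPv4Address instance
})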
Map, list, and set types
The Data API supports map, list, and set types for table columns with the following limitations:
- No nested maps, lists, or sets, such as map<string,map<string,string>>.
- Maps, lists, and sets are not supported in primary keys, indexes, or filter clauses.
- Maps can have only text or ascii keys.
- Frozen maps, lists, and sets created in CQL (outside the Data API) aren't accessible through the Data API. For more information, see Support for existing tables.
- Map
- List
- Set
map<keyType,valueType> is a typed set of key-value pairs represented as a JSON object where the keys are unique. When written to a table, maps are internally ordered by their keys, and they are returned in that order.
To define a map column in a table, you specify the map type, as well as the keyType and valueType of the items in the map. The keyType can be only text or ascii. Other key types are not supported.
- Python
- TypeScript
- Java
- curl
Example using the fluent interface:
table_definition = (
CreateTableDefinition.builder()
...
.add_map_column(
"map_column",
ColumnType.TEXT, # or "text"
ColumnType.INT, # or "int"
)
...
.build()
)
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
Example using CreateTableDefinition:
table_definition = CreateTableDefinition(
columns={
...
"map_column": TableKeyValuedColumnTypeDescriptor(
column_type="map", # or TableKeyValuedColumnType.MAP
key_type=ColumnType.TEXT, # or "text"
value_type=ColumnType.INT, # or "int"
),
...
},
...
)
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
Example using a plain dictionary:
table_definition = {
"columns": {
...
"map_column": {
"type": "map", "keyType": "text", "valueType": "int"
},
...
},
...
}
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
const table = await db.createTable('TABLE_NAME', {
definition: {
columns: {
MAP_COLUMN_NAME: {
type: 'map',
keyType: 'text',
valueType: 'int',
},
...
},
...
},
});
TableDefinition def = new TableDefinition()
.addColumnMap("map_column", ColumnTypes.TEXT, ColumnTypes.INT);
database
.createTable("TABLE_NAME", def);
"map_column": {
"type": "map",
"keyType": "text",
"valueType": "int"
}
When you write to a map column, you provide key-value pairs representing items in the map. Keys must match the defined keyType format, and values must use the defined valueType format.
- Python
- TypeScript
- Java
- curl
The Python client accepts both simple dictionaries and instances of DataAPIMap for writes.
# Insert a map as a plain dictionary
my_table.insert_one({
"map_column": {"key1": 42, "key2": 298},
...
})
# Insert a map using a DataAPIMap, initialized with a dictionary
my_table.insert_one({
"map_column": DataAPIMap({"key1": 42, "key2": 298}),
...
})
# Insert a map using a DataAPIMap, initialized with a list of pairs
my_table.insert_one({
"map_column": DataAPIMap([("key1", 42), ("key2", 298)]),
...
})
When reading rows, the Python client returns maps as DataAPIMap objects. To return standard Python dictionaries, set the table APIOptions.serdes_options.custom_datatypes_in_reading to False. The default for this setting is True.
Map data is represented by native JavaScript Maps.
await table.insertOne({ mapColumn: new Map([['key1', 42], ['key2', 298]]), ... });
While you can use plain JavaScript objects when inserting map data, the data is read back as a native Map, unless you use custom serialization/deserialization logic.
Map<String, Integer> myMap = new HashMap<>();
myMap.put("key1", 42);
myMap.put("key2", 298);
table.insertOne(new Row().addMap("map_column", myMap));
"map_column": {
"key1": 42,
"key2": 298
}
list<valueType> is a typed collection of ordered, non-unique values represented as a JSON array. When written to a table, list elements are ordered by their position in the list.
To define a list column in a table, you specify the list type and the valueType of the items in the list.
- Python
- TypeScript
- Java
- curl
Example using the fluent interface:
table_definition = (
CreateTableDefinition.builder()
...
.add_list_column(
"list_column",
ColumnType.TEXT, # or "text"
)
...
.build()
)
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
Example using CreateTableDefinition:
table_definition = CreateTableDefinition(
columns={
...
"list_column": TableValuedColumnTypeDescriptor(
column_type="list", # or TableValuedColumnType.LIST
value_type=ColumnType.TEXT, # or "text"
),
...
},
...
)
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
Example using a plain dictionary:
table_definition = {
"columns": {
...
"list_column": {
"type": "list", "valueType": "text"
},
...
},
...
}
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
const table = await db.createTable('TABLE_NAME', {
definition: {
columns: {
LIST_COLUMN_NAME: {
type: 'list',
valueType: 'text',
},
...
},
...
},
});
TableDefinition def = new TableDefinition()
.addColumnList("list_column", ColumnTypes.TEXT);
Table<Row> table = db.createTable("TABLE_NAME", def);
"list_column": {
"type": "list",
"valueType": "text"
}
When you write to a list column, you provide an array of values in the defined valueType format.
- Python
- TypeScript
- Java
- curl
my_table.insert_one({
"list_column": ["Hello", "World", "!"],
...
})
List data is represented by native JavaScript arrays.
await table.insertOne({ listColumn: ['Hello', 'World', '!'], ... });
table.insertOne(new Row()
.addList("list_column", List.of("Hello", "World", "!"))
);
"list_column": [ "Hello", "World", "!" ]
set<valueType> is a typed collection of unique values represented as a JSON array. When written to a table, sets are ordered by their values.
To define a set column in a table, you specify the set type and the valueType of the items in the set.
- Python
- TypeScript
- Java
- curl
Example using the fluent interface:
table_definition = (
CreateTableDefinition.builder()
...
.add_set_column(
"set_column",
ColumnType.INT, # or "int"
)
...
.build()
)
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
Example using CreateTableDefinition:
table_definition = CreateTableDefinition(
columns={
...
"set_column": TableValuedColumnTypeDescriptor(
column_type="set", # or TableValuedColumnType.SET
value_type=ColumnType.INT, # or "int"
),
...
},
...
)
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
Example using a plain dictionary:
table_definition = {
"columns": {
...
"set_column": {
"type": "set", "valueType": "int"
},
...
},
...
}
my_table = database.create_table(
"TABLE_NAME",
definition=table_definition,
...
)
const table = await db.createTable('TABLE_NAME', {
definition: {
columns: {
SET_COLUMN_NAME: {
type: 'set',
valueType: 'int',
},
...
},
...
},
});
type TableSchema = InferTableSchema<typeof table>;
TableDefinition def = new TableDefinition()
.addColumnSet("set_column", ColumnTypes.INT);
Table<Row> table = db.createTable("TABLE_NAME", def);
"set_column": {
"type": "set",
"valueType": "int"
}
When you write to a set column, you provide a set of values in the defined valueType format.
- Python
- TypeScript
- Java
- curl
The Python client accepts both Python sets and instances of DataAPISet.
# Insert a set as a plain Python set
my_table.insert_one({
"set_column": {9, 8, 7},
...
})
# Insert a set using a DataAPISet, initialized with a plain set
my_table.insert_one({
"set_column": DataAPISet({9, 8, 7}),
...
})
# Insert a set using a DataAPISet, initialized with a list of elements
my_table.insert_one({
"set_column": DataAPISet([9, 8, 7]),
...
})
When reading rows, the Python client returns sets as DataAPISet objects, unless the table APIOptions.serdes_options.custom_datatypes_in_reading is False, in which case standard Python sets are used. The default for this setting is True.
Set data is represented by native JavaScript Sets.
await table.insertOne({ setColumn: new Set([9, 8, 8, 7]), ... });
While you can use plain JavaScript arrays when you insert set data, the data is read back as a native Set, unless you use custom serialization/deserialization logic.
table.insertOne(new Row()
.addSet("set_column", Set.of(9,8,7))
);
"set_column": [ 9, 8, 7 ]
Unsupported types
Unsupported types, formats, and modifiers include:
- User-defined types (UDTs)
- Counters
- Time To Live (TTL)
- Column timestamps
- Frozen
- Static
- Tuple
Some types, such as timeuuid and varchar, are partially supported or supported with limitations, as described earlier on this page. When working with existing CQL tables that you created outside the Data API, some of these types are partially supported. For more information, see Migrate from CQL.
Mismatched and unsupported type errors
The Data API returns warnings and errors if you use an unsupported type or attempt to insert a value that doesn't match the column's defined type. For example, if you attempt to insert a value that doesn't match the column's type, the Data API returns "INVALID_COLUMN_VALUES". Similarly, the Data API returns an error if you attempt to perform an operation on a column that isn't compatible with the specified operation. For example, if you attempt to filter on a vector column, the Data API returns "INVALID_FILTER_EXPRESSION".
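As an illustration, an error response for a mismatched value might have roughly the following shape. The exact message text here is hypothetical:
{
  "errors": [
    {
      "errorCode": "INVALID_COLUMN_VALUES",
      "message": "A description of the mismatched column and value"
    }
  ]
}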
Operators
The Data API provides query and update operators that you can use to find and update rows.
For examples, see the references for commands that use these operators, such as Find rows reference and Update rows reference.
Query (filter) operators
You can use filter operators on all supported types except map, list, set, and vector.

Operator type | Name | Purpose
---|---|---
Comparison query (Range) | $gt | Matches rows where the given column's value is greater than the specified value.
Comparison query (Range) | $gte | Matches rows where the given column's value is greater than or equal to the specified value.
Comparison query (Range) | $lt | Matches rows where the given column's value is less than the specified value.
Comparison query (Range) | $lte | Matches rows where the given column's value is less than or equal to the specified value.
Comparison query (Exact) | $eq | Matches rows where the given column's value equals the specified value.
Comparison query (Exact) | $ne | Matches rows where a given column's value does not equal the specified value.
Comparison query (Exact) | $in | Matches one or more of an array of specified values. If you have only one value to match, an array is not necessary.
Comparison query (Exact) | $nin | Matches any of the values that are NOT IN a given array.
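For example, an illustrative filter clause combining range and exact operators might look like the following. The column names are hypothetical:
"filter": {
  "round": { "$gte": 2 },
  "match_id": { "$in": ["game1", "game2"] }
}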
Update operators
You can use update operators on all supported types.

Name | Purpose
---|---
$set | Sets a given column's value to the specified value.
$unset | Unsets a given column's value, which is equivalent to setting it to null.