Get started with the Data API
The Data API is a schema-less, document-based, modern API that provides programmatic access to data in your Serverless (Vector) databases.
You can use the Data API and its clients to create databases, keyspaces, collections, and tables, and then perform CRUD operations on data in your collections and tables, including complex filters and vector searches that return similarity scores.
The Data API provides an entry point for application development with Astra DB Serverless databases, including a variety of GenAI ecosystem integrations like LangChain, LlamaIndex, and embedding providers. It leverages the scalability, performance, and real-time indexing capabilities of Apache Cassandra® to support GenAI application development.
Prerequisites
-
An active Astra account.
-
An active Serverless (Vector) database.
The Data API supports collections and tables in Serverless (Vector) databases. This includes semi-structured collections and structured table data that you would otherwise interact with through the CQL shell or a driver.
The Data API doesn’t support Serverless (Non-Vector) databases.
Install a client or curl
You can interact with the Data API directly through HTTP or use one of the Astra DB Data API clients for Python, TypeScript, or Java:
Language | Client | Dependency |
---|---|---|
Python | astrapy | Python 3.8 or later |
TypeScript | astra-db-ts | Node.js 18 or later |
Java | astra-db-java | Java 17 or later (21 recommended) |
Use a package manager to install a client library or a tool to handle HTTP API calls.
-
Python
-
TypeScript
-
Java
-
curl
Install the Python client with pip:
-
Verify that pip is version 23.0 or later:
pip --version
-
If needed, upgrade pip:
python -m pip install --upgrade pip
-
Install the astrapy package. You must have Python 3.8 or later.
pip install astrapy
Install the TypeScript client:
-
Verify that Node is version 18 or later:
node --version
-
Install astra-db-ts with your preferred package manager:
-
npm
npm install @datastax/astra-db-ts
-
Yarn
yarn add @datastax/astra-db-ts
-
pnpm
pnpm add @datastax/astra-db-ts
-
Install the Java client with Maven or Gradle.
-
Maven
-
Gradle
-
Install Java 17 or later (21 recommended) and Maven 3.9 or later.
-
Create a pom.xml file in the root of your project, and then replace VERSION with the latest version of astra-db-java.
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>test-java-client</artifactId>
  <version>1.0-SNAPSHOT</version>
  <!-- The Java client -->
  <dependencies>
    <dependency>
      <groupId>com.datastax.astra</groupId>
      <artifactId>astra-db-java</artifactId>
      <version>VERSION</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>3.0.0</version>
        <configuration>
          <executable>java</executable>
          <mainClass>com.example.Quickstart</mainClass>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>java</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>11</source>
          <target>11</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
-
Install Gradle and Java 17 or later (21 recommended).
-
Create a build.gradle file in the root of your project:
build.gradle
plugins {
    id 'java'
    id 'application'
}
repositories {
    mavenCentral()
}
dependencies {
    implementation 'com.datastax.astra:astra-db-java:1.+'
}
application {
    mainClassName = 'com.example.Quickstart'
}
To interact with the Data API directly through HTTP, you need to install a tool to handle API calls, such as curl.
Upgrade a client
When a new client version is released, upgrade your client to get the latest features, improvements, and bug fixes. For information about major changes in specific client versions, see Data API client upgrade guide.
-
Python
-
TypeScript
-
Java
-
curl
# verbose
pip install astrapy --upgrade
# shorthand
pip install -U astrapy
To upgrade to a preview version, you must pass the
|
Reinstall the TypeScript client at the latest version.
For the TypeScript client, DataStax recommends reinstallation instead of upgrade and update commands, such as npm update @datastax/astra-db-ts, that install the latest version allowed by your package.json.
If your package.json pins a version that is earlier than the actual latest version, then upgrade and update won’t install the actual latest version.
# npm
npm install @datastax/astra-db-ts@latest
# Yarn
yarn add @datastax/astra-db-ts@latest
# pnpm
pnpm add @datastax/astra-db-ts@latest
To upgrade to a preview version, you must pass the
|
To upgrade your Java client, modify the astra-db-java version in your project’s build.gradle or pom.xml.
Because Astra DB is a cloud service, the Data API itself always runs the latest production version.
For information about upgrading a utility like curl, see the documentation for that tool.
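Before upgrading the Python client, it can help to confirm which version is currently installed. The following is a minimal sketch that uses only the Python standard library. The helper name `installed_version` is hypothetical and is not part of astrapy.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed.

    Hypothetical helper for illustration; it only reads local package metadata.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed astrapy version, or None if astrapy is not installed.
    print(installed_version("astrapy"))
```

Running the same check after the upgrade is a quick way to confirm that the new version was picked up by the interpreter you actually use.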
Generate application tokens and set environment variables
The Data API requires your database’s API endpoint and application tokens with sufficient permissions to perform the requested operations.
Store application tokens and API endpoints in environment variables to simplify reuse in your scripts.
-
In the Astra Portal navigation menu, select your Serverless (Vector) database.
-
On the Overview tab, in the Database Details section, click Generate Token.
The generated token has a custom Database Administrator role that is scoped to this database only. For more information, see Generate an application token for a database and Custom roles.
-
Copy the token and store it securely. The Astra Portal shows the token only once.
-
In the Database Details section, copy your database’s Data API endpoint.
The Data API endpoint format is https://DATABASE_ID-REGION.apps.astra.datastax.com. If you aren’t using a client, be aware that DevOps API calls use a different endpoint format.
-
(Optional) Create application tokens with other roles.
The Database Administrator role authorizes operations within a database, but you might need a broader or narrower role for other operations, for example:
-
For operations above the database level, such as creating databases or keyspaces, you need an application token with the Organization Administrator role.
-
For production applications that only read data from a database, consider using an application token with a narrower scope, such as the Read Only User role.
-
-
Set environment variables for your application tokens and Data API endpoint.
-
Linux or macOS
export ASTRA_DB_API_ENDPOINT=API_ENDPOINT
export ASTRA_DB_APPLICATION_TOKEN=TOKEN
-
Windows
set ASTRA_DB_API_ENDPOINT=API_ENDPOINT
set ASTRA_DB_APPLICATION_TOKEN=TOKEN
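Once the environment variables from the steps above are set, a script can read them at startup and fail early if something is missing. The following is a minimal sketch using only the standard library. The names `AstraSettings` and `load_astra_settings` are hypothetical, and the regular expression assumes that DATABASE_ID is a lowercase UUID and that REGION contains only lowercase letters, digits, and hyphens.

```python
import os
import re
from typing import Mapping, NamedTuple, Optional

# Endpoint shape described above: https://DATABASE_ID-REGION.apps.astra.datastax.com
# Assumption: DATABASE_ID is a lowercase UUID; REGION is lowercase letters, digits, hyphens.
_ENDPOINT_RE = re.compile(
    r"https://(?P<database_id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"-(?P<region>[a-z0-9-]+)\.apps\.astra\.datastax\.com/?"
)

_REQUIRED = ("ASTRA_DB_API_ENDPOINT", "ASTRA_DB_APPLICATION_TOKEN")


class AstraSettings(NamedTuple):
    api_endpoint: str
    token: str
    database_id: str
    region: str


def load_astra_settings(env: Optional[Mapping[str, str]] = None) -> AstraSettings:
    """Read the endpoint and token from the environment and validate the endpoint shape."""
    env = os.environ if env is None else env
    missing = [name for name in _REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    endpoint = env["ASTRA_DB_API_ENDPOINT"].strip()
    match = _ENDPOINT_RE.fullmatch(endpoint)
    if match is None:
        raise ValueError("Unexpected Data API endpoint format: " + endpoint)

    return AstraSettings(
        api_endpoint=endpoint.rstrip("/"),
        token=env["ASTRA_DB_APPLICATION_TOKEN"],
        database_id=match["database_id"],
        region=match["region"],
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.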
Instantiate a client object
When you create apps using the Data API clients, you must instantiate a DataAPIClient object.
The DataAPIClient object serves as the entry point to the client hierarchy. It includes the following concepts:
Adjacent to these concepts are the administration classes for database administration. The specific administration classes you use, and how you instantiate them, depend on your client language and database type (Astra DB, HCD, or DSE).
You directly instantiate the DataAPIClient object only.
Then, through the DataAPIClient object, you can instantiate and access other classes and concepts.
Where necessary, instructions for instantiating other classes are provided in the command reference relevant to each class.
For more information about each Data API client’s hierarchy, see Python client usage, TypeScript client usage, and Java client usage.
-
Python
-
TypeScript
-
Java
-
curl
client = DataAPIClient("TOKEN")
Parameters:
Name | Type | Summary |
---|---|---|
|
|
An application token for a database, in the form of
|
Returns:
DataAPIClient
: An instance of the client class.
Example response
DataAPIClient("AstraCS:aAbB...")
Script example:
from astrapy import DataAPIClient
# Create the client with an application token
client = DataAPIClient("TOKEN")
# Get a database reference and create a 5-dimensional vector collection
database0 = client.get_database("DB_API_ENDPOINT")
collection0 = database0.create_collection("movies", dimension=5)
collection0.insert_one({
    "title": "The Title",
    "$vector": [0.1, 0.3, -0.7, 0.9, -0.1],
})
# Get admin objects, optionally overriding the client token
admin = client.get_admin()
admin1 = client.get_admin(token="MORE_POWERFUL_TOKEN")
database_iterator = admin.list_databases()
For more information, see the Client reference.
const client = new DataAPIClient('TOKEN');
Parameters:
Name | Type | Summary |
---|---|---|
|
|
An application token for a database.
Tokens are prefixed with You can omit this and, instead, pass it with
|
|
The options to use for the client, including defaults. |
Options (DataAPIClientOptions
):
Name | Type | Summary |
---|---|---|
Sets the Data API backend to use (for example, Most operations are the same between backends. Authentication and available administration operations can differ. For more information, see the astra-db-ts README. |
||
Options related to the API requests the client makes. The
For more information on http clients, see the astra-db-ts README and the Client reference. |
||
Allows default options for when spawning a Db instance. |
||
Allows default options for when spawning some Admin instance. |
Returns:
DataAPIClient
- An instance of the client class.
Script example:
import { DataAPIClient } from '@datastax/astra-db-ts';
const client = new DataAPIClient('TOKEN');
const db1 = client.db('DB_API_ENDPOINT');
(async function () {
const coll = await db1.createCollection('movies');
const admin1 = client.admin();
const admin2 = client.admin({ adminToken: 'STRONGER_TOKEN' });
console.log(await coll.insertOne({ name: 'Airplane!' }));
console.log(await admin1.listDatabases());
})();
For information on setting up commands monitoring, see the astra-db-ts README.
For general information, see the Client reference.
// Default Initialization
DataAPIClient client = new DataAPIClient("TOKEN");
// Overriding default settings
DataAPIClient client = new DataAPIClient("TOKEN", DataAPIOptions.builder()
.withHttpConnectTimeout(20)
.withHttpRequestTimeout(20)
.build());
Parameters:
Name | Type | Summary |
---|---|---|
|
|
An application token for a database, in the form of
|
|
A class wrapping the advanced configuration of the client, such as |
Returns:
DataAPIClient
- An instance of the client class.
Script example:
package com.datastax.astra.client;
import java.util.UUID;
public class Connecting {
public static void main(String[] args) {
// Preferred Access with DataAPIClient (default options)
DataAPIClient client = new DataAPIClient("TOKEN");
// Overriding the default options
DataAPIClient client1 = new DataAPIClient("TOKEN", DataAPIOptions
.builder()
.withMaxTimeMS(10)
.withHttpConnectTimeout(10)
.build());
// Access the Database from its endpoint
Database db1 = client1.getDatabase("API_ENDPOINT");
Database db2 = client1.getDatabase("API_ENDPOINT", "KEYSPACE");
// Access the Database from its ID
UUID databaseId = UUID.fromString("f5abf92f-ff66-48a0-bbc2-d240bc25dc1f");
Database db3 = client.getDatabase(databaseId);
Database db4 = client.getDatabase(databaseId, "KEYSPACE");
Database db5 = client.getDatabase(databaseId, "KEYSPACE", "us-east-2");
db5.useNamespace("yet_another");
}
}
When interacting with the Data API directly through HTTP, you don’t create hierarchy objects.
For all Data API requests, you provide a database API endpoint URL that specifies the database that you want to interact with, and you provide an application token with sufficient permission to perform the desired operation on that specific database.
Depending on the command that you want to run, you specify target keyspaces, collections, tables, documents, or rows in the request path or command body.
For more information, see Connect to a database.
Connect to a database
You must connect to a database before you can work with the collections and documents in it. This operation is the basis of Data API calls. It is one of the first operations you need in any client script, and you’ll encounter these commands at the beginning of many code examples throughout the Astra DB Serverless documentation.
Astra DB Serverless databases organize data into collections within keyspaces.
When you connect to a database, you can specify a target keyspace, also referred to as the working keyspace.
If not provided, the default is default_keyspace.
For information about using the Data API clients and Astra CLI for database administration, such as creating databases and keyspaces, see the Databases reference.
-
Python
-
TypeScript
-
Java
-
curl
For more information, see the Client reference.
Get a reference to an existing database with the working keyspace set to the default keyspace or a specific keyspace:
# Connect to a database by database endpoint
# Default keyspace, long form
database = client.get_database("API_ENDPOINT")
# Default keyspace, short form
database = client["API_ENDPOINT"]
# Explicit keyspace
database = client.get_database("API_ENDPOINT", keyspace="KEYSPACE_NAME")
Parameters:
Name | Type | Summary |
---|---|---|
|
|
A database API endpoint URL, such as |
|
|
If supplied, is passed to the Database instead of the client token. |
|
|
By default, the working keyspace is |
Returns:
Database
- An instance of the Database class.
Example response (shortened for clarity)
Database(api_endpoint="https://012...datastax.com", token="AstraCS:aAbB...", keyspace="default_keyspace")
|
Script example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
# Connect to database
database = client.get_database("API_ENDPOINT")
# Run an operation on the database
collection = database.create_collection("movies", dimension=5)
collection.insert_one({
"title": "The Title",
"$vector": [0.1, 0.3, -0.7, 0.9, -0.1],
})
For more information, see the Client reference.
Get a reference to an existing database with the working keyspace set to the default keyspace or a specific keyspace:
// Connect to a database by database endpoint
// Default keyspace
const db = client.db('API_ENDPOINT');
// Explicit keyspace
const db = client.db('API_ENDPOINT', { keyspace: 'KEYSPACE_NAME' });
Parameters:
Name | Type | Summary |
---|---|---|
|
|
A database API endpoint URL, such as |
|
The options to use for the database.
You can override |
Options (DbSpawnOptions
):
If you set any of these options through the client, then those values act as the actual defaults for these options.
For example, if you set keyspace in the client, that value is automatically used as the default keyspace instead of 'default_keyspace'.
Name | Type | Summary |
---|---|---|
|
The keyspace to use for the database.
The default is |
|
|
Whether to monitor commands through |
|
|
Access token to use for the database.
The default is the client’s token.
Typically starts with |
|
|
Path to the Data API.
The default is |
Returns:
Db
- An unverified reference to the database.
Example response
Db { keyspace: 'default_keyspace' }
An instance of the Some subsequent operations with the database act on the working keyspace,
unless you pass a different keyspace in the method invocation.
Such operations include
You can use the |
Script example:
import { DataAPIClient } from '@datastax/astra-db-ts';
const client = new DataAPIClient('TOKEN');
// Connect to database by database endpoint and
// override default options for keyspace and token.
const db2 = client.db('API_ENDPOINT', {
  keyspace: 'KEYSPACE_NAME',
  token: 'WEAKER_TOKEN',
});
// Run an operation on the database
(async function () {
  const collection = db2.collection('movies');
  await collection.insertOne({ title: 'The Italian Job', $vector: [...] });
})();
For more information, see the Client reference.
Get a reference to an existing database with the working keyspace set to the default keyspace or a specific keyspace:
// Connect to a database by database endpoint
// Default keyspace
Database db = client.getDatabase(String apiEndpoint);
// Explicit keyspace
Database db = client.getDatabase(String apiEndpoint, String keyspace);
// Connect to a database by database ID and optional region
// Default keyspace
Database db = client.getDatabase(UUID databaseId, String region);
// Explicit keyspace
Database db = client.getDatabase(UUID databaseId, String keyspace, String region);
Parameters:
Name | Type | Summary |
---|---|---|
|
|
Either a database API endpoint URL, such as If you use the database ID, you can optionally specify a region for multi-region databases. The endpoint URL is constructed from the database ID and the default or specified region. |
|
|
The working keyspace to use.
If not provided, the default is |
|
|
The region to use for connecting to the database. The database must be deployed in that region. You can’t set the region parameter when you use an API endpoint instead of an ID. If you don’t set this parameter and the region can’t be inferred from an API endpoint, an additional DevOps API request is made to determine the default region and use it in subsequent operations. |
Returns:
Database
- An instance of the Database class.
Example response
com.datastax.astra.client.Database@378bf509
An instance of the Some subsequent operations with the database act on the working keyspace,
unless you pass a different keyspace in the method invocation.
Such operations include You can use the |
Script example:
package com.datastax.astra.client;
import java.util.UUID;
public class Connecting {
public static void main(String[] args) {
// Preferred Access with DataAPIClient (default options)
DataAPIClient client = new DataAPIClient("TOKEN");
// Overriding the default options
DataAPIClient client1 = new DataAPIClient("TOKEN", DataAPIOptions
.builder()
.withMaxTimeMS(10)
.withHttpConnectTimeout(10)
.build());
// Access the Database from its endpoint
Database db1 = client1.getDatabase("API_ENDPOINT");
Database db2 = client1.getDatabase("API_ENDPOINT", "KEYSPACE");
// Access the Database from its ID
UUID databaseId = UUID.fromString("f5abf92f-ff66-48a0-bbc2-d240bc25dc1f");
Database db3 = client.getDatabase(databaseId);
Database db4 = client.getDatabase(databaseId, "KEYSPACE");
Database db5 = client.getDatabase(databaseId, "KEYSPACE", "us-east-2");
db5.useNamespace("yet_another");
}
}
You inherently connect to a database when you send requests to the Data API.
A Data API curl request has the following structure:
curl -sS -L -X POST "ASTRA_DB_ENDPOINT/api/json/v1/KEYSPACE_NAME/COLLECTION_OR_TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{COMMAND_BODY}'
|
Parameters:
Name | Type | Summary |
---|---|---|
|
|
The target database’s API endpoint URL. |
|
|
The target keyspace where you want to run the command.
The target keyspace is also known as the working keyspace.
All Serverless (Vector) databases have an initial default keyspace called |
|
|
The name of the collection or table where you want to run the command, if required for the requested command. |
|
|
An application token with sufficient permissions to perform the requested command on the specified database. For more information, see Generate application tokens and set environment variables. |
|
|
A JSON object that describes the command to run and all parameters related to that command. The JSON schema often includes consistent clauses, such as If the body is empty, the Data API returns an error because there was no command to run:
|
After you connect to a database, you can work with the collections, tables, documents, and rows in it. You can also programmatically manage databases and keyspaces.
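The HTTP request structure described above for curl can also be sketched in Python with the standard library. This is an illustrative sketch, not part of any Data API client: the function name `build_data_api_request` is hypothetical, and the example command body is a placeholder. The function only builds the request so that you can inspect it before sending.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_data_api_request(
    api_endpoint: str,
    token: str,
    command: Dict[str, Any],
    keyspace: str = "default_keyspace",
    collection_or_table: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that mirrors the curl structure."""
    # Path layout from the curl example: /api/json/v1/KEYSPACE_NAME/COLLECTION_OR_TABLE_NAME
    path = f"/api/json/v1/{keyspace}"
    if collection_or_table:
        path += f"/{collection_or_table}"

    return urllib.request.Request(
        url=api_endpoint.rstrip("/") + path,
        data=json.dumps(command).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint, token, and command body for illustration only.
    request = build_data_api_request(
        "API_ENDPOINT",
        "APPLICATION_TOKEN",
        {"insertMany": {"documents": [{"title": "The Title"}]}},
        collection_or_table="movies",
    )
    print(request.get_method(), request.full_url)
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the JSON response body. Keeping construction separate from sending makes the URL and headers easy to verify.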
Database terminology
Depending on your background and experience, you might be familiar with various terminology for database components. For example, structured databases use terms like tables, rows, and columns, whereas semi-structured databases use collections, documents, and fields to refer to functionally similar or identical components.
In Astra DB, the terminology you encounter depends on the database types and features that you use. The following table explains some of these terms. Each set of terms describes similar components, but these components are not necessarily functional equivalents. For example, a single field of vector data doesn’t necessarily translate directly to a single column of structured, non-vector data.
Serverless (Vector) databases | Serverless (Non-Vector) databases and CQL | Description |
---|---|---|
Keyspace |
Keyspace |
A container for one or more collections or tables within a database. Namespace is a deprecated term for a keyspace in a Serverless (Vector) database. |
Collection |
Table |
A container for data within a database. The difference depends on the schema type:
|
Document |
Row |
A piece of data, having one or more properties, that is stored in a collection or table in a database. Document properties are stored in fields, and row properties are stored in columns. |
Field |
Column |
Any properties or attributes of data, such as vectors, identifiers, customer contact information, purchase history, account statuses, metadata, and so on. Properties can be stored as various data types including text, numbers, arrays, booleans, and so on. |
Naming conventions
Astra DB has the following naming conventions for databases, keyspaces, collections, tables, and vectorize API keys:
-
Must start and end with a letter or number.
-
Can contain uppercase letters A-Z, lowercase letters a-z, numbers 0-9, and underscores (_). Some components allow additional special characters.
-
Must contain at least two characters.
-
Can’t exceed the maximum character limit for the entity type:
-
Keyspaces: 48 characters.
-
Databases, collections, tables, and vectorize API keys: 50 characters.
-
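The naming rules above can be checked on the client side before sending a request. The following is a minimal sketch that covers only the baseline character set. It does not model the additional special characters that some components allow. The function name `is_valid_entity_name` and the entity type keys are hypothetical.

```python
import re

# Baseline rule set only: starts and ends with a letter or number,
# contains letters, numbers, and underscores, and has at least two characters.
# Assumption: additional special characters allowed by some components are not modeled.
_ENTITY_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]*[A-Za-z0-9]")

# Maximum lengths taken from the list above; the dictionary keys are hypothetical labels.
_MAX_LENGTH = {
    "keyspace": 48,
    "database": 50,
    "collection": 50,
    "table": 50,
    "vectorize_api_key": 50,
}


def is_valid_entity_name(name: str, entity_type: str) -> bool:
    """Return True if the name satisfies the baseline naming rules for the entity type."""
    limit = _MAX_LENGTH[entity_type]
    return len(name) <= limit and _ENTITY_NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_entity_name("default_keyspace", "keyspace")` passes, while a name that starts with an underscore or has only one character does not.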
The Data API has the following naming conventions for document properties in collections:
-
Must start and end with a letter or an underscore (_).
-
Can contain uppercase letters A-Z, lowercase letters a-z, numbers 0-9, and underscores (_).
-
Must contain at least one character.
-
Can’t exceed 48 characters.
-
Can’t be exactly _id, which is reserved and interpreted as a document’s identity property.
The dollar sign ($) is reserved for system-defined operator and property names, such as $exists, $and, $or, and $vector.
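The property naming rules above can be expressed as a small validator for user-defined property names. This is a sketch that follows the listed rules literally. The function name `is_valid_property_name` is hypothetical, and system-defined names that start with a dollar sign are deliberately rejected because they are reserved.

```python
import re

# Starts and ends with a letter or underscore; may contain letters, numbers,
# and underscores in between; at least one character long.
_PROPERTY_NAME_RE = re.compile(r"[A-Za-z_](?:[A-Za-z0-9_]*[A-Za-z_])?")

MAX_PROPERTY_NAME_LENGTH = 48  # limit stated in the list above


def is_valid_property_name(name: str) -> bool:
    """Return True if a user-defined document property name follows the listed rules."""
    if name == "_id":
        # Reserved: interpreted as the document's identity property.
        return False
    if len(name) > MAX_PROPERTY_NAME_LENGTH:
        return False
    return _PROPERTY_NAME_RE.fullmatch(name) is not None
```

A check like this is most useful when property names come from user input or from another system's schema, where reserved or malformed names can slip in unnoticed.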
Data API limits
The Data API includes guardrails to ensure best practices, foster availability, and promote optimal configurations for your Astra DB Serverless databases.
In the following table, the term property refers to either a field of a document in a collection or a column in a table. For more information, see Database terminology. |
Entity | Limit | Notes |
---|---|---|
Number of collections per database |
5 or 10 (approx.) |
Indexes determine the number of collections you can have in a database. Serverless (Vector) databases created after June 24, 2024 can have approximately 10 collections. Databases created before this date can have approximately 5 collections. The collection limit is based on the number of indexes. For more information, see The indexing option. |
Number of tables per database |
50 |
A Serverless (Vector) database can have up to 50 tables. |
Page size |
20 |
For certain operations, a page can contain up to 20 documents or rows. After reaching the page limit, you can load any additional responses on the next page:
Some operations, such as
|
In-memory sort limit |
10,000 |
Operations that require fetching and sorting chunks of data can support no more than 10,000 rows in-memory. In this case, rows refers to either documents in collections or rows in tables. If your queries hit this limit, try restructuring your application to avoid running queries on excessively large chunks of data.
For example, in tables, you can adjust your table’s If you frequently hit this limit in collections, consider whether your data needs to be stored in tables, which can be more performant for large datasets. |
Maximum property name length |
100 |
Maximum of 100 characters in a property name. |
Maximum path length |
1,000 |
Maximum of 1,000 characters in a path name.
This is calculated as the total for all segments, including any dots ( |
Maximum indexed string size in bytes |
8,000 |
Maximum of 8,000 bytes (UTF-8 encoded) for |
Maximum number property length |
100 |
Maximum of 100 characters for the length of a |
Maximum elements per array |
1,000 |
Maximum number of elements in an array. This limit applies to indexed fields in collections only. |
Maximum vector dimensions |
4,096 |
Maximum size of dimensions you can define for a vector-enabled collection. |
Maximum properties per JSON object |
1,000 |
Maximum number of properties for a JSON object, including top-level properties and nested properties. This limit applies to JSON objects stored in indexed fields in collections only. A JSON object can have nested objects, also known as sub-documents. The maximum of 1,000 includes all indexed properties in the main document and those in each sub-document, if any. For more information, see The indexing option. |
Maximum properties per JSON document |
2,000 |
The maximum number of properties allowed in a single JSON document is 2,000, including intermediate properties and leaf properties. For example, the following document has three properties that apply to this limit:
|
Maximum document or row size in characters |
4 million |
A single document in a collection can have a maximum of 4 million characters. |
Maximum inserted batch size in characters |
20 million |
An entire batch of documents submitted through an |
Maximum number of deletions per transaction |
20 |
Maximum number of documents that can be deleted in each |
Maximum number of updates per transaction |
20 |
Maximum number of documents that can be updated in each |
Maximum number of insertions per transaction |
100 |
Maximum number of documents or rows that can be inserted in each |
Maximum size of |
100 |
When using the |
Maximum number of vector search results |
1,000 |
For vector search, the response is a single page of up to 1,000 documents or rows, unless you set a lower |
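The insertion limits in the table above suggest splitting large loads into batches on the client side. The following is a minimal sketch, assuming that the serialized JSON length of each document is a reasonable proxy for the character limit. The function name `batch_documents` is hypothetical, and the exact way the Data API counts characters might differ.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

MAX_DOCS_PER_INSERT = 100      # insertions per request, from the table above
MAX_BATCH_CHARS = 20_000_000   # inserted batch size in characters, from the table above


def batch_documents(
    documents: Iterable[Dict[str, Any]],
    max_docs: int = MAX_DOCS_PER_INSERT,
    max_chars: int = MAX_BATCH_CHARS,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of documents that stay within the count and character budgets.

    Assumption: len(json.dumps(doc)) approximates how the size limit is measured.
    """
    batch: List[Dict[str, Any]] = []
    batch_chars = 0
    for doc in documents:
        doc_chars = len(json.dumps(doc))
        # Start a new batch if adding this document would exceed either budget.
        if batch and (len(batch) >= max_docs or batch_chars + doc_chars > max_chars):
            yield batch
            batch, batch_chars = [], 0
        batch.append(doc)
        batch_chars += doc_chars
    if batch:
        yield batch
```

Each yielded batch can then be sent as one insert request. A single document that exceeds the character budget on its own is still yielded alone, so the per-document size limit needs a separate check.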
Exceeded limit returns 200 OK with error
If your request is valid but the command exceeds a limit, the Data API responds with HTTP 200 OK and an error message.
It is also possible to receive a response containing both data and errors. Always inspect the response for error messages.
For example, if you exceed the per-transaction limit of 100 documents in an insertMany command, the Data API response contains the following message:
{
"errors": [
{
"message": "Request invalid: field 'command.documents' value \"[...]\" not valid. Problem: amount of documents to insert is over the max limit (101 vs 100).",
"errorCode": "COMMAND_FIELD_INVALID"
}
]
}
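Because an HTTP 200 OK status doesn't guarantee success, code that calls the Data API over HTTP should check the parsed response body for an errors array. The following is a minimal sketch that separates errors from the rest of the response. The function name `split_errors` is hypothetical, and the only response key it relies on is the `errors` key shown in the example above.

```python
from typing import Any, Dict, List, Tuple


def split_errors(response: Dict[str, Any]) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Split a parsed Data API response into (everything except errors, errors)."""
    errors = list(response.get("errors") or [])
    payload = {key: value for key, value in response.items() if key != "errors"}
    return payload, errors


if __name__ == "__main__":
    # Response body copied from the example above, with the message shortened.
    response = {
        "errors": [
            {
                "message": "Request invalid: amount of documents to insert is over the max limit (101 vs 100).",
                "errorCode": "COMMAND_FIELD_INVALID",
            }
        ]
    }
    payload, errors = split_errors(response)
    for error in errors:
        print(error["errorCode"], "-", error["message"])
```

Returning both parts, rather than raising on the first error, reflects the case where a response contains data and errors together. The caller decides whether partial results are acceptable.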