Get started with the Data API

The Data API is a schema-less, document-based, modern API that provides programmatic access to data in your Serverless (Vector) databases.

You can use the Data API and its clients to create databases, keyspaces, collections, and tables, and then perform CRUD operations on data in your collections and tables, including complex filters and vector searches that return similarity scores.

The Data API provides an entry point for application development with Astra DB Serverless databases, including a variety of GenAI ecosystem integrations like LangChain, LlamaIndex, and embedding providers. It leverages the scalability, performance, and real-time indexing capabilities of Apache Cassandra® to support GenAI application development.

Prerequisites

  • An active Astra account.

  • An active Serverless (Vector) database.

    The Data API supports collections and tables in Serverless (Vector) databases. This includes semi-structured collections and structured table data that you would otherwise interact with through the CQL shell or a driver.

    The Data API doesn’t support Serverless (Non-Vector) databases.

Install a client or curl

You can interact with the Data API directly through HTTP or use one of the Astra DB Data API clients for Python, TypeScript, or Java:

Language     Client          Requires                            Latest release
Python       astrapy         Python 3.9 or later                 astrapy releases on GitHub
TypeScript   astra-db-ts     Node.js 18 or later                 astra-db-ts releases on GitHub
Java         astra-db-java   Java 17 or later (21 recommended)   astra-db-java on Maven Central

Use a package manager to install a client library or a tool to handle HTTP API calls.


Install the Python client with pip:

  1. Verify that pip is version 23.0 or later:

    pip --version
  2. If needed, upgrade pip:

    python -m pip install --upgrade pip
  3. Install the astrapy package. You must have Python 3.9 or later.

    pip install astrapy

Install the TypeScript client:

  1. Verify that Node is version 18 or later:

    node --version
  2. Install astra-db-ts with your preferred package manager:

    # npm
    npm install @datastax/astra-db-ts

    # Yarn
    yarn add @datastax/astra-db-ts

    # pnpm
    pnpm add @datastax/astra-db-ts

Install the Java client with Maven or Gradle.

Maven:

  1. Install Java 17 or later (21 recommended) and Maven 3.9 or later.

  2. Create a pom.xml file in the root of your project, and then replace VERSION with the latest version of astra-db-java from Maven Central.

    pom.xml
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                                 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
    
      <groupId>com.example</groupId>
      <artifactId>test-java-client</artifactId>
      <version>1.0-SNAPSHOT</version>
    
      <!-- The Java client -->
      <dependencies>
        <dependency>
          <groupId>com.datastax.astra</groupId>
          <artifactId>astra-db-java</artifactId>
          <version>VERSION</version>
        </dependency>
      </dependencies>
    
      <build>
        <plugins>
          <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>3.0.0</version>
            <configuration>
              <executable>java</executable>
              <mainClass>com.example.Quickstart</mainClass>
            </configuration>
            <executions>
              <execution>
                <goals>
                  <goal>java</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
              <source>17</source>
              <target>17</target>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </project>

Gradle:

  1. Install Gradle and Java 17 or later (21 recommended).

  2. Create a build.gradle file in the root of your project:

    build.gradle
    plugins {
        id 'java'
        id 'application'
    }
    
    repositories {
        mavenCentral()
    }
    
    dependencies {
        implementation 'com.datastax.astra:astra-db-java:1.+'
    }
    
    application {
        mainClass = 'com.example.Quickstart'
    }

To interact with the Data API directly through HTTP, you need to install a tool to handle API calls, such as curl.

Upgrade a client

When a new client version is released, upgrade your client to get the latest features, improvements, and bug fixes. For information about major changes in specific client versions, see Data API client upgrade guide.

Upgrade the Python client with pip:

# verbose
pip install astrapy --upgrade

# shorthand
pip install -U astrapy

To upgrade to a preview version, you must pass the --pre flag to the package manager:

pip install --upgrade --pre astrapy

Reinstall the TypeScript client at the latest version.

For the TypeScript client, DataStax recommends reinstallation instead of upgrade and update commands, such as npm update @datastax/astra-db-ts, that install the latest version allowed by your package.json. If your package.json pins a version that is earlier than the actual latest version, then upgrade and update won’t install the actual latest version.

# npm
npm install @datastax/astra-db-ts@latest

# Yarn
yarn add @datastax/astra-db-ts@latest

# pnpm
pnpm add @datastax/astra-db-ts@latest

To upgrade to a preview version, you must install the next tag by appending @next to the package name:

# npm
npm install @datastax/astra-db-ts@next

# Yarn
yarn add @datastax/astra-db-ts@next

# pnpm
pnpm add @datastax/astra-db-ts@next

To upgrade your Java client, modify the astra-db-java version in your project’s build.gradle or pom.xml.

Because Astra DB is a cloud service, the Data API itself always runs the latest production version.

For information about upgrading a utility like curl, see the documentation for that tool.

Generate application tokens and set environment variables

The Data API requires your database’s API endpoint and application tokens with sufficient permissions to perform the requested operations.

Store application tokens and API endpoints in environment variables to simplify reuse in your scripts.

  1. In the Astra Portal navigation menu, select your Serverless (Vector) database.

  2. On the Overview tab, in the Database Details section, click Generate Token.

    The generated token has a custom Database Administrator role that is scoped to this database only. For more information, see Generate an application token for a database and Custom roles.

  3. Copy the token and store it securely. The Astra Portal shows the token only once.

  4. In the Database Details section, copy your database’s Data API endpoint.

    The Data API endpoint format is https://DATABASE_ID-REGION.apps.astra.datastax.com. If you aren’t using a client, be aware that DevOps API calls use a different endpoint format.

  5. (Optional) Create application tokens with other roles.

    The Database Administrator role authorizes operations within a database, but you might need a broader or narrower role for other operations, for example:

    • For operations above the database level, such as creating databases or keyspaces, you need an application token with the Organization Administrator role.

    • For production applications that only read data from a database, consider using an application token with a narrower scope, such as the Read Only User role.

  6. Set environment variables for your application tokens and Data API endpoint.

    Linux or macOS:

    export ASTRA_DB_API_ENDPOINT=API_ENDPOINT
    export ASTRA_DB_APPLICATION_TOKEN=TOKEN

    Windows:

    set ASTRA_DB_API_ENDPOINT=API_ENDPOINT
    set ASTRA_DB_APPLICATION_TOKEN=TOKEN
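If you use the Python client, a script can read these variables at startup and fail early when one is missing or malformed. The following is a minimal sketch, not part of astrapy: the function name load_astra_settings is hypothetical, and the format checks are assumptions drawn from the token prefix and endpoint format described above (for example, that DATABASE_ID is a lowercase UUID). The Data API still performs its own validation.

```python
import os
import re

# Assumed endpoint shape, based on the format above:
# https://DATABASE_ID-REGION.apps.astra.datastax.com
# (assumes DATABASE_ID is a lowercase UUID and REGION a name like us-east-2)
ENDPOINT_PATTERN = re.compile(
    r"https://[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    r"-[a-z0-9-]+\.apps\.astra\.datastax\.com/?"
)


def load_astra_settings(environ=os.environ):
    """Return (endpoint, token) from the environment or raise ValueError."""
    endpoint = environ.get("ASTRA_DB_API_ENDPOINT")
    token = environ.get("ASTRA_DB_APPLICATION_TOKEN")
    if not endpoint or not token:
        raise ValueError(
            "Set ASTRA_DB_API_ENDPOINT and ASTRA_DB_APPLICATION_TOKEN first"
        )
    if not ENDPOINT_PATTERN.fullmatch(endpoint):
        raise ValueError(f"Unexpected Data API endpoint format: {endpoint}")
    if not token.startswith("AstraCS:"):
        raise ValueError("Expected an application token starting with AstraCS:")
    return endpoint, token
```

With real values in the environment, endpoint, token = load_astra_settings() gives you the values to pass to DataAPIClient(token) and client.get_database(endpoint) in place of the TOKEN and API_ENDPOINT placeholders used in the client examples on this page.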

Instantiate a client object

When you create apps using the Data API clients, you must instantiate a DataAPIClient object.

The DataAPIClient object serves as the entry point to the client hierarchy, through which you access databases and, within them, collections and tables.

Adjacent to these concepts are the administration classes for database administration. The specific administration classes you use, and how you instantiate them, depend on your client language and database type (Astra DB, HCD, or DSE).

You instantiate only the DataAPIClient object directly. Through the DataAPIClient object, you can then instantiate and access other classes and concepts. Where necessary, instructions for instantiating other classes are provided in the command reference relevant to each class.

For more information about each Data API client’s hierarchy, see Python client usage, TypeScript client usage, and Java client usage.

Python:

client = DataAPIClient("TOKEN")

Parameters:

Name Type Summary

token

str

An application token for a database, in the form of AstraCS:….

DataAPIClient objects aren’t specific to a database. Therefore, when you instantiate a client, you must provide an application token that has adequate permissions to perform the desired operations on the target databases.

Returns:

DataAPIClient: An instance of the client class.

Example response
DataAPIClient("AstraCS:aAbB...")

Script example:

from astrapy import DataAPIClient

# Create a client with a token that has access to the target database
client = DataAPIClient("TOKEN")

# Get a Database and create a vector-enabled collection in it
database0 = client.get_database("DB_API_ENDPOINT")
collection0 = database0.create_collection("movies", dimension=5)
collection0.insert_one({
    "title": "The Title",
    "$vector": [0.1, 0.3, -0.7, 0.9, -0.1],
})

# Get admin objects; the second one overrides the client's token
admin = client.get_admin()
admin1 = client.get_admin(token="MORE_POWERFUL_TOKEN")
database_iterator = admin.list_databases()

For more information, see the Client reference.

TypeScript:

const client = new DataAPIClient('TOKEN');

Parameters:

Name Type Summary

token?

string

An application token for a database. Tokens are prefixed with AstraCS:.

You can omit this and instead pass it to client.db() through the token option, or to client.admin() through the adminToken option.

DataAPIClient objects aren’t specific to a database. Therefore, when you instantiate a client, you must provide an application token that has adequate permissions to perform the desired operations on the target databases.

options?

DataAPIClientOptions

The options to use for the client, including defaults.

Name Type Summary

environment?

DataAPIEnvironment

Sets the Data API backend to use (for example, dse, hcd, astra). The default is astra.

Most operations are the same between backends. Authentication and available administration operations can differ. For more information, see the astra-db-ts README.

httpOptions?

DataAPIHttpOptions

Options related to the API requests the client makes.

The DataAPIHttpOptions type is a discriminated union on the client field. There are four available behaviors for the client field:

  • httpOptions not set: Use fetch-h2 if available or fall back to fetch.

  • client: 'default' or unset: Use fetch-h2 if available or throw an error.

  • client: 'fetch': Only use the native fetch API.

  • client: 'custom': Pass a custom Fetcher implementation to the client.

fetch-h2 is typically available by default on Node.js runtimes only. On other runtimes, you might need to use the native fetch API or, if your code is minified, pass in the fetch-h2 module manually.

For more information on HTTP clients, see the astra-db-ts README and the Client reference.

dbOptions?

DbSpawnOptions

Default options to use when spawning a Db instance.

adminOptions?

AdminSpawnOptions

Default options to use when spawning an Admin instance.

Returns:

DataAPIClient - An instance of the client class.

Script example:

import { DataAPIClient } from '@datastax/astra-db-ts';

const client = new DataAPIClient('TOKEN');

const db1 = client.db('DB_API_ENDPOINT');

(async function () {
  const coll = await db1.createCollection('movies');

  const admin1 = client.admin();
  const admin2 = client.admin({ adminToken: 'STRONGER_TOKEN' });

  console.log(await coll.insertOne({ name: 'Airplane!' }));
  console.log(await admin1.listDatabases());
})();

For information on setting up command monitoring, see the astra-db-ts README.

For general information, see the Client reference.

Java:

// Default initialization
DataAPIClient client = new DataAPIClient("TOKEN");

// Overriding default settings
DataAPIClient customClient = new DataAPIClient("TOKEN", DataAPIOptions.builder()
  .withHttpConnectTimeout(20)
  .withHttpRequestTimeout(20)
  .build());

Parameters:

Name Type Summary

token

String

An application token for a database, in the form of AstraCS:….

DataAPIClient objects aren’t specific to a database. Therefore, when you instantiate a client, you must provide an application token that has adequate permissions to perform the desired operations on the target databases.

options

DataAPIOptions

A class wrapping the advanced configuration of the client, such as HttpClient settings like timeouts.

Returns:

DataAPIClient - An instance of the client class.

Script example:

package com.datastax.astra.client;

import java.util.UUID;

public class Connecting {
    public static void main(String[] args) {
        // Preferred Access with DataAPIClient (default options)
        DataAPIClient client = new DataAPIClient("TOKEN");

        // Overriding the default options
        DataAPIClient client1 = new DataAPIClient("TOKEN", DataAPIOptions
                .builder()
                .withMaxTimeMS(10)
                .withHttpConnectTimeout(10)
                .build());

        // Access the Database from its endpoint
        Database db1 = client1.getDatabase("API_ENDPOINT");
        Database db2 = client1.getDatabase("API_ENDPOINT", "KEYSPACE");

        // Access the Database from its ID
        UUID databaseId = UUID.fromString("f5abf92f-ff66-48a0-bbc2-d240bc25dc1f");
        Database db3 = client.getDatabase(databaseId);
        Database db4 = client.getDatabase(databaseId, "KEYSPACE");
        Database db5 = client.getDatabase(databaseId, "KEYSPACE", "us-east-2");

        // Change the working keyspace of an existing Database instance
        db5.useKeyspace("yet_another");

    }
}

curl:

When interacting with the Data API directly through HTTP, you don’t create hierarchy objects.

For all Data API requests, you provide a database API endpoint URL that specifies the database that you want to interact with, and you provide an application token with sufficient permission to perform the desired operation on that specific database.

Depending on the command that you want to run, you specify target keyspaces, collections, tables, documents, or rows in the request path or command body.

For more information, see Connect to a database.

Connect to a database

You must connect to a database before you can work with the collections and documents in it. This operation is the basis of Data API calls. It is one of the first operations you need in any client script, and you’ll encounter these commands at the beginning of many code examples throughout the Astra DB Serverless documentation.

Astra DB Serverless databases organize data into collections within keyspaces. When you connect to a database, you can specify a target keyspace, also referred to as the working keyspace. If not provided, the default is default_keyspace.

For information about using the Data API clients and Astra CLI for database administration, such as creating databases and keyspaces, see the Databases reference.

Python:

For more information, see the Client reference.

Get a reference to an existing database with the working keyspace set to the default keyspace or a specific keyspace:

# Connect to a database by database endpoint
# Default keyspace, long form
database = client.get_database("API_ENDPOINT")
# Default keyspace, short form
database = client["API_ENDPOINT"]
# Explicit keyspace
database = client.get_database("API_ENDPOINT", keyspace="KEYSPACE_NAME")

Parameters:

Name Type Summary

api_endpoint

str

A database API endpoint URL, such as https://DATABASE_ID-REGION.apps.astra.datastax.com.

token

Optional[str]

If supplied, this token is passed to the Database instead of the client's token.

keyspace

Optional[str]

By default, the working keyspace is default_keyspace. Include keyspace to specify a different working keyspace.

Returns:

Database - An instance of the Database class.

Example response (shortened for clarity)
Database(api_endpoint="https://012...datastax.com", token="AstraCS:aAbB...", keyspace="default_keyspace")
  • An instance of the Database class always has a notion of working keyspace. You can pass a keyspace explicitly or use the system default of default_keyspace.

    Some subsequent operations with the database act on the working keyspace, unless you pass a different keyspace in the method invocation. Such operations include get_collection, create_collection, list_collection_names, list_collections, and command.

    You can use the use_keyspace method to change the Database instance’s working keyspace, for example: my_database.use_keyspace("my_other_keyspace"). This doesn’t change the parent keyspace for any collections previously created from that Database instance.

  • Most astrapy objects have an asynchronous counterpart that you can use within the asyncio framework. To get an AsyncDatabase, call the client's get_async_database method, or call to_async on an existing synchronous Database.

Script example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")

# Connect to database
database = client.get_database("API_ENDPOINT")

# Run an operation on the database
collection = database.create_collection("movies", dimension=5)
collection.insert_one({
    "title": "The Title",
    "$vector": [0.1, 0.3, -0.7, 0.9, -0.1],
})

TypeScript:

For more information, see the Client reference.

Get a reference to an existing database with the working keyspace set to the default keyspace or a specific keyspace:

// Connect to a database by database endpoint
// Default keyspace
const db = client.db('API_ENDPOINT');
// Explicit keyspace
const dbInKeyspace = client.db('API_ENDPOINT', { keyspace: 'KEYSPACE_NAME' });

Parameters:

Name Type Summary

endpoint

string

A database API endpoint URL, such as https://DATABASE_ID-REGION.apps.astra.datastax.com.

options?

DbSpawnOptions

The options to use for the database. You can override keyspace in method calls.

Options (DbSpawnOptions):

If you set any of these options through the client, then those values act as the actual defaults for these options. For example, if you set keyspace in the client, that value is automatically used as the default keyspace instead of 'default_keyspace'.

Name Type Summary

keyspace?

string

The keyspace to use for the database. The default is 'default_keyspace'. You can override this in method calls.

monitorCommands?

boolean

Whether to monitor commands through CommandEvents, using the client as an event emitter. Defaults to false.

token?

string

Access token to use for the database. The default is the client’s token. Typically starts with AstraCS:.

dataApiPath?

string

Path to the Data API. The default is 'api/json/v1'.

Returns:

Db - An unverified reference to the database.

Example response
Db { keyspace: 'default_keyspace' }

An instance of the Db class always has a notion of working keyspace. You can pass a keyspace explicitly or use the system default of 'default_keyspace'.

Some subsequent operations with the database act on the working keyspace, unless you pass a different keyspace in the method invocation. Such operations include collection, createCollection, dropCollection, listCollections, and command.

You can use the useKeyspace method to change the Db instance’s working keyspace, for example: db.useKeyspace('my_other_keyspace'). This doesn’t change the parent keyspace for any collections previously created from that Db instance.

Script example:

import { DataAPIClient } from '@datastax/astra-db-ts';

const client = new DataAPIClient('TOKEN');

// Connect to database by database endpoint and
// override default options for keyspace and token.
const db2 = client.db('API_ENDPOINT', {
  keyspace: 'KEYSPACE_NAME',
  token: 'WEAKER_TOKEN',
});

// Run an operation on the database
(async function () {
  const collection = db2.collection('movies');
  // Replace [...] with a vector whose length matches the collection's dimension
  await collection.insertOne({ title: 'The Italian Job', $vector: [...] });
})();

Java:

For more information, see the Client reference.

Get a reference to an existing database with the working keyspace set to the default keyspace or a specific keyspace:

// Connect to a database by database endpoint
// Default keyspace
Database db = client.getDatabase(String apiEndpoint);
// Explicit keyspace
Database db = client.getDatabase(String apiEndpoint, String keyspace);

// Connect to a database by database ID and optional region
// Default keyspace
Database db = client.getDatabase(UUID databaseId, String region);
// Explicit keyspace
Database db = client.getDatabase(UUID databaseId, String keyspace, String region);

Parameters:

Name Type Summary

apiEndpoint or databaseId

String or UUID

Either a database API endpoint URL, such as https://DATABASE_ID-REGION.apps.astra.datastax.com, or a database ID. DataStax recommends using the database API endpoint.

If you use the database ID, you can optionally specify a region for multi-region databases. The endpoint URL is constructed from the database ID and the default or specified region.

keyspace

String

The working keyspace to use. If not provided, the default is default_keyspace.

region

String

The region to use for connecting to the database. The database must be deployed in that region. You can’t set the region parameter when you use an API endpoint instead of an ID. If you don’t set this parameter and the region can’t be inferred from an API endpoint, an additional DevOps API request is made to determine the default region and use it in subsequent operations.

Returns:

Database - An instance of the Database class.

Example response
com.datastax.astra.client.Database@378bf509

An instance of the Database class always has a notion of working keyspace. You can pass a keyspace explicitly or use the system default of default_keyspace.

Some subsequent operations with the database act on the working keyspace, unless you pass a different keyspace in the method invocation. Such operations include getCollection, createCollection, listCollectionNames, listCollections, and runCommand.

You can use the useKeyspace method to change the Database instance’s working keyspace, for example: db.useKeyspace("my_other_keyspace");. This doesn’t change the parent keyspace for any collections previously created from that Database instance.

Script example:

package com.datastax.astra.client;

import java.util.UUID;

public class Connecting {
    public static void main(String[] args) {
        // Preferred Access with DataAPIClient (default options)
        DataAPIClient client = new DataAPIClient("TOKEN");

        // Overriding the default options
        DataAPIClient client1 = new DataAPIClient("TOKEN", DataAPIOptions
                .builder()
                .withMaxTimeMS(10)
                .withHttpConnectTimeout(10)
                .build());

        // Access the Database from its endpoint
        Database db1 = client1.getDatabase("API_ENDPOINT");
        Database db2 = client1.getDatabase("API_ENDPOINT", "KEYSPACE");

        // Access the Database from its ID
        UUID databaseId = UUID.fromString("f5abf92f-ff66-48a0-bbc2-d240bc25dc1f");
        Database db3 = client.getDatabase(databaseId);
        Database db4 = client.getDatabase(databaseId, "KEYSPACE");
        Database db5 = client.getDatabase(databaseId, "KEYSPACE", "us-east-2");

        // Change the working keyspace of an existing Database instance
        db5.useKeyspace("yet_another");

    }
}

curl:

You inherently connect to a database when you send requests to the Data API.

A Data API curl request has the following structure:

curl -sS -L -X POST "ASTRA_DB_ENDPOINT/api/json/v1/KEYSPACE_NAME/COLLECTION_OR_TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{COMMAND_BODY}'
  • Data API HTTP requests always use the POST method, regardless of the actual CRUD operation performed by the command.

  • The curl examples in the Astra DB Serverless documentation include some optional arguments, such as -s, -sS, and | jq. You can omit or modify these arguments as needed.

Parameters:

Name Type Summary

ASTRA_DB_ENDPOINT

string

The target database’s API endpoint URL.

KEYSPACE_NAME

string

The target keyspace where you want to run the command. The target keyspace is also known as the working keyspace. All Serverless (Vector) databases have an initial default keyspace called default_keyspace.

COLLECTION_OR_TABLE_NAME

string

The name of the collection or table where you want to run the command, if required for the requested command.

APPLICATION_TOKEN

string

An application token with sufficient permissions to perform the requested command on the specified database. For more information, see Generate application tokens and set environment variables.

COMMAND_BODY

object

A JSON object that describes the command to run and all parameters related to that command.

The JSON schema often includes consistent clauses, such as command, sort, filter, and options, but the allowed parameters can vary significantly across commands.

If the body is empty, the Data API returns an error because there was no command to run:

{
  "errors": [
    {
      "message": "Request invalid: field 'command' value `null` not valid. Problem: must not be null.",
      "errorCode": "COMMAND_FIELD_INVALID"
    }
  ]
}
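To see how the pieces of a curl request fit together, the following Python sketch builds the same POST request with only the standard library, without sending it. The function name build_data_api_request is hypothetical, and the findOne command body and movies collection are illustrative placeholders; see the command reference for the exact body of each command.

```python
import json
import urllib.request


def build_data_api_request(endpoint, token, keyspace, name, command_body):
    """Build, but do not send, a Data API POST request."""
    # Path layout: ASTRA_DB_ENDPOINT/api/json/v1/KEYSPACE_NAME/COLLECTION_OR_TABLE_NAME
    url = f"{endpoint.rstrip('/')}/api/json/v1/{keyspace}/{name}"
    return urllib.request.Request(
        url,
        data=json.dumps(command_body).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",  # Data API requests always use POST
    )


# Illustrative body: the top-level key names the command to run
request = build_data_api_request(
    "https://DATABASE_ID-REGION.apps.astra.datastax.com",
    "APPLICATION_TOKEN",
    "default_keyspace",
    "movies",
    {"findOne": {"filter": {"title": "The Title"}}},
)
```

With a real endpoint and token, urllib.request.urlopen(request) sends the request, and the response body contains the JSON result or an errors array like the one above.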

After you connect to a database, you can work with the collections, tables, documents, and rows in it. You can also programmatically manage databases and keyspaces.

Database terminology

Depending on your background and experience, you might be familiar with various terminology for database components. For example, structured databases use terms like tables, rows, and columns, whereas semi-structured databases use collections, documents, and fields to refer to functionally similar or identical components.

In Astra DB, the terminology you encounter depends on the database types and features that you use. The following table explains some of these terms. Each set of terms describes similar components, but these components are not necessarily functional equivalents. For example, a single field of vector data doesn't necessarily translate directly to a single column of structured, non-vector data.

Serverless (Vector) databases Serverless (Non-Vector) databases and CQL Description

Keyspace

Keyspace

A container for one or more collections or tables within a database.

Namespace is a deprecated term for a keyspace in a Serverless (Vector) database.

Collection

Table

A container for data within a database.

The difference depends on the schema type:

  • Collections use dynamic schemas and store data in documents. With a dynamic schema, each document can have different fields. Collections are best for semi-structured data.

  • Tables use fixed schemas and store data in rows. With a fixed schema, all rows must have the same columns, and every column must have a value (which can be null). Tables are best for structured data.

Document

Row

A piece of data, having one or more properties, that is stored in a collection or table in a database.

Document properties are stored in fields, and row properties are stored in columns.

Field

Column

Any properties or attributes of data, such as vectors, identifiers, customer contact information, purchase history, account statuses, metadata, and so on. Properties can be stored as various data types including text, numbers, arrays, booleans, and so on.

Naming conventions

Astra DB has the following naming conventions for databases, keyspaces, collections, tables, and vectorize API keys:

  • Must start and end with a letter or number.

  • Can contain uppercase letters A-Z, lowercase letters a-z, numbers 0-9, and underscores (_). Some components allow additional special characters.

  • Must contain at least two characters.

  • Can’t exceed the maximum character limit for the entity type:

    • Keyspaces: 48 characters.

    • Databases, collections, tables, and vectorize API keys: 50 characters.

The Data API has the following naming conventions for document properties in collections:

  • Must start and end with a letter or an underscore (_).

  • Can contain uppercase letters A-Z, lowercase letters a-z, numbers 0-9, and underscores (_).

  • Must contain at least one character.

  • Can’t exceed 48 characters.

  • Can’t be exactly _id, which is reserved and interpreted as a document’s identity property.

The dollar sign ($) is reserved for system-defined operator and property names, such as $exists, $and, $or, and $vector.
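The following Python sketch turns the naming rules above into simple checks, for example to validate names before you send a create command. It mirrors the rules exactly as listed here, including the 48-character limit for property names, and doesn't model the additional special characters that some components allow. The function names are hypothetical, and the Data API remains the authority on what it accepts.

```python
import re

# Databases, keyspaces, collections, tables, and vectorize API keys:
# start and end with a letter or number, contain only A-Z, a-z, 0-9, and _,
# and have at least two characters. Components that allow additional special
# characters are not modeled here.
ENTITY_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]*[A-Za-z0-9]")
MAX_ENTITY_LENGTH = {
    "keyspace": 48,
    "database": 50,
    "collection": 50,
    "table": 50,
    "vectorize_api_key": 50,
}

# Document properties in collections, as listed above: start and end with a
# letter or underscore, contain only A-Z, a-z, 0-9, and _ (so no $), and have
# 1 to 48 characters.
PROPERTY_NAME = re.compile(r"[A-Za-z_](?:[A-Za-z0-9_]*[A-Za-z_])?")


def is_valid_entity_name(name: str, kind: str) -> bool:
    """Check a database, keyspace, collection, table, or API key name."""
    return (
        len(name) <= MAX_ENTITY_LENGTH[kind]
        and ENTITY_NAME.fullmatch(name) is not None
    )


def is_valid_property_name(name: str) -> bool:
    """Check a document property name; _id is reserved for document identity."""
    return (
        len(name) <= 48
        and name != "_id"
        and PROPERTY_NAME.fullmatch(name) is not None
    )
```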

Data API limits

The Data API includes guardrails to ensure best practices, foster availability, and promote optimal configurations for your Astra DB Serverless databases.

In the following table, the term property refers to either a field of a document in a collection or a column in a table. For more information, see Database terminology.

Entity Limit Notes

Number of collections per database

5 or 10 (approx.)

Indexes determine the number of collections you can have in a database.

Serverless (Vector) databases created after June 24, 2024 can have approximately 10 collections. Databases created before this date can have approximately 5 collections. For more information, see The indexing option.

Number of tables per database

50

A Serverless (Vector) database can have up to 50 tables.

Page size

20

For certain operations, a page can contain up to 20 documents or rows. After reaching the page limit, you can load any additional responses on the next page:

  • For clients, you must iterate over a cursor.

  • For HTTP, you must use the nextPageState ID returned by paginated Data API responses.

Some operations, such as deleteMany and vector ANN search, don’t return a cursor or nextPageState, even if there are additional matches:

  • For vector ANN search, the response is a single page of up to 1,000 documents, unless you set a lower limit.

  • For deleteMany on collections, HTTP requests delete up to 20 documents per request, and then return a count but not a nextPageState. Reissue the HTTP request until the response indicates that fewer than 20 documents were deleted. The Data API clients automatically issue multiple HTTP requests until all matching documents are deleted.

  • Some other operations may not trigger pagination, such as certain combinations of sort and filter operations. For more information, see Sort clauses for documents and Sort clauses for rows.

In-memory sort limit

10,000

Operations that require fetching and sorting chunks of data can support no more than 10,000 rows in-memory. In this case, rows refers to either documents in collections or rows in tables.

If your queries hit this limit, try restructuring your application to avoid running queries on excessively large chunks of data. For example, in tables, you can adjust your table’s partitionSort in the primary key for more efficient clustering.

If you frequently hit this limit in collections, consider whether your data needs to be stored in tables, which can be more performant for large datasets.

Maximum property name length

100

Maximum of 100 characters in a property name.

Maximum path length

1,000

Maximum of 1,000 characters in a path name. This is calculated as the total for all segments, including any dots (.) between properties in a path.
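A minimal client-side pre-check for these two limits could walk a document and measure each property name and each dotted path. For example, `root.branch.leaf` counts as 16 characters: three segments (4 + 6 + 4) plus two dots. This sketch descends only into nested objects, not arrays. The function name is hypothetical.

```python
from typing import Any, Dict, List

MAX_PROPERTY_NAME = 100
MAX_PATH_LENGTH = 1_000


def find_violations(doc: Dict[str, Any], prefix: str = "") -> List[str]:
    """Report property names or dotted paths that exceed the documented limits."""
    problems: List[str] = []
    for name, value in doc.items():
        # Path length includes the dots between segments.
        path = f"{prefix}.{name}" if prefix else name
        if len(name) > MAX_PROPERTY_NAME:
            problems.append(f"property name too long ({len(name)} chars)")
        if len(path) > MAX_PATH_LENGTH:
            problems.append(f"path too long ({len(path)} chars)")
        if isinstance(value, dict):
            problems.extend(find_violations(value, path))
    return problems
```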

Maximum indexed string size in bytes

8,000

Maximum of 8,000 bytes (UTF-8 encoded) for string length in an indexed field in a collection. The Data API uses UTF-8 encoding regardless of the original encoding in the request.
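Because this limit is measured in UTF-8 bytes rather than characters, a string of non-ASCII characters reaches it sooner than its length suggests. A minimal check, with a hypothetical function name:

```python
MAX_INDEXED_STRING_BYTES = 8_000


def fits_indexed_string_limit(value: str) -> bool:
    # Measure the UTF-8 encoded size; len(value) counts code points, not bytes.
    return len(value.encode("utf-8")) <= MAX_INDEXED_STRING_BYTES
```

For example, `"\u00e9" * 4_000` (4,000 copies of "é") is 8,000 bytes and fits. `"\u20ac" * 3_000` (3,000 copies of "€", 3 bytes each) is 9,000 bytes and does not fit.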

Maximum number property length

100

Maximum of 100 characters for the length of a number type value.

Maximum elements per array

1,000

Maximum number of elements in an array. This limit applies to indexed fields in collections only.

Maximum vector dimensions

4,096

Maximum number of dimensions you can define for a vector-enabled collection.

Maximum properties per JSON object

1,000

Maximum number of properties for a JSON object, including top-level properties and nested properties. This limit applies to JSON objects stored in indexed fields in collections only.

A JSON object can have nested objects, also known as sub-documents. The maximum of 1,000 includes all indexed properties in the main document and in each sub-document, if any. For more information, see The indexing option.

Maximum properties per JSON document

2,000

The maximum number of properties allowed in a single JSON document is 2,000, including intermediate properties and leaf properties.

For example, the following document has three properties that apply to this limit: root, root.branch, and root.branch.leaf.

{
  "root": {
    "branch": {
      "leaf": 42
    }
  }
}
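The counting rule in this example can be sketched as a short recursive function. Each key counts once, whether it holds a nested object (intermediate) or a value (leaf). This is a minimal sketch that assumes only nested objects are descended into. How the Data API counts properties inside arrays is not covered here.

```python
from typing import Any, Dict


def count_properties(obj: Dict[str, Any]) -> int:
    """Count intermediate and leaf properties in nested JSON objects."""
    total = 0
    for value in obj.values():
        total += 1  # this property itself
        if isinstance(value, dict):
            total += count_properties(value)  # its nested properties
    return total


doc = {"root": {"branch": {"leaf": 42}}}
print(count_properties(doc))  # 3
```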

Maximum document or row size in characters

4 million

A single document in a collection can have a maximum of 4 million characters.

Maximum inserted batch size in characters

20 million

An entire batch of documents submitted through an insertMany or updateMany command can have up to 20 million characters.

Maximum number of deletions per transaction

20

Maximum number of documents that can be deleted in each deleteMany HTTP transaction against a collection.

Maximum number of updates per transaction

20

Maximum number of documents that can be updated in each updateMany HTTP transaction against a collection.

Maximum number of insertions per transaction

100

Maximum number of documents or rows that can be inserted in each insertMany HTTP transaction.
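When sending raw HTTP requests, you can stay under this limit by splitting a large list into several `insertMany` commands. This is a minimal sketch with a hypothetical function name. It enforces only the 100-document count, not the 20-million-character batch limit.

```python
from typing import Any, Dict, Iterator, List

MAX_INSERT_BATCH = 100  # documents or rows per insertMany HTTP request


def insert_many_commands(documents: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield insertMany command bodies holding at most MAX_INSERT_BATCH items each."""
    for start in range(0, len(documents), MAX_INSERT_BATCH):
        yield {"insertMany": {"documents": documents[start:start + MAX_INSERT_BATCH]}}
```

For example, 250 documents produce three commands holding 100, 100, and 50 documents.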

Maximum size of _id values array

100

When using the $in operator to send an array of _id values, the maximum size of the array is 100. This limit applies to operations on collections only because _id is a reserved field for collections.
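To look up more than 100 `_id` values, you can split them across several `$in` filters and run one query per filter. A minimal sketch with a hypothetical function name:

```python
from typing import Any, Dict, Iterator, List

MAX_IN_IDS = 100  # _id values per $in array


def id_in_filters(ids: List[Any]) -> Iterator[Dict[str, Any]]:
    """Yield {"_id": {"$in": [...]}} filters with at most MAX_IN_IDS values each."""
    for start in range(0, len(ids), MAX_IN_IDS):
        yield {"_id": {"$in": ids[start:start + MAX_IN_IDS]}}
```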

Maximum number of vector search results

1,000

For vector search, the response is a single page of up to 1,000 documents or rows, unless you set a lower limit.
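As a sketch, a raw HTTP vector search on a collection could set its own `limit` below this cap. The example assumes a vector-enabled collection sorted on `$vector`, with `includeSimilarity` requesting similarity scores. The helper name and its range check are hypothetical.

```python
from typing import Any, Dict, List

MAX_VECTOR_RESULTS = 1_000


def vector_search_command(query_vector: List[float], limit: int = 10) -> Dict[str, Any]:
    """Build a find command for an ANN search, rejecting limits above the cap."""
    if not 1 <= limit <= MAX_VECTOR_RESULTS:
        raise ValueError(f"limit must be between 1 and {MAX_VECTOR_RESULTS}")
    return {
        "find": {
            "sort": {"$vector": query_vector},
            "options": {"limit": limit, "includeSimilarity": True},
        }
    }
```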

Exceeded limit returns 200 OK with error

If your request is valid but the command exceeds a limit, the Data API responds with HTTP 200 OK and an error message.

It is also possible to receive a response containing both data and errors. Always inspect the response for error messages.

For example, if you exceed the per-transaction limit of 100 documents in an insertMany command, the Data API response contains the following message:

{
  "errors": [
    {
      "message": "Request invalid: field 'command.documents' value \"[...]\" not valid. Problem: amount of documents to insert is over the max limit (101 vs 100).",
      "errorCode": "COMMAND_FIELD_INVALID"
    }
  ]
}
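Because a limit error can arrive with HTTP 200 OK, checking the status code alone is not enough. The following minimal sketch inspects the parsed body for an `errors` array. The `DataAPIError` class and the function name are hypothetical. The exception keeps the full payload, so any data returned alongside the errors is not lost.

```python
from typing import Any, Dict, Optional


class DataAPIError(Exception):
    """Hypothetical error type that carries the full response payload."""

    def __init__(self, message: str, payload: Optional[Dict[str, Any]] = None):
        super().__init__(message)
        self.payload = payload


def raise_for_data_api_errors(status_code: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Raise if the HTTP status or the response body reports an error."""
    if status_code != 200:
        raise DataAPIError(f"HTTP {status_code}", payload)
    errors = payload.get("errors") or []
    if errors:
        # A 200 OK body can still contain errors, possibly next to data.
        codes = ", ".join(e.get("errorCode", "UNKNOWN") for e in errors)
        raise DataAPIError(f"Data API returned errors: {codes}", payload)
    return payload
```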

© 2024 DataStax

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com