Create a collection

Creates a new collection in a Serverless (Vector) database.

Method signature

  • Python

  • TypeScript

  • Java

  • curl

The signature of this command changed in Python client version 2.0.

If you are using an earlier version, DataStax recommends upgrading to the latest version. For more information, see Data API client upgrade guide.

The following method belongs to the astrapy.Database class.

create_collection(
  name: str,
  *,
  definition: CollectionDefinition | dict[str, Any] | None,
  document_type: type[Any],
  keyspace: str,
  collection_admin_timeout_ms: int,
  embedding_api_key: str | EmbeddingHeadersProvider,
  spawn_api_options: APIOptions,
) -> Collection

Most astrapy objects have an asynchronous counterpart, for use within the asyncio framework. To get an AsyncCollection, use the create_collection method of instances of AsyncDatabase, or alternatively the to_async method of the synchronous Collection class.

See the AsyncCollection client reference for details about the async API.

The following method belongs to the Db class.

async createCollection<Schema extends SomeDoc = SomeDoc>(
  name: string,
  options?: {
    vector?: CollectionVectorOptions,
    indexing?: CollectionIndexingOptions<Schema>,
    defaultId?: CollectionDefaultIdOptions,
    lexical?: CollectionLexicalOptions,
    rerank?: CollectionRerankOptions,
    logging?: DataAPILoggingConfig,
    keyspace?: string,
    embeddingApiKey?: string | EmbeddingHeadersProvider,
    serdes?: CollectionSerDesConfig,
    timeoutDefaults?: TimeoutDescriptor,
    timeout?: number | TimeoutDescriptor,
  }
): Collection<Schema>

The following methods belong to the com.datastax.astra.client.Database class.

Collection<Document> createCollection(String collectionName)
Collection<Document> createCollection(
  String collectionName,
  CollectionDefinition collectionDefinition
)
Collection<Document> createCollection(
  String collectionName,
  CollectionDefinition collectionDefinition,
  CreateCollectionOptions options
)
<T> Collection<T> createCollection(
  String collectionName,
  Class<T> documentClass
)
<T> Collection<T> createCollection(
  String collectionName,
  CollectionDefinition collectionDefinition,
  Class<T> documentClass
)
<T> Collection<T> createCollection(
  String collectionName,
  CollectionDefinition collectionDefinition,
  Class<T> documentClass,
  CreateCollectionOptions options
)
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": OPTIONS
  }
}'

Result

  • Python

  • TypeScript

  • Java

  • curl

Creates a collection with the specified parameters.

Returns a Collection object. You can use this object to work with documents in the collection.

Unless you specify the document_type parameter, the collection is typed as Collection[dict]. For more information, see Typing support.

Example response:

Collection(name="COLLECTION_NAME", keyspace="default_keyspace", database.api_endpoint="ASTRA_DB_API_ENDPOINT", api_options=FullAPIOptions(token=StaticTokenProvider("APPLICATION_TOKEN"...), ...))

Creates a collection with the specified parameters.

Returns a promise that resolves to a Collection<Schema> object. You can use this object to work with documents in the collection.

A Collection is typed as Collection<Schema>, where Schema defaults to SomeDoc (Record<string, any>). Providing the specific Schema type enables stronger typing for collection operations. For more information, see Typing Collections and Tables.

Creates a collection with the specified parameters.

Returns a Collection object.

You can use this object to work with documents in the collection.

Creates a collection with the specified parameters.

If the command succeeds, the response contains a status object with ok set to 1.

Example response:

{
  "status": {
    "ok": 1
  }
}
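
A minimal sketch of how a caller might check this response in Python, using only the standard library. The helper name collection_created is hypothetical, and treating a top-level errors key as a failure is an assumption, because this page documents only the success shape.

```python
import json


def collection_created(response_body: str) -> bool:
    """Return True if a createCollection response body reports success.

    Assumption: a failed call either omits status.ok or carries a
    top-level "errors" entry. Only the success shape is documented here.
    """
    payload = json.loads(response_body)
    if payload.get("errors"):
        return False
    return payload.get("status", {}).get("ok") == 1


# The documented success response.
example_body = '{"status": {"ok": 1}}'
print(collection_created(example_body))
```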

Parameters

The required and valid parameters depend on whether the collection will store vector data and your embedding generation method. For more information, see Manage collections and tables.

You can’t edit a collection’s definition after you create the collection.

  • Python

  • TypeScript

  • Java

  • curl

Name Type Summary

name

str

The name of the collection to create.

definition

CollectionDefinition

The full configuration for the collection. See the CollectionDefinition table for more details.

You can define definition in a CollectionDefinition object, or you can use the fluent interface of CollectionDefinition.

Plain Python dictionaries can be passed for definition as well, provided they mirror the structure of CollectionDefinition objects.

document_type

type

Optional. A formal specifier for the type checker. If provided, document_type must match the type hint specified in the assignment. For more information, see Typing support.

Default: Collection[dict]

keyspace

str

The keyspace in which to create the collection.

Default: The general keyspace setting for the database.

collection_admin_timeout_ms

int

A timeout, in milliseconds, to impose on the underlying API request. If not provided, the corresponding Database defaults apply.

embedding_api_key

str | EmbeddingHeadersProvider

Optional. This only applies to collections with a vectorize embedding provider integration.

This secret is sent to the Data API for every operation on the collection. It is useful when a vectorize service is configured but no credentials are stored, or when you want to override the stored credentials. For more information, see Auto-generate embeddings with vectorize.

spawn_api_options

APIOptions

A complete or partial specification of the APIOptions to override the defaults inherited from the Database. Use this to customize the interaction of the Python client with the collection. For example, you can change the serdes options or default timeouts. If APIOptions is passed together with a named parameter (such as a timeout), the latter takes precedence over the corresponding spawn_api_options setting.
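
The precedence rule for spawn_api_options can be pictured with the following sketch. It is not the astrapy implementation: the function name is hypothetical, and spawn_api_options is modeled as a plain dictionary with an assumed collection_admin_timeout_ms key purely for illustration.

```python
from typing import Any, Dict, Optional


def resolve_collection_admin_timeout_ms(
    database_default_ms: int,
    spawn_api_options: Optional[Dict[str, Any]] = None,
    collection_admin_timeout_ms: Optional[int] = None,
) -> int:
    """Pick the effective timeout for the call.

    Order of precedence (highest first):
    1. the named parameter,
    2. the value carried by spawn_api_options (hypothetical dict model),
    3. the default inherited from the Database.
    """
    if collection_admin_timeout_ms is not None:
        return collection_admin_timeout_ms
    if spawn_api_options and "collection_admin_timeout_ms" in spawn_api_options:
        return spawn_api_options["collection_admin_timeout_ms"]
    return database_default_ms


# The named parameter wins over the spawn_api_options value.
print(
    resolve_collection_admin_timeout_ms(
        60_000,
        spawn_api_options={"collection_admin_timeout_ms": 20_000},
        collection_admin_timeout_ms=5_000,
    )
)
```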

Properties of CollectionDefinition
Name Type Summary

vector

CollectionVectorOptions

Optional. The vector configuration for the collection. This includes the vector dimension and similarity metric, as well as the settings for server-side embedding generation if you want your collection to have vectorize enabled.

Required for vector search and hybrid search.

See the examples below.

lexical

CollectionLexicalOptions

Optional. The lexical search configuration for the collection.

The CollectionLexicalOptions object has the following attributes:

  • enabled: A boolean that controls whether lexical search is enabled for the collection. Set this to False to disable lexical search. Lexical search must be enabled to support hybrid search.

  • analyzer: A string describing a built-in analyzer, or a JSON object describing an analyzer configuration.

    Strings must be one of: standard.

    JSON objects must follow the specifications in Find data with CQL analyzers.

    Currently, only the standard Lucene analyzer is supported. This corresponds to the value of "standard".

Only collections in databases in the AWS us-east-2 region support this parameter.

See the examples below.

Default: A CollectionLexicalOptions object with an enabled value of True and an analyzer value of "standard". This means that lexical search is enabled by default.

rerank

CollectionRerankOptions

Optional. The reranker configuration for the collection.

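The CollectionRerankOptions object has the following attributes:

  • enabled: A boolean describing whether to enable reranking for the collection. Required to support hybrid search.

  • service: A RerankServiceOptions object describing the reranker configuration, including the provider and model_name.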

Only collections in databases in the AWS us-east-2 region support this parameter.

See the examples below.

Default: A CollectionRerankOptions object with an enabled value of True and a service value corresponding to the NVIDIA llama-3.2-nv-rerankqa-1b-v2 reranking model. This means that reranking is enabled by default.

indexing

dict | None

The selective indexing configuration for the collection.

See the examples below.

Default: All fields of all documents.

default_id

CollectionDefaultIDOptions

The defaultId configuration for the collection. This is used when you insert a document without an _id field.

See the examples below.
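
As a small illustration of the default ID setting in dictionary form, the sketch below validates the ID type before building the mapping. The allowed values are the ones listed for options.defaultId.type in the curl parameters on this page. The helper name is hypothetical, and using the defaultId key in a plain-dictionary definition is an assumption based on the Data API JSON option name.

```python
# Values listed for options.defaultId.type in the curl parameter reference.
ALLOWED_DEFAULT_ID_TYPES = ("objectId", "uuidv7", "uuidv6", "uuid")


def default_id_options(id_type: str) -> dict:
    """Build a defaultId mapping, rejecting unsupported type names."""
    if id_type not in ALLOWED_DEFAULT_ID_TYPES:
        raise ValueError(
            f"Unsupported defaultId type {id_type!r}; "
            f"expected one of {', '.join(ALLOWED_DEFAULT_ID_TYPES)}"
        )
    return {"type": id_type}


# Assumed dictionary form of a collection definition that sets the default ID type.
definition = {"defaultId": default_id_options("uuidv7")}
print(definition)
```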

Name Type Summary

name

string

The name of the collection to create.

Name Type Summary

vector?

CollectionVectorOptions

The vector configuration for the collection, such as the vector dimension and similarity metric. Required to support vector search. If you’re not sure what dimension to set, use the dimension of the vectors that your embedding model produces.

lexical?

CollectionLexicalOptions

Optional. The lexical search configuration for the collection.

The CollectionLexicalOptions object has the following attributes:

  • enabled: A boolean that controls whether lexical search is enabled for the collection. Set this to false to disable lexical search. Lexical search must be enabled to support hybrid search.

  • analyzer: A string describing a built-in analyzer, or a JSON object describing an analyzer configuration.

    Strings must be one of: standard.

    JSON objects must follow the specifications in Find data with CQL analyzers.

    Currently, only the standard Lucene analyzer is supported. This corresponds to the value of "standard".

Only collections in databases in the AWS us-east-2 region support this parameter.

See the examples below.

Default: A CollectionLexicalOptions object with enabled: true and analyzer: "standard". This means that lexical search is enabled by default.

rerank?

CollectionRerankOptions

Optional. The reranker configuration for the collection.

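The CollectionRerankOptions object has the following attributes:

  • enabled: A boolean describing whether to enable reranking for the collection. Required to support hybrid search.

  • service: An object describing the reranker configuration, including the provider and modelName.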

Only collections in databases in the AWS us-east-2 region support this parameter.

See the examples below.

Default: A CollectionRerankOptions object with enabled: true and a service value corresponding to the NVIDIA llama-3.2-nv-rerankqa-1b-v2 reranking model. This means that reranking is enabled by default.

indexing?

CollectionIndexingOptions<Schema>

The selective indexing configuration for the collection.

defaultId?

CollectionDefaultIdOptions

The defaultId configuration for the collection, for when a document does not specify an _id field.

embeddingApiKey?

string | EmbeddingHeadersProvider

An alternative to service.authentication.providerKey for the embedding provider.

Provides the API key directly via headers instead of using an API key in the Astra DB KMS.

embeddingApiKey overrides the Astra DB KMS API key if you set both.

keyspace?

string

Overrides the keyspace where the collection is created. If not set, the database’s working keyspace is used.

logging?

DataAPILoggingConfig

The configuration for logging events emitted by the DataAPIClient.

serdes?

CollectionSerDesConfig

The configuration for serializing and deserializing documents in this collection.

For more information, see Custom Ser/Des.

timeoutDefaults

TimeoutDescriptor

Optional.

The default timeout(s) to apply to operations performed on this Collection instance. You can specify requestTimeoutMs, generalMethodTimeoutMs, and collectionAdminTimeoutMs.

Details about the timeoutDefaults parameter

The default timeout options for any operation performed on this Collection instance.

The TimeoutDescriptor object can contain these properties:

  • requestTimeoutMs (number): The maximum time, in milliseconds, that the client should wait for each underlying HTTP request. Default: 10 seconds.

  • generalMethodTimeoutMs (number): The maximum time, in milliseconds, that the whole operation, which may involve multiple HTTP requests, can take. Default: 30 seconds.

  • collectionAdminTimeoutMs (number): The maximum time, in milliseconds, for collection admin operations like creating, dropping, and listing collections. Default: 60 seconds.

timeout

number | TimeoutDescriptor

Optional.

The timeout to apply to this method.

Only collectionAdminTimeoutMs applies to this method. This is the maximum time, in milliseconds, for collection admin operations like creating, dropping, and listing collections.

Default: 60 seconds, unless you specified a different default along the Options Hierarchy.
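
The timeout rules above can be summarized in a language-neutral sketch, written here in Python rather than TypeScript. The function name and the inherited_defaults argument are hypothetical, and reading a bare number as collectionAdminTimeoutMs is an assumption drawn from the statement that only that timeout applies to this method.

```python
from typing import Optional, Union

# Defaults quoted in the timeoutDefaults details, converted to milliseconds.
DEFAULT_TIMEOUTS_MS = {
    "requestTimeoutMs": 10_000,
    "generalMethodTimeoutMs": 30_000,
    "collectionAdminTimeoutMs": 60_000,
}


def effective_admin_timeout_ms(
    timeout: Union[int, dict, None] = None,
    inherited_defaults: Optional[dict] = None,
) -> int:
    """Resolve the collection admin timeout for one createCollection call.

    inherited_defaults stands in for defaults set higher up the options
    hierarchy. A bare number is treated as collectionAdminTimeoutMs
    (assumption); a descriptor only overrides the keys it contains.
    """
    merged = {**DEFAULT_TIMEOUTS_MS, **(inherited_defaults or {})}
    if isinstance(timeout, dict):
        merged.update(timeout)
    elif timeout is not None:
        merged["collectionAdminTimeoutMs"] = timeout
    return merged["collectionAdminTimeoutMs"]


print(effective_admin_timeout_ms())       # falls back to the 60 second default
print(effective_admin_timeout_ms(5_000))  # per-call override
```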

Name Type Summary

collectionName

String

The name of the collection.

collectionDefinition

CollectionDefinition

Settings for the collection, including vector options, the default ID format, and indexing options.

options

CreateCollectionOptions

Options for the operation, including the keyspace.

documentClass

Class<T>

The class of the documents in the collection. Use this to work with your own bean class instead of the default Document type.

Name Type Summary

createCollection

command

The Data API command to create a collection in a Serverless (Vector) database. It acts as a container for all the attributes and settings required to create the collection.

name

string

The name of the new collection. This must be unique within the database specified in the request URL.

options.defaultId

object

Optional. Controls how the Data API allocates an _id for each document that doesn’t specify an ID value in the request. For backwards compatibility with Data API releases before version 1.0.3, if you omit the defaultId option on createCollection, a document’s _id value is the plain string representation of a random (version 4) UUID.

options.defaultId.type

string

If you include defaultId, you must set type to one of the following values: objectId, uuidv7, uuidv6, or uuid.

options.vector

object

Optional. Recommended. Creates a vector-enabled collection.

Vector-enabled collections can store either vector or non-vector data. Collections that aren’t vector-enabled can’t store vector data.

options.vector.dimension

int

The dimension for vector embeddings in the collection. If you’re not sure what dimension to set, use the dimension of the vectors that your embedding model produces. This can be optional for vectorize, if the specified vector.service.modelName has a default dimension value. For more information, see the documentation for your embedding provider integration.

options.vector.metric

string

The similarity metric to use for vector search: cosine (default), dot_product, or euclidean.

options.vector.service

object

Optional. Configure a vectorize embedding provider integration.

options.vector.service.provider

string

The vectorize embedding provider name.

options.vector.service.modelName

string

A valid model name for the specified vectorize embedding provider.

options.vector.service.authentication

object

Optional. Use credentials stored in Astra DB KMS to authenticate with your vectorize embedding provider. In options.vector.service.authentication.providerKey, provide the credential’s API Key name as given in Astra DB KMS.

Alternatively, you can omit the authentication object, and then provide the authentication key in an x-embedding-api-key header instead. If you use header authentication, you must provide the x-embedding-api-key header with every command that requires vectorize for this collection, including inserting data and vector search with vectorize.

options.vector.service.parameters

object

Optional. Your embedding provider might require additional parameters. Use findEmbeddingProviders or see the documentation for your embedding provider integration.

options.lexical

object

Optional. The lexical search configuration for the collection.

Only collections in databases in the AWS us-east-2 region support this parameter.

options.lexical.enabled

boolean

Optional. Whether to enable lexical search for the collection. Required to support hybrid search.

Default: True

options.lexical.analyzer

string | object

Optional.

A string describing a built-in analyzer, or a JSON object describing an analyzer configuration.

Strings must be one of: standard.

JSON objects must follow the specifications in Find data with CQL analyzers.

Currently, only the standard Lucene analyzer is supported. This corresponds to the value of "standard".

Default: standard

options.rerank

object

Optional. The reranker configuration for the collection.

Only collections in databases in the AWS us-east-2 region support this parameter.

options.rerank.enabled

boolean

Optional. Whether to enable reranking for the collection.

Required to support hybrid search.

Default: True

options.rerank.service

object

Optional.

A JSON object describing a reranker configuration.

options.rerank.service.provider

string

The name of the reranking provider. Currently, only Nvidia is supported.

Default: Nvidia

options.rerank.service.modelName

string

The name of a reranking model supported by the reranking provider. Currently, only nvidia/llama-3.2-nv-rerankqa-1b-v2 is supported.

Default: nvidia/llama-3.2-nv-rerankqa-1b-v2

options.indexing

object

Optional. Enable selective indexing for data inserted to the collection. If you specify indexing, you must also specify either an allow or deny clause.

options.indexing.allow

array

Either allow or deny is required if you specify indexing. Provide an array of one or more properties to index. Alternatively, you can enter a wildcard "allow": ["*"] to index all properties during an update operation. This is the same as the default behavior if you omit indexing.

options.indexing.deny

array

Either allow or deny is required if you specify indexing. Provide an array of one or more properties to not index. If you enter a wildcard "deny": ["*"], then no properties are indexed during an update operation.
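
The allow and deny rules for options.indexing can be sketched as a small validation helper. The function name is hypothetical, and rejecting a request that sets both clauses is an assumption: the text above says only that one of them is required.

```python
from typing import Optional, Sequence


def indexing_options(
    allow: Optional[Sequence[str]] = None,
    deny: Optional[Sequence[str]] = None,
) -> dict:
    """Build an options.indexing object holding exactly one clause."""
    # True when both clauses are given or both are missing.
    if (allow is None) == (deny is None):
        raise ValueError("Specify exactly one of allow or deny")
    clause, fields = ("allow", allow) if allow is not None else ("deny", deny)
    if len(fields) == 0:
        raise ValueError(f"{clause} needs at least one property, or the wildcard '*'")
    return {clause: list(fields)}


print(indexing_options(allow=["name", "address.city"]))  # index only these properties
print(indexing_options(deny=["*"]))                      # index nothing
```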

Examples

The following examples demonstrate how to create a collection.

Create a collection that is not vector-enabled

  • Python

  • TypeScript

  • Java

  • curl

from astrapy import DataAPIClient

# Get a database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

# Create a collection
collection = database.create_collection("COLLECTION_NAME")
  • Typed collections

  • Untyped collections

You can manually define a client-side type for your collection to help statically catch errors.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");

// Define the type for the collection
interface User {
  name: string,
  age?: number,
}

// Create a collection
(async function () {
  const collection = await database.createCollection<User>("COLLECTION_NAME");
})();

If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");

// Create a collection
(async function () {
  const collection = await database.createCollection("COLLECTION_NAME");
})();
package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.documents.Document;

public class CreateCollection {

    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");

        // Create a collection
        Collection<Document> collection = database.createCollection("COLLECTION_NAME");
    }
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {}
  }
}'
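
For readers who want to issue the same HTTP request without a client library, the following sketch builds the equivalent request with the Python standard library. The function name is hypothetical and the endpoint, keyspace, and token values are placeholders. The request is constructed but not sent.

```python
import json
import urllib.request


def build_create_collection_request(api_endpoint, keyspace, token, name, options=None):
    """Build (but do not send) a createCollection request like the curl call."""
    url = f"{api_endpoint}/api/json/v1/{keyspace}"
    body = {"createCollection": {"name": name, "options": options or {}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; substitute your own endpoint, keyspace, and token.
request = build_create_collection_request(
    "https://example.invalid", "default_keyspace", "TOKEN", "my_collection"
)
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request)
```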

Create a collection that can store vector embeddings

Collections that are vector-enabled can store vector embeddings and work with vector search.

  • Python

  • TypeScript

  • Java

  • curl

The Python client supports multiple ways to create a collection:

  • You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.

  • You can use a fluent interface to build the collection definition and then create the collection from the definition.

  • CollectionDefinition object

  • Fluent interface

from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CollectionDefinition, CollectionVectorOptions

# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

# Create a collection
collection_definition = CollectionDefinition(
    vector=CollectionVectorOptions(
        dimension=1024,
        metric=VectorMetric.COSINE,
    ),
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric

# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

collection_definition = (
    CollectionDefinition.builder()
    .set_vector_dimension(1024)
    .set_vector_metric(VectorMetric.COSINE)
    .build()
)

collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
  • Typed collections

  • Untyped collections

You can manually define a client-side type for your collection to help statically catch errors.

You can define $vector as an inline field in your interfaces, or you can extend the utility VectorDoc type provided by the client.

import { DataAPIClient, VectorDoc } from "@datastax/astra-db-ts";

// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");

// Define the type for the collection
interface User extends VectorDoc {
  name: string,
  age?: number,
}

(async function () {
  const collection = await database.createCollection<User>("COLLECTION_NAME", {
    vector: {
      dimension: 1024,
      metric: "cosine",
    },
  });
})();

If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.

The $vector field must still be number[] or DataAPIVector, or type-related issues will occur.

Consider using a type like VectorDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector field to have the correct type.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");

(async function () {
  const collection = await database.createCollection("COLLECTION_NAME", {
    vector: {
      dimension: 1024,
      metric: "cosine",
    },
  });
})();
package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.vector.SimilarityMetric;

public class CreateCollection {

    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");

        // Create a collection
        CollectionDefinition collectionDefinition = new CollectionDefinition()
            .vectorDimension(1024)
            .vectorSimilarity(SimilarityMetric.COSINE);

        Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
    }
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "vector": {
        "dimension": 1024,
        "metric": "cosine"
      }
    }
  }
}'
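
To make the metric choice more concrete, the following sketch computes the three quantities behind cosine, dot_product, and euclidean for two small vectors. These are the textbook definitions only. The similarity score that the Data API reports might be scaled or normalized differently, which this page does not specify.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean_distance(a, b):
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 0.0]
b = [3.0, 4.0]
print(dot_product(a, b))         # 3.0
print(cosine_similarity(a, b))   # 0.6
print(euclidean_distance(a, b))  # square root of 20, about 4.47
```

Cosine ignores vector length and compares direction only, while the dot product and Euclidean distance are both sensitive to magnitude.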

Create a collection that can automatically generate vector embeddings

If you want to automatically generate vector embeddings, create a vector-enabled collection and configure an embedding provider integration for the collection.

The configuration depends on the embedding provider. For the configuration and an example for each provider, see Supported embedding providers.

You can also store pre-generated vector embeddings in the collection. If you store pre-generated and automatically generated embeddings in the same collection, make sure all embeddings have the same provider, model, and dimensions. Mismatched embeddings can cause inaccurate vector searches.
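
One inexpensive guard against mismatched embeddings is to check the length of each pre-generated vector before inserting it. The sketch below does only that: it cannot detect a different provider or model that happens to produce the same dimension. The function name is hypothetical, and it assumes pre-generated embeddings are carried in each document's $vector field.

```python
from typing import Iterable, List


def find_dimension_mismatches(documents: Iterable[dict], dimension: int) -> List[int]:
    """Return the positions of documents whose $vector length differs from dimension.

    Documents without a $vector field are skipped because they carry no
    pre-generated embedding to check.
    """
    mismatched = []
    for position, document in enumerate(documents):
        vector = document.get("$vector")
        if vector is not None and len(vector) != dimension:
            mismatched.append(position)
    return mismatched


documents = [
    {"name": "a", "$vector": [0.1, 0.2, 0.3]},
    {"name": "b"},                         # no pre-generated embedding
    {"name": "c", "$vector": [0.1, 0.2]},  # wrong length for a 3-dimension collection
]
print(find_dimension_mismatches(documents, dimension=3))  # [2]
```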

  • Python

  • TypeScript

  • Java

  • curl

The Python client supports multiple ways to create a collection:

  • You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.

  • You can use a fluent interface to build the collection definition and then create the collection from the definition.

  • CollectionDefinition object

  • Fluent interface

from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CollectionDefinition,
    CollectionVectorOptions,
    VectorServiceOptions,
)

# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

# Create a collection
collection_definition = CollectionDefinition(
    vector=CollectionVectorOptions(
        metric=VectorMetric.SIMILARITY_METRIC,
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="PROVIDER",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
            parameters=PARAMETERS,
        )
    )
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric

# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

# Create a collection
collection_definition = (
    CollectionDefinition.builder()
    .set_vector_dimension(MODEL_DIMENSIONS)
    .set_vector_metric(VectorMetric.SIMILARITY_METRIC)
    .set_vector_service(
        provider="PROVIDER",
        model_name="MODEL_NAME",
        authentication={
            "providerKey": "API_KEY_NAME",
        },
        parameters=PARAMETERS,
    )
    .build()
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
  • Typed collections

  • Untyped collections

You can manually define a client-side type for your collection to help statically catch errors.

You can define $vector and $vectorize as inline fields in your interfaces, or you can extend the utility VectorizeDoc type provided by the client.

import { DataAPIClient, VectorizeDoc } from "@datastax/astra-db-ts";

// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");

// Define the type for the collection
interface User extends VectorizeDoc {
  name: string,
  age?: number,
}

(async function () {
  const collection = await database.createCollection<User>("COLLECTION_NAME", {
    vector: {
      dimension: MODEL_DIMENSIONS,
      metric: "SIMILARITY_METRIC",
      service: {
        provider: "PROVIDER",
        modelName: "MODEL_NAME",
        authentication: {
          providerKey: "API_KEY_NAME",
        },
        parameters: PARAMETERS,
      },
    },
  });
})();

If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.

The $vector field must still be number[] or DataAPIVector, and the $vectorize field must still be a string, or type-related issues will occur.

Consider using a type like VectorizeDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector and $vectorize fields to have the correct type.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");

(async function () {
  const collection = await database.createCollection("COLLECTION_NAME", {
    vector: {
      dimension: MODEL_DIMENSIONS,
      metric: "SIMILARITY_METRIC",
      service: {
        provider: "PROVIDER",
        modelName: "MODEL_NAME",
        authentication: {
          providerKey: "API_KEY_NAME",
        },
        parameters: PARAMETERS,
      },
    },
  });
})();
package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.vector.SimilarityMetric;

public class CreateCollection {

    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");

        // Create a collection
        CollectionDefinition collectionDefinition = new CollectionDefinition()
            .vectorDimension(MODEL_DIMENSIONS)
            .vectorSimilarity(SimilarityMetric.SIMILARITY_METRIC)
            .vectorize(
                "PROVIDER",
                "MODEL_NAME",
                "API_KEY_NAME",
                PARAMETERS
            );

        Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
    }
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "vector": {
        "dimension": MODEL_DIMENSIONS,
        "metric": "SIMILARITY_METRIC",
        "service": {
          "provider": "PROVIDER",
          "modelName": "MODEL_NAME",
          "authentication": {
            "providerKey": "API_KEY_NAME"
          },
          "parameters": PARAMETERS
        }
      }
    }
  }
}'
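
The parameter reference on this page also describes header authentication: omit the authentication object and send the provider key in an x-embedding-api-key header instead. A minimal sketch of the headers and request body for that variant follows. The helper names are hypothetical and PROVIDER and MODEL_NAME remain placeholders. The dimension is left out when not given, which the reference allows only if the model has a default dimension.

```python
import json


def vectorize_headers(application_token, embedding_api_key):
    """Headers for commands that use vectorize with header authentication."""
    return {
        "Token": application_token,
        "Content-Type": "application/json",
        "x-embedding-api-key": embedding_api_key,
    }


def vectorize_collection_payload(name, provider, model_name, dimension=None, metric="cosine"):
    """createCollection body with a vectorize service and no authentication object."""
    vector = {
        "metric": metric,
        "service": {"provider": provider, "modelName": model_name},
    }
    if dimension is not None:
        vector["dimension"] = dimension
    return {"createCollection": {"name": name, "options": {"vector": vector}}}


headers = vectorize_headers("ASTRA_DB_APPLICATION_TOKEN", "EMBEDDING_API_KEY")
payload = vectorize_collection_payload("COLLECTION_NAME", "PROVIDER", "MODEL_NAME")
print(json.dumps(payload, indent=2))
```

With header authentication, the same x-embedding-api-key header must accompany every later command that requires vectorize for the collection.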

Create a collection that supports hybrid search

If you want to perform hybrid search on your collection, you must create a collection that has vector, lexical, and rerank enabled. Your collection must also be in a database in the AWS us-east-2 region.

Lexical and rerank are enabled by default when you create a collection in a database in the AWS us-east-2 region, but you can optionally configure the lexical analyzer and the reranker model.

For configuration details about the lexical analyzer, see Find data with CQL analyzers. Currently, only the standard Lucene analyzer is supported.

For configuration details about the reranker model, inspect the available reranker models. Currently, only the NVIDIA llama-3.2-nv-rerankqa-1b-v2 reranking model is supported.
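
The requirements above can be expressed as a quick check on a collection's options object. This is a sketch with a hypothetical function name. It treats lexical and rerank as enabled when they are omitted, matching the stated defaults, and it does not verify the database region.

```python
def supports_hybrid_search(options: dict) -> bool:
    """Check the option-level requirements for hybrid search.

    vector must be present. lexical and rerank count as enabled unless
    they are explicitly disabled. The database region is not checked.
    """
    has_vector = "vector" in options
    lexical_enabled = options.get("lexical", {}).get("enabled", True)
    rerank_enabled = options.get("rerank", {}).get("enabled", True)
    return bool(has_vector and lexical_enabled and rerank_enabled)


print(supports_hybrid_search({"vector": {"dimension": 1024, "metric": "cosine"}}))
print(supports_hybrid_search({"vector": {"dimension": 1024}, "lexical": {"enabled": False}}))
```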

  • Python

  • TypeScript

  • Java

  • curl

The Python client supports multiple ways to create a collection:

  • You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.

  • You can use a fluent interface to build the collection definition and then create the collection from the definition.

  • CollectionDefinition object

  • Fluent interface

from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CollectionDefinition,
    CollectionLexicalOptions,
    CollectionRerankOptions,
    CollectionVectorOptions,
    RerankServiceOptions,
    VectorServiceOptions,
)

# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

# Create a collection
collection_definition = CollectionDefinition(
    vector=CollectionVectorOptions(
        metric=VectorMetric.COSINE,
        dimension=1024,
        service=VectorServiceOptions(
            provider="nvidia",
            model_name="NV-Embed-QA",
        )
    ),
    lexical=CollectionLexicalOptions(
        analyzer="standard",
        enabled=True,
    ),
    rerank=CollectionRerankOptions(
        enabled=True,
        service=RerankServiceOptions(
            provider="nvidia",
            model_name="nvidia/llama-3.2-nv-rerankqa-1b-v2",
        ),
    ),
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric

# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

# Create a collection
collection_definition = (
    CollectionDefinition.builder()
    .set_vector_dimension(1024)
    .set_vector_metric(VectorMetric.COSINE)
    .set_vector_service(
        provider="nvidia",
        model_name="NV-Embed-QA",
    )
    .set_lexical("standard", enabled=True)
    .set_rerank("nvidia", "nvidia/llama-3.2-nv-rerankqa-1b-v2")
    .build()
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
  • Typed collections

  • Untyped collections

You can manually define a client-side type for your collection to help statically catch errors.

You can define $vector, $vectorize, and $lexical as inline fields in your interfaces, or you can extend the utility VectorDoc, VectorizeDoc, and LexicalDoc types provided by the client.

import { DataAPIClient, LexicalDoc, VectorizeDoc } from "@datastax/astra-db-ts";

// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");

// Define the type for the collection
interface User extends VectorizeDoc, LexicalDoc {
  name: string,
  age?: number,
}

(async function () {
  const collection = await database.createCollection<User>("COLLECTION_NAME", {
    vector: {
      dimension: 1024,
      metric: "cosine",
      service: {
        provider: "nvidia",
        modelName: "NV-Embed-QA",
      },
    },
    lexical: {
      enabled: true,
      analyzer: "standard",
    },
    rerank: {
      enabled: true,
      service: {
        provider: "nvidia",
        modelName: "nvidia/llama-3.2-nv-rerankqa-1b-v2",
      },
    },
  });
})();

If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.

The $vector field must still be number[] or DataAPIVector, and the $vectorize and $lexical fields must still be strings. Otherwise, type-related issues occur.

Consider using a type like VectorDoc & LexicalDoc & SomeDoc or VectorizeDoc & LexicalDoc & SomeDoc, which lets the documents remain otherwise untyped but still statically requires the $vector, $vectorize, and $lexical fields to have the correct types.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");

(async function () {
  const collection = await database.createCollection("COLLECTION_NAME", {
    vector: {
      dimension: 1024,
      metric: "cosine",
      service: {
        provider: "nvidia",
        modelName: "NV-Embed-QA",
      },
    },
    lexical: {
      enabled: true,
      analyzer: "standard",
    },
    rerank: {
      enabled: true,
      service: {
        provider: "nvidia",
        modelName: "nvidia/llama-3.2-nv-rerankqa-1b-v2",
      },
    },
  });
})();

The Java client supports multiple ways to create a collection:

  • You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.

  • You can use a fluent interface to build the collection definition and then create the collection from the definition.

  • CollectionDefinition object

  • Fluent interface

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.core.lexical.Analyzer;
import com.datastax.astra.client.core.lexical.LexicalOptions;
import com.datastax.astra.client.core.rerank.CollectionRerankOptions;
import com.datastax.astra.client.core.rerank.RerankServiceOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vector.VectorOptions;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;

import static com.datastax.astra.client.core.lexical.AnalyzerTypes.STANDARD;

public class CreateCollection {

    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");

        // Create a collection
        CollectionDefinition collectionDefinition = new CollectionDefinition();

        // Vector options
        VectorServiceOptions vectorService = new VectorServiceOptions()
            .provider("nvidia")
            .modelName("NV-Embed-QA");
        VectorOptions vectorOptions = new VectorOptions()
            .dimension(1024)
            .metric(SimilarityMetric.COSINE.getValue())
            .service(vectorService);
        collectionDefinition.vector(vectorOptions);

        // Lexical options
        LexicalOptions lexicalOptions = new LexicalOptions()
            .enabled(true)
            .analyzer(new Analyzer(STANDARD));
        collectionDefinition.lexical(lexicalOptions);

        // Rerank options
        RerankServiceOptions rerankService = new RerankServiceOptions()
            .modelName("nvidia/llama-3.2-nv-rerankqa-1b-v2")
            .provider("nvidia");
        CollectionRerankOptions rerankOptions = new CollectionRerankOptions()
            .enabled(true)
            .service(rerankService);
        collectionDefinition.rerank(rerankOptions);

        database.createCollection("COLLECTION_NAME", collectionDefinition);
    }
}
package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;

import static com.datastax.astra.client.core.lexical.AnalyzerTypes.STANDARD;

public class CreateCollection {

    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");

        // Create a collection
        database.createCollection("COLLECTION_NAME",
            new CollectionDefinition()
                .vector(1024, SimilarityMetric.COSINE)
                .vectorize("nvidia", "NV-Embed-QA")
                .lexical(STANDARD)
                .rerank("nvidia", "nvidia/llama-3.2-nv-rerankqa-1b-v2"));
    }
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "lexical": {
        "analyzer": "standard",
        "enabled": true
      },
      "rerank": {
        "enabled": true,
        "service": {
          "modelName": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
          "provider": "nvidia"
        }
      },
      "vector": {
        "dimension": 1024,
        "metric": "cosine",
        "service": {
          "provider": "nvidia",
          "modelName": "NV-Embed-QA"
        }
      }
    }
  }
}'
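
For readers who prefer to script the HTTP call, the curl request above can be approximated with Python's standard library. This is a minimal sketch, not a client implementation: build_create_collection_request is a hypothetical helper, the https://example.invalid endpoint and default_keyspace keyspace are stand-in values, and the request is only constructed, never sent.

```python
import json
import urllib.request


def build_create_collection_request(endpoint, keyspace, token, name, options):
    """Build (but don't send) the POST request for createCollection."""
    body = {"createCollection": {"name": name, "options": options}}
    return urllib.request.Request(
        url=f"{endpoint}/api/json/v1/{keyspace}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_collection_request(
    "https://example.invalid",  # stand-in for ASTRA_DB_API_ENDPOINT
    "default_keyspace",         # stand-in for ASTRA_DB_KEYSPACE
    "ASTRA_DB_APPLICATION_TOKEN",
    "COLLECTION_NAME",
    {"lexical": {"analyzer": "standard", "enabled": True}},
)
print(request.get_method(), request.full_url)

# To actually send the request, pass it to urllib.request.urlopen(request).
```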

Create a collection and specify the default ID format

For more information about the default ID format, see Document IDs.
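
The request body for this option is small enough to sketch in a few lines of standard-library Python. The helper name default_id_payload is hypothetical. The set of accepted types is an assumption: only objectId and uuidv7 appear in this section's examples, so check Document IDs for the authoritative list.

```python
import json

# Assumption: candidate default ID types; see Document IDs for the full list.
KNOWN_DEFAULT_ID_TYPES = {"uuid", "uuidv6", "uuidv7", "objectId"}


def default_id_payload(name, id_type):
    """Return a createCollection body that sets the default ID type."""
    if id_type not in KNOWN_DEFAULT_ID_TYPES:
        raise ValueError(f"unsupported defaultId type: {id_type!r}")
    return {
        "createCollection": {
            "name": name,
            "options": {"defaultId": {"type": id_type}},
        }
    }


payload = default_id_payload("COLLECTION_NAME", "objectId")
print(json.dumps(payload, indent=2))
```

The type string is case-sensitive in this sketch, so a value such as "ObjectID" is rejected locally instead of being sent to the API.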

  • Python

  • TypeScript

  • Java

  • curl

The Python client supports multiple ways to create a collection:

  • You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.

  • You can use a fluent interface to build the collection definition and then create the collection from the definition.

  • CollectionDefinition object

  • Fluent interface

from astrapy import DataAPIClient
from astrapy.info import (
    CollectionDefinition,
    CollectionDefaultIDOptions,
)
from astrapy.constants import DefaultIdType

# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

# Create a collection
collection_definition = CollectionDefinition(
    default_id=CollectionDefaultIDOptions(DefaultIdType.OBJECTID),
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import DefaultIdType

# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

# Create a collection
collection_definition = (
    CollectionDefinition.builder()
    .set_default_id(DefaultIdType.OBJECTID)
    .build()
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
  • Typed collections

  • Untyped collections

You can manually define a client-side type for your collection to help statically catch errors.

The _id field type should match the defaultId type.

import { DataAPIClient, ObjectId } from "@datastax/astra-db-ts";

// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");

// Define the type for the collection
interface User {
  _id: ObjectId,
  name: string,
  age?: number,
}

(async function () {
  const collection = await database.createCollection<User>("COLLECTION_NAME", {
    defaultId: {
      type: "objectId",
    },
  });
})();

If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.

However, if you later specify _id when you insert a document, DataStax recommends that it has the same type as the defaultId.

Consider using a type like { _id: ObjectId } & SomeDoc, which lets the documents remain otherwise untyped but still statically requires the _id field to have the correct type.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");

(async function () {
  const collection = await database.createCollection("COLLECTION_NAME", {
    defaultId: {
      type: "objectId",
    },
  });
})();
package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefaultIdTypes;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;

public class CreateCollection {

    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");

        // Create a collection
        CollectionDefinition collectionDefinition = new CollectionDefinition()
            .defaultId(CollectionDefaultIdTypes.OBJECT_ID);

        Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
    }
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "defaultId": {
        "type": "uuidv7"
      }
    }
  }
}'

Create a collection and specify which fields to index

For more information about selective indexing, see Indexes in collections.

  • Python

  • TypeScript

  • Java

  • curl

The Python client supports multiple ways to create a collection:

  • You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.

  • You can use a fluent interface to build the collection definition and then create the collection from the definition.

  • CollectionDefinition object

  • Fluent interface

from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition

# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

# Create a collection
collection_definition = CollectionDefinition(
    indexing={"allow": ["city", "country"]},
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition

# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

# Create a collection
collection_definition = (
    CollectionDefinition.builder()
    .set_indexing("allow", ["city", "country"])
    .build()
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
import { DataAPIClient } from "@datastax/astra-db-ts";

// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");

(async function () {
  const collection = await database.createCollection("COLLECTION_NAME", {
    indexing: {
      allow: ["city", "country"],
    },
  });
})();
package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;

public class CreateCollection {

    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");

        // Create a collection
        CollectionDefinition collectionDefinition = new CollectionDefinition()
            .indexingAllow("city", "country");

        Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
    }
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "indexing": {
        "allow": ["city", "country"]
      }
    }
  }
}'

Create a collection and specify which fields shouldn’t be indexed

For more information about selective indexing, see Indexes in collections.

  • Python

  • TypeScript

  • Java

  • curl

The Python client supports multiple ways to create a collection:

  • You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.

  • You can use a fluent interface to build the collection definition and then create the collection from the definition.

  • CollectionDefinition object

  • Fluent interface

from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition

# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

# Create a collection
collection_definition = CollectionDefinition(
    indexing={"deny": ["city", "country"]},
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition

# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)

# Create a collection
collection_definition = (
    CollectionDefinition.builder()
    .set_indexing("deny", ["city", "country"])
    .build()
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
import { DataAPIClient } from "@datastax/astra-db-ts";

// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");

(async function () {
  const collection = await database.createCollection("COLLECTION_NAME", {
    indexing: {
      deny: ["city", "country"],
    },
  });
})();
package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;

public class CreateCollection {

    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");

        // Create a collection
        CollectionDefinition collectionDefinition = new CollectionDefinition()
            .indexingDeny("city", "country");

        Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
    }
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "indexing": {
        "deny": ["city", "country"]
      }
    }
  }
}'
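
The allow and deny examples differ only in the key placed under indexing. The Python sketch below builds either form and, as an assumption about the API, treats the two keys as mutually exclusive. The helper name indexing_options is hypothetical, and the sketch only builds request bodies without sending them.

```python
import json


def indexing_options(allow=None, deny=None):
    """Return the "indexing" object with exactly one of allow or deny.

    Assumption: a collection definition uses allow or deny, never both.
    """
    if (allow is None) == (deny is None):
        raise ValueError("specify exactly one of allow or deny")
    key, fields = ("allow", allow) if allow is not None else ("deny", deny)
    return {key: list(fields)}


def create_collection_payload(name, indexing):
    """Wrap indexing options in a createCollection request body."""
    return {"createCollection": {"name": name, "options": {"indexing": indexing}}}


allow_payload = create_collection_payload(
    "COLLECTION_NAME", indexing_options(allow=["city", "country"])
)
deny_payload = create_collection_payload(
    "COLLECTION_NAME", indexing_options(deny=["city", "country"])
)
print(json.dumps(allow_payload, indent=2))
print(json.dumps(deny_payload, indent=2))
```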

Client reference

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the client reference.

Client reference documentation is not applicable for HTTP.

© 2025 DataStax | Privacy policy | Terms of use | Manage Privacy Choices
