Create a collection

Creates a new collection in a Serverless (Vector) database.

Method signature

  • Python

  • TypeScript

  • Java

  • curl

The signature of this command changed in Python client version 2.0-preview.

If you are using client version 2.0-preview or later, see the description of this change in the Data API client upgrade guide.

database.create_collection(
  name: str,
  *,
  keyspace: str,
  dimension: int,
  metric: str,
  service: CollectionVectorServiceOptions | dict[str, Any],
  indexing: dict[str, Any],
  default_id_type: str,
  additional_options: dict[str, Any],
  check_exists: bool,
  max_time_ms: int,
  embedding_api_key: str | EmbeddingHeadersProvider,
  collection_max_time_ms: int,
) -> Collection

database.createCollection<Schema extends SomeDoc = SomeDoc>(
  collectionName: string,
  options?: {
    checkExists?: boolean,
    vector?: VectorOptions,
    indexing?: IndexingOptions<Schema>,
    keyspace?: string,
    defaultId?: DefaultIdOptions,
    embeddingApiKey?: string | EmbeddingHeadersProvider | null,
    defaultMaxTimeMS?: number | null,
    maxTimeMS?: number,
  }): Promise<Collection<Schema>>

The signature of this command changed in Java client version 2.0-preview.

If you are using client version 2.0-preview or later, see the description of this change in the Data API client upgrade guide.

Collection<Document> createCollection(String collectionName)
Collection<Document> createCollection(
  String collectionName,
  int dimension,
  SimilarityMetric metric
)
<T> Collection<T> createCollection(
  String collectionName,
  int dimension,
  SimilarityMetric metric,
  Class<T> documentClass
)
<T> Collection<T> createCollection(
  String collectionName,
  Class<T> documentClass
)
Collection<Document> createCollection(
  String collectionName,
  CollectionOptions collectionOptions
)
<T> Collection<T> createCollection(
  String collectionName,
  CollectionOptions collectionOptions,
  Class<T> documentClass
)
Collection<Document> createCollection(
  String collectionName,
  CollectionOptions collectionOptions,
  CommandOptions<?> commandOptions
)
<T> Collection<T> createCollection(
  String collectionName,
  CollectionOptions collectionOptions,
  CommandOptions<?> commandOptions,
  Class<T> documentClass
)

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": OPTIONS
  }
}'
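
The request body in the curl signature can also be assembled programmatically. The following is a minimal Python sketch that only builds the JSON and does not send it. The helper name build_create_collection_payload is hypothetical and is not part of any client library.

```python
import json
from typing import Any, Optional


def build_create_collection_payload(
    name: str, options: Optional[dict[str, Any]] = None
) -> str:
    """Return the JSON body for a createCollection command.

    Hypothetical helper for illustration; it mirrors the curl body above.
    """
    command = {
        "name": name,
        # An empty options object is sent when no options are given.
        "options": options or {},
    }
    return json.dumps({"createCollection": command}, indent=2)


body = build_create_collection_payload(
    "COLLECTION_NAME",
    {"vector": {"dimension": 5, "metric": "cosine"}},
)
print(body)
```

The resulting string could be passed as the --data argument of the curl command or as the body of any HTTP client request.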

Result

  • Python

  • TypeScript

  • Java

  • curl

Creates a collection with the specified parameters.

Returns a Collection object. You can use this object to work with documents in the collection.

Example response:

Collection(name="COLLECTION_NAME", keyspace="default_keyspace", database=Database(api_endpoint="ASTRA_DB_API_ENDPOINT", token="APPLICATION_TOKEN", keyspace="default_keyspace"))

Creates a collection with the specified parameters.

Returns a promise that resolves to a Collection<Schema> object. You can use this object to work with documents in the collection.

A Collection is typed as Collection<Schema> where Schema is the user-defined type of the documents in the collection. If you provide a specific schema, operations on the collection are strongly typed. Otherwise, they are weakly typed.

Creates a collection with the specified parameters.

Returns a Collection object. You can use this object to work with documents in the collection.

Creates a collection with the specified parameters.

If the command succeeds, the response indicates the success.

Example response:

{
  "status": {
    "ok": 1
  }
}
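
A script that calls the HTTP endpoint directly could check for this success marker. The sketch below is an illustration only: the function name collection_created is hypothetical, and because the shape of an unsuccessful response is not shown in this section, anything other than status.ok equal to 1 is treated as not successful.

```python
import json


def collection_created(response_text: str) -> bool:
    """Return True when the response body reports status.ok == 1."""
    response = json.loads(response_text)
    return response.get("status", {}).get("ok") == 1


example_response = '{"status": {"ok": 1}}'
print(collection_created(example_response))
```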

Parameters

The required and valid parameters depend on whether the collection will store vector data and your embedding generation method. For more information, see Manage collections and tables.

You can’t edit a collection’s parameters after you create the collection.

  • Python

  • TypeScript

  • Java

  • curl

Name | Type | Summary
name | str | The name of the collection.
keyspace | Optional[str] | The keyspace where the collection is to be created. If not specified, the database’s working keyspace is used.
dimension | Optional[int] | For vector collections, the dimension of the vectors, which is the number of their components. If you’re not sure what dimension to set, use the dimension of the vectors that your embedding model produces.
metric | Optional[str] | The similarity metric used for vector searches. Allowed values are VectorMetric.DOT_PRODUCT, VectorMetric.EUCLIDEAN, or VectorMetric.COSINE (default).
service | Optional[CollectionVectorServiceOptions] | The service definition for vector embeddings. Required for vector collections that generate embeddings automatically. This is an instance of CollectionVectorServiceOptions, which defines the provider and model_name, and other optional settings, such as authentication. This parameter can also be a simple dictionary. authentication is an object defining how to authenticate with the embedding provider. For example, {"providerKey": "API_KEY_NAME"}, where API_KEY_NAME is the name of your embedding provider key in the Astra DB KMS.
indexing | Optional[Dict[str, Any]] | Optional specification for selective indexing of the collection, in the form of a dictionary such as {"deny": […]} or {"allow": […]}.
default_id_type | Optional[str] | Sets the default ID type that the API server generates when inserting documents that don’t explicitly specify an _id field. Allowed values are DefaultIdType.UUID, DefaultIdType.OBJECTID, DefaultIdType.UUIDV6, DefaultIdType.UUIDV7, and DefaultIdType.DEFAULT.
additional_options | Optional[Dict[str, Any]] | Any further key-value pairs to add to the "options" part of the payload when sending the Data API command to create a collection.
max_time_ms | Optional[int] | A timeout, in milliseconds, for the underlying HTTP request.
embedding_api_key | Optional[str] | An alternative to authentication in CollectionVectorServiceOptions. Provide the API key directly instead of using an API key in the Astra DB KMS. The API key is passed to the Data API with each request in the form of an x-embedding-api-key HTTP header. This parameter is not stored on the database, and it is used by the Collection instance only when issuing reads or writes on the collection. This is useful for creating collections with an embedding service without specifying an authentication in the service configuration. embedding_api_key overrides the Astra DB KMS API key if you set both.
collection_max_time_ms | Optional[int] | A default timeout, in milliseconds, for the duration of each operation on the collection. A timeout provided to an individual collection method call takes precedence over this default. For methods that involve multiple API calls (such as delete_many and insert_many), provide a timeout long enough for the whole operation. This parameter is not stored on the database. It is used by the Collection instance only when issuing reads or writes on the collection.
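
The precedence between a per-call timeout and collection_max_time_ms can be pictured with a small sketch. This is not the client's implementation: the function effective_timeout_ms is a hypothetical name used only to illustrate the rule described in the table.

```python
from typing import Optional


def effective_timeout_ms(
    call_timeout_ms: Optional[int],
    collection_default_ms: Optional[int],
) -> Optional[int]:
    """Pick the timeout that applies to one operation.

    A timeout passed to the individual method call wins. Otherwise the
    collection-level default applies. None means that no timeout was
    configured at either level.
    """
    if call_timeout_ms is not None:
        return call_timeout_ms
    return collection_default_ms


# No per-call timeout: the collection default applies.
print(effective_timeout_ms(None, 10_000))
# A per-call timeout overrides the collection default.
print(effective_timeout_ms(60_000, 10_000))
```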

Name | Type | Summary
collectionName | string | The name of the collection to create.
options? | CreateCollectionOptions<Schema> | The options for creating the collection.

The vector option includes the following fields:

  • dimension: The dimension for the vector in the collection.

  • metric: The similarity metric to use for vector search.

  • service.provider: The name of the embedding provider. Required for vector collections that generate embeddings automatically.

  • service.modelName: The model name for vector embeddings.

  • service.authentication: An object defining how to authenticate with the embedding provider. For example, {providerKey: 'API_KEY_NAME'}, where API_KEY_NAME is the name of your embedding provider key in the Astra DB KMS.

Name | Type | Summary
vector? | VectorOptions | The vector configuration for the collection, such as the vector dimension and similarity metric. If not set, the collection does not support vector search. If you’re not sure what dimension to set, use the dimension of the vectors that your embedding model produces.
indexing? | IndexingOptions<Schema> | The selective indexing configuration for the collection.
defaultId? | DefaultIdOptions | The defaultId configuration for the collection, for when a document does not specify an _id field.
keyspace? | string | Overrides the keyspace where the collection is created. If not set, the database’s working keyspace is used.
embeddingApiKey? | string | An alternative to service.authentication.providerKey for the embedding provider. Provide the API key directly instead of using an API key in the Astra DB KMS. embeddingApiKey overrides the Astra DB KMS API key if you set both.
defaultMaxTimeMS? | number | The default maxTimeMS for each operation on the Collection.
maxTimeMS? | number | Maximum time in milliseconds the client should wait for the operation to complete.

Name | Type | Summary
collectionName | String | The name of the collection.
dimension | int | The dimension for the vectors in the collection. If you’re not sure what dimension to set, use the dimension of the vectors that your embedding model produces.
metric | SimilarityMetric | The similarity metric to use for vector search: SimilarityMetric.COSINE (default), SimilarityMetric.DOT_PRODUCT, or SimilarityMetric.EUCLIDEAN.
collectionOptions | CollectionOptions | Fine-grained settings with vector, embedding provider, model name, authentication, selective indexing, and defaultId options.
clazz | Class<T> | The class of the specialized bean to use for documents in the collection instead of the default Document type.

Name | Type | Summary
createCollection | command | The Data API command to create a collection in a Serverless (Vector) database. It acts as a container for all the attributes and settings required to create the collection.
name | string | The name of the new collection. This must be unique within the database specified in the request URL.
options.defaultId | object | (Optional) Controls how the Data API allocates an _id for each document that doesn’t specify an ID value in the request. For backwards compatibility with Data API releases before version 1.0.3, if you omit the defaultId option on createCollection, a document’s _id value is a plain string version of a random-based version 4 UUID.
options.defaultId.type | string | If you include defaultId, you must set type to one of objectId, uuidv7, uuidv6, or uuid.
options.vector | object | (Optional, recommended) Creates a vector-enabled collection. Vector-enabled collections can store either vector or non-vector data. Collections that aren’t vector-enabled can’t store vector data.
options.vector.dimension | int | The dimension for vector embeddings in the collection. If you’re not sure what dimension to set, use the dimension of the vectors that your embedding model produces. This can be optional for vectorize, if the specified vector.service.modelName has a default dimension value. For more information, see the documentation for your embedding provider integration.
options.vector.metric | string | The similarity metric to use for vector search: cosine (default), dot_product, or euclidean.
options.vector.service | object | (Optional) Configure a vectorize embedding provider integration.
options.vector.service.provider | string | The vectorize embedding provider name.
options.vector.service.modelName | string | A valid model name for the specified vectorize embedding provider.
options.vector.service.authentication | object | Use credentials stored in Astra DB KMS to authenticate with your vectorize embedding provider. In options.vector.service.authentication.providerKey, provide the credential’s API Key name as given in Astra DB KMS. Alternatively, you can omit the authentication object and provide the authentication key in an x-embedding-api-key header instead. If you use header authentication, you must provide the x-embedding-api-key header with every command that requires vectorize for this collection, including loading data and vector search with vectorize.
options.vector.service.parameters | object | Your embedding provider might require additional parameters. Use findEmbeddingProviders or see the documentation for your embedding provider integration.
options.indexing | object | (Optional) Enable selective indexing for data loaded to the collection. If you specify indexing, you must also specify either an allow or deny clause.
options.indexing.allow | array | Either allow or deny is required if you specify indexing. Provide an array of one or more properties to index. Alternatively, you can enter a wildcard "allow": ["*"] to index all properties during an update operation. This is the same as the default behavior if you omit indexing.
options.indexing.deny | array | Either allow or deny is required if you specify indexing. Provide an array of one or more properties to not index. If you enter a wildcard "deny": ["*"], then no properties are indexed during an update operation.
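
Two of the constraints in this table (the allowed defaultId types, and the requirement that indexing carries exactly one of allow or deny) can be checked before a request is sent. The following Python sketch is a client-side illustration under those stated rules. The function check_collection_options is hypothetical, and the server remains the authority on what it accepts.

```python
from typing import Any

# Values listed for options.defaultId.type in the table above.
ALLOWED_DEFAULT_ID_TYPES = {"objectId", "uuidv7", "uuidv6", "uuid"}


def check_collection_options(options: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a createCollection options dict."""
    problems: list[str] = []

    default_id = options.get("defaultId")
    if default_id is not None:
        if default_id.get("type") not in ALLOWED_DEFAULT_ID_TYPES:
            problems.append(
                "defaultId.type must be one of objectId, uuidv7, uuidv6, uuid"
            )

    indexing = options.get("indexing")
    if indexing is not None:
        has_allow = "allow" in indexing
        has_deny = "deny" in indexing
        # Exactly one of allow or deny is expected when indexing is present.
        if has_allow == has_deny:
            problems.append("indexing requires exactly one of allow or deny")

    return problems


print(check_collection_options({"indexing": {"allow": ["city"], "deny": ["blob"]}}))
```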

Examples

The following examples demonstrate how to create a collection.

  • Python

  • TypeScript

  • Java

  • curl

Create a collection that is not vector-enabled:

collection = database.create_collection("COLLECTION_NAME")

Create a collection to store vector data and provide embeddings when you load data:

from astrapy.constants import VectorMetric

collection = database.create_collection(
    "COLLECTION_NAME",
    dimension=5,
    metric=VectorMetric.COSINE,
)
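
To make the metric choice more concrete, the three similarity metrics can be written out for two 5-dimension vectors. These are the textbook definitions in plain Python, offered as a sketch. How the Data API converts them into similarity scores is not covered here, and the function names are illustrative.

```python
import math
from typing import Sequence


def _check_same_dimension(a: Sequence[float], b: Sequence[float]) -> None:
    # Vectors in one collection share a single dimension.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    _check_same_dimension(a, b)
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    _check_same_dimension(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Undefined for zero vectors: raises ZeroDivisionError in that case.
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


u = [0.1, 0.2, 0.3, 0.4, 0.5]
v = [0.5, 0.4, 0.3, 0.2, 0.1]
print(dot_product(u, v), euclidean_distance(u, v), cosine_similarity(u, v))
```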

Create a new collection that generates vector embeddings automatically with vectorize.

To automatically generate embeddings, you must enable the corresponding embedding provider integration, add the embedding provider API key in the Astra KMS, and make sure your database can access the embedding provider service. You can use the Data API to find supported embedding providers and their configuration parameters.

As an alternative to Astra KMS authentication, you can do one of the following:

  • Use the Astra-hosted NVIDIA embedding provider integration, if your database meets the cloud provider and region requirements.

  • Use header authentication to manually provide the embedding provider credentials with every request that requires embedding generation, including loading data and vector search with vectorize. For more information, see Vector and vectorize and the explanation of the embedding_api_key parameter in this command’s Parameters.

from astrapy.info import CollectionVectorServiceOptions
from astrapy.constants import VectorMetric

collection = database.create_collection(
    "COLLECTION_NAME",
    metric=VectorMetric.DOT_PRODUCT,
    dimension=1536,
    service=CollectionVectorServiceOptions(
        provider="openai",
        model_name="text-embedding-3-small",
        authentication={
            "providerKey": "API_KEY_NAME",
        },
    ),
)

Create a new collection with default document IDs of type ObjectID:

from astrapy.constants import DefaultIdType

collection = database.create_collection(
    "COLLECTION_NAME",
    default_id_type=DefaultIdType.OBJECTID,
)

Create a new collection with selective indexing:

collection = database.create_collection(
    "COLLECTION_NAME",
    indexing={"allow": ["city", "country"]},
)

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")

# Create a non-vector collection
collection_simple = database.create_collection("NON_VECTOR_COLLECTION_NAME")

# Create a vector collection
collection_vector = database.create_collection(
    "VECTOR_COLLECTION_NAME",
    dimension=3,
    metric=astrapy.constants.VectorMetric.COSINE,
)

# Create a collection with UUIDv6 as default IDs
from astrapy.constants import DefaultIdType, SortDocuments

collection_uuid6 = database.create_collection(
    "UUIDV6_COLLECTION_NAME",
    default_id_type=DefaultIdType.UUIDV6,
)

collection_uuid6.insert_one({"desc": "a document", "seq": 0})
collection_uuid6.insert_one({"_id": 123, "desc": "another", "seq": 1})
doc_ids = [
    doc["_id"]
    for doc in collection_uuid6.find({}, sort={"seq": SortDocuments.ASCENDING})
]
print(doc_ids)
#  Will print: [UUID('1eef29eb-d587-6779-adef-45b95ef13497'), 123]
print(doc_ids[0].version)
#  Will print: 6
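
The printed values show that a server-generated _id and a caller-supplied _id can have different types in the same collection. The sketch below uses only the Python standard library to tell them apart, assuming the generated IDs behave like standard uuid.UUID objects. The function describe_id is a hypothetical helper.

```python
import uuid

# The generated _id printed by the example above, parsed with the standard library.
generated_id = uuid.UUID("1eef29eb-d587-6779-adef-45b95ef13497")


def describe_id(doc_id: object) -> str:
    """Label an _id as a UUID (with its version) or as a caller-supplied value."""
    if isinstance(doc_id, uuid.UUID):
        return f"UUID version {doc_id.version}"
    return f"non-UUID value of type {type(doc_id).__name__}"


for doc_id in [generated_id, 123]:
    print(describe_id(doc_id))
# Prints:
#   UUID version 6
#   non-UUID value of type int
```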

Create a new collection that is not vector-enabled.

const collection = await db.createCollection('COLLECTION_NAME');

Create a new collection to store vector data.

const collection = await db.createCollection<Schema>('COLLECTION_NAME', {
  vector: {
    dimension: 5,
    metric: 'cosine',
  },
});

Create a new collection that generates vector embeddings automatically.

To automatically generate embeddings, you must enable the corresponding embedding provider integration, add the embedding provider API key in the Astra KMS, and make sure your database can access the embedding provider service. You can use the Data API to find supported embedding providers and their configuration parameters.

As an alternative to Astra KMS authentication, you can do one of the following:

  • Use the Astra-hosted NVIDIA embedding provider integration, if your database meets the cloud provider and region requirements.

  • Use header authentication to manually provide the embedding provider credentials with every request that requires embedding generation, including loading data and vector search with vectorize. For more information, see the explanation of the embeddingApiKey optional parameter in the Options table and Vector and vectorize.

const collection = await db.createCollection<Schema>('COLLECTION_NAME', {
  vector: {
    dimension: 1536,
    metric: 'dot_product',
    service: {
      provider: 'openai',
      modelName: 'text-embedding-3-small',
      authentication: {
        providerKey: 'API_KEY_NAME',
      },
    },
  },
});

Example:

import { DataAPIClient, VectorDoc } from '@datastax/astra-db-ts';

// Get a new Db instance
const db = new DataAPIClient('TOKEN').db('API_ENDPOINT');

// Define the schema for the collection
interface User extends VectorDoc {
  name: string,
  age?: number,
}

(async function () {
  // Create a basic untyped non-vector collection
  const users1 = await db.createCollection('users');
  await users1.insertOne({ name: 'John' });

  // Typed collection with custom options in a non-default keyspace
  const users2 = await db.createCollection<User>('users', {
    keyspace: 'KEYSPACE_NAME',
    defaultId: {
      type: 'objectId',
    },
    vector: {
      dimension: 5,
      metric: 'cosine',
    },
  });
  await users2.insertOne({ name: 'John', $vector: [.12, .62, .87, .16, .72] });
})();

Create a collection to store vector data.

Based on the collection parameters, you can provide embeddings when you load data or automatically generate embeddings with vectorize.

To automatically generate embeddings, you must enable the corresponding embedding provider integration, add the embedding provider API key in the Astra KMS, and make sure your database can access the embedding provider service. You can use the Data API to find supported embedding providers and their configuration parameters.

As an alternative to Astra KMS authentication, you can do one of the following:

  • Use the Astra-hosted NVIDIA embedding provider integration, if your database meets the cloud provider and region requirements.

  • Use header authentication to manually provide the embedding provider credentials with every request that requires embedding generation, including loading data and vector search with vectorize. For more information, see the explanation of the collectionOptions parameter in the Parameters table and Vector and vectorize.

// Given a `db` Database object, create a new collection.
// `collectionOptions` is a CollectionOptions object built elsewhere.

// Create a simple collection with the given name.
Collection<Document> simple1 = db
  .createCollection("COLLECTION_NAME");
Collection<MyBean> simple2 = db
  .createCollection("COLLECTION_NAME", MyBean.class);

// Create collections with vector options
Collection<Document> vector1 = db.createCollection(
  "COLLECTION_NAME",
  5,
  SimilarityMetric.COSINE);
Collection<MyBean> vector2 = db.createCollection(
  "COLLECTION_NAME",
  5,
  SimilarityMetric.COSINE,
  MyBean.class);

// Full-fledged CollectionOptions created with a builder
Collection<Document> full1 = db.createCollection(
   "COLLECTION_NAME",
   collectionOptions);
Collection<MyBean> full2 = db.createCollection(
   "COLLECTION_NAME",
   collectionOptions,
   MyBean.class);

Example:

package com.datastax.astra.client.database;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.model.CollectionIdTypes;
import com.datastax.astra.client.model.CollectionOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.SimilarityMetric;

public class CreateCollection {
  public static void main(String[] args) {

    Database db = new Database(
            System.getenv("ASTRA_DB_API_ENDPOINT"),
            System.getenv("ASTRA_DB_APPLICATION_TOKEN"));

    // Create a non-vector collection
    Collection<Document> simple1 = db.createCollection("col");

    // Default Id Collection
    Collection<Document> defaultId = db.createCollection("defaultId", CollectionOptions
            .builder()
            .defaultIdType(CollectionIdTypes.OBJECT_ID)
            .build());

    // -- Indexing
    Collection<Document> indexingDeny = db.createCollection("indexing1", CollectionOptions
              .builder()
              .indexingDeny("blob")
              .build());
    // Create a collection with indexing (allow) - cannot use allow and deny at the same time
    Collection<Document> indexingAllow = db.createCollection("allow1", CollectionOptions
            .builder()
            .indexingAllow("metadata")
            .build());

    // Vector
    Collection<Document> vector1 = db.createCollection("vector1", 14, SimilarityMetric.DOT_PRODUCT);

    // Create a vector collection
    Collection<Document> vector2 = db.createCollection("vector2", CollectionOptions
      .builder()
      .vectorDimension(1536)
      .vectorSimilarity(SimilarityMetric.EUCLIDEAN)
      .build());

    // Create a vectorize collection that expects the embedding API key in a request header
    Collection<Document> collection_vectorize_header = db.createCollection(
            "collection_vectorize_header",
            // Create collection with a Service in vectorize (No API KEY)
            CollectionOptions.builder()
                    .vectorDimension(1536)
                    .vectorSimilarity(SimilarityMetric.DOT_PRODUCT)
                    .vectorize("openai", "text-embedding-ada-002")
                    .build());

    // Create a vectorize collection that uses a shared API key identified by name
    Collection<Document> collection_vectorize_shared_key = db.createCollection(
            "collection_vectorize_shared_key",
            // Create collection with a Service in vectorize (API key name "OPENAI_API_KEY")
            CollectionOptions.builder()
                    .vectorDimension(1536)
                    .vectorSimilarity(SimilarityMetric.DOT_PRODUCT)
                    .vectorize("openai", "text-embedding-ada-002", "OPENAI_API_KEY" )
                    .build());

  }
}

Create a collection that isn’t vector-enabled:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {}
  }
}' | jq

Create a vector-enabled collection where you plan to provide embeddings when you load data. This example also sets the defaultId type for documents loaded into the collection.

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "defaultId": {
        "type": "uuidv7"
      },
      "vector": {
        "dimension": 5,
        "metric": "cosine"
      }
    }
  }
}' | jq

Create a vector-enabled collection that automatically generates embeddings with vectorize.

To automatically generate embeddings, you must enable the corresponding embedding provider integration, add the embedding provider API key in the Astra KMS, and make sure your database can access the embedding provider service. You can use the Data API to find supported embedding providers and their configuration parameters.

As an alternative to Astra KMS authentication, you can do one of the following:

  • Use the Astra-hosted NVIDIA embedding provider integration, if your database meets the cloud provider and region requirements.

  • Use header authentication to manually provide the embedding provider credentials with every request that requires embedding generation, including loading data and vector search with vectorize. For more information, see the explanation for options.vector.service.authentication in the Parameters table and Vector and vectorize.

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "vector": {
        "dimension": 1536,
        "metric": "cosine",
        "service": {
          "provider": "openai",
          "modelName": "text-embedding-3-small",
          "authentication": {
            "providerKey": "ASTRA_KMS_API_KEY_NAME"
          }
        }
      }
    }
  }
}' | jq
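
The header authentication alternative described above can be sketched in Python. The example builds the request headers and an options object that omits the authentication block. It is an illustration under stated assumptions, not a reference implementation: the function names and the EMBEDDING_PROVIDER_API_KEY placeholder are hypothetical, and nothing is sent over the network.

```python
from typing import Any, Optional


def build_headers(
    application_token: str, embedding_api_key: Optional[str] = None
) -> dict[str, str]:
    """Build Data API request headers, optionally with header authentication."""
    headers = {
        "Token": application_token,
        "Content-Type": "application/json",
    }
    if embedding_api_key is not None:
        # Sent instead of an authentication object in the service definition.
        headers["x-embedding-api-key"] = embedding_api_key
    return headers


def vectorize_options_without_kms() -> dict[str, Any]:
    """Return vectorize options that omit service.authentication."""
    return {
        "vector": {
            "dimension": 1536,
            "metric": "cosine",
            "service": {
                "provider": "openai",
                "modelName": "text-embedding-3-small",
            },
        }
    }


headers = build_headers("ASTRA_DB_APPLICATION_TOKEN", "EMBEDDING_PROVIDER_API_KEY")
options = vectorize_options_without_kms()
print(sorted(headers))
print(options["vector"]["service"])
```

With this approach, the same x-embedding-api-key header has to accompany every later command that requires vectorize for the collection, such as loading data or running a vector search with vectorize.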

Client reference

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the client reference.

For more information, see the client reference.

For more information, see the client reference.

Client reference documentation is not applicable for HTTP.

© 2025 DataStax