Create a collection

This documentation reflects client version 1.5.x. For 2.x documentation, refer to the client reference for Python, TypeScript, or Java.

Creates a new collection in a Serverless (Vector) database.

Method signature

The signature of this command changed in Python client version 2.0-preview.

If you are using client version 2.0-preview or later, see the description of this change in the Data API client upgrade guide.

The following method belongs to the astrapy.Database class.

python
create_collection(
  name: str,
  *,
  keyspace: str,
  dimension: int,
  metric: str,
  service: CollectionVectorServiceOptions | dict[str, Any],
  indexing: dict[str, Any],
  default_id_type: str,
  additional_options: dict[str, Any],
  check_exists: bool,
  max_time_ms: int,
  embedding_api_key: str | EmbeddingHeadersProvider,
  collection_max_time_ms: int,
) -> Collection

Result

Creates a collection with the specified parameters.

Returns a Collection object. You can use this object to work with documents in the collection.

Example response:

shell
Collection(name="COLLECTION_NAME", keyspace="default_keyspace", database=Database(api_endpoint="ASTRA_DB_API_ENDPOINT", token="APPLICATION_TOKEN", keyspace="default_keyspace"))

Parameters

The required and valid parameters depend on whether the collection will store vector data and your embedding generation method. For more information, see Manage collections and tables.

You can’t edit a collection’s parameters after you create the collection.

Name Type Summary

name

str

The name of the collection.

keyspace

Optional[str]

The keyspace where the collection is to be created. If not specified, the database’s working keyspace is used.

dimension

Optional[int]

For vector collections, the dimension of the vectors, which is the number of their components. If you’re not sure what dimension to set, use the dimension of the vectors that your embedding model produces.

metric

Optional[str]

The similarity metric used for vector searches. Allowed values are VectorMetric.DOT_PRODUCT, VectorMetric.EUCLIDEAN or VectorMetric.COSINE (default).

service

Optional[CollectionVectorServiceOptions | dict[str, Any]]

The service definition for vector embeddings. Required for vector collections that generate embeddings automatically.

This is an instance of CollectionVectorServiceOptions, which defines the provider and model_name, and other optional settings, such as authentication. This parameter can also be a simple dictionary.

authentication is an object that defines how to authenticate with the embedding provider. For example, {"providerKey": "API_KEY_NAME"}, where API_KEY_NAME is the name of your embedding provider key in the Astra DB KMS.

indexing

Optional[Dict[str, Any]]

Optional specification for selective indexing of the collection, in the form of a dictionary such as {"deny": […​]} or {"allow": […​]}.

default_id_type

Optional[str]

The default ID type that the API server generates when you insert documents that don’t explicitly specify an _id field. Allowed values are DefaultIdType.UUID, DefaultIdType.OBJECTID, DefaultIdType.UUIDV6, DefaultIdType.UUIDV7, and DefaultIdType.DEFAULT.

additional_options

Optional[Dict[str, Any]]

Any further set of key-value pairs that will be added to the "options" part of the payload when sending the Data API command to create a collection.

check_exists

Optional[bool]

Whether to check for an existing collection with the same name before attempting to create the collection. If True, an error is raised when the collection already exists. If False, the creation is attempted anyway, and for a preexisting collection the command succeeds or fails depending on whether the options match.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request.

embedding_api_key

Optional[str | EmbeddingHeadersProvider]

An alternative to authentication in CollectionVectorServiceOptions. Provide the API key directly instead of using an API key in the Astra DB KMS. The API key is passed to the Data API with each request in the form of an x-embedding-api-key HTTP header.

This parameter is not stored on the database, and it is used by the Collection instance only when issuing reads or writes on the collection.

This is useful for creating collections with an embedding service without specifying an authentication in the service configuration.

embedding_api_key overrides the Astra DB KMS API key if you set both.

collection_max_time_ms

Optional[int]

A default timeout, in milliseconds, for each operation on the collection. A timeout passed to an individual collection method call takes precedence over this default. For methods that involve multiple API calls, such as delete_many and insert_many, provide a timeout long enough for the whole operation.

This parameter is not stored on the database. It is used only by the Collection instance when issuing reads or writes on the collection.
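
As a rough mental model of how these parameters fit together, the following sketch assembles a createCollection-style command body from the same arguments. The helper build_create_collection_payload is hypothetical, and the JSON key names ("createCollection", "vector", "defaultId") are assumptions about the Data API command format. The client builds the real payload internally, so treat this only as an illustration of where additional_options ends up.

```python
from typing import Any, Optional


def build_create_collection_payload(
    name: str,
    *,
    dimension: Optional[int] = None,
    metric: Optional[str] = None,
    service: Optional[dict[str, Any]] = None,
    indexing: Optional[dict[str, Any]] = None,
    default_id_type: Optional[str] = None,
    additional_options: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a createCollection-style payload."""
    options: dict[str, Any] = {}

    # Assumption: vector-related settings are grouped under one "vector" key.
    vector = {
        key: value
        for key, value in {
            "dimension": dimension,
            "metric": metric,
            "service": service,
        }.items()
        if value is not None
    }
    if vector:
        options["vector"] = vector
    if indexing is not None:
        options["indexing"] = indexing
    if default_id_type is not None:
        # Assumption: the default ID type is sent as {"defaultId": {"type": ...}}.
        options["defaultId"] = {"type": default_id_type}

    # Extra key-value pairs are added to the same "options" object.
    options.update(additional_options or {})

    command: dict[str, Any] = {"name": name}
    if options:
        command["options"] = options
    return {"createCollection": command}


payload = build_create_collection_payload(
    "COLLECTION_NAME",
    dimension=5,
    metric="cosine",
    indexing={"allow": ["city", "country"]},
)
print(payload)
```

Because none of these options can be edited after creation, it can help to review the full set of options in one place before you call create_collection.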

Examples

The following examples demonstrate how to create a collection.

Create a collection that is not vector-enabled:

python
collection = database.create_collection("COLLECTION_NAME")

Create a collection to store vector data and provide embeddings when you load data:

python
from astrapy.constants import VectorMetric

collection = database.create_collection(
    "COLLECTION_NAME",
    dimension=5,
    metric=VectorMetric.COSINE,
)
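
To make the choice of metric more concrete, the following standalone sketch computes the three underlying quantities for a few 5-dimensional vectors in plain Python. It does not call the database, and the function names are illustrative. The similarity scores that the Data API reports may be scaled or normalized differently from these raw values.

```python
import math


def dot_product(a, b):
    """Sum of the component-wise products."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


query = [1.0, 0.0, 0.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0, 0.0, 0.0]
orthogonal = [0.0, 3.0, 0.0, 0.0, 0.0]

# Cosine ignores vector length: same direction scores 1.0, orthogonal 0.0.
print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))

# Dot product and Euclidean distance are both sensitive to vector length.
print(dot_product(query, same_direction))
print(euclidean_distance(query, same_direction))
```

Cosine similarity depends only on direction, which is one reason it is a common default for text embeddings.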

Create a new collection that generates vector embeddings automatically with vectorize.

To automatically generate embeddings, you must enable the corresponding embedding provider integration, add the embedding provider API key in the Astra KMS, and make sure your database can access the embedding provider service. You can use the Data API to find supported embedding providers and their configuration parameters.

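As an alternative to Astra KMS authentication, you can provide the embedding provider API key directly with the embedding_api_key parameter. The following example uses Astra KMS authentication: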

python
from astrapy.info import CollectionVectorServiceOptions
from astrapy.constants import VectorMetric

collection = database.create_collection(
    "COLLECTION_NAME",
    metric=VectorMetric.DOT_PRODUCT,
    dimension=1536,
    service=CollectionVectorServiceOptions(
        provider="openai",
        model_name="text-embedding-3-small",
        authentication={
            "providerKey": "API_KEY_NAME",
        },
    ),
)
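
The embedding_api_key parameter described above is sent with each request as an x-embedding-api-key HTTP header instead of being stored with the collection. The following sketch shows that idea in isolation. The helper build_request_headers is hypothetical, and the "Token" header name used for the application token is an assumption. The client sets the real headers for you.

```python
from typing import Optional


def build_request_headers(
    token: str,
    embedding_api_key: Optional[str] = None,
) -> dict[str, str]:
    """Hypothetical helper: headers for a single Data API request."""
    # Assumption: the application token travels in a "Token" header.
    headers = {"Token": token, "Content-Type": "application/json"}
    if embedding_api_key is not None:
        # Header-based authentication: the key accompanies every request.
        headers["x-embedding-api-key"] = embedding_api_key
    return headers


# With a KMS-stored key, no extra header is needed.
print(build_request_headers("APPLICATION_TOKEN"))

# With header-based authentication, the key is attached to the request.
print(build_request_headers("APPLICATION_TOKEN", "EMBEDDING_API_KEY"))
```

Because the key is not stored on the database, any Collection instance that reads or writes to the collection with this approach needs the key as well.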

Create a new collection with default document IDs of type ObjectID:

python
from astrapy.constants import DefaultIdType

collection = database.create_collection(
    "COLLECTION_NAME",
    default_id_type=DefaultIdType.OBJECTID,
)

Create a new collection with selective indexing:

python
collection = database.create_collection(
    "COLLECTION_NAME",
    indexing={"allow": ["city", "country"]},
)
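
The following sketch illustrates how an allow or deny list can be read. The function is_indexed is hypothetical and runs entirely in Python. It assumes that allow and deny are mutually exclusive, that listing a field also covers its subfields, and that "*" matches every field. Check these assumptions against the Data API documentation before relying on them.

```python
def is_indexed(path, indexing=None):
    """Return True if `path` would be indexed under the given spec (sketch)."""
    if not indexing:
        # Assumption: without an indexing spec, every field is indexed.
        return True
    if "allow" in indexing and "deny" in indexing:
        raise ValueError("Use either 'allow' or 'deny', not both")

    def covered(patterns):
        # A pattern covers the field itself and, by assumption, its subfields.
        return any(
            pattern == "*" or path == pattern or path.startswith(pattern + ".")
            for pattern in patterns
        )

    if "allow" in indexing:
        return covered(indexing["allow"])
    return not covered(indexing.get("deny", []))


spec = {"allow": ["city", "country"]}
print(is_indexed("city", spec))
print(is_indexed("postal_code", spec))
print(is_indexed("bio", {"deny": ["bio"]}))
```

In practice, fields that are not indexed generally cannot be used in filter or sort clauses, so choose the list based on the queries you expect to run.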

Example:

python
from astrapy import DataAPIClient
from astrapy.constants import DefaultIdType, SortDocuments, VectorMetric

# Connect to the database
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")

# Create a non-vector collection
collection_simple = database.create_collection("NON_VECTOR_COLLECTION_NAME")

# Create a vector collection
collection_vector = database.create_collection(
    "VECTOR_COLLECTION_NAME",
    dimension=3,
    metric=VectorMetric.COSINE,
)

# Create a collection with UUIDv6 as default IDs
collection_uuid6 = database.create_collection(
    "UUIDV6_COLLECTION_NAME",
    default_id_type=DefaultIdType.UUIDV6,
)

collection_uuid6.insert_one({"desc": "a document", "seq": 0})
collection_uuid6.insert_one({"_id": 123, "desc": "another", "seq": 1})
doc_ids = [
    doc["_id"]
    for doc in collection_uuid6.find({}, sort={"seq": SortDocuments.ASCENDING})
]
print(doc_ids)
#  Will print something like: [UUID('1eef29eb-d587-6779-adef-45b95ef13497'), 123]
print(doc_ids[0].version)
#  Will print: 6
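
Version 6 UUIDs embed a creation timestamp, which is what makes them time-ordered. As a small standalone sketch that uses only the Python standard library, the following code inspects the sample ID printed above and decodes that timestamp. The helper uuid6_timestamp is hypothetical, and the bit layout follows the UUIDv6 format as defined in RFC 9562.

```python
import uuid
from datetime import datetime, timezone

# The sample ID from the example output above.
doc_id = uuid.UUID("1eef29eb-d587-6779-adef-45b95ef13497")

# 100-nanosecond intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
GREGORIAN_TO_UNIX_OFFSET = 0x01B21DD213814000


def uuid6_timestamp(value: uuid.UUID) -> datetime:
    """Hypothetical helper: decode the timestamp embedded in a v6 UUID."""
    if value.version != 6:
        raise ValueError("expected a version 6 UUID")
    time_high = value.int >> 96             # first 32 bits
    time_mid = (value.int >> 80) & 0xFFFF   # next 16 bits
    time_low = (value.int >> 64) & 0x0FFF   # 12 bits after the version nibble
    ticks = (time_high << 28) | (time_mid << 12) | time_low
    unix_seconds = (ticks - GREGORIAN_TO_UNIX_OFFSET) / 10_000_000
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


created = uuid6_timestamp(doc_id)
print(doc_id.version, created.isoformat())
```

Documents inserted with an explicit _id, such as 123 in the example above, keep that value, so a collection can contain a mix of ID types.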

Client reference

For more information, see the client reference.


© 2025 DataStax
