Create a collection
Creates a new collection in a Serverless (Vector) database.
Method signature
- Python
- TypeScript
- Java
- curl
The signature of this command changed in Python client version 2.0-preview. If you are using client version 2.0-preview or later, see the description of this change in the Data API client upgrade guide.
database.create_collection(
name: str,
*,
keyspace: str,
dimension: int,
metric: str,
service: CollectionVectorServiceOptions | dict[str, Any],
indexing: dict[str, Any],
default_id_type: str,
additional_options: dict[str, Any],
check_exists: bool,
max_time_ms: int,
embedding_api_key: str | EmbeddingHeadersProvider,
collection_max_time_ms: int,
) -> Collection
database.createCollection<Schema extends SomeDoc = SomeDoc>(
collectionName: string,
options?: {
checkExists?: boolean,
vector?: VectorOptions,
indexing?: IndexingOptions<Schema>,
keyspace?: string,
defaultId?: DefaultIdOptions,
embeddingApiKey?: string | EmbeddingHeadersProvider | null,
defaultMaxTimeMS?: number | null,
maxTimeMS?: number,
}): Promise<Collection<Schema>>
The signature of this command changed in Java client version 2.0-preview. If you are using client version 2.0-preview or later, see the description of this change in the Data API client upgrade guide.
Collection<Document> createCollection(String collectionName)
Collection<Document> createCollection(
String collectionName,
int dimension,
SimilarityMetric metric
)
<T> Collection<T> createCollection(
String collectionName,
int dimension,
SimilarityMetric metric,
Class<T> documentClass
)
<T> Collection<T> createCollection(
String collectionName,
Class<T> documentClass
)
Collection<Document> createCollection(
String collectionName,
CollectionOptions collectionOptions
)
<T> Collection<T> createCollection(
String collectionName,
CollectionOptions collectionOptions,
Class<T> documentClass
)
Collection<Document> createCollection(
String collectionName,
CollectionOptions collectionOptions,
CommandOptions<?> commandOptions
)
<T> Collection<T> createCollection(
String collectionName,
CollectionOptions collectionOptions,
CommandOptions<?> commandOptions,
Class<T> documentClass
)
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": OPTIONS
}
}'
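The request body in the curl signature can also be assembled programmatically. The following is a minimal sketch that uses only the Python standard library. `build_create_collection_payload` is a hypothetical helper name, not part of any client, and the payload shape is taken directly from the curl signature above.

```python
import json
from typing import Any, Optional


def build_create_collection_payload(
    name: str, options: Optional[dict[str, Any]] = None
) -> str:
    """Return the JSON body for a createCollection command.

    Hypothetical helper: it mirrors the curl signature and nothing more.
    An empty options object is sent when no options are given.
    """
    command = {"name": name, "options": options if options is not None else {}}
    return json.dumps({"createCollection": command})


body = build_create_collection_payload(
    "COLLECTION_NAME",
    {"vector": {"dimension": 5, "metric": "cosine"}},
)
print(body)
```

The resulting string can be passed as the `--data` argument of the curl command, or as the body of any HTTP client request.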
Result
- Python
- TypeScript
- Java
- curl
Creates a collection with the specified parameters.
Returns a Collection object.
You can use this object to work with documents in the collection.
Example response:
Collection(name="COLLECTION_NAME", keyspace="default_keyspace", database=Database(api_endpoint="ASTRA_DB_API_ENDPOINT", token="APPLICATION_TOKEN", keyspace="default_keyspace"))
Creates a collection with the specified parameters.
Returns a promise that resolves to a Collection<Schema> object.
You can use this object to work with documents in the collection.
A Collection is typed as Collection<Schema>, where Schema is the user-defined type of the documents in the collection.
If you provide a specific schema, operations on the collection are strongly typed.
Otherwise, they are weakly typed.
Creates a collection with the specified parameters.
Returns a Collection object.
You can use this object to work with documents in the collection.
Creates a collection with the specified parameters.
If the command succeeds, the response contains a status object that confirms the collection was created.
Example response:
{
"status": {
"ok": 1
}
}
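When calling the HTTP endpoint from a script, you may want to check for this success shape explicitly. The sketch below is one way to do so in Python. `collection_created` is a hypothetical helper, and the `errors` key in the second call is only an assumed example of a non-success body, not a documented response format.

```python
from typing import Any


def collection_created(response: dict[str, Any]) -> bool:
    """Return True when the response matches the documented success shape."""
    status = response.get("status")
    return isinstance(status, dict) and status.get("ok") == 1


# Documented success response
print(collection_created({"status": {"ok": 1}}))  # True

# Assumed example of a body without a success status
print(collection_created({"errors": [{"message": "example"}]}))  # False
```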
Parameters
The required and valid parameters depend on whether the collection will store vector data and your embedding generation method. For more information, see Manage collections and tables.
You can’t edit a collection’s parameters after you create the collection.
- Python
- TypeScript
- Java
- curl
Name | Type | Summary |
---|---|---|
|
|
The name of the collection. |
|
|
The keyspace where the collection is to be created. If not specified, the database’s working keyspace is used. |
|
|
For vector collections, the dimension of the vectors, which is the number of their components. If you’re not sure what dimension to set, use the dimension of the vectors that your embedding model produces. |
|
|
The similarity metric used for vector searches. Allowed values are |
|
|
The service definition for vector embeddings. Required for vector collections that generate embeddings automatically. This is an instance of
|
|
|
Optional specification for selective indexing of the collection, in the form of a dictionary such as |
|
|
Set the default ID type that the API server will generate when inserting documents that don’t explicitly specify an |
additional_options |
|
Any further set of key-value pairs that will be added to the "options" part of the payload when sending the Data API command to create a collection. |
|
|
A timeout, in milliseconds, for the underlying HTTP request. |
|
|
An alternative to This parameter is not stored on the database, and it is used by the This is useful for creating collections with an embedding service without specifying an
|
|
|
A default timeout, in milliseconds, for the duration of each operation on the collection.
Individual timeouts can be provided to each collection method call and will take precedence,
with this value being an overall default. Note that for some methods involving multiple API calls
(such as |
Name | Type | Summary |
---|---|---|
|
|
The name of the collection to create. |
|
The options for creating the collection.
|
Options (CreateCollectionOptions):
Name | Type | Summary |
---|---|---|
The vector configuration for the collection, such as the vector dimension and similarity metric. If not set, the collection does not support vector search. If you’re not sure what dimension to set, use the dimension of the vectors that your embedding model produces. |
||
The selective indexing configuration for the collection. |
||
The defaultId configuration for the collection, for when a document does not specify an |
||
|
Overrides the keyspace where the collection is created. If not set, the database’s working keyspace is used. |
|
|
An alternative to |
|
|
The default |
|
|
Maximum time in milliseconds the client should wait for the operation to complete. |
Name | Type | Summary |
---|---|---|
|
|
The name of the collection. |
|
|
The dimension for the vectors in the collection. If you’re not sure what dimension to set, use the dimension of the vectors that your embedding model produces. |
|
|
The similarity metric to use for vector search: |
|
|
Fine-grained settings with vector, embedding provider, model name, authentication, selective indexing, and |
|
|
Working with specialized beans for the collection and not the default |
Name | Type | Summary |
---|---|---|
|
|
The Data API command to create a collection in a Serverless (Vector) database. It acts as a container for all the attributes and settings required to create the collection. |
|
|
The name of the new collection. This must be unique within the database specified in the request URL. |
|
|
(Optional) Controls how the Data API allocates an `_id` for each document that doesn’t specify an ID value in the request. For backwards compatibility with Data API releases before version 1.0.3, if you omit a |
|
|
If you include |
|
|
(Optional, recommended) Creates a vector-enabled collection. Vector-enabled collections can store either vector or non-vector data. Collections that aren’t vector-enabled can’t store vector data. |
|
|
The dimension for vector embeddings in the collection.
If you’re not sure what dimension to set, use the dimension of the vectors that your embedding model produces.
This can be optional for vectorize, if the specified |
|
|
The similarity metric to use for vector search: |
|
|
(Optional) Configure a vectorize embedding provider integration. |
|
|
The vectorize embedding provider name. |
|
|
A valid model name for the specified vectorize embedding provider. |
|
|
Use credentials stored in Astra DB KMS to authenticate with your vectorize embedding provider.
In Alternatively, you can omit the |
|
|
Your embedding provider might require additional parameters. Use findEmbeddingProviders or see the documentation for your embedding provider integration. |
|
|
(Optional) Enable selective indexing for data loaded to the collection.
If you specify |
|
|
Either |
|
|
Either |
Examples
The following examples demonstrate how to create a collection.
- Python
- TypeScript
- Java
- curl
Create a collection that is not vector-enabled:
collection = database.create_collection("COLLECTION_NAME")
Create a collection to store vector data and provide embeddings when you load data:
from astrapy.constants import VectorMetric
collection = database.create_collection(
"COLLECTION_NAME",
dimension=5,
metric=VectorMetric.COSINE,
)
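Because the dimension is fixed when the collection is created, it can help to check each embedding on the client side before you load it. The following is a minimal sketch under that assumption. `check_embedding` and `COLLECTION_DIMENSION` are hypothetical names for illustration and are not part of the client.

```python
from typing import Sequence

# Must match the dimension passed to create_collection (5 in the example above).
COLLECTION_DIMENSION = 5


def check_embedding(
    embedding: Sequence[float], dimension: int = COLLECTION_DIMENSION
) -> None:
    """Raise ValueError if the embedding length differs from the dimension."""
    if len(embedding) != dimension:
        raise ValueError(
            f"expected {dimension} components, got {len(embedding)}"
        )


# A five-component vector passes silently.
check_embedding([0.12, 0.62, 0.87, 0.16, 0.72])
```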
Create a new collection that generates vector embeddings automatically with vectorize.
To automatically generate embeddings, you must enable the corresponding embedding provider integration, add the embedding provider API key in the Astra KMS, and make sure your database can access the embedding provider service. You can use the Data API to find supported embedding providers and their configuration parameters.
As an alternative to Astra KMS authentication, you can do one of the following:
- Use the Astra-hosted NVIDIA embedding provider integration, if your database meets the cloud provider and region requirements.
- Use header authentication to manually provide the embedding provider credentials with every request that requires embedding generation, including loading data and vector search with vectorize. For more information, see Vector and vectorize and the explanation of the embedding_api_key parameter in this command’s Parameters.
from astrapy.info import CollectionVectorServiceOptions
from astrapy.constants import VectorMetric
collection = database.create_collection(
"COLLECTION_NAME",
metric=VectorMetric.DOT_PRODUCT,
dimension=1536,
service=CollectionVectorServiceOptions(
provider="openai",
model_name="text-embedding-3-small",
authentication={
"providerKey": "API_KEY_NAME",
},
),
)
Create a new collection with default document IDs of type ObjectID:
from astrapy.constants import DefaultIdType
collection = database.create_collection(
"COLLECTION_NAME",
default_id_type=DefaultIdType.OBJECTID,
)
Create a new collection with selective indexing:
collection = database.create_collection(
"COLLECTION_NAME",
indexing={"allow": ["city", "country"]},
)
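The indexing option takes either an allow list or a deny list. The Java example on this page notes that the two cannot be used at the same time. The sketch below checks that rule before the dictionary is passed to `create_collection`. `validate_indexing` is a hypothetical helper, not a client function.

```python
from typing import Any


def validate_indexing(indexing: dict[str, Any]) -> dict[str, Any]:
    """Check that exactly one of "allow" or "deny" is present."""
    keys = set(indexing)
    if keys not in ({"allow"}, {"deny"}):
        raise ValueError(
            'indexing must contain exactly one of "allow" or "deny"'
        )
    return indexing


# Same indexing dictionary as in the example above.
indexing = validate_indexing({"allow": ["city", "country"]})
```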
Example:
from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
# Create a non-vector collection
collection_simple = database.create_collection("NON_VECTOR_COLLECTION_NAME")
# Create a vector collection
collection_vector = database.create_collection(
"VECTOR_COLLECTION_NAME",
dimension=3,
metric=astrapy.constants.VectorMetric.COSINE,
)
# Create a collection with UUIDv6 as default IDs
from astrapy.constants import DefaultIdType, SortDocuments
collection_uuid6 = database.create_collection(
"UUIDV6_COLLECTION_NAME",
default_id_type=DefaultIdType.UUIDV6,
)
collection_uuid6.insert_one({"desc": "a document", "seq": 0})
collection_uuid6.insert_one({"_id": 123, "desc": "another", "seq": 1})
doc_ids = [
doc["_id"]
for doc in collection_uuid6.find({}, sort={"seq": SortDocuments.ASCENDING})
]
print(doc_ids)
# Will print: [UUID('1eef29eb-d587-6779-adef-45b95ef13497'), 123]
print(doc_ids[0].version)
# Will print: 6
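The `version` attribute printed above is a general property of UUID values, so you can inspect it without a database. The following sketch uses only the Python standard library `uuid` module to parse the sample ID from the output above. The client may return its own UUID type, so treat this as an illustration of the version field rather than of the client's return type.

```python
import uuid

# The UUID value below is copied from the sample output above.
doc_id = uuid.UUID("1eef29eb-d587-6779-adef-45b95ef13497")

# The version is encoded in the first hex digit of the third group ("6779").
print(doc_id.version)  # 6

# An explicitly provided _id, such as 123, is stored as given and is not a UUID.
print(isinstance(123, uuid.UUID))  # False
```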
Create a new collection that is not vector-enabled:
const collection = await db.createCollection('COLLECTION_NAME');
Create a new collection to store vector data.
const collection = await db.createCollection<Schema>('COLLECTION_NAME', {
vector: {
dimension: 5,
metric: 'cosine',
},
});
Create a new collection that generates vector embeddings automatically.
To automatically generate embeddings, you must enable the corresponding embedding provider integration, add the embedding provider API key in the Astra KMS, and make sure your database can access the embedding provider service. You can use the Data API to find supported embedding providers and their configuration parameters.
As an alternative to Astra KMS authentication, you can do one of the following:
- Use the Astra-hosted NVIDIA embedding provider integration, if your database meets the cloud provider and region requirements.
- Use header authentication to manually provide the embedding provider credentials with every request that requires embedding generation, including loading data and vector search with vectorize. For more information, see the explanation of the embeddingApiKey optional parameter in the Options table and Vector and vectorize.
const collection = await db.createCollection<Schema>('COLLECTION_NAME', {
vector: {
dimension: 1536,
metric: 'dot_product',
service: {
provider: 'openai',
modelName: 'text-embedding-3-small',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
});
Example:
import { DataAPIClient, VectorDoc } from '@datastax/astra-db-ts';
// Get a new Db instance
const db = new DataAPIClient('TOKEN').db('API_ENDPOINT');
// Define the schema for the collection
interface User extends VectorDoc {
name: string,
age?: number,
}
(async function () {
// Create a basic untyped non-vector collection
const users1 = await db.createCollection('users');
await users1.insertOne({ name: 'John' });
// Typed collection with custom options in a non-default keyspace
const users2 = await db.createCollection<User>('users', {
keyspace: 'KEYSPACE_NAME',
defaultId: {
type: 'objectId',
},
vector: {
dimension: 5,
metric: 'cosine',
},
});
// Insert a document with its vector embedding in the $vector field
await users2.insertOne({ name: 'John', $vector: [.12, .62, .87, .16, .72] });
})();
Create a collection to store vector data.
Based on the collection parameters, you can provide embeddings when you load data or automatically generate embeddings with vectorize.
To automatically generate embeddings, you must enable the corresponding embedding provider integration, add the embedding provider API key in the Astra KMS, and make sure your database can access the embedding provider service. You can use the Data API to find supported embedding providers and their configuration parameters.
As an alternative to Astra KMS authentication, you can do one of the following:
- Use the Astra-hosted NVIDIA embedding provider integration, if your database meets the cloud provider and region requirements.
- Use header authentication to manually provide the embedding provider credentials with every request that requires embedding generation, including loading data and vector search with vectorize. For more information, see the explanation of the collectionOptions parameter in the Parameters table and Vector and vectorize.
// Given a `db` Database object, create a new collection.
// Each call below illustrates one overload. It assumes that collectionName,
// dimension, metric, and collectionOptions are already defined.
// Create a simple collection with the given name
Collection<Document> simple1 = db
.createCollection(collectionName);
Collection<MyBean> simple2 = db
.createCollection(collectionName, MyBean.class);
// Create collections with vector options
Collection<Document> vector1 = db.createCollection(
collectionName,
dimension,
metric);
Collection<MyBean> vector2 = db.createCollection(
collectionName,
dimension,
metric,
MyBean.class);
// Full-fledged CollectionOptions created with a builder
Collection<Document> full1 = db.createCollection(
collectionName,
collectionOptions);
Collection<MyBean> full2 = db.createCollection(
collectionName,
collectionOptions,
MyBean.class);
Example:
package com.datastax.astra.client.database;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.model.CollectionIdTypes;
import com.datastax.astra.client.model.CollectionOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.SimilarityMetric;
public class CreateCollection {
public static void main(String[] args) {
Database db = new Database(
System.getenv("ASTRA_DB_API_ENDPOINT"),
System.getenv("ASTRA_DB_APPLICATION_TOKEN"));
// Create a non-vector collection
Collection<Document> simple1 = db.createCollection("col");
// Default Id Collection
Collection<Document> defaultId = db.createCollection("defaultId", CollectionOptions
.builder()
.defaultIdType(CollectionIdTypes.OBJECT_ID)
.build());
// -- Indexing
Collection<Document> indexingDeny = db.createCollection("indexing1", CollectionOptions
.builder()
.indexingDeny("blob")
.build());
// Create a collection with indexing (allow) - cannot use allow and deny at the same time
Collection<Document> indexingAllow = db.createCollection("allow1", CollectionOptions
.builder()
.indexingAllow("metadata")
.build());
// Vector
Collection<Document> vector1 = db.createCollection("vector1", 14, SimilarityMetric.DOT_PRODUCT);
// Create a vector collection
Collection<Document> vector2 = db.createCollection("vector2", CollectionOptions
.builder()
.vectorDimension(1536)
.vectorSimilarity(SimilarityMetric.EUCLIDEAN)
.build());
// Create a collection for the db
Collection<Document> collection_vectorize_header = db.createCollection(
"collection_vectorize_header",
// Create collection with a Service in vectorize (No API KEY)
CollectionOptions.builder()
.vectorDimension(1536)
.vectorSimilarity(SimilarityMetric.DOT_PRODUCT)
.vectorize("openai", "text-embedding-ada-002")
.build());
// Create a collection for the db
Collection<Document> collection_vectorize_shared_key = db.createCollection(
"collection_vectorize_shared_key",
// Create collection with a Service in vectorize (shared API key name)
CollectionOptions.builder()
.vectorDimension(1536)
.vectorSimilarity(SimilarityMetric.DOT_PRODUCT)
.vectorize("openai", "text-embedding-ada-002", "OPENAI_API_KEY" )
.build());
}
}
Create a collection that isn’t vector-enabled:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {}
}
}' | jq
Create a vector-enabled collection where you plan to provide embeddings when you load data.
This example also sets the defaultId type for documents loaded into the collection.
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"defaultId": {
"type": "uuidv7"
},
"vector": {
"dimension": 5,
"metric": "cosine"
}
}
}
}' | jq
Create a vector-enabled collection that automatically generates embeddings with vectorize.
To automatically generate embeddings, you must enable the corresponding embedding provider integration, add the embedding provider API key in the Astra KMS, and make sure your database can access the embedding provider service. You can use the Data API to find supported embedding providers and their configuration parameters.
As an alternative to Astra KMS authentication, you can do one of the following:
- Use the Astra-hosted NVIDIA embedding provider integration, if your database meets the cloud provider and region requirements.
- Use header authentication to manually provide the embedding provider credentials with every request that requires embedding generation, including loading data and vector search with vectorize. For more information, see the explanation for options.vector.service.authentication in the Parameters table and Vector and vectorize.
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"dimension": 1536,
"metric": "cosine",
"service": {
"provider": "openai",
"modelName": "text-embedding-3-small",
"authentication": {
"providerKey": "ASTRA_KMS_API_KEY_NAME"
}
}
}
}
}
}' | jq
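The same request can be prepared from Python without a client library. The sketch below builds, but does not send, an equivalent POST request with the standard library `urllib.request` module. The URL is a non-resolvable placeholder chosen for illustration and is an assumption. Replace it and the token with your own endpoint, keyspace, and application token.

```python
import json
import urllib.request

# Placeholders (assumptions): replace with your endpoint, keyspace, and token.
url = "https://example.invalid/api/json/v1/default_keyspace"
token = "ASTRA_DB_APPLICATION_TOKEN"

# Same body as the curl example above.
payload = {
    "createCollection": {
        "name": "COLLECTION_NAME",
        "options": {
            "vector": {
                "dimension": 1536,
                "metric": "cosine",
                "service": {
                    "provider": "openai",
                    "modelName": "text-embedding-3-small",
                    "authentication": {"providerKey": "ASTRA_KMS_API_KEY_NAME"},
                },
            }
        },
    }
}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Token": token, "Content-Type": "application/json"},
    method="POST",
)

# Nothing is sent until you call urllib.request.urlopen(request).
print(request.get_method(), request.full_url)
```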
Client reference
- Python
- TypeScript
- Java
- curl
For more information, see the client reference.
For more information, see the client reference.
For more information, see the client reference.
Client reference documentation is not applicable for HTTP.