Create a collection
Creates a new collection in a Serverless (Vector) database.
Method signature
- Python
- TypeScript
- Java
- curl
The signature of this command changed in Python client version 2.0. If you are using an earlier version, DataStax recommends upgrading to the latest version. For more information, see the Data API client upgrade guide.
The following method belongs to the astrapy.Database class.
create_collection(
name: str,
*,
definition: CollectionDefinition | dict[str, Any] | None,
document_type: type[Any],
keyspace: str,
collection_admin_timeout_ms: int,
embedding_api_key: str | EmbeddingHeadersProvider,
spawn_api_options: APIOptions,
) -> Collection
See the AsyncCollection client reference for details about the async API.
The following method belongs to the Db class.
async createCollection<Schema extends SomeDoc = SomeDoc>(
name: string,
options?: {
vector?: CollectionVectorOptions,
indexing?: CollectionIndexingOptions<Schema>,
defaultId?: CollectionDefaultIdOptions,
lexical?: CollectionLexicalOptions,
rerank?: CollectionRerankOptions,
logging?: DataAPILoggingConfig,
keyspace?: string,
embeddingApiKey?: string | EmbeddingHeadersProvider,
serdes?: CollectionSerDesConfig,
timeoutDefaults?: TimeoutDescriptor,
timeout?: number | TimeoutDescriptor,
}
): Promise<Collection<Schema>>
The following methods belong to the com.datastax.astra.client.Database class.
Collection<Document> createCollection(String collectionName)
Collection<Document> createCollection(
String collectionName,
CollectionDefinition collectionDefinition
)
Collection<Document> createCollection(
String collectionName,
CollectionDefinition collectionDefinition,
CreateCollectionOptions options
)
<T> Collection<T> createCollection(
String collectionName,
Class<T> documentClass
)
<T> Collection<T> createCollection(
String collectionName,
CollectionDefinition collectionDefinition,
Class<T> documentClass
)
<T> Collection<T> createCollection(
String collectionName,
CollectionDefinition collectionDefinition,
Class<T> documentClass,
CreateCollectionOptions options
)
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": OPTIONS
}
}'
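The request body shown in the curl signature can also be assembled programmatically. The following is a minimal sketch that uses only the Python standard library. The helper name `build_create_collection_payload` is hypothetical and not part of any client. It only builds the JSON string and does not send a request.

```python
import json
from typing import Any, Optional


def build_create_collection_payload(
    name: str,
    options: Optional[dict[str, Any]] = None,
) -> str:
    """Return the JSON body for a createCollection command (hypothetical helper)."""
    command = {
        "createCollection": {
            "name": name,
            # Fall back to an empty options object, as in the
            # non-vector curl example on this page.
            "options": options if options is not None else {},
        }
    }
    return json.dumps(command)


# Build the body for a vector-enabled collection.
payload = build_create_collection_payload(
    "COLLECTION_NAME",
    {"vector": {"dimension": 1024, "metric": "cosine"}},
)
print(payload)
```

The resulting string can be passed as the `--data` value of the curl command.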
Result
- Python
- TypeScript
- Java
- curl
Creates a collection with the specified parameters.
Returns a Collection object.
You can use this object to work with documents in the collection.
Unless you specify the document_type parameter, the collection is typed as Collection[dict].
For more information, see Typing support.
Example response:
Collection(name="COLLECTION_NAME", keyspace="default_keyspace", database.api_endpoint="ASTRA_DB_API_ENDPOINT", api_options=FullAPIOptions(token=StaticTokenProvider("APPLICATION_TOKEN"...), ...))
Creates a collection with the specified parameters.
Returns a promise that resolves to a Collection<Schema> object.
You can use this object to work with documents in the collection.
A Collection is typed as Collection<Schema>, where Schema defaults to SomeDoc (Record<string, any>).
Providing the specific Schema type enables stronger typing for collection operations.
For more information, see Typing Collections and Tables.
Creates a collection with the specified parameters.
Returns a Collection object.
You can use this object to work with documents in the collection.
Creates a collection with the specified parameters.
If the command succeeds, the response indicates success.
Example response:
{
"status": {
"ok": 1
}
}
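As an illustration of reading the example response, the following sketch parses a response body and checks for `"ok": 1` under `status`. The helper name `collection_created` is hypothetical, and the sketch treats any other shape as unsuccessful without assuming what an error response contains.

```python
import json


def collection_created(response_text: str) -> bool:
    """Return True if a createCollection response reports success (hypothetical helper)."""
    body = json.loads(response_text)
    # A successful response carries {"status": {"ok": 1}},
    # as in the example response above.
    status = body.get("status") or {}
    return status.get("ok") == 1


print(collection_created('{"status": {"ok": 1}}'))
```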
Parameters
The required and valid parameters depend on whether the collection will store vector data and your embedding generation method. For more information, see Manage collections and tables.
You can’t edit a collection’s definition after you create the collection.
- Python
- TypeScript
- Java
- curl
Name | Type | Summary |
---|---|---|
name | str | The name of the collection to create. |
definition | CollectionDefinition, dict[str, Any], or None | The full configuration for the collection. Plain Python dictionaries can also be passed for this parameter. |
document_type | type[Any] | Optional. A formal specifier for the type checker. Unless you specify this parameter, the collection is typed as Collection[dict]. |
keyspace | str | The keyspace in which to create the collection. Default: The general keyspace setting for the database. |
collection_admin_timeout_ms | int | A timeout, in milliseconds, to impose on the underlying API request. If not provided, the corresponding default applies. |
embedding_api_key | str or EmbeddingHeadersProvider | Optional. This only applies to collections with a vectorize embedding provider integration. This secret is sent to the Data API for every operation on the collection. It is useful when a vectorize service is configured but no credentials are stored, or when you want to override the stored credentials. For more information, see Auto-generate embeddings with vectorize. |
spawn_api_options | APIOptions | A complete or partial specification of the APIOptions to override the defaults inherited from the database. |
Name | Type | Summary |
---|---|---|
|
Optional. The vector configuration for the collection. This includes things like the vector dimension and similarity metric. This also includes settings for server-side embedding generation if you want your collection to have vectorize enabled. Required for vector search and hybrid search. See the examples below. |
|
|
Optional. The lexical search configuration for the collection. The
Only collections in databases in the AWS See the examples below. Default: A |
|
|
Optional. The reranker configuration for the collection. The
Only collections in databases in the AWS See the examples below. Default: A |
|
|
|
The selective indexing configuration for the collection. See the examples below. Default: All fields of all documents. |
|
The defaultId configuration for the collection.
This is used when you insert a document without an See the examples below. |
Name | Type | Summary |
---|---|---|
|
|
The name of the collection to create. |
Options (CreateCollectionOptions
):
Name | Type | Summary |
---|---|---|
The vector configuration for the collection, e.g. vector dimension & similarity metric. Required to support vector search. If you’re not sure what dimension to set, use whatever dimension vector your embeddings model produces. |
||
Optional. The lexical search configuration for the collection. The
Only collections in databases in the AWS See the examples below. Default: A |
||
Optional. The reranker configuration for the collection. The
Only collections in databases in the AWS See the examples below. Default: A |
||
The selective indexing configuration for the collection. |
||
The defaultId configuration for the collection, for when a document does not specify an |
||
|
An alternative to Provides the API key directly via headers instead of using an API key in the Astra DB KMS.
|
|
|
Overrides the keyspace where the collection is created. If not set, the database’s working keyspace is used. |
|
|
The configuration for logging events emitted by the DataAPIClient. |
|
|
The configuration for logging events emitted by the DataAPIClient. For more information, see Custom Ser/Des |
|
|
Optional. The default timeout(s) to apply to operations performed on this Collection instance.
You can specify Details about the
|
|
|
|
Optional. The timeout to apply to this method. Only Default: 60 seconds, unless you specified a different default along the Options Hierarchy. |
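Name | Type | Summary |
---|---|---|
collectionName | String | The name of the collection. |
collectionDefinition | CollectionDefinition | Settings for the collection, including vector options, the default ID format, and indexing options. |
options | CreateCollectionOptions | Options for the operation, including the keyspace. |
documentClass | Class<T> | The class to use for documents in the collection, for working with specialized beans instead of the default Document. |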
Name | Type | Summary |
---|---|---|
|
|
The Data API command to create a collection in a Serverless (Vector) database. It acts as a container for all the attributes and settings required to create the collection. |
|
|
The name of the new collection. This must be unique within the database specified in the request URL. |
|
|
Optional. Controls how the Data API allocates an `_id` for each document that doesn’t specify an ID value in the request. For backwards compatibility with Data API releases before version 1.0.3, if you omit a |
|
|
If you include |
|
|
Optional. Recommended. Creates a vector-enabled collection. Vector-enabled collections can store either vector or non-vector data. Collections that aren’t vector-enabled can’t store vector data. |
|
|
The dimension for vector embeddings in the collection.
If you’re not sure what dimension to set, use the dimension vector your embeddings model produces.
This can be optional for vectorize, if the specified |
|
|
The similarity metric to use for vector search: |
|
|
Optional. Configure a vectorize embedding provider integration. |
|
|
The vectorize embedding provider name. |
|
|
A valid model name for the specified vectorize embedding provider. |
|
|
Optional. Use credentials stored in Astra DB KMS to authenticate with your vectorize embedding provider.
In Alternatively, you can omit the |
|
|
Optional. Your embedding provider might require additional parameters. Use findEmbeddingProviders or see the documentation for your embedding provider integration. |
|
|
Optional. The lexical search configuration for the collection. Only collections in databases in the AWS |
|
|
Optional. Whether to enable lexical search for the collection. Required to support hybrid search. Default: True |
|
|
Optional. A string describing a built-in analyzer, or a JSON object describing an analyzer configuration. Strings must be one of: JSON objects must follow the specifications in Find data with CQL analyzers. Currently, only the standard lucene analyzer is supported.
This corresponds to the value of Default: |
|
|
Optional. The reranker configuration for the collection. Only collections in databases in the AWS |
|
|
Optional. Whether to enable reranking for the collection. Required to support hybrid search. Default: True |
|
|
Optional. A JSON object describing a reranker configuration. |
|
|
The name of the reranking provider.
Currently, only Default: |
|
|
The name of a reranking model supported by the reranking provider.
Currently, only Default: |
|
|
Optional. Enable selective indexing for data inserted to the collection.
If you specify |
|
|
Either |
|
|
Either |
Examples
The following examples demonstrate how to create a collection.
Create a collection that is not vector-enabled
- Python
- TypeScript
- Java
- curl
from astrapy import DataAPIClient
# Get a database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection = database.create_collection("COLLECTION_NAME")
- Typed collections
- Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Define the type for the collection
interface User {
name: string,
age?: number,
}
// Create a collection
(async function () {
const collection = await database.createCollection<User>("COLLECTION_NAME");
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Create a collection
(async function () {
const collection = await database.createCollection("COLLECTION_NAME");
})();
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.documents.Document;
public class CreateCollection {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
.getDatabase("ASTRA_DB_API_ENDPOINT");
// Create a collection
Collection<Document> collection = database.createCollection("COLLECTION_NAME");
}
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {}
}
}'
Create a collection that can store vector embeddings
Collections that are vector-enabled can store vector embeddings and work with vector search.
- Python
- TypeScript
- Java
- curl
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CollectionDefinition, CollectionVectorOptions
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
dimension=1024,
metric=VectorMetric.COSINE,
),
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(1024)
.set_vector_metric(VectorMetric.COSINE)
.build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
- Typed collections
- Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector as an inline field in your interfaces, or you can extend the utility VectorDoc type provided by the client.
import { DataAPIClient, VectorDoc } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Define the type for the collection
interface User extends VectorDoc {
name: string,
age?: number,
}
(async function () {
const collection = await database.createCollection<User>("COLLECTION_NAME", {
vector: {
dimension: 1024,
metric: "cosine",
},
});
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, or type-related issues will occur.
Consider using a type like VectorDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector field to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
vector: {
dimension: 1024,
metric: "cosine",
},
});
})();
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.vector.SimilarityMetric;
public class CreateCollection {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
.getDatabase("ASTRA_DB_API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition = new CollectionDefinition()
.vectorDimension(1024)
.vectorSimilarity(SimilarityMetric.COSINE);
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"dimension": 1024,
"metric": "cosine"
}
}
}
}'
Create a collection that can automatically generate vector embeddings
If you want to automatically generate vector embeddings, create a vector-enabled collection and configure an embedding provider integration for the collection.
The configuration depends on the embedding provider. For the configuration and an example for each provider, see Supported embedding providers.
You can also store pre-generated vector embeddings in the collection. If you store pre-generated and automatically generated embeddings in the same collection, make sure all embeddings have the same provider, model, and dimensions. Mismatched embeddings can cause inaccurate vector searches.
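The dimension requirement above can be checked on the client side before you insert pre-generated embeddings. The following is a minimal sketch, assuming you know the dimension the collection was created with. The helper name `check_vector_dimension` is hypothetical. It only verifies vector length and can't detect a mismatched provider or model.

```python
from typing import Sequence


def check_vector_dimension(vector: Sequence[float], expected_dimension: int) -> None:
    """Raise ValueError if an embedding has the wrong length (hypothetical helper)."""
    if len(vector) != expected_dimension:
        raise ValueError(
            f"Expected a vector of dimension {expected_dimension}, got {len(vector)}"
        )


# 1024 matches the dimension used in the examples on this page.
check_vector_dimension([0.0] * 1024, 1024)
```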
- Python
- TypeScript
- Java
- curl
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
metric=VectorMetric.SIMILARITY_METRIC,
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="PROVIDER",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters=PARAMETERS,
)
)
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(MODEL_DIMENSIONS)
.set_vector_metric(VectorMetric.SIMILARITY_METRIC)
.set_vector_service(
provider="PROVIDER",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters=PARAMETERS,
)
.build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
- Typed collections
- Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector and $vectorize as inline fields in your interfaces, or you can extend the utility VectorizeDoc type provided by the client.
import { DataAPIClient, VectorizeDoc } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Define the type for the collection
interface User extends VectorizeDoc {
name: string,
age?: number,
}
(async function () {
const collection = await database.createCollection<User>("COLLECTION_NAME", {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "PROVIDER",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
parameters: PARAMETERS,
},
},
});
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, and the $vectorize field must still be a string, or type-related issues will occur.
Consider using a type like VectorizeDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector and $vectorize fields to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "PROVIDER",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
parameters: PARAMETERS,
},
},
});
})();
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.vector.SimilarityMetric;
public class CreateCollection {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
.getDatabase("ASTRA_DB_API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition = new CollectionDefinition()
.vectorDimension(MODEL_DIMENSIONS)
.vectorSimilarity(SimilarityMetric.SIMILARITY_METRIC)
.vectorize(
"PROVIDER",
"MODEL_NAME",
"API_KEY_NAME",
PARAMETERS
);
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"dimension": MODEL_DIMENSIONS,
"metric": "SIMILARITY_METRIC",
"service": {
"provider": "PROVIDER",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": PARAMETERS
}
}
}
}
}'
Create a collection that supports hybrid search
If you want to perform hybrid search on your collection, you must create a collection that has vector, lexical, and rerank enabled.
Your collection must also be in a database in the AWS us-east-2 region.
Lexical and rerank are enabled by default when you create a collection in a database in the AWS us-east-2 region, but you can optionally configure the lexical analyzer and the reranker model.
For configuration details about the lexical analyzer, see Find data with CQL analyzers. Currently, only the standard lucene analyzer is supported.
For configuration details about the reranker model, inspect the available reranker models. Currently, only the NVIDIA llama-3.2-nv-rerankqa-1b-v2 reranker model is supported.
For configuration details about vector, see Create a collection that can store vector embeddings and Create a collection that can automatically generate vector embeddings.
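The hybrid search prerequisites described above can be expressed as a small check on a createCollection `options` object. This is only a sketch. The helper name `hybrid_search_ready` is hypothetical. It assumes that lexical and rerank count as enabled unless explicitly disabled, mirroring the default described for databases in the supported region, and it doesn't verify the region.

```python
from typing import Any


def hybrid_search_ready(options: dict[str, Any]) -> bool:
    """Check createCollection options for hybrid search prerequisites (hypothetical helper)."""
    has_vector = "vector" in options
    # Lexical and rerank are described as enabled by default, so only an
    # explicit "enabled": False is treated as disabled in this sketch.
    lexical_on = bool(options.get("lexical", {}).get("enabled", True))
    rerank_on = bool(options.get("rerank", {}).get("enabled", True))
    return has_vector and lexical_on and rerank_on


options = {
    "vector": {"dimension": 1024, "metric": "cosine"},
    "lexical": {"enabled": True, "analyzer": "standard"},
    "rerank": {"enabled": True},
}
print(hybrid_search_ready(options))
```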
- Python
- TypeScript
- Java
- curl
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionLexicalOptions,
CollectionRerankOptions,
CollectionVectorOptions,
RerankServiceOptions,
VectorServiceOptions,
)
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
metric=VectorMetric.COSINE,
dimension=1024,
service=VectorServiceOptions(
provider="nvidia",
model_name="NV-Embed-QA",
)
),
lexical=CollectionLexicalOptions(
analyzer="standard",
enabled=True,
),
rerank=CollectionRerankOptions(
enabled=True,
service=RerankServiceOptions(
provider="nvidia",
model_name="nvidia/llama-3.2-nv-rerankqa-1b-v2",
),
),
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(1024)
.set_vector_metric(VectorMetric.COSINE)
.set_vector_service(
provider="nvidia",
model_name="NV-Embed-QA",
)
.set_lexical("standard", enabled=True)
.set_rerank("nvidia", "nvidia/llama-3.2-nv-rerankqa-1b-v2")
.build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
- Typed collections
- Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector, $vectorize, and $lexical as inline fields in your interfaces, or you can extend the utility VectorDoc, VectorizeDoc, and LexicalDoc types provided by the client.
import { DataAPIClient, LexicalDoc, VectorizeDoc } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Define the type for the collection
interface User extends VectorizeDoc, LexicalDoc {
name: string,
age?: number,
}
(async function () {
const collection = await database.createCollection<User>("COLLECTION_NAME", {
vector: {
dimension: 1024,
metric: "cosine",
service: {
provider: "nvidia",
modelName: "NV-Embed-QA",
},
},
lexical: {
enabled: true,
analyzer: "STANDARD",
},
rerank: {
enabled: true,
service: {
provider: "nvidia",
modelName: "nvidia/llama-3.2-nv-rerankqa-1b-v2",
},
},
});
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, and the $vectorize and $lexical fields must still be strings, or type-related issues will occur.
Consider using a type like VectorDoc & LexicalDoc & SomeDoc or VectorizeDoc & LexicalDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector, $vectorize, and $lexical fields to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
vector: {
dimension: 1024,
metric: "cosine",
service: {
provider: "nvidia",
modelName: "NV-Embed-QA",
},
},
lexical: {
enabled: true,
analyzer: "STANDARD",
},
rerank: {
enabled: true,
service: {
provider: "nvidia",
modelName: "nvidia/llama-3.2-nv-rerankqa-1b-v2",
},
},
});
})();
The Java client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.core.lexical.Analyzer;
import com.datastax.astra.client.core.lexical.LexicalOptions;
import com.datastax.astra.client.core.rerank.CollectionRerankOptions;
import com.datastax.astra.client.core.rerank.RerankServiceOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vector.VectorOptions;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import static com.datastax.astra.client.core.lexical.AnalyzerTypes.STANDARD;
public class CreateCollection {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
.getDatabase("ASTRA_DB_API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition = new CollectionDefinition();
// Vector options (1024 matches the dimension used with NV-Embed-QA in the other examples)
VectorServiceOptions vectorService = new VectorServiceOptions()
.provider("nvidia")
.modelName("NV-Embed-QA");
VectorOptions vectorOptions = new VectorOptions()
.dimension(1024)
.metric(SimilarityMetric.COSINE.getValue())
.service(vectorService);
collectionDefinition.vector(vectorOptions);
// Lexical options
LexicalOptions lexicalOptions = new LexicalOptions()
.enabled(true)
.analyzer(new Analyzer(STANDARD));
collectionDefinition.lexical(lexicalOptions);
// Rerank options
RerankServiceOptions rerankService = new RerankServiceOptions()
.modelName("nvidia/llama-3.2-nv-rerankqa-1b-v2")
.provider("nvidia");
CollectionRerankOptions rerankOptions = new CollectionRerankOptions()
.enabled(true)
.service(rerankService);
collectionDefinition.rerank(rerankOptions);
database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.core.lexical.Analyzer;
import com.datastax.astra.client.core.lexical.LexicalOptions;
import com.datastax.astra.client.core.rerank.CollectionRerankOptions;
import com.datastax.astra.client.core.rerank.RerankServiceOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vector.VectorOptions;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import static com.datastax.astra.client.core.lexical.AnalyzerTypes.STANDARD;
public class CreateCollection {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
.getDatabase("ASTRA_DB_API_ENDPOINT");
database.createCollection("COLLECTION_NAME",
new CollectionDefinition()
.vector(1024, SimilarityMetric.COSINE)
.vectorize("nvidia", "NV-Embed-QA")
.lexical(STANDARD)
.rerank("nvidia", "nvidia/llama-3.2-nv-rerankqa-1b-v2"));
}
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"lexical": {
"analyzer": "standard",
"enabled": true
},
"rerank": {
"enabled": true,
"service": {
"modelName": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
"provider": "nvidia"
}
},
"vector": {
"dimension": 1024,
"metric": "cosine",
"service": {
"provider": "nvidia",
"modelName": "NV-Embed-QA"
}
}
}
}
}'
Create a collection and specify the default ID format
For more information about the default ID format, see Document IDs.
- Python
- TypeScript
- Java
- curl
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
from astrapy import DataAPIClient
from astrapy.info import (
CollectionDefinition,
CollectionDefaultIDOptions,
)
from astrapy.constants import DefaultIdType
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
default_id=CollectionDefaultIDOptions(DefaultIdType.OBJECTID),
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import DefaultIdType
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
CollectionDefinition.builder()
.set_default_id(DefaultIdType.OBJECTID)
.build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
-
Typed collections
-
Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
The _id field type should match the defaultId type.
import { DataAPIClient, ObjectId } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Define the type for the collection
interface User {
_id: ObjectId,
name: string,
age?: number,
}
(async function () {
const collection = await database.createCollection<User>("COLLECTION_NAME", {
defaultId: {
type: "objectId",
},
});
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
However, if you later specify _id when you insert a document, DataStax recommends that it has the same type as the defaultId.
Consider using a type like { _id: ObjectId } & SomeDoc, which allows the documents to remain otherwise untyped but still statically requires the _id field to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
defaultId: {
type: "objectId",
},
});
})();
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefaultIdTypes;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
public class CreateCollection {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
.getDatabase("ASTRA_DB_API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition = new CollectionDefinition()
.defaultId(CollectionDefaultIdTypes.OBJECT_ID);
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"defaultId": {
"type": "uuidv7"
}
}
}
}'
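The request body in the curl example differs from the client examples only in the value of the defaultId type. As a minimal sketch, assuming only the Python standard library, the body can be generated for any default ID type. The helper name build_default_id_payload is hypothetical, and the set of accepted type strings is an assumption: objectId and uuidv7 appear in the examples in this section, while the other values should be checked against Document IDs.

```python
import json

# Assumed set of default ID types. "objectId" and "uuidv7" appear in the
# examples in this section; verify the others against Document IDs.
KNOWN_DEFAULT_ID_TYPES = {"objectId", "uuid", "uuidv6", "uuidv7"}


def build_default_id_payload(collection_name, id_type):
    """Hypothetical helper: build a createCollection body with a default ID type."""
    if id_type not in KNOWN_DEFAULT_ID_TYPES:
        raise ValueError(f"Unsupported default ID type: {id_type!r}")
    return {
        "createCollection": {
            "name": collection_name,
            "options": {"defaultId": {"type": id_type}},
        }
    }


payload = build_default_id_payload("COLLECTION_NAME", "uuidv7")
print(json.dumps(payload, indent=2))
```

Validating the type string before sending the request surfaces a typo locally instead of as an error response from the server.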
Create a collection and specify which fields to index
For more information about selective indexing, see Indexes in collections.
-
Python
-
TypeScript
-
Java
-
curl
The Python client supports multiple ways to create a collection:
-
You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
-
You can use a fluent interface to build the collection definition and then create the collection from the definition.
-
CollectionDefinition object
-
Fluent interface
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
indexing={"allow": ["city", "country"]},
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
CollectionDefinition.builder()
.set_indexing("allow", ["city", "country"])
.build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
indexing: {
allow: ["city", "country"],
},
});
})();
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
public class CreateCollection {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
.getDatabase("ASTRA_DB_API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition = new CollectionDefinition()
.indexingAllow("city", "country");
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"indexing": {
"allow": ["city", "country"]
}
}
}
}'
Create a collection and specify which fields shouldn’t be indexed
For more information about selective indexing, see Indexes in collections.
-
Python
-
TypeScript
-
Java
-
curl
The Python client supports multiple ways to create a collection:
-
You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
-
You can use a fluent interface to build the collection definition and then create the collection from the definition.
-
CollectionDefinition object
-
Fluent interface
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
indexing={"deny": ["city", "country"]},
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
CollectionDefinition.builder()
.set_indexing("deny", ["city", "country"])
.build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
indexing: {
deny: ["city", "country"],
},
});
})();
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
public class CreateCollection {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
.getDatabase("ASTRA_DB_API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition = new CollectionDefinition()
.indexingDeny("city", "country");
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"indexing": {
"deny": ["city", "country"]
}
}
}
}'
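The allow and deny examples use the same indexing option with a different key. The following is a minimal Python sketch, using only the standard library, that builds either request body. The helper name build_indexing_payload is hypothetical, and the check that exactly one of allow or deny is given is an assumption based on the examples in this section, each of which uses only one of the two keys; see Indexes in collections for the authoritative rules.

```python
import json


def build_indexing_payload(collection_name, allow=None, deny=None):
    """Hypothetical helper: build a createCollection body with selective indexing."""
    # Assumption: a definition carries either "allow" or "deny", never both.
    if (allow is None) == (deny is None):
        raise ValueError("Specify exactly one of allow or deny")
    key, fields = ("allow", allow) if allow is not None else ("deny", deny)
    return {
        "createCollection": {
            "name": collection_name,
            "options": {"indexing": {key: list(fields)}},
        }
    }


allow_payload = build_indexing_payload("COLLECTION_NAME", allow=["city", "country"])
deny_payload = build_indexing_payload("COLLECTION_NAME", deny=["city", "country"])
print(json.dumps(allow_payload, indent=2))
print(json.dumps(deny_payload, indent=2))
```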
Client reference
-
Python
-
TypeScript
-
Java
-
curl
For more information about the Python, TypeScript, and Java clients, see the corresponding client reference.
Client reference documentation is not applicable for HTTP.