Create a collection
Creates a new collection in a Serverless (Vector) database.
Ready to write code? See the examples for this method to get started. If you are new to the Data API, check out the quickstart.
Result
Python
Creates a collection with the specified parameters.
Returns a Collection object. You can use this object to work with documents in the collection.
Unless you specify the document_type parameter, the collection is typed as Collection[dict]. For more information, see Typing support.
Example response:
Collection(name="COLLECTION_NAME", keyspace="default_keyspace", database.api_endpoint="ASTRA_DB_API_ENDPOINT", api_options=FullAPIOptions(token=StaticTokenProvider("APPLICATION_TOKEN"...), ...))
TypeScript
Creates a collection with the specified parameters.
Returns a promise that resolves to a Collection<Schema> object. You can use this object to work with documents in the collection.
A Collection is typed as Collection<Schema>, where Schema defaults to SomeDoc (Record<string, any>). Providing the specific Schema type enables stronger typing for collection operations. For more information, see Typing Collections and Tables.
Java
Creates a collection with the specified parameters.
Returns a Collection object. You can use this object to work with documents in the collection.
curl
Creates a collection with the specified parameters.
If the command succeeds, the response indicates success.
Example response:
{
  "status": {
    "ok": 1
  }
}
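If you call the command from a script rather than from a shell, you may want to check the response programmatically. The following is a minimal Python sketch that relies only on the example response shown above; the function name collection_created is hypothetical and is not part of any client library.

```python
import json


def collection_created(response_body: str) -> bool:
    """Return True if a createCollection response body reports success."""
    payload = json.loads(response_body)
    # The example response reports success as {"status": {"ok": 1}}.
    return payload.get("status", {}).get("ok") == 1


print(collection_created('{"status": {"ok": 1}}'))
```

Any body without "ok": 1 under "status" is treated as a failure here, which is a conservative assumption rather than a documented rule.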
Parameters
You can’t edit a collection’s definition after you create the collection.
Python
The signature of this method changed in Python client version 2.0. If you are using an earlier version, DataStax recommends upgrading to the latest version. For more information, see Data API client upgrade guide.
Use the create_collection method, which belongs to the astrapy.Database class.
Method signature
create_collection(
    name: str,
    *,
    definition: CollectionDefinition | dict[str, Any] | None,
    document_type: type[Any],
    keyspace: str,
    collection_admin_timeout_ms: int,
    embedding_api_key: str | EmbeddingHeadersProvider,
    spawn_api_options: APIOptions,
) -> Collection
Most astrapy objects have an asynchronous counterpart, for use within the asyncio framework. To get an AsyncCollection, use the create_collection method of instances of AsyncDatabase, or alternatively the to_async method of the synchronous Collection class. See the AsyncCollection client reference for details about the async API.
Name | Type | Summary |
---|---|---|
name | str | The name of the new collection. |
definition | CollectionDefinition, dict[str, Any], or None | The full configuration for the collection. See the table of definition properties that follows. A plain Python dictionary can be passed instead of a CollectionDefinition object. |
document_type | type[Any] | Optional. A formal specifier for the type checker. Unless you specify this parameter, the collection is typed as Collection[dict]. |
keyspace | str | The keyspace in which to create the collection. Default: The working keyspace for the database. |
collection_admin_timeout_ms | int | A timeout, in milliseconds, to impose on the underlying API request. If not provided, the corresponding default timeout applies. |
embedding_api_key | str or EmbeddingHeadersProvider | Optional. This only applies to collections with a vectorize embedding provider integration. Use this option to provide the API key directly with headers instead of using an API key in the Astra DB KMS. The API key is sent to the Data API for every operation on the collection. It is useful when a vectorize service is configured but no credentials are stored, or when you want to override the stored credentials. For more information, see Auto-generate embeddings with vectorize. |
spawn_api_options | APIOptions | A complete or partial specification of the APIOptions to override the defaults inherited from the Database. |
Name | Type | Summary |
---|---|---|
vector | CollectionVectorOptions | Optional. The vector configuration for the collection. This includes things like the vector dimension and similarity metric. This also includes settings for server-side embedding generation if you want your collection to have vectorize enabled. Required for vector search and hybrid search. See the example with a vector service and example without a vector service for usage. |
lexical | CollectionLexicalOptions | Optional. The lexical search configuration for the collection. Applies only to collections in databases in the AWS us-east-2 region, where lexical is enabled by default. See the example for usage. |
rerank | CollectionRerankOptions | Optional. The reranker configuration for the collection. Applies only to collections in databases in the AWS us-east-2 region, where rerank is enabled by default. See the example for usage. |
indexing | dict[str, Any] | Optional. The selective indexing configuration for the collection. See the example to specify which fields to index and the example to specify which fields to not index for usage. Default: All fields of all documents are indexed. |
default_id | CollectionDefaultIDOptions | Optional. Specifies the default ID type for documents in the collection. This is used when you insert a document without an _id field. See the example for usage. For more information, see Document IDs. |
TypeScript
Use the createCollection method, which belongs to the Db class.
Method signature
async createCollection<Schema extends SomeDoc = SomeDoc>(
  name: string,
  options?: {
    vector?: CollectionVectorOptions,
    indexing?: CollectionIndexingOptions<Schema>,
    defaultId?: CollectionDefaultIdOptions,
    lexical?: CollectionLexicalOptions,
    rerank?: CollectionRerankOptions,
    logging?: DataAPILoggingConfig,
    keyspace?: string,
    embeddingApiKey?: string | EmbeddingHeadersProvider,
    serdes?: CollectionSerDesConfig,
    timeoutDefaults?: TimeoutDescriptor,
    timeout?: number | TimeoutDescriptor,
  }
): Promise<Collection<Schema>>
Name | Type | Summary |
---|---|---|
name | string | The name of the new collection. |
options | object | Optional. The options for this operation. See the table of options properties that follows. |
Name | Type | Summary |
---|---|---|
vector | CollectionVectorOptions | Optional. The vector configuration for the collection. This includes things like the vector dimension and similarity metric. This also includes settings for server-side embedding generation if you want your collection to have vectorize enabled. Required for vector search and hybrid search. See the example with a vector service and example without a vector service for usage. |
lexical | CollectionLexicalOptions | Optional. The lexical search configuration for the collection. Applies only to collections in databases in the AWS us-east-2 region, where lexical is enabled by default. See the example for usage. |
rerank | CollectionRerankOptions | Optional. The reranker configuration for the collection. Applies only to collections in databases in the AWS us-east-2 region, where rerank is enabled by default. See the example for usage. |
indexing | CollectionIndexingOptions<Schema> | Optional. The selective indexing configuration for the collection. See the example to specify which fields to index and the example to specify which fields to not index for usage. Default: All fields of all documents are indexed. |
defaultId | CollectionDefaultIdOptions | Optional. Specifies the default ID type for documents in the collection. This is used when you insert a document without an _id field. See the example for usage. For more information, see Document IDs. |
embeddingApiKey | string or EmbeddingHeadersProvider | Optional. This only applies to collections with a vectorize embedding provider integration. Use this option to provide the API key directly with headers instead of using an API key in the Astra DB KMS. The API key is sent to the Data API for every operation on the collection. It is useful when a vectorize service is configured but no credentials are stored, or when you want to override the stored credentials. For more information, see Auto-generate embeddings with vectorize. |
keyspace | string | The keyspace in which to create the collection. Default: The working keyspace for the database. |
logging | DataAPILoggingConfig | Optional. The configuration for logging events emitted by the DataAPIClient. |
serdes | CollectionSerDesConfig | Optional. The configuration for serialization/deserialization by the DataAPIClient. For more information, see Custom Ser/Des. |
timeoutDefaults | TimeoutDescriptor | Optional. The default timeout(s) to apply to operations performed on this Collection instance. |
timeout | number or TimeoutDescriptor | Optional. The timeout to apply to this method. Default: 60 seconds, unless you specified a different default along the Options Hierarchy. |
Java
Use the createCollection method, which belongs to the com.datastax.astra.client.Database class.
Method signature
Collection<Document> createCollection(String collectionName)
Collection<Document> createCollection(
    String collectionName,
    CollectionDefinition collectionDefinition
)
Collection<Document> createCollection(
    String collectionName,
    CollectionDefinition collectionDefinition,
    CreateCollectionOptions options
)
<T> Collection<T> createCollection(
    String collectionName,
    Class<T> documentClass
)
<T> Collection<T> createCollection(
    String collectionName,
    CollectionDefinition collectionDefinition,
    Class<T> documentClass
)
<T> Collection<T> createCollection(
    String collectionName,
    CollectionDefinition collectionDefinition,
    Class<T> documentClass,
    CreateCollectionOptions options
)
Name | Type | Summary |
---|---|---|
collectionName | String | The name of the new collection. |
collectionDefinition | CollectionDefinition | Settings for the collection, including vector options, the default ID format, and indexing options. |
options | CreateCollectionOptions | Options for the operation, including the keyspace. |
documentClass | Class<T> | The class of a specialized bean to use for documents in the collection instead of the default Document. |
curl
Use the createCollection command.
Command signature
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
  --header "Token: ASTRA_DB_APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": OPTIONS
  }
}'
Name | Type | Summary |
---|---|---|
name | string | The name of the new collection. |
options | object | Optional. The options for this operation. See the table of options properties that follows. |
Name | Type | Summary |
---|---|---|
defaultId | object | Optional. Specifies the default ID type for documents in the collection. This is used when you insert a document without an _id field. See the example for usage. For more information, see Document IDs. |
vector | object | Optional. The vector configuration for the collection. This includes things like the vector dimension and similarity metric. This also includes settings for server-side embedding generation if you want your collection to have vectorize enabled. Required for vector search and hybrid search. See the example with a vector service and example without a vector service for usage. |
lexical | object | Optional. The lexical search configuration for the collection. Applies only to collections in databases in the AWS us-east-2 region, where lexical is enabled by default. See the example for usage. |
rerank | object | Optional. The reranker configuration for the collection. Applies only to collections in databases in the AWS us-east-2 region, where rerank is enabled by default. See the example for usage. |
indexing | object | Optional. Configures selective indexing for data inserted to the collection. See the example to specify which fields to index and the example to specify which fields to not index for usage. Default: All fields of all documents are indexed. |
Examples
The following examples demonstrate how to create a collection.
Create a collection that is not vector-enabled
Python
from astrapy import DataAPIClient
# Get a database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection = database.create_collection("COLLECTION_NAME")
TypeScript (typed collections)
You can manually define a client-side type for your collection to help statically catch errors.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Define the type for the collection
interface User {
  name: string,
  age?: number,
}
// Create a collection
(async function () {
  const collection = await database.createCollection<User>("COLLECTION_NAME");
})();
TypeScript (untyped collections)
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Create a collection
(async function () {
  const collection = await database.createCollection("COLLECTION_NAME");
})();
Java
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.documents.Document;
public class CreateCollection {
    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");
        // Create a collection
        Collection<Document> collection = database.createCollection("COLLECTION_NAME");
    }
}
curl
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
  --header "Token: ASTRA_DB_APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {}
  }
}'
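If you prefer to issue the raw command from Python without a client library, the curl request can be mirrored with the standard library. This is only a sketch: the helper name build_create_collection_request and the example.invalid endpoint are hypothetical, while the URL path, headers, and body shape are copied from the curl command above. The request is built but not sent.

```python
import json
import urllib.request


def build_create_collection_request(api_endpoint, keyspace, token, name, options=None):
    """Build (but do not send) a POST request equivalent to the curl command."""
    body = {"createCollection": {"name": name, "options": options or {}}}
    return urllib.request.Request(
        url=f"{api_endpoint}/api/json/v1/{keyspace}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; replace them with your own endpoint, keyspace, and token.
request = build_create_collection_request(
    "https://example.invalid",
    "default_keyspace",
    "ASTRA_DB_APPLICATION_TOKEN",
    "COLLECTION_NAME",
)
print(request.full_url)
```

To send the request, pass it to urllib.request.urlopen and read the JSON response body.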
Create a collection that can store vector embeddings
Collections that are vector-enabled can store vector embeddings and work with vector search.
Python
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
Python (CollectionDefinition object)
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CollectionDefinition, CollectionVectorOptions
# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
    vector=CollectionVectorOptions(
        dimension=1024,
        metric=VectorMetric.COSINE,
    ),
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
Python (fluent interface)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric
# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
    CollectionDefinition.builder()
    .set_vector_dimension(1024)
    .set_vector_metric(VectorMetric.COSINE)
    .build()
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
TypeScript (typed collections)
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector as an inline field in your interfaces, or you can extend the utility VectorDoc type provided by the client.
import { DataAPIClient, VectorDoc } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Define the type for the collection
interface User extends VectorDoc {
  name: string,
  age?: number,
}
// Create a collection
(async function () {
  const collection = await database.createCollection<User>("COLLECTION_NAME", {
    vector: {
      dimension: 1024,
      metric: "cosine",
    },
  });
})();
TypeScript (untyped collections)
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, or type-related issues will occur. Consider using a type like VectorDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector field to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Create a collection
(async function () {
  const collection = await database.createCollection("COLLECTION_NAME", {
    vector: {
      dimension: 1024,
      metric: "cosine",
    },
  });
})();
Java
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.vector.SimilarityMetric;
public class CreateCollection {
    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");
        // Create a collection
        CollectionDefinition collectionDefinition = new CollectionDefinition()
            .vectorDimension(1024)
            .vectorSimilarity(SimilarityMetric.COSINE);
        Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
    }
}
curl
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
  --header "Token: ASTRA_DB_APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "vector": {
        "dimension": 1024,
        "metric": "cosine"
      }
    }
  }
}'
Create a collection that can automatically generate vector embeddings
If you want to automatically generate vector embeddings, create a vector-enabled collection and configure an embedding provider integration for the collection.
The configuration depends on the embedding provider. For the configuration and an example for each provider, see Supported embedding providers.
You can also store pre-generated vector embeddings in the collection. If you store pre-generated and automatically generated embeddings in the same collection, make sure all embeddings have the same provider, model, and dimensions. Mismatched embeddings can cause inaccurate vector searches.
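One way to guard against mismatched dimensions is to check pre-generated embeddings on the client before inserting them. The sketch below is an assumption-laden illustration, not part of the Python client: the function name find_dimension_mismatches is hypothetical, and it only compares the length of each $vector with the collection dimension. It cannot detect embeddings that come from a different provider or model with the same dimension.

```python
from typing import Any, Dict, List, Sequence


def find_dimension_mismatches(
    documents: Sequence[Dict[str, Any]], dimension: int
) -> List[int]:
    """Return the positions of documents whose $vector length differs from dimension."""
    mismatches = []
    for position, document in enumerate(documents):
        vector = document.get("$vector")
        # Documents without a pre-generated embedding are skipped.
        if vector is not None and len(vector) != dimension:
            mismatches.append(position)
    return mismatches


documents = [
    {"name": "a", "$vector": [0.1, 0.2, 0.3]},
    {"name": "b", "$vector": [0.1, 0.2]},
    {"name": "c"},
]
print(find_dimension_mismatches(documents, dimension=3))  # prints [1]
```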
Python
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
Python (CollectionDefinition object)
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CollectionDefinition,
    CollectionVectorOptions,
    VectorServiceOptions,
)
# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
    vector=CollectionVectorOptions(
        metric=VectorMetric.SIMILARITY_METRIC,
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="PROVIDER",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
            parameters=PARAMETERS,
        ),
    ),
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
Python (fluent interface)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric
# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
    CollectionDefinition.builder()
    .set_vector_dimension(MODEL_DIMENSIONS)
    .set_vector_metric(VectorMetric.SIMILARITY_METRIC)
    .set_vector_service(
        provider="PROVIDER",
        model_name="MODEL_NAME",
        authentication={
            "providerKey": "API_KEY_NAME",
        },
        parameters=PARAMETERS,
    )
    .build()
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
TypeScript (typed collections)
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector and $vectorize as inline fields in your interfaces, or you can extend the utility VectorizeDoc type provided by the client.
import { DataAPIClient, VectorizeDoc } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Define the type for the collection
interface User extends VectorizeDoc {
  name: string,
  age?: number,
}
// Create a collection
(async function () {
  const collection = await database.createCollection<User>("COLLECTION_NAME", {
    vector: {
      dimension: MODEL_DIMENSIONS,
      metric: "SIMILARITY_METRIC",
      service: {
        provider: "PROVIDER",
        modelName: "MODEL_NAME",
        authentication: {
          providerKey: "API_KEY_NAME",
        },
        parameters: PARAMETERS,
      },
    },
  });
})();
TypeScript (untyped collections)
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, and the $vectorize field must still be a string, or type-related issues will occur. Consider using a type like VectorizeDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector and $vectorize fields to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Create a collection
(async function () {
  const collection = await database.createCollection("COLLECTION_NAME", {
    vector: {
      dimension: MODEL_DIMENSIONS,
      metric: "SIMILARITY_METRIC",
      service: {
        provider: "PROVIDER",
        modelName: "MODEL_NAME",
        authentication: {
          providerKey: "API_KEY_NAME",
        },
        parameters: PARAMETERS,
      },
    },
  });
})();
Java
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.vector.SimilarityMetric;
public class CreateCollection {
    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");
        // Create a collection
        CollectionDefinition collectionDefinition = new CollectionDefinition()
            .vectorDimension(MODEL_DIMENSIONS)
            .vectorSimilarity(SimilarityMetric.SIMILARITY_METRIC)
            .vectorize(
                "PROVIDER",
                "MODEL_NAME",
                "API_KEY_NAME",
                PARAMETERS
            );
        Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
    }
}
curl
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
  --header "Token: ASTRA_DB_APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "vector": {
        "dimension": MODEL_DIMENSIONS,
        "metric": "SIMILARITY_METRIC",
        "service": {
          "provider": "PROVIDER",
          "modelName": "MODEL_NAME",
          "authentication": {
            "providerKey": "API_KEY_NAME"
          },
          "parameters": PARAMETERS
        }
      }
    }
  }
}'
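Because the placeholders in the curl body vary by provider, it can help to assemble the options object in code. The following Python sketch builds the same JSON structure as the curl example above; the helper name build_vector_options is hypothetical. It assumes that authentication and parameters can be omitted when a provider does not need them, as in the hybrid search example on this page, which passes only provider and modelName.

```python
import json


def build_vector_options(dimension, metric, provider, model_name,
                         api_key_name=None, parameters=None):
    """Build the "options" object for a vectorize-enabled collection."""
    service = {"provider": provider, "modelName": model_name}
    if api_key_name is not None:
        # Name of the stored API key, not the key itself.
        service["authentication"] = {"providerKey": api_key_name}
    if parameters:
        service["parameters"] = parameters
    return {"vector": {"dimension": dimension, "metric": metric, "service": service}}


options = build_vector_options(1024, "cosine", "nvidia", "NV-Embed-QA")
print(json.dumps({"createCollection": {"name": "COLLECTION_NAME", "options": options}}, indent=2))
```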
Create a collection that supports hybrid search
If you want to perform hybrid search on your collection, you must create a collection that has vector, lexical, and rerank enabled. Your collection must also be in a database in the AWS us-east-2 region.
Lexical and rerank are enabled by default when you create a collection in a database in the AWS us-east-2 region, but you can optionally configure the lexical analyzer and the reranker model.
For configuration details about the lexical analyzer, see Find data with CQL analyzers. The following example uses a configuration suitable for English text.
For configuration details about the reranker model, inspect the available reranker models. Only the NVIDIA llama-3.2-nv-rerankqa-1b-v2 reranker model is supported.
For configuration details about vector, see Create a collection that can store vector embeddings and Create a collection that can automatically generate vector embeddings.
Python
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
Python (CollectionDefinition object)
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CollectionDefinition,
    CollectionLexicalOptions,
    CollectionRerankOptions,
    CollectionVectorOptions,
    RerankServiceOptions,
    VectorServiceOptions,
)
# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
    vector=CollectionVectorOptions(
        metric=VectorMetric.COSINE,
        dimension=1024,
        service=VectorServiceOptions(
            provider="nvidia",
            model_name="NV-Embed-QA",
        ),
    ),
    lexical=CollectionLexicalOptions(
        analyzer={
            "tokenizer": {
                "name": "standard",
                "args": {},
            },
            "filters": [
                {"name": "lowercase"},
                {"name": "stop"},
                {"name": "porterstem"},
                {"name": "asciifolding"},
            ],
            "charFilters": [],
        },
        enabled=True,
    ),
    rerank=CollectionRerankOptions(
        enabled=True,
        service=RerankServiceOptions(
            provider="nvidia",
            model_name="nvidia/llama-3.2-nv-rerankqa-1b-v2",
        ),
    ),
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
Python (fluent interface)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric
# Get an existing database
client = DataAPIClient()
database = client.get_database(
    "ASTRA_DB_API_ENDPOINT",
    token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
    CollectionDefinition.builder()
    .set_vector_dimension(1024)
    .set_vector_metric(VectorMetric.COSINE)
    .set_vector_service(
        provider="nvidia",
        model_name="NV-Embed-QA",
    )
    .set_lexical(
        {
            "tokenizer": {
                "name": "standard",
                "args": {},
            },
            "filters": [
                {"name": "lowercase"},
                {"name": "stop"},
                {"name": "porterstem"},
                {"name": "asciifolding"},
            ],
            "charFilters": [],
        }
    )
    .set_rerank("nvidia", "nvidia/llama-3.2-nv-rerankqa-1b-v2")
    .build()
)
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)
TypeScript (typed collections)
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector, $vectorize, and $lexical as inline fields in your interfaces, or you can extend the utility VectorDoc, VectorizeDoc, and LexicalDoc types provided by the client.
import { DataAPIClient, LexicalDoc, VectorizeDoc } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Define the type for the collection
interface User extends VectorizeDoc, LexicalDoc {
  name: string,
  age?: number,
}
// Create a collection
(async function () {
  const collection = await database.createCollection<User>("COLLECTION_NAME", {
    vector: {
      dimension: 1024,
      metric: "cosine",
      service: {
        provider: "nvidia",
        modelName: "NV-Embed-QA",
      },
    },
    lexical: {
      enabled: true,
      analyzer: {
        tokenizer: {
          name: "standard",
          args: {},
        },
        filters: [
          { name: "lowercase" },
          { name: "stop" },
          { name: "porterstem" },
          { name: "asciifolding" },
        ],
        charFilters: [],
      },
    },
    rerank: {
      enabled: true,
      service: {
        provider: "nvidia",
        modelName: "nvidia/llama-3.2-nv-rerankqa-1b-v2",
      },
    },
  });
})();
TypeScript (untyped collections)
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, and the $vectorize and $lexical fields must still be strings, or type-related issues will occur. Consider using a type like VectorDoc & LexicalDoc & SomeDoc or VectorizeDoc & LexicalDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector, $vectorize, and $lexical fields to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Create a collection
(async function () {
  const collection = await database.createCollection("COLLECTION_NAME", {
    vector: {
      dimension: 1024,
      metric: "cosine",
      service: {
        provider: "nvidia",
        modelName: "NV-Embed-QA",
      },
    },
    lexical: {
      enabled: true,
      analyzer: {
        tokenizer: {
          name: "standard",
          args: {},
        },
        filters: [
          { name: "lowercase" },
          { name: "stop" },
          { name: "porterstem" },
          { name: "asciifolding" },
        ],
        charFilters: [],
      },
    },
    rerank: {
      enabled: true,
      service: {
        provider: "nvidia",
        modelName: "nvidia/llama-3.2-nv-rerankqa-1b-v2",
      },
    },
  });
})();
Java
The Java client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
Java (CollectionDefinition object)
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.core.lexical.Analyzer;
import com.datastax.astra.client.core.lexical.LexicalOptions;
import com.datastax.astra.client.core.rerank.CollectionRerankOptions;
import com.datastax.astra.client.core.rerank.RerankServiceOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vector.VectorOptions;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import static com.datastax.astra.client.core.lexical.AnalyzerTypes.STANDARD;
public class CreateCollection {
    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");
        // Create a collection
        CollectionDefinition collectionDefinition = new CollectionDefinition();
        // Vector options
        VectorServiceOptions vectorService = new VectorServiceOptions()
            .provider("nvidia")
            .modelName("NV-Embed-QA");
        VectorOptions vectorOptions = new VectorOptions()
            .dimension(1024)
            .metric(SimilarityMetric.COSINE.getValue())
            .service(vectorService);
        collectionDefinition.vector(vectorOptions);
        // Lexical options
        Analyzer analyzer = new Analyzer()
            .tokenizer(STANDARD.getValue())
            .addFilter("lowercase")
            .addFilter("stop")
            .addFilter("porterstem")
            .addFilter("asciifolding");
        LexicalOptions lexicalOptions = new LexicalOptions()
            .enabled(true)
            .analyzer(analyzer);
        collectionDefinition.lexical(lexicalOptions);
        // Rerank options
        RerankServiceOptions rerankService = new RerankServiceOptions()
            .modelName("nvidia/llama-3.2-nv-rerankqa-1b-v2")
            .provider("nvidia");
        CollectionRerankOptions rerankOptions = new CollectionRerankOptions()
            .enabled(true)
            .service(rerankService);
        collectionDefinition.rerank(rerankOptions);
        database.createCollection("COLLECTION_NAME", collectionDefinition);
    }
}
Java (fluent interface)
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.core.lexical.Analyzer;
import com.datastax.astra.client.core.lexical.LexicalOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import static com.datastax.astra.client.core.lexical.AnalyzerTypes.STANDARD;
public class CreateCollection {
    public static void main(String[] args) {
        // Get a database
        Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
            .getDatabase("ASTRA_DB_API_ENDPOINT");
        // Create a collection
        database.createCollection(
            "COLLECTION_NAME",
            new CollectionDefinition()
                .vector(1024, SimilarityMetric.COSINE)
                .vectorize("nvidia", "NV-Embed-QA")
                .lexical(
                    new LexicalOptions()
                        .enabled(true)
                        .analyzer(
                            new Analyzer()
                                .tokenizer(STANDARD.getValue())
                                .addFilter("lowercase")
                                .addFilter("stop")
                                .addFilter("porterstem")
                                .addFilter("asciifolding")))
                .rerank("nvidia", "nvidia/llama-3.2-nv-rerankqa-1b-v2"));
    }
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"lexical": {
"analyzer": {
"tokenizer": {
"name": "standard",
"args": {}
},
"filters": [
{
"name": "lowercase"
},
{
"name": "stop"
},
{
"name": "porterstem"
},
{
"name": "asciifolding"
}
],
"charFilters": []
},
"enabled": true
},
"rerank": {
"enabled": true,
"service": {
"modelName": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
"provider": "nvidia"
}
},
"vector": {
"dimension": 1024,
"metric": "cosine",
"service": {
"provider": "nvidia",
"modelName": "NV-Embed-QA"
}
}
}
}
}'
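The same HTTP request can also be assembled without curl. The following is a minimal sketch that uses only the Python standard library and is not part of any DataStax client: the helper name `build_create_collection_request` and the `example.invalid` endpoint are hypothetical, and the request is built but never sent. The URL pattern and headers mirror the curl example above.

```python
import json
import urllib.request


def build_create_collection_request(api_endpoint, keyspace, token, name, options):
    """Build (but do not send) a createCollection POST request.

    Hypothetical helper: it mirrors the URL pattern and headers of the
    curl example and performs no network I/O.
    """
    url = f"{api_endpoint.rstrip('/')}/api/json/v1/{keyspace}"
    body = {"createCollection": {"name": name, "options": options}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; replace them with real ones before sending.
request = build_create_collection_request(
    "https://example.invalid",  # stands in for ASTRA_DB_API_ENDPOINT
    "default_keyspace",
    "ASTRA_DB_APPLICATION_TOKEN",
    "COLLECTION_NAME",
    {
        "vector": {
            "dimension": 1024,
            "metric": "cosine",
            "service": {"provider": "nvidia", "modelName": "NV-Embed-QA"},
        }
    },
)
print(request.get_method(), request.full_url)
```

To actually issue the request, pass the object to `urllib.request.urlopen(request)` once the placeholders hold real values.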
Create a collection and specify the default ID format
For more information about the default ID format, see Document IDs. For allowed values, see Parameters.
- Python
- TypeScript
- Java
- curl
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.

- CollectionDefinition object
- Fluent interface
from astrapy import DataAPIClient
from astrapy.info import (
CollectionDefinition,
CollectionDefaultIDOptions,
)
from astrapy.constants import DefaultIdType
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
default_id=CollectionDefaultIDOptions(DefaultIdType.OBJECTID),
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import DefaultIdType
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
CollectionDefinition.builder()
.set_default_id(DefaultIdType.OBJECTID)
.build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
- Typed collections
- Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
The _id field type should match the defaultId type.
import { DataAPIClient, ObjectId } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
// Define the type for the collection
interface User {
_id: ObjectId,
name: string,
age?: number,
}
(async function () {
const collection = await database.createCollection<User>("COLLECTION_NAME", {
defaultId: {
type: "objectId",
},
});
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
However, if you later specify _id when you insert a document, DataStax recommends that it has the same type as the defaultId.
Consider using a type like { _id: ObjectId } & SomeDoc, which allows the documents to remain untyped but still statically requires the _id field to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
defaultId: {
type: "objectId",
},
});
})();
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefaultIdTypes;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
public class CreateCollection {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
.getDatabase("ASTRA_DB_API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition = new CollectionDefinition()
.defaultId(CollectionDefaultIdTypes.OBJECT_ID);
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"defaultId": {
"type": "objectId"
}
}
}
}'
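When the request body is generated programmatically, it can help to validate the ID type before sending it. The following is a minimal sketch using only the standard library: the helper name `default_id_options` is hypothetical, and the set of allowed values is an assumption that should be confirmed against Parameters.

```python
import json

# Assumed set of allowed values; confirm against the Parameters section.
ALLOWED_DEFAULT_ID_TYPES = {"objectId", "uuid", "uuidv6", "uuidv7"}


def default_id_options(id_type):
    """Return the `options` fragment that selects a default ID type.

    Hypothetical helper: it raises ValueError for a type that is not in
    the assumed allow-list instead of letting the server reject it.
    """
    if id_type not in ALLOWED_DEFAULT_ID_TYPES:
        raise ValueError(f"Unsupported default ID type: {id_type!r}")
    return {"defaultId": {"type": id_type}}


# Build the same body shape that the curl example sends.
payload = {
    "createCollection": {
        "name": "COLLECTION_NAME",
        "options": default_id_options("objectId"),
    }
}
print(json.dumps(payload, indent=2))
```

Validating on the client side gives an immediate, local error for a misspelled type, which matters because a collection's definition can't be edited after creation.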
Create a collection and specify which fields to index
For more information about selective indexing, see Indexes in collections.
- Python
- TypeScript
- Java
- curl
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.

- CollectionDefinition object
- Fluent interface
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
indexing={"allow": ["city", "country"]},
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
CollectionDefinition.builder()
.set_indexing("allow", ["city", "country"])
.build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
indexing: {
allow: ["city", "country"],
},
});
})();
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
public class CreateCollection {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
.getDatabase("ASTRA_DB_API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition = new CollectionDefinition()
.indexingAllow("city", "country");
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"indexing": {
"allow": ["city", "country"]
}
}
}
}'
Create a collection and specify which fields shouldn’t be indexed
For more information about selective indexing, see Indexes in collections.
- Python
- TypeScript
- Java
- curl
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.

- CollectionDefinition object
- Fluent interface
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
indexing={"deny": ["city", "country"]},
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"ASTRA_DB_API_ENDPOINT",
token="ASTRA_DB_APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
CollectionDefinition.builder()
.set_indexing("deny", ["city", "country"])
.build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN");
const database = client.db("ASTRA_DB_API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
indexing: {
deny: ["city", "country"],
},
});
})();
package com.examples;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
public class CreateCollection {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("ASTRA_DB_APPLICATION_TOKEN")
.getDatabase("ASTRA_DB_API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition = new CollectionDefinition()
.indexingDeny("city", "country");
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"indexing": {
"deny": ["city", "country"]
}
}
}
}'
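The allow and deny examples differ only in the key inside `indexing`. The following is a minimal sketch of a helper that produces either form for the HTTP request body. The name `indexing_options` is hypothetical, and the rule that exactly one of `allow` or `deny` is given is an assumption based on how the examples above use them; see Indexes in collections for the authoritative behavior.

```python
import json


def indexing_options(allow=None, deny=None):
    """Return the `options` fragment for selective indexing.

    Hypothetical helper. It assumes that exactly one of `allow` or
    `deny` must be provided and raises ValueError otherwise.
    """
    if (allow is None) == (deny is None):
        raise ValueError("Provide exactly one of 'allow' or 'deny'")
    if allow is not None:
        return {"indexing": {"allow": list(allow)}}
    return {"indexing": {"deny": list(deny)}}


# Same body shape as the curl example that excludes fields from indexing.
payload = {
    "createCollection": {
        "name": "COLLECTION_NAME",
        "options": indexing_options(deny=["city", "country"]),
    }
}
print(json.dumps(payload, indent=2))
```

Calling `indexing_options(allow=["city", "country"])` instead yields the body used in the preceding section.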
Client reference
- Python
- TypeScript
- Java
- curl
For more information, see the client reference.
For more information, see the client reference.
For more information, see the client reference.
Client reference documentation is not applicable for HTTP.