Create a collection
Creates a new collection in a Serverless (Vector) database.
Ready to write code? See the examples for this method to get started. If you are new to the Data API, check out the quickstart.
Result
- Python
- TypeScript
- Java
- curl
Creates a collection with the specified parameters.
Returns a Collection object. You can use this object to work with documents in the collection.
Unless you specify the document_type parameter, the collection is typed as Collection[dict]. For more information, see Typing support.
Example response:
Collection(name="COLLECTION_NAME", keyspace="default_keyspace", database.api_endpoint="API_ENDPOINT", api_options=FullAPIOptions(token=StaticTokenProvider("APPLICATION_TOKEN"...), ...))
Creates a collection with the specified parameters.
Returns a promise that resolves to a Collection<Schema> object. You can use this object to work with documents in the collection.
A Collection is typed as Collection<Schema>, where Schema defaults to SomeDoc (Record<string, any>). Providing a specific Schema type enables stronger typing for collection operations. For more information, see Typing Collections and Tables.
Creates a collection with the specified parameters.
Returns a Collection object. You can use this object to work with documents in the collection.
Creates a collection with the specified parameters.
If the command succeeds, the response indicates success.
Example response:
{
"status": {
"ok": 1
}
}
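Clients that call the HTTP API directly can verify this response programmatically. A minimal Python sketch, using only the standard library and assuming the raw response body has been captured as a string:

```python
import json

# Raw body returned by a successful createCollection call
body = '{ "status": { "ok": 1 } }'

response = json.loads(body)

# A successful createCollection response carries status.ok == 1
if response.get("status", {}).get("ok") == 1:
    print("collection created")
```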
Parameters
You cannot edit a collection’s definition after you create the collection.
- Python
- TypeScript
- Java
- curl
The signature of this method changed in Python client version 2.0. If you are using an earlier version, DataStax recommends upgrading to the latest version. For more information, see Data API client upgrade guide.
Use the create_collection method, which belongs to the astrapy.Database class.
Method signature
create_collection(
name: str,
*,
definition: CollectionDefinition | dict[str, Any] | None,
document_type: type[Any],
keyspace: str,
collection_admin_timeout_ms: int,
embedding_api_key: str | EmbeddingHeadersProvider,
spawn_api_options: APIOptions,
) -> Collection
Most astrapy objects have an asynchronous counterpart for use within the asyncio framework. To get an AsyncCollection, use the create_collection method of an AsyncDatabase instance, or alternatively the to_async method of the synchronous Collection class. See the AsyncCollection client reference for details about the async API.
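For example, the asynchronous flow sketched below mirrors the synchronous call. This is a sketch only: it assumes astrapy 2.x and uses placeholder credentials that you must replace before running.

```python
import asyncio

from astrapy import DataAPIClient


async def main() -> None:
    client = DataAPIClient()
    # get_async_database returns an AsyncDatabase, whose create_collection
    # coroutine must be awaited and returns an AsyncCollection
    async_database = client.get_async_database(
        "API_ENDPOINT",
        token="APPLICATION_TOKEN",
    )
    collection = await async_database.create_collection("COLLECTION_NAME")


asyncio.run(main())
```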
Name | Type | Summary
---|---|---
`name` | `str` | The name of the new collection. Collection names must follow the Data API naming rules.
`definition` | `CollectionDefinition`, `dict`, or `None` | Optional. The full configuration for the collection. You can define the configuration as a `CollectionDefinition` object or as a plain Python dictionary. See Properties of `definition` for details.
`document_type` | `type` | Optional. A formal specifier for the type checker. If provided, the returned object is typed as `Collection[document_type]`. Default: `dict`.
`keyspace` | `str` | Optional. The keyspace in which to create the collection. Default: the working keyspace for the database.
`collection_admin_timeout_ms` | `int` | Optional. A timeout, in milliseconds, to impose on the underlying API request. If not provided, the corresponding default from the API options applies.
`embedding_api_key` | `str` or `EmbeddingHeadersProvider` | Optional. This only applies to collections with a vectorize embedding provider integration. Use this option to provide the API key directly with headers instead of using an API key in the Astra DB KMS. The API key is sent to the Data API for every operation on the collection. It is useful when a vectorize service is configured but no credentials are stored, or when you want to override the stored credentials. For more information, see Auto-generate embeddings with vectorize.
`spawn_api_options` | `APIOptions` | Optional. A complete or partial specification of the APIOptions to override the defaults inherited from the `Database`.
Name | Type | Summary
---|---|---
`vector` | `CollectionVectorOptions` | Optional. The vector configuration for the collection. This includes things like the vector dimension and similarity metric, as well as settings for server-side embedding generation if you want your collection to have vectorize enabled. Required for vector search and hybrid search. See the example with a vector service and the example without a vector service for usage.
`lexical` | `CollectionLexicalOptions` | Optional. The lexical search configuration for the collection. Available only for collections in databases in supported AWS regions. See the example for usage.
`rerank` | `CollectionRerankOptions` | Optional. The reranker configuration for the collection. Available only for collections in databases in supported AWS regions. See the example for usage.
`indexing` | `dict` | Optional. The selective indexing configuration for the collection. See the example to specify which fields to index and the example to specify which fields to not index for usage. Default: all fields of all documents are indexed.
`default_id` | `CollectionDefaultIDOptions` | Optional. Specifies the default ID type for documents in the collection. This is used when you insert a document without an `_id` field. See the example for usage. For more information, see Document IDs.
Use the createCollection method, which belongs to the Db class.
Method signature
async createCollection<Schema extends SomeDoc = SomeDoc>(
name: string,
options?: {
vector?: CollectionVectorOptions,
indexing?: CollectionIndexingOptions<Schema>,
defaultId?: CollectionDefaultIdOptions,
lexical?: CollectionLexicalOptions,
rerank?: CollectionRerankOptions,
logging?: DataAPILoggingConfig,
keyspace?: string,
embeddingApiKey?: string | EmbeddingHeadersProvider,
serdes?: CollectionSerDesConfig,
timeoutDefaults?: TimeoutDescriptor,
timeout?: number | TimeoutDescriptor,
}
): Collection<Schema>
Name | Type | Summary
---|---|---
`name` | `string` | The name of the new collection. Collection names must follow the Data API naming rules.
`options` | `CreateCollectionOptions` | Optional. The options for this operation. See Properties of `options` for details.
Name | Type | Summary
---|---|---
`vector` | `CollectionVectorOptions` | Optional. The vector configuration for the collection. This includes things like the vector dimension and similarity metric, as well as settings for server-side embedding generation if you want your collection to have vectorize enabled. Required for vector search and hybrid search. See the example with a vector service and the example without a vector service for usage.
`lexical` | `CollectionLexicalOptions` | Optional. The lexical search configuration for the collection. Available only for collections in databases in supported AWS regions. See the example for usage.
`rerank` | `CollectionRerankOptions` | Optional. The reranker configuration for the collection. Available only for collections in databases in supported AWS regions. See the example for usage.
`indexing` | `CollectionIndexingOptions<Schema>` | Optional. The selective indexing configuration for the collection. See the example to specify which fields to index and the example to specify which fields to not index for usage. Default: all fields of all documents are indexed.
`defaultId` | `CollectionDefaultIdOptions` | Optional. Specifies the default ID type for documents in the collection. This is used when you insert a document without an `_id` field. See the example for usage. For more information, see Document IDs.
`embeddingApiKey` | `string` or `EmbeddingHeadersProvider` | Optional. This only applies to collections with a vectorize embedding provider integration. Use this option to provide the API key directly with headers instead of using an API key in the Astra DB KMS. The API key is sent to the Data API for every operation on the collection. It is useful when a vectorize service is configured but no credentials are stored, or when you want to override the stored credentials. For more information, see Auto-generate embeddings with vectorize.
`keyspace` | `string` | Optional. The keyspace in which to create the collection. Default: the working keyspace for the database.
`logging` | `DataAPILoggingConfig` | Optional. The configuration for logging events emitted by the DataAPIClient.
`serdes` | `CollectionSerDesConfig` | Optional. The configuration for serialization/deserialization by the DataAPIClient. For more information, see Custom Ser/Des.
`timeoutDefaults` | `TimeoutDescriptor` | Optional. The default timeouts to apply to operations performed on this Collection instance.
`timeout` | `number` or `TimeoutDescriptor` | Optional. The timeout to apply to this method. Default: 60 seconds, unless you specified a different default along the Options Hierarchy.
Use the createCollection method, which belongs to the com.datastax.astra.client.Database class.
Method signature
Collection<Document> createCollection(String collectionName)
Collection<Document> createCollection(
String collectionName,
CollectionDefinition collectionDefinition
)
Collection<Document> createCollection(
String collectionName,
CollectionDefinition collectionDefinition,
CreateCollectionOptions options
)
<T> Collection<T> createCollection(
String collectionName,
Class<T> documentClass
)
<T> Collection<T> createCollection(
String collectionName,
CollectionDefinition collectionDefinition,
Class<T> documentClass
)
<T> Collection<T> createCollection(
String collectionName,
CollectionDefinition collectionDefinition,
Class<T> documentClass,
CreateCollectionOptions options
)
Name | Type | Summary
---|---|---
`collectionName` | `String` | The name of the new collection. Collection names must follow the Data API naming rules.
`collectionDefinition` | `CollectionDefinition` | Optional. Settings for the collection, including vector options, the default ID format, and indexing options.
`options` | `CreateCollectionOptions` | Optional. Options for the operation, including the keyspace.
`documentClass` | `Class<T>` | Optional. Use a specialized bean for documents in the collection instead of the default `Document` class.
Use the createCollection command.
Command signature
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": OPTIONS
}
}'
Name | Type | Summary
---|---|---
`name` | `string` | The name of the new collection. Collection names must follow the Data API naming rules.
`options` | `object` | Optional. The options for this operation. See Properties of `options` for details.
Name | Type | Summary
---|---|---
`defaultId` | `object` | Optional. Specifies the default ID type for documents in the collection. This is used when you insert a document without an `_id` field. See the example for usage. For more information, see Document IDs.
`vector` | `object` | Optional. The vector configuration for the collection. This includes things like the vector dimension and similarity metric, as well as settings for server-side embedding generation if you want your collection to have vectorize enabled. Required for vector search and hybrid search. See the example with a vector service and the example without a vector service for usage.
`lexical` | `object` | Optional. The lexical search configuration for the collection. Available only for collections in databases in supported AWS regions. See the example for usage.
`rerank` | `object` | Optional. The reranker configuration for the collection. Available only for collections in databases in supported AWS regions. See the example for usage.
`indexing` | `object` | Optional. Configures selective indexing for data inserted into the collection. Use `allow` or `deny` to list the document fields to include in or exclude from indexing. See the example to specify which fields to index and the example to specify which fields to not index for usage. Default: all fields of all documents are indexed.
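Taken together, these options form the JSON body sent with the command. A standard-library Python sketch assembling a hypothetical body — the collection name and option values here are illustrative, not defaults:

```python
import json

# Illustrative createCollection body combining several options from the table above
payload = {
    "createCollection": {
        "name": "my_collection",
        "options": {
            # Autogenerate UUID _id values for documents inserted without one
            "defaultId": {"type": "uuid"},
            # Vector-enable the collection with 1024-dimension cosine similarity
            "vector": {"dimension": 1024, "metric": "cosine"},
            # Selective indexing: exclude a hypothetical raw_text field
            "indexing": {"deny": ["raw_text"]},
        },
    }
}

body = json.dumps(payload)
```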
Examples
The following examples demonstrate how to create a collection.
Create a collection that is not vector-enabled
- Python
- TypeScript
- Java
- curl
from astrapy import DataAPIClient
# Get a database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN",
)
# Create a collection
collection = database.create_collection("COLLECTION_NAME")
- Typed collections
- Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
// Define the type for the collection
interface User {
name: string;
age?: number;
}
// Create a collection
(async function () {
const collection = await database.createCollection<User>(
"COLLECTION_NAME",
);
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
// Create a collection
(async function () {
const collection = await database.createCollection("COLLECTION_NAME");
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.databases.Database;
public class Example {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
// Create a collection
Collection<Document> collection = database.createCollection("COLLECTION_NAME");
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {}
}
}'
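The same request can be issued from Python instead of curl. A standard-library sketch: the endpoint, keyspace, and token are placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_ENDPOINT = "https://example.apps.astra.datastax.com"  # placeholder
KEYSPACE_NAME = "default_keyspace"
APPLICATION_TOKEN = "APPLICATION_TOKEN"  # placeholder

body = json.dumps({"createCollection": {"name": "COLLECTION_NAME", "options": {}}})

request = urllib.request.Request(
    url=f"{API_ENDPOINT}/api/json/v1/{KEYSPACE_NAME}",
    data=body.encode("utf-8"),
    headers={
        "Token": APPLICATION_TOKEN,
        "Content-Type": "application/json",
    },
    method="POST",
)

# urllib.request.urlopen(request) would send it and return the JSON response
```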
Create a collection that can store vector embeddings
Collections that are vector-enabled can store vector embeddings in the reserved $vector field and work with vector search.
- Python
- TypeScript
- Java
- curl
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a CollectionDefinition object and then create the collection from the CollectionDefinition object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
- CollectionDefinition object
- Fluent interface
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CollectionDefinition, CollectionVectorOptions
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
dimension=1024,
metric=VectorMetric.COSINE,
),
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN",
)
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(1024)
.set_vector_metric(VectorMetric.COSINE)
.build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
- Typed collections
- Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector as an inline field in your interfaces, or you can extend the utility VectorDoc type provided by the client.
import { DataAPIClient, VectorDoc } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
// Define the type for the collection
interface User extends VectorDoc {
name: string;
age?: number;
}
(async function () {
const collection = await database.createCollection<User>(
"COLLECTION_NAME",
{
vector: {
dimension: 1024,
metric: "cosine",
},
},
);
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, or type-related issues will occur. Consider using a type like VectorDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector field to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
vector: {
dimension: 1024,
metric: "cosine",
},
});
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
public class Example {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition =
new CollectionDefinition().vectorDimension(1024).vectorSimilarity(SimilarityMetric.COSINE);
Collection<Document> collection =
database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"dimension": 1024,
"metric": "cosine"
}
}
}
}'
Create a collection that can automatically generate vector embeddings
If you want to automatically generate vector embeddings, create a vector-enabled collection and configure an embedding provider integration for the collection.
The configuration depends on the embedding provider.
- Python
- TypeScript
- Java
- curl
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
- CollectionDefinition object
- Fluent interface
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
metric=VectorMetric.SIMILARITY_METRIC,
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="azureOpenAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"resourceName": "RESOURCE_NAME",
"deploymentId": "DEPLOYMENT_ID",
},
)
)
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(MODEL_DIMENSIONS)
.set_vector_metric(VectorMetric.SIMILARITY_METRIC)
.set_vector_service(
provider="azureOpenAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"resourceName": "RESOURCE_NAME",
"deploymentId": "DEPLOYMENT_ID",
},
)
.build()
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
Replace the following:
- `COLLECTION_NAME`: The name for your collection.
- `SIMILARITY_METRIC`: The method you want to use to calculate vector similarity scores. The available metrics are `COSINE` (default), `DOT_PRODUCT`, and `EUCLIDEAN`.
- `API_KEY_NAME`: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal.
- `MODEL_NAME`: The model that you want to use to generate embeddings. The available models are `text-embedding-3-small`, `text-embedding-3-large`, and `text-embedding-ada-002`. For Azure OpenAI, you must select the model that matches the one deployed to your `DEPLOYMENT_ID` in Azure.
- `MODEL_DIMENSIONS`: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit `dimension`, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- `RESOURCE_NAME`: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
- `DEPLOYMENT_ID`: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
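For reference, the same Azure OpenAI configuration can be expressed as a raw JSON body for the HTTP API. This sketch assumes the camelCase field names used by the HTTP API (`modelName`, `providerKey`), and the dimension value is illustrative:

```python
import json

# Raw "options" object for a createCollection call with Azure OpenAI vectorize
options = {
    "vector": {
        "dimension": 1536,  # illustrative; must be supported by the deployed model
        "metric": "cosine",
        "service": {
            "provider": "azureOpenAI",
            "modelName": "text-embedding-3-small",
            "authentication": {"providerKey": "API_KEY_NAME"},
            "parameters": {
                "resourceName": "RESOURCE_NAME",
                "deploymentId": "DEPLOYMENT_ID",
            },
        },
    }
}

body = json.dumps({"createCollection": {"name": "COLLECTION_NAME", "options": options}})
```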
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
- CollectionDefinition object
- Fluent interface
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
metric=VectorMetric.SIMILARITY_METRIC,
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="huggingfaceDedicated",
model_name="endpoint-defined-model",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"endpointName": "ENDPOINT_NAME",
"regionName": "REGION_NAME",
"cloudName": "CLOUD_NAME",
},
)
)
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(MODEL_DIMENSIONS)
.set_vector_metric(VectorMetric.SIMILARITY_METRIC)
.set_vector_service(
provider="huggingfaceDedicated",
model_name="endpoint-defined-model",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"endpointName": "ENDPOINT_NAME",
"regionName": "REGION_NAME",
"cloudName": "CLOUD_NAME",
},
)
.build()
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
Replace the following:
- `COLLECTION_NAME`: The name for your collection.
- `SIMILARITY_METRIC`: The method you want to use to calculate vector similarity scores. The available metrics are `COSINE` (default), `DOT_PRODUCT`, and `EUCLIDEAN`.
- `API_KEY_NAME`: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal.
- `MODEL_NAME`: The model that you want to use to generate embeddings. The only available model is `endpoint-defined-model`. For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container. You must set `MODEL_NAME` to `endpoint-defined-model` because this integration uses the model specified in your dedicated endpoint configuration.
- `MODEL_DIMENSIONS`: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit `dimension`, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- `ENDPOINT_NAME`: The programmatically generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is `https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud`, the endpoint name is `mtp1x7muf6qyn3yh`.
- `REGION_NAME`: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, `us-east-2`.
- `CLOUD_NAME`: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, `aws`.
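The endpoint name, region, and cloud values can all be read directly off the endpoint URL. A small standard-library sketch using the example URL from the list above:

```python
from urllib.parse import urlparse

url = "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud"

# Hostname layout: <endpoint-name>.<region>.<cloud>.endpoints.huggingface.cloud
host_parts = urlparse(url).hostname.split(".")

endpoint_name = host_parts[0]  # "mtp1x7muf6qyn3yh"
region_name = host_parts[1]    # "us-east-2"
cloud_name = host_parts[2]     # "aws"

print(endpoint_name, region_name, cloud_name)
```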
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
- CollectionDefinition object
- Fluent interface
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
metric=VectorMetric.SIMILARITY_METRIC,
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="huggingface",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
)
)
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(MODEL_DIMENSIONS)
.set_vector_metric(VectorMetric.SIMILARITY_METRIC)
.set_vector_service(
provider="huggingface",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
}
)
.build()
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
Replace the following:
- `COLLECTION_NAME`: The name for your collection.
- `SIMILARITY_METRIC`: The method you want to use to calculate vector similarity scores. The available metrics are `COSINE` (default), `DOT_PRODUCT`, and `EUCLIDEAN`.
- `API_KEY_NAME`: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal.
- `MODEL_NAME`: The model that you want to use to generate embeddings. The available models are `sentence-transformers/all-MiniLM-L6-v2`, `intfloat/multilingual-e5-large`, `intfloat/multilingual-e5-large-instruct`, `BAAI/bge-small-en-v1.5`, `BAAI/bge-base-en-v1.5`, and `BAAI/bge-large-en-v1.5`.
- `MODEL_DIMENSIONS`: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit `dimension`, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
- CollectionDefinition object
- Fluent interface
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
metric=VectorMetric.SIMILARITY_METRIC,
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="jinaAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
)
)
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(MODEL_DIMENSIONS)
.set_vector_metric(VectorMetric.SIMILARITY_METRIC)
.set_vector_service(
provider="jinaAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
}
)
.build()
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
Replace the following:
- `COLLECTION_NAME`: The name for your collection.
- `SIMILARITY_METRIC`: The method you want to use to calculate vector similarity scores. The available metrics are `COSINE` (default), `DOT_PRODUCT`, and `EUCLIDEAN`.
- `API_KEY_NAME`: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal.
- `MODEL_NAME`: The model that you want to use to generate embeddings. The available models are `jina-embeddings-v2-base-en`, `jina-embeddings-v2-base-de`, `jina-embeddings-v2-base-es`, `jina-embeddings-v2-base-code`, and `jina-embeddings-v2-base-zh`.
- `MODEL_DIMENSIONS`: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit `dimension`, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
- CollectionDefinition object
- Fluent interface
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
metric=VectorMetric.SIMILARITY_METRIC,
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="mistral",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
)
)
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(MODEL_DIMENSIONS)
.set_vector_metric(VectorMetric.SIMILARITY_METRIC)
.set_vector_service(
provider="mistral",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
}
)
.build()
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
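Because create_collection also accepts a plain dictionary for its definition parameter (see the method signature), the Mistral configuration above can also be sketched as raw data. The concrete values used here (cosine metric, 1024 dimensions for mistral-embed) are illustrative assumptions, not requirements.

```python
# Sketch only: the Mistral definition expressed as a plain dict, which
# create_collection also accepts for its `definition` parameter. Note the
# Data API's camelCase field names ("modelName", "providerKey").
mistral_definition = {
    "vector": {
        "metric": "cosine",  # assumed for this sketch; DOT_PRODUCT and EUCLIDEAN also exist
        "dimension": 1024,   # assumed; must be supported by the chosen model
        "service": {
            "provider": "mistral",
            "modelName": "mistral-embed",
            "authentication": {"providerKey": "API_KEY_NAME"},
        },
    },
}

# Usage (same call as above, with the dict in place of CollectionDefinition):
# collection = database.create_collection("COLLECTION_NAME", definition=mistral_definition)
```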
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
-
CollectionDefinition object
-
Fluent interface
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
metric=VectorMetric.COSINE,
service=VectorServiceOptions(
provider="nvidia",
model_name="NV-Embed-QA",
)
)
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = (
CollectionDefinition.builder()
.set_vector_metric(VectorMetric.COSINE)
.set_vector_service(
provider="nvidia",
model_name="NV-Embed-QA"
)
.build()
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
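Unlike the other provider examples, the NVIDIA example above passes no authentication block and omits the dimension setting, letting the model's default dimension apply. Expressed as the equivalent plain dict that create_collection also accepts, the minimal definition looks like this (sketch only):

```python
# Sketch: the NVIDIA definition above as a plain dict. There is no
# "authentication" block (the example above passes none), and "dimension"
# is omitted so the model's default dimension applies.
nvidia_definition = {
    "vector": {
        "metric": "cosine",
        "service": {
            "provider": "nvidia",
            "modelName": "NV-Embed-QA",
        },
    },
}

# collection = database.create_collection("COLLECTION_NAME", definition=nvidia_definition)
```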
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
-
CollectionDefinition object
-
Fluent interface
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
metric=VectorMetric.SIMILARITY_METRIC,
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="openai",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"organizationId": "ORGANIZATION_ID",
"projectId": "PROJECT_ID",
},
)
)
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(MODEL_DIMENSIONS)
.set_vector_metric(VectorMetric.SIMILARITY_METRIC)
.set_vector_service(
provider="openai",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"organizationId": "ORGANIZATION_ID",
"projectId": "PROJECT_ID",
},
)
.build()
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
- PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot be the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
-
CollectionDefinition object
-
Fluent interface
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
metric=VectorMetric.SIMILARITY_METRIC,
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="upstageAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
)
)
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(MODEL_DIMENSIONS)
.set_vector_metric(VectorMetric.SIMILARITY_METRIC)
.set_vector_service(
provider="upstageAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
}
)
.build()
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
-
CollectionDefinition object
-
Fluent interface
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
metric=VectorMetric.SIMILARITY_METRIC,
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="voyageAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
)
)
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionVectorOptions,
VectorServiceOptions,
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
os.environ["API_ENDPOINT"],
token=os.environ["APPLICATION_TOKEN"]
)
# Define the collection
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(MODEL_DIMENSIONS)
.set_vector_metric(VectorMetric.SIMILARITY_METRIC)
.set_vector_service(
provider="voyageAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
}
)
.build()
)
# Create the collection
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
print(f"* Collection: {collection.full_name}\n")
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
-
Typed collections
-
Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector and $vectorize as inline fields in your interfaces, or you can extend the utility VectorizeDoc type provided by the client.
import { DataAPIClient, VectorizeDoc } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the type for the collection
interface User extends VectorizeDoc {
name: string,
age?: number,
}
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "azureOpenAI",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
parameters: {
resourceName: "RESOURCE_NAME",
deploymentId: "DEPLOYMENT_ID",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection<User>(
"COLLECTION_NAME",
collection_definition
);
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, and the $vectorize field must still be a string, or type-related issues will occur.
Consider using a type like VectorizeDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector and $vectorize fields to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "azureOpenAI",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
parameters: {
resourceName: "RESOURCE_NAME",
deploymentId: "DEPLOYMENT_ID",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection(
"COLLECTION_NAME",
collection_definition
);
})();
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
- DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
-
Typed collections
-
Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector and $vectorize as inline fields in your interfaces, or you can extend the utility VectorizeDoc type provided by the client.
import { DataAPIClient, VectorizeDoc } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the type for the collection
interface User extends VectorizeDoc {
name: string,
age?: number,
}
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "huggingfaceDedicated",
modelName: "endpoint-defined-model",
authentication: {
providerKey: "API_KEY_NAME",
},
parameters: {
endpointName: "ENDPOINT_NAME",
regionName: "REGION_NAME",
cloudName: "CLOUD_NAME",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection<User>(
"COLLECTION_NAME",
collection_definition
);
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, and the $vectorize field must still be a string, or type-related issues will occur.
Consider using a type like VectorizeDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector and $vectorize fields to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "huggingfaceDedicated",
modelName: "endpoint-defined-model",
authentication: {
providerKey: "API_KEY_NAME",
},
parameters: {
endpointName: "ENDPOINT_NAME",
regionName: "REGION_NAME",
cloudName: "CLOUD_NAME",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection(
"COLLECTION_NAME",
collection_definition
);
})();
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model. For Hugging Face Dedicated, you must deploy the model as a Text Embeddings Inference (TEI) container. You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- ENDPOINT_NAME: The programmatically generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
- REGION_NAME: The cloud provider region where your Hugging Face Dedicated endpoint is deployed. For example, us-east-2.
- CLOUD_NAME: The cloud provider where your Hugging Face Dedicated endpoint is deployed. For example, aws.
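To make the relationship between the endpoint URL and the ENDPOINT_NAME, REGION_NAME, and CLOUD_NAME values concrete, here is a small illustrative Python helper (not part of any Data API client) that splits the endpoint hostname into those three parts:

```python
from urllib.parse import urlparse

# Illustrative helper (not part of the Data API clients): derive the
# ENDPOINT_NAME, REGION_NAME, and CLOUD_NAME values from a Hugging Face
# Dedicated endpoint URL of the form
#   https://<endpoint>.<region>.<cloud>.endpoints.huggingface.cloud
def parse_hf_dedicated_url(url: str) -> dict:
    host = urlparse(url).hostname
    endpoint_name, region_name, cloud_name = host.split(".")[:3]
    return {
        "endpointName": endpoint_name,
        "regionName": region_name,
        "cloudName": cloud_name,
    }

parts = parse_hf_dedicated_url(
    "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud"
)
# parts["endpointName"] == "mtp1x7muf6qyn3yh"
```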
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
-
Typed collections
-
Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector and $vectorize as inline fields in your interfaces, or you can extend the utility VectorizeDoc type provided by the client.
import { DataAPIClient, VectorizeDoc } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the type for the collection
interface User extends VectorizeDoc {
name: string,
age?: number,
}
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "huggingface",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection<User>(
"COLLECTION_NAME",
collection_definition
);
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, and the $vectorize field must still be a string, or type-related issues will occur.
Consider using a type like VectorizeDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector and $vectorize fields to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "huggingface",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection(
"COLLECTION_NAME",
collection_definition
);
})();
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
-
Typed collections
-
Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector and $vectorize as inline fields in your interfaces, or you can extend the utility VectorizeDoc type provided by the client.
import { DataAPIClient, VectorizeDoc } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the type for the collection
interface User extends VectorizeDoc {
name: string,
age?: number,
}
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "jinaAI",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection<User>(
"COLLECTION_NAME",
collection_definition
);
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, and the $vectorize field must still be a string, or type-related issues will occur.
Consider using a type like VectorizeDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector and $vectorize fields to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "jinaAI",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection(
"COLLECTION_NAME",
collection_definition
);
})();
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
-
Typed collections
-
Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector and $vectorize as inline fields in your interfaces, or you can extend the utility VectorizeDoc type provided by the client.
import { DataAPIClient, VectorizeDoc } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the type for the collection
interface User extends VectorizeDoc {
name: string,
age?: number,
}
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "mistral",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection<User>(
"COLLECTION_NAME",
collection_definition
);
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, and the $vectorize field must still be a string, or type-related issues will occur.
Consider using a type like VectorizeDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector and $vectorize fields to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "mistral",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection(
"COLLECTION_NAME",
collection_definition
);
})();
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
-
Typed collections
-
Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector and $vectorize as inline fields in your interfaces, or you can extend the utility VectorizeDoc type provided by the client.
import { DataAPIClient, VectorizeDoc } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the type for the collection
interface User extends VectorizeDoc {
name: string,
age?: number,
}
// Define the collection
const collection_definition = {
vector: {
metric: "cosine",
service: {
provider: "nvidia",
modelName: "NV-Embed-QA",
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection<User>(
"COLLECTION_NAME",
collection_definition
);
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The $vector field must still be number[] or DataAPIVector, and the $vectorize field must still be a string, or type-related issues will occur.
Consider using a type like VectorizeDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector and $vectorize fields to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the collection
const collection_definition = {
vector: {
metric: "cosine",
service: {
provider: "nvidia",
modelName: "NV-Embed-QA",
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection(
"COLLECTION_NAME",
collection_definition
);
})();
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
-
Typed collections
-
Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
You can define $vector and $vectorize as inline fields in your interfaces, or you can extend the utility VectorizeDoc type provided by the client.
import { DataAPIClient, VectorizeDoc } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the type for the collection
interface User extends VectorizeDoc {
name: string,
age?: number,
}
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "openai",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
parameters: {
organizationId: "ORGANIZATION_ID",
projectId: "PROJECT_ID",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection<User>(
"COLLECTION_NAME",
collection_definition
);
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option. The $vector field must still be number[] or DataAPIVector, and the $vectorize field must still be a string, or type-related issues will occur. Consider using a type like VectorizeDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector and $vectorize fields to have the correct types.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "openai",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
parameters: {
organizationId: "ORGANIZATION_ID",
projectId: "PROJECT_ID",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection(
"COLLECTION_NAME",
collection_definition
);
})();
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
- PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
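For instance, a filled-in definition for the text-embedding-3-small model might look like the following sketch. The key name my-openai-key is a hypothetical placeholder for an API key stored in the Astra Portal, and 1536 is that model's default output dimension:

```typescript
// Hypothetical filled-in collection definition for OpenAI's text-embedding-3-small
const openAiDefinition = {
  vector: {
    dimension: 1536, // text-embedding-3-small's default output dimension
    metric: "cosine",
    service: {
      provider: "openai",
      modelName: "text-embedding-3-small",
      authentication: {
        providerKey: "my-openai-key", // placeholder key name
      },
    },
  },
};
```

Pass an object like this as the second argument to database.createCollection, as in the examples above.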
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
- Typed collections
- Untyped collections
You can manually define a client-side type for your collection to help statically catch errors. You can define $vector and $vectorize as inline fields in your interfaces, or you can extend the utility VectorizeDoc type provided by the client.
import { DataAPIClient, VectorizeDoc } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the type for the collection
interface User extends VectorizeDoc {
name: string,
age?: number,
}
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "upstageAI",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection<User>(
"COLLECTION_NAME",
collection_definition
);
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option. The $vector field must still be number[] or DataAPIVector, and the $vectorize field must still be a string, or type-related issues will occur. Consider using a type like VectorizeDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector and $vectorize fields to have the correct types.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "upstageAI",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection(
"COLLECTION_NAME",
collection_definition
);
})();
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The only available model is solar-embedding-1-large.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
- Typed collections
- Untyped collections
You can manually define a client-side type for your collection to help statically catch errors. You can define $vector and $vectorize as inline fields in your interfaces, or you can extend the utility VectorizeDoc type provided by the client.
import { DataAPIClient, VectorizeDoc } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the type for the collection
interface User extends VectorizeDoc {
name: string,
age?: number,
}
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "voyageAI",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection<User>(
"COLLECTION_NAME",
collection_definition
);
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option. The $vector field must still be number[] or DataAPIVector, and the $vectorize field must still be a string, or type-related issues will occur. Consider using a type like VectorizeDoc & SomeDoc, which allows the documents to remain untyped but still statically requires the $vector and $vectorize fields to have the correct types.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
token: process.env.APPLICATION_TOKEN,
});
// Define the collection
const collection_definition = {
vector: {
dimension: MODEL_DIMENSIONS,
metric: "SIMILARITY_METRIC",
service: {
provider: "voyageAI",
modelName: "MODEL_NAME",
authentication: {
providerKey: "API_KEY_NAME",
},
},
},
};
(async function () {
// Create the collection
const collection = await database.createCollection(
"COLLECTION_NAME",
collection_definition
);
})();
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, and voyage-multilingual-2.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
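As another hypothetical filled-in sketch, a definition for the voyage-2 model could look like the following. The key name my-voyage-key is a placeholder for a key stored in the Astra Portal, and 1024 is assumed here as voyage-2's output dimension:

```typescript
// Hypothetical filled-in collection definition for Voyage AI's voyage-2
const voyageDefinition = {
  vector: {
    dimension: 1024, // assumed output dimension for voyage-2
    metric: "cosine",
    service: {
      provider: "voyageAI",
      modelName: "voyage-2",
      authentication: {
        providerKey: "my-voyage-key", // placeholder key name
      },
    },
  },
};
```

As with the other providers, pass this object as the second argument to database.createCollection.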
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
System.getenv("API_ENDPOINT"),
new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));
// Define the collection
Map<String, Object> parameters = new HashMap<>();
parameters.put("resourceName", "RESOURCE_NAME");
parameters.put("deploymentId", "DEPLOYMENT_ID");
CollectionDefinition collectionDefinition =
new CollectionDefinition()
.vectorDimension(MODEL_DIMENSIONS)
.vectorSimilarity(SimilarityMetric.SIMILARITY_METRIC)
.vectorize(
"azureOpenAI",
"MODEL_NAME",
"API_KEY_NAME",
parameters);
// Create the collection
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002. For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
- DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
System.getenv("API_ENDPOINT"),
new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));
// Define the collection
Map<String, Object> parameters = new HashMap<>();
parameters.put("endpointName", "ENDPOINT_NAME");
parameters.put("regionName", "REGION_NAME");
parameters.put("cloudName", "CLOUD_NAME");
CollectionDefinition collectionDefinition =
new CollectionDefinition()
.vectorDimension(MODEL_DIMENSIONS)
.vectorSimilarity(SimilarityMetric.SIMILARITY_METRIC)
.vectorize(
"huggingfaceDedicated",
"endpoint-defined-model",
"API_KEY_NAME",
parameters);
// Create the collection
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The only available value is endpoint-defined-model. For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container. You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
- REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
- CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
System.getenv("API_ENDPOINT"),
new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));
// Define the collection
CollectionDefinition collectionDefinition =
new CollectionDefinition()
.vectorDimension(MODEL_DIMENSIONS)
.vectorSimilarity(SimilarityMetric.SIMILARITY_METRIC)
.vectorize(
"huggingface",
"MODEL_NAME",
"API_KEY_NAME");
// Create the collection
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, and BAAI/bge-large-en-v1.5.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
System.getenv("API_ENDPOINT"),
new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));
// Define the collection
CollectionDefinition collectionDefinition =
new CollectionDefinition()
.vectorDimension(MODEL_DIMENSIONS)
.vectorSimilarity(SimilarityMetric.SIMILARITY_METRIC)
.vectorize(
"jinaAI",
"MODEL_NAME",
"API_KEY_NAME");
// Create the collection
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, and jina-embeddings-v2-base-zh.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
System.getenv("API_ENDPOINT"),
new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));
// Define the collection
CollectionDefinition collectionDefinition =
new CollectionDefinition()
.vectorDimension(MODEL_DIMENSIONS)
.vectorSimilarity(SimilarityMetric.SIMILARITY_METRIC)
.vectorize(
"mistral",
"MODEL_NAME",
"API_KEY_NAME");
// Create the collection
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The only available model is mistral-embed.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
System.getenv("API_ENDPOINT"),
new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));
// Define the collection
CollectionDefinition collectionDefinition =
new CollectionDefinition()
.vectorSimilarity(SimilarityMetric.COSINE)
.vectorize(
"nvidia",
"NV-Embed-QA");
// Create the collection
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
System.getenv("API_ENDPOINT"),
new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));
// Define the collection
Map<String, Object> parameters = new HashMap<>();
parameters.put("organizationId", "ORGANIZATION_ID");
parameters.put("projectId", "PROJECT_ID");
CollectionDefinition collectionDefinition =
new CollectionDefinition()
.vectorDimension(MODEL_DIMENSIONS)
.vectorSimilarity(SimilarityMetric.SIMILARITY_METRIC)
.vectorize(
"openai",
"MODEL_NAME",
"API_KEY_NAME",
parameters);
// Create the collection
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
- PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
System.getenv("API_ENDPOINT"),
new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));
// Define the collection
CollectionDefinition collectionDefinition =
new CollectionDefinition()
.vectorDimension(MODEL_DIMENSIONS)
.vectorSimilarity(SimilarityMetric.SIMILARITY_METRIC)
.vectorize(
"upstageAI",
"MODEL_NAME",
"API_KEY_NAME");
// Create the collection
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The only available model is solar-embedding-1-large.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
System.getenv("API_ENDPOINT"),
new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));
// Define the collection
CollectionDefinition collectionDefinition =
new CollectionDefinition()
.vectorDimension(MODEL_DIMENSIONS)
.vectorSimilarity(SimilarityMetric.SIMILARITY_METRIC)
.vectorize(
"voyageAI",
"MODEL_NAME",
"API_KEY_NAME");
// Create the collection
Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, and voyage-multilingual-2.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"dimension": MODEL_DIMENSIONS,
"metric": "SIMILARITY_METRIC",
"service": {
"provider": "azureOpenAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": {
"resourceName": "RESOURCE_NAME",
"deploymentId": "DEPLOYMENT_ID"
}
}
}
}
}
}'
Replace the following:
- COLLECTION_NAME: The name for your collection.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
- API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002. For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
- DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"metric": "SIMILARITY_METRIC",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "huggingfaceDedicated",
"modelName": "endpoint-defined-model",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": {
"endpointName": "ENDPOINT_NAME",
"regionName": "REGION_NAME",
"cloudName": "CLOUD_NAME"
}
}
}
}
}
}'
Replace the following:

- `COLLECTION_NAME`: The name for your collection.
- `SIMILARITY_METRIC`: The method you want to use to calculate vector similarity scores. The available metrics are `COSINE` (default), `DOT_PRODUCT`, and `EUCLIDEAN`.
- `API_KEY_NAME`: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal.
- `MODEL_NAME`: The model that you want to use to generate embeddings. The only available value is `endpoint-defined-model`. For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container. You must set `MODEL_NAME` to `endpoint-defined-model` because this integration uses the model specified in your dedicated endpoint configuration.
- `MODEL_DIMENSIONS`: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit `dimension`, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- `ENDPOINT_NAME`: The programmatically generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is `https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud`, the endpoint name is `mtp1x7muf6qyn3yh`.
- `REGION_NAME`: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, `us-east-2`.
- `CLOUD_NAME`: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, `aws`.
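Because the endpoint name, region, and cloud are all encoded in the endpoint hostname, you can read the three `parameters` values straight off the URL. The helper below is an illustrative sketch, not part of any client library:

```python
from urllib.parse import urlparse

def parse_hf_dedicated_url(url: str) -> dict:
    """Split a Hugging Face Dedicated endpoint URL into the values that the
    createCollection parameters expect: endpointName, regionName, cloudName."""
    # Hostname looks like: ENDPOINT_NAME.REGION_NAME.CLOUD_NAME.endpoints.huggingface.cloud
    host = urlparse(url).hostname
    endpoint_name, region_name, cloud_name = host.split(".")[:3]
    return {
        "endpointName": endpoint_name,
        "regionName": region_name,
        "cloudName": cloud_name,
    }

print(parse_hf_dedicated_url(
    "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud"
))
# → {'endpointName': 'mtp1x7muf6qyn3yh', 'regionName': 'us-east-2', 'cloudName': 'aws'}
```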
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"dimension": MODEL_DIMENSIONS,
"metric": "SIMILARITY_METRIC",
"service": {
"provider": "huggingface",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
}
}
}
}'
Replace the following:

- `COLLECTION_NAME`: The name for your collection.
- `SIMILARITY_METRIC`: The method you want to use to calculate vector similarity scores. The available metrics are `COSINE` (default), `DOT_PRODUCT`, and `EUCLIDEAN`.
- `API_KEY_NAME`: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal.
- `MODEL_NAME`: The model that you want to use to generate embeddings. The available models are `sentence-transformers/all-MiniLM-L6-v2`, `intfloat/multilingual-e5-large`, `intfloat/multilingual-e5-large-instruct`, `BAAI/bge-small-en-v1.5`, `BAAI/bge-base-en-v1.5`, and `BAAI/bge-large-en-v1.5`.
- `MODEL_DIMENSIONS`: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit `dimension`, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"dimension": MODEL_DIMENSIONS,
"metric": "SIMILARITY_METRIC",
"service": {
"provider": "jinaAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
}
}
}
}'
Replace the following:

- `COLLECTION_NAME`: The name for your collection.
- `SIMILARITY_METRIC`: The method you want to use to calculate vector similarity scores. The available metrics are `COSINE` (default), `DOT_PRODUCT`, and `EUCLIDEAN`.
- `API_KEY_NAME`: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal.
- `MODEL_NAME`: The model that you want to use to generate embeddings. The available models are `jina-embeddings-v2-base-en`, `jina-embeddings-v2-base-de`, `jina-embeddings-v2-base-es`, `jina-embeddings-v2-base-code`, and `jina-embeddings-v2-base-zh`.
- `MODEL_DIMENSIONS`: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit `dimension`, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"dimension": MODEL_DIMENSIONS,
"metric": "SIMILARITY_METRIC",
"service": {
"provider": "mistral",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
}
}
}
}'
Replace the following:

- `COLLECTION_NAME`: The name for your collection.
- `SIMILARITY_METRIC`: The method you want to use to calculate vector similarity scores. The available metrics are `COSINE` (default), `DOT_PRODUCT`, and `EUCLIDEAN`.
- `API_KEY_NAME`: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal.
- `MODEL_NAME`: The model that you want to use to generate embeddings. The only available model is `mistral-embed`.
- `MODEL_DIMENSIONS`: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit `dimension`, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"metric": "cosine",
"service": {
"provider": "nvidia",
"modelName": "NV-Embed-QA"
}
}
}
}
}'
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"dimension": MODEL_DIMENSIONS,
"metric": "SIMILARITY_METRIC",
"service": {
"provider": "openai",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": {
"organizationId": "ORGANIZATION_ID",
"projectId": "PROJECT_ID"
}
}
}
}
}
}'
Replace the following:

- `COLLECTION_NAME`: The name for your collection.
- `SIMILARITY_METRIC`: The method you want to use to calculate vector similarity scores. The available metrics are `COSINE` (default), `DOT_PRODUCT`, and `EUCLIDEAN`.
- `API_KEY_NAME`: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal.
- `MODEL_NAME`: The model that you want to use to generate embeddings. The available models are `text-embedding-3-small`, `text-embedding-3-large`, and `text-embedding-ada-002`.
- `MODEL_DIMENSIONS`: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit `dimension`, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- `ORGANIZATION_ID`: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
- `PROJECT_ID`: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"dimension": MODEL_DIMENSIONS,
"metric": "SIMILARITY_METRIC",
"service": {
"provider": "upstageAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
}
}
}
}'
Replace the following:

- `COLLECTION_NAME`: The name for your collection.
- `SIMILARITY_METRIC`: The method you want to use to calculate vector similarity scores. The available metrics are `COSINE` (default), `DOT_PRODUCT`, and `EUCLIDEAN`.
- `API_KEY_NAME`: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal.
- `MODEL_NAME`: The model that you want to use to generate embeddings. The only available model is `solar-embedding-1-large`.
- `MODEL_DIMENSIONS`: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit `dimension`, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"vector": {
"dimension": MODEL_DIMENSIONS,
"metric": "SIMILARITY_METRIC",
"service": {
"provider": "voyageAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
}
}
}
}'
Replace the following:

- `COLLECTION_NAME`: The name for your collection.
- `SIMILARITY_METRIC`: The method you want to use to calculate vector similarity scores. The available metrics are `COSINE` (default), `DOT_PRODUCT`, and `EUCLIDEAN`.
- `API_KEY_NAME`: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal.
- `MODEL_NAME`: The model that you want to use to generate embeddings. The available models are `voyage-2`, `voyage-code-2`, `voyage-finance-2`, `voyage-large-2`, `voyage-large-2-instruct`, `voyage-law-2`, and `voyage-multilingual-2`.
- `MODEL_DIMENSIONS`: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit `dimension`, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Create a collection that supports hybrid search
If you want to perform hybrid search on your collection, you must create a collection that has vector, lexical, and rerank enabled.
Your collection must also be in a database in the AWS `us-east-2` region.
Lexical and rerank are enabled by default when you create a collection in a database in the AWS `us-east-2` region, but you can optionally configure the lexical analyzer and the reranker model.
For configuration details about the lexical analyzer, see Find data with CQL analyzers. The following example uses a configuration suitable for English text.
For configuration details about the reranker model, inspect the available reranker models. Only the NVIDIA `llama-3.2-nv-rerankqa-1b-v2` reranker model is supported.
For configuration details about vector, see Create a collection that can store vector embeddings and Create a collection that can automatically generate vector embeddings.
-
Python
-
TypeScript
-
Java
-
curl
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a `CollectionDefinition` object and then create the collection from the `CollectionDefinition` object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
-
CollectionDefinition object
-
Fluent interface
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CollectionDefinition,
CollectionLexicalOptions,
CollectionRerankOptions,
CollectionVectorOptions,
RerankServiceOptions,
VectorServiceOptions,
)
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
vector=CollectionVectorOptions(
metric=VectorMetric.COSINE,
dimension=1024,
service=VectorServiceOptions(
provider="nvidia",
model_name="NV-Embed-QA",
),
),
lexical=CollectionLexicalOptions(
analyzer={
"tokenizer": {"name": "standard", "args": {}},
"filters": [
{"name": "lowercase"},
{"name": "stop"},
{"name": "porterstem"},
{"name": "asciifolding"},
],
"charFilters": [],
},
enabled=True,
),
rerank=CollectionRerankOptions(
enabled=True,
service=RerankServiceOptions(
provider="nvidia",
model_name="nvidia/llama-3.2-nv-rerankqa-1b-v2",
),
),
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import VectorMetric
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
CollectionDefinition.builder()
.set_vector_dimension(1024)
.set_vector_metric(VectorMetric.COSINE)
.set_vector_service(
provider="nvidia",
model_name="NV-Embed-QA",
)
.set_lexical(
{
"tokenizer": {"name": "standard", "args": {}},
"filters": [
{"name": "lowercase"},
{"name": "stop"},
{"name": "porterstem"},
{"name": "asciifolding"},
],
"charFilters": [],
}
)
.set_rerank("nvidia", "nvidia/llama-3.2-nv-rerankqa-1b-v2")
.build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
-
Typed collections
-
Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
You can define `$vector`, `$vectorize`, and `$lexical` as inline fields in your interfaces, or you can extend the utility `VectorDoc`, `VectorizeDoc`, and `LexicalDoc` types provided by the client.
import { DataAPIClient, LexicalDoc, VectorizeDoc } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
// Define the type for the collection
interface User extends VectorizeDoc, LexicalDoc {
name: string;
age?: number;
}
(async function () {
const collection = await database.createCollection<User>("COLLECTION_NAME", {
vector: {
dimension: 1024,
metric: "cosine",
service: {
provider: "nvidia",
modelName: "NV-Embed-QA",
},
},
lexical: {
enabled: true,
analyzer: {
tokenizer: {
name: "standard",
args: {},
},
filters: [
{
name: "lowercase",
},
{
name: "stop",
},
{
name: "porterstem",
},
{
name: "asciifolding",
},
],
charFilters: [],
},
},
rerank: {
enabled: true,
service: {
provider: "nvidia",
modelName: "nvidia/llama-3.2-nv-rerankqa-1b-v2",
},
},
});
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
The `$vector` field must still be `number[]` or `DataAPIVector`, and the `$vectorize` and `$lexical` fields must still be strings, or type-related issues will occur.
Consider using a type like `VectorDoc & LexicalDoc & SomeDoc` or `VectorizeDoc & LexicalDoc & SomeDoc`, which allows the documents to remain untyped but still statically requires the `$vector`, `$vectorize`, and `$lexical` fields to have the correct types.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
vector: {
dimension: 1024,
metric: "cosine",
service: {
provider: "nvidia",
modelName: "NV-Embed-QA",
},
},
lexical: {
enabled: true,
analyzer: {
tokenizer: {
name: "standard",
args: {},
},
filters: [
{
name: "lowercase",
},
{
name: "stop",
},
{
name: "porterstem",
},
{
name: "asciifolding",
},
],
charFilters: [],
},
},
rerank: {
enabled: true,
service: {
provider: "nvidia",
modelName: "nvidia/llama-3.2-nv-rerankqa-1b-v2",
},
},
});
})();
The Java client supports multiple ways to create a collection:
- You can define the collection parameters in a `CollectionDefinition` object and then create the collection from the `CollectionDefinition` object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
-
CollectionDefinition object
-
Fluent interface
import static com.datastax.astra.client.core.lexical.AnalyzerTypes.STANDARD;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.core.lexical.Analyzer;
import com.datastax.astra.client.core.lexical.LexicalOptions;
import com.datastax.astra.client.core.rerank.CollectionRerankOptions;
import com.datastax.astra.client.core.rerank.RerankServiceOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vector.VectorOptions;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
public class Example {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition = new CollectionDefinition();
// Vector Options
VectorServiceOptions vectorService =
new VectorServiceOptions().provider("nvidia").modelName("NV-Embed-QA");
VectorOptions vectorOptions =
new VectorOptions()
.dimension(1024)
.metric(SimilarityMetric.COSINE.getValue())
.service(vectorService);
collectionDefinition.vector(vectorOptions);
// Lexical Options
Analyzer analyzer =
new Analyzer()
.tokenizer(STANDARD.getValue())
.addFilter("lowercase")
.addFilter("stop")
.addFilter("porterstem")
.addFilter("asciifolding");
LexicalOptions lexicalOptions = new LexicalOptions().enabled(true).analyzer(analyzer);
collectionDefinition.lexical(lexicalOptions);
// Rerank Options
RerankServiceOptions rerankService =
new RerankServiceOptions()
.modelName("nvidia/llama-3.2-nv-rerankqa-1b-v2")
.provider("nvidia");
CollectionRerankOptions rerankOptions =
new CollectionRerankOptions().enabled(true).service(rerankService);
collectionDefinition.rerank(rerankOptions);
database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
import static com.datastax.astra.client.core.lexical.AnalyzerTypes.STANDARD;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.core.lexical.Analyzer;
import com.datastax.astra.client.core.lexical.LexicalOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
public class Example {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
database.createCollection(
"COLLECTION_NAME",
new CollectionDefinition()
.vector(1024, SimilarityMetric.COSINE)
.vectorize("nvidia", "NV-Embed-QA")
.lexical(
new LexicalOptions()
.enabled(true)
.analyzer(
new Analyzer()
.tokenizer(STANDARD.getValue())
.addFilter("lowercase")
.addFilter("stop")
.addFilter("porterstem")
.addFilter("asciifolding")))
.rerank("nvidia", "nvidia/llama-3.2-nv-rerankqa-1b-v2"));
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"lexical": {
"analyzer": {
"tokenizer": {
"name": "standard",
"args": {}
},
"filters": [
{
"name": "lowercase"
},
{
"name": "stop"
},
{
"name": "porterstem"
},
{
"name": "asciifolding"
}
],
"charFilters": []
},
"enabled": true
},
"rerank": {
"enabled": true,
"service": {
"modelName": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
"provider": "nvidia"
}
},
"vector": {
"dimension": 1024,
"metric": "cosine",
"service": {
"provider": "nvidia",
"modelName": "NV-Embed-QA"
}
}
}
}
}'
Create a collection and specify the default ID format
For more information about the default ID format, see Document IDs. For allowed values, see the Parameters.
-
Python
-
TypeScript
-
Java
-
curl
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a `CollectionDefinition` object and then create the collection from the `CollectionDefinition` object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
-
CollectionDefinition object
-
Fluent interface
from astrapy import DataAPIClient
from astrapy.info import (
CollectionDefinition,
CollectionDefaultIDOptions,
)
from astrapy.constants import DefaultIdType
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
default_id=CollectionDefaultIDOptions(DefaultIdType.OBJECTID),
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
from astrapy.constants import DefaultIdType
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
CollectionDefinition.builder().set_default_id(DefaultIdType.OBJECTID).build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
-
Typed collections
-
Untyped collections
You can manually define a client-side type for your collection to help statically catch errors.
The `_id` field type should match the `defaultId` type.
import { DataAPIClient, ObjectId } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
// Define the type for the collection
interface User {
_id: ObjectId;
name: string;
age?: number;
}
(async function () {
const collection = await database.createCollection<User>(
"COLLECTION_NAME",
{
defaultId: {
type: "objectId",
},
},
);
})();
If you don’t pass a type parameter, the collection remains untyped. This is a more flexible but less type-safe option.
However, if you later specify `_id` when you insert a document, DataStax recommends that it have the same type as the `defaultId`.
Consider using a type like `{ _id: ObjectId } & SomeDoc`, which allows the documents to remain untyped but still statically requires the `_id` field to have the correct type.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
defaultId: {
type: "objectId",
},
});
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefaultIdTypes;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.databases.Database;
public class Example {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition =
new CollectionDefinition().defaultId(CollectionDefaultIdTypes.OBJECT_ID);
Collection<Document> collection =
database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"defaultId": {
"type": "uuidv7"
}
}
}
}'
Create a collection and specify which fields to index
For more information about selective indexing, see Indexes in collections.
-
Python
-
TypeScript
-
Java
-
curl
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a `CollectionDefinition` object and then create the collection from the `CollectionDefinition` object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
-
CollectionDefinition object
-
Fluent interface
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
indexing={"allow": ["city", "country"]},
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
CollectionDefinition.builder().set_indexing("allow", ["city", "country"]).build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
indexing: {
allow: ["city", "country"],
},
});
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.databases.Database;
public class Example {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition =
new CollectionDefinition().indexingAllow("city", "country");
Collection<Document> collection =
database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"indexing": {
"allow": ["city", "country"]
}
}
}
}'
Create a collection and specify which fields shouldn’t be indexed
For more information about selective indexing, see Indexes in collections.
-
Python
-
TypeScript
-
Java
-
curl
The Python client supports multiple ways to create a collection:
- You can define the collection parameters in a `CollectionDefinition` object and then create the collection from the `CollectionDefinition` object.
- You can use a fluent interface to build the collection definition and then create the collection from the definition.
-
CollectionDefinition object
-
Fluent interface
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN",
)
# Create a collection
collection_definition = CollectionDefinition(
indexing={"deny": ["city", "country"]},
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
from astrapy import DataAPIClient
from astrapy.info import CollectionDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN",
)
# Create a collection
collection_definition = (
CollectionDefinition.builder().set_indexing("deny", ["city", "country"]).build()
)
collection = database.create_collection(
"COLLECTION_NAME",
definition=collection_definition,
)
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get a database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
(async function () {
const collection = await database.createCollection("COLLECTION_NAME", {
indexing: {
deny: ["city", "country"],
},
});
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.databases.Database;
public class Example {
public static void main(String[] args) {
// Get a database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
// Create a collection
CollectionDefinition collectionDefinition =
new CollectionDefinition().indexingDeny("city", "country");
Collection<Document> collection =
database.createCollection("COLLECTION_NAME", collectionDefinition);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createCollection": {
"name": "COLLECTION_NAME",
"options": {
"indexing": {
"deny": ["city", "country"]
}
}
}
}'
Client reference
-
Python
-
TypeScript
-
Java
-
curl
For more information, see the client reference.
For more information, see the client reference.
For more information, see the client reference.
Client reference documentation is not applicable for HTTP.