Create a table

Tables with the Data API are currently in public preview. Development is ongoing, and the features and functionality are subject to change. Astra DB Serverless, and the use of such, is subject to the DataStax Preview Terms.

Creates a new table in a keyspace in a Serverless (Vector) database.

After you create a table, index the columns that you want to sort or filter on. Indexing optimizes your queries and avoids resource-intensive, long-running allow filtering operations. All indexed column names must use snake case, not camel case.
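
As a rough illustration of the snake case requirement, the following sketch flags column names that would not qualify. The helper names and the exact pattern (lowercase letters, digits, and single underscores, starting with a letter) are assumptions made for this example, not rules taken from the Data API.

```python
import re

# Assumed pattern for snake case: lowercase words separated by single underscores.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True if the name matches the assumed snake case pattern."""
    return _SNAKE_CASE.fullmatch(name) is not None


def non_snake_case_columns(columns):
    """Return the column names that do not match the assumed pattern."""
    return [name for name in columns if not is_snake_case(name)]


columns_to_index = ["title", "number_of_pages", "isCheckedOut"]
print(non_snake_case_columns(columns_to_index))  # ['isCheckedOut']
```

Running a check like this before you define the table makes it easier to rename a camel case column while the change is still cheap.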

You can also modify the table columns later. To add data to your table, insert rows.

Ready to write code? See the examples for this method to get started. If you are new to the Data API, check out the quickstart.

Result

Python

Creates a table with the specified parameters.

Returns a Table object. You can use this object to work with rows in the table.

Unless you specify the row_type parameter, the table is typed as Table[dict].

For more information, see Typing support.

TypeScript

Creates a table with the specified parameters.

Returns a promise that resolves to a Table<Schema, PKey> object. You can use this object to work with rows in the table.

Unless you specify the Schema, the table is typed as Table<Record<string, any>>.

Java

Creates a table with the specified parameters.

Returns a Table<T> object. You can use this object to work with rows in the table.

Unless you specify the rowClass parameter, the table is typed as Table<Row>.

curl

Creates a table with the specified parameters.

If the command succeeds, the response indicates success.

Example response:

{
  "status": {
    "ok": 1
  }
}
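
If you call the HTTP endpoint from a script, you may want to check for this response shape programmatically. The following is a minimal sketch. The function name is hypothetical, and it treats anything other than a status.ok value of 1 as unsuccessful, which is an assumption because this page does not describe failure responses.

```python
import json


def create_table_succeeded(response_body: str) -> bool:
    """Return True if the body matches the success shape shown above."""
    payload = json.loads(response_body)
    status = payload.get("status")
    # Assumption: any other shape is treated as "not successful".
    return isinstance(status, dict) and status.get("ok") == 1


example_response = '{"status": {"ok": 1}}'
print(create_table_succeeded(example_response))  # True
```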

Parameters

When you create a table, you specify the following:

  • Table name

  • Column names and data types

  • The primary key, which is the unique identifier for the rows in the table

  • Additional table, command, or client-specific settings, which can be optional

  • Python

  • TypeScript

  • Java

  • curl

Use the create_table method, which belongs to the astrapy.Database class.

Method signature
create_table(
  name: str,
  *,
  definition: CreateTableDefinition | dict[str, Any],
  row_type: type[Any],
  keyspace: str,
  if_not_exists: bool,
  table_admin_timeout_ms: int,
  request_timeout_ms: int,
  timeout_ms: int,
  embedding_api_key: str | EmbeddingHeadersProvider,
  spawn_api_options: APIOptions,
) -> Table[ROW]

Name Type Summary

name

str

The name of the table.

definition

CreateTableDefinition | dict

A complete table definition for the table, including the column names, data types, other column settings, and the primary key.

This can be an instance of CreateTableDefinition or an equivalent nested dictionary, in which case it is parsed into a CreateTableDefinition. For examples of both formats, see the examples for this command.

Some types require specific column definitions, particularly maps, lists, sets, and vector columns. For more information about all types, see Data types in tables.

row_type

type

This parameter acts as a formal specifier for the type checker. If omitted, the resulting Table is implicitly a Table[dict]. If provided, row_type must match the type hint specified in the assignment. For more information, see Typing support.

keyspace

str | None

The keyspace where the table is to be created. If not specified, the general keyspace setting for the database is used.

if_not_exists

bool | None

If True, the command doesn’t throw an error if a table with the given name already exists. In this case, the command silently does nothing and no actual table creation takes place on the database.

If False (default), an error occurs if a table with the specified name already exists in the database.

Setting if_not_exists to True does not check the definition of any existing tables. This parameter checks table names only.

This means that the command succeeds if the given table name is already in use, even if the table definition is different.

table_admin_timeout_ms

int | None

A timeout, in milliseconds, to impose on the underlying API request. If not provided, the corresponding Database defaults apply. This parameter is aliased as request_timeout_ms and timeout_ms for convenience.

embedding_api_key

str | EmbeddingHeadersProvider

Optional parameter for tables that have a vector column with a vectorize embedding provider integration. For more information, see Define a column to automatically generate vector embeddings.

As an alternative to Astra DB KMS authentication, use embedding_api_key to store one or more embedding provider API keys on the Table instance for vectorize header authentication. The client automatically passes the key as an X-embedding-api-key header with all table operations.

Most embedding provider integrations accept a plain string for header authentication. However, some vectorize providers and models require specialized subclasses of EmbeddingHeadersProvider, such as AWSEmbeddingHeadersProvider, for header authentication.

spawn_api_options

APIOptions

A complete or partial specification of the APIOptions to override the defaults inherited from the Database. This allows for nuanced table configuration. For example, if APIOptions is passed together with named timeout parameters, the latter take precedence in their respective settings.
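
The name-only check performed by if_not_exists, described in the parameter list above, can be pictured with a toy in-memory model. This sketch does not use the client or contact a database. The create_table function and TableAlreadyExists exception below are hypothetical stand-ins that only mimic the documented behavior.

```python
class TableAlreadyExists(Exception):
    """Raised by this toy model when a table name is already in use."""


def create_table(existing, name, definition, if_not_exists=False):
    """Toy model: `existing` maps table names to their definitions."""
    if name in existing:
        if if_not_exists:
            # Only the name is compared. The stored definition is untouched,
            # even if the new definition is different.
            return existing[name]
        raise TableAlreadyExists(name)
    existing[name] = definition
    return definition


tables = {}
create_table(
    tables,
    "example_table",
    {"columns": {"title": "text"}, "primaryKey": "title"},
)

# Same name, different definition: the call succeeds and changes nothing.
result = create_table(
    tables,
    "example_table",
    {"columns": {"isbn": "text"}, "primaryKey": "isbn"},
    if_not_exists=True,
)
print(result["primaryKey"])  # title
```

In practice, this means a successful call with if_not_exists set to True is not proof that the table has the schema you passed.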

Use the createTable method, which belongs to the Db class.

Method signature
async createTable<const Def extends CreateTableDefinition>(
  name: string,
  options: {
    definition: CreateTableDefinition,
    ifNotExists?: boolean,
    embeddingApiKey?: string | EmbeddingHeadersProvider,
    logging?: DataAPILoggingConfig,
    serdes?: TableSerDesConfig,
    timeoutDefaults?: Partial<TimeoutDescriptor>,
    keyspace?: string,
  }
): Promise<Table<InferTableSchema<Def>, InferTablePrimaryKey<Def>>>

Parameters:

Name Type Summary

name

string

The name of the table.

options

CreateTableOptions

The options for spawning the Table instance.

Options (CreateTableOptions<Schema>):

Name Type Summary

definition

CreateTableDefinition

A TypeScript object defining the table to create, including the following:

  • columns: An object defining the table’s columns as a series of key-value pairs where each key is a column name and each value is the column’s data type. Column names must be unique within a table.

    The Data API accepts column definitions in two formats:

    "columns": {
      "COLUMN_NAME": "DATA_TYPE",
      "COLUMN_NAME": {
        "type": "DATA_TYPE"
      }
    }

    Data types are enums of supported data types, such as 'text', 'int', or 'boolean'.

    For 'map', 'list', and 'set' types, you must use the object format and provide additional options. For more information, see Map, list, and set types.

    For the 'vector' type, you must use the object format and provide information about the stored vectors, such as dimension and service options. For more information, see Vector type.

  • primaryKey: The table’s primary key definition as a single string or an object containing partition keys. For more information, see Primary keys in tables.

ifNotExists

boolean

If true, the command doesn’t throw an error if a table with the given name already exists. In this case, the command silently does nothing and no actual table creation takes place on the database.

If false (default), an error occurs if a table with the specified name already exists in the database.

Setting ifNotExists to true does not check the schema of any existing tables. This parameter checks table names only.

This means that the command succeeds if the given table name is already in use, even if the schema is different.

keyspace?

string

The keyspace where you want to create the table. If not specified, the working keyspace of the Db is used.

embeddingApiKey?

string | EmbeddingHeadersProvider

Optional parameter for tables that have a vector column with a vectorize embedding provider integration. For more information, see Define a column to automatically generate vector embeddings.

As an alternative to Astra DB KMS authentication, use embeddingApiKey to store an embedding provider API key on the Table instance for vectorize header authentication. The client automatically passes the key as an X-embedding-api-key header with operations that use vectorize.

Most embedding provider integrations accept a plain string for header authentication. However, some vectorize providers and models require specialized subclasses of EmbeddingHeadersProvider for header authentication.

logging?

DataAPILoggingConfig

The configuration for logging events emitted by the DataAPIClient.

timeoutDefaults?

Partial<TimeoutDescriptor>

The default timeout options for any operation performed on this Table instance. For more information, see TimeoutDescriptor.

serdes?

TableSerDesConfig

Lower-level serialization/deserialization configuration for this table. For more information, see Custom Ser/Des.

Use the createTable method, which belongs to the com.datastax.astra.client.databases.Database class.

Method signature
<T> Table<T> createTable(
  String tableName,
  TableDefinition tableDefinition,
  Class<T> rowClass,
  CreateTableOptions createTableOptions
)
<T> Table<T> createTable(
  String tableName,
  TableDefinition tableDefinition,
  Class<T> rowClass
)
<T> Table<T> createTable(Class<T> rowClass)
<T> Table<T> createTable(
  Class<T> rowClass,
  CreateTableOptions createTableOptions
)
<T> Table<T> createTable(
  String tableName,
  Class<T> rowClass,
  CreateTableOptions createTableOptions
)
Table<Row> createTable(
  String tableName,
  TableDefinition tableDefinition,
  CreateTableOptions options
)
Table<Row> createTable(
  String tableName,
  TableDefinition tableDefinition
)

Name Type Summary

tableName

String

The name of the table.

tableDefinition

TableDefinition

A complete table definition for the table, including the column names, data types, other column settings, and the primary key.

Some types require specific column definitions, particularly maps, lists, sets, and vector columns. For more information about all types, see Data types in tables.

rowClass

Class<?>

An optional specification of the class of the table’s row object. If not provided, the default is Row, which is close to a Map object.

createTableOptions

CreateTableOptions

Options and additional parameters for the createTable operation, such as ifNotExists, timeout, and embeddingAuthProvider:

  • ifNotExists(): If true, the command doesn’t throw an error if a table with the given name already exists. In this case, the command silently does nothing and no actual table creation takes place on the database.

    If false (default), an error occurs if a table with the specified name already exists in the database.

    Setting ifNotExists(true) does not check the definition of any existing tables. This parameter checks table names only.

    This means that the command succeeds if the given table name is already in use, even if the table definition is different.

  • timeout(): A timeout, in milliseconds, to impose on the underlying API request.

  • embeddingAuthProvider(): Optional parameter for tables that have a vector column with a vectorize embedding provider integration. For more information, see Define a column to automatically generate vector embeddings.

    As an alternative to Astra DB KMS authentication, use embeddingAuthProvider to store an embedding provider API key on the Table instance for vectorize header authentication. The client automatically passes the key as an X-embedding-api-key header with operations that use vectorize.

    Most embedding provider integrations accept a plain string for header authentication. However, some vectorize providers and models require specialized subclasses of EmbeddingHeadersProvider for header authentication.

Use the createTable command.

Command signature
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        "COLUMN_NAME": "DATA_TYPE",
        "COLUMN_NAME": "DATA_TYPE"
      },
      "primaryKey": "PRIMARY_KEY_DEFINITION"
    }
  }
}'

Name Type Summary

createTable

command

The Data API command to create a table in a Serverless (Vector) database. It acts as a container for all the attributes and settings required to create the table.

name

string

The name of the table. This must be unique within the database specified in the request URL.

definition

object

Contains the columns and primary key definition for the table.

definition.columns

object

Defines the table’s columns as a series of key-value pairs where each key is a column name and each value is the column’s data type. Column names must be unique within a table.

The Data API accepts column definitions in two formats:

"columns": {
  "COLUMN_NAME": "DATA_TYPE",
  "COLUMN_NAME": {
    "type": "DATA_TYPE"
  }
}

Data types are enums of supported data types, such as "text", "int", or "boolean".

For map, list, and set types, you must use the object format and provide additional options. For more information, see Map, list, and set types.

For the vector type, you must use the object format and provide information about the stored vectors, such as dimension and service options. For more information, see Vector type.

definition.primaryKey

string or object

Defines the primary key for the table. For more information, see Primary keys in tables.
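
To make the two column formats concrete, the following sketch assembles a createTable request body as a Python dictionary and expands the short string form into the object form. The helper names are hypothetical, and treating "text" and {"type": "text"} as interchangeable is an assumption based on the two formats described above.

```python
import json


def normalize_columns(columns):
    """Expand short-form columns ("text") into object form ({"type": "text"})."""
    normalized = {}
    for name, spec in columns.items():
        if isinstance(spec, str):
            normalized[name] = {"type": spec}
        else:
            # Already in object form, for example a set or map column.
            normalized[name] = dict(spec)
    return normalized


def build_create_table_payload(name, columns, primary_key):
    """Build the JSON body for the createTable command (hypothetical helper)."""
    return {
        "createTable": {
            "name": name,
            "definition": {
                "columns": normalize_columns(columns),
                "primaryKey": primary_key,
            },
        }
    }


payload = build_create_table_payload(
    "example_table",
    {
        "title": "text",
        "genres": {"type": "set", "valueType": "text"},
    },
    "title",
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary has the same shape as the --data argument in the command signature.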

Examples

The following examples demonstrate how to create a table.

Create a table with a single-column primary key

A single-column primary key is a primary key consisting of one column. For more information, see Primary keys in tables.

  • Python

  • TypeScript

  • Java

  • curl

The Python client supports multiple ways to create a table. In all cases, you must define the table schema, and then pass the definition to the create_table method.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

You can define the table as a CreateTableDefinition and then build the table from the CreateTableDefinition object.

from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableKeyValuedColumnType,
    TableKeyValuedColumnTypeDescriptor,
    TableScalarColumnTypeDescriptor,
    TableValuedColumnTypeDescriptor,
    TableValuedColumnType,
    TablePrimaryKeyDescriptor,
)

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = CreateTableDefinition(
    # Define all of the columns in the table
    columns={
        "title": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
        "number_of_pages": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.INT
        ),
        "rating": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.FLOAT
        ),
        "genres": TableValuedColumnTypeDescriptor(
            column_type=TableValuedColumnType.SET,
            value_type=ColumnType.TEXT,
        ),
        "metadata": TableKeyValuedColumnTypeDescriptor(
            column_type=TableKeyValuedColumnType.MAP,
            key_type=ColumnType.TEXT,
            value_type=ColumnType.TEXT,
        ),
        "is_checked_out": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.BOOLEAN
        ),
        "due_date": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.DATE
        ),
    },
    # Define the primary key for the table.
    # In this case, the table uses a single-column primary key.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["title"],
        partition_sort={}
    ),
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)
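
If you want the type checker to know the shape of your rows, you can pair a table like this one with the row_type parameter. The following sketch defines a TypedDict for the scalar columns only. The class name BookRow and the Python types chosen for each column are assumptions for illustration, and the create_table call is left as a comment because it needs a live database.

```python
from typing import TypedDict


class BookRow(TypedDict, total=False):
    """Hypothetical row type covering the scalar columns of example_table."""

    title: str
    number_of_pages: int
    rating: float
    is_checked_out: bool


# A row that a type checker accepts as a BookRow.
row: BookRow = {"title": "Wind with No Name", "number_of_pages": 193}
print(row["title"])

# With a connected database and the table_definition from the example above,
# the type would be passed through the row_type parameter:
#
# table: Table[BookRow] = database.create_table(
#     "example_table",
#     definition=table_definition,
#     row_type=BookRow,
# )
```

The set, map, and date columns are omitted here because the Python types that the client returns for them are not covered on this page.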

You can use a fluent interface to build the table definition and then create the table from the definition.

from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = (
    CreateTableDefinition.builder()
    # Define all of the columns in the table
    .add_column("title", ColumnType.TEXT)
    .add_column("number_of_pages", ColumnType.INT)
    .add_column("rating", ColumnType.FLOAT)
    .add_set_column(
        "genres",
        ColumnType.TEXT,
    )
    .add_map_column(
        "metadata",
        # This is the key type for the map column
        ColumnType.TEXT,
        # This is the value type for the map column
        ColumnType.TEXT,
    )
    .add_column("is_checked_out", ColumnType.BOOLEAN)
    .add_column("due_date", ColumnType.DATE)
    # Define the primary key for the table.
    # In this case, the table uses a single-column primary key.
    .add_partition_by(["title"])
    # Finally, build the table definition.
    .build()
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can define the table as a dictionary and then build the table from the dictionary.

from astrapy import DataAPIClient

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        "title": {"type": "text"},
        "number_of_pages": {"type": "int"},
        "rating": {"type": "float"},
        "genres": {"type": "set", "valueType": "text"},
        "metadata": {"type": "map", "keyType": "text", "valueType": "text"},
        "is_checked_out": {"type": "boolean"},
        "due_date": {"type": "date"},
    },
    "primaryKey": {
        "partitionBy": ["title"],
        "partitionSort": {},
    },
}

table = database.create_table(
    "example_table",
    definition=table_definition,
)
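
Because the dictionary format is plain data, you can sanity-check it before sending it. The following sketch lists the primary key columns of a definition and reports any that are missing from the columns section. Both helpers are hypothetical, and reading the keys of partitionSort as column names is an assumption.

```python
def primary_key_columns(definition):
    """Return the column names used by the primary key of a dict definition."""
    primary_key = definition["primaryKey"]
    if isinstance(primary_key, str):
        # Short form: a single column name.
        return [primary_key]
    partition = list(primary_key.get("partitionBy", []))
    # Assumption: the keys of partitionSort are column names.
    sort = list(primary_key.get("partitionSort", {}))
    return partition + sort


def undeclared_key_columns(definition):
    """Return primary key columns that are not declared under "columns"."""
    declared = definition["columns"]
    return [name for name in primary_key_columns(definition) if name not in declared]


example_definition = {
    "columns": {
        "title": {"type": "text"},
        "rating": {"type": "float"},
    },
    "primaryKey": {
        "partitionBy": ["title"],
        "partitionSort": {},
    },
}

print(primary_key_columns(example_definition))  # ['title']
print(undeclared_key_columns(example_definition))  # []
```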

The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the ser/des configuration.

For more information, see Collection and table typing.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.

To do this, first create the table definition. Then, use InferTableSchema and InferTablePrimaryKey to infer the type of the table and of the primary key. To create the table, provide the table definition and the inferred types to the createTable method.

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a single-column primary key.
  primaryKey: {
    partitionBy: ["title"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  // Provide the types and the definition
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can use the TableSchema type as you would any other type. For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:

  const row: TableSchema = {
    title: "Wind with No Name",
    number_of_pages: 193,
    bad_field: "I will error",
  };

You can manually define the type for your table’s schema and primary key. To create the table, provide the table definition and the types to the createTable method.

This may be necessary if you modify the table’s default ser/des configuration.

import { DataAPIClient, DataAPIDate, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a single-column primary key.
  primaryKey: {
    partitionBy: ["title"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  title: string;
  number_of_pages?: number | null | undefined;
  rating?: number | null | undefined;
  genres?: Set<string> | undefined;
  metadata?: Map<string, string> | undefined;
  is_checked_out?: boolean | null | undefined;
  due_date?: DataAPIDate | null | undefined;
};

type TablePrimaryKey = Pick<TableSchema, "title">;

(async function () {
  // Provide the types and the definition to create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can use the TableSchema type as you would any other type. For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:

  const row: TableSchema = {
    title: "Wind with No Name",
    number_of_pages: 193,
    bad_field: "I will error",
  };

To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method. This types the table’s rows as Record<string, any>.

This is the most flexible but least type-safe option.

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a single-column primary key.
  primaryKey: {
    partitionBy: ["title"],
  },
});

(async function () {
  // Pass SomeRow and the definition to create an untyped table
  const table = await database.createTable<SomeRow>(
    "example_table",
    { definition: tableDefinition },
  );
})();

The Java client supports multiple ways to create a table. In all cases, you must define the table schema.

  • Use a generic type

  • Define the row type

If you don’t specify the Class parameter when creating an instance of the generic class Table, the client defaults to Table<Row> as the type. In this case, the working object type T is Row.

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.definition.rows.Row;

public class CreateTable {
  public static void main(String[] args) {
    // Get an existing database
    Database database =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT");

    TableDefinition tableDefinition =
        new TableDefinition()
            // Define all of the columns in the table
            .addColumnText("title")
            .addColumnInt("number_of_pages")
            .addColumn("rating", ColumnTypes.FLOAT)
            .addColumnSet("genres", ColumnTypes.TEXT)
            .addColumnMap("metadata", ColumnTypes.TEXT, ColumnTypes.TEXT)
            .addColumnBoolean("is_checked_out")
            .addColumn("due_date", ColumnTypes.DATE)
            // Define the primary key for the table.
            // In this case, the table uses a single-column primary key.
            .addPartitionBy("title");

    Table<Row> table = database.createTable("example_table", tableDefinition);
  }
}

Instead of using the default type Row.class, you can define your own working object, which will be serialized as a Row.

This working object can be annotated when the field names do not exactly match the column names or when you want to fully describe your table to enable its creation solely from the entity definition.

The following example defines a Book class and then uses it to create the table.

package com.examples;

import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import com.datastax.astra.client.tables.Table;
import lombok.Data;

import java.util.Date;
import java.util.Map;
import java.util.Set;

public class CreateTable {
  @EntityTable("example_table")
  @Data
  public static class Book {
    @PartitionBy(0)
    @Column(name = "title", type = ColumnTypes.TEXT)
    private String title;

    @Column(name = "number_of_pages", type = ColumnTypes.INT)
    private Integer number_of_pages;

    @Column(name = "rating", type = ColumnTypes.FLOAT)
    private Float rating;

    @Column(name = "genres", type = ColumnTypes.SET, valueType = ColumnTypes.TEXT)
    private Set<String> genres;

    @Column(
        name = "metadata",
        type = ColumnTypes.MAP,
        keyType = ColumnTypes.TEXT,
        valueType = ColumnTypes.TEXT)
    private Map<String, String> metadata;

    @Column(name = "is_checked_out", type = ColumnTypes.BOOLEAN)
    private Boolean is_checked_out;

    @Column(name = "due_date", type = ColumnTypes.DATE)
    private Date due_date;
  }

  public static void main(String[] args) {
    // Get an existing database
    Database database =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT");

    Table<Book> table = database.createTable(Book.class);
  }
}

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createTable": {
    "name": "example_table",
    "definition": {
      "columns": {
        "title": {
          "type": "text"
        },
        "number_of_pages": {
          "type": "int"
        },
        "rating": {
          "type": "float"
        },
        "metadata": {
          "type": "map",
          "keyType": "text",
          "valueType": "text"
        },
        "genres": {
          "type": "set",
          "valueType": "text"
        },
        "is_checked_out": {
          "type": "boolean"
        },
        "due_date": {
          "type": "date"
        }
      },
      "primaryKey": "title"
    }
  }
}'
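
For readers who prefer to issue the same HTTP request from Python without a client library, the following sketch builds an equivalent request with the standard library. The function name, the example.invalid endpoint, and the default_keyspace name are placeholders invented for this example. The request is only constructed here, and the sending step is left as a comment because it needs a real endpoint and token.

```python
import json
import urllib.request


def build_create_table_request(api_endpoint, keyspace, token, command):
    """Build a POST request that mirrors the curl command above."""
    url = f"{api_endpoint}/api/json/v1/{keyspace}"
    return urllib.request.Request(
        url,
        data=json.dumps(command).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


command = {
    "createTable": {
        "name": "example_table",
        "definition": {
            "columns": {
                "title": {"type": "text"},
                "number_of_pages": {"type": "int"},
            },
            "primaryKey": "title",
        },
    }
}

# Placeholder endpoint and keyspace. Replace them with your own values.
request = build_create_table_request(
    "https://example.invalid",
    "default_keyspace",
    "APPLICATION_TOKEN",
    command,
)
print(request.get_method(), request.full_url)

# To send the request against a real database:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```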

Create a table with a composite primary key

A composite primary key is a primary key consisting of multiple columns. For more information, see Primary keys in tables.

  • Python

  • TypeScript

  • Java

  • curl

The Python client supports multiple ways to create a table. In all cases, you must define the table schema, and then pass the definition to the create_table method.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

You can define the table as a CreateTableDefinition and then build the table from the CreateTableDefinition object.

from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableKeyValuedColumnType,
    TableKeyValuedColumnTypeDescriptor,
    TableScalarColumnTypeDescriptor,
    TableValuedColumnTypeDescriptor,
    TableValuedColumnType,
    TablePrimaryKeyDescriptor,
)

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = CreateTableDefinition(
    # Define all of the columns in the table
    columns={
        "title": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
        "number_of_pages": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.INT
        ),
        "rating": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.FLOAT
        ),
        "genres": TableValuedColumnTypeDescriptor(
            column_type=TableValuedColumnType.SET,
            value_type=ColumnType.TEXT,
        ),
        "metadata": TableKeyValuedColumnTypeDescriptor(
            column_type=TableKeyValuedColumnType.MAP,
            key_type=ColumnType.TEXT,
            value_type=ColumnType.TEXT,
        ),
        "is_checked_out": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.BOOLEAN
        ),
        "due_date": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.DATE
        ),
    },
    # Define the primary key for the table.
    # In this case, the table uses a composite primary key.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["title", "rating"],
        partition_sort={}
    ),
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can use a fluent interface to build the table definition and then create the table from the definition.

from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = (
    CreateTableDefinition.builder()
    # Define all of the columns in the table
    .add_column("title", ColumnType.TEXT)
    .add_column("number_of_pages", ColumnType.INT)
    .add_column("rating", ColumnType.FLOAT)
    .add_set_column(
        "genres",
        ColumnType.TEXT,
    )
    .add_map_column(
        "metadata",
        # This is the key type for the map column
        ColumnType.TEXT,
        # This is the value type for the map column
        ColumnType.TEXT,
    )
    .add_column("is_checked_out", ColumnType.BOOLEAN)
    .add_column("due_date", ColumnType.DATE)
    # Define the primary key for the table.
    # In this case, the table uses a composite primary key.
    .add_partition_by(["title", "rating"])
    # Finally, build the table definition.
    .build()
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can define the table as a dictionary and then build the table from the dictionary.

from astrapy import DataAPIClient

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        "title": {"type": "text"},
        "number_of_pages": {"type": "int"},
        "rating": {"type": "float"},
        "genres": {"type": "set", "valueType": "text"},
        "metadata": {"type": "map", "keyType": "text", "valueType": "text"},
        "is_checked_out": {"type": "boolean"},
        "due_date": {"type": "date"},
    },
    "primaryKey": {
        "partitionBy": ["title", "rating"],
        "partitionSort": {},
    },
}

table = database.create_table(
    "example_table",
    definition=table_definition,
)
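
When you write the definition as a plain dictionary, a misspelled primary key column only surfaces once the request is sent. The following is a minimal sketch of a local pre-check, assuming the dictionary layout shown above. The helper `missing_primary_key_columns` is hypothetical and is not part of astrapy.

```python
def missing_primary_key_columns(definition: dict) -> list[str]:
    """Return primary key column names that are not declared under "columns".

    Hypothetical helper for illustration only; it is not part of astrapy.
    """
    declared = set(definition.get("columns", {}))
    primary_key = definition.get("primaryKey", {})
    # Collect partition columns first, then any sorting (clustering) columns.
    key_columns = list(primary_key.get("partitionBy", []))
    key_columns += list(primary_key.get("partitionSort", {}))
    return [name for name in key_columns if name not in declared]


table_definition = {
    "columns": {
        "title": {"type": "text"},
        "rating": {"type": "float"},
    },
    "primaryKey": {
        "partitionBy": ["title", "rating"],
        "partitionSort": {},
    },
}

missing = missing_primary_key_columns(table_definition)
if missing:
    raise ValueError(f"Primary key columns not defined: {missing}")
```

Running a check like this before calling create_table keeps simple typos from turning into a failed request.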

The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the serialization/deserialization (ser/des) configuration.

For more information, see Collection and table typing.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.

To do this, first create the table definition. Then, use InferTableSchema and InferTablePrimaryKey to infer the type of the table and of the primary key. To create the table, provide the table definition and the inferred types to the createTable method.

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a composite primary key.
  primaryKey: {
    partitionBy: ["title", "rating"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  // Provide the types and the definition
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can use the TableSchema type as you would any other type. For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:

  const row: TableSchema = {
    title: "Wind with No Name",
    number_of_pages: 193,
    bad_field: "I will error",
  };

You can manually define the type for your table’s schema and primary key. To create the table, provide the table definition and the types to the createTable method.

This may be necessary if you modify the table’s default ser/des configuration.

import { DataAPIClient, DataAPIDate, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a composite primary key.
  primaryKey: {
    partitionBy: ["title", "rating"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  title: string;
  number_of_pages?: number | null | undefined;
  rating: number;
  genres?: Set<string> | undefined;
  metadata?: Map<string, string> | undefined;
  is_checked_out?: boolean | null | undefined;
  due_date?: DataAPIDate | null | undefined;
};

type TablePrimaryKey = Pick<TableSchema, "title" | "rating">;

(async function () {
  // Provide the types and the definition to create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can use the TableSchema type as you would any other type. For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:

  const row: TableSchema = {
    title: "Wind with No Name",
    number_of_pages: 193,
    bad_field: "I will error",
  };

To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method. This types the table’s rows as Record<string, any>.

This is the most flexible but least type-safe option.

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a composite primary key.
  primaryKey: {
    partitionBy: ["title", "rating"],
  },
});

(async function () {
  // Provide SomeRow as the single type parameter, along with the definition
  const table = await database.createTable<SomeRow>(
    "example_table",
    { definition: tableDefinition },
  );
})();

The Java client supports multiple ways to create a table. In all cases, you must define the table schema.

  • Use a generic type

  • Define the row type

If you don’t specify the Class parameter when creating an instance of the generic class Table, the client defaults to Table<Row> as the type. In this case, the working object type T is Row.

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.definition.rows.Row;

public class CreateTable {
  public static void main(String[] args) {
    // Get an existing database
    Database database =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT");

    TableDefinition tableDefinition =
        new TableDefinition()
            // Define all of the columns in the table
            .addColumnText("title")
            .addColumnInt("number_of_pages")
            .addColumn("rating", ColumnTypes.FLOAT)
            .addColumnSet("genres", ColumnTypes.TEXT)
            .addColumnMap("metadata", ColumnTypes.TEXT, ColumnTypes.TEXT)
            .addColumnBoolean("is_checked_out")
            .addColumn("due_date", ColumnTypes.DATE)
            // Define the primary key for the table.
            // In this case, the table uses a composite primary key.
            .addPartitionBy("title")
            .addPartitionBy("rating");

    Table<Row> table = database.createTable("example_table", tableDefinition);
  }
}

Instead of using the default type Row.class, you can define your own working object, which will be serialized as a Row.

This working object can be annotated when the field names do not exactly match the column names or when you want to fully describe your table to enable its creation solely from the entity definition.

The following example defines a Book class and then uses it to create the table.

package com.examples;

import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import com.datastax.astra.client.tables.Table;
import lombok.Data;

import java.util.Date;
import java.util.Map;
import java.util.Set;

public class CreateTable {
  @EntityTable("example_table")
  @Data
  public class Book {
    @PartitionBy(0)
    @Column(name = "title", type = ColumnTypes.TEXT)
    private String title;

    @Column(name = "number_of_pages", type = ColumnTypes.INT)
    private Integer number_of_pages;

    @PartitionBy(1)
    @Column(name = "rating", type = ColumnTypes.FLOAT)
    private Float rating;

    @Column(name = "genres", type = ColumnTypes.SET, valueType = ColumnTypes.TEXT)
    private Set<String> genres;

    @Column(
        name = "metadata",
        type = ColumnTypes.MAP,
        keyType = ColumnTypes.TEXT,
        valueType = ColumnTypes.TEXT)
    private Map<String, String> metadata;

    @Column(name = "is_checked_out", type = ColumnTypes.BOOLEAN)
    private Boolean is_checked_out;

    @Column(name = "due_date", type = ColumnTypes.DATE)
    private Date due_date;
  }

  public static void main(String[] args) {
    // Get an existing database
    Database database =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT");

    Table<Book> table = database.createTable(Book.class);
  }
}

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createTable": {
    "name": "example_table",
    "definition": {
      "columns": {
        "title": {
          "type": "text"
        },
        "number_of_pages": {
          "type": "int"
        },
        "rating": {
          "type": "float"
        },
        "metadata": {
          "type": "map",
          "keyType": "text",
          "valueType": "text"
        },
        "genres": {
          "type": "set",
          "valueType": "text"
        },
        "is_checked_out": {
          "type": "boolean"
        },
        "due_date": {
          "type": "date"
        }
      },
      "primaryKey": {
        "partitionBy": [
          "title", "rating"
        ]
      }
    }
  }
}'
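
If you prefer to assemble the request body in code rather than by hand, the JSON above can be generated from a few arguments. The following is a sketch using only the Python standard library. The function `build_create_table_payload` is a hypothetical name, and the sketch only builds the body; it does not send the request.

```python
import json


def build_create_table_payload(name, columns, partition_by, partition_sort=None):
    """Build a createTable request body shaped like the curl example.

    Hypothetical helper for illustration; it does not contact the Data API.
    """
    primary_key = {"partitionBy": list(partition_by)}
    # Include partitionSort only when sorting columns are given.
    if partition_sort:
        primary_key["partitionSort"] = dict(partition_sort)
    return {
        "createTable": {
            "name": name,
            "definition": {"columns": columns, "primaryKey": primary_key},
        }
    }


payload = build_create_table_payload(
    "example_table",
    columns={
        "title": {"type": "text"},
        "rating": {"type": "float"},
        "genres": {"type": "set", "valueType": "text"},
    },
    partition_by=["title", "rating"],
)

# Serialize the body so that it can be passed to an HTTP client or to curl --data.
body = json.dumps(payload, indent=2)
print(body)
```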

Create a table with a compound primary key

A compound primary key is a primary key consisting of partition (grouping) columns and clustering (sorting) columns. For more information, see Primary keys in tables.

  • Python

  • TypeScript

  • Java

  • curl

The Python client supports multiple ways to create a table. In all cases, you must define the table schema, and then pass the definition to the create_table method.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

You can define the table as a CreateTableDefinition and then build the table from the CreateTableDefinition object.

from astrapy import DataAPIClient
from astrapy.constants import SortMode
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableKeyValuedColumnType,
    TableKeyValuedColumnTypeDescriptor,
    TableScalarColumnTypeDescriptor,
    TableValuedColumnTypeDescriptor,
    TableValuedColumnType,
    TablePrimaryKeyDescriptor,
)

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = CreateTableDefinition(
    # Define all of the columns in the table
    columns={
        "title": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
        "number_of_pages": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.INT
        ),
        "rating": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.FLOAT
        ),
        "genres": TableValuedColumnTypeDescriptor(
            column_type=TableValuedColumnType.SET,
            value_type=ColumnType.TEXT,
        ),
        "metadata": TableKeyValuedColumnTypeDescriptor(
            column_type=TableKeyValuedColumnType.MAP,
            key_type=ColumnType.TEXT,
            value_type=ColumnType.TEXT,
        ),
        "is_checked_out": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.BOOLEAN
        ),
        "due_date": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.DATE
        ),
    },
    # Define the primary key for the table.
    # In this case, the table uses a compound primary key.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["title", "rating"],
        partition_sort={
            "number_of_pages": SortMode.ASCENDING,
            "is_checked_out": SortMode.DESCENDING,
        },
    ),
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can use a fluent interface to build the table definition and then create the table from the definition.

from astrapy import DataAPIClient
from astrapy.constants import SortMode
from astrapy.info import CreateTableDefinition, ColumnType

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = (
    CreateTableDefinition.builder()
    # Define all of the columns in the table
    .add_column("title", ColumnType.TEXT)
    .add_column("number_of_pages", ColumnType.INT)
    .add_column("rating", ColumnType.FLOAT)
    .add_set_column(
        "genres",
        ColumnType.TEXT,
    )
    .add_map_column(
        "metadata",
        # This is the key type for the map column
        ColumnType.TEXT,
        # This is the value type for the map column
        ColumnType.TEXT,
    )
    .add_column("is_checked_out", ColumnType.BOOLEAN)
    .add_column("due_date", ColumnType.DATE)
    # Define the primary key for the table.
    # In this case, the table uses a compound primary key.
    .add_partition_by(["title", "rating"])
    .add_partition_sort({
        "number_of_pages": SortMode.ASCENDING,
        "is_checked_out": SortMode.DESCENDING,
    })
    # Finally, build the table definition.
    .build()
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can define the table as a dictionary and then build the table from the dictionary.

from astrapy import DataAPIClient

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        "title": {"type": "text"},
        "number_of_pages": {"type": "int"},
        "rating": {"type": "float"},
        "genres": {"type": "set", "valueType": "text"},
        "metadata": {"type": "map", "keyType": "text", "valueType": "text"},
        "is_checked_out": {"type": "boolean"},
        "due_date": {"type": "date"},
    },
    "primaryKey": {
        "partitionBy": ["title", "rating"],
        "partitionSort": {"number_of_pages": 1, "is_checked_out": -1},
    },
}

table = database.create_table(
    "example_table",
    definition=table_definition,
)
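
To make the meaning of the partitionSort values concrete, the following plain-Python sketch groups some sample rows by the partition columns and then orders each group by the sorting columns, treating 1 as ascending and -1 as descending. It is only an illustration of the resulting order under those assumptions. The sample rows and the `order_rows` function are hypothetical, and the sketch does not reflect how the database stores data internally.

```python
from collections import defaultdict

partition_by = ["title", "rating"]
partition_sort = {"number_of_pages": 1, "is_checked_out": -1}

# Hypothetical sample rows for illustration.
rows = [
    {"title": "A", "rating": 4.0, "number_of_pages": 300, "is_checked_out": False},
    {"title": "A", "rating": 4.0, "number_of_pages": 120, "is_checked_out": False},
    {"title": "A", "rating": 4.0, "number_of_pages": 120, "is_checked_out": True},
    {"title": "B", "rating": 3.5, "number_of_pages": 90, "is_checked_out": True},
]


def order_rows(rows, partition_by, partition_sort):
    """Group rows by the partition columns, then sort each group."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[tuple(row[column] for column in partition_by)].append(row)
    for group in partitions.values():
        # Python's sort is stable, so sorting by the last column first and
        # the first column last produces a multi-column ordering.
        for column, direction in reversed(list(partition_sort.items())):
            group.sort(key=lambda r, c=column: r[c], reverse=(direction == -1))
    return dict(partitions)


ordered = order_rows(rows, partition_by, partition_sort)
for key, group in ordered.items():
    print(key, [(r["number_of_pages"], r["is_checked_out"]) for r in group])
```

Within the ("A", 4.0) group, rows are ordered by ascending number_of_pages, and rows with the same page count are ordered with is_checked_out descending.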

The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the serialization/deserialization (ser/des) configuration.

For more information, see Collection and table typing.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.

To do this, first create the table definition. Then, use InferTableSchema and InferTablePrimaryKey to infer the type of the table and of the primary key. To create the table, provide the table definition and the inferred types to the createTable method.

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a compound primary key.
  primaryKey: {
    partitionBy: ["title", "rating"],
    partitionSort: { number_of_pages: 1, is_checked_out: -1 },
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  // Provide the types and the definition
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can use the TableSchema type as you would any other type. For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:

  const row: TableSchema = {
    title: "Wind with No Name",
    number_of_pages: 193,
    bad_field: "I will error",
  };

You can manually define the type for your table’s schema and primary key. To create the table, provide the table definition and the types to the createTable method.

This may be necessary if you modify the table’s default ser/des configuration.

import { DataAPIClient, DataAPIDate, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a compound primary key.
  primaryKey: {
    partitionBy: ["title", "rating"],
    partitionSort: { number_of_pages: 1, is_checked_out: -1 },
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  title: string;
  number_of_pages?: number | null | undefined;
  rating: number;
  genres?: Set<string> | undefined;
  metadata?: Map<string, string> | undefined;
  is_checked_out?: boolean | null | undefined;
  due_date?: DataAPIDate | null | undefined;
};

type TablePrimaryKey = Pick<TableSchema, "title" | "rating">;

(async function () {
  // Provide the types and the definition to create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can use the TableSchema type as you would any other type. For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:

  const row: TableSchema = {
    title: "Wind with No Name",
    number_of_pages: 193,
    bad_field: "I will error",
  };

To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method. This types the table’s rows as Record<string, any>.

This is the most flexible but least type-safe option.

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a compound primary key.
  primaryKey: {
    partitionBy: ["title", "rating"],
    partitionSort: { number_of_pages: 1, is_checked_out: -1 },
  },
});

(async function () {
  // Provide SomeRow as the single type parameter, along with the definition
  const table = await database.createTable<SomeRow>(
    "example_table",
    { definition: tableDefinition },
  );
})();

The Java client supports multiple ways to create a table. In all cases, you must define the table schema.

  • Use a generic type

  • Define the row type

If you don’t specify the Class parameter when creating an instance of the generic class Table, the client defaults to Table<Row> as the type. In this case, the working object type T is Row.

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.definition.rows.Row;
import static com.datastax.astra.client.core.query.Sort.ascending;
import static com.datastax.astra.client.core.query.Sort.descending;

public class CreateTable {
  public static void main(String[] args) {
    // Get an existing database
    Database database =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT");

    TableDefinition tableDefinition =
        new TableDefinition()
            // Define all of the columns in the table
            .addColumnText("title")
            .addColumnInt("number_of_pages")
            .addColumn("rating", ColumnTypes.FLOAT)
            .addColumnSet("genres", ColumnTypes.TEXT)
            .addColumnMap("metadata", ColumnTypes.TEXT, ColumnTypes.TEXT)
            .addColumnBoolean("is_checked_out")
            .addColumn("due_date", ColumnTypes.DATE)
            // Define the primary key for the table.
            // In this case, the table uses a compound primary key.
            .addPartitionBy("title")
            .addPartitionBy("rating")
            .addPartitionSort(ascending("number_of_pages"))
            .addPartitionSort(descending("is_checked_out"));

    Table<Row> table = database.createTable("example_table", tableDefinition);
  }
}

Instead of using the default type Row.class, you can define your own working object, which will be serialized as a Row.

This working object can be annotated when the field names do not exactly match the column names or when you want to fully describe your table to enable its creation solely from the entity definition.

The following example defines a Book class and then uses it to create the table.

package com.examples;

import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import com.datastax.astra.client.tables.mapping.PartitionSort;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.core.query.SortOrder;
import lombok.Data;

import java.util.Date;
import java.util.Map;
import java.util.Set;

public class CreateTable {
  @EntityTable("example_table")
  @Data
  public class Book {
    @PartitionBy(0)
    @Column(name = "title", type = ColumnTypes.TEXT)
    private String title;

    @PartitionSort(position = 0, order = SortOrder.ASCENDING)
    @Column(name = "number_of_pages", type = ColumnTypes.INT)
    private Integer number_of_pages;

    @PartitionBy(1)
    @Column(name = "rating", type = ColumnTypes.FLOAT)
    private Float rating;

    @Column(name = "genres", type = ColumnTypes.SET, valueType = ColumnTypes.TEXT)
    private Set<String> genres;

    @Column(
        name = "metadata",
        type = ColumnTypes.MAP,
        keyType = ColumnTypes.TEXT,
        valueType = ColumnTypes.TEXT)
    private Map<String, String> metadata;

    @PartitionSort(position = 1, order = SortOrder.DESCENDING)
    @Column(name = "is_checked_out", type = ColumnTypes.BOOLEAN)
    private Boolean is_checked_out;

    @Column(name = "due_date", type = ColumnTypes.DATE)
    private Date due_date;
  }

  public static void main(String[] args) {
    // Get an existing database
    Database database =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT");

    Table<Book> table = database.createTable(Book.class);
  }
}

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createTable": {
    "name": "example_table",
    "definition": {
      "columns": {
        "title": {
          "type": "text"
        },
        "number_of_pages": {
          "type": "int"
        },
        "rating": {
          "type": "float"
        },
        "metadata": {
          "type": "map",
          "keyType": "text",
          "valueType": "text"
        },
        "genres": {
          "type": "set",
          "valueType": "text"
        },
        "is_checked_out": {
          "type": "boolean"
        },
        "due_date": {
          "type": "date"
        }
      },
      "primaryKey": {
        "partitionBy": [
          "title",
          "rating"
        ],
        "partitionSort": {
          "number_of_pages": 1,
          "is_checked_out": -1
        }
      }
    }
  }
}'

Create a table with a column to store vector embeddings

If you want to store pre-generated vector embeddings in a table, create a table with a vector column. A table can include more than one vector column.

  • Python

  • TypeScript

  • Java

  • curl

The Python client supports multiple ways to create a table. In all cases, you must define the table schema, and then pass the definition to the create_table method.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

You can define the table as a CreateTableDefinition and then build the table from the CreateTableDefinition object.

from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
)

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = CreateTableDefinition(
    # Define all of the columns in the table
    columns={
        "example_vector": TableVectorColumnTypeDescriptor(
            dimension=1024,
        ),
        "example_non_vector": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        )
    },
    # Define the primary key for the table.
    # In this case, the table uses a single-column primary key.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["example_non_vector"],
        partition_sort={}
    ),
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can use a fluent interface to build the table definition and then create the table from the definition.

from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = (
    CreateTableDefinition.builder()
    # Define all of the columns in the table
    .add_vector_column("example_vector", dimension=1024)
    .add_column("example_non_vector", ColumnType.TEXT)
    # Define the primary key for the table.
    # In this case, the table uses a single-column primary key.
    .add_partition_by(["example_non_vector"])
    # Finally, build the table definition.
    .build()
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can define the table as a dictionary and then build the table from the dictionary.

from astrapy import DataAPIClient

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        "example_vector": {"type": "vector", "dimension": 1024},
        "example_non_vector": {"type": "text"},
    },
    "primaryKey": {
        "partitionBy": ["example_non_vector"],
        "partitionSort": {},
    },
}

table = database.create_table(
    "example_table",
    definition=table_definition,
)
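
Because the vector column declares a fixed dimension, it can help to check the length of pre-generated embeddings before you insert them. The following is a minimal sketch, assuming that a vector is expected to have exactly as many elements as the declared dimension. The `check_vector` helper is hypothetical and is not part of astrapy.

```python
def check_vector(definition: dict, column: str, vector: list) -> None:
    """Raise an error if the vector does not fit the declared vector column.

    Hypothetical helper for illustration only; it is not part of astrapy.
    """
    spec = definition["columns"][column]
    if spec.get("type") != "vector":
        raise TypeError(f"{column!r} is not a vector column")
    if len(vector) != spec["dimension"]:
        raise ValueError(
            f"{column!r} expects {spec['dimension']} dimensions, got {len(vector)}"
        )


table_definition = {
    "columns": {
        "example_vector": {"type": "vector", "dimension": 1024},
        "example_non_vector": {"type": "text"},
    },
    "primaryKey": {
        "partitionBy": ["example_non_vector"],
        "partitionSort": {},
    },
}

# A placeholder embedding with the declared number of dimensions passes the check.
check_vector(table_definition, "example_vector", [0.0] * 1024)
```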

The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the serialization/deserialization (ser/des) configuration.

For more information, see Collection and table typing.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.

To do this, first create the table definition. Then, use InferTableSchema and InferTablePrimaryKey to infer the type of the table and of the primary key. To create the table, provide the table definition and the inferred types to the createTable method.

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    example_vector: { type: "vector", dimension: 1024 },
    example_non_vector: "text",
  },
  // Define the primary key for the table.
  // In this case, the table uses a single-column primary key.
  primaryKey: {
    partitionBy: ["example_non_vector"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  // Provide the types and the definition
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can manually define the type for your table’s schema and primary key. To create the table, provide the table definition and the types to the createTable method.

This may be necessary if you modify the table’s default ser/des configuration.

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    example_vector: { type: "vector", dimension: 1024 },
    example_non_vector: "text",
  },
  // Define the primary key for the table.
  // In this case, the table uses a single-column primary key.
  primaryKey: {
    partitionBy: ["example_non_vector"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  example_vector: DataAPIVector;
  example_non_vector: string;
};

type TablePrimaryKey = Pick<TableSchema, "example_non_vector">;

(async function () {
  // Provide the types and the definition to create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method. This types the table’s rows as Record<string, any>.

This is the most flexible but least type-safe option.

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    example_vector: { type: "vector", dimension: 1024 },
    example_non_vector: "text",
  },
  // Define the primary key for the table.
  // In this case, the table uses a single-column primary key.
  primaryKey: {
    partitionBy: ["example_non_vector"],
  },
});

(async function () {
  // Provide the types and the definition to create the table
  const table = await database.createTable<SomeRow>(
    "example_table",
    { definition: tableDefinition },
  );
})();

The Java client supports multiple ways to create a table. In all cases, you must define the table schema.

  • Use a generic type

  • Define the row type

If you don’t specify the row class when you create the table, the client types the table as Table<Row>. In this case, the working object type T is Row.

package com.examples;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.core.vector.SimilarityMetric;

public class CreateTable {
  public static void main(String[] args) {
    // Get an existing database
    Database database =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT");

    TableDefinition tableDefinition =
        new TableDefinition()
            // Define all of the columns in the table
            .addColumnVector(
                "example_vector",
                new ColumnDefinitionVector()
                    .dimension(1024)
                    .metric(SimilarityMetric.COSINE)
            )
            .addColumnText("example_non_vector")
            // Define the primary key for the table.
            // In this case, the table uses a single-column primary key.
            .addPartitionBy("example_non_vector");

    Table<Row> table = database.createTable("example_table", tableDefinition);
  }
}

Instead of using the default type Row.class, you can define your own working object, which will be serialized as a Row.

This working object can be annotated when the field names do not exactly match the column names or when you want to fully describe your table to enable its creation solely from the entity definition.

The following example defines a Book class and then uses it to create the table.

package com.examples;

import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.tables.Table;
import lombok.Data;

public class CreateTable {
  @EntityTable("example_table")
  @Data
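  // Declared static so the class can be instantiated
  // without an enclosing CreateTable instance.
  public static class Book {
    @ColumnVector(name = "example_vector", dimension = 1024, metric = SimilarityMetric.COSINE)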
    private DataAPIVector vector;

    @PartitionBy(0)
    @Column(name = "example_non_vector", type = ColumnTypes.TEXT)
    private String exampleNonVector;
  }

  public static void main(String[] args) {
    // Get an existing database
    Database database =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT");

    Table<Book> table = database.createTable(Book.class);
  }
}

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createTable": {
    "name": "example_table",
    "definition": {
      "columns": {
        "example_vector": {
          "type": "vector",
          "dimension": 1024
        },
        "example_non_vector": {
          "type": "text"
        }
      },
      "primaryKey": "example_non_vector"
    }
  }
}'
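The request above gives primaryKey as a single column name, while the dictionary-based definitions elsewhere on this page spell out partitionBy and partitionSort. As a rough illustration of how the two forms relate for a single-column primary key, the following sketch expands the short form into the long one. The normalize_primary_key function is a hypothetical helper written for this example; it is not part of the Data API or any client.

```python
from typing import Any, Union


def normalize_primary_key(primary_key: Union[str, dict[str, Any]]) -> dict[str, Any]:
    """Return the primary key in the partitionBy/partitionSort form.

    Hypothetical helper: a bare column name is treated as a
    single-column partition key with no clustering columns.
    """
    if isinstance(primary_key, str):
        return {"partitionBy": [primary_key], "partitionSort": {}}
    # Already in the long form: copy it and fill in a missing partitionSort.
    return {
        "partitionBy": list(primary_key["partitionBy"]),
        "partitionSort": dict(primary_key.get("partitionSort", {})),
    }


short_form = normalize_primary_key("example_non_vector")
long_form = normalize_primary_key({"partitionBy": ["example_non_vector"]})
print(short_form == long_form)
```

Both calls produce the same dictionary, so a definition written with either form describes the same single-column primary key.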

Create a table with a column to automatically generate vector embeddings

If you want to automatically generate vector embeddings, create a table with a vector column and configure an embedding provider integration for the column.

The configuration depends on the embedding provider.

You can also configure an embedding provider integration after table creation. For more information, see Alter a table.

If you want to store the original text in addition to the vector embeddings that were generated from the text, then you need to create a separate column to store the text.
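As a minimal sketch of the overall shape, the following builds a plain-dictionary table definition with one auto-embedding vector column and a separate text column. It assumes the Data API's JSON field names (type, dimension, service, provider, modelName, authentication, parameters). The vectorize_column helper, the column names, and the key, resource, and deployment names are hypothetical values chosen for illustration.

```python
import json
from typing import Any, Optional


def vectorize_column(
    provider: str,
    model_name: str,
    api_key_name: str,
    dimension: Optional[int] = None,
    parameters: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a vector column definition with an embedding provider (hypothetical helper)."""
    service: dict[str, Any] = {
        "provider": provider,
        "modelName": model_name,
        "authentication": {"providerKey": api_key_name},
    }
    # Only some providers need extra parameters (for example, Azure OpenAI).
    if parameters:
        service["parameters"] = dict(parameters)
    column: dict[str, Any] = {"type": "vector", "service": service}
    # The dimension is optional when the model has a default dimension.
    if dimension is not None:
        column["dimension"] = dimension
    return column


definition = {
    "columns": {
        # Embeddings are generated by the provider configured on this column.
        "example_vector": vectorize_column(
            provider="azureOpenAI",
            model_name="text-embedding-3-small",
            api_key_name="my-azure-key",  # hypothetical key name
            dimension=1536,
            parameters={
                "resourceName": "my-resource",  # hypothetical
                "deploymentId": "my-deployment",  # hypothetical
            },
        ),
        # Separate column that keeps the original text.
        "example_text": {"type": "text"},
    },
    "primaryKey": {"partitionBy": ["example_text"], "partitionSort": {}},
}

print(json.dumps(definition, indent=2))
```

The provider-specific examples that follow differ mainly in the provider name, the model name, and whether a parameters object is required.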

  • Python

  • TypeScript

  • Java

  • curl

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Azure OpenAI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="azureOpenAI",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
                parameters={
                    "resourceName": "RESOURCE_NAME",
                    "deploymentId": "DEPLOYMENT_ID",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Azure OpenAI integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="azureOpenAI",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
            parameters={
                "resourceName": "RESOURCE_NAME",
                "deploymentId": "DEPLOYMENT_ID",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Azure OpenAI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "azureOpenAI",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
            "parameters": {
                "resourceName": "RESOURCE_NAME",
                "deploymentId": "DEPLOYMENT_ID",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

    For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

  • RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.

  • DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
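Before you send a definition, it can help to confirm that no placeholder from the list above is still present. The following is a minimal sketch under one assumption: placeholders are upper-case words joined by underscores, as in the examples on this page. The find_placeholders function and the sample values are hypothetical, and the check is only a heuristic, because a real column name or value could also match the pattern.

```python
import re
from typing import Any, Iterator

# Heuristic: upper-case words joined by at least one underscore, such as TABLE_NAME.
PLACEHOLDER = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)+$")


def find_placeholders(value: Any, path: str = "") -> Iterator[str]:
    """Yield the path of every key or string value that looks like a placeholder."""
    if isinstance(value, dict):
        for key, item in value.items():
            key_path = f"{path}.{key}" if path else str(key)
            if isinstance(key, str) and PLACEHOLDER.match(key):
                yield key_path
            yield from find_placeholders(item, key_path)
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from find_placeholders(item, f"{path}[{index}]")
    elif isinstance(value, str) and PLACEHOLDER.match(value):
        yield path


# A partially filled-in definition; the lower-case names are hypothetical.
definition = {
    "columns": {
        "embedding": {
            "type": "vector",
            "dimension": 1536,
            "service": {
                "provider": "azureOpenAI",
                "modelName": "text-embedding-3-small",
                "authentication": {"providerKey": "API_KEY_NAME"},
                "parameters": {
                    "resourceName": "my-resource",
                    "deploymentId": "DEPLOYMENT_ID",
                },
            },
        },
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    "primaryKey": {"partitionBy": ["TEXT_COLUMN_NAME"], "partitionSort": {}},
}

leftovers = list(find_placeholders(definition))
for location in leftovers:
    print(f"Placeholder not replaced at: {location}")
```

An empty result means that nothing in the definition matches the placeholder pattern.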

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Hugging Face Dedicated integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="huggingfaceDedicated",
                model_name="endpoint-defined-model",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
                parameters={
                    "endpointName": "ENDPOINT_NAME",
                    "regionName": "REGION_NAME",
                    "cloudName": "CLOUD_NAME",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Hugging Face Dedicated integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="huggingfaceDedicated",
            model_name="endpoint-defined-model",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
            parameters={
                "endpointName": "ENDPOINT_NAME",
                "regionName": "REGION_NAME",
                "cloudName": "CLOUD_NAME",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Hugging Face Dedicated integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "huggingfaceDedicated",
            "modelName": "endpoint-defined-model",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
            "parameters": {
                "endpointName": "ENDPOINT_NAME",
                "regionName": "REGION_NAME",
                "cloudName": "CLOUD_NAME",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The only available value is endpoint-defined-model.

    For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

    You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

  • ENDPOINT_NAME: The programmatically generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.

  • REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.

  • CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
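The endpoint name, region, and cloud provider described above all appear in the endpoint URL, so they can be read from it. The following is a minimal sketch that assumes the host always has the form ENDPOINT_NAME.REGION_NAME.CLOUD_NAME.endpoints.huggingface.cloud, as in the example URL. The parse_hf_dedicated_endpoint function is a hypothetical helper, not part of any client.

```python
from urllib.parse import urlparse


def parse_hf_dedicated_endpoint(url: str) -> dict[str, str]:
    """Split a Hugging Face Dedicated endpoint URL into service parameters.

    Assumes a host of the form
    <endpoint>.<region>.<cloud>.endpoints.huggingface.cloud.
    """
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    expected_suffix = ["endpoints", "huggingface", "cloud"]
    if len(labels) != 6 or labels[3:] != expected_suffix:
        raise ValueError(f"Unexpected endpoint host: {host!r}")
    endpoint_name, region_name, cloud_name = labels[:3]
    return {
        "endpointName": endpoint_name,
        "regionName": region_name,
        "cloudName": cloud_name,
    }


params = parse_hf_dedicated_endpoint(
    "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud"
)
print(params)
```

The returned dictionary uses the same keys as the parameters setting in the examples above, so it could be passed there directly.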

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Hugging Face Serverless integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="huggingface",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Hugging Face Serverless integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="huggingface",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Hugging Face Serverless integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "huggingface",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Jina AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="jinaAI",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Jina AI integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="jinaAI",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Jina AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "jinaAI",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Mistral AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="mistral",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Mistral AI integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="mistral",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Mistral AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "mistral",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The only available model is mistral-embed.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
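
As a rough illustration of how these placeholders fit together in the dictionary form, the following sketch assembles the vector column entry and leaves out dimension when no value is given, so that a default can apply. The helper name build_vector_column is hypothetical and not part of astrapy. The sketch assumes the camelCase modelName key for the model, in line with the other camelCase keys in the payload.

```python
from typing import Any, Optional


def build_vector_column(
    provider: str,
    model_name: str,
    api_key_name: str,
    dimension: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble the dictionary for a vector column backed by an embedding provider.

    Hypothetical helper: it only builds a plain dictionary and makes no API calls.
    """
    column: dict[str, Any] = {
        "type": "vector",
        "service": {
            "provider": provider,
            # Assumption: the payload uses the camelCase key "modelName".
            "modelName": model_name,
            "authentication": {"providerKey": api_key_name},
        },
    }
    # Only include a dimension when one was chosen explicitly.
    if dimension is not None:
        column["dimension"] = dimension
    return column


# Replace API_KEY_NAME with the name of your key in the Astra Portal.
column = build_vector_column("mistral", "mistral-embed", "API_KEY_NAME")
print(column)
```

The returned dictionary can take the place of the VECTOR_COLUMN_NAME entry under "columns" in a dictionary-style table definition.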

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

import os
from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The NVIDIA integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            service=VectorServiceOptions(
                provider="nvidia",
                model_name="NV-Embed-QA",
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The NVIDIA integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        service=VectorServiceOptions(
            provider="nvidia",
            model_name="NV-Embed-QA",
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The NVIDIA integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "service": {
            "provider": "nvidia",
            "modelName": "NV-Embed-QA",
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)
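
Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

OpenAI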

For more detailed instructions, see Integrate OpenAI as an embedding provider.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

import os
from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The OpenAI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="openai",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
                parameters={
                    "organizationId": "ORGANIZATION_ID",
                    "projectId": "PROJECT_ID",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The OpenAI integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="openai",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
            parameters={
                "organizationId": "ORGANIZATION_ID",
                "projectId": "PROJECT_ID",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The OpenAI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "openai",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
            "parameters": {
                "organizationId": "ORGANIZATION_ID",
                "projectId": "PROJECT_ID",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

  • ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.

  • PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot be the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
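
Because ORGANIZATION_ID and PROJECT_ID are optional, one way to handle them is to build the parameters dictionary only from the values that are actually set. The following is a minimal sketch under that assumption. The helper name openai_parameters is hypothetical and not part of astrapy, and "org-example" is a made-up value.

```python
from typing import Optional


def openai_parameters(
    organization_id: Optional[str] = None,
    project_id: Optional[str] = None,
) -> Optional[dict[str, str]]:
    """Collect the optional OpenAI settings, skipping any that are unset."""
    candidates = {
        "organizationId": organization_id,
        "projectId": project_id,
    }
    # Drop entries that are None or empty strings.
    parameters = {key: value for key, value in candidates.items() if value}
    # Return None when nothing is set so the caller can omit parameters entirely.
    return parameters or None


print(openai_parameters())
print(openai_parameters(organization_id="org-example"))
```

The result can be passed as the parameters value of the service definition, or left out when it is None.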

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

import os
from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Upstage integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="upstageAI",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Upstage integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="upstageAI",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Upstage integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "upstageAI",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The only available model is solar-embedding-1-large.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
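
Once you have the provider metadata, deciding whether dimension can be omitted might look like the following. This is a minimal sketch that works on a hand-written sample rather than a live response. The field names vectorDimension, parameters, and defaultValue are assumptions about the shape of the metadata and should be checked against an actual response. The helper name default_dimension and the model names in the sample are hypothetical.

```python
from typing import Any, Optional


def default_dimension(model: dict[str, Any]) -> Optional[int]:
    """Return the dimension that applies when dimension is omitted, if any.

    Assumes each model entry either has a fixed "vectorDimension" or lists
    a "vectorDimension" parameter that may carry a "defaultValue".
    """
    fixed = model.get("vectorDimension")
    if fixed is not None:
        return int(fixed)
    for parameter in model.get("parameters", []):
        if parameter.get("name") == "vectorDimension":
            default = parameter.get("defaultValue")
            return int(default) if default not in (None, "") else None
    return None


# Hand-written sample for illustration only; not real API output.
sample_models = [
    {"name": "fixed-size-model", "vectorDimension": 1024, "parameters": []},
    {
        "name": "adjustable-model",
        "vectorDimension": None,
        "parameters": [{"name": "vectorDimension", "defaultValue": "512"}],
    },
    {"name": "no-default-model", "vectorDimension": None, "parameters": []},
]

for model in sample_models:
    print(model["name"], default_dimension(model))
```

A result of None means that the model has no default, so you must set dimension explicitly.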

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

  • CreateTableDefinition object

  • Fluent interface

  • Dictionary

import os
from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Voyage AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="voyageAI",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Voyage AI integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="voyageAI",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Voyage AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "voyageAI",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
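
With several models to choose from, a small check before creating the table can catch a mistyped MODEL_NAME early. The following sketch uses the model list given above. The helper name check_model_name is hypothetical, and the hard-coded list may lag behind what the provider currently offers.

```python
# Model names copied from the list above; this set may become outdated.
VOYAGE_MODELS = frozenset(
    {
        "voyage-2",
        "voyage-code-2",
        "voyage-finance-2",
        "voyage-large-2",
        "voyage-large-2-instruct",
        "voyage-law-2",
        "voyage-multilingual-2",
    }
)


def check_model_name(model_name: str) -> str:
    """Return model_name unchanged if it is a known model, else raise ValueError."""
    if model_name not in VOYAGE_MODELS:
        known = ", ".join(sorted(VOYAGE_MODELS))
        raise ValueError(f"Unknown model {model_name!r}. Expected one of: {known}")
    return model_name


print(check_model_name("voyage-large-2"))
```

The validated name can then be used wherever MODEL_NAME appears in the table definition.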

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Azure OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'azureOpenAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          resourceName: 'RESOURCE_NAME',
          deploymentId: 'DEPLOYMENT_ID',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

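// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  // Create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "TABLE_NAME",
    { definition: tableDefinition },
  );
})();
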
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Azure OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'azureOpenAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          resourceName: 'RESOURCE_NAME',
          deploymentId: 'DEPLOYMENT_ID',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

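// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

(async function () {
  // Create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "TABLE_NAME",
    { definition: tableDefinition },
  );
})();
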
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Azure OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'azureOpenAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          resourceName: 'RESOURCE_NAME',
          deploymentId: 'DEPLOYMENT_ID',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
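});

(async function () {
  // Create the table without a specific row type
  const table = await database.createTable<SomeRow>("TABLE_NAME", {
    definition: tableDefinition,
  });
})();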

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

    For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

  • RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.

  • DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Hugging Face Dedicated integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'huggingfaceDedicated',
        modelName: 'endpoint-defined-model',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          endpointName: 'ENDPOINT_NAME',
          regionName: 'REGION_NAME',
          cloudName: 'CLOUD_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Hugging Face Dedicated integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'huggingfaceDedicated',
        modelName: 'endpoint-defined-model',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          endpointName: 'ENDPOINT_NAME',
          regionName: 'REGION_NAME',
          cloudName: 'CLOUD_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Hugging Face Dedicated integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'huggingfaceDedicated',
        modelName: 'endpoint-defined-model',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          endpointName: 'ENDPOINT_NAME',
          regionName: 'REGION_NAME',
          cloudName: 'CLOUD_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The only available model is endpoint-defined-model.

    For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

    You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

  • ENDPOINT_NAME: The programmatically generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.

  • REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.

  • CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
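The three endpoint parameters can be read off the endpoint URL. The following is a minimal sketch, assuming the URL always has the form https://ENDPOINT_NAME.REGION_NAME.CLOUD_NAME.endpoints.huggingface.cloud, as the example values above suggest. The parseDedicatedEndpointUrl helper and the HuggingFaceDedicatedParameters type are hypothetical names used for illustration; they are not part of the astra-db-ts client.

```typescript
// Hypothetical shape: mirrors the `parameters` object in the examples above.
interface HuggingFaceDedicatedParameters {
  endpointName: string;
  regionName: string;
  cloudName: string;
}

// Assumed URL layout (inferred from the example values, not from a spec):
// https://<endpoint>.<region>.<cloud>.endpoints.huggingface.cloud
const DEDICATED_ENDPOINT_PATTERN =
  /^https:\/\/([^./]+)\.([^./]+)\.([^./]+)\.endpoints\.huggingface\.cloud\/?$/;

function parseDedicatedEndpointUrl(
  endpointUrl: string,
): HuggingFaceDedicatedParameters {
  const match = DEDICATED_ENDPOINT_PATTERN.exec(endpointUrl.trim());
  const endpointName = match?.[1];
  const regionName = match?.[2];
  const cloudName = match?.[3];
  // Reject anything that does not match the assumed layout.
  if (!endpointName || !regionName || !cloudName) {
    throw new Error(
      `Unrecognized Hugging Face Dedicated endpoint URL: ${endpointUrl}`,
    );
  }
  return { endpointName, regionName, cloudName };
}

// Usage with the example URL from the list above.
const dedicatedParameters = parseDedicatedEndpointUrl(
  "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud",
);
console.log(dedicatedParameters);
```

The returned object can be passed as the parameters value of the service definition. If your endpoint URL does not follow this layout, set the three values by hand instead.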

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Hugging Face Serverless integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'huggingface',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Hugging Face Serverless integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'huggingface',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Hugging Face Serverless integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'huggingface',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
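Because the vector column's dimension must match what the chosen model produces, it can help to check the value before you send the table definition. The following is a minimal sketch, assuming the output sizes published on each model's public model card. These numbers do not come from this page, so confirm them against the embedding provider metadata returned by the Data API. MODEL_CARD_DIMENSIONS and resolveDimension are hypothetical names used for illustration.

```typescript
// Assumption: output sizes as published on each model's public model card.
// Verify against the Data API's embedding provider metadata before relying on them.
const MODEL_CARD_DIMENSIONS: Record<string, number> = {
  "sentence-transformers/all-MiniLM-L6-v2": 384,
  "intfloat/multilingual-e5-large": 1024,
  "intfloat/multilingual-e5-large-instruct": 1024,
  "BAAI/bge-small-en-v1.5": 384,
  "BAAI/bge-base-en-v1.5": 768,
  "BAAI/bge-large-en-v1.5": 1024,
};

// Hypothetical helper: returns the dimension to use for MODEL_DIMENSIONS,
// or throws if the requested value does not match the model's output size.
function resolveDimension(modelName: string, requested?: number): number {
  const expected = MODEL_CARD_DIMENSIONS[modelName];
  if (expected === undefined) {
    throw new Error(`Unknown model: ${modelName}`);
  }
  if (requested !== undefined && requested !== expected) {
    throw new Error(
      `${modelName} produces ${expected}-dimensional vectors, not ${requested}`,
    );
  }
  return expected;
}

// Usage: derive the dimension from the model name.
const modelDimensions = resolveDimension("BAAI/bge-base-en-v1.5");
console.log(modelDimensions);
```

A fixed lookup table like this only suits models with a single output size. For models that accept a range of dimensions, compare the requested value against the range reported by the Data API instead.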

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Jina AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'jinaAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Jina AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'jinaAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Jina AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'jinaAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Mistral AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'mistral',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Mistral AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'mistral',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Mistral AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'mistral',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The only available model is mistral-embed.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The NVIDIA integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      service: {
        provider: 'nvidia',
        modelName: 'NV-Embed-QA',
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The NVIDIA integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      service: {
        provider: 'nvidia',
        modelName: 'NV-Embed-QA',
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The NVIDIA integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      service: {
        provider: 'nvidia',
        modelName: 'NV-Embed-QA',
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'openai',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          organizationId: 'ORGANIZATION_ID',
          projectId: 'PROJECT_ID',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'openai',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          organizationId: 'ORGANIZATION_ID',
          projectId: 'PROJECT_ID',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'openai',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          organizationId: 'ORGANIZATION_ID',
          projectId: 'PROJECT_ID',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

  • ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.

  • PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot be the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
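
Because ORGANIZATION_ID and PROJECT_ID are optional, one way to keep the service definition tidy is to include the parameters object only when at least one of them is set. The following is a minimal sketch, not part of the client library: the helper buildOpenAIService and both interfaces are hypothetical names, and the returned object only mirrors the service shape used in the examples above.

```typescript
// Hypothetical helper types; these are not exported by @datastax/astra-db-ts.
interface OpenAIServiceInput {
  modelName: string;
  apiKeyName: string;
  organizationId?: string;
  projectId?: string;
}

interface VectorizeService {
  provider: string;
  modelName: string;
  authentication: { providerKey: string };
  parameters?: Record<string, string>;
}

// Builds a service definition shaped like the `service` object shown above.
function buildOpenAIService(input: OpenAIServiceInput): VectorizeService {
  const service: VectorizeService = {
    provider: "openai",
    modelName: input.modelName,
    authentication: { providerKey: input.apiKeyName },
  };

  // Collect only the optional parameters that were actually provided.
  const parameters: Record<string, string> = {};
  if (input.organizationId) parameters["organizationId"] = input.organizationId;
  if (input.projectId) parameters["projectId"] = input.projectId;

  // Attach `parameters` only when it is non-empty.
  if (Object.keys(parameters).length > 0) {
    service.parameters = parameters;
  }
  return service;
}

// No organization or project ID is given, so `parameters` is omitted.
const minimalService = buildOpenAIService({
  modelName: "text-embedding-3-small",
  apiKeyName: "API_KEY_NAME",
});
```

The result can be assigned to the service field of the vector column definition. Whether your account needs the optional IDs depends on the conditions described in the list above.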

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Upstage integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'upstageAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Upstage integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'upstageAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Upstage integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'upstageAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The only available model is solar-embedding-1-large.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
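
To build intuition for the SIMILARITY_METRIC choices, the sketch below computes the three underlying quantities for two small vectors. It is plain arithmetic for illustration only: the function names are hypothetical, and the similarity scores that the database returns may be scaled or normalized differently from these raw values.

```typescript
// Guard shared by all three functions: vectors must have equal dimensions.
function assertSameLength(a: number[], b: number[]): void {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
}

// DOT_PRODUCT: sum of element-wise products.
function dotProduct(a: number[], b: number[]): number {
  assertSameLength(a, b);
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// COSINE: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  const norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  if (norms === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dotProduct(a, b) / norms;
}

// EUCLIDEAN: straight-line distance between the two points.
function euclideanDistance(a: number[], b: number[]): number {
  assertSameLength(a, b);
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

const first = [3, 4];
const second = [4, 3];

const dot = dotProduct(first, second); // 24
const cosine = cosineSimilarity(first, second); // 0.96
const distance = euclideanDistance(first, second); // approximately 1.414
```

Cosine ignores vector length and compares direction only, while dot product and Euclidean distance are both sensitive to magnitude.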

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

  • Automatic type inference

  • Manually typed tables

  • Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Voyage AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'voyageAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Voyage AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'voyageAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Voyage AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'voyageAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
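
The MODEL_DIMENSIONS rules above can be checked on the client side before you create the table. The following is a minimal sketch under stated assumptions: ModelDimensionInfo and resolveDimension are hypothetical names, the field layout is not the format returned by the Data API, and the numbers in the example are made up rather than taken from any real model.

```typescript
// Hypothetical description of one model's dimension support.
// The field names are illustrative, not the Data API response format.
interface ModelDimensionInfo {
  defaultDimension?: number;
  minDimension?: number;
  maxDimension?: number;
}

// Returns the dimension to use, or throws if the request cannot be satisfied.
function resolveDimension(
  requested: number | undefined,
  info: ModelDimensionInfo,
): number {
  // Dimension omitted: fall back to the default, if the model has one.
  if (requested === undefined) {
    if (info.defaultDimension === undefined) {
      throw new Error("This model has no default dimension; specify one explicitly.");
    }
    return info.defaultDimension;
  }
  if (!Number.isInteger(requested) || requested <= 0) {
    throw new Error(`Dimension must be a positive integer, got ${requested}.`);
  }
  if (info.minDimension !== undefined && requested < info.minDimension) {
    throw new Error(`Dimension ${requested} is below the minimum ${info.minDimension}.`);
  }
  if (info.maxDimension !== undefined && requested > info.maxDimension) {
    throw new Error(`Dimension ${requested} is above the maximum ${info.maxDimension}.`);
  }
  return requested;
}

// Made-up values for illustration; look up the real ones for your model.
const exampleModel: ModelDimensionInfo = {
  defaultDimension: 1024,
  minDimension: 256,
  maxDimension: 1024,
};

const omittedDimension = resolveDimension(undefined, exampleModel); // uses the default
const explicitDimension = resolveDimension(512, exampleModel); // within the range
```

A check like this only catches mistakes early. The database remains the authority on which dimensions a model supports.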

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

  • Use a generic type

  • Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.indexes.TableVectorIndexDefinition;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define parameters for the service provider
    Map<String, Object> params = new HashMap<>();
    params.put("resourceName", "RESOURCE_NAME");
    params.put("deploymentId", "DEPLOYMENT_ID");

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Azure OpenAI integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("azureOpenAI")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                            .parameters(params)
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public class ExampleRow {
        // This column will store vector embeddings.
        // The Azure OpenAI integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "azureOpenAI",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"),
            parameters = {
                @KeyValue(key = "resourceName", value = "RESOURCE_NAME"),
                @KeyValue(key = "deploymentId", value = "DEPLOYMENT_ID")
            })
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }

    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

    For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

  • RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.

  • DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

  • Use a generic type

  • Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.indexes.TableVectorIndexDefinition;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define parameters for the service provider
    Map<String, Object> params = new HashMap<>();
    params.put("endpointName", "ENDPOINT_NAME");
    params.put("regionName", "REGION_NAME");
    params.put("cloudName", "CLOUD_NAME");

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Hugging Face Dedicated integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("huggingfaceDedicated")
                            .modelName("endpoint-defined-model")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                            .parameters(params)
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public class ExampleRow {
        // This column will store vector embeddings.
        // The Hugging Face Dedicated integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "huggingfaceDedicated",
            modelName = "endpoint-defined-model",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"),
            parameters = {
                @KeyValue(key = "endpointName", value = "ENDPOINT_NAME"),
                @KeyValue(key = "regionName", value = "REGION_NAME"),
                @KeyValue(key = "cloudName", value = "CLOUD_NAME")
            })
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }

    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The only available model is endpoint-defined-model.

    For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

    You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

  • ENDPOINT_NAME: The programmatically generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.

  • REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.

  • CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
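
The ENDPOINT_NAME, REGION_NAME, and CLOUD_NAME values can all be read from the endpoint URL. The following is a minimal sketch that assumes every endpoint URL follows the same endpoint.region.cloud pattern as the example URL above. The function and interface names are hypothetical and are not part of any client library.

```typescript
// Hypothetical container for the three provider parameters.
interface HuggingFaceDedicatedParams {
  endpointName: string;
  regionName: string;
  cloudName: string;
}

// Assumed URL form, based on the single example above:
// https://<endpoint>.<region>.<cloud>.endpoints.huggingface.cloud
const ENDPOINT_URL_PATTERN =
  /^https:\/\/([^./]+)\.([^./]+)\.([^./]+)\.endpoints\.huggingface\.cloud\/?$/;

function parseDedicatedEndpointUrl(url: string): HuggingFaceDedicatedParams {
  const match = ENDPOINT_URL_PATTERN.exec(url);
  if (!match) {
    throw new Error(`Unrecognized Hugging Face Dedicated endpoint URL: ${url}`);
  }
  // match[0] is the whole URL; the capture groups follow in order.
  return {
    endpointName: match[1],
    regionName: match[2],
    cloudName: match[3],
  };
}

const endpointParams = parseDedicatedEndpointUrl(
  "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud",
);
// endpointParams.endpointName is "mtp1x7muf6qyn3yh"
// endpointParams.regionName is "us-east-2"
// endpointParams.cloudName is "aws"
```

The function throws on any URL that does not match the assumed pattern, so a mistyped or differently shaped URL fails early instead of producing wrong parameter values.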

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

  • Use a generic type

  • Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.indexes.TableVectorIndexDefinition;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Hugging Face Serverless integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("huggingface")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public class ExampleRow {
        // This column will store vector embeddings.
        // The Hugging Face Serverless integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "huggingface",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }

    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit the dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
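To make the SIMILARITY_METRIC choice more concrete, the following sketch computes the three underlying measures for two small vectors in plain Java. It is illustrative only: the class and method names are hypothetical, it does not use the Data API client, and it does not show how the database turns these raw measures into similarity scores.

```java
final class SimilarityMetricSketch {

  // Dot product: the sum of element-wise products.
  static double dotProduct(double[] a, double[] b) {
    requireSameLength(a, b);
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Euclidean distance: the straight-line distance between the two vectors.
  static double euclideanDistance(double[] a, double[] b) {
    requireSameLength(a, b);
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  // Cosine similarity: the dot product divided by the product of the vector lengths.
  static double cosineSimilarity(double[] a, double[] b) {
    double normA = Math.sqrt(dotProduct(a, a));
    double normB = Math.sqrt(dotProduct(b, b));
    if (normA == 0.0 || normB == 0.0) {
      throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
    }
    return dotProduct(a, b) / (normA * normB);
  }

  private static void requireSameLength(double[] a, double[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("Vectors must have the same number of dimensions");
    }
  }

  public static void main(String[] args) {
    // Hypothetical two-dimensional vectors, chosen only for illustration.
    double[] a = {3.0, 4.0};
    double[] b = {4.0, 3.0};
    System.out.println("dot product:        " + dotProduct(a, b));
    System.out.println("euclidean distance: " + euclideanDistance(a, b));
    System.out.println("cosine similarity:  " + cosineSimilarity(a, b));
  }
}
```

Cosine similarity ignores vector length and compares direction only, while the dot product and Euclidean distance are both sensitive to magnitude. That difference is usually the deciding factor when choosing a metric for a given embedding model.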

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

  • Use a generic type

  • Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Jina AI integration
            // will automatically generate vector embeddings
            // for any text inserted into this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("jinaAI")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public static class ExampleRow {
        // This column will store vector embeddings.
        // The Jina AI integration
        // will automatically generate vector embeddings
        // for any text inserted into this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "jinaAI",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }

    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit the dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
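As a rough sketch of that lookup, the following example sends the Data API findEmbeddingProviders command over HTTP with the JDK's built-in client. It assumes the database-level path /api/json/v1, the Token header, and the same API_ENDPOINT and APPLICATION_TOKEN environment variables used in the examples above. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class FindEmbeddingProvidersSketch {

  // Builds the POST request for the findEmbeddingProviders command.
  // The path and header name are assumptions based on the Data API's HTTP interface.
  static HttpRequest buildRequest(String apiEndpoint, String applicationToken) {
    // Drop a trailing slash so the path is not doubled.
    String base = apiEndpoint.endsWith("/")
        ? apiEndpoint.substring(0, apiEndpoint.length() - 1)
        : apiEndpoint;
    return HttpRequest.newBuilder()
        .uri(URI.create(base + "/api/json/v1"))
        .header("Token", applicationToken)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString("{\"findEmbeddingProviders\": {}}"))
        .build();
  }

  public static void main(String[] args) throws Exception {
    String apiEndpoint = System.getenv("API_ENDPOINT");
    String applicationToken = System.getenv("APPLICATION_TOKEN");
    if (apiEndpoint == null || applicationToken == null) {
      System.err.println("Set API_ENDPOINT and APPLICATION_TOKEN before running this sketch.");
      return;
    }

    // Send the command and print the raw JSON response,
    // which lists the providers, their models, and their parameters.
    HttpResponse<String> response =
        HttpClient.newHttpClient()
            .send(buildRequest(apiEndpoint, applicationToken), HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
```

The response is printed as raw JSON so that you can inspect the dimension details for your chosen model before you set MODEL_DIMENSIONS.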

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

  • Use a generic type

  • Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Mistral AI integration
            // will automatically generate vector embeddings
            // for any text inserted into this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("mistral")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public static class ExampleRow {
        // This column will store vector embeddings.
        // The Mistral AI integration
        // will automatically generate vector embeddings
        // for any text inserted into this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "mistral",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }

    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The only available model is mistral-embed.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit the dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

  • Use a generic type

  • Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The NVIDIA integration
            // will automatically generate vector embeddings
            // for any text inserted into this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .metric(SimilarityMetric.COSINE)
                    .service(
                        new VectorServiceOptions()
                            .provider("nvidia")
                            .modelName("NV-Embed-QA")
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public static class ExampleRow {
        // This column will store vector embeddings.
        // The NVIDIA integration
        // will automatically generate vector embeddings
        // for any text inserted into this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            provider = "nvidia",
            modelName = "NV-Embed-QA")
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }

    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
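}

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

OpenAI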

For more detailed instructions, see Integrate OpenAI as an embedding provider.

  • Use a generic type

  • Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define parameters for the service provider
    Map<String, Object> params = new HashMap<>();
    params.put("organizationId", "ORGANIZATION_ID");
    params.put("projectId", "PROJECT_ID");

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The OpenAI integration
            // will automatically generate vector embeddings
            // for any text inserted into this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("openai")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                            .parameters(params)
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public static class ExampleRow {
        // This column will store vector embeddings.
        // The OpenAI integration
        // will automatically generate vector embeddings
        // for any text inserted into this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "openai",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"),
            parameters = {
                @KeyValue(key = "organizationId", value = "ORGANIZATION_ID"),
                @KeyValue(key = "projectId", value = "PROJECT_ID")
            })
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }

    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit the dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

  • ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.

  • PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot be the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
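Because ORGANIZATION_ID and PROJECT_ID are both optional, one possible approach is to build the parameter map from only the values you actually have, instead of passing placeholder strings. The following is a minimal sketch of that idea. The helper name and the OPENAI_ORGANIZATION_ID and OPENAI_PROJECT_ID environment variable names are hypothetical, and whether you then pass the map to .parameters(...) or skip that call when the map is empty is left as an assumption to verify against the client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OpenAiParametersSketch {

  // Returns a map that contains only the optional OpenAI parameters that were provided.
  static Map<String, Object> buildParameters(String organizationId, String projectId) {
    Map<String, Object> params = new LinkedHashMap<>();
    if (organizationId != null && !organizationId.isBlank()) {
      params.put("organizationId", organizationId);
    }
    if (projectId != null && !projectId.isBlank()) {
      params.put("projectId", projectId);
    }
    return params;
  }

  public static void main(String[] args) {
    // Hypothetical environment variable names, used only for this sketch.
    Map<String, Object> params =
        buildParameters(
            System.getenv("OPENAI_ORGANIZATION_ID"),
            System.getenv("OPENAI_PROJECT_ID"));

    if (params.isEmpty()) {
      System.out.println("No optional OpenAI parameters set.");
    } else {
      System.out.println("Optional OpenAI parameters: " + params.keySet());
    }
  }
}
```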

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

  • Use a generic type

  • Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Upstage integration
            // will automatically generate vector embeddings
            // for any text inserted into this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("upstageAI")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public static class ExampleRow {
        // This column will store vector embeddings.
        // The Upstage integration
        // will automatically generate vector embeddings
        // for any text inserted into this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "upstageAI",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }

    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The only available model is solar-embedding-1-large.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit the dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

  • Use a generic type

  • Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Voyage AI integration
            // will automatically generate vector embeddings
            // for any text inserted into this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("voyageAI")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public static class ExampleRow {
        // This column will store vector embeddings.
        // The Voyage AI integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "voyageAI",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }

    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
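To make the SIMILARITY_METRIC choice more concrete, the following is a minimal sketch of the textbook quantities behind the three metric names. It is an illustration only: the function names and sample vectors are hypothetical, and how Astra converts these quantities into the similarity scores it returns is not shown here.

```python
import math


def dot_product(a, b):
    """Sum of the element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical sample vectors: b points in the same direction as a
# but is twice as long.
a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]

print(dot_product(a, b))         # 18.0
print(cosine_similarity(a, b))   # 1.0
print(euclidean_distance(a, b))  # 3.0
```

In this example, the cosine value ignores the difference in length, while the dot product and the Euclidean distance are both affected by it.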

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        # This column will store vector embeddings.
        # The Azure OpenAI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "azureOpenAI",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            },
            "parameters": {
              "resourceName": "RESOURCE_NAME",
              "deploymentId": "DEPLOYMENT_ID"
            }
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": "text"
      },
      # You should change the primary key definition to meet the needs of your data.
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

    For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

  • RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.

  • DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
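JSON as specified has no comment syntax, so it can be convenient to build the request body programmatically and keep the explanatory notes in the surrounding code. The following is a minimal sketch that assembles the same createTable body as the Azure OpenAI curl example. The function name and all sample values are hypothetical placeholders, not values from your account.

```python
import json


def azure_openai_create_table_payload(
    table_name,
    vector_column,
    text_column,
    dimension,
    model_name,
    api_key_name,
    resource_name,
    deployment_id,
):
    """Return the createTable command as a plain dict (no inline comments)."""
    return {
        "createTable": {
            "name": table_name,
            "definition": {
                "columns": {
                    # Vector column with automatic embedding generation
                    vector_column: {
                        "type": "vector",
                        "dimension": dimension,
                        "service": {
                            "provider": "azureOpenAI",
                            "modelName": model_name,
                            "authentication": {"providerKey": api_key_name},
                            "parameters": {
                                "resourceName": resource_name,
                                "deploymentId": deployment_id,
                            },
                        },
                    },
                    # Separate column for the original text
                    text_column: "text",
                },
                # Change the primary key to meet the needs of your data
                "primaryKey": text_column,
            },
        }
    }


# Hypothetical sample values, for illustration only
payload = azure_openai_create_table_payload(
    table_name="example_table",
    vector_column="example_vector",
    text_column="example_text",
    dimension=1536,
    model_name="text-embedding-3-small",
    api_key_name="my_azure_key",
    resource_name="my-resource",
    deployment_id="my-deployment",
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be passed to the --data option of the curl command in place of the inline body.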

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        # This column will store vector embeddings.
        # The Hugging Face Dedicated integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "huggingfaceDedicated",
            "modelName": "endpoint-defined-model",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            },
            "parameters": {
              "endpointName": "ENDPOINT_NAME",
              "regionName": "REGION_NAME",
              "cloudName": "CLOUD_NAME"
            }
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": "text"
      },
      # You should change the primary key definition to meet the needs of your data.
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The only available value is endpoint-defined-model, which the preceding example already sets, because this integration uses the model specified in your dedicated endpoint configuration.

    For Hugging Face Dedicated, you must deploy the model as a Text Embeddings Inference (TEI) container.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

  • ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.

  • REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.

  • CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
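The three endpoint values can be read directly from the endpoint URL. The following is a minimal sketch, assuming every Hugging Face Dedicated endpoint URL follows the same name.region.cloud.endpoints.huggingface.cloud pattern as the example URL above. The function name is hypothetical.

```python
from urllib.parse import urlparse

# Assumed common suffix, taken from the example URL in this section
HF_SUFFIX = ".endpoints.huggingface.cloud"


def parse_hf_dedicated_endpoint(url):
    """Split an endpoint URL into the three service parameters."""
    host = urlparse(url).hostname or ""
    if not host.endswith(HF_SUFFIX):
        raise ValueError(f"Unexpected endpoint host: {host!r}")
    parts = host[: -len(HF_SUFFIX)].split(".")
    if len(parts) != 3:
        raise ValueError(f"Expected name.region.cloud, got: {host!r}")
    endpoint_name, region_name, cloud_name = parts
    return {
        "endpointName": endpoint_name,
        "regionName": region_name,
        "cloudName": cloud_name,
    }


print(
    parse_hf_dedicated_endpoint(
        "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud"
    )
)
# {'endpointName': 'mtp1x7muf6qyn3yh', 'regionName': 'us-east-2', 'cloudName': 'aws'}
```

The returned dictionary has the same keys as the parameters object in the curl example.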

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        # This column will store vector embeddings.
        # The Hugging Face Serverless integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "huggingface",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": "text"
      },
      # You should change the primary key definition to meet the needs of your data.
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
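The provider lookup mentioned in the last bullet can be sketched as follows. This is a minimal sketch, assuming the Data API's findEmbeddingProviders command is sent to the database-level path /api/json/v1 with the same Token header that the curl examples on this page use. The function names are hypothetical, and the response shape should be checked against the Data API reference.

```python
import json
import urllib.request


def build_find_providers_request(api_endpoint, token):
    """Build, but do not send, a findEmbeddingProviders request."""
    # Assumption: the command is sent to the database-level path
    url = api_endpoint.rstrip("/") + "/api/json/v1"
    body = json.dumps({"findEmbeddingProviders": {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def find_embedding_providers(api_endpoint, token):
    """Send the request and return the parsed JSON response."""
    request = build_find_providers_request(api_endpoint, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To run the lookup, call find_embedding_providers with the values of your API_ENDPOINT and APPLICATION_TOKEN environment variables, and then inspect the returned providers for the models and dimension settings that each one supports.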

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        # This column will store vector embeddings.
        # The Jina AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "jinaAI",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": "text"
      },
      # You should change the primary key definition to meet the needs of your data.
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        # This column will store vector embeddings.
        # The Mistral AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "mistral",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": "text"
      },
      # You should change the primary key definition to meet the needs of your data.
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The only available model is mistral-embed.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        # This column will store vector embeddings.
        # The NVIDIA integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "service": {
            "provider": "nvidia",
            "modelName": "NV-Embed-QA"
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": "text"
      },
      # You should change the primary key definition to meet the needs of your data.
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
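}'

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

OpenAI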

For more detailed instructions, see Integrate OpenAI as an embedding provider.

curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        # This column will store vector embeddings.
        # The OpenAI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "openai",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            },
            "parameters": {
              "organizationId": "ORGANIZATION_ID",
              "projectId": "PROJECT_ID"
            }
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": "text"
      },
      # You should change the primary key definition to meet the needs of your data.
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

  • ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.

  • PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
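Because ORGANIZATION_ID and PROJECT_ID are optional, the service object can be assembled so that these keys appear only when you supply them. The following is a minimal sketch. The function name and sample values are hypothetical, and it assumes that the parameters object can be left out entirely when neither optional value is set.

```python
def openai_service_options(
    model_name, api_key_name, organization_id=None, project_id=None
):
    """Build the service object for an OpenAI vector column."""
    service = {
        "provider": "openai",
        "modelName": model_name,
        "authentication": {"providerKey": api_key_name},
    }
    # Include only the optional parameters that were provided
    parameters = {}
    if organization_id:
        parameters["organizationId"] = organization_id
    if project_id:
        parameters["projectId"] = project_id
    if parameters:
        service["parameters"] = parameters
    return service


# Hypothetical sample values, for illustration only
print(openai_service_options("text-embedding-3-small", "my_openai_key"))
print(
    openai_service_options(
        "text-embedding-3-small", "my_openai_key", project_id="proj_example"
    )
)
```

The first call produces a service object with no parameters key. The second call includes only projectId.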

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        # This column will store vector embeddings.
        # The Upstage integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "upstageAI",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": "text"
      },
      # You should change the primary key definition to meet the needs of your data.
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The only available model is solar-embedding-1-large.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        # This column will store vector embeddings.
        # The Voyage AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "voyageAI",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": "text"
      },
      # You should change the primary key definition to meet the needs of your data.
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • INDEX_NAME: The name for the index.

  • SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.

  • API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Client reference

  • Python: For more information, see the client reference.

  • TypeScript: For more information, see the client reference.

  • Java: For more information, see the client reference.

  • curl: Client reference documentation is not applicable for HTTP.


© 2025 DataStax | Privacy policy | Terms of use | Manage Privacy Choices
