Create a table

Tables with the Data API are currently in public preview. Development is ongoing, and the features and functionality are subject to change. Astra DB Serverless, and the use of such, is subject to the DataStax Preview Terms.

Creates a new table in a keyspace in a database.

After you create a table, index the columns that you want to sort or filter on. Indexing optimizes your queries and avoids resource-intensive, long-running allow filtering operations. All indexed column names must use snake case, not camel case.
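As a rough client-side pre-check for the naming rule above, a sketch like the following can flag camel-case column names before you try to index them. The helper name `is_snake_case` and the exact pattern (lowercase letters and digits joined by single underscores, starting with a letter) are assumptions made for illustration. The Data API itself is the authority on which names it accepts.

```python
import re

# Assumed definition of snake case: lowercase words made of letters and
# digits, joined by single underscores, starting with a letter.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_snake_case(column_name: str) -> bool:
    """Return True if column_name matches the assumed snake case pattern."""
    return _SNAKE_CASE.fullmatch(column_name) is not None


print(is_snake_case("number_of_pages"))  # True
print(is_snake_case("numberOfPages"))    # False
```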

You can also modify the table columns later. To add data to your table, insert rows.

Ready to write code? See the examples for this method to get started. If you are new to the Data API, check out the quickstart.

Result

Python
TypeScript
Java
curl

Creates a table with the specified parameters.

Returns a Table object. You can use this object to work with rows in the table.

Unless you specify the row_type parameter, the table is typed as Table[dict].

For more information, see Typing support.

Creates a table with the specified parameters.

Returns a promise that resolves to a Table<Schema, PKey> object. You can use this object to work with rows in the table.

Unless you specify the Schema, the table is typed as Table<Record<string, any>>.

Creates a table with the specified parameters.

Returns a Table<T> object. You can use this object to work with rows in the table.

Unless you specify the rowClass parameter, the table is typed as Table<Row>.

Creates a table with the specified parameters.

If the command succeeds, the response indicates success.

Example response:

{
  "status": {
    "ok": 1
  }
}

Parameters

When you create a table, you specify the following:

Table name
Column names and data types
The primary key, which is the unique identifier for the rows in the table
Additional table, command, or client-specific settings, which can be optional

Python
TypeScript
Java
curl

Use the create_table method, which belongs to the astrapy.Database class.

Method signature

create_table(
  name: str,
  *,
  definition: CreateTableDefinition | dict[str, Any],
  row_type: type[Any],
  keyspace: str,
  if_not_exists: bool,
  table_admin_timeout_ms: int,
  request_timeout_ms: int,
  timeout_ms: int,
  embedding_api_key: str | EmbeddingHeadersProvider,
  spawn_api_options: APIOptions,
) -> Table[ROW]

Name Type Summary

name

str

The name of the table.

Table names must follow these rules:

Can contain letters, numbers, and underscores
Cannot exceed 48 characters
Must be unique within the keyspace
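The first two naming rules can be checked locally before calling create_table. The following is a minimal sketch, not part of the client: the helper name `looks_like_valid_table_name` is hypothetical, "letters" is assumed to mean ASCII letters, and rejecting the empty string is also an assumption. Uniqueness within the keyspace can only be verified against the database, so it is not covered here.

```python
import re

# Letters, numbers, and underscores only, with at most 48 characters.
# Assumptions: ASCII letters only, and at least one character is required.
_TABLE_NAME = re.compile(r"[A-Za-z0-9_]{1,48}")


def looks_like_valid_table_name(name: str) -> bool:
    """Check the character and length rules for a table name."""
    return _TABLE_NAME.fullmatch(name) is not None


print(looks_like_valid_table_name("example_table"))  # True
print(looks_like_valid_table_name("example-table"))  # False
print(looks_like_valid_table_name("a" * 49))         # False
```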

definition

CreateTableDefinition | dict

A complete table definition for the table, including the column names, data types, other column settings, and the primary key. All indexed column names must use snake case, not camel case.

This can be an instance of CreateTableDefinition or an equivalent nested dictionary, in which case it is parsed into a CreateTableDefinition. For examples of both formats, see the examples for this command.

Some types require specific column definitions, particularly maps, lists, sets, and vector columns. For more information about all types, see Data types in tables.

row_type

type

This parameter acts as a formal specifier for the type checker. If omitted, the resulting Table is implicitly a Table[dict]. If provided, row_type must match the type hint specified in the assignment. For more information, see Typing support.
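To make the role of row_type more concrete, the sketch below defines a hypothetical `BookRow` TypedDict that covers a few scalar columns from the examples on this page. The create_table call appears only in a comment because it needs a live database. `BookRow` and its field types are assumptions, and the set, map, and date columns are left out because their Python types depend on the client's serialization settings.

```python
from typing import TypedDict


class BookRow(TypedDict, total=False):
    """Hypothetical row type for a few scalar columns of example_table."""

    title: str
    number_of_pages: int
    rating: float
    is_checked_out: bool


# With a real database, the row type would be passed like this (not run here):
# table: Table[BookRow] = database.create_table(
#     "example_table",
#     definition=table_definition,
#     row_type=BookRow,
# )

# A type checker can now validate rows against BookRow.
row: BookRow = {"title": "Wind with No Name", "number_of_pages": 193}
print(row["title"])
```

At runtime a TypedDict is an ordinary dict, so this only affects static type checking.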

keyspace

str | None

Optional. The keyspace where you want to create the table.

Default: The database’s working keyspace.

if_not_exists

bool | None

If True, the command doesn’t throw an error if a table with the given name already exists. In this case, the command silently does nothing and no actual table creation takes place on the database.

If False (default), an error occurs if a table with the specified name already exists in the database.

Setting if_not_exists to True doesn’t check the definition of any existing tables. This parameter checks table names only.

This means that the command succeeds if the given table name is already in use, even if the table definition is different.
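The name-only check is easy to overlook, so here is a toy in-memory model of the behavior described above. It is not the client or server implementation: `ToyKeyspace` and `TableAlreadyExists` are hypothetical names, and the model only tracks table names and definitions in a dictionary.

```python
class TableAlreadyExists(Exception):
    """Raised by the toy model when a table name is already in use."""


class ToyKeyspace:
    """Toy stand-in for a keyspace that maps table names to definitions."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, definition, if_not_exists=False):
        if name in self.tables:
            if if_not_exists:
                # Only the name is compared. The new definition is ignored
                # and the existing table is left unchanged.
                return self.tables[name]
            raise TableAlreadyExists(name)
        self.tables[name] = definition
        return definition


keyspace = ToyKeyspace()
first = {"columns": {"title": "text"}, "primaryKey": "title"}
second = {"columns": {"isbn": "text"}, "primaryKey": "isbn"}

keyspace.create_table("example_table", first)
# Succeeds silently even though the definitions differ.
keyspace.create_table("example_table", second, if_not_exists=True)
print(keyspace.tables["example_table"] == first)  # True
```

If your code depends on a specific schema, consider inspecting the existing table's definition rather than relying on if_not_exists alone.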

table_admin_timeout_ms

int | None

A timeout, in milliseconds, to impose on the underlying API request. If not provided, the corresponding Database defaults apply. This parameter is aliased as request_timeout_ms and timeout_ms for convenience.

embedding_api_key

str | EmbeddingHeadersProvider

Optional parameter for tables that have a vector column with a vectorize embedding provider integration. For more information, see Define a column to automatically generate vector embeddings.

As an alternative to Astra DB KMS authentication, use embedding_api_key to store one or more embedding provider API keys on the Table instance for vectorize header authentication. The client automatically passes the key as an X-embedding-api-key header with all table operations.

Most embedding provider integrations accept a plain string for header authentication. However, some vectorize providers and models require specialized subclasses of EmbeddingHeadersProvider, such as AWSEmbeddingHeadersProvider, for header authentication.

You can use this authentication method only if all affected columns use the same embedding provider.
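For a plain-string key, the header authentication described above amounts to adding one extra HTTP header to each request. The sketch below shows the resulting header set for a raw HTTP call. It does not reproduce the client's internals: `build_headers` is a hypothetical helper, and the Token and Content-Type headers are taken from the curl command signature on this page.

```python
from typing import Dict, Optional


def build_headers(
    application_token: str,
    embedding_api_key: Optional[str] = None,
) -> Dict[str, str]:
    """Build request headers, adding the embedding key only when provided."""
    headers = {
        "Token": application_token,
        "Content-Type": "application/json",
    }
    if embedding_api_key is not None:
        headers["X-embedding-api-key"] = embedding_api_key
    return headers


# Placeholder values, to be replaced with real credentials.
print(build_headers("APPLICATION_TOKEN", "EMBEDDING_API_KEY"))
```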

spawn_api_options

APIOptions

A complete or partial specification of the APIOptions to override the defaults inherited from the Database. This allows for nuanced table configuration. For example, if APIOptions is passed together with named timeout parameters, the latter take precedence in their respective settings.

Use the createTable method, which belongs to the Db class.

Method signature

async createTable<const Def extends CreateTableDefinition>(
  name: string,
  options: {
    definition: CreateTableDefinition,
    ifNotExists: boolean,
    embeddingApiKey?: string | EmbeddingHeadersProvider,
    logging?: DataAPILoggingConfig,
    serdes?: TableSerDesConfig,
    timeoutDefaults?: Partial<TimeoutDescriptor>,
    keyspace?: string,
  }
): Promise<Table<InferTableSchema<Def>, InferTablePrimaryKey<Def>>>

Parameters:

Name Type Summary

name

string

The name of the table.

Table names must follow these rules:

Can contain letters, numbers, and underscores
Cannot exceed 48 characters
Must be unique within the keyspace

options?

CreateTableOptions

The options for spawning the Table instance.

Options (CreateTableOptions<Schema>):

Name Type Summary

definition

CreateTableDefinition

A TypeScript object defining the table to create, including the following:

columns: An object defining the table’s columns as a series of key-value pairs where each key is a column name and each value is the column’s data type. Column names must be unique within a table. All indexed column names must use snake case, not camel case.

The Data API accepts column definitions in two formats:
```
"columns": {
  "COLUMN_NAME": "DATA_TYPE",
  "COLUMN_NAME": {
    "type": "DATA_TYPE"
  }
}
```
Data types are enums of supported data types, such as 'text', 'int', or 'boolean'.

For the 'map', 'list', and 'set' types, you must use the object format and provide additional options. For more information, see Map, list, and set types.

For the 'vector' type, you must use the object format and provide information about the stored vectors, such as dimension and service options. For more information, see Vector type.

primaryKey: The table’s primary key definition as a single string or an object containing partition keys. For more information, see Primary keys in tables.

ifNotExists

boolean

If true, the command doesn’t throw an error if a table with the given name already exists. In this case, the command silently does nothing and no actual table creation takes place on the database.

If false (default), an error occurs if a table with the specified name already exists in the database.

Setting ifNotExists to true doesn’t check the schema of any existing tables. This parameter checks table names only.

This means that the command succeeds if the given table name is already in use, even if the schema is different.

keyspace?

string

Optional. The keyspace where you want to create the table.

Default: The database’s working keyspace.

embeddingApiKey?

string | EmbeddingHeadersProvider

Optional parameter for tables that have a vector column with a vectorize embedding provider integration. For more information, see Define a column to automatically generate vector embeddings.

As an alternative to Astra DB KMS authentication, use embeddingApiKey to store an embedding provider API key on the Table instance for vectorize header authentication. The client automatically passes the key as an X-embedding-api-key header with operations that use vectorize.

You can use this authentication method only if all affected columns use the same embedding provider.

logging?

DataAPILoggingConfig

The configuration for logging events emitted by the DataAPIClient.

timeoutDefaults?

Partial<TimeoutDescriptor>

The default timeout options for any operation performed on this Table instance. For more information, see TimeoutDescriptor.

serdes?

TableSerDesConfig

Lower-level serialization/deserialization configuration for this table. For more information, see Custom Ser/Des.

Use the createTable method, which belongs to the com.datastax.astra.client.databases.Database class.

Method signature

<T> Table<T> createTable(
  String tableName,
  TableDefinition tableDefinition,
  Class<T> rowClass,
  CreateTableOptions createTableOptions
)

<T> Table<T> createTable(
  String tableName,
  TableDefinition tableDefinition,
  Class<T> rowClass
)

<T> Table<T> createTable(Class<T> rowClass)

<T> Table<T> createTable(
  Class<T> rowClass,
  CreateTableOptions createTableOptions
)

<T> Table<T> createTable(
  String tableName,
  Class<T> rowClass,
  CreateTableOptions createTableOptions
)

Table<Row> createTable(
  String tableName,
  TableDefinition tableDefinition,
  CreateTableOptions options
)

Table<Row> createTable(
  String tableName,
  TableDefinition tableDefinition
)

Name Type Summary

tableName

String

The name of the table.

Table names must follow these rules:

Can contain letters, numbers, and underscores
Cannot exceed 48 characters
Must be unique within the keyspace

tableDefinition

TableDefinition

A complete table definition for the table, including the column names, data types, other column settings, and the primary key. All indexed column names must use snake case, not camel case.

Some types require specific column definitions, particularly maps, lists, sets, and vector columns. For more information about all types, see Data types in tables.

rowClass

Class<?>

An optional specification of the class of the table’s row object. If not provided, the default is Row, which is close to a Map object.

createTableOptions

CreateTableOptions

Options and additional parameters for the createTable operation, such as ifNotExists, timeout, and embeddingAuthProvider:

ifNotExists(): If true, the command doesn’t throw an error if a table with the given name already exists. In this case, the command silently does nothing and no actual table creation takes place on the database.

If false (default), an error occurs if a table with the specified name already exists in the database.

Calling ifNotExists(true) doesn’t check the definition of any existing tables. This parameter checks table names only.

This means that the command succeeds if the given table name is already in use, even if the table definition is different.

timeout(): A timeout, in milliseconds, to impose on the underlying API request.

embeddingAuthProvider(): Optional parameter for tables that have a vector column with a vectorize embedding provider integration. For more information, see Define a column to automatically generate vector embeddings.

As an alternative to Astra DB KMS authentication, use embeddingAuthProvider to store an embedding provider API key on the Table instance for vectorize header authentication. The client automatically passes the key as an X-embedding-api-key header with operations that use vectorize.

Most embedding provider integrations accept a plain string for header authentication. However, some vectorize providers and models require specialized subclasses of EmbeddingHeadersProvider for header authentication.

You can use this authentication method only if all affected columns use the same embedding provider.

Use the createTable command.

Command signature

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        "COLUMN_NAME": "DATA_TYPE",
        "COLUMN_NAME": "DATA_TYPE"
      },
      "primaryKey": "PRIMARY_KEY_DEFINITION"
    }
  }
}'
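The request body in the command signature is plain JSON, so it can also be assembled programmatically. The following sketch builds the same structure with the standard library and does not send it. `build_create_table_payload` is a hypothetical helper, and the column names are borrowed from the examples on this page.

```python
import json
from typing import Any, Dict, Union


def build_create_table_payload(
    name: str,
    columns: Dict[str, Any],
    primary_key: Union[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Wrap a table name, columns, and primary key in a createTable command."""
    return {
        "createTable": {
            "name": name,
            "definition": {
                "columns": columns,
                "primaryKey": primary_key,
            },
        }
    }


payload = build_create_table_payload(
    "example_table",
    {
        # Short format: the value is the data type.
        "title": "text",
        # Object format: the data type is given under "type".
        "number_of_pages": {"type": "int"},
        "genres": {"type": "set", "valueType": "text"},
    },
    "title",
)

# This string is what would be passed to curl's --data option.
body = json.dumps(payload, indent=2)
print(body)
```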

Name Type Summary

createTable

command

The Data API command to create a table in a database. It acts as a container for all the attributes and settings required to create the table.

name

string

The name of the table.

Table names must follow these rules:

Can contain letters, numbers, and underscores
Cannot exceed 48 characters
Must be unique within the keyspace

definition

object

Contains the columns and primary key definition for the table.

definition.columns

object

Defines the table’s columns as a series of key-value pairs where each key is a column name and each value is the column’s data type. Column names must be unique within a table. All indexed column names must use snake case, not camel case.

The Data API accepts column definitions in two formats:

"columns": {
  "COLUMN_NAME": "DATA_TYPE",
  "COLUMN_NAME": {
    "type": "DATA_TYPE"
  }
}

Data types are enums of supported data types, such as "text", "int", or "boolean".

For the map, list, and set types, you must use the object format and provide additional options. For more information, see Map, list, and set types.

For the vector type, you must use the object format and provide information about the stored vectors, such as dimension and service options. For more information, see Vector type.

definition.primaryKey

string or object

Defines the primary key for the table. For more information, see Primary keys in tables.
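The examples on this page write the same single-column primary key in both forms: as the string "title" in the curl example and as an object with partitionBy and partitionSort in the Python dictionary example. Based on that, the sketch below converts either form into the object form. `normalize_primary_key` is a hypothetical helper, not something the API or clients provide, and treating a missing partitionSort as an empty object is an assumption.

```python
from typing import Any, Dict, Union


def normalize_primary_key(
    primary_key: Union[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the object form of a primary key definition."""
    if isinstance(primary_key, str):
        # A single string names the only partition column.
        return {"partitionBy": [primary_key], "partitionSort": {}}
    return {
        "partitionBy": list(primary_key.get("partitionBy", [])),
        # Assumption: a missing partitionSort means no sort columns.
        "partitionSort": dict(primary_key.get("partitionSort", {})),
    }


print(normalize_primary_key("title"))
# {'partitionBy': ['title'], 'partitionSort': {}}
print(normalize_primary_key({"partitionBy": ["title", "rating"]}))
# {'partitionBy': ['title', 'rating'], 'partitionSort': {}}
```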

Examples

The following examples demonstrate how to create a table.

Create a table with a single-column primary key

A single-column primary key is a primary key consisting of one column. For more information, see Primary keys in tables.

Python
TypeScript
Java
curl

The Python client supports multiple ways to create a table. In all cases, you must define the table schema, and then pass the definition to the create_table method.

CreateTableDefinition object
Fluent interface
Dictionary

You can define the table as a CreateTableDefinition and then build the table from the CreateTableDefinition object.

from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableKeyValuedColumnType,
    TableKeyValuedColumnTypeDescriptor,
    TableScalarColumnTypeDescriptor,
    TableValuedColumnTypeDescriptor,
    TableValuedColumnType,
    TablePrimaryKeyDescriptor,
)

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = CreateTableDefinition(
    # Define all of the columns in the table
    columns={
        "title": TableScalarColumnTypeDescriptor(column_type=ColumnType.TEXT),
        "number_of_pages": TableScalarColumnTypeDescriptor(column_type=ColumnType.INT),
        "rating": TableScalarColumnTypeDescriptor(column_type=ColumnType.FLOAT),
        "genres": TableValuedColumnTypeDescriptor(
            column_type=TableValuedColumnType.SET,
            value_type=ColumnType.TEXT,
        ),
        "metadata": TableKeyValuedColumnTypeDescriptor(
            column_type=TableKeyValuedColumnType.MAP,
            key_type=ColumnType.TEXT,
            value_type=ColumnType.TEXT,
        ),
        "is_checked_out": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.BOOLEAN
        ),
        "due_date": TableScalarColumnTypeDescriptor(column_type=ColumnType.DATE),
    },
    # Define the primary key for the table.
    # In this case, the table uses a single-column primary key.
    primary_key=TablePrimaryKeyDescriptor(partition_by=["title"], partition_sort={}),
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can use a fluent interface to build the table definition and then create the table from the definition.

from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = (
    CreateTableDefinition.builder()
    # Define all of the columns in the table
    .add_column("title", ColumnType.TEXT)
    .add_column("number_of_pages", ColumnType.INT)
    .add_column("rating", ColumnType.FLOAT)
    .add_set_column(
        "genres",
        ColumnType.TEXT,
    )
    .add_map_column(
        "metadata",
        # This is the key type for the map column
        ColumnType.TEXT,
        # This is the value type for the map column
        ColumnType.TEXT,
    )
    .add_column("is_checked_out", ColumnType.BOOLEAN)
    .add_column("due_date", ColumnType.DATE)
    # Define the primary key for the table.
    # In this case, the table uses a single-column primary key.
    .add_partition_by(["title"])
    # Finally, build the table definition.
    .build()
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can define the table as a dictionary and then build the table from the dictionary.

from astrapy import DataAPIClient

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        "title": {"type": "text"},
        "number_of_pages": {"type": "int"},
        "rating": {"type": "float"},
        "genres": {"type": "set", "valueType": "text"},
        "metadata": {"type": "map", "keyType": "text", "valueType": "text"},
        "is_checked_out": {"type": "boolean"},
        "due_date": {"type": "date"},
    },
    "primaryKey": {
        "partitionBy": ["title"],
        "partitionSort": {},
    },
}

table = database.create_table(
    "example_table",
    definition=table_definition,
)

The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the ser/des configuration.

For more information, see Collection and table typing.

Automatic type inference
Manually typed tables
Untyped tables

The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.

To do this, first create the table definition. Then, use InferTableSchema and InferTablePrimaryKey to infer the type of the table and of the primary key. To create the table, provide the table definition and the inferred types to the createTable method.

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a single-column primary key.
  primaryKey: {
    partitionBy: ["title"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  // Provide the types and the definition
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can use the TableSchema type as you would any other type. For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:

  const row: TableSchema = {
    title: "Wind with No Name",
    number_of_pages: 193,
    bad_field: "I will error",
  };

You can manually define the type for your table’s schema and primary key. To create the table, provide the table definition and the types to the createTable method.

This may be necessary if you modify the table’s default ser/des configuration.

import { DataAPIClient, DataAPIDate, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a single-column primary key.
  primaryKey: {
    partitionBy: ["title"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  title: string;
  number_of_pages?: number | null | undefined;
  rating?: number | null | undefined;
  genres?: Set<string> | undefined;
  metadata?: Map<string, string> | undefined;
  is_checked_out?: boolean | null | undefined;
  due_date?: DataAPIDate | null | undefined;
};

type TablePrimaryKey = Pick<TableSchema, "title">;

(async function () {
  // Provide the types and the definition to create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can use the TableSchema type as you would any other type. For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:

  const row: TableSchema = {
    title: "Wind with No Name",
    number_of_pages: 193,
    bad_field: "I will error",
  };

To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method. This types the table’s rows as Record<string, any>.

This is the most flexible but least type-safe option.

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a single-column primary key.
  primaryKey: {
    partitionBy: ["title"],
  },
});

(async function () {
  // Pass SomeRow as the only type parameter, along with the definition, to create an untyped table
  const table = await database.createTable<SomeRow>("example_table", {
    definition: tableDefinition,
  });
})();

The Java client supports multiple ways to create a table. In all cases, you must define the table schema.

Use a generic type
Define the row type

If you don’t specify the Class parameter when creating an instance of the generic class Table<T>, the client defaults to Table<Row> as the type. In this case, the working object type T is Row.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.definition.rows.Row;

public class Example {
  public static void main(String[] args) {
    // Get an existing database
    Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");

    TableDefinition tableDefinition =
        new TableDefinition()
            // Define all of the columns in the table
            .addColumnText("title")
            .addColumnInt("number_of_pages")
            .addColumn("rating", TableColumnTypes.FLOAT)
            .addColumnSet("genres", TableColumnTypes.TEXT)
            .addColumnMap("metadata", TableColumnTypes.TEXT, TableColumnTypes.TEXT)
            .addColumnBoolean("is_checked_out")
            .addColumn("due_date", TableColumnTypes.DATE)
            // Define the primary key for the table.
            // In this case, the table uses a single-column primary key.
            .addPartitionBy("title");

    Table<Row> table = database.createTable("example_table", tableDefinition);
  }
}

Instead of using the default type Row.class, you can define your own working object, which will be serialized as a Row.

This working object can be annotated when the field names do not exactly match the column names or when you want to fully describe your table to enable its creation solely from the entity definition.

The following example defines a Book class and then uses it to create the table.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import java.util.Date;
import java.util.Map;
import java.util.Set;
import lombok.Data;

public class Example {
  @EntityTable("example_table")
  @Data
  // Declared static so the client can instantiate it without an enclosing Example instance
  public static class Book {
    @PartitionBy(0)
    @Column(name = "title", type = TableColumnTypes.TEXT)
    private String title;

    @Column(name = "number_of_pages", type = TableColumnTypes.INT)
    private Integer number_of_pages;

    @Column(name = "rating", type = TableColumnTypes.FLOAT)
    private Float rating;

    @Column(name = "genres", type = TableColumnTypes.SET, valueType = TableColumnTypes.TEXT)
    private Set<String> genres;

    @Column(
        name = "metadata",
        type = TableColumnTypes.MAP,
        keyType = TableColumnTypes.TEXT,
        valueType = TableColumnTypes.TEXT)
    private Map<String, String> metadata;

    @Column(name = "is_checked_out", type = TableColumnTypes.BOOLEAN)
    private Boolean is_checked_out;

    @Column(name = "due_date", type = TableColumnTypes.DATE)
    private Date due_date;
  }

  public static void main(String[] args) {
    // Get an existing database
    Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");

    Table<Book> table = database.createTable(Book.class);
  }
}

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "example_table",
    "definition": {
      "columns": {
        "title": {
          "type": "text"
        },
        "number_of_pages": {
          "type": "int"
        },
        "rating": {
          "type": "float"
        },
        "metadata": {
          "type": "map",
          "keyType": "text",
          "valueType": "text"
        },
        "genres": {
          "type": "set",
          "valueType": "text"
        },
        "is_checked_out": {
          "type": "boolean"
        },
        "due_date": {
          "type": "date"
        }
      },
      "primaryKey": "title"
    }
  }
}'

Create a table with a composite primary key

A composite primary key is a primary key consisting of multiple columns. For more information, see Primary keys in tables.

Python
TypeScript
Java
curl

The Python client supports multiple ways to create a table. In all cases, you must define the table schema, and then pass the definition to the create_table method.

CreateTableDefinition object
Fluent interface
Dictionary

You can define the table as a CreateTableDefinition and then build the table from the CreateTableDefinition object.

from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableKeyValuedColumnType,
    TableKeyValuedColumnTypeDescriptor,
    TableScalarColumnTypeDescriptor,
    TableValuedColumnTypeDescriptor,
    TableValuedColumnType,
    TablePrimaryKeyDescriptor,
)

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = CreateTableDefinition(
    # Define all of the columns in the table
    columns={
        "title": TableScalarColumnTypeDescriptor(column_type=ColumnType.TEXT),
        "number_of_pages": TableScalarColumnTypeDescriptor(column_type=ColumnType.INT),
        "rating": TableScalarColumnTypeDescriptor(column_type=ColumnType.FLOAT),
        "genres": TableValuedColumnTypeDescriptor(
            column_type=TableValuedColumnType.SET,
            value_type=ColumnType.TEXT,
        ),
        "metadata": TableKeyValuedColumnTypeDescriptor(
            column_type=TableKeyValuedColumnType.MAP,
            key_type=ColumnType.TEXT,
            value_type=ColumnType.TEXT,
        ),
        "is_checked_out": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.BOOLEAN
        ),
        "due_date": TableScalarColumnTypeDescriptor(column_type=ColumnType.DATE),
    },
    # Define the primary key for the table.
    # In this case, the table uses a composite primary key.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["title", "rating"], partition_sort={}
    ),
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can use a fluent interface to build the table definition and then create the table from the definition.

from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = (
    CreateTableDefinition.builder()
    # Define all of the columns in the table
    .add_column("title", ColumnType.TEXT)
    .add_column("number_of_pages", ColumnType.INT)
    .add_column("rating", ColumnType.FLOAT)
    .add_set_column(
        "genres",
        ColumnType.TEXT,
    )
    .add_map_column(
        "metadata",
        # This is the key type for the map column
        ColumnType.TEXT,
        # This is the value type for the map column
        ColumnType.TEXT,
    )
    .add_column("is_checked_out", ColumnType.BOOLEAN)
    .add_column("due_date", ColumnType.DATE)
    # Define the primary key for the table.
    # In this case, the table uses a composite primary key.
    .add_partition_by(["title", "rating"])
    # Finally, build the table definition.
    .build()
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can define the table as a dictionary and then pass the dictionary to create_table to create the table.

from astrapy import DataAPIClient

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        "title": {"type": "text"},
        "number_of_pages": {"type": "int"},
        "rating": {"type": "float"},
        "genres": {"type": "set", "valueType": "text"},
        "metadata": {"type": "map", "keyType": "text", "valueType": "text"},
        "is_checked_out": {"type": "boolean"},
        "due_date": {"type": "date"},
    },
    "primaryKey": {
        "partitionBy": ["title", "rating"],
        "partitionSort": {},
    },
}

table = database.create_table(
    "example_table",
    definition=table_definition,
)
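
Every column named in primaryKey must also be declared under columns. When the definition is a plain dictionary, a quick local check can catch a misspelled column name before the request is sent. The following is a minimal sketch, assuming the dictionary layout shown above. The missing_key_columns function is a hypothetical helper for illustration and is not part of astrapy.

```python
def missing_key_columns(definition: dict) -> list:
    """Return the primary key column names that are not declared in "columns"."""
    columns = definition.get("columns", {})
    primary_key = definition.get("primaryKey", {})
    # Partition columns come first, followed by any sort (clustering) columns
    key_columns = list(primary_key.get("partitionBy", []))
    key_columns += list(primary_key.get("partitionSort", {}))
    return [name for name in key_columns if name not in columns]


table_definition = {
    "columns": {
        "title": {"type": "text"},
        "rating": {"type": "float"},
    },
    "primaryKey": {
        "partitionBy": ["title", "rating"],
        "partitionSort": {},
    },
}

missing = missing_key_columns(table_definition)
if missing:
    raise ValueError(f"Primary key columns not defined: {missing}")
```

If the check passes, you can pass table_definition to create_table as in the previous example.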

The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the serialization/deserialization (ser/des) configuration.

For more information, see Collection and table typing.

Automatic type inference
Manually typed tables
Untyped tables

The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a composite primary key.
  primaryKey: {
    partitionBy: ["title", "rating"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  // Provide the types and the definition
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can use the TableSchema type as you would any other type. For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:

  const row: TableSchema = {
    title: "Wind with No Name",
    number_of_pages: 193,
    bad_field: "I will error",
  };

You can manually define the type for your table’s schema and primary key. To create the table, provide the table definition and the types to the createTable method.

This may be necessary if you modify the table’s default ser/des configuration.

import { DataAPIClient, DataAPIDate, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a composite primary key.
  primaryKey: {
    partitionBy: ["title", "rating"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  title: string;
  number_of_pages?: number | null | undefined;
  rating: number; // Primary key columns are required and cannot be null
  genres?: Set<string> | undefined;
  metadata?: Map<string, string> | undefined;
  is_checked_out?: boolean | null | undefined;
  due_date?: DataAPIDate | null | undefined;
};

type TablePrimaryKey = Pick<TableSchema, "title" | "rating">;

(async function () {
  // Provide the types and the definition to create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can use the TableSchema type as you would any other type. For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:

  const row: TableSchema = {
    title: "Wind with No Name",
    number_of_pages: 193,
    bad_field: "I will error",
  };

To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method. This types the table’s rows as Record<string, any>.

This is the most flexible but least type-safe option.

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a composite primary key.
  primaryKey: {
    partitionBy: ["title", "rating"],
  },
});

(async function () {
  // Pass SomeRow as the type and provide the definition to create the table
  const table = await database.createTable<SomeRow>("example_table", {
    definition: tableDefinition,
  });
})();

The Java client supports multiple ways to create a table. In all cases, you must define the table schema.

Use a generic type
Define the row type

If you don’t specify the rowClass parameter when you create the table, the client defaults to Table<Row> as the type. In this case, the working object type T is Row.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.definition.rows.Row;

public class Example {
  public static void main(String[] args) {
    // Get an existing database
    Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");

    TableDefinition tableDefinition =
        new TableDefinition()
            // Define all of the columns in the table
            .addColumnText("title")
            .addColumnInt("number_of_pages")
            .addColumn("rating", TableColumnTypes.FLOAT)
            .addColumnSet("genres", TableColumnTypes.TEXT)
            .addColumnMap("metadata", TableColumnTypes.TEXT, TableColumnTypes.TEXT)
            .addColumnBoolean("is_checked_out")
            .addColumn("due_date", TableColumnTypes.DATE)
            // Define the primary key for the table.
            // In this case, the table uses a composite primary key.
            .addPartitionBy("title")
            .addPartitionBy("rating");

    Table<Row> table = database.createTable("example_table", tableDefinition);
  }
}

Instead of using the default type Row.class, you can define your own working object, which will be serialized as a Row.

This working object can be annotated when the field names do not exactly match the column names or when you want to fully describe your table to enable its creation solely from the entity definition.

The following example defines a Book class and then uses it to create the table.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import java.util.Date;
import java.util.Map;
import java.util.Set;
import lombok.Data;

public class Example {
  @EntityTable("example_table")
  @Data
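  // Declared static so that the class can be instantiated without an enclosing Example instance
  public static class Book {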
    @PartitionBy(0)
    @Column(name = "title", type = TableColumnTypes.TEXT)
    private String title;

    @Column(name = "number_of_pages", type = TableColumnTypes.INT)
    private Integer number_of_pages;

    @PartitionBy(1)
    @Column(name = "rating", type = TableColumnTypes.FLOAT)
    private Float rating;

    @Column(name = "genres", type = TableColumnTypes.SET, valueType = TableColumnTypes.TEXT)
    private Set<String> genres;

    @Column(
        name = "metadata",
        type = TableColumnTypes.MAP,
        keyType = TableColumnTypes.TEXT,
        valueType = TableColumnTypes.TEXT)
    private Map<String, String> metadata;

    @Column(name = "is_checked_out", type = TableColumnTypes.BOOLEAN)
    private Boolean is_checked_out;

    @Column(name = "due_date", type = TableColumnTypes.DATE)
    private Date due_date;
  }

  public static void main(String[] args) {
    // Get an existing database
    Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");

    Table<Book> table = database.createTable(Book.class);
  }
}

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "example_table",
    "definition": {
      "columns": {
        "title": {
          "type": "text"
        },
        "number_of_pages": {
          "type": "int"
        },
        "rating": {
          "type": "float"
        },
        "metadata": {
          "type": "map",
          "keyType": "text",
          "valueType": "text"
        },
        "genres": {
          "type": "set",
          "valueType": "text"
        },
        "is_checked_out": {
          "type": "boolean"
        },
        "due_date": {
          "type": "date"
        }
      },
      "primaryKey": {
        "partitionBy": [
          "title", "rating"
        ]
      }
    }
  }
}'
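
If you generate the request body in a script instead of writing it by hand, you can assemble the same JSON with the standard library. The following is a minimal sketch that only builds and prints the payload; it does not send a request. The build_create_table_payload function is a hypothetical helper, and the key names are taken from the curl example above.

```python
import json


def build_create_table_payload(name, columns, partition_by, partition_sort=None):
    """Build the body of a createTable command as a Python dictionary."""
    primary_key = {"partitionBy": list(partition_by)}
    # Only include partitionSort when sort (clustering) columns are given
    if partition_sort:
        primary_key["partitionSort"] = dict(partition_sort)
    return {
        "createTable": {
            "name": name,
            "definition": {
                "columns": columns,
                "primaryKey": primary_key,
            },
        }
    }


payload = build_create_table_payload(
    "example_table",
    columns={
        "title": {"type": "text"},
        "rating": {"type": "float"},
        "genres": {"type": "set", "valueType": "text"},
    },
    partition_by=["title", "rating"],
)

# Serialize the payload for use as the --data argument
print(json.dumps(payload, indent=2))
```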

Create a table with a compound primary key

A compound primary key is a primary key consisting of partition (grouping) columns and clustering (sorting) columns. For more information, see Primary keys in tables.

Python
TypeScript
Java
curl

The Python client supports multiple ways to create a table. In all cases, you must define the table schema, and then pass the definition to the create_table method.

CreateTableDefinition object
Fluent interface
Dictionary

You can define the table as a CreateTableDefinition object and then pass that object to create_table to create the table.

from astrapy import DataAPIClient
from astrapy.constants import SortMode
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableKeyValuedColumnType,
    TableKeyValuedColumnTypeDescriptor,
    TableScalarColumnTypeDescriptor,
    TableValuedColumnTypeDescriptor,
    TableValuedColumnType,
    TablePrimaryKeyDescriptor,
)

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = CreateTableDefinition(
    # Define all of the columns in the table
    columns={
        "title": TableScalarColumnTypeDescriptor(column_type=ColumnType.TEXT),
        "number_of_pages": TableScalarColumnTypeDescriptor(column_type=ColumnType.INT),
        "rating": TableScalarColumnTypeDescriptor(column_type=ColumnType.FLOAT),
        "genres": TableValuedColumnTypeDescriptor(
            column_type=TableValuedColumnType.SET,
            value_type=ColumnType.TEXT,
        ),
        "metadata": TableKeyValuedColumnTypeDescriptor(
            column_type=TableKeyValuedColumnType.MAP,
            key_type=ColumnType.TEXT,
            value_type=ColumnType.TEXT,
        ),
        "is_checked_out": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.BOOLEAN
        ),
        "due_date": TableScalarColumnTypeDescriptor(column_type=ColumnType.DATE),
    },
    # Define the primary key for the table.
    # In this case, the table uses a compound primary key.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["title", "rating"],
        partition_sort={
            "number_of_pages": SortMode.ASCENDING,
            "is_checked_out": SortMode.DESCENDING,
        },
    ),
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can use a fluent interface to build the table definition and then create the table from the definition.

from astrapy import DataAPIClient
from astrapy.constants import SortMode
from astrapy.info import CreateTableDefinition, ColumnType

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = (
    CreateTableDefinition.builder()
    # Define all of the columns in the table
    .add_column("title", ColumnType.TEXT)
    .add_column("number_of_pages", ColumnType.INT)
    .add_column("rating", ColumnType.FLOAT)
    .add_set_column(
        "genres",
        ColumnType.TEXT,
    )
    .add_map_column(
        "metadata",
        # This is the key type for the map column
        ColumnType.TEXT,
        # This is the value type for the map column
        ColumnType.TEXT,
    )
    .add_column("is_checked_out", ColumnType.BOOLEAN)
    .add_column("due_date", ColumnType.DATE)
    # Define the primary key for the table.
    # In this case, the table uses a compound primary key.
    .add_partition_by(["title", "rating"])
    .add_partition_sort(
        {
            "number_of_pages": SortMode.ASCENDING,
            "is_checked_out": SortMode.DESCENDING,
        }
    )
    # Finally, build the table definition.
    .build()
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can define the table as a dictionary and then pass the dictionary to create_table to create the table.

from astrapy import DataAPIClient

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        "title": {"type": "text"},
        "number_of_pages": {"type": "int"},
        "rating": {"type": "float"},
        "genres": {"type": "set", "valueType": "text"},
        "metadata": {"type": "map", "keyType": "text", "valueType": "text"},
        "is_checked_out": {"type": "boolean"},
        "due_date": {"type": "date"},
    },
    "primaryKey": {
        "partitionBy": ["title", "rating"],
        "partitionSort": {"number_of_pages": 1, "is_checked_out": -1},
    },
}

table = database.create_table(
    "example_table",
    definition=table_definition,
)
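
In the dictionary form, partitionSort maps each clustering column to 1 (ascending) or -1 (descending), and the columns apply in the order listed. To make that ordering concrete, the following sketch sorts a few in-memory rows from one partition the same way. It is only a local illustration under that assumption, not a description of how the database stores rows. The clustering_order function and the sample rows are hypothetical.

```python
def clustering_order(rows, partition_sort):
    """Sort rows by the partitionSort columns: 1 is ascending, -1 is descending."""
    ordered = list(rows)
    # Python's sort is stable, so sorting by the last column first and the
    # first column last yields the combined multi-column order.
    for column, direction in reversed(list(partition_sort.items())):
        ordered.sort(key=lambda row: row[column], reverse=(direction == -1))
    return ordered


partition_sort = {"number_of_pages": 1, "is_checked_out": -1}

# Rows that share the same partition key (title and rating)
rows = [
    {"title": "Example", "rating": 4.5, "number_of_pages": 300, "is_checked_out": False},
    {"title": "Example", "rating": 4.5, "number_of_pages": 120, "is_checked_out": True},
    {"title": "Example", "rating": 4.5, "number_of_pages": 120, "is_checked_out": False},
]

for row in clustering_order(rows, partition_sort):
    print(row["number_of_pages"], row["is_checked_out"])
```

The rows with 120 pages come first, and within that group the checked-out row precedes the one that is not checked out.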

The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the ser/des configuration.

For more information, see Collection and table typing.

Automatic type inference
Manually typed tables
Untyped tables

The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a compound primary key.
  primaryKey: {
    partitionBy: ["title", "rating"],
    partitionSort: { number_of_pages: 1, is_checked_out: -1 },
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  // Provide the types and the definition
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can use the TableSchema type as you would any other type. For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:

  const row: TableSchema = {
    title: "Wind with No Name",
    number_of_pages: 193,
    bad_field: "I will error",
  };

You can manually define the type for your table’s schema and primary key. To create the table, provide the table definition and the types to the createTable method.

This may be necessary if you modify the table’s default ser/des configuration.

import { DataAPIClient, DataAPIDate, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a compound primary key.
  primaryKey: {
    partitionBy: ["title", "rating"],
    partitionSort: { number_of_pages: 1, is_checked_out: -1 },
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  title: string;
  // Primary key columns are required and cannot be null
  number_of_pages: number;
  rating: number;
  genres?: Set<string> | undefined;
  metadata?: Map<string, string> | undefined;
  is_checked_out: boolean;
  due_date?: DataAPIDate | null | undefined;
};

// The primary key includes the partitionBy columns and the partitionSort columns
type TablePrimaryKey = Pick<
  TableSchema,
  "title" | "rating" | "number_of_pages" | "is_checked_out"
>;

(async function () {
  // Provide the types and the definition to create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can use the TableSchema type as you would any other type. For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:

  const row: TableSchema = {
    title: "Wind with No Name",
    number_of_pages: 193,
    bad_field: "I will error",
  };

To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method. This types the table’s rows as Record<string, any>.

This is the most flexible but least type-safe option.

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    title: "text",
    number_of_pages: "int",
    rating: "float",
    genres: { type: "set", valueType: "text" },
    metadata: {
      type: "map",
      keyType: "text",
      valueType: "text",
    },
    is_checked_out: "boolean",
    due_date: "date",
  },
  // Define the primary key for the table.
  // In this case, the table uses a compound primary key.
  primaryKey: {
    partitionBy: ["title", "rating"],
    partitionSort: { number_of_pages: 1, is_checked_out: -1 },
  },
});

(async function () {
  // Pass SomeRow as the type and provide the definition to create the table
  const table = await database.createTable<SomeRow>("example_table", {
    definition: tableDefinition,
  });
})();

The Java client supports multiple ways to create a table. In all cases, you must define the table schema.

Use a generic type
Define the row type

If you don’t specify the rowClass parameter when you create the table, the client defaults to Table<Row> as the type. In this case, the working object type T is Row.

import static com.datastax.astra.client.core.query.Sort.ascending;
import static com.datastax.astra.client.core.query.Sort.descending;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.definition.rows.Row;

public class Example {
  public static void main(String[] args) {
    // Get an existing database
    Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");

    TableDefinition tableDefinition =
        new TableDefinition()
            // Define all of the columns in the table
            .addColumnText("title")
            .addColumnInt("number_of_pages")
            .addColumn("rating", TableColumnTypes.FLOAT)
            .addColumnSet("genres", TableColumnTypes.TEXT)
            .addColumnMap("metadata", TableColumnTypes.TEXT, TableColumnTypes.TEXT)
            .addColumnBoolean("is_checked_out")
            .addColumn("due_date", TableColumnTypes.DATE)
            // Define the primary key for the table.
            // In this case, the table uses a compound primary key.
            .addPartitionBy("title")
            .addPartitionBy("rating")
            .addPartitionSort(ascending("number_of_pages"))
            .addPartitionSort(descending("is_checked_out"));

    Table<Row> table = database.createTable("example_table", tableDefinition);
  }
}

Instead of using the default type Row.class, you can define your own working object, which will be serialized as a Row.

This working object can be annotated when the field names do not exactly match the column names or when you want to fully describe your table to enable its creation solely from the entity definition.

The following example defines a Book class and then uses it to create the table.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.query.SortOrder;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import com.datastax.astra.client.tables.mapping.PartitionSort;
import java.util.Date;
import java.util.Map;
import java.util.Set;
import lombok.Data;

public class Example {
  @EntityTable("example_table")
  @Data
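  // Declared static so that the class can be instantiated without an enclosing Example instance
  public static class Book {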
    @PartitionBy(0)
    @Column(name = "title", type = TableColumnTypes.TEXT)
    private String title;

    @PartitionSort(position = 0, order = SortOrder.ASCENDING)
    @Column(name = "number_of_pages", type = TableColumnTypes.INT)
    private Integer number_of_pages;

    @PartitionBy(1)
    @Column(name = "rating", type = TableColumnTypes.FLOAT)
    private Float rating;

    @Column(name = "genres", type = TableColumnTypes.SET, valueType = TableColumnTypes.TEXT)
    private Set<String> genres;

    @Column(
        name = "metadata",
        type = TableColumnTypes.MAP,
        keyType = TableColumnTypes.TEXT,
        valueType = TableColumnTypes.TEXT)
    private Map<String, String> metadata;

    @PartitionSort(position = 1, order = SortOrder.DESCENDING)
    @Column(name = "is_checked_out", type = TableColumnTypes.BOOLEAN)
    private Boolean is_checked_out;

    @Column(name = "due_date", type = TableColumnTypes.DATE)
    private Date due_date;
  }

  public static void main(String[] args) {
    // Get an existing database
    Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");

    Table<Book> table = database.createTable(Book.class);
  }
}

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "example_table",
    "definition": {
      "columns": {
        "title": {
          "type": "text"
        },
        "number_of_pages": {
          "type": "int"
        },
        "rating": {
          "type": "float"
        },
        "metadata": {
          "type": "map",
          "keyType": "text",
          "valueType": "text"
        },
        "genres": {
          "type": "set",
          "valueType": "text"
        },
        "is_checked_out": {
          "type": "boolean"
        },
        "due_date": {
          "type": "date"
        }
      },
      "primaryKey": {
        "partitionBy": [
          "title",
          "rating"
        ],
        "partitionSort": {
          "number_of_pages": 1,
          "is_checked_out": -1
        }
      }
    }
  }
}'

Create a table with a column to store vector embeddings

If you want to store pre-generated vector embeddings in a table, create a table with a vector column. A table can include more than one vector column.

Python
TypeScript
Java
curl

The Python client supports multiple ways to create a table. In all cases, you must define the table schema, and then pass the definition to the create_table method.

CreateTableDefinition object
Fluent interface
Dictionary

You can define the table as a CreateTableDefinition object and then pass that object to create_table to create the table.

from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
)

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = CreateTableDefinition(
    # Define all of the columns in the table
    columns={
        "example_vector": TableVectorColumnTypeDescriptor(
            dimension=1024,
        ),
        "example_non_vector": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # Define the primary key for the table.
    # In this case, the table uses a single-column primary key.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["example_non_vector"], partition_sort={}
    ),
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can use a fluent interface to build the table definition and then create the table from the definition.

from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

table_definition = (
    CreateTableDefinition.builder()
    # Define all of the columns in the table
    .add_vector_column("example_vector", dimension=1024)
    .add_column("example_non_vector", ColumnType.TEXT)
    # Define the primary key for the table.
    # In this case, the table uses a single-column primary key.
    .add_partition_by(["example_non_vector"])
    # Finally, build the table definition.
    .build()
)

table = database.create_table(
    "example_table",
    definition=table_definition,
)

You can define the table as a dictionary and then pass the dictionary to create_table to create the table.

from astrapy import DataAPIClient

# Get an existing database
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        "example_vector": {"type": "vector", "dimension": 1024},
        "example_non_vector": {"type": "text"},
    },
    "primaryKey": {
        "partitionBy": ["example_non_vector"],
        "partitionSort": {},
    },
}

table = database.create_table(
    "example_table",
    definition=table_definition,
)
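
A vector column has a fixed dimension, so each embedding you store must have exactly that many values. The following is a minimal sketch of a local length check against the dictionary definition shown above. The matches_dimension function is a hypothetical helper and is not part of astrapy.

```python
def matches_dimension(definition: dict, column: str, vector) -> bool:
    """Return True if the vector length equals the column's declared dimension."""
    spec = definition["columns"][column]
    if spec.get("type") != "vector":
        raise ValueError(f"Column {column!r} is not a vector column")
    return len(vector) == spec["dimension"]


table_definition = {
    "columns": {
        "example_vector": {"type": "vector", "dimension": 1024},
        "example_non_vector": {"type": "text"},
    },
    "primaryKey": {
        "partitionBy": ["example_non_vector"],
        "partitionSort": {},
    },
}

# A placeholder embedding with the declared number of values
embedding = [0.0] * 1024

if not matches_dimension(table_definition, "example_vector", embedding):
    raise ValueError("Embedding length does not match the column dimension")
```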

The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the ser/des configuration.

For more information, see Collection and table typing.

Automatic type inference
Manually typed tables
Untyped tables

The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    example_vector: { type: "vector", dimension: 1024 },
    example_non_vector: "text",
  },
  // Define the primary key for the table.
  // In this case, the table uses a single-column primary key.
  primaryKey: {
    partitionBy: ["example_non_vector"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  // Provide the types and the definition
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

You can manually define the type for your table’s schema and primary key. To create the table, provide the table definition and the types to the createTable method.

This may be necessary if you modify the table’s default ser/des configuration.

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    example_vector: { type: "vector", dimension: 1024 },
    example_non_vector: "text",
  },
  // Define the primary key for the table.
  // In this case, the table uses a single-column primary key.
  primaryKey: {
    partitionBy: ["example_non_vector"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  example_vector: DataAPIVector;
  example_non_vector: string;
};

type TablePrimaryKey = Pick<TableSchema, "example_non_vector">;

(async function () {
  // Provide the types and the definition to create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "example_table",
    { definition: tableDefinition },
  );
})();

To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method. This types the table’s rows as Record<string, any>.

This is the most flexible but least type-safe option.

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Get an existing database
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

const tableDefinition = Table.schema({
  // Define all of the columns in the table
  columns: {
    example_vector: { type: "vector", dimension: 1024 },
    example_non_vector: "text",
  },
  // Define the primary key for the table.
  // In this case, the table uses a single-column primary key.
  primaryKey: {
    partitionBy: ["example_non_vector"],
  },
});

(async function () {
  // Pass SomeRow as the only type parameter to create an untyped table
  const table = await database.createTable<SomeRow>("example_table", {
    definition: tableDefinition,
  });
})();

The Java client supports multiple ways to create a table. In all cases, you must define the table schema.

Use a generic type
Define the row type

If you don’t specify the Class parameter when creating an instance of the generic class Table, the client defaults to Table<Row> as the type. In this case, the working object type T is Row.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

public class Example {
  public static void main(String[] args) {
    // Get an existing database
    Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");

    TableDefinition tableDefinition =
        new TableDefinition()
            // Define all of the columns in the table
            .addColumnVector(
                "example_vector",
                new TableColumnDefinitionVector().dimension(1024).metric(SimilarityMetric.COSINE))
            .addColumnText("example_non_vector")
            // Define the primary key for the table.
            // In this case, the table uses a single-column primary key.
            .addPartitionBy("example_non_vector");

    Table<Row> table = database.createTable("example_table", tableDefinition);
  }
}

Instead of using the default type Row.class, you can define your own working object, which will be serialized as a Row.

This working object can be annotated when the field names do not exactly match the column names or when you want to fully describe your table to enable its creation solely from the entity definition.

The following example defines a Book class and then uses it to create the table.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
  @EntityTable("example_table")
  @Data
  public static class Book {
    @ColumnVector(name = "example_vector", dimension = 1024, metric = SimilarityMetric.COSINE)
    private DataAPIVector vector;

    @PartitionBy(0)
    @Column(name = "example_non_vector", type = TableColumnTypes.TEXT)
    private String exampleNonVector;
  }

  public static void main(String[] args) {
    // Get an existing database
    Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");

    Table<Book> table = database.createTable(Book.class);
  }
}

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "example_table",
    "definition": {
      "columns": {
        "example_vector": {
          "type": "vector",
          "dimension": 1024
        },
        "example_non_vector": {
          "type": "text"
        }
      },
      "primaryKey": "example_non_vector"
    }
  }
}'

Create a table with a column to automatically generate vector embeddings

If you want to automatically generate vector embeddings, create a table with a vector column and configure an embedding provider integration for the column.

The configuration depends on the embedding provider.

You can also configure an embedding provider integration after table creation. For more information, see Alter a table.

If you want to store the original text in addition to the vector embeddings that were generated from the text, then you need to create a separate column to store the text.

You can configure a different embedding provider for each vector column in the table. If you want to use the same embedding provider for all vector columns in the table, you must still configure the embedding provider for each vector column.
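As a minimal sketch of the per-column requirement above, the following plain-Python helper builds a dictionary-style table definition in which two vector columns each carry their own service block. The helper name vector_column, the column names, the key names JINA_KEY_NAME and HF_KEY_NAME, and the dimension values are illustrative assumptions, not part of the Data API. Replace them with values that match your providers and models.

```python
def vector_column(dimension, provider, model_name, api_key_name, parameters=None):
    """Build one vector column definition with its own embedding provider.

    Hypothetical helper for illustration only. The keys follow the
    camelCase form of the Data API JSON payload.
    """
    service = {
        "provider": provider,
        "modelName": model_name,
        "authentication": {"providerKey": api_key_name},
    }
    # Some providers need extra parameters; include them only when given.
    if parameters:
        service["parameters"] = parameters
    return {"type": "vector", "dimension": dimension, "service": service}


table_definition = {
    "columns": {
        # Each vector column declares its own service block,
        # even if every column used the same provider.
        "summary_vector": vector_column(
            768, "jinaAI", "jina-embeddings-v2-base-en", "JINA_KEY_NAME"
        ),
        "body_vector": vector_column(
            384, "huggingface", "BAAI/bge-small-en-v1.5", "HF_KEY_NAME"
        ),
        # Separate column for the original text (assumed name).
        "body_text": {"type": "text"},
    },
    "primaryKey": {
        "partitionBy": ["body_text"],
        "partitionSort": {},
    },
}
```

A dictionary built this way could be passed as the definition argument of create_table, in the same way as the Dictionary examples in this section.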

Python
TypeScript
Java
curl

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

CreateTableDefinition object
Fluent interface
Dictionary

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Azure OpenAI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="azureOpenAI",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
                parameters={
                    "resourceName": "RESOURCE_NAME",
                    "deploymentId": "DEPLOYMENT_ID",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Azure OpenAI integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="azureOpenAI",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
            parameters={
                "resourceName": "RESOURCE_NAME",
                "deploymentId": "DEPLOYMENT_ID",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Azure OpenAI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "azureOpenAI",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
            "parameters": {
                "resourceName": "RESOURCE_NAME",
                "deploymentId": "DEPLOYMENT_ID",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

CreateTableDefinition object
Fluent interface
Dictionary

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Hugging Face Dedicated integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="huggingfaceDedicated",
                model_name="endpoint-defined-model",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
                parameters={
                    "endpointName": "ENDPOINT_NAME",
                    "regionName": "REGION_NAME",
                    "cloudName": "CLOUD_NAME",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Hugging Face Dedicated integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="huggingfaceDedicated",
            model_name="endpoint-defined-model",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
            parameters={
                "endpointName": "ENDPOINT_NAME",
                "regionName": "REGION_NAME",
                "cloudName": "CLOUD_NAME",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Hugging Face Dedicated integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "huggingfaceDedicated",
            "modelName": "endpoint-defined-model",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
            "parameters": {
                "endpointName": "ENDPOINT_NAME",
                "regionName": "REGION_NAME",
                "cloudName": "CLOUD_NAME",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.

For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
ENDPOINT_NAME: The programmatically generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
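The three endpoint parameters above can be read off the endpoint URL. The following sketch uses only the Python standard library and assumes that the hostname always has the form ENDPOINT_NAME.REGION_NAME.CLOUD_NAME.endpoints.huggingface.cloud, which is inferred from the example URL rather than guaranteed. The function name hugging_face_dedicated_parameters is hypothetical.

```python
from urllib.parse import urlparse


def hugging_face_dedicated_parameters(endpoint_url):
    """Split a Hugging Face Dedicated endpoint URL into service parameters.

    Assumes the hostname layout
    <endpoint>.<region>.<cloud>.endpoints.huggingface.cloud
    and raises ValueError for anything else.
    """
    host = urlparse(endpoint_url).hostname
    if host is None:
        raise ValueError(f"No hostname found in {endpoint_url!r}")

    labels = host.split(".")
    expected_suffix = ["endpoints", "huggingface", "cloud"]
    if len(labels) != 6 or labels[3:] != expected_suffix:
        raise ValueError(f"Unexpected endpoint hostname: {host!r}")

    endpoint_name, region_name, cloud_name = labels[:3]
    return {
        "endpointName": endpoint_name,
        "regionName": region_name,
        "cloudName": cloud_name,
    }


params = hugging_face_dedicated_parameters(
    "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud"
)
```

The returned dictionary has the same keys as the parameters block in the examples above, so it could be supplied as the parameters value of the service configuration.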

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

CreateTableDefinition object
Fluent interface
Dictionary

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Hugging Face Serverless integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="huggingface",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Hugging Face Serverless integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="huggingface",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Hugging Face Serverless integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "huggingface",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

CreateTableDefinition object
Fluent interface
Dictionary

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Jina AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="jinaAI",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Jina AI integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="jinaAI",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Jina AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "jinaAI",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

CreateTableDefinition object
Fluent interface
Dictionary

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Mistral AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="mistral",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Mistral AI integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="mistral",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Mistral AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "mistral",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
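To make the optional dimension concrete, here is a minimal sketch that builds the dictionary form of a vector column and leaves out the dimension key when no value is given. vectorize_column is a hypothetical helper introduced only for this example. The dictionary uses the camelCase modelName key from the Data API's JSON payload. Whether a given model accepts an omitted dimension depends on that model having a default.

```python
from typing import Any, Optional


def vectorize_column(
    provider: str,
    model_name: str,
    api_key_name: Optional[str] = None,
    dimension: Optional[int] = None,
) -> dict[str, Any]:
    """Build a vector column definition, omitting keys that were not provided."""
    service: dict[str, Any] = {"provider": provider, "modelName": model_name}
    if api_key_name is not None:
        service["authentication"] = {"providerKey": api_key_name}

    column: dict[str, Any] = {"type": "vector", "service": service}
    if dimension is not None:
        # Only send a dimension when the caller chose one explicitly.
        column["dimension"] = dimension
    return column


# Placeholders such as TABLE_NAME and API_KEY_NAME follow the list above.
table_definition = {
    "columns": {
        "VECTOR_COLUMN_NAME": vectorize_column(
            "mistral", "mistral-embed", api_key_name="API_KEY_NAME"
        ),
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}
```

You can then pass table_definition as the definition argument of database.create_table, as in the Dictionary example.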

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

CreateTableDefinition object
Fluent interface
Dictionary

import os
from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The NVIDIA integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            service=VectorServiceOptions(
                provider="nvidia",
                model_name="NV-Embed-QA",
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The NVIDIA integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        service=VectorServiceOptions(
            provider="nvidia",
            model_name="NV-Embed-QA",
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The NVIDIA integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "service": {
            "provider": "nvidia",
            "modelName": "NV-Embed-QA",
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

CreateTableDefinition object
Fluent interface
Dictionary

import os
from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The OpenAI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="openai",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
                parameters={
                    "organizationId": "ORGANIZATION_ID",
                    "projectId": "PROJECT_ID",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The OpenAI integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="openai",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
            parameters={
                "organizationId": "ORGANIZATION_ID",
                "projectId": "PROJECT_ID",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The OpenAI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "openai",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
            "parameters": {
                "organizationId": "ORGANIZATION_ID",
                "projectId": "PROJECT_ID",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot be the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
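Because ORGANIZATION_ID and PROJECT_ID are optional, one way to handle them is to add the parameters entry only when at least one value is supplied. The following is a minimal sketch of that idea for the dictionary form. openai_service is a hypothetical helper, not part of astrapy, and the model and ID values in the usage lines are examples only.

```python
from typing import Any, Optional


def openai_service(
    model_name: str,
    api_key_name: str,
    organization_id: Optional[str] = None,
    project_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build the "service" dictionary, skipping optional IDs that are not set."""
    service: dict[str, Any] = {
        "provider": "openai",
        "modelName": model_name,
        "authentication": {"providerKey": api_key_name},
    }
    # Keep only the optional parameters that were actually provided.
    parameters = {
        key: value
        for key, value in {
            "organizationId": organization_id,
            "projectId": project_id,
        }.items()
        if value is not None
    }
    if parameters:
        service["parameters"] = parameters
    return service


# Without optional IDs, no "parameters" key is produced.
basic = openai_service("text-embedding-3-small", "API_KEY_NAME")

# With an organization ID only, "projectId" is left out.
scoped = openai_service(
    "text-embedding-3-small", "API_KEY_NAME", organization_id="ORGANIZATION_ID"
)
```

The returned dictionary can be used as the value of the service key in the vector column of the Dictionary example.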

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

CreateTableDefinition object
Fluent interface
Dictionary

import os
from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Upstage integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="upstageAI",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Upstage integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="upstageAI",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Upstage integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "upstageAI",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

CreateTableDefinition object
Fluent interface
Dictionary

import os
from astrapy import DataAPIClient
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The Voyage AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            dimension=MODEL_DIMENSIONS,
            service=VectorServiceOptions(
                provider="voyageAI",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = (
    CreateTableDefinition.builder()
    # This column will store vector embeddings.
    # The Voyage AI integration
    # will automatically generate vector embeddings
    # for any text inserted to this column.
    .add_vector_column("VECTOR_COLUMN_NAME",
        dimension=MODEL_DIMENSIONS,
        service=VectorServiceOptions(
            provider="voyageAI",
            model_name="MODEL_NAME",
            authentication={
                "providerKey": "API_KEY_NAME",
            },
        ),
    )
    # If you want to store the original text
    # in addition to the generated embeddings
    # you must create a separate column.
    .add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
    # You should change the primary key definition to meet the needs of your data.
    .add_partition_by(["TEXT_COLUMN_NAME"])
    # Finally, build the table definition.
    .build()
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

import os
from astrapy import DataAPIClient

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = {
    "columns": {
        # This column will store vector embeddings.
        # The Voyage AI integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "voyageAI",
            "modelName": "MODEL_NAME",
            "authentication": {
                "providerKey": "API_KEY_NAME",
            },
          }
        },
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": {"type": "text"},
    },
    # You should change the primary key definition to meet the needs of your data.
    "primaryKey": {
        "partitionBy": ["TEXT_COLUMN_NAME"],
        "partitionSort": {},
    },
}

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

Automatic type inference
Manually typed tables
Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Azure OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'azureOpenAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          resourceName: 'RESOURCE_NAME',
          deploymentId: 'DEPLOYMENT_ID',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

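// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  // Create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "TABLE_NAME",
    { definition: tableDefinition },
  );
})();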

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Azure OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'azureOpenAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          resourceName: 'RESOURCE_NAME',
          deploymentId: 'DEPLOYMENT_ID',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

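// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

(async function () {
  // Create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "TABLE_NAME",
    { definition: tableDefinition },
  );
})();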

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Azure OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'azureOpenAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          resourceName: 'RESOURCE_NAME',
          deploymentId: 'DEPLOYMENT_ID',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
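
The TABLE_NAME, INDEX_NAME, and SIMILARITY_METRIC placeholders do not appear in the table definition itself; they apply when the table and its vector index are created. The following is a minimal sketch of that step, not a definitive description of the client's API. It assumes that the database object exposes createTable(name, { definition }) and that the returned table exposes createVectorIndex(indexName, columnName, { options: { metric } }). The DatabaseLike and TableLike interfaces, the createVectorizeTable helper, and the lowercase metric spellings are hypothetical stand-ins that keep the sketch self-contained.

```typescript
// Hypothetical structural types standing in for the client's database and table objects.
// The method shapes are assumptions; check the client reference before relying on them.
interface TableLike {
  createVectorIndex(
    indexName: string,
    columnName: string,
    options?: { options?: { metric?: string } },
  ): Promise<void>;
}

interface DatabaseLike {
  createTable(
    tableName: string,
    options: { definition: unknown },
  ): Promise<TableLike>;
}

// Creates the table, then indexes the vector column with the chosen metric.
async function createVectorizeTable(
  database: DatabaseLike,
  tableDefinition: unknown,
  names: { table: string; index: string; vectorColumn: string },
  metric: "cosine" | "dot_product" | "euclidean" = "cosine",
): Promise<TableLike> {
  const table = await database.createTable(names.table, {
    definition: tableDefinition,
  });
  await table.createVectorIndex(names.index, names.vectorColumn, {
    options: { metric },
  });
  return table;
}
```

With the real client, you would pass the database and tableDefinition objects from the examples above, along with your own values for TABLE_NAME, INDEX_NAME, and VECTOR_COLUMN_NAME.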

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

Automatic type inference
Manually typed tables
Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Hugging Face Dedicated integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'huggingfaceDedicated',
        modelName: 'endpoint-defined-model',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          endpointName: 'ENDPOINT_NAME',
          regionName: 'REGION_NAME',
          cloudName: 'CLOUD_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Hugging Face Dedicated integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'huggingfaceDedicated',
        modelName: 'endpoint-defined-model',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          endpointName: 'ENDPOINT_NAME',
          regionName: 'REGION_NAME',
          cloudName: 'CLOUD_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Hugging Face Dedicated integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'huggingfaceDedicated',
        modelName: 'endpoint-defined-model',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          endpointName: 'ENDPOINT_NAME',
          regionName: 'REGION_NAME',
          cloudName: 'CLOUD_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The only available model is endpoint-defined-model.

For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
ENDPOINT_NAME: The programmatically generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
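
The three endpoint values can be read directly from the endpoint URL. The following sketch shows one way to split the URL, assuming the host always has the form ENDPOINT_NAME.REGION_NAME.CLOUD_NAME.endpoints.huggingface.cloud, as in the example URL above. The parseDedicatedEndpointUrl helper and the HuggingFaceDedicatedParameters type are hypothetical names used for illustration; they are not part of the client.

```typescript
// Hypothetical helper: splits a Hugging Face Dedicated endpoint URL into
// the three values used in the service parameters.
interface HuggingFaceDedicatedParameters {
  endpointName: string;
  regionName: string;
  cloudName: string;
}

function parseDedicatedEndpointUrl(url: string): HuggingFaceDedicatedParameters {
  // Drop the scheme and anything after the host name.
  const host = url.replace(/^https?:\/\//, "").split("/")[0];
  // Assumed host shape: ENDPOINT.REGION.CLOUD.endpoints.huggingface.cloud
  const suffix = ".endpoints.huggingface.cloud";
  if (!host.endsWith(suffix)) {
    throw new Error(`Unexpected endpoint host: ${host}`);
  }
  const parts = host.slice(0, -suffix.length).split(".");
  if (parts.length !== 3 || parts.some((part) => part.length === 0)) {
    throw new Error(`Unexpected endpoint host: ${host}`);
  }
  const [endpointName, regionName, cloudName] = parts;
  return { endpointName, regionName, cloudName };
}

const endpointParameters = parseDedicatedEndpointUrl(
  "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud",
);
// endpointParameters.endpointName is "mtp1x7muf6qyn3yh"
// endpointParameters.regionName is "us-east-2"
// endpointParameters.cloudName is "aws"
```

The resulting object has the same keys as the parameters object in the table definition, so it can be passed there unchanged.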

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

Automatic type inference
Manually typed tables
Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Hugging Face Serverless integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'huggingface',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Hugging Face Serverless integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'huggingface',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Hugging Face Serverless integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'huggingface',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
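
To make the SIMILARITY_METRIC options more concrete, the following sketch computes the textbook form of each metric for two small vectors. This is illustrative only: the function names are hypothetical, and the scores that Astra returns may be scaled or normalized differently from these raw values.

```typescript
// Dot product of two equal-length vectors.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Cosine similarity: the dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
}

// Euclidean distance: the straight-line distance between the two points.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));
}

const firstVector = [3, 0];
const secondVector = [0, 4];
const exampleDot = dotProduct(firstVector, secondVector); // 0
const exampleCosine = cosineSimilarity(firstVector, secondVector); // 0
const exampleDistance = euclideanDistance(firstVector, secondVector); // 5
```

Cosine similarity ignores vector length and compares direction only, whereas the dot product and Euclidean distance are both sensitive to magnitude.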

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

Automatic type inference
Manually typed tables
Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Jina AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'jinaAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Jina AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'jinaAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Jina AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'jinaAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
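
Because the model must support the requested number of dimensions, it can help to check the value before you create the table. The following sketch assumes you have already looked up a model's supported range and default. The ModelDimensionInfo shape, the resolveDimension helper, and the numbers in the usage lines are hypothetical; they do not reflect the Data API's response format or the limits of any real model.

```typescript
// Hypothetical summary of a model's dimension constraints.
interface ModelDimensionInfo {
  minDimension: number;
  maxDimension: number;
  defaultDimension?: number; // Some models have no default.
}

// Returns the dimension to use, or throws if the request cannot be satisfied.
function resolveDimension(info: ModelDimensionInfo, requested?: number): number {
  if (requested === undefined) {
    if (info.defaultDimension === undefined) {
      throw new Error("This model has no default dimension; specify one explicitly");
    }
    return info.defaultDimension;
  }
  if (
    !Number.isInteger(requested) ||
    requested < info.minDimension ||
    requested > info.maxDimension
  ) {
    throw new Error(
      `Dimension ${requested} is outside the supported range ${info.minDimension}-${info.maxDimension}`,
    );
  }
  return requested;
}

// Illustrative values only.
const exampleModel: ModelDimensionInfo = {
  minDimension: 2,
  maxDimension: 1536,
  defaultDimension: 1536,
};
const explicitDimension = resolveDimension(exampleModel, 512); // 512
const fallbackDimension = resolveDimension(exampleModel); // 1536
```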

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

Automatic type inference
Manually typed tables
Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Mistral AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'mistral',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Mistral AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'mistral',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Mistral AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'mistral',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The only available model is mistral-embed.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

Automatic type inference
Manually typed tables
Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The NVIDIA integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      service: {
        provider: 'nvidia',
        modelName: 'NV-Embed-QA',
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The NVIDIA integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      service: {
        provider: 'nvidia',
        modelName: 'NV-Embed-QA',
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The NVIDIA integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      service: {
        provider: 'nvidia',
        modelName: 'NV-Embed-QA',
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

Automatic type inference
Manually typed tables
Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'openai',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          organizationId: 'ORGANIZATION_ID',
          projectId: 'PROJECT_ID',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'openai',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          organizationId: 'ORGANIZATION_ID',
          projectId: 'PROJECT_ID',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'openai',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
        parameters: {
          organizationId: 'ORGANIZATION_ID',
          projectId: 'PROJECT_ID',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
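});

(async function () {
  // Create the table
  const table = await database.createTable<SomeRow>("TABLE_NAME", {
    definition: tableDefinition,
  });
})();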

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot be the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
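
Because ORGANIZATION_ID and PROJECT_ID are optional, one way to avoid sending placeholder values is to assemble the service block conditionally. The following is a minimal sketch rather than part of the client library: the helper name buildOpenAiService and both interfaces are hypothetical, and the object shape only mirrors the service block used in the examples above. It also assumes that omitting parameters entirely is acceptable when neither ID is needed.

```typescript
// Hypothetical input type for this sketch; not part of @datastax/astra-db-ts.
interface OpenAiServiceInput {
  modelName: string;
  apiKeyName: string;
  organizationId?: string;
  projectId?: string;
}

// Mirrors the shape of the `service` block in the table definitions above.
interface VectorizeService {
  provider: string;
  modelName: string;
  authentication: { providerKey: string };
  parameters?: Record<string, string>;
}

// Builds the `service` block and adds `parameters` only when
// at least one of the optional IDs is supplied.
function buildOpenAiService(input: OpenAiServiceInput): VectorizeService {
  const parameters: Record<string, string> = {};
  if (input.organizationId) {
    parameters.organizationId = input.organizationId;
  }
  if (input.projectId) {
    parameters.projectId = input.projectId;
  }

  const service: VectorizeService = {
    provider: "openai",
    modelName: input.modelName,
    authentication: { providerKey: input.apiKeyName },
  };
  if (Object.keys(parameters).length > 0) {
    service.parameters = parameters;
  }
  return service;
}

// Without optional IDs, no `parameters` key is emitted.
const minimalService = buildOpenAiService({
  modelName: "text-embedding-3-small",
  apiKeyName: "API_KEY_NAME",
});

// With both IDs, `parameters` carries organizationId and projectId.
const scopedService = buildOpenAiService({
  modelName: "text-embedding-3-small",
  apiKeyName: "API_KEY_NAME",
  organizationId: "ORGANIZATION_ID",
  projectId: "PROJECT_ID",
});
```

The returned object can be used as the service value of a vector column in a definition like the ones shown above.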

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

Automatic type inference
Manually typed tables
Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Upstage integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'upstageAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
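type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  // Create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "TABLE_NAME",
    { definition: tableDefinition },
  );
})();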

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Upstage integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'upstageAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

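type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

(async function () {
  // Create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "TABLE_NAME",
    { definition: tableDefinition },
  );
})();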

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Upstage integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'upstageAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
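});

(async function () {
  // Create the table
  const table = await database.createTable<SomeRow>("TABLE_NAME", {
    definition: tableDefinition,
  });
})();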

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

Automatic type inference
Manually typed tables
Untyped tables

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Voyage AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'voyageAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
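type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  // Create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "TABLE_NAME",
    { definition: tableDefinition },
  );
})();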

import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Voyage AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'voyageAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Manually define the type of the table's schema and primary key
type TableSchema = {
  VECTOR_COLUMN_NAME: DataAPIVector;
  TEXT_COLUMN_NAME: string;
};

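type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;

(async function () {
  // Create the table
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    "TABLE_NAME",
    { definition: tableDefinition },
  );
})();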

import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The Voyage AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      dimension: MODEL_DIMENSIONS,
      service: {
        provider: 'voyageAI',
        modelName: 'MODEL_NAME',
        authentication: {
          providerKey: 'API_KEY_NAME',
        },
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
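});

(async function () {
  // Create the table
  const table = await database.createTable<SomeRow>("TABLE_NAME", {
    definition: tableDefinition,
  });
})();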

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
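
Since MODEL_NAME must be one of the listed models, it can help to check the value before building a table definition. The sketch below encodes the Voyage AI model names from the list above and fails fast on anything else. The constant and function names are hypothetical, and the list is only as current as this page, so treat the Data API's embedding provider lookup as the authoritative source.

```typescript
// Model names copied from the list above; this list may change over time.
const VOYAGE_AI_MODELS = [
  "voyage-2",
  "voyage-code-2",
  "voyage-finance-2",
  "voyage-large-2",
  "voyage-large-2-instruct",
  "voyage-law-2",
  "voyage-multilingual-2",
] as const;

type VoyageAiModel = (typeof VOYAGE_AI_MODELS)[number];

// Type guard: narrows an arbitrary string to a known model name.
function isVoyageAiModel(candidate: string): candidate is VoyageAiModel {
  return (VOYAGE_AI_MODELS as readonly string[]).includes(candidate);
}

// Hypothetical helper: throws before any request is sent
// if the model name is not in the known list.
function requireVoyageAiModel(candidate: string): VoyageAiModel {
  if (!isVoyageAiModel(candidate)) {
    throw new Error(
      `Unsupported Voyage AI model "${candidate}". ` +
        `Expected one of: ${VOYAGE_AI_MODELS.join(", ")}`,
    );
  }
  return candidate;
}

// A listed model passes through unchanged.
const checkedModelName = requireVoyageAiModel("voyage-large-2");

// An unreplaced placeholder is rejected by the guard.
const placeholderIsValid = isVoyageAiModel("MODEL_NAME");
```

A check like this catches a leftover MODEL_NAME placeholder or a typo locally instead of surfacing it as a failed request.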

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

Use a generic type
Define the row type

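import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;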
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.indexes.TableVectorIndexDefinition;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define parameters for the service provider
    Map<String, Object> params = new HashMap<>();
    params.put("resourceName", "RESOURCE_NAME");
    params.put("deploymentId", "DEPLOYMENT_ID");

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Azure OpenAI integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("azureOpenAI")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                            .parameters(params)
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

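import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;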
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public class ExampleRow {
        // This column will store vector embeddings.
        // The Azure OpenAI integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "azureOpenAI",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"),
            parameters = {
                @KeyValue(key = "resourceName", value = "RESOURCE_NAME"),
                @KeyValue(key = "deploymentId", value = "DEPLOYMENT_ID")
            })
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }
    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

Use a generic type
Define the row type

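import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;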
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.indexes.TableVectorIndexDefinition;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define parameters for the service provider
    Map<String, Object> params = new HashMap<>();
    params.put("endpointName", "ENDPOINT_NAME");
    params.put("regionName", "REGION_NAME");
    params.put("cloudName", "CLOUD_NAME");

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Hugging Face Dedicated integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("huggingfaceDedicated")
                            .modelName("endpoint-defined-model")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                            .parameters(params)
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

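import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;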
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public class ExampleRow {
        // This column will store vector embeddings.
        // The Hugging Face Dedicated integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "huggingfaceDedicated",
            modelName = "endpoint-defined-model",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"),
            parameters = {
                @KeyValue(key = "endpointName", value = "ENDPOINT_NAME"),
                @KeyValue(key = "regionName", value = "REGION_NAME"),
                @KeyValue(key = "cloudName", value = "CLOUD_NAME")
            })
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }
    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.

For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
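
ENDPOINT_NAME, REGION_NAME, and CLOUD_NAME are the first three labels of the endpoint hostname, so they can be derived from the endpoint URL instead of being copied by hand. The following sketch assumes the URL always follows the pattern in the example above (endpoint name, region, cloud, then endpoints.huggingface.cloud). The function and interface names are hypothetical and are not part of any client library.

```typescript
// Hypothetical result type; the keys match the provider parameters above.
interface HuggingFaceDedicatedParams {
  endpointName: string;
  regionName: string;
  cloudName: string;
}

// Assumes the hostname format shown in the example above:
// <endpointName>.<regionName>.<cloudName>.endpoints.huggingface.cloud
function parseDedicatedEndpoint(endpointUrl: string): HuggingFaceDedicatedParams {
  const { hostname } = new URL(endpointUrl);
  const suffix = ".endpoints.huggingface.cloud";
  if (!hostname.endsWith(suffix)) {
    throw new Error(`Unexpected endpoint host: ${hostname}`);
  }

  // Drop the fixed suffix, then split the remainder into its labels.
  const labels = hostname.slice(0, -suffix.length).split(".");
  if (labels.length !== 3 || labels.some((label) => label.length === 0)) {
    throw new Error(`Expected <endpoint>.<region>.<cloud> before ${suffix}`);
  }

  const [endpointName, regionName, cloudName] = labels;
  return { endpointName, regionName, cloudName };
}

// Uses the example URL from the ENDPOINT_NAME description.
const dedicatedParams = parseDedicatedEndpoint(
  "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud",
);
```

For the example URL, the three values correspond to the endpoint name, region, and cloud provider described in the list above.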

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

Use a generic type
Define the row type

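import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;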
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.indexes.TableVectorIndexDefinition;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Hugging Face Serverless integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("huggingface")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

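import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;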
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public class ExampleRow {
        // This column will store vector embeddings.
        // The Hugging Face Serverless integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "huggingface",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }
    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
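If the value for SIMILARITY_METRIC comes from configuration rather than being hard-coded, it can help to resolve it before building the table definition. The following is a minimal sketch of that idea, assuming the metric arrives as an optional string. The `Metric` enum and the `resolveMetric` helper are hypothetical stand-ins written for this illustration so that the snippet compiles without the client library; in real code you would map to the client's `SimilarityMetric` enum instead.

```java
import java.util.Locale;

class MetricResolverSketch {

  // Hypothetical stand-in for the client's SimilarityMetric enum.
  // It lists the three metrics named in the placeholder description.
  enum Metric { COSINE, DOT_PRODUCT, EUCLIDEAN }

  // Returns the metric named by `configured` (case-insensitive).
  // Falls back to COSINE, the documented default, when the value is null or blank.
  // An unrecognized name causes Metric.valueOf to throw IllegalArgumentException.
  static Metric resolveMetric(String configured) {
    if (configured == null || configured.isBlank()) {
      return Metric.COSINE;
    }
    return Metric.valueOf(configured.trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(resolveMetric(null));
    System.out.println(resolveMetric("dot_product"));
  }
}
```

Failing fast on an unrecognized name keeps a typo in configuration from surfacing later as a rejected table definition.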

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

Use a generic type
Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Jina AI integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("jinaAI")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public static class ExampleRow {
        // This column will store vector embeddings.
        // The Jina AI integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "jinaAI",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }
    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
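The dimension rules above can be checked on the client side before you send the create-table request. The following is a minimal sketch under stated assumptions: the `resolveDimension` helper is hypothetical and not part of the Java client, and the caller is assumed to have already looked up the model's supported range and optional default, for example from the embedding provider listing. The numbers in `main` are made-up illustration values, not the limits of any real model.

```java
import java.util.OptionalInt;

class DimensionCheckSketch {

  // Returns the requested dimension when it lies within [min, max].
  // Returns the model default when nothing was requested.
  // Throws IllegalArgumentException when the request is out of range,
  // or when nothing was requested and the model has no default.
  static int resolveDimension(Integer requested, OptionalInt modelDefault, int min, int max) {
    if (requested == null) {
      return modelDefault.orElseThrow(
          () -> new IllegalArgumentException("Model has no default dimension; specify one"));
    }
    if (requested < min || requested > max) {
      throw new IllegalArgumentException(
          "Dimension " + requested + " is outside the supported range " + min + ".." + max);
    }
    return requested;
  }

  public static void main(String[] args) {
    // Illustration values only: a hypothetical model that supports 256..1024 and defaults to 1024.
    System.out.println(resolveDimension(512, OptionalInt.of(1024), 256, 1024));
    System.out.println(resolveDimension(null, OptionalInt.of(1024), 256, 1024));
  }
}
```

Passing the default as an `OptionalInt` makes the "some models don't have default dimensions" case explicit instead of relying on a sentinel value.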

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

Use a generic type
Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Mistral AI integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("mistral")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public static class ExampleRow {
        // This column will store vector embeddings.
        // The Mistral AI integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "mistral",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }
    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

Use a generic type
Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The NVIDIA integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .metric(SimilarityMetric.COSINE)
                    .service(
                        new VectorServiceOptions()
                            .provider("nvidia")
                            .modelName("NV-Embed-QA")
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public static class ExampleRow {
        // This column will store vector embeddings.
        // The NVIDIA integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            provider = "nvidia",
            modelName = "NV-Embed-QA")
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }
    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

Use a generic type
Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define parameters for the service provider
    Map<String, Object> params = new HashMap<>();
    params.put("organizationId", "ORGANIZATION_ID");
    params.put("projectId", "PROJECT_ID");

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The OpenAI integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("openai")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                            .parameters(params)
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public static class ExampleRow {
        // This column will store vector embeddings.
        // The OpenAI integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "openai",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"),
            parameters = {
                @KeyValue(key = "organizationId", value = "ORGANIZATION_ID"),
                @KeyValue(key = "projectId", value = "PROJECT_ID")
            })
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }
    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
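Because ORGANIZATION_ID and PROJECT_ID are optional, you may prefer to add each one to the parameters map only when a value is actually available. The following is a minimal sketch of that approach using only the Java standard library. The `buildParams` and `putIfPresent` helpers are hypothetical names introduced for this illustration; the map keys `organizationId` and `projectId` are the ones used in the examples above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OpenAiParamsSketch {

  // Builds the service parameters map, adding each optional ID only when it is set.
  // Null or blank values are skipped, so the result can be empty.
  static Map<String, Object> buildParams(String organizationId, String projectId) {
    Map<String, Object> params = new LinkedHashMap<>();
    putIfPresent(params, "organizationId", organizationId);
    putIfPresent(params, "projectId", projectId);
    return params;
  }

  // Adds the entry only when the value is non-null and contains non-whitespace characters.
  private static void putIfPresent(Map<String, Object> params, String key, String value) {
    if (value != null && !value.isBlank()) {
      params.put(key, value);
    }
  }

  public static void main(String[] args) {
    System.out.println(buildParams(null, null));
    System.out.println(buildParams("ORGANIZATION_ID", "PROJECT_ID"));
  }
}
```

With a map built this way, one option is to pass it to `.parameters(...)` only when it is non-empty. Whether the service accepts an empty parameters map is not stated here, so treat that as something to verify.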

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

Use a generic type
Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Upstage integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("upstageAI")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public static class ExampleRow {
        // This column will store vector embeddings.
        // The Upstage integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "upstageAI",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }
    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

Use a generic type
Define the row type

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The Voyage AI integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .dimension(MODEL_DIMENSIONS)
                    .metric(SimilarityMetric.SIMILARITY_METRIC)
                    .service(
                        new VectorServiceOptions()
                            .provider("voyageAI")
                            .modelName("MODEL_NAME")
                            .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
  }
}

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;

public class Example {
    @EntityTable("TABLE_NAME")
    @Data
    public class ExampleRow {
        // This column will store vector embeddings.
        // The Voyage AI integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        @ColumnVector(
            name = "VECTOR_COLUMN_NAME",
            dimension = MODEL_DIMENSIONS,
            metric = SimilarityMetric.SIMILARITY_METRIC,
            provider = "voyageAI",
            modelName = "MODEL_NAME",
            authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
        private DataAPIVector exampleVector;

        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        // You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
        @PartitionBy(0)
        @Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
        private String originalText;
    }

    public static void main(String[] args) {
        // Instantiate the client
        DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

        // Connect to a database
        Database database =
            client.getDatabase(
                System.getenv("API_ENDPOINT"),
                new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

        // Create the table
        Table<ExampleRow> table = database.createTable(ExampleRow.class);
    }
}

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

# The VECTOR_COLUMN_NAME column stores vector embeddings.
# The Azure OpenAI integration automatically generates vector embeddings
# for any text inserted into this column.
# To store the original text in addition to the generated embeddings,
# you must create a separate column (TEXT_COLUMN_NAME).
# Change the primary key definition to meet the needs of your data.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "azureOpenAI",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            },
            "parameters": {
              "resourceName": "RESOURCE_NAME",
              "deploymentId": "DEPLOYMENT_ID"
            }
          }
        },
        "TEXT_COLUMN_NAME": "text"
      },
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
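
The placeholder substitution described above can also be done programmatically. The following is a minimal Python sketch, not an official client call: the helper name build_azure_openai_table_payload and the sample values (example_table, my-azure-key, and so on) are hypothetical. It builds the same createTable body as a dictionary and serializes it with the standard json module, so the dimension is emitted as a JSON number and the result is strict JSON.

```python
import json


def build_azure_openai_table_payload(
    table_name: str,
    vector_column: str,
    text_column: str,
    dimension: int,
    model_name: str,
    api_key_name: str,
    resource_name: str,
    deployment_id: str,
) -> str:
    """Return a createTable request body as a JSON string (hypothetical helper)."""
    body = {
        "createTable": {
            "name": table_name,
            "definition": {
                "columns": {
                    # Vector column whose embeddings are generated by Azure OpenAI
                    vector_column: {
                        "type": "vector",
                        "dimension": dimension,
                        "service": {
                            "provider": "azureOpenAI",
                            "modelName": model_name,
                            "authentication": {"providerKey": api_key_name},
                            "parameters": {
                                "resourceName": resource_name,
                                "deploymentId": deployment_id,
                            },
                        },
                    },
                    # Separate column that keeps the original text
                    text_column: "text",
                },
                "primaryKey": text_column,
            },
        }
    }
    return json.dumps(body, indent=2)


# All values below are illustrative placeholders, not real resources.
payload = build_azure_openai_table_payload(
    table_name="example_table",
    vector_column="example_vector",
    text_column="original_text",
    dimension=1536,
    model_name="text-embedding-3-small",
    api_key_name="my-azure-key",
    resource_name="my-azure-resource",
    deployment_id="my-deployment",
)
print(payload)
```

The printed string can be passed as the --data argument of the curl command, or sent with any HTTP client.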

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

# The VECTOR_COLUMN_NAME column stores vector embeddings.
# The Hugging Face Dedicated integration automatically generates vector embeddings
# for any text inserted into this column.
# To store the original text in addition to the generated embeddings,
# you must create a separate column (TEXT_COLUMN_NAME).
# Change the primary key definition to meet the needs of your data.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "huggingfaceDedicated",
            "modelName": "endpoint-defined-model",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            },
            "parameters": {
              "endpointName": "ENDPOINT_NAME",
              "regionName": "REGION_NAME",
              "cloudName": "CLOUD_NAME"
            }
          }
        },
        "TEXT_COLUMN_NAME": "text"
      },
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.

For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
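
The three endpoint values above can be read off the endpoint URL. The following is a minimal Python sketch with a hypothetical helper, hf_dedicated_parameters. It assumes that every endpoint URL follows the same pattern as the example URL (endpoint name, then region, then cloud provider, followed by endpoints.huggingface.cloud). Only the position of the endpoint name is stated above, so confirm the region and cloud values against your endpoint's settings.

```python
from urllib.parse import urlparse


def hf_dedicated_parameters(endpoint_url: str) -> dict:
    """Split a Hugging Face Dedicated endpoint URL into service parameters.

    Assumes the host has the form
    <endpoint>.<region>.<cloud>.endpoints.huggingface.cloud
    """
    host = urlparse(endpoint_url).hostname or ""
    labels = host.split(".")
    expected_suffix = ["endpoints", "huggingface", "cloud"]
    if len(labels) != 6 or labels[3:] != expected_suffix:
        raise ValueError(f"Unexpected endpoint host: {host!r}")
    endpoint_name, region_name, cloud_name = labels[:3]
    return {
        "endpointName": endpoint_name,
        "regionName": region_name,
        "cloudName": cloud_name,
    }


params = hf_dedicated_parameters(
    "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud"
)
print(params)
```

The returned dictionary has the same keys as the parameters object in the request body.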

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

# The VECTOR_COLUMN_NAME column stores vector embeddings.
# The Hugging Face Serverless integration automatically generates vector embeddings
# for any text inserted into this column.
# To store the original text in addition to the generated embeddings,
# you must create a separate column (TEXT_COLUMN_NAME).
# Change the primary key definition to meet the needs of your data.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "huggingface",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        },
        "TEXT_COLUMN_NAME": "text"
      },
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

# The VECTOR_COLUMN_NAME column stores vector embeddings.
# The Jina AI integration automatically generates vector embeddings
# for any text inserted into this column.
# To store the original text in addition to the generated embeddings,
# you must create a separate column (TEXT_COLUMN_NAME).
# Change the primary key definition to meet the needs of your data.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "jinaAI",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        },
        "TEXT_COLUMN_NAME": "text"
      },
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

# The VECTOR_COLUMN_NAME column stores vector embeddings.
# The Mistral AI integration automatically generates vector embeddings
# for any text inserted into this column.
# To store the original text in addition to the generated embeddings,
# you must create a separate column (TEXT_COLUMN_NAME).
# Change the primary key definition to meet the needs of your data.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "mistral",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        },
        "TEXT_COLUMN_NAME": "text"
      },
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

# The VECTOR_COLUMN_NAME column stores vector embeddings.
# The NVIDIA integration automatically generates vector embeddings
# for any text inserted into this column.
# To store the original text in addition to the generated embeddings,
# you must create a separate column (TEXT_COLUMN_NAME).
# Change the primary key definition to meet the needs of your data.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "service": {
            "provider": "nvidia",
            "modelName": "NV-Embed-QA"
          }
        },
        "TEXT_COLUMN_NAME": "text"
      },
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

# The VECTOR_COLUMN_NAME column stores vector embeddings.
# The OpenAI integration automatically generates vector embeddings
# for any text inserted into this column.
# To store the original text in addition to the generated embeddings,
# you must create a separate column (TEXT_COLUMN_NAME).
# Change the primary key definition to meet the needs of your data.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "openai",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            },
            "parameters": {
              "organizationId": "ORGANIZATION_ID",
              "projectId": "PROJECT_ID"
            }
          }
        },
        "TEXT_COLUMN_NAME": "text"
      },
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
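
Because ORGANIZATION_ID and PROJECT_ID are optional, the service object can be assembled conditionally. The following is a minimal Python sketch with a hypothetical helper, openai_service. It assumes that an optional value is expressed by leaving its key out of the parameters object, and that the parameters object itself can be left out when neither value is set. The key name and IDs shown are illustrative.

```python
from typing import Optional


def openai_service(
    model_name: str,
    api_key_name: str,
    organization_id: Optional[str] = None,
    project_id: Optional[str] = None,
) -> dict:
    """Build the "service" object for an OpenAI vector column (hypothetical helper)."""
    service = {
        "provider": "openai",
        "modelName": model_name,
        "authentication": {"providerKey": api_key_name},
    }
    # Only include the optional IDs that were actually supplied.
    parameters = {}
    if organization_id:
        parameters["organizationId"] = organization_id
    if project_id:
        parameters["projectId"] = project_id
    if parameters:
        service["parameters"] = parameters
    return service


# Without optional IDs: no "parameters" key is emitted.
basic = openai_service("text-embedding-3-small", "my-openai-key")

# With an organization ID only (the ID is an illustrative placeholder).
scoped = openai_service(
    "text-embedding-3-small", "my-openai-key", organization_id="org-example"
)
print(basic)
print(scoped)
```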

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

# The VECTOR_COLUMN_NAME column stores vector embeddings.
# The Upstage integration automatically generates vector embeddings
# for any text inserted into this column.
# To store the original text in addition to the generated embeddings,
# you must create a separate column (TEXT_COLUMN_NAME).
# Change the primary key definition to meet the needs of your data.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "upstageAI",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        },
        "TEXT_COLUMN_NAME": "text"
      },
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

# The VECTOR_COLUMN_NAME column stores vector embeddings.
# The Voyage AI integration automatically generates vector embeddings
# for any text inserted into this column.
# To store the original text in addition to the generated embeddings,
# you must create a separate column (TEXT_COLUMN_NAME).
# Change the primary key definition to meet the needs of your data.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "dimension": MODEL_DIMENSIONS,
          "service": {
            "provider": "voyageAI",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        },
        "TEXT_COLUMN_NAME": "text"
      },
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Replace the following:

TABLE_NAME: The name for your table.
VECTOR_COLUMN_NAME: The name for your vector column.
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
INDEX_NAME: The name for the index.
SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are COSINE (default), DOT_PRODUCT, and EUCLIDEAN.
API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal.
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
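
The HTTP request used in the curl examples can also be assembled with Python's standard library. The following is a minimal sketch, not a replacement for the official clients: the helper build_create_table_request, the endpoint https://example.invalid, the token, and the table values are all hypothetical. It mirrors the curl call by sending a POST to /api/json/v1/KEYSPACE with Token and Content-Type headers. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_create_table_request(
    api_endpoint: str, token: str, keyspace: str, command: dict
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a Data API command."""
    url = f"{api_endpoint.rstrip('/')}/api/json/v1/{keyspace}"
    return urllib.request.Request(
        url,
        data=json.dumps(command).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Illustrative createTable command for the Voyage AI integration.
command = {
    "createTable": {
        "name": "example_table",
        "definition": {
            "columns": {
                "example_vector": {
                    "type": "vector",
                    "dimension": 1024,
                    "service": {
                        "provider": "voyageAI",
                        "modelName": "voyage-2",
                        "authentication": {"providerKey": "my-voyage-key"},
                    },
                },
                "original_text": "text",
            },
            "primaryKey": "original_text",
        },
    }
}

request = build_create_table_request(
    "https://example.invalid", "example-token", "default_keyspace", command
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen with a real endpoint and application token.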

Client reference

Python
TypeScript
Java
curl

For more information, see the client reference.

Client reference documentation is not applicable for HTTP.

