Create a table
Creates a new table in a keyspace in a database.
After you create a table, index the columns that you want to sort or filter on. Indexing optimizes your queries and avoids resource-intensive, long-running allow filtering operations. All indexed column names must use snake case, not camel case.
You can also modify the table columns later. To add data to your table, insert rows.
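Because indexed column names must use snake case, it can help to check candidate names before you create the table. The following is a minimal client-side sketch. The helper name and the regular expression are assumptions that encode one common reading of "snake case" (lowercase words joined by single underscores). They are not the Data API's own validation.

```python
import re

# Assumed definition of snake case: lowercase letters and digits,
# with words separated by single underscores. This is not the Data API's check.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def non_snake_case_columns(column_names):
    """Return the names that do not match the assumed snake case pattern."""
    return [name for name in column_names if not _SNAKE_CASE.fullmatch(name)]


# Hypothetical list of columns that you plan to index
columns_to_index = ["title", "number_of_pages", "isCheckedOut", "DueDate"]
print(non_snake_case_columns(columns_to_index))
```

An empty result means that every name passed the check.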
Ready to write code? See the examples for this method to get started. If you are new to the Data API, check out the quickstart.
Result
-
Python
-
TypeScript
-
Java
-
C#
-
curl
Creates a table with the specified parameters.
Returns a Table object.
You can use this object to work with rows in the table.
Unless you specify the row_type parameter, the table is typed as Table[dict].
For more information, see Typing support.
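As a rough illustration of the row_type parameter, the sketch below passes a TypedDict to create_table so that a type checker can treat the result as a table of that row type. It assumes that `database` is an astrapy Database obtained elsewhere and that `definition` is a valid table definition. `BookRow` and `create_book_table` are hypothetical names used only for this example.

```python
from typing import Any, Optional, TypedDict


class BookRow(TypedDict, total=False):
    """Hypothetical row type that mirrors a few columns of the table."""

    title: str
    number_of_pages: Optional[int]
    rating: Optional[float]


def create_book_table(database: Any, definition: Any) -> Any:
    # `database` is assumed to be an astrapy Database.
    # row_type is a hint for the type checker only.
    return database.create_table(
        "example_table",
        definition=definition,
        row_type=BookRow,
    )
```

If you omit row_type, the same call returns a table typed as Table[dict].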
Creates a table with the specified parameters.
Returns a promise that resolves to a Table<Schema, PKey> object.
You can use this object to work with rows in the table.
Unless you specify the Schema, the table is typed as Table<Record<string, any>>.
Creates a table with the specified parameters.
Returns a Table<T> object.
You can use this object to work with rows in the table.
Unless you specify the rowClass parameter, the table is typed as Table<Row>.
Creates a table with the specified parameters.
Returns a Table object.
You can use this object to work with rows in the table.
By default, the Table object is typed as Table<Row>, where Row is Dictionary<string, object>.
You can enable stronger typing by specifying a type when you create the table.
For more information and examples, see Custom typing for tables.
Creates a table with the specified parameters.
If the command succeeds, the response indicates success.
Example successful response:
{
"status": {
"ok": 1
}
}
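If you call the HTTP endpoint from your own code, you can test the response body for the success shape shown above. This is a minimal sketch. The function name is hypothetical, and treating every other body as a failure is an assumption, because the shape of an error response is not described here.

```python
import json


def command_succeeded(response_body: str) -> bool:
    """Return True when the body has the shape {"status": {"ok": 1}}."""
    body = json.loads(response_body)
    if not isinstance(body, dict):
        return False
    status = body.get("status")
    # Assumption: any body without status.ok == 1 counts as a failure.
    return isinstance(status, dict) and status.get("ok") == 1


example_body = '{"status": {"ok": 1}}'
print(command_succeeded(example_body))
```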
Parameters
-
Python
-
TypeScript
-
Java
-
C#
-
curl
Use the create_table method, which belongs to the astrapy.Database class.
Method signature
create_table(
name: str,
*,
definition: CreateTableDefinition | dict[str, Any],
row_type: type[Any],
keyspace: str,
if_not_exists: bool,
table_admin_timeout_ms: int,
request_timeout_ms: int,
timeout_ms: int,
embedding_api_key: str | EmbeddingHeadersProvider,
spawn_api_options: APIOptions,
) -> Table[ROW]
| Name | Type | Summary |
|---|---|---|
| `name` | `str` | The name of the table. Table names must follow these rules: |
| `definition` | `CreateTableDefinition \| dict[str, Any]` | The full schema for the table, including column names, column data types, and the primary key. See the examples for usage. All column names used in the schema must be unique within the table. Any columns that will be indexed must use snake case, not camel case, for their names. |
| `row_type` | `type[Any]` | Optional. A formal specifier for the type checker. If not provided, the table is typed as `Table[dict]`. |
| `keyspace` | `str` | Optional. The keyspace in which to create the table. For an example, see Create a table and specify the keyspace. Default: The working keyspace for the database. |
| `if_not_exists` | `bool` | Optional. Whether the command should silently succeed even if a table with the given name already exists in the keyspace and no new table was created. This option only checks table names. It does not check table schemas. Default: false |
| `embedding_api_key` | `str \| EmbeddingHeadersProvider` | Optional. This only applies to tables that have a vector column with a vectorize embedding provider integration. Use this option to provide the embedding provider API key directly with headers instead of using an API key in the Astra DB KMS. The API key is sent to the Data API for every operation on the table. It is useful when a vectorize integration is configured but no credentials are stored, or when you want to override the stored credentials. For more information, see Manage embedding provider integrations for vectorize. You can use this authentication method only if all affected columns use the same embedding provider. Most vectorize integrations accept a plain string for header authentication. However, some vectorize integrations and models require specialized subclasses of `EmbeddingHeadersProvider`. |
| `spawn_api_options` | `APIOptions` | Optional. A complete or partial specification of the APIOptions to override the defaults inherited from the `Database`. |
| `table_admin_timeout_ms`, `request_timeout_ms`, `timeout_ms` | `int` | Optional. A timeout, in milliseconds, for the underlying HTTP request. |
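As an illustration of the optional parameters, the following sketch combines keyspace and if_not_exists in one call. It assumes that `database` is an astrapy Database obtained elsewhere. `ensure_table` is a hypothetical wrapper name, and the keyspace value is whatever your database actually uses.

```python
from typing import Any


def ensure_table(database: Any, definition: Any, keyspace: str) -> Any:
    """Create example_table in the given keyspace unless the name is taken."""
    # `database` is assumed to be an astrapy Database.
    # if_not_exists compares table names only, not schemas.
    return database.create_table(
        "example_table",
        definition=definition,
        keyspace=keyspace,
        if_not_exists=True,
    )
```

Because if_not_exists does not compare schemas, a table that already exists under the same name is left as it is, even if its columns differ from `definition`.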
Use the createTable method, which belongs to the Db class.
Method signature
async createTable<const Def extends CreateTableDefinition>(
name: string,
options: {
definition: CreateTableDefinition,
ifNotExists?: boolean,
embeddingApiKey?: string | EmbeddingHeadersProvider,
logging?: DataAPILoggingConfig,
serdes?: TableSerDesConfig,
timeoutDefaults?: Partial<TimeoutDescriptor>,
keyspace?: string,
}
): Promise<Table<InferTableSchema<Def>, InferTablePrimaryKey<Def>>>
Parameters:
| Name | Type | Summary |
|---|---|---|
| `name` | `string` | The name of the table. Table names must follow these rules: |
| `options` | `object` | The options for this operation. See Properties of `options`. |

Properties of `options`:
| Name | Type | Summary |
|---|---|---|
| `definition` | `CreateTableDefinition` | The full schema for the table, including column names, column data types, and the primary key. See the examples for usage. All column names used in the schema must be unique within the table. Any columns that will be indexed must use snake case, not camel case, for their names. |
| `ifNotExists` | `boolean` | Optional. Whether the command should silently succeed even if a table with the given name already exists in the keyspace and no new table was created. This option only checks table names. It does not check table schemas. Default: false |
| `keyspace` | `string` | Optional. The keyspace in which to create the table. For an example, see Create a table and specify the keyspace. Default: The working keyspace for the database. |
| `embeddingApiKey` | `string \| EmbeddingHeadersProvider` | Optional. This only applies to tables that have a vector column with a vectorize embedding provider integration. Use this option to provide the embedding provider API key directly with headers instead of using an API key in the Astra DB KMS. The API key is sent to the Data API for every operation on the table. It is useful when a vectorize integration is configured but no credentials are stored, or when you want to override the stored credentials. For more information, see Manage embedding provider integrations for vectorize. You can use this authentication method only if all affected columns use the same embedding provider. |
| `logging` | `DataAPILoggingConfig` | Optional. The configuration for logging events. |
| `timeoutDefaults` | `Partial<TimeoutDescriptor>` | Optional. The default timeout options for any operation performed on this `Table`. |
| `serdes` | `TableSerDesConfig` | Optional. Lower-level serialization/deserialization configuration for this table. For more information, see Custom Ser/Des. |
Use the createTable method, which belongs to the com.datastax.astra.client.databases.Database class.
Method signature
<T> Table<T> createTable(
String tableName,
TableDefinition tableDefinition,
Class<T> rowClass,
CreateTableOptions createTableOptions
)
<T> Table<T> createTable(
String tableName,
TableDefinition tableDefinition,
Class<T> rowClass
)
<T> Table<T> createTable(Class<T> rowClass)
<T> Table<T> createTable(
Class<T> rowClass,
CreateTableOptions createTableOptions
)
<T> Table<T> createTable(
String tableName,
Class<T> rowClass,
CreateTableOptions createTableOptions
)
Table<Row> createTable(
String tableName,
TableDefinition tableDefinition,
CreateTableOptions options
)
Table<Row> createTable(
String tableName,
TableDefinition tableDefinition
)
| Name | Type | Summary |
|---|---|---|
| `tableName` | `String` | The name of the table. Table names must follow these rules: |
| `tableDefinition` | `TableDefinition` | The full schema for the table, including column names, column data types, and the primary key. See the examples for usage. All column names used in the schema must be unique within the table. Any columns that will be indexed must use snake case, not camel case, for their names. |
| `rowClass` | `Class<T>` | Optional. A specification of the class of the table’s row object. Default: `Row` |
| `createTableOptions` | `CreateTableOptions` | Optional. The options for this operation. See Selected methods of `CreateTableOptions`. |

Selected methods of `CreateTableOptions`:
| Method | Parameters | Summary |
|---|---|---|
|
|
Optional. Whether the command should silently succeed even if a table with the given name already exists in the keyspace and no new table was created. This option only checks table names. It does not check table schemas. Default: false |
|
|
Optional. This only applies to tables that have a vector column with a vectorize embedding provider integration. Use this option to provide the embedding provider API key directly with headers instead of using an API key in the Astra DB KMS. The API key is sent to the Data API for every operation on the table. It is useful when a vectorize integration is configured but no credentials are stored, or when you want to override the stored credentials. For more information, see Manage embedding provider integrations for vectorize. You can use this authentication method only if all affected columns use the same embedding provider. Most vectorize integrations accept a plain string for header authentication.
However, some vectorize integrations and models require specialized subclasses of |
|
|
Optional.
A timeout, in milliseconds, for the underlying HTTP request.
If not provided, the |
Use the CreateTableAsync method, which belongs to the Database class.
You can also use CreateTable, which is the synchronous version of the method.
Method signature
public Task<Table<TRow>> CreateTableAsync<TRow>(
TableDefinition definition, CreateTableOptions options = null
) where TRow : class, new();
public Task<Table<TRow>> CreateTableAsync<TRow>(
string tableName,
TableDefinition definition,
CreateTableOptions options = null
) where TRow : class;
public Task<Table<TRow>> CreateTableAsync<TRow>(
string tableName, CreateTableOptions options = null
) where TRow : class, new();
public Task<Table<Row>> CreateTableAsync(
string tableName,
TableDefinition definition,
CreateTableOptions options = null
);
public Task<Table<TRow>> CreateTableAsync<TRow>(
CreateTableOptions options = null
) where TRow : class, new();
| Name | Type | Summary |
|---|---|---|
|
|
The name of the table. Table names must follow these rules:
If not specified, the client attempts to extract it from the |
|
The full schema for the table, including column names, column data types, and the primary key. See the examples for usage. All column names used in the schema must be unique within the table. Any columns that will be indexes must use snake case, not camel case, for their name. |
|
|
Optional.
Options for this operation.
For more information and examples for general options such as timeout and keyspace, see Customize API interaction.
For options specific to this method, see Method-specific properties of the |
| Name | Type | Summary |
|---|---|---|
|
|
Optional. This only applies to tables that have a vector column with a vectorize embedding provider integration. Use this option to provide the embedding provider API key directly with headers instead of using an API key in the Astra DB KMS. The API key is sent to the Data API for every operation on the table. It is useful when a vectorize integration is configured but no credentials are stored, or when you want to override the stored credentials. For more information, see Manage embedding provider integrations for vectorize. You can use this authentication method only if all affected columns use the same embedding provider. |
Use the createTable command.
Command signature
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "TABLE_NAME",
"definition": {
"columns": {
"COLUMN_NAME": "DATA_TYPE",
"COLUMN_NAME": "DATA_TYPE"
},
"primaryKey": "PRIMARY_KEY_DEFINITION"
}
}
}'
| Name | Type | Summary |
|---|---|---|
| `name` | `string` | The name of the table. Table names must follow these rules: |
| `definition` | `object` | The full schema for the table, including column names, column data types, and the primary key. See the examples for usage. See Properties of `definition`. |

Properties of `definition`:
| Name | Type | Summary |
|---|---|---|
| `columns` | `object` | The column names and data types. All column names must be unique within the table. Any columns that will be indexed must use snake case, not camel case, for their names. See the examples for usage. |
| `primaryKey` | `string` or `object` | The primary key for the table. See the examples for usage. |
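To show how the command body fits together, the following sketch assembles the createTable JSON and an HTTP request object using only the Python standard library. The URL path and the Token and Content-Type headers mirror the curl signature above. The function names are hypothetical, and the request is built but never sent.

```python
import json
import urllib.request


def build_create_table_command(name, columns, primary_key):
    """Assemble the createTable command body as a plain dictionary."""
    return {
        "createTable": {
            "name": name,
            "definition": {"columns": columns, "primaryKey": primary_key},
        }
    }


def build_request(api_endpoint, keyspace, token, command):
    """Build, but do not send, the POST request for the command."""
    url = f"{api_endpoint.rstrip('/')}/api/json/v1/{keyspace}"
    return urllib.request.Request(
        url,
        data=json.dumps(command).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


command = build_create_table_command(
    "example_table",
    {"title": {"type": "text"}, "rating": {"type": "float"}},
    "title",
)
print(json.dumps(command, indent=2))
```

To send the request, you could pass the result of build_request to urllib.request.urlopen, using your real API endpoint, keyspace, and application token.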
Examples
The following examples demonstrate how to create a table.
Create a table with a single-column primary key
A single-column primary key is a primary key consisting of one column. For more information, see Primary keys in tables.
-
Python
-
TypeScript
-
Java
-
C#
-
curl
The Python client supports multiple ways to create a table.
In all cases, you must define the table schema, and then pass the definition to the create_table method.
The following example uses untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.
-
CreateTableDefinition object
-
Fluent interface
-
Dictionary
You can define the table as a CreateTableDefinition and then build the table from the CreateTableDefinition object.
from astrapy import DataAPIClient
from astrapy.info import (
ColumnType,
CreateTableDefinition,
TableKeyValuedColumnType,
TableKeyValuedColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableScalarColumnTypeDescriptor,
TableValuedColumnType,
TableValuedColumnTypeDescriptor,
)
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
table_definition = CreateTableDefinition(
# Define all of the columns in the table
columns={
"title": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
"number_of_pages": TableScalarColumnTypeDescriptor(
column_type=ColumnType.INT
),
"rating": TableScalarColumnTypeDescriptor(
column_type=ColumnType.FLOAT
),
"genres": TableValuedColumnTypeDescriptor(
column_type=TableValuedColumnType.SET,
value_type=ColumnType.TEXT,
),
"metadata": TableKeyValuedColumnTypeDescriptor(
column_type=TableKeyValuedColumnType.MAP,
key_type=ColumnType.TEXT,
value_type=ColumnType.TEXT,
),
"is_checked_out": TableScalarColumnTypeDescriptor(
column_type=ColumnType.BOOLEAN
),
"due_date": TableScalarColumnTypeDescriptor(
column_type=ColumnType.DATE
),
},
# Define the primary key for the table.
# In this case, the table uses a single-column primary key.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["title"], partition_sort={}
),
)
table = database.create_table(
"example_table",
definition=table_definition,
)
You can use a fluent interface to build the table definition and then create the table from the definition.
from astrapy import DataAPIClient
from astrapy.info import ColumnType, CreateTableDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
table_definition = (
CreateTableDefinition.builder()
# Define all of the columns in the table
.add_column("title", ColumnType.TEXT)
.add_column("number_of_pages", ColumnType.INT)
.add_column("rating", ColumnType.FLOAT)
.add_set_column(
"genres",
ColumnType.TEXT,
)
.add_map_column(
"metadata",
# This is the key type for the map column
ColumnType.TEXT,
# This is the value type for the map column
ColumnType.TEXT,
)
.add_column("is_checked_out", ColumnType.BOOLEAN)
.add_column("due_date", ColumnType.DATE)
# Define the primary key for the table.
# In this case, the table uses a single-column primary key.
.add_partition_by(["title"])
# Finally, build the table definition.
.build()
)
table = database.create_table(
"example_table",
definition=table_definition,
)
You can define the table as a dictionary and then build the table from the dictionary.
from astrapy import DataAPIClient
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
"title": {"type": "text"},
"number_of_pages": {"type": "int"},
"rating": {"type": "float"},
"genres": {"type": "set", "valueType": "text"},
"metadata": {
"type": "map",
"keyType": "text",
"valueType": "text",
},
"is_checked_out": {"type": "boolean"},
"due_date": {"type": "date"},
},
"primaryKey": {
"partitionBy": ["title"],
"partitionSort": {},
},
}
table = database.create_table(
"example_table",
definition=table_definition,
)
The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the ser/des configuration.
For more information, see Collection and table typing.
-
Automatic type inference
-
Manually typed tables
-
Untyped tables
The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.
To do this, first create the table definition.
Then, use InferTableSchema and InferTablePrimaryKey to infer the type of the table and of the primary key.
To create the table, provide the table definition and the inferred types to the createTable method.
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
title: "text",
number_of_pages: "int",
rating: "float",
genres: { type: "set", valueType: "text" },
metadata: {
type: "map",
keyType: "text",
valueType: "text",
},
is_checked_out: "boolean",
due_date: "date",
},
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
primaryKey: {
partitionBy: ["title"],
},
});
// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
(async function () {
// Provide the types and the definition
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"example_table",
{ definition: tableDefinition },
);
})();
You can use the TableSchema type as you would any other type.
For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:
const row: TableSchema = {
title: "Wind with No Name",
number_of_pages: 193,
bad_field: "I will error",
};
You can manually define the type for your table’s schema and primary key.
To create the table, provide the table definition and the types to the createTable method.
This may be necessary if you modify the table’s default ser/des configuration.
import { DataAPIClient, DataAPIDate, Table } from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
title: "text",
number_of_pages: "int",
rating: "float",
genres: { type: "set", valueType: "text" },
metadata: {
type: "map",
keyType: "text",
valueType: "text",
},
is_checked_out: "boolean",
due_date: "date",
},
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
primaryKey: {
partitionBy: ["title"],
},
});
// Manually define the type of the table's schema and primary key
type TableSchema = {
title: string;
number_of_pages?: number | null | undefined;
rating?: number | null | undefined;
genres?: Set<string> | undefined;
metadata?: Map<string, string> | undefined;
is_checked_out?: boolean | null | undefined;
due_date?: DataAPIDate | null | undefined;
};
type TablePrimaryKey = Pick<TableSchema, "title">;
(async function () {
// Provide the types and the definition to create the table
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"example_table",
{ definition: tableDefinition },
);
})();
You can use the TableSchema type as you would any other type.
For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:
const row: TableSchema = {
title: "Wind with No Name",
number_of_pages: 193,
bad_field: "I will error",
};
To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method.
This types the table’s rows as Record<string, any>.
This is the most flexible but least type-safe option.
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
title: "text",
number_of_pages: "int",
rating: "float",
genres: { type: "set", valueType: "text" },
metadata: {
type: "map",
keyType: "text",
valueType: "text",
},
is_checked_out: "boolean",
due_date: "date",
},
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
primaryKey: {
partitionBy: ["title"],
},
});
(async function () {
// Pass SomeRow and the definition to create an untyped table
const table = await database.createTable<SomeRow>("example_table", {
definition: tableDefinition,
});
})();
The Java client supports multiple ways to create a table. In all cases, you must define the table schema.
-
Use a generic type
-
Define the row type
If you don’t specify the Class parameter when you create the table, the client returns a Table<Row>.
In this case, the working object type T is Row.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.definition.rows.Row;
public class Example {
public static void main(String[] args) {
// Get an existing database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
TableDefinition tableDefinition =
new TableDefinition()
// Define all of the columns in the table
.addColumnText("title")
.addColumnInt("number_of_pages")
.addColumn("rating", TableColumnTypes.FLOAT)
.addColumnSet("genres", TableColumnTypes.TEXT)
.addColumnMap("metadata", TableColumnTypes.TEXT, TableColumnTypes.TEXT)
.addColumnBoolean("is_checked_out")
.addColumn("due_date", TableColumnTypes.DATE)
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
.addPartitionBy("title");
Table<Row> table = database.createTable("example_table", tableDefinition);
}
}
Instead of using the default type Row.class, you can define your own working object, which will be serialized as a Row.
This working object can be annotated when the field names do not exactly match the column names or when you want to fully describe your table to enable its creation solely from the entity definition.
The following example defines a Book class and then uses it to create the table.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import java.util.Date;
import java.util.Map;
import java.util.Set;
import lombok.Data;
public class Example {
@EntityTable("example_table")
@Data
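// Declared static so that the class can be instantiated without an enclosing Example instance
public static class Book {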
@PartitionBy(0)
@Column(name = "title", type = TableColumnTypes.TEXT)
private String title;
@Column(name = "number_of_pages", type = TableColumnTypes.INT)
private Integer number_of_pages;
@Column(name = "rating", type = TableColumnTypes.FLOAT)
private Float rating;
@Column(name = "genres", type = TableColumnTypes.SET, valueType = TableColumnTypes.TEXT)
private Set<String> genres;
@Column(
name = "metadata",
type = TableColumnTypes.MAP,
keyType = TableColumnTypes.TEXT,
valueType = TableColumnTypes.TEXT)
private Map<String, String> metadata;
@Column(name = "is_checked_out", type = TableColumnTypes.BOOLEAN)
private Boolean is_checked_out;
@Column(name = "due_date", type = TableColumnTypes.DATE)
private Date due_date;
}
public static void main(String[] args) {
// Get an existing database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
Table<Book> table = database.createTable(Book.class);
}
}
-
Typed tables
-
Untyped tables
You can manually define a client-side type for your table to help statically catch errors. For more information and examples, see Custom typing for tables.
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
// Define the type for the row
[TableName("TABLE_NAME")]
public class Book
{
[ColumnPrimaryKey]
[ColumnName("title")]
public string Title { get; set; } = null!;
[ColumnName("number_of_pages")]
public int? NumberOfPages { get; set; }
[ColumnName("rating")]
public float? Rating { get; set; }
[ColumnName("genres")]
public string[]? Genres { get; set; }
[ColumnName("metadata")]
public Dictionary<string, string>? Metadata { get; set; }
[ColumnName("is_checked_out")]
public bool? IsCheckedOut { get; set; }
[ColumnName("due_date")]
public DateOnly? DueDate { get; set; }
}
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var table = await database.CreateTableAsync<Book>();
}
}
If you don’t pass a type parameter, the table remains untyped. This is a more flexible but less type-safe option.
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var definition = new TableDefinition()
.AddColumn("title", DataAPIType.Text())
.AddColumn("number_of_pages", DataAPIType.Int())
.AddColumn("rating", DataAPIType.Float())
.AddColumn("genres", DataAPIType.Set(DataAPIType.Text()))
.AddColumn(
"metadata",
DataAPIType.Map(DataAPIType.Text(), DataAPIType.Text())
)
.AddColumn("is_checked_out", DataAPIType.Boolean())
.AddColumn("due_date", DataAPIType.Date())
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
.AddSinglePrimaryKey("title");
var table = await database.CreateTableAsync(
"TABLE_NAME",
definition
);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "example_table",
"definition": {
"columns": {
"title": {
"type": "text"
},
"number_of_pages": {
"type": "int"
},
"rating": {
"type": "float"
},
"metadata": {
"type": "map",
"keyType": "text",
"valueType": "text"
},
"genres": {
"type": "set",
"valueType": "text"
},
"is_checked_out": {
"type": "boolean"
},
"due_date": {
"type": "date"
}
},
"primaryKey": "title"
}
}
}'
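The single-column examples in this section write the primary key in two shapes: the curl example uses the column name as a string, and the Python dictionary example uses an object with partitionBy and partitionSort. The sketch below converts either shape to the object form, on the assumption that the two are interchangeable for a single-column key, as the examples suggest. The helper name is hypothetical.

```python
def as_primary_key_object(primary_key):
    """Return the primary key in the partitionBy/partitionSort object form."""
    if isinstance(primary_key, str):
        # Assumption: a bare column name is shorthand for a single-column key.
        return {"partitionBy": [primary_key], "partitionSort": {}}
    return {
        "partitionBy": list(primary_key["partitionBy"]),
        # partitionSort is treated as empty when it is omitted.
        "partitionSort": dict(primary_key.get("partitionSort", {})),
    }


print(as_primary_key_object("title"))
print(as_primary_key_object({"partitionBy": ["title"]}))
```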
Create a table with a composite primary key
A composite primary key is a primary key consisting of multiple columns. For more information, see Primary keys in tables.
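Every column named in a composite primary key also has to be defined in the table's columns. The following sketch runs that check on a dictionary-style definition before you send it. The helper name is hypothetical, and the check runs only on the client side.

```python
def missing_primary_key_columns(definition):
    """Return primary key columns that are not defined in "columns"."""
    columns = definition["columns"]
    partition_by = definition["primaryKey"]["partitionBy"]
    return [name for name in partition_by if name not in columns]


# Dictionary-style definition with a composite primary key
table_definition = {
    "columns": {
        "title": {"type": "text"},
        "rating": {"type": "float"},
        "due_date": {"type": "date"},
    },
    "primaryKey": {
        "partitionBy": ["title", "rating"],
        "partitionSort": {},
    },
}
print(missing_primary_key_columns(table_definition))
```

An empty list means that every primary key column is defined in the table.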
-
Python
-
TypeScript
-
Java
-
C#
-
curl
The Python client supports multiple ways to create a table.
In all cases, you must define the table schema, and then pass the definition to the create_table method.
The following example uses untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.
-
CreateTableDefinition object
-
Fluent interface
-
Dictionary
You can define the table as a CreateTableDefinition and then build the table from the CreateTableDefinition object.
from astrapy import DataAPIClient
from astrapy.info import (
ColumnType,
CreateTableDefinition,
TableKeyValuedColumnType,
TableKeyValuedColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableScalarColumnTypeDescriptor,
TableValuedColumnType,
TableValuedColumnTypeDescriptor,
)
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
table_definition = CreateTableDefinition(
# Define all of the columns in the table
columns={
"title": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
"number_of_pages": TableScalarColumnTypeDescriptor(
column_type=ColumnType.INT
),
"rating": TableScalarColumnTypeDescriptor(
column_type=ColumnType.FLOAT
),
"genres": TableValuedColumnTypeDescriptor(
column_type=TableValuedColumnType.SET,
value_type=ColumnType.TEXT,
),
"metadata": TableKeyValuedColumnTypeDescriptor(
column_type=TableKeyValuedColumnType.MAP,
key_type=ColumnType.TEXT,
value_type=ColumnType.TEXT,
),
"is_checked_out": TableScalarColumnTypeDescriptor(
column_type=ColumnType.BOOLEAN
),
"due_date": TableScalarColumnTypeDescriptor(
column_type=ColumnType.DATE
),
},
# Define the primary key for the table.
# In this case, the table uses a composite primary key.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["title", "rating"], partition_sort={}
),
)
table = database.create_table(
"example_table",
definition=table_definition,
)
You can use a fluent interface to build the table definition and then create the table from the definition.
from astrapy import DataAPIClient
from astrapy.info import ColumnType, CreateTableDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
table_definition = (
CreateTableDefinition.builder()
# Define all of the columns in the table
.add_column("title", ColumnType.TEXT)
.add_column("number_of_pages", ColumnType.INT)
.add_column("rating", ColumnType.FLOAT)
.add_set_column(
"genres",
ColumnType.TEXT,
)
.add_map_column(
"metadata",
# This is the key type for the map column
ColumnType.TEXT,
# This is the value type for the map column
ColumnType.TEXT,
)
.add_column("is_checked_out", ColumnType.BOOLEAN)
.add_column("due_date", ColumnType.DATE)
# Define the primary key for the table.
# In this case, the table uses a composite primary key.
.add_partition_by(["title", "rating"])
# Finally, build the table definition.
.build()
)
table = database.create_table(
"example_table",
definition=table_definition,
)
You can define the table as a dictionary and then build the table from the dictionary.
from astrapy import DataAPIClient
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
"title": {"type": "text"},
"number_of_pages": {"type": "int"},
"rating": {"type": "float"},
"genres": {"type": "set", "valueType": "text"},
"metadata": {
"type": "map",
"keyType": "text",
"valueType": "text",
},
"is_checked_out": {"type": "boolean"},
"due_date": {"type": "date"},
},
"primaryKey": {
"partitionBy": ["title", "rating"],
"partitionSort": {},
},
}
table = database.create_table(
"example_table",
definition=table_definition,
)
The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the ser/des configuration.
For more information, see Collection and table typing.
-
Automatic type inference
-
Manually typed tables
-
Untyped tables
The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.
To do this, first create the table definition.
Then, use InferTableSchema and InferTablePrimaryKey to infer the type of the table and of the primary key.
To create the table, provide the table definition and the inferred types to the createTable method.
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
title: "text",
number_of_pages: "int",
rating: "float",
genres: { type: "set", valueType: "text" },
metadata: {
type: "map",
keyType: "text",
valueType: "text",
},
is_checked_out: "boolean",
due_date: "date",
},
// Define the primary key for the table.
// In this case, the table uses a composite primary key.
primaryKey: {
partitionBy: ["title", "rating"],
},
});
// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
(async function () {
// Provide the types and the definition
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"example_table",
{ definition: tableDefinition },
);
})();
You can use the TableSchema type as you would any other type.
For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:
const row: TableSchema = {
title: "Wind with No Name",
number_of_pages: 193,
bad_field: "I will error",
};
You can manually define the type for your table’s schema and primary key.
To create the table, provide the table definition and the types to the createTable method.
This may be necessary if you modify the table’s default ser/des configuration.
import { DataAPIClient, DataAPIDate, Table } from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
title: "text",
number_of_pages: "int",
rating: "float",
genres: { type: "set", valueType: "text" },
metadata: {
type: "map",
keyType: "text",
valueType: "text",
},
is_checked_out: "boolean",
due_date: "date",
},
// Define the primary key for the table.
// In this case, the table uses a composite primary key.
primaryKey: {
partitionBy: ["title", "rating"],
},
});
// Manually define the type of the table's schema and primary key
type TableSchema = {
title: string;
number_of_pages?: number | null | undefined;
rating: number;
genres?: Set<string> | undefined;
metadata?: Map<string, string> | undefined;
is_checked_out?: boolean | null | undefined;
due_date?: DataAPIDate | null | undefined;
};
type TablePrimaryKey = Pick<TableSchema, "title" | "rating">;
(async function () {
// Provide the types and the definition to create the table
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"example_table",
{ definition: tableDefinition },
);
})();
You can use the TableSchema type as you would any other type.
For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:
const row: TableSchema = {
title: "Wind with No Name",
number_of_pages: 193,
bad_field: "I will error",
};
To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method.
This types the table’s rows as Record<string, any>.
This is the most flexible but least type-safe option.
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
title: "text",
number_of_pages: "int",
rating: "float",
genres: { type: "set", valueType: "text" },
metadata: {
type: "map",
keyType: "text",
valueType: "text",
},
is_checked_out: "boolean",
due_date: "date",
},
// Define the primary key for the table.
// In this case, the table uses a composite primary key.
primaryKey: {
partitionBy: ["title", "rating"],
},
});
(async function () {
// Pass SomeRow and the definition to create an untyped table
const table = await database.createTable<SomeRow>("example_table", {
definition: tableDefinition,
});
})();
The Java client supports multiple ways to create a table. In all cases, you must define the table schema.
- Use a generic type
- Define the row type
If you don’t specify the Class parameter when you create the table, the client defaults to Table<Row> as the type.
In this case, the working object type T is Row.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.definition.rows.Row;
public class Example {
public static void main(String[] args) {
// Get an existing database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
TableDefinition tableDefinition =
new TableDefinition()
// Define all of the columns in the table
.addColumnText("title")
.addColumnInt("number_of_pages")
.addColumn("rating", TableColumnTypes.FLOAT)
.addColumnSet("genres", TableColumnTypes.TEXT)
.addColumnMap("metadata", TableColumnTypes.TEXT, TableColumnTypes.TEXT)
.addColumnBoolean("is_checked_out")
.addColumn("due_date", TableColumnTypes.DATE)
// Define the primary key for the table.
// In this case, the table uses a composite primary key.
.addPartitionBy("title")
.addPartitionBy("rating");
Table<Row> table = database.createTable("example_table", tableDefinition);
}
}
Instead of using the default Row type, you can define your own working object, which is serialized as a Row.
You can annotate this working object when the field names do not exactly match the column names, or when you want to describe the table fully so that it can be created from the entity definition alone.
The following example defines a Book class and then uses it to create the table.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import java.util.Date;
import java.util.Map;
import java.util.Set;
import lombok.Data;
public class Example {
@EntityTable("example_table")
@Data
public static class Book {
@PartitionBy(0)
@Column(name = "title", type = TableColumnTypes.TEXT)
private String title;
@Column(name = "number_of_pages", type = TableColumnTypes.INT)
private Integer number_of_pages;
@PartitionBy(1)
@Column(name = "rating", type = TableColumnTypes.FLOAT)
private Float rating;
@Column(name = "genres", type = TableColumnTypes.SET, valueType = TableColumnTypes.TEXT)
private Set<String> genres;
@Column(
name = "metadata",
type = TableColumnTypes.MAP,
keyType = TableColumnTypes.TEXT,
valueType = TableColumnTypes.TEXT)
private Map<String, String> metadata;
@Column(name = "is_checked_out", type = TableColumnTypes.BOOLEAN)
private Boolean is_checked_out;
@Column(name = "due_date", type = TableColumnTypes.DATE)
private Date due_date;
}
public static void main(String[] args) {
// Get an existing database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
Table<Book> table = database.createTable(Book.class);
}
}
- Typed tables
- Untyped tables
You can manually define a client-side type for your table to help statically catch errors. For more information and examples, see Custom typing for tables.
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
// Define the type for the row
[TableName("TABLE_NAME")]
public class Book
{
[ColumnPrimaryKey(1)]
[ColumnName("title")]
public string Title { get; set; } = null!;
[ColumnName("number_of_pages")]
public int? NumberOfPages { get; set; }
[ColumnPrimaryKey(2)]
[ColumnName("rating")]
public float Rating { get; set; }
[ColumnName("genres")]
public string[]? Genres { get; set; }
[ColumnName("metadata")]
public Dictionary<string, string>? Metadata { get; set; }
[ColumnName("is_checked_out")]
public bool? IsCheckedOut { get; set; }
[ColumnName("due_date")]
public DateOnly? DueDate { get; set; }
}
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var table = await database.CreateTableAsync<Book>();
}
}
If you don’t pass a type parameter, the table remains untyped. This is a more flexible but less type-safe option.
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var definition = new TableDefinition()
.AddColumn("title", DataAPIType.Text())
.AddColumn("number_of_pages", DataAPIType.Int())
.AddColumn("rating", DataAPIType.Float())
.AddColumn("genres", DataAPIType.Set(DataAPIType.Text()))
.AddColumn(
"metadata",
DataAPIType.Map(DataAPIType.Text(), DataAPIType.Text())
)
.AddColumn("is_checked_out", DataAPIType.Boolean())
.AddColumn("due_date", DataAPIType.Date())
// Define the primary key for the table.
// In this case, the table uses a composite primary key.
.AddCompositePrimaryKey(new[] { "title", "rating" });
var table = await database.CreateTableAsync(
"TABLE_NAME",
definition
);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "example_table",
"definition": {
"columns": {
"title": {
"type": "text"
},
"number_of_pages": {
"type": "int"
},
"rating": {
"type": "float"
},
"metadata": {
"type": "map",
"keyType": "text",
"valueType": "text"
},
"genres": {
"type": "set",
"valueType": "text"
},
"is_checked_out": {
"type": "boolean"
},
"due_date": {
"type": "date"
}
},
"primaryKey": {
"partitionBy": [
"title", "rating"
]
}
}
}
}'
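If you assemble the request body programmatically, the same structure can be produced with a few lines of Python. This sketch only builds and prints the JSON payload; it does not send a request. The `build_create_table_command` function is a hypothetical helper, and the column list is shortened for illustration.

```python
import json


def build_create_table_command(name, columns, partition_by, partition_sort=None):
    """Assemble the JSON body for a createTable command."""
    primary_key = {"partitionBy": list(partition_by)}
    # Only include partitionSort when sort columns are given.
    if partition_sort:
        primary_key["partitionSort"] = dict(partition_sort)
    return {
        "createTable": {
            "name": name,
            "definition": {"columns": columns, "primaryKey": primary_key},
        }
    }


command = build_create_table_command(
    "example_table",
    columns={
        "title": {"type": "text"},
        "rating": {"type": "float"},
        "genres": {"type": "set", "valueType": "text"},
    },
    partition_by=["title", "rating"],
)

# This string has the same shape as the --data argument in the curl example.
payload = json.dumps(command, indent=2)
print(payload)
```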
Create a table with a compound primary key
A compound primary key is a primary key consisting of partition (grouping) columns and clustering (sorting) columns. For more information, see Primary keys in tables.
- Python
- TypeScript
- Java
- C#
- curl
The Python client supports multiple ways to create a table.
In all cases, you must define the table schema, and then pass the definition to the create_table method.
The following examples use untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.
- CreateTableDefinition object
- Fluent interface
- Dictionary
You can define the table as a CreateTableDefinition object and then create the table from that object.
from astrapy import DataAPIClient
from astrapy.constants import SortMode
from astrapy.info import (
ColumnType,
CreateTableDefinition,
TableKeyValuedColumnType,
TableKeyValuedColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableScalarColumnTypeDescriptor,
TableValuedColumnType,
TableValuedColumnTypeDescriptor,
)
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
table_definition = CreateTableDefinition(
# Define all of the columns in the table
columns={
"title": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
"number_of_pages": TableScalarColumnTypeDescriptor(
column_type=ColumnType.INT
),
"rating": TableScalarColumnTypeDescriptor(
column_type=ColumnType.FLOAT
),
"genres": TableValuedColumnTypeDescriptor(
column_type=TableValuedColumnType.SET,
value_type=ColumnType.TEXT,
),
"metadata": TableKeyValuedColumnTypeDescriptor(
column_type=TableKeyValuedColumnType.MAP,
key_type=ColumnType.TEXT,
value_type=ColumnType.TEXT,
),
"is_checked_out": TableScalarColumnTypeDescriptor(
column_type=ColumnType.BOOLEAN
),
"due_date": TableScalarColumnTypeDescriptor(
column_type=ColumnType.DATE
),
},
# Define the primary key for the table.
# In this case, the table uses a compound primary key.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["title", "rating"],
partition_sort={
"number_of_pages": SortMode.ASCENDING,
"is_checked_out": SortMode.DESCENDING,
},
),
)
table = database.create_table(
"example_table",
definition=table_definition,
)
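As a rough illustration of client-side typing in Python, the sketch below declares a `TypedDict` for some of the scalar columns in this table. The `BookRow` name is hypothetical, and the set, map, and date columns are left out because the Python types used for them depend on the client's serialization settings. See Typing support for how to attach such a type to a table through the `row_type` parameter.

```python
from typing import TypedDict


class BookRow(TypedDict, total=False):
    """Hypothetical row type covering a subset of the example columns."""

    title: str
    number_of_pages: int
    rating: float
    is_checked_out: bool


# A static type checker can flag misspelled keys or wrong value types here.
row: BookRow = {
    "title": "Wind with No Name",
    "number_of_pages": 193,
    "rating": 3.5,
}
print(row["title"])
```

A `TypedDict` adds no runtime validation. Its value comes from running a static type checker over the code that builds and reads rows.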
You can use a fluent interface to build the table definition and then create the table from the definition.
from astrapy import DataAPIClient
from astrapy.constants import SortMode
from astrapy.info import ColumnType, CreateTableDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
table_definition = (
CreateTableDefinition.builder()
# Define all of the columns in the table
.add_column("title", ColumnType.TEXT)
.add_column("number_of_pages", ColumnType.INT)
.add_column("rating", ColumnType.FLOAT)
.add_set_column(
"genres",
ColumnType.TEXT,
)
.add_map_column(
"metadata",
# This is the key type for the map column
ColumnType.TEXT,
# This is the value type for the map column
ColumnType.TEXT,
)
.add_column("is_checked_out", ColumnType.BOOLEAN)
.add_column("due_date", ColumnType.DATE)
# Define the primary key for the table.
# In this case, the table uses a compound primary key.
.add_partition_by(["title", "rating"])
.add_partition_sort(
{
"number_of_pages": SortMode.ASCENDING,
"is_checked_out": SortMode.DESCENDING,
}
)
# Finally, build the table definition.
.build()
)
table = database.create_table(
"example_table",
definition=table_definition,
)
You can define the table as a dictionary and then create the table from that dictionary.
from astrapy import DataAPIClient
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
"title": {"type": "text"},
"number_of_pages": {"type": "int"},
"rating": {"type": "float"},
"genres": {"type": "set", "valueType": "text"},
"metadata": {
"type": "map",
"keyType": "text",
"valueType": "text",
},
"is_checked_out": {"type": "boolean"},
"due_date": {"type": "date"},
},
"primaryKey": {
"partitionBy": ["title", "rating"],
"partitionSort": {"number_of_pages": 1, "is_checked_out": -1},
},
}
table = database.create_table(
"example_table",
definition=table_definition,
)
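To make the effect of the sort columns concrete, the following sketch orders a few in-memory rows the way the `partitionSort` mapping above describes: by `number_of_pages` ascending, then by `is_checked_out` descending. It illustrates the resulting order only, under the assumption that all of the rows share one partition. It is not how the database implements sorting, and `order_within_partition` is a hypothetical helper.

```python
partition_sort = {"number_of_pages": 1, "is_checked_out": -1}

# All rows are assumed to belong to the same partition.
rows = [
    {"number_of_pages": 300, "is_checked_out": False},
    {"number_of_pages": 193, "is_checked_out": False},
    {"number_of_pages": 193, "is_checked_out": True},
]


def order_within_partition(rows, partition_sort):
    """Order rows by each sort column; 1 is ascending and -1 is descending."""
    ordered = list(rows)
    # Apply the sort columns from last to first. Python's sort is stable,
    # so the first column in the mapping ends up taking precedence.
    for column, direction in reversed(list(partition_sort.items())):
        ordered.sort(key=lambda row: row[column], reverse=(direction == -1))
    return ordered


ordered = order_within_partition(rows, partition_sort)
print([(r["number_of_pages"], r["is_checked_out"]) for r in ordered])
# [(193, True), (193, False), (300, False)]
```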
The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the serialization/deserialization (ser/des) configuration.
For more information, see Collection and table typing.
- Automatic type inference
- Manually typed tables
- Untyped tables
The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.
To do this, first create the table definition.
Then, use InferTableSchema and InferTablePrimaryKey to infer the type of the table and of the primary key.
To create the table, provide the table definition and the inferred types to the createTable method.
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
title: "text",
number_of_pages: "int",
rating: "float",
genres: { type: "set", valueType: "text" },
metadata: {
type: "map",
keyType: "text",
valueType: "text",
},
is_checked_out: "boolean",
due_date: "date",
},
// Define the primary key for the table.
// In this case, the table uses a compound primary key.
primaryKey: {
partitionBy: ["title", "rating"],
partitionSort: { number_of_pages: 1, is_checked_out: -1 },
},
});
// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
(async function () {
// Provide the types and the definition
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"example_table",
{ definition: tableDefinition },
);
})();
You can use the TableSchema type as you would any other type.
For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:
const row: TableSchema = {
title: "Wind with No Name",
number_of_pages: 193,
bad_field: "I will error",
};
You can manually define the type for your table’s schema and primary key.
To create the table, provide the table definition and the types to the createTable method.
This may be necessary if you modify the table’s default ser/des configuration.
import { DataAPIClient, DataAPIDate, Table } from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
title: "text",
number_of_pages: "int",
rating: "float",
genres: { type: "set", valueType: "text" },
metadata: {
type: "map",
keyType: "text",
valueType: "text",
},
is_checked_out: "boolean",
due_date: "date",
},
// Define the primary key for the table.
// In this case, the table uses a compound primary key.
primaryKey: {
partitionBy: ["title", "rating"],
partitionSort: { number_of_pages: 1, is_checked_out: -1 },
},
});
// Manually define the type of the table's schema and primary key
type TableSchema = {
title: string;
number_of_pages: number;
rating: number;
genres?: Set<string> | undefined;
metadata?: Map<string, string> | undefined;
is_checked_out: boolean;
due_date?: DataAPIDate | null | undefined;
};
// The primary key includes both the partition columns and the sort columns
type TablePrimaryKey = Pick<
TableSchema,
"title" | "rating" | "number_of_pages" | "is_checked_out"
>;
(async function () {
// Provide the types and the definition to create the table
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"example_table",
{ definition: tableDefinition },
);
})();
You can use the TableSchema type as you would any other type.
For example, this gives a type error since the TableSchema type from the previous example does not include bad_field:
const row: TableSchema = {
title: "Wind with No Name",
number_of_pages: 193,
bad_field: "I will error",
};
To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method.
This types the table’s rows as Record<string, any>.
This is the most flexible but least type-safe option.
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
title: "text",
number_of_pages: "int",
rating: "float",
genres: { type: "set", valueType: "text" },
metadata: {
type: "map",
keyType: "text",
valueType: "text",
},
is_checked_out: "boolean",
due_date: "date",
},
// Define the primary key for the table.
// In this case, the table uses a compound primary key.
primaryKey: {
partitionBy: ["title", "rating"],
partitionSort: { number_of_pages: 1, is_checked_out: -1 },
},
});
(async function () {
// Pass SomeRow and the definition to create an untyped table
const table = await database.createTable<SomeRow>("example_table", {
definition: tableDefinition,
});
})();
The Java client supports multiple ways to create a table. In all cases, you must define the table schema.
- Use a generic type
- Define the row type
If you don’t specify the Class parameter when you create the table, the client defaults to Table<Row> as the type.
In this case, the working object type T is Row.
import static com.datastax.astra.client.core.query.Sort.ascending;
import static com.datastax.astra.client.core.query.Sort.descending;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.definition.rows.Row;
public class Example {
public static void main(String[] args) {
// Get an existing database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
TableDefinition tableDefinition =
new TableDefinition()
// Define all of the columns in the table
.addColumnText("title")
.addColumnInt("number_of_pages")
.addColumn("rating", TableColumnTypes.FLOAT)
.addColumnSet("genres", TableColumnTypes.TEXT)
.addColumnMap("metadata", TableColumnTypes.TEXT, TableColumnTypes.TEXT)
.addColumnBoolean("is_checked_out")
.addColumn("due_date", TableColumnTypes.DATE)
// Define the primary key for the table.
// In this case, the table uses a compound primary key.
.addPartitionBy("title")
.addPartitionBy("rating")
.addPartitionSort(ascending("number_of_pages"))
.addPartitionSort(descending("is_checked_out"));
Table<Row> table = database.createTable("example_table", tableDefinition);
}
}
Instead of using the default Row type, you can define your own working object, which is serialized as a Row.
You can annotate this working object when the field names do not exactly match the column names, or when you want to describe the table fully so that it can be created from the entity definition alone.
The following example defines a Book class and then uses it to create the table.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.query.SortOrder;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import com.datastax.astra.client.tables.mapping.PartitionSort;
import java.util.Date;
import java.util.Map;
import java.util.Set;
import lombok.Data;
public class Example {
@EntityTable("example_table")
@Data
public static class Book {
@PartitionBy(0)
@Column(name = "title", type = TableColumnTypes.TEXT)
private String title;
@PartitionSort(position = 0, order = SortOrder.ASCENDING)
@Column(name = "number_of_pages", type = TableColumnTypes.INT)
private Integer number_of_pages;
@PartitionBy(1)
@Column(name = "rating", type = TableColumnTypes.FLOAT)
private Float rating;
@Column(name = "genres", type = TableColumnTypes.SET, valueType = TableColumnTypes.TEXT)
private Set<String> genres;
@Column(
name = "metadata",
type = TableColumnTypes.MAP,
keyType = TableColumnTypes.TEXT,
valueType = TableColumnTypes.TEXT)
private Map<String, String> metadata;
@PartitionSort(position = 1, order = SortOrder.DESCENDING)
@Column(name = "is_checked_out", type = TableColumnTypes.BOOLEAN)
private Boolean is_checked_out;
@Column(name = "due_date", type = TableColumnTypes.DATE)
private Date due_date;
}
public static void main(String[] args) {
// Get an existing database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
Table<Book> table = database.createTable(Book.class);
}
}
- Typed tables
- Untyped tables
You can manually define a client-side type for your table to help statically catch errors. For more information and examples, see Custom typing for tables.
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
// Define the type for the row
[TableName("TABLE_NAME")]
public class Book
{
[ColumnPrimaryKey(1)]
[ColumnName("title")]
public string Title { get; set; } = null!;
[ColumnPrimaryKeySort(1, SortDirection.Ascending)]
[ColumnName("number_of_pages")]
public int NumberOfPages { get; set; }
[ColumnPrimaryKey(2)]
[ColumnName("rating")]
public float Rating { get; set; }
[ColumnName("genres")]
public string[]? Genres { get; set; }
[ColumnName("metadata")]
public Dictionary<string, string>? Metadata { get; set; }
[ColumnPrimaryKeySort(2, SortDirection.Descending)]
[ColumnName("is_checked_out")]
public bool IsCheckedOut { get; set; }
[ColumnName("due_date")]
public DateOnly? DueDate { get; set; }
}
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var table = await database.CreateTableAsync<Book>();
}
}
If you don’t pass a type parameter, the table remains untyped. This is a more flexible but less type-safe option.
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var definition = new TableDefinition()
.AddColumn("title", DataAPIType.Text())
.AddColumn("number_of_pages", DataAPIType.Int())
.AddColumn("rating", DataAPIType.Float())
.AddColumn("genres", DataAPIType.Set(DataAPIType.Text()))
.AddColumn(
"metadata",
DataAPIType.Map(DataAPIType.Text(), DataAPIType.Text())
)
.AddColumn("is_checked_out", DataAPIType.Boolean())
.AddColumn("due_date", DataAPIType.Date())
// Define the primary key for the table.
// In this case, the table uses a compound primary key.
.AddCompoundPrimaryKey(
new[] { "title", "rating" },
new[]
{
new PrimaryKeySort("number_of_pages", SortDirection.Ascending),
new PrimaryKeySort("is_checked_out", SortDirection.Descending),
}
);
var table = await database.CreateTableAsync(
"TABLE_NAME",
definition
);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "example_table",
"definition": {
"columns": {
"title": {
"type": "text"
},
"number_of_pages": {
"type": "int"
},
"rating": {
"type": "float"
},
"metadata": {
"type": "map",
"keyType": "text",
"valueType": "text"
},
"genres": {
"type": "set",
"valueType": "text"
},
"is_checked_out": {
"type": "boolean"
},
"due_date": {
"type": "date"
}
},
"primaryKey": {
"partitionBy": [
"title",
"rating"
],
"partitionSort": {
"number_of_pages": 1,
"is_checked_out": -1
}
}
}
}
}'
Create a table with a column to store vector embeddings
If you want to store pre-generated vector embeddings in a table, create a table with a vector column. A table can include more than one vector column.
- Python
- TypeScript
- Java
- C#
- curl
The Python client supports multiple ways to create a table.
In all cases, you must define the table schema, and then pass the definition to the create_table method.
The following examples use untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.
- CreateTableDefinition object
- Fluent interface
- Dictionary
You can define the table as a CreateTableDefinition object and then create the table from that object.
from astrapy import DataAPIClient
from astrapy.info import (
ColumnType,
CreateTableDefinition,
TablePrimaryKeyDescriptor,
TableScalarColumnTypeDescriptor,
TableVectorColumnTypeDescriptor,
)
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
table_definition = CreateTableDefinition(
# Define all of the columns in the table
columns={
"example_vector": TableVectorColumnTypeDescriptor(
dimension=1024,
),
"example_non_vector": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
# Define the primary key for the table.
# In this case, the table uses a single-column primary key.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["example_non_vector"], partition_sort={}
),
)
table = database.create_table(
"example_table",
definition=table_definition,
)
You can use a fluent interface to build the table definition and then create the table from the definition.
from astrapy import DataAPIClient
from astrapy.info import ColumnType, CreateTableDefinition
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
table_definition = (
CreateTableDefinition.builder()
# Define all of the columns in the table
.add_vector_column("example_vector", dimension=1024)
.add_column("example_non_vector", ColumnType.TEXT)
# Define the primary key for the table.
# In this case, the table uses a single-column primary key.
.add_partition_by(["example_non_vector"])
# Finally, build the table definition.
.build()
)
table = database.create_table(
"example_table",
definition=table_definition,
)
You can define the table as a dictionary and then create the table from that dictionary.
from astrapy import DataAPIClient
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
"example_vector": {"type": "vector", "dimension": 1024},
"example_non_vector": {"type": "text"},
},
"primaryKey": {
"partitionBy": ["example_non_vector"],
"partitionSort": {},
},
}
table = database.create_table(
"example_table",
definition=table_definition,
)
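Embeddings written to a vector column are expected to match the dimension declared for that column, so it can help to check the length of an embedding on the client side first. The following is a minimal sketch that assumes the dictionary definition shown above; `check_vector_dimension` is a hypothetical helper and not part of astrapy.

```python
def check_vector_dimension(definition, column, vector):
    """Raise ValueError if the vector length differs from the declared dimension."""
    declared = definition["columns"][column]["dimension"]
    if len(vector) != declared:
        raise ValueError(
            f"{column!r} expects {declared} values, got {len(vector)}"
        )
    return vector


table_definition = {
    "columns": {
        "example_vector": {"type": "vector", "dimension": 1024},
        "example_non_vector": {"type": "text"},
    },
    "primaryKey": {
        "partitionBy": ["example_non_vector"],
        "partitionSort": {},
    },
}

# A placeholder embedding with the declared number of values.
embedding = [0.0] * 1024
check_vector_dimension(table_definition, "example_vector", embedding)
```

The check returns the vector unchanged when the length matches, so it can wrap the value at the point where a row is built.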
The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the serialization/deserialization (ser/des) configuration.
For more information, see Collection and table typing.
- Automatic type inference
- Manually typed tables
- Untyped tables
The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.
To do this, first create the table definition.
Then, use InferTableSchema and InferTablePrimaryKey to infer the type of the table and of the primary key.
To create the table, provide the table definition and the inferred types to the createTable method.
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
example_vector: { type: "vector", dimension: 1024 },
example_non_vector: "text",
},
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
primaryKey: {
partitionBy: ["example_non_vector"],
},
});
// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
(async function () {
// Provide the types and the definition
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"example_table",
{ definition: tableDefinition },
);
})();
You can manually define the type for your table’s schema and primary key.
To create the table, provide the table definition and the types to the createTable method.
This may be necessary if you modify the table’s default ser/des configuration.
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
example_vector: { type: "vector", dimension: 1024 },
example_non_vector: "text",
},
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
primaryKey: {
partitionBy: ["example_non_vector"],
},
});
// Manually define the type of the table's schema and primary key
type TableSchema = {
example_vector: DataAPIVector;
example_non_vector: string;
};
type TablePrimaryKey = Pick<TableSchema, "example_non_vector">;
(async function () {
// Provide the types and the definition to create the table
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"example_table",
{ definition: tableDefinition },
);
})();
To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method.
This types the table’s rows as Record<string, any>.
This is the most flexible but least type-safe option.
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
example_vector: { type: "vector", dimension: 1024 },
example_non_vector: "text",
},
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
primaryKey: {
partitionBy: ["example_non_vector"],
},
});
(async function () {
// Pass SomeRow as the only type parameter and provide the definition to create the table
const table = await database.createTable<SomeRow>("example_table", {
definition: tableDefinition,
});
})();
The Java client supports multiple ways to create a table. In all cases, you must define the table schema.
-
Use a generic type
-
Define the row type
If you don’t specify the rowClass parameter when you create the table, the client types the table as Table<Row>.
In this case, the working object type T is Row.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
public class Example {
public static void main(String[] args) {
// Get an existing database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
TableDefinition tableDefinition =
new TableDefinition()
// Define all of the columns in the table
.addColumnVector("example_vector", new TableColumnDefinitionVector().dimension(1024))
.addColumnText("example_non_vector")
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
.addPartitionBy("example_non_vector");
Table<Row> table = database.createTable("example_table", tableDefinition);
}
}
Instead of using the default Row type, you can define your own working object, which is serialized as a Row.
This working object can be annotated when the field names do not exactly match the column names or when you want to fully describe your table to enable its creation solely from the entity definition.
The following example defines a Book class and then uses it to create the table.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;
public class Example {
@EntityTable("example_table")
@Data
public static class Book {
@ColumnVector(name = "example_vector", dimension = 1024)
private DataAPIVector vector;
@PartitionBy(0)
@Column(name = "example_non_vector", type = TableColumnTypes.TEXT)
private String exampleNonVector;
}
public static void main(String[] args) {
// Get an existing database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
Table<Book> table = database.createTable(Book.class);
}
}
-
Typed tables
-
Untyped tables
You can manually define a client-side type for your table to help statically catch errors. For more information and examples, see Custom typing for tables.
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
// Define the type for the row
[TableName("TABLE_NAME")]
public class ExampleRow
{
[ColumnPrimaryKey]
[ColumnName("example_non_vector")]
public string ExampleNonVector { get; set; } = null!;
[ColumnVector(1024)]
[ColumnName("example_vector")]
public float[]? ExampleVector { get; set; }
}
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var table = await database.CreateTableAsync<ExampleRow>();
}
}
If you don’t pass a type parameter, the table remains untyped. This is a more flexible but less type-safe option.
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var definition = new TableDefinition()
.AddColumn("example_vector", DataAPIType.Vector(1024))
.AddColumn("example_non_vector", DataAPIType.Text())
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
.AddSinglePrimaryKey("example_non_vector");
var table = await database.CreateTableAsync(
"TABLE_NAME",
definition
);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "example_table",
"definition": {
"columns": {
"example_vector": {
"type": "vector",
"dimension": 1024
},
"example_non_vector": {
"type": "text"
}
},
"primaryKey": "example_non_vector"
}
}
}'
Create a table with a column to automatically generate vector embeddings
If you want to automatically generate vector embeddings, create a table with a vector column and configure an embedding provider integration for the column.
The configuration depends on the embedding provider.
You can also configure an embedding provider integration after table creation. For more information, see Alter a table.
If you want to store the original text in addition to the vector embeddings that were generated from the text, then you need to create a separate column to store the text.
You can configure a different embedding provider for each vector column in the table. If you want to use the same embedding provider for all vector columns in the table, you must still configure the embedding provider for each vector column.
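The per-column requirement above can be sketched in plain Python. This is a minimal illustration, not the client's own API: the helper name vector_column, the column names, and the 768 dimension are hypothetical, and the camelCase modelName key is assumed from the Data API's JSON naming. It builds a dictionary-style table definition in which two vector columns each carry their own copy of the same provider settings.

```python
import copy


def vector_column(dimension, service):
    """Return a vector column definition with its own copy of the service settings."""
    return {
        "type": "vector",
        "dimension": dimension,
        # Each vector column needs its own service block, even when the
        # provider settings are identical across columns.
        "service": copy.deepcopy(service),
    }


# Provider settings reused for every vector column (placeholder values).
shared_service = {
    "provider": "jinaAI",
    "modelName": "jina-embeddings-v2-base-en",
    "authentication": {"providerKey": "API_KEY_NAME"},
}

table_definition = {
    "columns": {
        "title_vector": vector_column(768, shared_service),
        "summary_vector": vector_column(768, shared_service),
        # Plain text column that also serves as the primary key.
        "title": {"type": "text"},
    },
    "primaryKey": {"partitionBy": ["title"], "partitionSort": {}},
}
```

You could then pass table_definition as the definition argument of create_table, as in the dictionary examples in this section.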
-
Python
-
TypeScript
-
Java
-
C#
-
curl
The following example uses untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
-
TableDefinition object
-
Fluent interface
-
Dictionary
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CreateTableDefinition,
ColumnType,
TableScalarColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableVectorColumnTypeDescriptor,
VectorServiceOptions
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
columns={
# This column will store vector embeddings.
# The Azure OpenAI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="azureOpenAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"resourceName": "RESOURCE_NAME",
"deploymentId": "DEPLOYMENT_ID",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
# You should change the primary key definition to meet the needs of your data.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["TEXT_COLUMN_NAME"],
partition_sort={}
),
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = (
CreateTableDefinition.builder()
# This column will store vector embeddings.
# The Azure OpenAI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
.add_vector_column("VECTOR_COLUMN_NAME",
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="azureOpenAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"resourceName": "RESOURCE_NAME",
"deploymentId": "DEPLOYMENT_ID",
},
),
)
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
.add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
# You should change the primary key definition to meet the needs of your data.
.add_partition_by(["TEXT_COLUMN_NAME"])
# Finally, build the table definition.
.build()
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
# This column will store vector embeddings.
# The Azure OpenAI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "azureOpenAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME",
},
"parameters": {
"resourceName": "RESOURCE_NAME",
"deploymentId": "DEPLOYMENT_ID",
},
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": {"type": "text"},
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": {
"partitionBy": ["TEXT_COLUMN_NAME"],
"partitionSort": {},
},
}
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
  For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
- DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
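The two authentication options described for API_KEY_NAME differ only in whether the column's service settings name a stored key. As a hedged sketch in plain Python, the following builds the service settings with or without the authentication entry. The helper name azure_openai_service and the camelCase modelName key are assumptions for illustration and are not part of the client.

```python
def azure_openai_service(model_name, resource_name, deployment_id, api_key_name=None):
    """Build the service settings for an Azure OpenAI vector column."""
    service = {
        "provider": "azureOpenAI",
        "modelName": model_name,
        "parameters": {
            "resourceName": resource_name,
            "deploymentId": deployment_id,
        },
    }
    # Name a key stored in the Astra Portal only when one is given.
    # Without it, the key must be sent in the x-embedding-api-key header.
    if api_key_name is not None:
        service["authentication"] = {"providerKey": api_key_name}
    return service


# Option 1: reference a stored key by name.
with_stored_key = azure_openai_service(
    "text-embedding-3-small",
    "RESOURCE_NAME",
    "DEPLOYMENT_ID",
    api_key_name="API_KEY_NAME",
)

# Option 2: omit the key name and rely on header authentication.
with_header_auth = azure_openai_service(
    "text-embedding-3-small",
    "RESOURCE_NAME",
    "DEPLOYMENT_ID",
)
```

With the second form, supply the key through the embedding_api_key parameter when you create or get the table, so that every vectorize request carries the header.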
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
-
TableDefinition object
-
Fluent interface
-
Dictionary
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CreateTableDefinition,
ColumnType,
TableScalarColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableVectorColumnTypeDescriptor,
VectorServiceOptions
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
columns={
# This column will store vector embeddings.
# The Hugging Face Dedicated integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="huggingfaceDedicated",
model_name="endpoint-defined-model",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"endpointName": "ENDPOINT_NAME",
"regionName": "REGION",
"cloudName": "CLOUD_PROVIDER",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
# You should change the primary key definition to meet the needs of your data.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["TEXT_COLUMN_NAME"],
partition_sort={}
),
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = (
CreateTableDefinition.builder()
# This column will store vector embeddings.
# The Hugging Face Dedicated integration
# will automatically generate vector embeddings
# for any text inserted to this column.
.add_vector_column("VECTOR_COLUMN_NAME",
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="huggingfaceDedicated",
model_name="endpoint-defined-model",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"endpointName": "ENDPOINT_NAME",
"regionName": "REGION",
"cloudName": "CLOUD_PROVIDER",
},
),
)
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
.add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
# You should change the primary key definition to meet the needs of your data.
.add_partition_by(["TEXT_COLUMN_NAME"])
# Finally, build the table definition.
.build()
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
# This column will store vector embeddings.
# The Hugging Face Dedicated integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "huggingfaceDedicated",
"modelName": "endpoint-defined-model",
"authentication": {
"providerKey": "API_KEY_NAME",
},
"parameters": {
"endpointName": "ENDPOINT_NAME",
"regionName": "REGION",
"cloudName": "CLOUD_PROVIDER",
},
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": {"type": "text"},
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": {
"partitionBy": ["TEXT_COLUMN_NAME"],
"partitionSort": {},
},
}
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.
  For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.
  You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- ENDPOINT_NAME: The programmatically generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
- REGION: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
- CLOUD_PROVIDER: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
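The three endpoint values above all appear in the endpoint URL, so they can be read from it rather than typed by hand. The following is a minimal sketch using only the Python standard library. The helper name parse_hf_dedicated_endpoint is hypothetical, and the sketch assumes every endpoint host follows the same six-part pattern as the example URL above.

```python
from urllib.parse import urlparse


def parse_hf_dedicated_endpoint(url):
    """Split a Hugging Face Dedicated endpoint URL into the service parameters."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    # Assumed host shape: <endpoint>.<region>.<cloud>.endpoints.huggingface.cloud
    if len(parts) != 6 or parts[3:] != ["endpoints", "huggingface", "cloud"]:
        raise ValueError(f"Unexpected endpoint URL: {url!r}")
    endpoint_name, region_name, cloud_name = parts[:3]
    return {
        "endpointName": endpoint_name,
        "regionName": region_name,
        "cloudName": cloud_name,
    }


params = parse_hf_dedicated_endpoint(
    "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud"
)
print(params)
# {'endpointName': 'mtp1x7muf6qyn3yh', 'regionName': 'us-east-2', 'cloudName': 'aws'}
```

The returned dictionary has the same keys as the parameters argument in the examples above, so you could pass it there directly.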
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
-
TableDefinition object
-
Fluent interface
-
Dictionary
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CreateTableDefinition,
ColumnType,
TableScalarColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableVectorColumnTypeDescriptor,
VectorServiceOptions
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
columns={
# This column will store vector embeddings.
# The Hugging Face Serverless integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="huggingface",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
# You should change the primary key definition to meet the needs of your data.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["TEXT_COLUMN_NAME"],
partition_sort={}
),
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = (
CreateTableDefinition.builder()
# This column will store vector embeddings.
# The Hugging Face Serverless integration
# will automatically generate vector embeddings
# for any text inserted to this column.
.add_vector_column("VECTOR_COLUMN_NAME",
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="huggingface",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
)
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
.add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
# You should change the primary key definition to meet the needs of your data.
.add_partition_by(["TEXT_COLUMN_NAME"])
# Finally, build the table definition.
.build()
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
# This column will store vector embeddings.
# The Hugging Face Serverless integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "huggingface",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME",
},
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": {"type": "text"},
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": {
"partitionBy": ["TEXT_COLUMN_NAME"],
"partitionSort": {},
},
}
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
-
TableDefinition object
-
Fluent interface
-
Dictionary
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CreateTableDefinition,
ColumnType,
TableScalarColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableVectorColumnTypeDescriptor,
VectorServiceOptions
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
columns={
# This column will store vector embeddings.
# The Jina AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="jinaAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
# You should change the primary key definition to meet the needs of your data.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["TEXT_COLUMN_NAME"],
partition_sort={}
),
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = (
CreateTableDefinition.builder()
# This column will store vector embeddings.
# The Jina AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
.add_vector_column("VECTOR_COLUMN_NAME",
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="jinaAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
)
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
.add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
# You should change the primary key definition to meet the needs of your data.
.add_partition_by(["TEXT_COLUMN_NAME"])
# Finally, build the table definition.
.build()
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
# This column will store vector embeddings.
# The Jina AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "jinaAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME",
},
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": {"type": "text"},
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": {
"partitionBy": ["TEXT_COLUMN_NAME"],
"partitionSort": {},
},
}
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
- TableDefinition object
- Fluent interface
- Dictionary
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CreateTableDefinition,
ColumnType,
TableScalarColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableVectorColumnTypeDescriptor,
VectorServiceOptions
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
columns={
# This column will store vector embeddings.
# The Mistral AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="mistral",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
# You should change the primary key definition to meet the needs of your data.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["TEXT_COLUMN_NAME"],
partition_sort={}
),
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = (
CreateTableDefinition.builder()
# This column will store vector embeddings.
# The Mistral AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
.add_vector_column("VECTOR_COLUMN_NAME",
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="mistral",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
)
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
.add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
# You should change the primary key definition to meet the needs of your data.
.add_partition_by(["TEXT_COLUMN_NAME"])
# Finally, build the table definition.
.build()
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
# This column will store vector embeddings.
# The Mistral AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "mistral",
"model_name": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME",
},
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": {"type": "text"},
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": {
"partitionBy": ["TEXT_COLUMN_NAME"],
"partitionSort": {},
},
}
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client sends the x-embedding-api-key header with the specified key in any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
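To look up the models and dimensions mentioned above before you choose MODEL_NAME and MODEL_DIMENSIONS, you can query the Data API through the client. The following is a sketch under assumptions: reported_dimensions is a hypothetical helper, and it assumes that your astrapy version exposes find_embedding_providers() on the database admin object and that each returned model has name and vector_dimension attributes. Verify these names against your installed version.

```python
from typing import Any


def reported_dimensions(database: Any, provider: str) -> dict[str, Any]:
    """Map each model name of one provider to the dimension the API reports for it.

    The value may be None when the API reports no single dimension for a model,
    for example when the dimension is configurable through a model parameter.
    """
    # Assumption: the admin object offers find_embedding_providers().
    result = database.get_database_admin().find_embedding_providers()
    provider_info = result.embedding_providers[provider]
    return {model.name: model.vector_dimension for model in provider_info.models}
```

With a connected database object, reported_dimensions(database, "mistral") would return one entry per model that the provider offers.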
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider. Your database must be in a supported region.
- TableDefinition object
- Fluent interface
- Dictionary
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CreateTableDefinition,
ColumnType,
TableScalarColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableVectorColumnTypeDescriptor,
VectorServiceOptions
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
columns={
# This column will store vector embeddings.
# The NVIDIA integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
service=VectorServiceOptions(
provider="nvidia",
model_name="nvidia/nv-embedqa-e5-v5",
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
# You should change the primary key definition to meet the needs of your data.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["TEXT_COLUMN_NAME"],
partition_sort={}
),
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = (
CreateTableDefinition.builder()
# This column will store vector embeddings.
# The NVIDIA integration
# will automatically generate vector embeddings
# for any text inserted to this column.
.add_vector_column("VECTOR_COLUMN_NAME",
service=VectorServiceOptions(
provider="nvidia",
model_name="nvidia/nv-embedqa-e5-v5",
),
)
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
.add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
# You should change the primary key definition to meet the needs of your data.
.add_partition_by(["TEXT_COLUMN_NAME"])
# Finally, build the table definition.
.build()
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
# This column will store vector embeddings.
# The NVIDIA integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"service": {
"provider": "nvidia",
"model_name": "nvidia/nv-embedqa-e5-v5",
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": {"type": "text"},
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": {
"partitionBy": ["TEXT_COLUMN_NAME"],
"partitionSort": {},
},
}
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
- TableDefinition object
- Fluent interface
- Dictionary
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CreateTableDefinition,
ColumnType,
TableScalarColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableVectorColumnTypeDescriptor,
VectorServiceOptions
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
columns={
# This column will store vector embeddings.
# The OpenAI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="openai",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"organizationId": "ORGANIZATION_ID",
"projectId": "PROJECT_ID",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
# You should change the primary key definition to meet the needs of your data.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["TEXT_COLUMN_NAME"],
partition_sort={}
),
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = (
CreateTableDefinition.builder()
# This column will store vector embeddings.
# The OpenAI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
.add_vector_column("VECTOR_COLUMN_NAME",
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="openai",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"organizationId": "ORGANIZATION_ID",
"projectId": "PROJECT_ID",
},
),
)
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
.add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
# You should change the primary key definition to meet the needs of your data.
.add_partition_by(["TEXT_COLUMN_NAME"])
# Finally, build the table definition.
.build()
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
# This column will store vector embeddings.
# The OpenAI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "openai",
"model_name": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME",
},
"parameters": {
"organizationId": "ORGANIZATION_ID",
"projectId": "PROJECT_ID",
},
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": {"type": "text"},
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": {
"partitionBy": ["TEXT_COLUMN_NAME"],
"partitionSort": {},
},
}
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client sends the x-embedding-api-key header with the specified key in any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
- PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
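Because ORGANIZATION_ID and PROJECT_ID are optional, you might prefer to build the parameters mapping so that it contains only the values you actually have. The helper below is a minimal sketch: openai_parameters is a hypothetical name and is not part of astrapy. Only the organizationId and projectId keys come from the examples above.

```python
from typing import Optional


def openai_parameters(
    organization_id: Optional[str] = None,
    project_id: Optional[str] = None,
) -> dict[str, str]:
    """Build the "parameters" mapping, leaving out any ID that was not supplied."""
    candidates = {
        "organizationId": organization_id,
        "projectId": project_id,
    }
    # Keep only the IDs that were actually provided.
    return {key: value for key, value in candidates.items() if value is not None}
```

You could then pass the result as the parameters value in any of the three definition styles shown above. If the mapping is empty, the safer choice is probably to leave the parameters entry out of the definition altogether.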
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
- TableDefinition object
- Fluent interface
- Dictionary
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CreateTableDefinition,
ColumnType,
TableScalarColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableVectorColumnTypeDescriptor,
VectorServiceOptions
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
columns={
# This column will store vector embeddings.
# The Upstage integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="upstageAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
# You should change the primary key definition to meet the needs of your data.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["TEXT_COLUMN_NAME"],
partition_sort={}
),
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = (
CreateTableDefinition.builder()
# This column will store vector embeddings.
# The Upstage integration
# will automatically generate vector embeddings
# for any text inserted to this column.
.add_vector_column("VECTOR_COLUMN_NAME",
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="upstageAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
)
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
.add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
# You should change the primary key definition to meet the needs of your data.
.add_partition_by(["TEXT_COLUMN_NAME"])
# Finally, build the table definition.
.build()
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
# This column will store vector embeddings.
# The Upstage integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "upstageAI",
"model_name": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME",
},
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": {"type": "text"},
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": {
"partitionBy": ["TEXT_COLUMN_NAME"],
"partitionSort": {},
},
}
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client sends the x-embedding-api-key header with the specified key in any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
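The examples above partition by the text column alone and leave partitionSort empty, with a comment advising you to change the primary key to fit your data. As a hedged illustration of what such a change might look like in the dictionary style, the sketch below builds a composite primary key. build_primary_key and the column names in the usage note are hypothetical, and the sketch assumes the Data API convention that 1 means ascending and -1 means descending in partitionSort.

```python
from typing import Optional


def build_primary_key(
    partition_by: list[str],
    clustering: Optional[dict[str, bool]] = None,
) -> dict[str, object]:
    """Build a "primaryKey" mapping for a dictionary table definition.

    clustering maps each clustering column to True (ascending) or False (descending).
    """
    partition_sort = {
        # Assumption: 1 sorts ascending and -1 sorts descending.
        column: (1 if ascending else -1)
        for column, ascending in (clustering or {}).items()
    }
    return {"partitionBy": list(partition_by), "partitionSort": partition_sort}
```

For example, build_primary_key(["document_id"], {"chunk_number": True}) groups rows by a hypothetical document_id column and orders them by chunk_number within each partition. Every column named in the key must also be declared in the columns section of the definition.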
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
- TableDefinition object
- Fluent interface
- Dictionary
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
CreateTableDefinition,
ColumnType,
TableScalarColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableVectorColumnTypeDescriptor,
VectorServiceOptions
)
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
columns={
# This column will store vector embeddings.
# The Voyage AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="voyageAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
# You should change the primary key definition to meet the needs of your data.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["TEXT_COLUMN_NAME"],
partition_sort={}
),
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CreateTableDefinition, ColumnType, VectorServiceOptions
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = (
CreateTableDefinition.builder()
# This column will store vector embeddings.
# The Voyage AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
.add_vector_column("VECTOR_COLUMN_NAME",
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="voyageAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
)
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
.add_column("TEXT_COLUMN_NAME", ColumnType.TEXT)
# You should change the primary key definition to meet the needs of your data.
.add_partition_by(["TEXT_COLUMN_NAME"])
# Finally, build the table definition.
.build()
)
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
from astrapy import DataAPIClient
# Instantiate the client
client = DataAPIClient()
# Connect to a database
database = client.get_database(
"API_ENDPOINT",
token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
# This column will store vector embeddings.
# The Voyage AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "voyageAI",
"model_name": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME",
},
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": {"type": "text"},
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": {
"partitionBy": ["TEXT_COLUMN_NAME"],
"partitionSort": {},
},
}
# Create the table
table = database.create_table(
"TABLE_NAME",
definition=table_definition,
)
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client sends the x-embedding-api-key header with the specified key in any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
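Since the dimension can be omitted, as the NVIDIA examples earlier in this section do, one way to handle both cases in the dictionary style is to add the dimension entry only when you have a value for it. This is a minimal sketch: vector_column is a hypothetical helper, not an astrapy function, and the service mapping is whatever you would otherwise write under "service" in the examples above.

```python
from typing import Any, Optional


def vector_column(
    service: dict[str, Any],
    dimension: Optional[int] = None,
) -> dict[str, Any]:
    """Build a vector column entry, including "dimension" only when one is given."""
    column: dict[str, Any] = {"type": "vector", "service": service}
    if dimension is not None:
        # An explicit dimension must be one that the chosen model supports.
        column["dimension"] = dimension
    return column
```

When no dimension is passed, the entry relies on the default dimension for the model. As noted above, some models have no default, in which case the request would be expected to fail.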
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
- Automatic type inference
- Manually typed tables
- Untyped tables
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Azure OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'azureOpenAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
resourceName: 'RESOURCE_NAME',
deploymentId: 'DEPLOYMENT_ID',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Azure OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'azureOpenAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
resourceName: 'RESOURCE_NAME',
deploymentId: 'DEPLOYMENT_ID',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Manually define the type of the table's schema and primary key
type TableSchema = {
VECTOR_COLUMN_NAME: DataAPIVector;
TEXT_COLUMN_NAME: string;
};
type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Azure OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'azureOpenAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
resourceName: 'RESOURCE_NAME',
deploymentId: 'DEPLOYMENT_ID',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client sends the x-embedding-api-key header with the specified key in any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
  For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
- DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
-
Automatic type inference
-
Manually typed tables
-
Untyped tables
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Hugging Face Dedicated integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'huggingfaceDedicated',
modelName: 'endpoint-defined-model',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
endpointName: 'ENDPOINT_NAME',
regionName: 'REGION',
cloudName: 'CLOUD_PROVIDER',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Hugging Face Dedicated integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'huggingfaceDedicated',
modelName: 'endpoint-defined-model',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
endpointName: 'ENDPOINT_NAME',
regionName: 'REGION',
cloudName: 'CLOUD_PROVIDER',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Manually define the type of the table's schema and primary key
type TableSchema = {
VECTOR_COLUMN_NAME: DataAPIVector;
TEXT_COLUMN_NAME: string;
};
type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Hugging Face Dedicated integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'huggingfaceDedicated',
modelName: 'endpoint-defined-model',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
endpointName: 'ENDPOINT_NAME',
regionName: 'REGION',
cloudName: 'CLOUD_PROVIDER',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.
  For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.
  You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
- REGION: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
- CLOUD_PROVIDER: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
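The ENDPOINT_NAME, REGION, and CLOUD_PROVIDER values can all be read from the endpoint URL. The following is a minimal sketch of that split, assuming the URL always has the shape shown in the example above (ENDPOINT_NAME.REGION.CLOUD_PROVIDER.endpoints.huggingface.cloud). The helper name parseDedicatedEndpointUrl and its return type are hypothetical and are not part of the client library.

```typescript
// Hypothetical helper for illustration; not part of @datastax/astra-db-ts.
interface HuggingFaceDedicatedParameters {
  endpointName: string;
  regionName: string;
  cloudName: string;
}

// Assumes the host has the shape
// ENDPOINT_NAME.REGION.CLOUD_PROVIDER.endpoints.huggingface.cloud
function parseDedicatedEndpointUrl(
  endpointUrl: string,
): HuggingFaceDedicatedParameters {
  const { hostname } = new URL(endpointUrl);
  const suffix = ".endpoints.huggingface.cloud";
  if (!hostname.endsWith(suffix)) {
    throw new Error(`Unexpected endpoint host: ${hostname}`);
  }

  // Split what precedes the fixed suffix into its three expected labels.
  const labels = hostname.slice(0, -suffix.length).split(".");
  const [endpointName, regionName, cloudName, ...extra] = labels;
  if (!endpointName || !regionName || !cloudName || extra.length > 0) {
    throw new Error(
      `Expected ENDPOINT_NAME.REGION.CLOUD_PROVIDER before ${suffix}`,
    );
  }

  return { endpointName, regionName, cloudName };
}
```

The returned object uses the same keys as the parameters block in the table definition, so it could be passed there directly once you have confirmed that your endpoint URL follows this shape.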
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
-
Automatic type inference
-
Manually typed tables
-
Untyped tables
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Hugging Face Serverless integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'huggingface',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Hugging Face Serverless integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'huggingface',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Manually define the type of the table's schema and primary key
type TableSchema = {
VECTOR_COLUMN_NAME: DataAPIVector;
TEXT_COLUMN_NAME: string;
};
type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Hugging Face Serverless integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'huggingface',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
-
Automatic type inference
-
Manually typed tables
-
Untyped tables
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Jina AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'jinaAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Jina AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'jinaAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Manually define the type of the table's schema and primary key
type TableSchema = {
VECTOR_COLUMN_NAME: DataAPIVector;
TEXT_COLUMN_NAME: string;
};
type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Jina AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'jinaAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
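The header authentication alternative described for API_KEY_NAME can be sketched as follows: the vector column's service omits the authentication block, and the provider key is passed through the embeddingApiKey option when the table is created. This is a sketch under stated assumptions, not the library's implementation. The TableCreator interface is a stand-in that mirrors only the option names this page mentions (definition and embeddingApiKey), and the function names are hypothetical. With the real client, you would call createTable on the database object from the examples above instead.

```typescript
// Stand-in for the part of the client that this sketch uses. This interface
// is an assumption for illustration, not the library's real type definitions.
interface TableCreator {
  createTable(
    name: string,
    options: { definition: object; embeddingApiKey?: string },
  ): Promise<unknown>;
}

// Build a table definition whose vector column has no `authentication` block,
// because the provider key is sent in the x-embedding-api-key header instead.
function headerAuthDefinition(modelName: string, dimension: number) {
  return {
    columns: {
      VECTOR_COLUMN_NAME: {
        type: "vector",
        dimension,
        service: { provider: "jinaAI", modelName },
      },
      TEXT_COLUMN_NAME: "text",
    },
    primaryKey: { partitionBy: ["TEXT_COLUMN_NAME"] },
  };
}

// Create the table and supply the provider key through `embeddingApiKey`.
function createTableWithHeaderAuth(
  database: TableCreator,
  modelName: string,
  dimension: number,
  providerApiKey: string,
): Promise<unknown> {
  return database.createTable("TABLE_NAME", {
    definition: headerAuthDefinition(modelName, dimension),
    embeddingApiKey: providerApiKey,
  });
}
```

Because the key is not stored with the table in this approach, later commands that use vectorize also need the same key, as noted in the API_KEY_NAME description.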
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
-
Automatic type inference
-
Manually typed tables
-
Untyped tables
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Mistral AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'mistral',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Mistral AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'mistral',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Manually define the type of the table's schema and primary key
type TableSchema = {
VECTOR_COLUMN_NAME: DataAPIVector;
TEXT_COLUMN_NAME: string;
};
type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Mistral AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'mistral',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider. Your database must be in a supported region.
-
Automatic type inference
-
Manually typed tables
-
Untyped tables
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The NVIDIA integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
service: {
provider: 'nvidia',
modelName: 'nvidia/nv-embedqa-e5-v5',
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The NVIDIA integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
service: {
provider: 'nvidia',
modelName: 'nvidia/nv-embedqa-e5-v5',
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Manually define the type of the table's schema and primary key
type TableSchema = {
VECTOR_COLUMN_NAME: DataAPIVector;
TEXT_COLUMN_NAME: string;
};
type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The NVIDIA integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
service: {
provider: 'nvidia',
modelName: 'nvidia/nv-embedqa-e5-v5',
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
-
Automatic type inference
-
Manually typed tables
-
Untyped tables
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'openai',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
organizationId: 'ORGANIZATION_ID',
projectId: 'PROJECT_ID',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'openai',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
organizationId: 'ORGANIZATION_ID',
projectId: 'PROJECT_ID',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
// Manually define the type of the table's schema and primary key
type TableSchema = {
VECTOR_COLUMN_NAME: DataAPIVector;
TEXT_COLUMN_NAME: string;
};
type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'openai',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
organizationId: 'ORGANIZATION_ID',
projectId: 'PROJECT_ID',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
Replace the following:
-
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings. -
API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.Alternatively, you can omit this parameter and instead provide the authentication key in the
embeddingApiKeyparameter when you instantiate aTableobject with the commands to create a table or get a table. The client will send thex-embedding-api-keyheader with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides theAPI_KEY_NAMEparameter if you set both. If you use the header instead of specifying theAPI_KEY_NAMEparameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are:text-embedding-3-small,text-embedding-3-large,text-embedding-ada-002. -
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
-
ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference. -
PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
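The precedence between API_KEY_NAME and header authentication described in the list above can be modeled in a few lines. This is a minimal sketch, not the client's implementation: the resolveEmbeddingAuth function and its types are hypothetical names introduced only for illustration, and EMBEDDING_API_KEY is a placeholder for a raw provider key.

```typescript
// Hypothetical model of the precedence rule; not part of the client library.
type EmbeddingAuthInput = {
  // Name of a key stored in the Astra Portal (API_KEY_NAME), if any.
  providerKeyName?: string;
  // Raw provider key passed as embeddingApiKey, if any.
  embeddingApiKey?: string;
};

type EmbeddingAuthResult =
  | { method: "header"; headers: Record<string, string> }
  | { method: "providerKey"; providerKeyName: string }
  | { method: "none" };

function resolveEmbeddingAuth(input: EmbeddingAuthInput): EmbeddingAuthResult {
  // Header authentication overrides the stored key name when both are set.
  if (input.embeddingApiKey !== undefined && input.embeddingApiKey !== "") {
    return {
      method: "header",
      headers: { "x-embedding-api-key": input.embeddingApiKey },
    };
  }
  if (input.providerKeyName !== undefined && input.providerKeyName !== "") {
    return { method: "providerKey", providerKeyName: input.providerKeyName };
  }
  return { method: "none" };
}

// Both values are set, so the header is the method that applies.
const both = resolveEmbeddingAuth({
  providerKeyName: "API_KEY_NAME",
  embeddingApiKey: "EMBEDDING_API_KEY",
});
```

In this model, both.method is "header". If you rely on the header, remember that every command that uses vectorize must carry it.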
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
-
Automatic type inference
-
Manually typed tables
-
Untyped tables
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Upstage integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'upstageAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
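// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
// Create the table
(async function () {
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"TABLE_NAME",
{ definition: tableDefinition },
);
})();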
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Upstage integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'upstageAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
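// Manually define the type of the table's schema and primary key
type TableSchema = {
VECTOR_COLUMN_NAME: DataAPIVector;
TEXT_COLUMN_NAME: string;
};
type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;
// Create the table
(async function () {
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"TABLE_NAME",
{ definition: tableDefinition },
);
})();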
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Upstage integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'upstageAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
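});
// Create the table without a specific row type
(async function () {
const table = await database.createTable<SomeRow>(
"TABLE_NAME",
{ definition: tableDefinition },
);
})();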
Replace the following:
-
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings. -
API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication.
Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large. -
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
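The MODEL_DIMENSIONS rules above (an explicit value must be supported by the model, and omitting the value only works when the model has a default) can be sketched as a small check. The ModelDimensionInfo shape, the resolveDimension helper, and the numbers used below are assumptions made for illustration; they are not values reported by any real provider or returned by the Data API in this form.

```typescript
// Hypothetical description of a model's dimension support.
type ModelDimensionInfo = {
  min: number;
  max: number;
  // Some models have no default dimension.
  defaultDimension?: number;
};

function resolveDimension(info: ModelDimensionInfo, requested?: number): number {
  if (requested === undefined) {
    // No explicit dimension: fall back to the default, if the model has one.
    if (info.defaultDimension === undefined) {
      throw new Error("This model has no default dimension; specify one explicitly.");
    }
    return info.defaultDimension;
  }
  // Explicit dimension: it must be an integer inside the supported range.
  if (!Number.isInteger(requested) || requested < info.min || requested > info.max) {
    throw new Error(
      `Dimension ${requested} is outside the supported range ${info.min}-${info.max}.`,
    );
  }
  return requested;
}

// Illustrative values only, not taken from any real provider.
const exampleModel: ModelDimensionInfo = { min: 2, max: 1024, defaultDimension: 1024 };
const usedDefault = resolveDimension(exampleModel); // 1024
const usedExplicit = resolveDimension(exampleModel, 512); // 512
```

Running a check like this before you create the table surfaces an unsupported dimension early, instead of at table creation time.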
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
-
Automatic type inference
-
Manually typed tables
-
Untyped tables
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Voyage AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'voyageAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
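// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
// Create the table
(async function () {
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"TABLE_NAME",
{ definition: tableDefinition },
);
})();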
import { DataAPIClient, DataAPIVector, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Voyage AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'voyageAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
});
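// Manually define the type of the table's schema and primary key
type TableSchema = {
VECTOR_COLUMN_NAME: DataAPIVector;
TEXT_COLUMN_NAME: string;
};
type TablePrimaryKey = Pick<TableSchema, "TEXT_COLUMN_NAME">;
// Create the table
(async function () {
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"TABLE_NAME",
{ definition: tableDefinition },
);
})();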
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Instantiate the client
const client = new DataAPIClient();
// Connect to a database
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
// Define the columns and primary key for the table
const tableDefinition = Table.schema({
columns: {
// This column will store vector embeddings.
// The Voyage AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'voyageAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
// You should change the primary key definition to meet the needs of your data.
primaryKey: {
partitionBy: ["TEXT_COLUMN_NAME"],
},
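});
// Create the table without a specific row type
(async function () {
const table = await database.createTable<SomeRow>(
"TABLE_NAME",
{ definition: tableDefinition },
);
})();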
Replace the following:
-
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings. -
API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication.
Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2. -
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
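Because MODEL_NAME must be one of the listed Voyage AI models, it can help to check the name before building the column's service options. The following is a sketch under assumptions: the model names are copied from the list above, the service object mirrors the shape used in the examples above, and the VOYAGE_AI_MODELS constant, isVoyageAIModel guard, and voyageService helper are hypothetical names that do not exist in the client library.

```typescript
// Model names copied from the list above.
const VOYAGE_AI_MODELS = [
  "voyage-2",
  "voyage-code-2",
  "voyage-finance-2",
  "voyage-large-2",
  "voyage-large-2-instruct",
  "voyage-law-2",
  "voyage-multilingual-2",
] as const;

type VoyageAIModel = (typeof VOYAGE_AI_MODELS)[number];

// Type guard: narrows an arbitrary string to one of the listed model names.
function isVoyageAIModel(name: string): name is VoyageAIModel {
  return (VOYAGE_AI_MODELS as readonly string[]).indexOf(name) !== -1;
}

// Hypothetical helper that builds the `service` object for a vector column.
function voyageService(modelName: string, providerKey: string) {
  if (!isVoyageAIModel(modelName)) {
    throw new Error(`Unsupported Voyage AI model: ${modelName}`);
  }
  return {
    provider: "voyageAI",
    modelName,
    authentication: { providerKey },
  };
}

const service = voyageService("voyage-law-2", "API_KEY_NAME");
```

The returned object can be used as the service value of the vector column definition. The list of available models can change over time, so treat the constant as a snapshot of this page.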
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
-
Use a generic type
-
Define the row type
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Define parameters for the service provider
Map<String, Object> params = new HashMap<>();
params.put("resourceName", "RESOURCE_NAME");
params.put("deploymentId", "DEPLOYMENT_ID");
// Define the columns and primary key for the table
TableDefinition tableDefinition =
new TableDefinition()
// This column will store vector embeddings.
// The Azure OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new ColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("azureOpenAI")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
.parameters(params)
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME")
// You should change the primary key definition to meet the needs of your data.
.addPartitionBy("TEXT_COLUMN_NAME");
// Create the table
Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
}
}
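import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;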
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;
public class Example {
@EntityTable("TABLE_NAME")
@Data
public static class ExampleRow {
// This column will store vector embeddings.
// The Azure OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
@ColumnVector(
name = "VECTOR_COLUMN_NAME",
dimension = MODEL_DIMENSIONS,
metric = SimilarityMetric.SIMILARITY_METRIC,
provider = "azureOpenAI",
modelName = "MODEL_NAME",
authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"),
parameters = {
@KeyValue(key = "resourceName", value = "RESOURCE_NAME"),
@KeyValue(key = "deploymentId", value = "DEPLOYMENT_ID")
})
private DataAPIVector exampleVector;
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
// You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
@PartitionBy(0)
@Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
private String originalText;
}
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Create the table
Table<ExampleRow> table = database.createTable(ExampleRow.class);
}
}
Replace the following:
-
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings. -
API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.
Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure. -
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
-
RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation. -
DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
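The Azure OpenAI integration needs both the resourceName and deploymentId parameters shown in the examples above. As a rough sketch, written in TypeScript with plain objects and independent of any client library, a pre-flight check for missing parameters might look like the following. The missingParams helper and the AZURE_OPENAI_REQUIRED_PARAMS constant are hypothetical names; only the two parameter keys come from this page.

```typescript
// Parameter keys taken from the examples above; the helper itself is hypothetical.
const AZURE_OPENAI_REQUIRED_PARAMS = ["resourceName", "deploymentId"];

// Returns the required keys that are absent or blank in `params`.
function missingParams(
  params: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((key) => {
    const value = params[key];
    return value === undefined || value.trim() === "";
  });
}

// The deployment name is left empty here to show a failing check.
const azureParams = { resourceName: "RESOURCE_NAME", deploymentId: "" };
const missing = missingParams(azureParams, AZURE_OPENAI_REQUIRED_PARAMS);
// missing is ["deploymentId"]
```

A check like this only confirms that the values are present. It cannot confirm that MODEL_NAME matches the model deployed under DEPLOYMENT_ID in Azure.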
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
-
Use a generic type
-
Define the row type
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Define parameters for the service provider
Map<String, Object> params = new HashMap<>();
params.put("endpointName", "ENDPOINT_NAME");
params.put("regionName", "REGION");
params.put("cloudName", "CLOUD_PROVIDER");
// Define the columns and primary key for the table
TableDefinition tableDefinition =
new TableDefinition()
// This column will store vector embeddings.
// The Hugging Face Dedicated integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new ColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("huggingfaceDedicated")
.modelName("endpoint-defined-model")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
.parameters(params)
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME")
// You should change the primary key definition to meet the needs of your data.
.addPartitionBy("TEXT_COLUMN_NAME");
// Create the table
Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
}
}
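import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;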
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;
public class Example {
@EntityTable("TABLE_NAME")
@Data
public static class ExampleRow {
// This column will store vector embeddings.
// The Hugging Face Dedicated integration
// will automatically generate vector embeddings
// for any text inserted to this column.
@ColumnVector(
name = "VECTOR_COLUMN_NAME",
dimension = MODEL_DIMENSIONS,
metric = SimilarityMetric.SIMILARITY_METRIC,
provider = "huggingfaceDedicated",
modelName = "endpoint-defined-model",
authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"),
parameters = {
@KeyValue(key = "endpointName", value = "ENDPOINT_NAME"),
@KeyValue(key = "regionName", value = "REGION"),
@KeyValue(key = "cloudName", value = "CLOUD_PROVIDER")
})
private DataAPIVector exampleVector;
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
// You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
@PartitionBy(0)
@Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
private String originalText;
}
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Create the table
Table<ExampleRow> table = database.createTable(ExampleRow.class);
}
}
Replace the following:
-
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings. -
API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication.
Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.
For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.
You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration. -
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
-
ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh. -
REGION: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2. -
CLOUD_PROVIDER: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
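The three endpoint values above can usually be read straight from the endpoint URL. The sketch below assumes the URL has the form https://ENDPOINT_NAME.REGION.CLOUD_PROVIDER.endpoints.huggingface.cloud, which matches the example URL above but is an assumption for other endpoints. The parseDedicatedEndpointUrl function is a hypothetical helper and is not part of any client library.

```typescript
type HuggingFaceDedicatedParams = {
  endpointName: string;
  regionName: string;
  cloudName: string;
};

// Assumes the URL has the form
// https://ENDPOINT_NAME.REGION.CLOUD_PROVIDER.endpoints.huggingface.cloud
function parseDedicatedEndpointUrl(url: string): HuggingFaceDedicatedParams {
  const pattern =
    /^https:\/\/([^.\/]+)\.([^.\/]+)\.([^.\/]+)\.endpoints\.huggingface\.cloud\/?$/;
  const match = pattern.exec(url);
  if (match === null) {
    throw new Error(`Unrecognized Hugging Face Dedicated endpoint URL: ${url}`);
  }
  // The capture groups map to the three service parameters, in order.
  return { endpointName: match[1], regionName: match[2], cloudName: match[3] };
}

const dedicatedParams = parseDedicatedEndpointUrl(
  "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud",
);
// dedicatedParams is
// { endpointName: "mtp1x7muf6qyn3yh", regionName: "us-east-2", cloudName: "aws" }
```

The resulting keys match the endpointName, regionName, and cloudName parameters used in the examples above.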
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
-
Use a generic type
-
Define the row type
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Define the columns and primary key for the table
TableDefinition tableDefinition =
new TableDefinition()
// This column will store vector embeddings.
// The Hugging Face Serverless integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new ColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("huggingface")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME")
// You should change the primary key definition to meet the needs of your data.
.addPartitionBy("TEXT_COLUMN_NAME");
// Create the table
Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
}
}
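import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;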
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;
public class Example {
@EntityTable("TABLE_NAME")
@Data
public static class ExampleRow {
// This column will store vector embeddings.
// The Hugging Face Serverless integration
// will automatically generate vector embeddings
// for any text inserted to this column.
@ColumnVector(
name = "VECTOR_COLUMN_NAME",
dimension = MODEL_DIMENSIONS,
metric = SimilarityMetric.SIMILARITY_METRIC,
provider = "huggingface",
modelName = "MODEL_NAME",
authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
private DataAPIVector exampleVector;
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
// You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
@PartitionBy(0)
@Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
private String originalText;
}
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Create the table
Table<ExampleRow> table = database.createTable(ExampleRow.class);
}
}
Replace the following:
-
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings. -
API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication.
Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5. -
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
-
Use a generic type
-
Define the row type
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Define the columns and primary key for the table
TableDefinition tableDefinition =
new TableDefinition()
// This column will store vector embeddings.
// The Jina AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new ColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("jinaAI")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME")
// You should change the primary key definition to meet the needs of your data.
.addPartitionBy("TEXT_COLUMN_NAME");
// Create the table
Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
}
}
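import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;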
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;
public class Example {
@EntityTable("TABLE_NAME")
@Data
public class ExampleRow {
// This column will store vector embeddings.
// The Jina AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
@ColumnVector(
name = "VECTOR_COLUMN_NAME",
dimension = MODEL_DIMENSIONS,
metric = SimilarityMetric.SIMILARITY_METRIC,
provider = "jinaAI",
modelName = "MODEL_NAME",
authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
private DataAPIVector exampleVector;
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
// You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
@PartitionBy(0)
@Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
private String originalText;
}
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Create the table
Table<ExampleRow> table = database.createTable(ExampleRow.class);
}
}
Replace the following:
-
TABLE_NAME: The name for your table.
-
VECTOR_COLUMN_NAME: The name for your vector column.
-
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
-
API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication.
Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
-
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
-
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
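The header-based authentication described above can be illustrated at the HTTP level. The following is a minimal sketch, not the client's actual implementation: it uses only the JDK's java.net.http package to build (but not send) a request that carries the x-embedding-api-key header. The class and method names are hypothetical, and the URL path, the Token header, and the request body are assumptions about the Data API's HTTP interface that you should check against your own database.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class EmbeddingHeaderSketch {

  // Builds a request that carries the embedding provider key in a header.
  // The URL path and the Token header are assumptions, not confirmed by this page.
  static HttpRequest buildRequest(
      String apiEndpoint, String token, String embeddingApiKey, String jsonBody) {
    return HttpRequest.newBuilder()
        .uri(URI.create(apiEndpoint + "/api/json/v1/default_keyspace"))
        .header("Content-Type", "application/json")
        .header("Token", token)
        // Header authentication: takes precedence over API_KEY_NAME if both are set.
        .header("x-embedding-api-key", embeddingApiKey)
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();
  }

  public static void main(String[] args) {
    // Placeholder endpoint and body; nothing is sent over the network.
    HttpRequest request = buildRequest(
        "https://example.invalid", "APPLICATION_TOKEN", "EMBEDDING_API_KEY", "{}");
    System.out.println(request.headers().firstValue("x-embedding-api-key").orElse("missing"));
  }
}
```

Because the header must accompany every command that uses vectorize, a helper like this would have to wrap writes and vector searches as well, not only table creation.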
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
-
Use a generic type
-
Define the row type
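import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;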
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.indexes.TableVectorIndexDefinition;
import com.datastax.astra.client.tables.definition.rows.Row;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Define the columns and primary key for the table
TableDefinition tableDefinition =
new TableDefinition()
// This column will store vector embeddings.
// The Mistral AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new ColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("mistral")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME")
// You should change the primary key definition to meet the needs of your data.
.addPartitionBy("TEXT_COLUMN_NAME");
// Create the table
Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
}
}
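import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;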
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;
public class Example {
@EntityTable("TABLE_NAME")
@Data
public class ExampleRow {
// This column will store vector embeddings.
// The Mistral AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
@ColumnVector(
name = "VECTOR_COLUMN_NAME",
dimension = MODEL_DIMENSIONS,
metric = SimilarityMetric.SIMILARITY_METRIC,
provider = "mistral",
modelName = "MODEL_NAME",
authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
private DataAPIVector exampleVector;
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
// You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
@PartitionBy(0)
@Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
private String originalText;
}
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Create the table
Table<ExampleRow> table = database.createTable(ExampleRow.class);
}
}
Replace the following:
-
TABLE_NAME: The name for your table.
-
VECTOR_COLUMN_NAME: The name for your vector column.
-
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
-
API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication.
Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
-
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
-
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
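The dimension rule above (an explicit value must be supported by the model, and an omitted value falls back to a default only if the model has one) can be sketched as a small client-side check. This illustrates the rule only; it is not how Astra DB performs the validation. The ModelSpec type, its fields, and the numbers used in main are hypothetical, so query the Data API for the real ranges and defaults of your model.

```java
import java.util.OptionalInt;

final class DimensionSketch {

  // Hypothetical description of the dimensions that a model supports.
  static final class ModelSpec {
    final int minDimension;
    final int maxDimension;
    final OptionalInt defaultDimension; // empty when the model has no default

    ModelSpec(int minDimension, int maxDimension, OptionalInt defaultDimension) {
      this.minDimension = minDimension;
      this.maxDimension = maxDimension;
      this.defaultDimension = defaultDimension;
    }
  }

  // Returns the dimension to use, or throws if none can be determined.
  static int resolveDimension(ModelSpec spec, OptionalInt requested) {
    if (requested.isPresent()) {
      int dimension = requested.getAsInt();
      if (dimension < spec.minDimension || dimension > spec.maxDimension) {
        throw new IllegalArgumentException(
            "Dimension " + dimension + " is outside the supported range");
      }
      return dimension;
    }
    // No explicit dimension: only valid when the model defines a default.
    return spec.defaultDimension.orElseThrow(
        () -> new IllegalArgumentException("Model has no default dimension; specify one"));
  }

  public static void main(String[] args) {
    // Illustrative values only; not taken from any provider's documentation.
    ModelSpec spec = new ModelSpec(256, 1024, OptionalInt.of(1024));
    System.out.println(resolveDimension(spec, OptionalInt.empty()));
    System.out.println(resolveDimension(spec, OptionalInt.of(512)));
  }
}
```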
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider. Your database must be in a supported region.
-
Use a generic type
-
Define the row type
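import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;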
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.indexes.TableVectorIndexDefinition;
import com.datastax.astra.client.tables.definition.rows.Row;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Define the columns and primary key for the table
TableDefinition tableDefinition =
new TableDefinition()
// This column will store vector embeddings.
// The NVIDIA integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new ColumnDefinitionVector()
.metric(SimilarityMetric.COSINE)
.service(
new VectorServiceOptions()
.provider("nvidia")
.modelName("nvidia/nv-embedqa-e5-v5")
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME")
// You should change the primary key definition to meet the needs of your data.
.addPartitionBy("TEXT_COLUMN_NAME");
// Create the table
Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
}
}
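import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;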
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;
public class Example {
@EntityTable("TABLE_NAME")
@Data
public class ExampleRow {
// This column will store vector embeddings.
// The NVIDIA integration
// will automatically generate vector embeddings
// for any text inserted to this column.
@ColumnVector(
name = "VECTOR_COLUMN_NAME",
provider = "nvidia",
modelName = "nvidia/nv-embedqa-e5-v5")
private DataAPIVector exampleVector;
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
// You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
@PartitionBy(0)
@Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
private String originalText;
}
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Create the table
Table<ExampleRow> table = database.createTable(ExampleRow.class);
}
}
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
-
Use a generic type
-
Define the row type
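import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;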
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.indexes.TableVectorIndexDefinition;
import com.datastax.astra.client.tables.definition.rows.Row;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Define parameters for the service provider
Map<String, Object> params = new HashMap<>();
params.put("organizationId", "ORGANIZATION_ID");
params.put("projectId", "PROJECT_ID");
// Define the columns and primary key for the table
TableDefinition tableDefinition =
new TableDefinition()
// This column will store vector embeddings.
// The OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new ColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("openai")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
.parameters(params)
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME")
// You should change the primary key definition to meet the needs of your data.
.addPartitionBy("TEXT_COLUMN_NAME");
// Create the table
Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
}
}
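import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;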
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;
public class Example {
@EntityTable("TABLE_NAME")
@Data
public class ExampleRow {
// This column will store vector embeddings.
// The OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
@ColumnVector(
name = "VECTOR_COLUMN_NAME",
dimension = MODEL_DIMENSIONS,
metric = SimilarityMetric.SIMILARITY_METRIC,
provider = "openai",
modelName = "MODEL_NAME",
authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"),
parameters = {
@KeyValue(key = "organizationId", value = "ORGANIZATION_ID"),
@KeyValue(key = "projectId", value = "PROJECT_ID")
})
private DataAPIVector exampleVector;
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
// You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
@PartitionBy(0)
@Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
private String originalText;
}
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Create the table
Table<ExampleRow> table = database.createTable(ExampleRow.class);
}
}
Replace the following:
-
TABLE_NAME: The name for your table.
-
VECTOR_COLUMN_NAME: The name for your vector column.
-
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
-
API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.
Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
-
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
-
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
-
ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
-
PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
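Because ORGANIZATION_ID and PROJECT_ID are optional, one way to avoid sending leftover placeholder values is to add each parameter only when you actually have a value for it. The helper below is a sketch that uses only java.util; the class and method names are hypothetical, and the resulting map is meant to be passed where the generic-type example above passes params.

```java
import java.util.HashMap;
import java.util.Map;

final class OpenAiParamsSketch {

  // Builds the provider parameters map, skipping values that are null or blank.
  static Map<String, Object> buildParams(String organizationId, String projectId) {
    Map<String, Object> params = new HashMap<>();
    putIfSet(params, "organizationId", organizationId);
    putIfSet(params, "projectId", projectId);
    return params;
  }

  // Adds the entry only when a non-blank value is available.
  private static void putIfSet(Map<String, Object> params, String key, String value) {
    if (value != null && !value.trim().isEmpty()) {
      params.put(key, value);
    }
  }

  public static void main(String[] args) {
    // Only the organization ID is known here, so projectId is left out of the map.
    System.out.println(buildParams("org-example", null));
  }
}
```

If the map ends up empty, you would likely skip the parameters call altogether rather than pass an empty map; this page does not say how the client treats an empty map.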
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
-
Use a generic type
-
Define the row type
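import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;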
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.indexes.TableVectorIndexDefinition;
import com.datastax.astra.client.tables.definition.rows.Row;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Define the columns and primary key for the table
TableDefinition tableDefinition =
new TableDefinition()
// This column will store vector embeddings.
// The Upstage integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new ColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("upstageAI")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME")
// You should change the primary key definition to meet the needs of your data.
.addPartitionBy("TEXT_COLUMN_NAME");
// Create the table
Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
}
}
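import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;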
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;
public class Example {
@EntityTable("TABLE_NAME")
@Data
public class ExampleRow {
// This column will store vector embeddings.
// The Upstage integration
// will automatically generate vector embeddings
// for any text inserted to this column.
@ColumnVector(
name = "VECTOR_COLUMN_NAME",
dimension = MODEL_DIMENSIONS,
metric = SimilarityMetric.SIMILARITY_METRIC,
provider = "upstageAI",
modelName = "MODEL_NAME",
authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
private DataAPIVector exampleVector;
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
// You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
@PartitionBy(0)
@Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
private String originalText;
}
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Create the table
Table<ExampleRow> table = database.createTable(ExampleRow.class);
}
}
Replace the following:
-
TABLE_NAME: The name for your table.
-
VECTOR_COLUMN_NAME: The name for your vector column.
-
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
-
API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication.
Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
-
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
-
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
-
Use a generic type
-
Define the row type
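import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;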
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.indexes.TableVectorIndexDefinition;
import com.datastax.astra.client.tables.definition.rows.Row;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Define the columns and primary key for the table
TableDefinition tableDefinition =
new TableDefinition()
// This column will store vector embeddings.
// The Voyage AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new ColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("voyageAI")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME")
// You should change the primary key definition to meet the needs of your data.
.addPartitionBy("TEXT_COLUMN_NAME");
// Create the table
Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);
}
}
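import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.DataAPIVector;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;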
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.ColumnTypes;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.ColumnVector;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.KeyValue;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import lombok.Data;
public class Example {
@EntityTable("TABLE_NAME")
@Data
public class ExampleRow {
// This column will store vector embeddings.
// The Voyage AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
@ColumnVector(
name = "VECTOR_COLUMN_NAME",
dimension = MODEL_DIMENSIONS,
metric = SimilarityMetric.SIMILARITY_METRIC,
provider = "voyageAI",
modelName = "MODEL_NAME",
authentication = @KeyValue(key = "providerKey", value = "API_KEY_NAME"))
private DataAPIVector exampleVector;
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
// You should change the primary key definition (`PartitionBy`) to meet the needs of your data.
@PartitionBy(0)
@Column(name = "TEXT_COLUMN_NAME", type = ColumnTypes.TEXT)
private String originalText;
}
public static void main(String[] args) {
// Instantiate the client
DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());
// Connect to a database
Database database =
client.getDatabase(
"API_ENDPOINT",
new DatabaseOptions("APPLICATION_TOKEN", new DataAPIClientOptions()));
// Create the table
Table<ExampleRow> table = database.createTable(ExampleRow.class);
}
}
Replace the following:
-
TABLE_NAME: The name for your table.
-
VECTOR_COLUMN_NAME: The name for your vector column.
-
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
-
API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication.
Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
-
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
-
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
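The model name is passed as a plain string in the examples above, so the compiler cannot catch a misspelling. A small client-side check against the list above is one way to fail early. This is a sketch: the class and method names are hypothetical, and the set simply copies the models listed in this section, so it needs updating whenever the supported models change.

```java
import java.util.Set;

final class VoyageModelSketch {

  // Models listed in this section for the Voyage AI provider.
  static final Set<String> SUPPORTED_MODELS = Set.of(
      "voyage-2",
      "voyage-code-2",
      "voyage-finance-2",
      "voyage-large-2",
      "voyage-large-2-instruct",
      "voyage-law-2",
      "voyage-multilingual-2");

  // Returns the model name unchanged, or throws if it is not in the list.
  static String requireSupportedModel(String modelName) {
    // The null check comes first because Set.of(...).contains(null) throws.
    if (modelName == null || !SUPPORTED_MODELS.contains(modelName)) {
      throw new IllegalArgumentException("Unsupported Voyage AI model: " + modelName);
    }
    return modelName;
  }

  public static void main(String[] args) {
    System.out.println(requireSupportedModel("voyage-large-2"));
  }
}
```

You could then pass requireSupportedModel("MODEL_NAME") wherever the generic-type example passes the model name string.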
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
-
Typed tables
-
Untyped tables
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
[TableName("TABLE_NAME")]
public class ExampleRow
{
    // This column will store vector embeddings.
    // The Azure OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    [ColumnVectorize(
        provider: "azureOpenAI",
        modelName: "MODEL_NAME",
        dimension: MODEL_DIMENSIONS,
        authenticationPairs: new string[]
        {
            "providerKey", "API_KEY_NAME",
        },
        parameterPairs: new object[]
        {
            "resourceName",
            "RESOURCE_NAME",
            "deploymentId",
            "DEPLOYMENT_ID",
        }
    )]
    [ColumnName("VECTOR_COLUMN_NAME")]
    public object? ExampleVectorize { get; set; }
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    // You should change the primary key definition to meet the needs of your data.
    [ColumnPrimaryKey(1)]
    [ColumnName("TEXT_COLUMN_NAME")]
    public string ExampleOriginalString { get; set; } = null!;
}
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Create a table
        var table = await database.CreateTableAsync<ExampleRow>();
    }
}
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Define the columns and primary key for the table
        var tableDefinition = new TableDefinition()
            // This column will store vector embeddings.
            // The Azure OpenAI integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .AddColumn(
                "VECTOR_COLUMN_NAME",
                DataAPIType.Vectorize(
                    MODEL_DIMENSIONS,
                    new VectorServiceOptions
                    {
                        Provider = "azureOpenAI",
                        ModelName = "MODEL_NAME",
                        Authentication = new Dictionary<string, string>()
                        {
                            { "providerKey", "API_KEY_NAME" },
                        },
                        Parameters = new Dictionary<string, object>()
                        {
                            { "resourceName", "RESOURCE_NAME" },
                            { "deploymentId", "DEPLOYMENT_ID" },
                        },
                    }
                )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .AddColumn("TEXT_COLUMN_NAME", DataAPIType.Text())
            // You should change the primary key definition to meet the needs of your data.
            .AddSinglePrimaryKey("TEXT_COLUMN_NAME");
        // Create the table
        var table = await database.CreateTableAsync(
            "TABLE_NAME",
            tableDefinition
        );
    }
}
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are Cosine (default), Dot Product, and Euclidean.
- API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the EmbeddingAPIKey property of CreateTableOptions or GetTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
  For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
- DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
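To see how the resource name and deployment name relate to your Azure deployment, the following Python sketch assembles the embeddings URL that Azure OpenAI conventionally exposes for a deployment. The helper name and the example values are hypothetical, the API version is left as a parameter you must supply, and this is not necessarily the exact request that Astra DB sends on your behalf.

```python
from urllib.parse import quote


def azure_embeddings_url(resource_name: str, deployment_id: str, api_version: str) -> str:
    """Assemble the conventional Azure OpenAI embeddings URL (illustrative helper).

    resource_name corresponds to RESOURCE_NAME and deployment_id to DEPLOYMENT_ID.
    """
    # Percent-encode each piece so unexpected characters cannot alter the URL.
    resource = quote(resource_name, safe="")
    deployment = quote(deployment_id, safe="")
    version = quote(api_version, safe="")
    return (
        f"https://{resource}.openai.azure.com"
        f"/openai/deployments/{deployment}/embeddings"
        f"?api-version={version}"
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(azure_embeddings_url("my-resource", "my-embed-deployment", "2024-02-01"))
```

If the URL built from your values does not match the endpoint shown for your deployment in Azure, the resource name or deployment name is likely wrong.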
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
-
Typed tables
-
Untyped tables
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
[TableName("TABLE_NAME")]
public class ExampleRow
{
    // This column will store vector embeddings.
    // The Hugging Face Dedicated integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    [ColumnVectorize(
        provider: "huggingfaceDedicated",
        modelName: "endpoint-defined-model",
        dimension: MODEL_DIMENSIONS,
        authenticationPairs: new string[]
        {
            "providerKey", "API_KEY_NAME",
        },
        parameterPairs: new object[]
        {
            "endpointName",
            "ENDPOINT_NAME",
            "regionName",
            "REGION_NAME",
            "cloudName",
            "CLOUD_NAME",
        }
    )]
    [ColumnName("VECTOR_COLUMN_NAME")]
    public object? ExampleVectorize { get; set; }
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    // You should change the primary key definition to meet the needs of your data.
    [ColumnPrimaryKey(1)]
    [ColumnName("TEXT_COLUMN_NAME")]
    public string ExampleOriginalString { get; set; } = null!;
}
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Create a table
        var table = await database.CreateTableAsync<ExampleRow>();
    }
}
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Define the columns and primary key for the table
        var tableDefinition = new TableDefinition()
            // This column will store vector embeddings.
            // The Hugging Face Dedicated integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .AddColumn(
                "VECTOR_COLUMN_NAME",
                DataAPIType.Vectorize(
                    MODEL_DIMENSIONS,
                    new VectorServiceOptions
                    {
                        Provider = "huggingfaceDedicated",
                        ModelName = "endpoint-defined-model",
                        Authentication = new Dictionary<string, string>()
                        {
                            { "providerKey", "API_KEY_NAME" },
                        },
                        Parameters = new Dictionary<string, object>()
                        {
                            { "endpointName", "ENDPOINT_NAME" },
                            { "regionName", "REGION_NAME" },
                            { "cloudName", "CLOUD_NAME" },
                        },
                    }
                )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .AddColumn("TEXT_COLUMN_NAME", DataAPIType.Text())
            // You should change the primary key definition to meet the needs of your data.
            .AddSinglePrimaryKey("TEXT_COLUMN_NAME");
        // Create the table
        var table = await database.CreateTableAsync(
            "TABLE_NAME",
            tableDefinition
        );
    }
}
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are Cosine (default), Dot Product, and Euclidean.
- API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the EmbeddingAPIKey property of CreateTableOptions or GetTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.
  For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.
  You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- ENDPOINT_NAME: The programmatically generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
- REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
- CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
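The endpoint name, region, and cloud provider can all be read from the endpoint URL. The following Python sketch splits a URL of the form shown in the example above into the endpointName, regionName, and cloudName values used in the code. The function name is hypothetical, and the sketch assumes that every dedicated endpoint host follows the same name.region.cloud.endpoints.huggingface.cloud pattern as the example.

```python
from urllib.parse import urlparse

# Host suffix taken from the example endpoint URL in this section.
_EXPECTED_SUFFIX = ["endpoints", "huggingface", "cloud"]


def split_dedicated_endpoint(url: str) -> dict:
    """Split a Hugging Face Dedicated endpoint URL into its three parameters."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    # Expect exactly: <endpoint>.<region>.<cloud>.endpoints.huggingface.cloud
    if len(parts) != 6 or parts[3:] != _EXPECTED_SUFFIX:
        raise ValueError(f"unexpected endpoint host: {host!r}")
    endpoint_name, region_name, cloud_name = parts[:3]
    return {
        "endpointName": endpoint_name,
        "regionName": region_name,
        "cloudName": cloud_name,
    }


if __name__ == "__main__":
    print(split_dedicated_endpoint(
        "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud"
    ))
```

The sketch raises an error instead of guessing when the host does not match the expected pattern, so a mistyped URL fails before you create the table.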
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
-
Typed tables
-
Untyped tables
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
[TableName("TABLE_NAME")]
public class ExampleRow
{
    // This column will store vector embeddings.
    // The Hugging Face Serverless integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    [ColumnVectorize(
        provider: "huggingface",
        modelName: "MODEL_NAME",
        dimension: MODEL_DIMENSIONS,
        authenticationPairs: new string[]
        {
            "providerKey", "API_KEY_NAME",
        }
    )]
    [ColumnName("VECTOR_COLUMN_NAME")]
    public object? ExampleVectorize { get; set; }
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    // You should change the primary key definition to meet the needs of your data.
    [ColumnPrimaryKey(1)]
    [ColumnName("TEXT_COLUMN_NAME")]
    public string ExampleOriginalString { get; set; } = null!;
}
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Create a table
        var table = await database.CreateTableAsync<ExampleRow>();
    }
}
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Define the columns and primary key for the table
        var tableDefinition = new TableDefinition()
            // This column will store vector embeddings.
            // The Hugging Face Serverless integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .AddColumn(
                "VECTOR_COLUMN_NAME",
                DataAPIType.Vectorize(
                    MODEL_DIMENSIONS,
                    new VectorServiceOptions
                    {
                        Provider = "huggingface",
                        ModelName = "MODEL_NAME",
                        Authentication = new Dictionary<string, string>()
                        {
                            { "providerKey", "API_KEY_NAME" },
                        },
                    }
                )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .AddColumn("TEXT_COLUMN_NAME", DataAPIType.Text())
            // You should change the primary key definition to meet the needs of your data.
            .AddSinglePrimaryKey("TEXT_COLUMN_NAME");
        // Create the table
        var table = await database.CreateTableAsync(
            "TABLE_NAME",
            tableDefinition
        );
    }
}
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are Cosine (default), Dot Product, and Euclidean.
- API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the EmbeddingAPIKey property of CreateTableOptions or GetTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
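To look up supported providers and their dimension settings, you can send the Data API's findEmbeddingProviders command. The following Python sketch only prepares the request and does not send it. It assumes the database-level HTTP path /api/json/v1 and the Token header used by the Data API's HTTP interface; the function name is hypothetical, so verify the path and headers against the Data API reference before relying on them.

```python
import json


def find_embedding_providers_request(api_endpoint: str, application_token: str):
    """Prepare (url, headers, body) for a findEmbeddingProviders call.

    Illustrative helper only: nothing is sent over the network.
    """
    # Assumed database-level Data API path.
    url = api_endpoint.rstrip("/") + "/api/json/v1"
    headers = {
        "Token": application_token,
        "Content-Type": "application/json",
    }
    # The command takes no arguments, so the payload is an empty object.
    body = json.dumps({"findEmbeddingProviders": {}})
    return url, headers, body


if __name__ == "__main__":
    url, headers, body = find_embedding_providers_request(
        "API_ENDPOINT", "APPLICATION_TOKEN"
    )
    print(url)
    print(body)
```

You can pass the returned values to any HTTP client, then inspect the response for the models and dimension settings that each provider supports.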
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
-
Typed tables
-
Untyped tables
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
[TableName("TABLE_NAME")]
public class ExampleRow
{
    // This column will store vector embeddings.
    // The Jina AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    [ColumnVectorize(
        provider: "jinaAI",
        modelName: "MODEL_NAME",
        dimension: MODEL_DIMENSIONS,
        authenticationPairs: new string[]
        {
            "providerKey", "API_KEY_NAME",
        }
    )]
    [ColumnName("VECTOR_COLUMN_NAME")]
    public object? ExampleVectorize { get; set; }
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    // You should change the primary key definition to meet the needs of your data.
    [ColumnPrimaryKey(1)]
    [ColumnName("TEXT_COLUMN_NAME")]
    public string ExampleOriginalString { get; set; } = null!;
}
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Create a table
        var table = await database.CreateTableAsync<ExampleRow>();
    }
}
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Define the columns and primary key for the table
        var tableDefinition = new TableDefinition()
            // This column will store vector embeddings.
            // The Jina AI integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .AddColumn(
                "VECTOR_COLUMN_NAME",
                DataAPIType.Vectorize(
                    MODEL_DIMENSIONS,
                    new VectorServiceOptions
                    {
                        Provider = "jinaAI",
                        ModelName = "MODEL_NAME",
                        Authentication = new Dictionary<string, string>()
                        {
                            { "providerKey", "API_KEY_NAME" },
                        },
                    }
                )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .AddColumn("TEXT_COLUMN_NAME", DataAPIType.Text())
            // You should change the primary key definition to meet the needs of your data.
            .AddSinglePrimaryKey("TEXT_COLUMN_NAME");
        // Create the table
        var table = await database.CreateTableAsync(
            "TABLE_NAME",
            tableDefinition
        );
    }
}
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are Cosine (default), Dot Product, and Euclidean.
- API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the EmbeddingAPIKey property of CreateTableOptions or GetTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
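The precedence rules for header authentication described above can be modeled with a short Python sketch. The function names and the returned labels are hypothetical and only illustrate the described behavior; they are not part of any client library.

```python
def with_embedding_key(headers: dict, embedding_api_key=None) -> dict:
    """Return a copy of headers, adding x-embedding-api-key when a key is given."""
    result = dict(headers)
    if embedding_api_key:
        result["x-embedding-api-key"] = embedding_api_key
    return result


def effective_auth(api_key_name=None, embedding_api_key=None) -> str:
    """Report which credential applies, following the rules in this section."""
    # Header authentication overrides API_KEY_NAME if both are set.
    if embedding_api_key:
        return "header"
    if api_key_name:
        return "stored-key"
    return "none"


if __name__ == "__main__":
    print(with_embedding_key({"Content-Type": "application/json"}, "EMBEDDING_API_KEY"))
    print(effective_auth(api_key_name="API_KEY_NAME", embedding_api_key="EMBEDDING_API_KEY"))
```

If effective_auth reports "header" for a table, every later command that uses vectorize on that table needs the same header, including writes and vector search.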
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
-
Typed tables
-
Untyped tables
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
[TableName("TABLE_NAME")]
public class ExampleRow
{
    // This column will store vector embeddings.
    // The Mistral AI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    [ColumnVectorize(
        provider: "mistral",
        modelName: "MODEL_NAME",
        dimension: MODEL_DIMENSIONS,
        authenticationPairs: new string[]
        {
            "providerKey", "API_KEY_NAME",
        }
    )]
    [ColumnName("VECTOR_COLUMN_NAME")]
    public object? ExampleVectorize { get; set; }
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    // You should change the primary key definition to meet the needs of your data.
    [ColumnPrimaryKey(1)]
    [ColumnName("TEXT_COLUMN_NAME")]
    public string ExampleOriginalString { get; set; } = null!;
}
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Create a table
        var table = await database.CreateTableAsync<ExampleRow>();
    }
}
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Define the columns and primary key for the table
        var tableDefinition = new TableDefinition()
            // This column will store vector embeddings.
            // The Mistral AI integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .AddColumn(
                "VECTOR_COLUMN_NAME",
                DataAPIType.Vectorize(
                    MODEL_DIMENSIONS,
                    new VectorServiceOptions
                    {
                        Provider = "mistral",
                        ModelName = "MODEL_NAME",
                        Authentication = new Dictionary<string, string>()
                        {
                            { "providerKey", "API_KEY_NAME" },
                        },
                    }
                )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .AddColumn("TEXT_COLUMN_NAME", DataAPIType.Text())
            // You should change the primary key definition to meet the needs of your data.
            .AddSinglePrimaryKey("TEXT_COLUMN_NAME");
        // Create the table
        var table = await database.CreateTableAsync(
            "TABLE_NAME",
            tableDefinition
        );
    }
}
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are Cosine (default), Dot Product, and Euclidean.
- API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the EmbeddingAPIKey property of CreateTableOptions or GetTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
-
Typed tables
-
Untyped tables
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
[TableName("TABLE_NAME")]
public class ExampleRow
{
    // This column will store vector embeddings.
    // The NVIDIA integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    [ColumnVectorize(
        provider: "nvidia",
        modelName: "nvidia/nv-embedqa-e5-v5"
    )]
    [ColumnName("VECTOR_COLUMN_NAME")]
    public object? ExampleVectorize { get; set; }
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    // You should change the primary key definition to meet the needs of your data.
    [ColumnPrimaryKey(1)]
    [ColumnName("TEXT_COLUMN_NAME")]
    public string ExampleOriginalString { get; set; } = null!;
}
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Create a table
        var table = await database.CreateTableAsync<ExampleRow>();
    }
}
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Define the columns and primary key for the table
        var tableDefinition = new TableDefinition()
            // This column will store vector embeddings.
            // The NVIDIA integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .AddColumn(
                "VECTOR_COLUMN_NAME",
                DataAPIType.Vectorize(
                    new VectorServiceOptions
                    {
                        Provider = "nvidia",
                        ModelName = "nvidia/nv-embedqa-e5-v5",
                    }
                )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .AddColumn("TEXT_COLUMN_NAME", DataAPIType.Text())
            // You should change the primary key definition to meet the needs of your data.
            .AddSinglePrimaryKey("TEXT_COLUMN_NAME");
        // Create the table
        var table = await database.CreateTableAsync(
            "TABLE_NAME",
            tableDefinition
        );
    }
}
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
-
Typed tables
-
Untyped tables
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
[TableName("TABLE_NAME")]
public class ExampleRow
{
    // This column will store vector embeddings.
    // The OpenAI integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    [ColumnVectorize(
        provider: "openai",
        modelName: "MODEL_NAME",
        dimension: MODEL_DIMENSIONS,
        authenticationPairs: new string[]
        {
            "providerKey", "API_KEY_NAME",
        },
        parameterPairs: new object[]
        {
            "organizationId",
            "ORGANIZATION_ID",
            "projectId",
            "PROJECT_ID",
        }
    )]
    [ColumnName("VECTOR_COLUMN_NAME")]
    public object? ExampleVectorize { get; set; }
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    // You should change the primary key definition to meet the needs of your data.
    [ColumnPrimaryKey(1)]
    [ColumnName("TEXT_COLUMN_NAME")]
    public string ExampleOriginalString { get; set; } = null!;
}
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Create a table
        var table = await database.CreateTableAsync<ExampleRow>();
    }
}
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
    static async Task Main()
    {
        // Instantiate the client
        var client = new DataAPIClient();
        // Connect to a database
        var database = client.GetDatabase(
            "API_ENDPOINT",
            "APPLICATION_TOKEN"
        );
        // Define the columns and primary key for the table
        var tableDefinition = new TableDefinition()
            // This column will store vector embeddings.
            // The OpenAI integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .AddColumn(
                "VECTOR_COLUMN_NAME",
                DataAPIType.Vectorize(
                    MODEL_DIMENSIONS,
                    new VectorServiceOptions
                    {
                        Provider = "openai",
                        ModelName = "MODEL_NAME",
                        Authentication = new Dictionary<string, string>()
                        {
                            { "providerKey", "API_KEY_NAME" },
                        },
                        Parameters = new Dictionary<string, object>()
                        {
                            { "organizationId", "ORGANIZATION_ID" },
                            { "projectId", "PROJECT_ID" },
                        },
                    }
                )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .AddColumn("TEXT_COLUMN_NAME", DataAPIType.Text())
            // You should change the primary key definition to meet the needs of your data.
            .AddSinglePrimaryKey("TEXT_COLUMN_NAME");
        // Create the table
        var table = await database.CreateTableAsync(
            "TABLE_NAME",
            tableDefinition
        );
    }
}
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are Cosine (default), Dot Product, and Euclidean.
- API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.
  Alternatively, you can omit this parameter and instead provide the authentication key in the EmbeddingAPIKey property of CreateTableOptions or GetTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.
  If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
- PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
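Because the organization ID and project ID are optional, you might prefer to leave them out of the service options entirely when they don't apply. The following Python sketch builds a plain dictionary that mirrors the provider, model name, authentication, and parameters fields used in the examples above, and adds only the optional parameters that are set. The function name is hypothetical, and the dictionary shape is an illustrative assumption, not a client library type.

```python
def openai_service_options(model_name, api_key_name,
                           organization_id=None, project_id=None) -> dict:
    """Build vectorize service options, omitting unset optional parameters."""
    service = {
        "provider": "openai",
        "modelName": model_name,
        "authentication": {"providerKey": api_key_name},
    }
    # Collect only the optional parameters that were actually provided.
    parameters = {}
    if organization_id:
        parameters["organizationId"] = organization_id
    if project_id:
        parameters["projectId"] = project_id
    if parameters:
        service["parameters"] = parameters
    return service


if __name__ == "__main__":
    # Without the optional IDs, no "parameters" entry is produced.
    print(openai_service_options("text-embedding-3-small", "API_KEY_NAME"))
    # With a project ID only.
    print(openai_service_options("text-embedding-3-small", "API_KEY_NAME",
                                 project_id="PROJECT_ID"))
```

The same approach applies in C#: add the organizationId and projectId entries to the parameters only when your account needs them.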
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
- Typed tables
- Untyped tables
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
[TableName("TABLE_NAME")]
public class ExampleRow
{
// This column will store vector embeddings.
// The Upstage integration
// will automatically generate vector embeddings
// for any text inserted to this column.
[ColumnVectorize(
provider: "upstageAI",
modelName: "MODEL_NAME",
dimension: MODEL_DIMENSIONS,
authenticationPairs: new string[]
{
"providerKey", "API_KEY_NAME",
}
)]
[ColumnName("VECTOR_COLUMN_NAME")]
public object? ExampleVectorize { get; set; }
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
// You should change the primary key definition to meet the needs of your data.
[ColumnPrimaryKey(1)]
[ColumnName("TEXT_COLUMN_NAME")]
public string ExampleOriginalString { get; set; } = null!;
}
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var table = await database.CreateTableAsync<ExampleRow>();
}
}
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Define the columns and primary key for the table
var tableDefinition = new TableDefinition()
// This column will store vector embeddings.
// The Upstage integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.AddColumn(
"VECTOR_COLUMN_NAME",
DataAPIType.Vectorize(
MODEL_DIMENSIONS,
new VectorServiceOptions
{
Provider = "upstageAI",
ModelName = "MODEL_NAME",
Authentication = new Dictionary<string, string>()
{
{ "providerKey", "API_KEY_NAME" },
},
}
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.AddColumn("TEXT_COLUMN_NAME", DataAPIType.Text())
// You should change the primary key definition to meet the needs of your data.
.AddSinglePrimaryKey("TEXT_COLUMN_NAME");
// Create the table
var table = await database.CreateTableAsync(
"TABLE_NAME",
tableDefinition
);
}
}
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are Cosine (default), Dot Product, and Euclidean.
- API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the EmbeddingAPIKey property of CreateTableOptions or GetTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
- Typed tables
- Untyped tables
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
[TableName("TABLE_NAME")]
public class ExampleRow
{
// This column will store vector embeddings.
// The Voyage AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
[ColumnVectorize(
provider: "voyageAI",
modelName: "MODEL_NAME",
dimension: MODEL_DIMENSIONS,
authenticationPairs: new string[]
{
"providerKey", "API_KEY_NAME",
}
)]
[ColumnName("VECTOR_COLUMN_NAME")]
public object? ExampleVectorize { get; set; }
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
// You should change the primary key definition to meet the needs of your data.
[ColumnPrimaryKey(1)]
[ColumnName("TEXT_COLUMN_NAME")]
public string ExampleOriginalString { get; set; } = null!;
}
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var table = await database.CreateTableAsync<ExampleRow>();
}
}
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Define the columns and primary key for the table
var tableDefinition = new TableDefinition()
// This column will store vector embeddings.
// The Voyage AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.AddColumn(
"VECTOR_COLUMN_NAME",
DataAPIType.Vectorize(
MODEL_DIMENSIONS,
new VectorServiceOptions
{
Provider = "voyageAI",
ModelName = "MODEL_NAME",
Authentication = new Dictionary<string, string>()
{
{ "providerKey", "API_KEY_NAME" },
},
}
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.AddColumn("TEXT_COLUMN_NAME", DataAPIType.Text())
// You should change the primary key definition to meet the needs of your data.
.AddSinglePrimaryKey("TEXT_COLUMN_NAME");
// Create the table
var table = await database.CreateTableAsync(
"TABLE_NAME",
tableDefinition
);
}
}
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- SIMILARITY_METRIC: The method you want to use to calculate vector similarity scores. The available metrics are Cosine (default), Dot Product, and Euclidean.
- API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the EmbeddingAPIKey property of CreateTableOptions or GetTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "TABLE_NAME",
"definition": {
"columns": {
# This column will store vector embeddings.
# The Azure OpenAI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "azureOpenAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": {
"resourceName": "RESOURCE_NAME",
"deploymentId": "DEPLOYMENT_ID"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": "TEXT_COLUMN_NAME"
}
}
}'
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
- DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
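The request body from the curl example can also be assembled programmatically, which guarantees well-formed JSON once the placeholders are filled in. The following is a minimal Python sketch, not part of any official client: the helper name build_azure_openai_table_body and all sample values are hypothetical, and the field names are taken from the curl example above.

```python
import json


def build_azure_openai_table_body(
    table_name,
    vector_column,
    text_column,
    model_name,
    dimension,
    api_key_name,
    resource_name,
    deployment_id,
):
    """Build a createTable body that mirrors the curl example (hypothetical helper)."""
    return {
        "createTable": {
            "name": table_name,
            "definition": {
                "columns": {
                    # Vector column whose embeddings are generated by Azure OpenAI.
                    vector_column: {
                        "type": "vector",
                        "dimension": dimension,
                        "service": {
                            "provider": "azureOpenAI",
                            "modelName": model_name,
                            "authentication": {"providerKey": api_key_name},
                            "parameters": {
                                "resourceName": resource_name,
                                "deploymentId": deployment_id,
                            },
                        },
                    },
                    # Separate column for the original text.
                    text_column: "text",
                },
                "primaryKey": text_column,
            },
        }
    }


# Sample values are placeholders for illustration only.
body = build_azure_openai_table_body(
    "example_table",
    "embedding",
    "content",
    "text-embedding-3-small",
    1536,
    "my-azure-key",
    "my-resource",
    "my-deployment",
)
payload = json.dumps(body)
```

You could pass payload as the value of the --data option, or send it with any HTTP client.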
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "TABLE_NAME",
"definition": {
"columns": {
# This column will store vector embeddings.
# The Hugging Face Dedicated integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "huggingfaceDedicated",
"modelName": "endpoint-defined-model",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": {
"endpointName": "ENDPOINT_NAME",
"regionName": "REGION",
"cloudName": "CLOUD_PROVIDER"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": "TEXT_COLUMN_NAME"
}
}
}'
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model. For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container. You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
- REGION: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
- CLOUD_PROVIDER: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
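The three endpoint parameters can be read directly from the endpoint URL. The following Python sketch splits a URL into the endpointName, regionName, and cloudName values used in the curl example. It assumes that every endpoint host follows the same layout as the example URL above (name, region, and cloud provider in front of endpoints.huggingface.cloud); the helper name split_endpoint_url is hypothetical.

```python
from urllib.parse import urlparse

# Assumed host layout, based on the example URL in this section:
# <endpointName>.<regionName>.<cloudName>.endpoints.huggingface.cloud
HOST_SUFFIX = ".endpoints.huggingface.cloud"


def split_endpoint_url(url):
    """Return the parameters object for a Hugging Face Dedicated endpoint URL."""
    host = urlparse(url).hostname or ""
    if not host.endswith(HOST_SUFFIX):
        raise ValueError(f"Unexpected endpoint host: {host!r}")
    parts = host[: -len(HOST_SUFFIX)].split(".")
    if len(parts) != 3:
        raise ValueError(f"Expected name.region.cloud, got: {host!r}")
    endpoint_name, region, cloud = parts
    return {
        "endpointName": endpoint_name,
        "regionName": region,
        "cloudName": cloud,
    }


params = split_endpoint_url(
    "https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud"
)
```

The resulting dictionary has the same keys as the parameters object in the request body.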
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "TABLE_NAME",
"definition": {
"columns": {
# This column will store vector embeddings.
# The Hugging Face Serverless integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "huggingface",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": "TEXT_COLUMN_NAME"
}
}
}'
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
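The two authentication options described for API_KEY_NAME affect different parts of the request: a stored key name goes in the column's service definition, while header authentication goes in the HTTP headers. The following Python sketch shows both variants side by side. The helper names and sample values are hypothetical, and it assumes that the authentication object can simply be left out of the body when you rely on the header.

```python
def vectorize_column(provider, model_name, dimension, api_key_name=None):
    """Build a vector column definition (hypothetical helper)."""
    service = {"provider": provider, "modelName": model_name}
    if api_key_name is not None:
        # Option 1: reference a key stored in the Astra Portal by name.
        service["authentication"] = {"providerKey": api_key_name}
    return {"type": "vector", "dimension": dimension, "service": service}


def request_headers(token, embedding_api_key=None):
    """Build the HTTP headers for a Data API request (hypothetical helper)."""
    headers = {"Token": token, "Content-Type": "application/json"}
    if embedding_api_key is not None:
        # Option 2: header authentication. The header must accompany every
        # command that uses vectorize, including writes and vector search.
        headers["x-embedding-api-key"] = embedding_api_key
    return headers


# Stored key: the key name is part of the column definition.
stored_key_column = vectorize_column(
    "huggingface", "BAAI/bge-small-en-v1.5", 384, api_key_name="my-hf-token-name"
)
stored_key_headers = request_headers("APPLICATION_TOKEN")

# Header authentication: no key name in the body, key value in the header.
header_auth_column = vectorize_column("huggingface", "BAAI/bge-small-en-v1.5", 384)
header_auth_headers = request_headers(
    "APPLICATION_TOKEN", embedding_api_key="HUGGING_FACE_TOKEN_VALUE"
)
```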
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "TABLE_NAME",
"definition": {
"columns": {
# This column will store vector embeddings.
# The Jina AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "jinaAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": "TEXT_COLUMN_NAME"
}
}
}'
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "TABLE_NAME",
"definition": {
"columns": {
# This column will store vector embeddings.
# The Mistral AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "mistral",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": "TEXT_COLUMN_NAME"
}
}
}'
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
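To look up the supported dimensions before you choose a value for MODEL_DIMENSIONS, you can query the Data API for its embedding providers. The following Python sketch only prepares such a request and does not send it. It assumes that the lookup is the findEmbeddingProviders command posted to the database-level URL (without a keyspace); confirm the command name and URL against the Data API reference before relying on them. The helper name and the example.invalid endpoint are hypothetical.

```python
import json
import urllib.request


def build_find_providers_request(api_endpoint, token):
    """Prepare a request that lists embedding providers (hypothetical helper)."""
    # Assumption: the command is named findEmbeddingProviders and takes no options.
    body = json.dumps({"findEmbeddingProviders": {}}).encode("utf-8")
    return urllib.request.Request(
        f"{api_endpoint}/api/json/v1",
        data=body,
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and token for illustration only.
request = build_find_providers_request(
    "https://example.invalid", "APPLICATION_TOKEN"
)

# To send the request against a real database, you could use:
# with urllib.request.urlopen(request) as response:
#     providers = json.load(response)
```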
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider. Your database must be in a supported region.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "TABLE_NAME",
"definition": {
"columns": {
# This column will store vector embeddings.
# The NVIDIA integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"service": {
"provider": "nvidia",
"modelName": "nvidia/nv-embedqa-e5-v5"
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": "TEXT_COLUMN_NAME"
}
}
}'
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "TABLE_NAME",
"definition": {
"columns": {
# This column will store vector embeddings.
# The OpenAI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "openai",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": {
"organizationId": "ORGANIZATION_ID",
"projectId": "PROJECT_ID"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": "TEXT_COLUMN_NAME"
}
}
}'
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
- ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
- PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
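Because ORGANIZATION_ID and PROJECT_ID are optional, the service definition should include only the parameters that you actually have. The following Python sketch builds the service object from the curl example and leaves out unset values. The helper name openai_service and the sample values are hypothetical, and it assumes that the parameters object can be omitted entirely when neither ID is provided.

```python
def openai_service(model_name, api_key_name, organization_id=None, project_id=None):
    """Build the service object for an OpenAI vectorize column (hypothetical helper)."""
    service = {
        "provider": "openai",
        "modelName": model_name,
        "authentication": {"providerKey": api_key_name},
    }
    optional = {"organizationId": organization_id, "projectId": project_id}
    # Keep only the optional parameters that were actually provided.
    parameters = {key: value for key, value in optional.items() if value is not None}
    if parameters:
        service["parameters"] = parameters
    return service


# Sample values are placeholders for illustration only.
minimal = openai_service("text-embedding-3-small", "my-openai-key")
with_org = openai_service(
    "text-embedding-3-small", "my-openai-key", organization_id="org-example"
)
```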
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "TABLE_NAME",
"definition": {
"columns": {
# This column will store vector embeddings.
# The Upstage integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "upstageAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": "TEXT_COLUMN_NAME"
}
}
}'
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "TABLE_NAME",
"definition": {
"columns": {
# This column will store vector embeddings.
# The Voyage AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "voyageAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
},
# You should change the primary key definition to meet the needs of your data.
"primaryKey": "TEXT_COLUMN_NAME"
}
}
}'
Replace the following:
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit the dimension, Astra DB can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimension ranges and default dimensions.
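The curl examples in this section annotate the request body with lines that start with #. Standard JSON has no comment syntax, so a strict JSON parser rejects those bodies as written, and this section does not state whether the Data API tolerates them. If you copy an example into your own tooling, one option is to strip the comment lines first. The following Python sketch assumes that every comment occupies a whole line, as in the examples above; the helper name and the sample body are hypothetical.

```python
import json


def strip_comment_lines(text):
    """Drop every line whose first non-blank character is '#'."""
    kept = [
        line for line in text.splitlines() if not line.lstrip().startswith("#")
    ]
    return "\n".join(kept)


# A shortened body in the style of the curl examples, for illustration only.
example = """{
  "createTable": {
    "name": "example_table",
    "definition": {
      "columns": {
        # This column stores the original text.
        "content": "text"
      },
      # You should change the primary key definition to meet the needs of your data.
      "primaryKey": "content"
    }
  }
}"""

body = json.loads(strip_comment_lines(example))
```

Placeholders that are not quoted in the examples, such as MODEL_DIMENSIONS, must also be replaced with real values before the body parses as JSON.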
Create a table that uses a user-defined type (UDT)
In addition to the supported types, you can create a user-defined type to use in your table.
You can use a user-defined type as the type of a column or as the value type of a map, list, or set column. You can’t use a user-defined type as the key type of a map column or as a partition key or clustering key.
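These restrictions can be checked before you send a table definition. The following Python sketch inspects a dictionary-style definition and reports any column that breaks the rules. It assumes that a user-defined type is written as an object with type set to userDefined and a udtName field, and that the primary key is either a column name or an object with partitionBy and partitionSort. Treat both shapes as assumptions to verify against the Data API reference. The helper names are hypothetical.

```python
def is_udt(type_spec):
    """Return True if a type specification refers to a user-defined type."""
    # Assumed representation: {"type": "userDefined", "udtName": "..."}
    return isinstance(type_spec, dict) and type_spec.get("type") == "userDefined"


def check_udt_usage(definition):
    """Return a list of rule violations for UDT usage in a table definition."""
    problems = []
    columns = definition["columns"]
    for name, spec in columns.items():
        is_map = isinstance(spec, dict) and spec.get("type") == "map"
        if is_map and is_udt(spec.get("keyType")):
            problems.append(f"{name}: a UDT can't be the key type of a map")
    primary_key = definition["primaryKey"]
    if isinstance(primary_key, str):
        key_columns = [primary_key]
    else:
        key_columns = list(primary_key.get("partitionBy", []))
        key_columns += list(primary_key.get("partitionSort", {}))
    for name in key_columns:
        if is_udt(columns.get(name)):
            problems.append(f"{name}: a UDT can't be part of the primary key")
    return problems


person = {"type": "userDefined", "udtName": "person"}
definition = {
    "columns": {
        "id": "uuid",
        "group_leader": person,
        "group_members": {"type": "set", "valueType": person},
        "group_roles": {"type": "map", "keyType": "text", "valueType": person},
    },
    "primaryKey": {"partitionBy": ["id"], "partitionSort": {}},
}
problems = check_udt_usage(definition)
```

For this definition, problems is an empty list because the user-defined type appears only as a column type and as a value type.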
The following examples demonstrate how to use a user-defined type called person as the type of the group_leader column, as the value type of the group_members set column, and as the value type of the group_roles map column.
-
Python
-
TypeScript
-
Java
-
C#
-
curl
The Python client supports multiple ways to create a table.
In all cases, you must define the table schema, and then pass the definition to the create_table method.
The following example uses untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.
-
CreateTableDefinition object
-
Fluent interface
-
Dictionary
You can define the table as a CreateTableDefinition and then build the table from the CreateTableDefinition object.
from astrapy import DataAPIClient
from astrapy.info import (
ColumnType,
CreateTableDefinition,
TableKeyValuedColumnType,
TableKeyValuedColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableScalarColumnTypeDescriptor,
TableUDTColumnDescriptor,
TableValuedColumnType,
TableValuedColumnTypeDescriptor,
)
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
table_definition = CreateTableDefinition(
# Define all of the columns in the table
columns={
"id": TableScalarColumnTypeDescriptor(
column_type=ColumnType.UUID
),
"group_leader": TableUDTColumnDescriptor(udt_name="person"),
"group_members": TableValuedColumnTypeDescriptor(
column_type=TableValuedColumnType.SET,
value_type=TableUDTColumnDescriptor(
udt_name="person",
),
),
"group_roles": TableKeyValuedColumnTypeDescriptor(
column_type=TableKeyValuedColumnType.MAP,
key_type=ColumnType.TEXT,
value_type=TableUDTColumnDescriptor(
udt_name="person",
),
),
},
primary_key=TablePrimaryKeyDescriptor(
partition_by=["id"], partition_sort={}
),
)
table = database.create_table(
"example_table",
definition=table_definition,
)
You can use a fluent interface to build the table definition and then create the table from the definition.
from astrapy import DataAPIClient
from astrapy.info import (
ColumnType,
CreateTableDefinition,
TableUDTColumnDescriptor,
)
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
table_definition = (
CreateTableDefinition.builder()
# Define all of the columns in the table
.add_scalar_column("id", ColumnType.UUID)
.add_userdefinedtype_column("group_leader", udt_name="person")
.add_set_column(
"group_members",
value_type=TableUDTColumnDescriptor(
udt_name="person",
),
)
.add_map_column(
"group_roles",
key_type=ColumnType.TEXT,
value_type=TableUDTColumnDescriptor(
udt_name="person",
),
)
# Define the primary key for the table.
.add_partition_by(["id"])
# Finally, build the table definition.
.build()
)
table = database.create_table(
"example_table",
definition=table_definition,
)
You can define the table as a dictionary and then build the table from the dictionary.
from astrapy import DataAPIClient
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
# Define the columns and primary key for the table
table_definition = {
"columns": {
"id": {"type": "uuid"},
"group_leader": {
"type": "userDefined",
"udtName": "person",
},
"group_members": {
"type": "set",
"valueType": {
"type": "userDefined",
"udtName": "person",
},
},
"group_roles": {
"type": "map",
"keyType": "text",
"valueType": {
"type": "userDefined",
"udtName": "person",
},
},
},
"primaryKey": {
"partitionBy": ["id"],
"partitionSort": {},
},
}
table = database.create_table(
"example_table",
definition=table_definition,
)
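The dictionary form uses the same keys as the JSON body in the curl example on this page (columns, primaryKey, udtName, and so on). The following sketch shows that correspondence by wrapping a definition dictionary in a createTable command and serializing it. The create_table_payload helper is hypothetical, and the sketch does not send a request.

```python
import json
from typing import Any, Dict


def create_table_payload(name: str, definition: Dict[str, Any]) -> str:
    """Wrap a table definition in a createTable command body (hypothetical helper)."""
    command = {"createTable": {"name": name, "definition": definition}}
    return json.dumps(command, indent=2)


# A shortened version of the definition used in the example above
table_definition = {
    "columns": {
        "id": {"type": "uuid"},
        "group_leader": {"type": "userDefined", "udtName": "person"},
    },
    "primaryKey": {"partitionBy": ["id"], "partitionSort": {}},
}

payload = create_table_payload("example_table", table_definition)
```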
The TypeScript client supports multiple ways to create a table. The method you choose depends on your typing preferences and whether you modified the ser/des configuration.
For more information, see Collection and table typing.
-
Automatic type inference
-
Manually typed tables
-
Untyped tables
The TypeScript client can automatically infer the TypeScript-equivalent type of the table’s schema and primary key.
To do this, first create the table definition.
Then, use InferTableSchema and InferTablePrimaryKey to infer the type of the table and of the primary key.
To create the table, provide the table definition and the inferred types to the createTable method.
import {
DataAPIClient,
InferTablePrimaryKey,
InferTableSchema,
Table,
} from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
id: "uuid",
group_leader: {
type: "userDefined",
udtName: "person",
},
group_members: {
type: "set",
valueType: {
type: "userDefined",
udtName: "person",
},
},
group_roles: {
type: "map",
keyType: "text",
valueType: {
type: "userDefined",
udtName: "person",
},
},
},
// Define the primary key for the table.
primaryKey: {
partitionBy: ["id"],
},
});
// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;
(async function () {
// Provide the types and the definition
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"example_table",
{ definition: tableDefinition },
);
})();
You can manually define the type for your table’s schema and primary key.
To create the table, provide the table definition and the types to the createTable method.
This may be necessary if you modify the table’s default ser/des configuration.
import { DataAPIClient, Table, UUID } from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
id: "uuid",
group_leader: {
type: "userDefined",
udtName: "person",
},
group_members: {
type: "set",
valueType: {
type: "userDefined",
udtName: "person",
},
},
group_roles: {
type: "map",
keyType: "text",
valueType: {
type: "userDefined",
udtName: "person",
},
},
},
// Define the primary key for the table.
primaryKey: {
partitionBy: ["id"],
},
});
// Manually define the type of the table's schema and primary key
type Person = { name: string; level: number };
type TableSchema = {
id: UUID;
group_leader: Person;
group_members: Set<Person>;
group_roles: Map<string, Person>;
};
type TablePrimaryKey = Pick<TableSchema, "id">;
(async function () {
// Provide the types and the definition to create the table
const table = await database.createTable<TableSchema, TablePrimaryKey>(
"example_table",
{ definition: tableDefinition },
);
})();
To create a table without any typing, pass SomeRow as the single generic type parameter to the createTable method.
This types the table’s rows as Record<string, any>.
This is the most flexible but least type-safe option.
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
id: "uuid",
group_leader: {
type: "userDefined",
udtName: "person",
},
group_members: {
type: "set",
valueType: {
type: "userDefined",
udtName: "person",
},
},
group_roles: {
type: "map",
keyType: "text",
valueType: {
type: "userDefined",
udtName: "person",
},
},
},
// Define the primary key for the table.
primaryKey: {
partitionBy: ["id"],
},
});
(async function () {
// Pass SomeRow as the type and provide the definition to create the table
const table = await database.createTable<SomeRow>("example_table", {
definition: tableDefinition,
});
})();
The Java client supports multiple ways to create a table. In all cases, you must define the table schema.
-
Use a generic type
-
Define the row type
If you don’t specify the rowClass parameter when you create the table, the client defaults to Table<Row> as the type.
In this case, the working object type T is Row.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.definition.rows.Row;
public class Example {
public static void main(String[] args) {
// Get an existing database
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
TableDefinition tableDefinition =
new TableDefinition()
// Define all of the columns in the table
.addColumnUuid("id")
.addColumnUserDefinedType("group_leader", "person")
.addColumnSetUserDefinedType("group_members", "person")
.addColumnMapUserDefinedType("group_roles", "person", TableColumnTypes.TEXT)
// Define the primary key for the table.
.addPartitionBy("id");
Table<Row> table = database.createTable("example_table", tableDefinition);
}
}
Instead of using the default type Row.class, you can define your own working object, which will be serialized as a Row.
This working object can be annotated when the field names do not exactly match the column names or when you want to fully describe your table to enable its creation solely from the entity definition.
The following example defines a Person class for the user-defined type and a Group class for the table rows, and then uses the Group class to create the table.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.definition.types.TableUserDefinedType;
import com.datastax.astra.client.tables.mapping.Column;
import com.datastax.astra.client.tables.mapping.EntityTable;
import com.datastax.astra.client.tables.mapping.PartitionBy;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import lombok.Data;
public class Example {
// Define the user-defined type "person"
@TableUserDefinedType("person")
public class Person {
@Column(name = "user_name", type = TableColumnTypes.TEXT)
private String userName;
@Column(name = "age", type = TableColumnTypes.INT)
private Integer age;
}
// Define the table
@EntityTable("example_table")
@Data
class Group {
@PartitionBy(0)
@Column(name = "id", type = TableColumnTypes.UUID)
private UUID id;
@Column(name = "group_leader", type = TableColumnTypes.USERDEFINED, udtName = "person")
private Person groupLeader;
@Column(
name = "group_members",
type = TableColumnTypes.SET,
valueType = TableColumnTypes.USERDEFINED,
udtName = "person")
private Set<Person> groupMembers;
@Column(
name = "group_roles",
type = TableColumnTypes.MAP,
keyType = TableColumnTypes.TEXT,
valueType = TableColumnTypes.USERDEFINED,
udtName = "person")
private Map<String, Person> groupRoles;
}
public static void main(String[] args) {
Database database = new DataAPIClient("APPLICATION_TOKEN").getDatabase("API_ENDPOINT");
Table<Group> table = database.createTable(Group.class);
}
}
-
Typed tables
-
Untyped tables
You can manually define a client-side type for your table to help statically catch errors. For more information and examples, see Custom typing for tables.
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
// Define the user-defined type
// The type will be created if a type
// with the same name does not already exist
[UserDefinedType("person")]
public class Person
{
[ColumnName("name")]
public string? Name { get; set; }
[ColumnName("level")]
public int? Level { get; set; }
};
// Define the type for the row
[TableName("TABLE_NAME")]
public class ExampleRow
{
[ColumnPrimaryKey]
[ColumnName("id")]
public Guid Id { get; set; }
[ColumnName("group_leader")]
public Person? GroupLeader { get; set; }
[ColumnName("group_members")]
public Person[]? GroupMembers { get; set; }
[ColumnName("group_roles")]
public Dictionary<string, Person>? GroupRoles { get; set; }
}
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var table = await database.CreateTableAsync<ExampleRow>();
}
}
If you don’t pass a type parameter, the table remains untyped. This is a more flexible but less type-safe option.
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var definition = new TableDefinition()
.AddColumn("id", DataAPIType.Uuid())
.AddColumn("group_leader", DataAPIType.UserDefined("person"))
.AddColumn(
"group_members",
DataAPIType.Set(DataAPIType.UserDefined("person"))
)
.AddColumn(
"group_roles",
DataAPIType.Map(
DataAPIType.Text(),
DataAPIType.UserDefined("person")
)
)
.AddSinglePrimaryKey("id");
var table = await database.CreateTableAsync(
"TABLE_NAME",
definition
);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"createTable": {
"name": "example_table",
"definition": {
"columns": {
"id": {
"type": "uuid"
},
"group_leader": {
"type": "userDefined",
"udtName": "person"
},
"group_members": {
"type": "set",
"valueType": {
"type": "userDefined",
"udtName": "person"
}
},
"group_roles": {
"type": "map",
"keyType": "text",
"valueType": {
"type": "userDefined",
"udtName": "person"
}
}
},
"primaryKey": "id"
}
}
}'
Create a table and specify the keyspace
-
Python
-
TypeScript
-
Java
-
C#
-
curl
The following example uses untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.
from astrapy import DataAPIClient
from astrapy.info import (
ColumnType,
CreateTableDefinition,
TableKeyValuedColumnType,
TableKeyValuedColumnTypeDescriptor,
TablePrimaryKeyDescriptor,
TableScalarColumnTypeDescriptor,
TableValuedColumnType,
TableValuedColumnTypeDescriptor,
)
# Get an existing database
client = DataAPIClient()
database = client.get_database(
"API_ENDPOINT", token="APPLICATION_TOKEN"
)
table_definition = CreateTableDefinition(
# Define all of the columns in the table
columns={
"title": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
"number_of_pages": TableScalarColumnTypeDescriptor(
column_type=ColumnType.INT
),
"rating": TableScalarColumnTypeDescriptor(
column_type=ColumnType.FLOAT
),
"genres": TableValuedColumnTypeDescriptor(
column_type=TableValuedColumnType.SET,
value_type=ColumnType.TEXT,
),
"metadata": TableKeyValuedColumnTypeDescriptor(
column_type=TableKeyValuedColumnType.MAP,
key_type=ColumnType.TEXT,
value_type=ColumnType.TEXT,
),
"is_checked_out": TableScalarColumnTypeDescriptor(
column_type=ColumnType.BOOLEAN
),
"due_date": TableScalarColumnTypeDescriptor(
column_type=ColumnType.DATE
),
},
# Define the primary key for the table.
# In this case, the table uses a single-column primary key.
primary_key=TablePrimaryKeyDescriptor(
partition_by=["title"], partition_sort={}
),
)
table = database.create_table(
"example_table",
definition=table_definition,
keyspace="KEYSPACE_NAME",
)
import { DataAPIClient, SomeRow, Table } from "@datastax/astra-db-ts";
// Get an existing database
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const tableDefinition = Table.schema({
// Define all of the columns in the table
columns: {
title: "text",
number_of_pages: "int",
rating: "float",
genres: { type: "set", valueType: "text" },
metadata: {
type: "map",
keyType: "text",
valueType: "text",
},
is_checked_out: "boolean",
due_date: "date",
},
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
primaryKey: {
partitionBy: ["title"],
},
});
(async function () {
// Pass SomeRow as the type and provide the definition and keyspace to create the table
const table = await database.createTable<SomeRow>("example_table", {
definition: tableDefinition,
keyspace: "KEYSPACE_NAME",
});
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.TableColumnTypes;
import com.datastax.astra.client.tables.definition.rows.Row;
public class Example {
public static void main(String[] args) {
// Get an existing database
Database database =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT", "KEYSPACE_NAME");
TableDefinition tableDefinition =
new TableDefinition()
// Define all of the columns in the table
.addColumnText("title")
.addColumnInt("number_of_pages")
.addColumn("rating", TableColumnTypes.FLOAT)
.addColumnSet("genres", TableColumnTypes.TEXT)
.addColumnMap("metadata", TableColumnTypes.TEXT, TableColumnTypes.TEXT)
.addColumnBoolean("is_checked_out")
.addColumn("due_date", TableColumnTypes.DATE)
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
.addPartitionBy("title");
Table<Row> table = database.createTable("example_table", tableDefinition);
}
}
-
Typed tables
-
Untyped tables
You can manually define a client-side type for your table to help statically catch errors. For more information and examples, see Custom typing for tables.
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
namespace Examples;
// Define the type for the row
[TableName("TABLE_NAME")]
public class Book
{
[ColumnPrimaryKey]
[ColumnName("title")]
public string Title { get; set; } = null!;
[ColumnName("number_of_pages")]
public int? NumberOfPages { get; set; }
[ColumnName("rating")]
public float? Rating { get; set; }
[ColumnName("genres")]
public string[]? Genres { get; set; }
[ColumnName("metadata")]
public Dictionary<string, string>? Metadata { get; set; }
[ColumnName("is_checked_out")]
public bool? IsCheckedOut { get; set; }
[ColumnName("due_date")]
public DateOnly? DueDate { get; set; }
}
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var table = await database.CreateTableAsync<Book>(
new CreateTableOptions() { Keyspace = "KEYSPACE_NAME" }
);
}
}
If you don’t pass a type parameter, the table remains untyped. This is a more flexible but less type-safe option.
using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Tables;
using DataStax.AstraDB.DataApi.Utils;
namespace Examples;
public class Program
{
static async Task Main()
{
// Instantiate the client
var client = new DataAPIClient();
// Connect to a database
var database = client.GetDatabase(
"API_ENDPOINT",
"APPLICATION_TOKEN"
);
// Create a table
var definition = new TableDefinition()
.AddColumn("title", DataAPIType.Text())
.AddColumn("number_of_pages", DataAPIType.Int())
.AddColumn("rating", DataAPIType.Float())
.AddColumn("genres", DataAPIType.Set(DataAPIType.Text()))
.AddColumn(
"metadata",
DataAPIType.Map(DataAPIType.Text(), DataAPIType.Text())
)
.AddColumn("is_checked_out", DataAPIType.Boolean())
.AddColumn("due_date", DataAPIType.Date())
// Define the primary key for the table.
// In this case, the table uses a single-column primary key.
.AddSinglePrimaryKey("title");
var table = await database.CreateTableAsync(
"TABLE_NAME",
definition,
new CreateTableOptions() { Keyspace = "KEYSPACE_NAME" }
);
}
}
This option has no literal equivalent in HTTP. Instead, you specify the keyspace in the path.
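As a rough sketch of what "in the path" means, the following Python function builds the request URL in the same shape as the curl examples on this page (API_ENDPOINT/api/json/v1/KEYSPACE_NAME). The data_api_url helper is hypothetical and only formats a string.

```python
def data_api_url(api_endpoint: str, keyspace: str) -> str:
    """Build the Data API URL for commands that target a keyspace (hypothetical helper)."""
    # Strip a trailing slash so the joined path has no double slash.
    return f"{api_endpoint.rstrip('/')}/api/json/v1/{keyspace}"


url = data_api_url("API_ENDPOINT", "KEYSPACE_NAME")
```

To create the table in a different keyspace, change the last path segment and leave the createTable body unchanged.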
Client reference
-
Python
-
TypeScript
-
Java
-
C#
-
curl
For more information, see the client reference.
For more information, see the client reference.
For more information, see the client reference.
For more information, see the client reference.
Client reference documentation is not applicable for HTTP.