Alter a table
Alters a table by doing one of the following:
- Adding one or more columns to a table
- Dropping one or more columns from a table
- Adding automatic embedding generation for one or more vector columns
- Removing automatic embedding generation for one or more vector columns
You cannot change a column’s type. Instead, you must drop the column and add a new column.
After you add a column, you should index the column if you want to filter or sort on the column. All indexed column names must use snake case, not camel case. For more information, see Create an index and Create a vector index.
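For example, the following is a minimal Python sketch of dropping a column and re-adding it with a new type, assuming astrapy's AlterTableDropColumns and AlterTableAddColumns operation classes and an illustrative column name (any data stored in the dropped column is lost):
from astrapy.info import (
    AlterTableAddColumns,
    AlterTableDropColumns,
    ColumnType,
    TableScalarColumnTypeDescriptor,
)

# Drop the column with the old type...
table.alter(AlterTableDropColumns(columns=["rating"]))

# ...then add a column of the same name with the new type.
table.alter(
    AlterTableAddColumns(
        columns={
            "rating": TableScalarColumnTypeDescriptor(column_type=ColumnType.FLOAT),
        },
    )
)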
Ready to write code? See the examples for this method to get started. If you are new to the Data API, check out the quickstart.
Result
- Python
- TypeScript
- Java
- curl
Adds or drops columns, or adds or removes a vectorize integration for vector columns. Removing a vectorize integration for a column does not remove the vector embeddings stored in the column.
Returns a Table instance that represents the table after the modification.
Although the Table instance that you used to perform the alteration will still work, it will not reflect the updated typing if you added or dropped columns.
To reflect the new typing, use the row_type parameter.
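For example, the following is a minimal sketch that adds a venue text column and requests the updated typing (the NewRow class is illustrative):
from typing import TypedDict

from astrapy.info import (
    AlterTableAddColumns,
    ColumnType,
    TableScalarColumnTypeDescriptor,
)

# Hypothetical row type describing the table after the change
class NewRow(TypedDict, total=False):
    venue: str

new_table = table.alter(
    AlterTableAddColumns(
        columns={
            "venue": TableScalarColumnTypeDescriptor(column_type=ColumnType.TEXT),
        },
    ),
    row_type=NewRow,
)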
Adds or drops columns, or adds or removes a vectorize integration for vector columns. Removing a vectorize integration for a column does not remove the vector embeddings stored in the column.
Returns a promise that resolves to a Table instance that represents the table after the modification.
Although the Table instance that you used to perform the alteration will still work, it will not reflect the updated typing if you added or dropped columns.
To reflect the new typing, provide the new type of the table to the alter method.
For example:
const newTable = await table.alter<NewSchema>({
operation: {
add: {
columns: {
venue: 'text',
},
},
},
});
Adds or drops columns, or adds or removes a vectorize integration for vector columns. Removing a vectorize integration for a column does not remove the vector embeddings stored in the column.
Returns a Table<T> instance that represents the table after the schema change.
Although the Table instance that you used to perform the alteration will still work, it will not reflect the updated typing if you added or dropped columns.
To reflect the new typing, use the rowClass parameter.
Adds or drops columns, or adds or removes a vectorize integration for vector columns. Removing a vectorize integration for a column does not remove the vector embeddings stored in the column.
If the command succeeds, the response indicates success.
Example response:
{
"status": {
"ok": 1
}
}
Parameters
- Python
- TypeScript
- Java
- curl
Use the alter method, which belongs to the astrapy.Table class.
Method signature
alter(
operation: AlterTableOperation | dict[str, Any],
*,
row_type: type[Any] = DefaultRowType,
table_admin_timeout_ms: int,
request_timeout_ms: int,
timeout_ms: int,
) -> Table[NEW_ROW]
| Name | Type | Summary |
|---|---|---|
|
|
The alter operation to perform. Can be one of the following:
|
|
|
Optional.
A formal specifier for the type checker.
If provided, |
|
|
Optional.
A timeout, in milliseconds, for the underlying HTTP request.
If not provided, the |
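For example, the following is a minimal sketch that passes an operation object together with an explicit schema-change timeout (the AlterTableDropColumns class and the column name are illustrative):
from astrapy.info import AlterTableDropColumns

# Drop a column, allowing up to 30 seconds for the schema change
table.alter(
    AlterTableDropColumns(columns=["library_branch"]),
    table_admin_timeout_ms=30000,
)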
Use the alter method, which belongs to the Table class.
Method signature
async alter(
options: AlterTableOptions<Schema>
): Table<Schema, PKey>
| Name | Type | Summary |
|---|---|---|
|
|
The alter operation to perform. Can be one of the following:
|
|
|
The timeout(s) to apply to HTTP request(s) originating from this method. |
Use the alter method, which belongs to the com.datastax.astra.client.tables.Table class.
Method signature
Table<T> alter(AlterTableOperation operation)
Table<T> alter(
AlterTableOperation operation,
AlterTableOptions options
)
<R> Table<R> alter(
AlterTableOperation operation,
AlterTableOptions options,
Class<R> clazz
)
| Name | Type | Summary |
|---|---|---|
|
|
The alter operation to perform. Can be one of the following:
|
|
Optional. The options for this operation, including the timeout. |
|
|
|
Optional. A specification of the class of the table’s row object. Default: |
Use the alterTable command.
Command signature
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"add": {
"columns": {
"NEW_COLUMN_NAME": "DATA_TYPE",
"NEW_COLUMN_NAME": "DATA_TYPE"
}
}
}
}
}'
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"drop": {
"columns": [ "COLUMN_NAME", "COLUMN_NAME" ]
}
}
}
}'
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"addVectorize": {
"columns": {
"VECTOR_COLUMN_NAME": {
"provider": "PROVIDER",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": PARAMETERS
}
}
}
}
}
}'
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"dropVectorize": {
"columns": [ "VECTOR_COLUMN_NAME", "VECTOR_COLUMN_NAME" ]
}
}
}
}'
| Name | Type | Summary |
|---|---|---|
|
|
The alter operation to perform. Can be one of the following:
|
Examples
The following examples demonstrate how to alter a table.
Add columns to a table
When you add columns, the columns are defined in the same way as they are when you create a table.
After you add a column, you should index the column if you want to filter or sort on the column. For more information, see Create an index and Create a vector index.
- Python
- TypeScript
- Java
- curl
The following example uses untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddColumns,
ColumnType,
TableScalarColumnTypeDescriptor,
)
# Get an existing table
client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")
table = database.get_table("TABLE_NAME")
# Add columns
table.alter(
AlterTableAddColumns(
columns={
"is_summer_reading": TableScalarColumnTypeDescriptor(
column_type=ColumnType.BOOLEAN,
),
"library_branch": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT,
),
},
),
)
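If you plan to filter or sort on a newly added column, you can then index it. The following is a minimal sketch, assuming the create_index method and one of the columns added above:
# Index the new column so it can be used in filters and sorts
table.create_index(
    "library_branch_index",
    column="library_branch",
)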
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const table = database.table("TABLE_NAME");
// Add columns
(async function () {
await table.alter({
operation: {
add: {
columns: {
is_summer_reading: "boolean",
library_branch: "text",
},
},
},
});
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import com.datastax.astra.client.tables.definition.rows.Row;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Add columns
AlterTableAddColumns alterOperation =
new AlterTableAddColumns()
.addColumnBoolean("is_summer_reading")
.addColumnText("library_branch");
table.alter(alterOperation);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"add": {
"columns": {
"is_summer_reading": "boolean",
"library_branch": "text"
}
}
}
}
}'
Add vector columns to a table
After you add a vector column, you should index the column if you want to run vector searches on the column. For more information, see Create a vector index.
- Python
- TypeScript
- Java
- curl
The following example uses untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddColumns,
TableVectorColumnTypeDescriptor,
)
# Get an existing table
client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")
table = database.get_table("TABLE_NAME")
# Add a vector column
table.alter(
AlterTableAddColumns(
columns={
"example_vector": TableVectorColumnTypeDescriptor(
dimension=1024,
),
},
)
)
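To run vector searches on the new column, you can then create a vector index on it. The following is a minimal sketch, assuming the create_vector_index method and the column added above:
from astrapy.constants import VectorMetric
from astrapy.info import TableVectorIndexOptions

# Index the new vector column so it can be used in vector searches
table.create_vector_index(
    "example_vector_index",
    column="example_vector",
    options=TableVectorIndexOptions(metric=VectorMetric.COSINE),
)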
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const table = database.table("TABLE_NAME");
// Add a vector column
(async function () {
await table.alter({
operation: {
add: {
columns: {
example_vector: { type: "vector", dimension: 1024 },
},
},
},
});
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Add a vector column
AlterTableAddColumns alterOperation =
new AlterTableAddColumns()
.addColumnVector(
"example_vector",
new TableColumnDefinitionVector().dimension(1024).metric(SimilarityMetric.COSINE));
table.alter(alterOperation);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"add": {
"columns": {
"example_vector": {
"type": "vector",
"dimension": 1024
}
}
}
}
}
}'
Add a vector column and configure an embedding provider integration
When you add a vector column to a table, you can configure an embedding provider integration for the column. The integration will automatically generate vector embeddings for any data inserted into the column.
The configuration depends on the embedding provider. For the configuration and an example for each provider, see Supported embedding providers.
The original data isn’t stored automatically. If you want to store the original data in addition to the vector embeddings that were generated from the data, then you need to create a separate column and manually store the original data in that column.
After you add a vector column, you should index the column if you want to run vector searches on the column. For more information, see Create a vector index.
- Python
- TypeScript
- Java
- curl
The following example uses untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddColumns,
TableVectorColumnTypeDescriptor,
VectorServiceOptions,
TableScalarColumnTypeDescriptor,
ColumnType
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Add a vector column and configure an embedding provider
table.alter(
AlterTableAddColumns(
columns={
# This column will store vector embeddings.
# The Azure OpenAI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="azureOpenAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"resourceName": "RESOURCE_NAME",
"deploymentId": "DEPLOYMENT_ID",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
)
)
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
- RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
- DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
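Once the vectorize integration is configured, it embeds whatever text you write to the vector column. The following is a minimal sketch of a subsequent insert (the row is illustrative and omits any primary key columns your table requires):
# The string written to the vector column is embedded automatically.
# The original text is kept only because it is also written to the
# separate text column.
table.insert_one(
    {
        "VECTOR_COLUMN_NAME": "Text to embed",
        "TEXT_COLUMN_NAME": "Text to embed",
    }
)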
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddColumns,
TableVectorColumnTypeDescriptor,
VectorServiceOptions,
TableScalarColumnTypeDescriptor,
ColumnType
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Add a vector column and configure an embedding provider
table.alter(
AlterTableAddColumns(
columns={
# This column will store vector embeddings.
# The Hugging Face Dedicated integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="huggingfaceDedicated",
model_name="endpoint-defined-model",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"endpointName": "ENDPOINT_NAME",
"regionName": "REGION_NAME",
"cloudName": "CLOUD_NAME",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
)
)
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model. For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container. You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
- ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
- REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
- CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddColumns,
TableVectorColumnTypeDescriptor,
VectorServiceOptions,
TableScalarColumnTypeDescriptor,
ColumnType
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Add a vector column and configure an embedding provider
table.alter(
AlterTableAddColumns(
columns={
# This column will store vector embeddings.
# The Hugging Face Serverless integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="huggingface",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
)
)
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddColumns,
TableVectorColumnTypeDescriptor,
VectorServiceOptions,
TableScalarColumnTypeDescriptor,
ColumnType
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Add a vector column and configure an embedding provider
table.alter(
AlterTableAddColumns(
columns={
# This column will store vector embeddings.
# The Jina AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="jinaAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
)
)
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddColumns,
TableVectorColumnTypeDescriptor,
VectorServiceOptions,
TableScalarColumnTypeDescriptor,
ColumnType
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Add a vector column and configure an embedding provider
table.alter(
AlterTableAddColumns(
columns={
# This column will store vector embeddings.
# The Mistral AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="mistral",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
)
)
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddColumns,
TableVectorColumnTypeDescriptor,
VectorServiceOptions,
TableScalarColumnTypeDescriptor,
ColumnType
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Add a vector column and configure an embedding provider
table.alter(
AlterTableAddColumns(
columns={
# This column will store vector embeddings.
# The NVIDIA integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
service=VectorServiceOptions(
provider="nvidia",
model_name="nvidia/nv-embedqa-e5-v5",
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
)
)
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddColumns,
TableVectorColumnTypeDescriptor,
VectorServiceOptions,
TableScalarColumnTypeDescriptor,
ColumnType
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Add a vector column and configure an embedding provider
table.alter(
AlterTableAddColumns(
columns={
# This column will store vector embeddings.
# The OpenAI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="openai",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"organizationId": "ORGANIZATION_ID",
"projectId": "PROJECT_ID",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
)
)
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
- ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
- PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddColumns,
TableVectorColumnTypeDescriptor,
VectorServiceOptions,
TableScalarColumnTypeDescriptor,
ColumnType
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Add a vector column and configure an embedding provider
table.alter(
AlterTableAddColumns(
columns={
# This column will store vector embeddings.
# The Upstage integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="upstageAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
)
)
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddColumns,
TableVectorColumnTypeDescriptor,
VectorServiceOptions,
TableScalarColumnTypeDescriptor,
ColumnType
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Add a vector column and configure an embedding provider
table.alter(
AlterTableAddColumns(
columns={
# This column will store vector embeddings.
# The Voyage AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
dimension=MODEL_DIMENSIONS,
service=VectorServiceOptions(
provider="voyageAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
),
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
column_type=ColumnType.TEXT
),
},
)
)
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Add a vector column and configure an embedding provider
(async function () {
await table.alter({
operation: {
add: {
columns: {
// This column will store vector embeddings.
// The Azure OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'azureOpenAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
resourceName: 'RESOURCE_NAME',
deploymentId: 'DEPLOYMENT_ID',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
},
},
});
})();
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
- RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
- DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Add a vector column and configure an embedding provider
(async function () {
await table.alter({
operation: {
add: {
columns: {
// This column will store vector embeddings.
// The Hugging Face Dedicated integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'huggingfaceDedicated',
modelName: 'endpoint-defined-model',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
endpointName: 'ENDPOINT_NAME',
regionName: 'REGION_NAME',
cloudName: 'CLOUD_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
},
},
});
})();
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model. For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container. You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
- ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
- REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
- CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Add a vector column and configure an embedding provider
(async function () {
await table.alter({
operation: {
add: {
columns: {
// This column will store vector embeddings.
// The Hugging Face Serverless integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'huggingface',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
},
},
});
})();
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Add a vector column and configure an embedding provider
(async function () {
await table.alter({
operation: {
add: {
columns: {
// This column will store vector embeddings.
// The Jina AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'jinaAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
},
},
});
})();
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Add a vector column and configure an embedding provider
(async function () {
await table.alter({
operation: {
add: {
columns: {
// This column will store vector embeddings.
// The Mistral AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'mistral',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
},
},
});
})();
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Add a vector column and configure an embedding provider
(async function () {
await table.alter({
operation: {
add: {
columns: {
// This column will store vector embeddings.
// The NVIDIA integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
service: {
provider: 'nvidia',
modelName: 'nvidia/nv-embedqa-e5-v5',
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
},
},
});
})();
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Add a vector column and configure an embedding provider
(async function () {
await table.alter({
operation: {
add: {
columns: {
// This column will store vector embeddings.
// The OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'openai',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
organizationId: 'ORGANIZATION_ID',
projectId: 'PROJECT_ID',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
},
},
});
})();
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions. A sketch of this lookup follows this list.
- ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
- PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
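The following snippet is a minimal sketch of that lookup. It assumes the database admin interface exposes a findEmbeddingProviders method; check your client version if the call differs.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Minimal sketch: list supported embedding providers, their models,
// and each model's dimension parameters (ranges and defaults).
// Assumes db.admin() and findEmbeddingProviders() are available in your client version.
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
(async function () {
  const providers = await database.admin().findEmbeddingProviders();
  // Inspect the openai entry for its models, parameters, and dimensions.
  console.log(providers.embeddingProviders.openai);
})();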
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Add a vector column and configure an embedding provider
(async function () {
await table.alter({
operation: {
add: {
columns: {
// This column will store vector embeddings.
// The Upstage integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'upstageAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
},
},
});
})();
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Add a vector column and configure an embedding provider
(async function () {
await table.alter({
operation: {
add: {
columns: {
// This column will store vector embeddings.
// The Voyage AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
type: "vector",
dimension: MODEL_DIMENSIONS,
service: {
provider: 'voyageAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
TEXT_COLUMN_NAME: "text",
},
},
},
});
})();
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Define parameters for the embedding provider
Map<String, Object > params = new HashMap<>();
params.put("resourceName", "RESOURCE_NAME");
params.put("deploymentId", "DEPLOYMENT_ID");
// Add a vector column and configure an embedding provider
AlterTableAddColumns alterOperation = new AlterTableAddColumns()
// This column will store vector embeddings.
// The Azure OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new TableColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("azureOpenAI")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
.parameters(params)
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME");
table.alter(alterOperation);
}
}
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider. A sketch of this header-based approach follows this list.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
- RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
- DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
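The following snippet is a minimal sketch of the header-based alternative described above. The TableOptions and EmbeddingAPIKeyHeaderProvider names, and the AZURE_OPENAI_API_KEY placeholder, are assumptions for illustration; check your client version for the exact option classes and methods.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.auth.EmbeddingAPIKeyHeaderProvider;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.TableOptions;
import com.datastax.astra.client.tables.definition.rows.Row;
public class HeaderAuthSketch {
  public static void main(String[] args) {
    // Minimal sketch: header-based vectorize authentication.
    // Instead of naming an Astra-stored key with providerKey, pass the raw
    // provider API key when you get the table. The client then sends it in the
    // x-embedding-api-key header on every vectorize request from this Table.
    // AZURE_OPENAI_API_KEY is a hypothetical placeholder for your raw key value,
    // and the option class and method names here are assumptions.
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable(
                "TABLE_NAME",
                new TableOptions()
                    .embeddingAuthProvider(
                        new EmbeddingAPIKeyHeaderProvider("AZURE_OPENAI_API_KEY")));
  }
}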
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Define parameters for the embedding provider
Map<String, Object > params = new HashMap<>();
params.put("endpointName", "ENDPOINT_NAME");
params.put("regionName", "REGION_NAME");
params.put("cloudName", "CLOUD_NAME");
// Add a vector column and configure an embedding provider
AlterTableAddColumns alterOperation = new AlterTableAddColumns()
// This column will store vector embeddings.
// The Hugging Face Dedicated integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new TableColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("huggingfaceDedicated")
.modelName("endpoint-defined-model")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME");
table.alter(alterOperation);
}
}
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model. For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container. You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
- ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
- REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
- CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Add a vector column and configure an embedding provider
AlterTableAddColumns alterOperation = new AlterTableAddColumns()
// This column will store vector embeddings.
// The Hugging Face Serverless integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new TableColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("huggingface")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME");
table.alter(alterOperation);
}
}
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Add a vector column and configure an embedding provider
AlterTableAddColumns alterOperation = new AlterTableAddColumns()
// This column will store vector embeddings.
// The Jina AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new TableColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("jinaAI")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME");
table.alter(alterOperation);
}
}
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Add a vector column and configure an embedding provider
AlterTableAddColumns alterOperation = new AlterTableAddColumns()
// This column will store vector embeddings.
// The Mistral AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new TableColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("mistral")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME");
table.alter(alterOperation);
}
}
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Add a vector column and configure an embedding provider
AlterTableAddColumns alterOperation = new AlterTableAddColumns()
// This column will store vector embeddings.
// The NVIDIA integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new TableColumnDefinitionVector()
.metric(SimilarityMetric.COSINE)
.service(
new VectorServiceOptions()
.provider("nvidia")
.modelName("nvidia/nv-embedqa-e5-v5")
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME");
table.alter(alterOperation);
}
}
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Define parameters for the embedding provider
Map<String, Object > params = new HashMap<>();
params.put("organizationId", "ORGANIZATION_ID");
params.put("projectId", "PROJECT_ID");
// Add a vector column and configure an embedding provider
AlterTableAddColumns alterOperation = new AlterTableAddColumns()
// This column will store vector embeddings.
// The OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new TableColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("openai")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
.parameters(params)
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME");
table.alter(alterOperation);
}
}
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
- ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.
- PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Add a vector column and configure an embedding provider
AlterTableAddColumns alterOperation = new AlterTableAddColumns()
// This column will store vector embeddings.
// The Upstage integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new TableColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("upstageAI")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME");
table.alter(alterOperation);
}
}
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Add a vector column and configure an embedding provider
AlterTableAddColumns alterOperation = new AlterTableAddColumns()
// This column will store vector embeddings.
// The Voyage AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
.addColumnVector(
"VECTOR_COLUMN_NAME",
new TableColumnDefinitionVector()
.dimension(MODEL_DIMENSIONS)
.metric(SimilarityMetric.SIMILARITY_METRIC)
.service(
new VectorServiceOptions()
.provider("voyageAI")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
)
// If you want to store the original text
// in addition to the generated embeddings
// you must create a separate column.
.addColumnText("TEXT_COLUMN_NAME");
table.alter(alterOperation);
}
}
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"add": {
"columns": {
# This column will store vector embeddings.
# The Azure OpenAI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "azureOpenAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": {
"resourceName": "RESOURCE_NAME",
"deploymentId": "DEPLOYMENT_ID"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
}
}
}
}
}'
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- KEYSPACE_NAME: The name of the keyspace that contains your table.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. A sketch of this header-based approach follows this list.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
- RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.
- DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
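The following command is a minimal sketch of the header-based alternative described above. The find command and its sort on the vectorize-enabled column are only an illustration of a request that triggers embedding generation, and AZURE_OPENAI_API_KEY is a hypothetical placeholder for your raw key value.
# Minimal sketch: header-based vectorize authentication.
# The x-embedding-api-key header carries the raw provider API key, and it must
# accompany every command that uses vectorize, including writes and vector search.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "x-embedding-api-key: AZURE_OPENAI_API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "sort": {
      "VECTOR_COLUMN_NAME": "Text to vectorize for this search"
    },
    "options": {
      "limit": 5
    }
  }
}'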
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"add": {
"columns": {
# This column will store vector embeddings.
# The Hugging Face Dedicated integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "huggingfaceDedicated",
"modelName": "endpoint-defined-model",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": {
"endpointName": "ENDPOINT_NAME",
"regionName": "REGION_NAME",
"cloudName": "CLOUD_NAME"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
}
}
}
}
}'
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- KEYSPACE_NAME: The name of the keyspace that contains your table.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model. For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container. You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
- ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.
- REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.
- CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"add": {
"columns": {
# This column will store vector embeddings.
# The Hugging Face Serverless integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "huggingface",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
}
}
}
}
}'
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- KEYSPACE_NAME: The name of the keyspace that contains your table.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"add": {
"columns": {
# This column will store vector embeddings.
# The Jina AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "jinaAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
}
}
}
}
}'
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- KEYSPACE_NAME: The name of the keyspace that contains your table.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"add": {
"columns": {
# This column will store vector embeddings.
# The Mistral AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "mistral",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
}
}
}
}
}'
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- KEYSPACE_NAME: The name of the keyspace that contains your table.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
- API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.
- MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
- MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"add": {
"columns": {
# This column will store vector embeddings.
# The NVIDIA integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"service": {
"provider": "nvidia",
"modelName": "nvidia/nv-embedqa-e5-v5"
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
}
}
}
}
}'
Replace the following:
- APPLICATION_TOKEN: A secure reference to your application token.
- API_ENDPOINT: Your database’s endpoint.
- KEYSPACE_NAME: The name of the keyspace that contains your table.
- TABLE_NAME: The name for your table.
- VECTOR_COLUMN_NAME: The name for your vector column.
- TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"add": {
"columns": {
# This column will store vector embeddings.
# The OpenAI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "openai",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": {
"organizationId": "ORGANIZATION_ID",
"projectId": "PROJECT_ID"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
}
}
}
}
}'
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings. -
API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. -
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions. -
ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference. -
PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"add": {
"columns": {
# This column will store vector embeddings.
# The Upstage integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "upstageAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
}
}
}
}
}'
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings. -
API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large. -
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"add": {
"columns": {
# This column will store vector embeddings.
# The Voyage AI integration
# will automatically generate vector embeddings
# for any text inserted to this column.
"VECTOR_COLUMN_NAME": {
"type": "vector",
"dimension": MODEL_DIMENSIONS,
"service": {
"provider": "voyageAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
},
# If you want to store the original text
# in addition to the generated embeddings
# you must create a separate column.
"TEXT_COLUMN_NAME": "text"
}
}
}
}
}'
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings. -
API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication. Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2. -
MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions. If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
Drop columns from a table
Dropping columns produces tombstones. Excessive tombstones can impact query performance.
-
Python
-
TypeScript
-
Java
-
curl
The following example uses untyped documents or rows, but you can define a client-side type for your collection or table to help statically catch errors. For examples, see Typing support.
from astrapy import DataAPIClient
from astrapy.info import AlterTableDropColumns
# Get an existing table
client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")
table = database.get_table("TABLE_NAME")
# Drop columns
table.alter(
AlterTableDropColumns(
columns=["is_summer_reading", "library_branch"],
),
)
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const table = database.table("TABLE_NAME");
// Drop columns
(async function () {
await table.alter({
operation: {
drop: {
columns: ["is_summer_reading", "library_branch"],
},
},
});
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.commands.AlterTableDropColumns;
import com.datastax.astra.client.tables.definition.rows.Row;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Drop columns
AlterTableDropColumns alterOperation =
new AlterTableDropColumns("is_summer_reading", "library_branch");
table.alter(alterOperation);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"drop": {
"columns": ["is_summer_reading", "library_branch"]
}
}
}
}'
Add automatic embedding generation to existing vector columns
You can configure an embedding provider integration for an existing vector column. The integration will automatically generate vector embeddings for any data inserted into the column.
The configuration depends on the embedding provider. For the configuration and an example for each provider, see Supported embedding providers.
If your vector column already includes vector data, make sure the service options are compatible with the existing embeddings. This ensures accurate vector search results.
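For example, you can read back the table’s current definition and confirm that the vector column’s dimension matches the output dimension of the model you plan to configure. The following is a minimal Python sketch, assuming the astrapy Table.definition() method is available in your client version; it uses the same placeholders as the examples below.
from astrapy import DataAPIClient

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Inspect the current definition of the vector column.
# Its dimension must match the dimension produced by the embedding
# model that you plan to configure for automatic embedding generation.
definition = table.definition()
print(definition.columns["VECTOR_COLUMN_NAME"])
If the dimensions are not compatible, configure a model and dimension that match the column, or drop the column and add a new vector column with the dimension that you need.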
-
Python
-
TypeScript
-
Java
-
curl
The following example uses untyped documents or rows, but you can define a client-side type for your collection or table to help statically catch errors. For examples, see Typing support.
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddVectorize,
VectorServiceOptions,
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Configure an embedding provider for a column
table.alter(
AlterTableAddVectorize(
columns={
"VECTOR_COLUMN_NAME": VectorServiceOptions(
provider="azureOpenAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"resourceName": "RESOURCE_NAME",
"deploymentId": "DEPLOYMENT_ID",
},
),
},
)
)
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure. -
RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation. -
DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddVectorize,
VectorServiceOptions,
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Configure an embedding provider for a column
table.alter(
AlterTableAddVectorize(
columns={
"VECTOR_COLUMN_NAME": VectorServiceOptions(
provider="huggingfaceDedicated",
model_name="endpoint-defined-model",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"endpointName": "ENDPOINT_NAME",
"regionName": "REGION_NAME",
"cloudName": "CLOUD_NAME",
},
),
},
)
)
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model. For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container. You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration. -
ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh. -
REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2. -
CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddVectorize,
VectorServiceOptions,
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Configure an embedding provider for a column
table.alter(
AlterTableAddVectorize(
columns={
"VECTOR_COLUMN_NAME": VectorServiceOptions(
provider="huggingface",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
},
)
)
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddVectorize,
VectorServiceOptions,
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Configure an embedding provider for a column
table.alter(
AlterTableAddVectorize(
columns={
"VECTOR_COLUMN_NAME": VectorServiceOptions(
provider="jinaAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
},
)
)
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddVectorize,
VectorServiceOptions,
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Configure an embedding provider for a column
table.alter(
AlterTableAddVectorize(
columns={
"VECTOR_COLUMN_NAME": VectorServiceOptions(
provider="mistral",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
},
)
)
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddVectorize,
VectorServiceOptions,
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Configure an embedding provider for a column
table.alter(
AlterTableAddVectorize(
columns={
"VECTOR_COLUMN_NAME": VectorServiceOptions(
provider="nvidia",
model_name="nvidia/nv-embedqa-e5-v5",
),
},
)
)
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column.
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddVectorize,
VectorServiceOptions,
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Configure an embedding provider for a column
table.alter(
AlterTableAddVectorize(
columns={
"VECTOR_COLUMN_NAME": VectorServiceOptions(
provider="openai",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
parameters={
"organizationId": "ORGANIZATION_ID",
"projectId": "PROJECT_ID",
},
),
},
)
)
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. -
ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference. -
PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddVectorize,
VectorServiceOptions,
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Configure an embedding provider for a column
table.alter(
AlterTableAddVectorize(
columns={
"VECTOR_COLUMN_NAME": VectorServiceOptions(
provider="upstageAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
},
)
)
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
import os
from astrapy import DataAPIClient
from astrapy.info import (
AlterTableAddVectorize,
VectorServiceOptions,
)
# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")
# Configure an embedding provider for a column
table.alter(
AlterTableAddVectorize(
columns={
"VECTOR_COLUMN_NAME": VectorServiceOptions(
provider="voyageAI",
model_name="MODEL_NAME",
authentication={
"providerKey": "API_KEY_NAME",
},
),
},
)
)
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Configure an embedding provider for a column
(async function () {
await table.alter({
operation: {
addVectorize: {
columns: {
// This column will store vector embeddings.
// The Azure OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
provider: 'azureOpenAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
resourceName: 'RESOURCE_NAME',
deploymentId: 'DEPLOYMENT_ID',
},
},
},
},
},
});
})();
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure. -
RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation. -
DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Configure an embedding provider for a column
(async function () {
await table.alter({
operation: {
addVectorize: {
columns: {
// This column will store vector embeddings.
// The Hugging Face Dedicated integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
provider: 'huggingfaceDedicated',
modelName: 'endpoint-defined-model',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
endpointName: 'ENDPOINT_NAME',
regionName: 'REGION_NAME',
cloudName: 'CLOUD_NAME',
},
},
},
},
},
});
})();
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model. For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container. You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration. -
ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh. -
REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2. -
CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Configure an embedding provider for a column
(async function () {
await table.alter({
operation: {
addVectorize: {
columns: {
// This column will store vector embeddings.
// The Hugging Face Serverless integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
provider: 'huggingface',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
},
},
});
})();
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Configure an embedding provider for a column
(async function () {
await table.alter({
operation: {
addVectorize: {
columns: {
// This column will store vector embeddings.
// The Jina AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
provider: 'jinaAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
},
},
});
})();
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Configure an embedding provider for a column
(async function () {
await table.alter({
operation: {
addVectorize: {
columns: {
// This column will store vector embeddings.
// The Mistral AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
provider: 'mistral',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
},
},
});
})();
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Configure an embedding provider for a column
(async function () {
await table.alter({
operation: {
addVectorize: {
columns: {
// This column will store vector embeddings.
// The NVIDIA integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
provider: 'nvidia',
modelName: 'nvidia/nv-embedqa-e5-v5',
},
},
},
},
});
})();
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column.
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Configure an embedding provider for a column
(async function () {
await table.alter({
operation: {
addVectorize: {
columns: {
// This column will store vector embeddings.
// The OpenAI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
provider: 'openai',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
parameters: {
organizationId: 'ORGANIZATION_ID',
projectId: 'PROJECT_ID',
},
},
},
},
},
});
})();
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. -
ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference. -
PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Configure an embedding provider for a column
(async function () {
await table.alter({
operation: {
addVectorize: {
columns: {
// This column will store vector embeddings.
// The Upstage integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
provider: 'upstageAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
},
},
});
})();
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");
// Configure an embedding provider for a column
(async function () {
await table.alter({
operation: {
addVectorize: {
columns: {
// This column will store vector embeddings.
// The Voyage AI integration
// will automatically generate vector embeddings
// for any text inserted to this column.
VECTOR_COLUMN_NAME: {
provider: 'voyageAI',
modelName: 'MODEL_NAME',
authentication: {
providerKey: 'API_KEY_NAME',
},
},
},
},
},
});
})();
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Define parameters for the embedding provider
Map<String, Object > params = new HashMap<>();
params.put("resourceName", "RESOURCE_NAME");
params.put("deploymentId", "DEPLOYMENT_ID");
// Configure an embedding provider for a column
AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
.columns(
Map.of(
"VECTOR_COLUMN_NAME",
new VectorServiceOptions()
.provider("azureOpenAI")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
.parameters(params)
)
);
table.alter(alterOperation);
}
}
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure. -
RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation. -
DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Define parameters for the embedding provider
Map<String, Object > params = new HashMap<>();
params.put("endpointName", "ENDPOINT_NAME");
params.put("regionName", "REGION_NAME");
params.put("cloudName", "CLOUD_NAME");
// Configure an embedding provider for a column
AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
.columns(
Map.of(
"VECTOR_COLUMN_NAME",
new VectorServiceOptions()
.provider("huggingfaceDedicated")
.modelName("endpoint-defined-model")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
.parameters(params)
)
);
table.alter(alterOperation);
}
}
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model. For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container. You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration. -
ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh. -
REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2. -
CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Configure an embedding provider for a column
AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
.columns(
Map.of(
"VECTOR_COLUMN_NAME",
new VectorServiceOptions()
.provider("huggingface")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
);
table.alter(alterOperation);
}
}
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Configure an embedding provider for a column
AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
.columns(
Map.of(
"VECTOR_COLUMN_NAME",
new VectorServiceOptions()
.provider("jinaAI")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
);
table.alter(alterOperation);
}
}
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Configure an embedding provider for a column
AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
.columns(
Map.of(
"VECTOR_COLUMN_NAME",
new VectorServiceOptions()
.provider("mistral")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
);
table.alter(alterOperation);
}
}
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Configure an embedding provider for a column
AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
.columns(
Map.of(
"VECTOR_COLUMN_NAME",
new VectorServiceOptions()
.provider("nvidia")
.modelName("nvidia/nv-embedqa-e5-v5")
)
);
table.alter(alterOperation);
}
}
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column.
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Define parameters for the embedding provider
Map<String, Object > params = new HashMap<>();
params.put("organizationId", "ORGANIZATION_ID");
params.put("projectId", "PROJECT_ID");
// Configure an embedding provider for a column
AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
.columns(
Map.of(
"VECTOR_COLUMN_NAME",
new VectorServiceOptions()
.provider("openai")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
.parameters(params)
)
);
table.alter(alterOperation);
}
}
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. -
ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference. -
PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Configure an embedding provider for a column
AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
.columns(
Map.of(
"VECTOR_COLUMN_NAME",
new VectorServiceOptions()
.provider("upstageAI")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
);
table.alter(alterOperation);
}
}
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;
import java.util.HashMap;
import java.util.Map;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Configure an embedding provider for a column
AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
.columns(
Map.of(
"VECTOR_COLUMN_NAME",
new VectorServiceOptions()
.provider("voyageAI")
.modelName("MODEL_NAME")
.authentication(Map.of("providerKey", "API_KEY_NAME"))
)
);
table.alter(alterOperation);
}
}
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
Azure OpenAI
For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"addVectorize": {
"columns": {
"VECTOR_COLUMN_NAME": {
"provider": "azureOpenAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": {
"resourceName": "RESOURCE_NAME",
"deploymentId": "DEPLOYMENT_ID"
}
}
}
}
}
}
}'
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure. -
RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation. -
DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
Hugging Face - Dedicated
For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"addVectorize": {
"columns": {
"VECTOR_COLUMN_NAME": {
"provider": "huggingfaceDedicated",
"modelName": "endpoint-defined-model",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": {
"endpointName": "ENDPOINT_NAME",
"regionName": "REGION_NAME",
"cloudName": "CLOUD_NAME"
}
}
}
}
}
}
}'
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model. For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container. You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration. -
ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh. -
REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2. -
CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
Hugging Face - Serverless
For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"addVectorize": {
"columns": {
"VECTOR_COLUMN_NAME": {
"provider": "huggingface",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
}
}
}
}
}'
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.
Jina AI
For more detailed instructions, see Integrate Jina AI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"addVectorize": {
"columns": {
"VECTOR_COLUMN_NAME": {
"provider": "jinaAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
}
}
}
}
}'
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.
Mistral AI
For more detailed instructions, see Integrate Mistral AI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"addVectorize": {
"columns": {
"VECTOR_COLUMN_NAME": {
"provider": "mistral",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
}
}
}
}
}'
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.
NVIDIA
For more detailed instructions, see Integrate NVIDIA as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"addVectorize": {
"columns": {
"VECTOR_COLUMN_NAME": {
"provider": "nvidia",
"modelName": "nvidia/nv-embedqa-e5-v5"
}
}
}
}
}
}'
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column.
OpenAI
For more detailed instructions, see Integrate OpenAI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"addVectorize": {
"columns": {
"VECTOR_COLUMN_NAME": {
"provider": "openai",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
},
"parameters": {
"organizationId": "ORGANIZATION_ID",
"projectId": "PROJECT_ID"
}
}
}
}
}
}
}'
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002. -
ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference. -
PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
Upstage
For more detailed instructions, see Integrate Upstage as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"addVectorize": {
"columns": {
"VECTOR_COLUMN_NAME": {
"provider": "upstageAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
}
}
}
}
}'
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.
Voyage AI
For more detailed instructions, see Integrate Voyage AI as an embedding provider.
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"addVectorize": {
"columns": {
"VECTOR_COLUMN_NAME": {
"provider": "voyageAI",
"modelName": "MODEL_NAME",
"authentication": {
"providerKey": "API_KEY_NAME"
}
}
}
}
}
}
}'
Replace the following:
-
APPLICATION_TOKEN: A secure reference to your application token. -
API_ENDPOINT: Your database’s endpoint. -
TABLE_NAME: The name for your table. -
VECTOR_COLUMN_NAME: The name for your vector column. -
MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.
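After the integration is added, you can write plain text to the vector column and sort vector searches by a text query; Astra generates the embeddings for you. The following is a minimal, illustrative Python sketch. It reuses the placeholders from the examples above and assumes a text primary key column named title, which is not part of the original examples.
from astrapy import DataAPIClient

# Get the table that now has a vectorize integration
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Insert plain text into the vector column.
# The configured embedding provider generates the embedding automatically.
table.insert_one(
    {
        "title": "Example row",  # assumed primary key column
        "VECTOR_COLUMN_NAME": "Text to embed and store",
    }
)

# Sort a vector search by a text query.
# The same integration generates the query embedding.
cursor = table.find(
    {},
    sort={"VECTOR_COLUMN_NAME": "Text to search for"},
    limit=5,
)
for row in cursor:
    print(row)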
Remove automatic embedding generation from vector columns
You can remove automatic embedding generation for one or more vector columns. Removing a vectorize integration from a column does not remove the vector embeddings stored in the column.
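After you remove the integration, you can no longer write plain text to the column and have it embedded for you. Instead, you must compute embeddings yourself and pass an explicit vector when you write rows or sort a vector search on that column. The following is a minimal, illustrative Python sketch, assuming a vector column named plot_synopsis, as in the examples below, and a toy three-dimensional query vector; use your model’s actual dimension.
from astrapy import DataAPIClient

# Get the table whose vectorize integration was removed
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Without the integration, vector searches on the column require an
# explicit, precomputed query vector.
cursor = table.find(
    {},
    sort={"plot_synopsis": [0.12, -0.45, 0.88]},
    limit=5,
)
for row in cursor:
    print(row)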
-
Python
-
TypeScript
-
Java
-
curl
The following example uses untyped documents or rows, but you can define a client-side type for your collection or table to help statically catch errors. For examples, see Typing support.
from astrapy import DataAPIClient
from astrapy.info import AlterTableDropVectorize
# Get an existing table
client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")
table = database.get_table("TABLE_NAME")
# Remove automatic embedding generation
table.alter(
AlterTableDropVectorize(
columns=["plot_synopsis"],
),
)
import { DataAPIClient } from "@datastax/astra-db-ts";
// Get an existing table
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
token: "APPLICATION_TOKEN",
});
const table = database.table("TABLE_NAME");
// Remove automatic embedding generation
(async function () {
await table.alter({
operation: {
dropVectorize: {
columns: ["plot_synopsis"],
},
},
});
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.commands.AlterTableDropVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
public class Example {
public static void main(String[] args) {
// Get an existing table
Table<Row> table =
new DataAPIClient("APPLICATION_TOKEN")
.getDatabase("API_ENDPOINT")
.getTable("TABLE_NAME");
// Remove automatic embedding generation
AlterTableDropVectorize alterOperation = new AlterTableDropVectorize("plot_synopsis");
table.alter(alterOperation);
}
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"alterTable": {
"operation": {
"dropVectorize": {
"columns": ["plot_synopsis"]
}
}
}
}'
Client reference
-
Python
-
TypeScript
-
Java
-
curl
For more information, see the client reference.
For more information, see the client reference.
For more information, see the client reference.
Client reference documentation is not applicable for HTTP.