Alter a table

Alters a table by doing one of the following:

  • Adding one or more columns to a table

  • Dropping one or more columns from a table

  • Adding automatic embedding generation for one or more vector columns

  • Removing automatic embedding generation for one or more vector columns

You cannot change a column’s type. Instead, you must drop the column and add a new column.

Similarly, you cannot rename a table. Instead, you must drop and recreate the table.

After you add a column, you should index the column if you want to filter or sort on the column. For more information, see Create an index and Create a vector index.
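
For example, the following minimal Python (astrapy) sketch effectively changes a column's type by dropping the column and re-adding it with a new type. The rating column is a placeholder, and the sketch assumes the AlterTableAddColumns and AlterTableDropColumns helpers from astrapy.info; any data stored in the dropped column is lost.

from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    AlterTableDropColumns,
    ColumnType,
    TableScalarColumnTypeDescriptor,
)

# Get an existing table
client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")
table = database.get_table("TABLE_NAME")

# Drop the column that has the unwanted type.
# Any data stored in the column is lost.
table.alter(AlterTableDropColumns(columns=["rating"]))

# Re-add the column with the new type.
table.alter(
    AlterTableAddColumns(
        columns={
            "rating": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.INT,
            ),
        },
    ),
)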

Ready to write code? See the examples for this method to get started. If you are new to the Data API, check out the quickstart.

Result

  • Python

  • TypeScript

  • Java

  • curl

Adds or drops columns, or adds or removes a vectorize integration for vector columns. Removing a vectorize integration for a column does not remove the vector embeddings stored in the column.

Returns a Table instance that represents the table after the modification.

Although the Table instance that you used to perform the alteration will still work, it will not reflect the updated typing if you added or dropped columns. To reflect the new typing, use the row_type parameter.
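
For example, the following is a minimal sketch that uses a TypedDict as the row type. The NewBookRow class and the venue column are illustrative placeholders, not part of an existing schema.

from typing import TypedDict

from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    ColumnType,
    TableScalarColumnTypeDescriptor,
)

class NewBookRow(TypedDict, total=False):
    title: str
    venue: str

# Get an existing table
client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")
table = database.get_table("TABLE_NAME")

# The returned Table instance is typed against NewBookRow.
new_table = table.alter(
    AlterTableAddColumns(
        columns={
            "venue": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.TEXT,
            ),
        },
    ),
    row_type=NewBookRow,
)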

Adds or drops columns, or adds or removes a vectorize integration for vector columns. Removing a vectorize integration for a column does not remove the vector embeddings stored in the column.

Returns a promise that resolves to a Table instance that represents the table after the modification.

Although the Table instance that you used to perform the alteration will still work, it will not reflect the updated typing if you added or dropped columns. To reflect the new typing, provide the new type of the table to the alter method. For example:

const newTable = await table.alter<NewSchema>({
  operation: {
    add: {
      columns: {
        venue: 'text',
      },
    },
  },
});

Adds or drops columns, or adds or removes a vectorize integration for vector columns. Removing a vectorize integration for a column does not remove the vector embeddings stored in the column.

Returns a Table<T> instance that represents the table after the schema change.

Although the Table instance that you used to perform the alteration will still work, it will not reflect the updated typing if you added or dropped columns. To reflect the new typing, use the rowClass parameter.

Adds or drops columns, or adds or removes a vectorize integration for vector columns. Removing a vectorize integration for a column does not remove the vector embeddings stored in the column.

If the command succeeds, the response indicates success.

Example response:

{
  "status": {
    "ok": 1
  }
}

Parameters

  • Python

  • TypeScript

  • Java

  • curl

Use the alter method, which belongs to the astrapy.Table class.

Method signature
alter(
  operation: AlterTableOperation | dict[str, Any],
  *,
  row_type: type[Any] = DefaultRowType,
  table_admin_timeout_ms: int,
  request_timeout_ms: int,
  timeout_ms: int,
) -> Table[NEW_ROW]
Name Type Summary

operation

AlterTableOperation | dict[str, Any]

The alter operation to perform.

Can be one of the following: AlterTableAddColumns, AlterTableDropColumns, AlterTableAddVectorize, or AlterTableDropVectorize.

row_type

type

Optional. A formal specifier for the type checker. If provided, row_type must match the type hint specified in the assignment. For more information, see Typing support.

table_admin_timeout_ms

int

Optional. A timeout, in milliseconds, for the underlying HTTP request. If not provided, the Database setting is used. This parameter is aliased as request_timeout_ms and timeout_ms for convenience.
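
For example, the following is a minimal sketch that overrides the timeout for a single alter call. The column name and timeout value are placeholders.

from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    ColumnType,
    TableScalarColumnTypeDescriptor,
)

# Get an existing table
client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")
table = database.get_table("TABLE_NAME")

# Allow up to 30 seconds for this schema change.
table.alter(
    AlterTableAddColumns(
        columns={
            "library_branch": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.TEXT,
            ),
        },
    ),
    table_admin_timeout_ms=30000,
)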

Use the alter method, which belongs to the Table class.

Method signature
async alter(
  options: AlterTableOptions<Schema>
): Promise<Table<Schema, PKey>>
Name Type Summary

operation

AlterTableOperations<Schema>

The alter operation to perform.

Can be one of the following: add, drop, addVectorize, or dropVectorize.

timeout

number | TimeoutDescriptor

The timeout(s) to apply to HTTP request(s) originating from this method.

Use the alter method, which belongs to the com.datastax.astra.client.tables.Table class.

Method signature
Table<T> alter(AlterTableOperation operation)
Table<T> alter(
  AlterTableOperation operation,
  AlterTableOptions options
)
<R> Table<R> alter(
  AlterTableOperation operation,
  AlterTableOptions options,
  Class<R> clazz
)
Name Type Summary

operation

AlterTableOperation

The alter operation to perform.

Can be one of the following: AlterTableAddColumns, AlterTableDropColumns, AlterTableAddVectorize, or AlterTableDropVectorize.

options

AlterTableOptions

Optional. The options for this operation, including the timeout.

rowClass

Class<?>

Optional. A specification of the class of the table’s row object.

Default: Row, which is close to a Map object

Use the alterTable command.

Command signature
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "alterTable": {
    "operation": {
      "add": {
        "columns": {
          "NEW_COLUMN_NAME": "DATA_TYPE",
          "NEW_COLUMN_NAME": "DATA_TYPE"
        }
      }
    }
  }
}'
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "alterTable": {
    "operation": {
      "drop": {
        "columns": [ "COLUMN_NAME", "COLUMN_NAME" ]
      }
    }
  }
}'
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "alterTable": {
    "operation": {
      "addVectorize": {
        "columns": {
          "VECTOR_COLUMN_NAME": {
            "provider": "PROVIDER",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            },
            "parameters": PARAMETERS
          }
        }
      }
    }
  }
}'
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
--header "Token: APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "alterTable": {
    "operation": {
      "dropVectorize": {
        "columns": [ "VECTOR_COLUMN_NAME", "VECTOR_COLUMN_NAME" ]
      }
    }
  }
}'
Name Type Summary

operation

object

The alter operation to perform.

Can be one of the following: add, drop, addVectorize, or dropVectorize.

Examples

The following examples demonstrate how to alter a table.

Add columns to a table

When you add columns, the columns are defined in the same way as they are when you create a table.

After you add a column, you should index the column if you want to filter or sort on the column. For more information, see Create an index and Create a vector index.

  • Python

  • TypeScript

  • Java

  • curl

The following example uses untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.

from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    ColumnType,
    TableScalarColumnTypeDescriptor,
)

# Get an existing table
client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")
table = database.get_table("TABLE_NAME")

# Add columns
table.alter(
    AlterTableAddColumns(
        columns={
            "is_summer_reading": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.BOOLEAN,
            ),
            "library_branch": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.TEXT,
            ),
        },
    ),
)
import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
  token: "APPLICATION_TOKEN",
});
const table = database.table("TABLE_NAME");

// Add columns
(async function () {
  await table.alter({
    operation: {
      add: {
        columns: {
          is_summer_reading: "boolean",
          library_branch: "text",
        },
      },
    },
  });
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import com.datastax.astra.client.tables.definition.rows.Row;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Add columns
    AlterTableAddColumns alterOperation =
        new AlterTableAddColumns()
            .addColumnBoolean("is_summer_reading")
            .addColumnText("library_branch");
    table.alter(alterOperation);
  }
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "add": {
        "columns": {
          "is_summer_reading": "boolean",
          "library_branch": "text"
        }
      }
    }
  }
}'
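
If you plan to filter or sort on a newly added column, create an index on it. The following is a minimal astrapy sketch that continues the Python example above and indexes the new library_branch column; the index name is a placeholder. See Create an index for all options.

# Index the newly added column so it can be used in filters and sorts
table.create_index(
    "library_branch_idx",
    column="library_branch",
)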

Add vector columns to a table

After you add a vector column, you should index the column if you want to run vector searches on the column. For more information, see Create a vector index.

  • Python

  • TypeScript

  • Java

  • curl

The following example uses untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.

from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    TableVectorColumnTypeDescriptor,
)

# Get an existing table
client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")
table = database.get_table("TABLE_NAME")

# Add a vector column
table.alter(
    AlterTableAddColumns(
        columns={
            "example_vector": TableVectorColumnTypeDescriptor(
                dimension=1024,
            ),
        },
    )
)
import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
  token: "APPLICATION_TOKEN",
});
const table = database.table("TABLE_NAME");

// Add a vector column
(async function () {
  await table.alter({
    operation: {
      add: {
        columns: {
          example_vector: { type: "vector", dimension: 1024 },
        },
      },
    },
  });
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Add a vector column
    AlterTableAddColumns alterOperation =
        new AlterTableAddColumns()
            .addColumnVector(
                "example_vector",
                new TableColumnDefinitionVector().dimension(1024).metric(SimilarityMetric.COSINE));
    table.alter(alterOperation);
  }
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "add": {
        "columns": {
          "example_vector": {
            "type": "vector",
            "dimension": 1024
          }
        }
      }
    }
  }
}'
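
To run vector searches on the new column, create a vector index on it. The following is a minimal astrapy sketch that continues the Python example above; the index name is a placeholder, and the cosine metric is assumed. See Create a vector index for all options.

from astrapy.constants import VectorMetric
from astrapy.info import TableVectorIndexOptions

# Index the new vector column so it can be used in vector searches
table.create_vector_index(
    "example_vector_idx",
    column="example_vector",
    options=TableVectorIndexOptions(metric=VectorMetric.COSINE),
)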

Add a vector column and configure an embedding provider integration

When you add a vector column to a table, you can configure an embedding provider integration for the column. The integration will automatically generate vector embeddings for any data inserted into the column.

The configuration depends on the embedding provider. For the configuration and an example for each provider, see Supported embedding providers.

The original data isn’t stored automatically. If you want to store the original data in addition to the vector embeddings that were generated from the data, then you need to create a separate column and manually store the original data in that column.
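
For example, in the following minimal astrapy sketch, the same text is written to both columns of a hypothetical table: the vectorize-enabled vector column receives the text and stores only the generated embedding, while the companion text column stores the original string. All column names are placeholders.

from astrapy import DataAPIClient

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Hypothetical columns: "summary_vector" has a vectorize integration,
# "summary" is a plain text column, and "title" is the primary key.
table.insert_one(
    {
        "title": "Example row",
        "summary": "A short synopsis.",         # original text, stored as-is
        "summary_vector": "A short synopsis.",  # embedded automatically
    }
)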

After you add a vector column, you should index the column if you want to run vector searches on the column. For more information, see Create a vector index.

  • Python

  • TypeScript

  • Java

  • curl

The following example uses untyped rows, but you can define a client-side type for your table to help statically catch errors. For examples, see Typing support.

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions,
    TableScalarColumnTypeDescriptor,
    ColumnType
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Add a vector column and configure an embedding provider
table.alter(
    AlterTableAddColumns(
        columns={
            # This column will store vector embeddings.
            # The Azure OpenAI integration
            # will automatically generate vector embeddings
            # for any text inserted to this column.
            "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
                dimension=MODEL_DIMENSIONS,
                service=VectorServiceOptions(
                    provider="azureOpenAI",
                    model_name="MODEL_NAME",
                    authentication={
                        "providerKey": "API_KEY_NAME",
                    },
                    parameters={
                        "resourceName": "RESOURCE_NAME",
                        "deploymentId": "DEPLOYMENT_ID",
                    },
                ),
            ),
            # If you want to store the original text
            # in addition to the generated embeddings
            # you must create a separate column.
            "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.TEXT
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider. For a minimal code sketch of this approach, see the example after this list.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

    For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

  • RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.

  • DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.
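
The following is a minimal astrapy sketch of the header-based authentication described in the API_KEY_NAME item above. It continues from the example above, and the key value is a placeholder for your Azure OpenAI API key.

# Provide the embedding provider API key directly;
# the client sends it in the x-embedding-api-key header.
table = database.get_table(
    "TABLE_NAME",
    embedding_api_key="AZURE_OPENAI_API_KEY",
)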

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions,
    TableScalarColumnTypeDescriptor,
    ColumnType
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Add a vector column and configure an embedding provider
table.alter(
    AlterTableAddColumns(
        columns={
            # This column will store vector embeddings.
            # The Hugging Face Dedicated integration
            # will automatically generate vector embeddings
            # for any text inserted to this column.
            "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
                dimension=MODEL_DIMENSIONS,
                service=VectorServiceOptions(
                    provider="huggingfaceDedicated",
                    model_name="endpoint-defined-model",
                    authentication={
                        "providerKey": "API_KEY_NAME",
                    },
                    parameters={
                        "endpointName": "ENDPOINT_NAME",
                        "regionName": "REGION_NAME",
                        "cloudName": "CLOUD_NAME",
                    },
                ),
            ),
            # If you want to store the original text
            # in addition to the generated embeddings
            # you must create a separate column.
            "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.TEXT
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.

    For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

    You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

  • ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.

  • REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.

  • CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions,
    TableScalarColumnTypeDescriptor,
    ColumnType
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Add a vector column and configure an embedding provider
table.alter(
    AlterTableAddColumns(
        columns={
            # This column will store vector embeddings.
            # The Hugging Face Serverless integration
            # will automatically generate vector embeddings
            # for any text inserted to this column.
            "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
                dimension=MODEL_DIMENSIONS,
                service=VectorServiceOptions(
                    provider="huggingface",
                    model_name="MODEL_NAME",
                    authentication={
                        "providerKey": "API_KEY_NAME",
                    },
                ),
            ),
            # If you want to store the original text
            # in addition to the generated embeddings
            # you must create a separate column.
            "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.TEXT
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions,
    TableScalarColumnTypeDescriptor,
    ColumnType
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Add a vector column and configure an embedding provider
table.alter(
    AlterTableAddColumns(
        columns={
            # This column will store vector embeddings.
            # The Jina AI integration
            # will automatically generate vector embeddings
            # for any text inserted to this column.
            "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
                dimension=MODEL_DIMENSIONS,
                service=VectorServiceOptions(
                    provider="jinaAI",
                    model_name="MODEL_NAME",
                    authentication={
                        "providerKey": "API_KEY_NAME",
                    },
                ),
            ),
            # If you want to store the original text
            # in addition to the generated embeddings
            # you must create a separate column.
            "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.TEXT
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions,
    TableScalarColumnTypeDescriptor,
    ColumnType
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Add a vector column and configure an embedding provider
table.alter(
    AlterTableAddColumns(
        columns={
            # This column will store vector embeddings.
            # The Mistral AI integration
            # will automatically generate vector embeddings
            # for any text inserted to this column.
            "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
                dimension=MODEL_DIMENSIONS,
                service=VectorServiceOptions(
                    provider="mistral",
                    model_name="MODEL_NAME",
                    authentication={
                        "providerKey": "API_KEY_NAME",
                    },
                ),
            ),
            # If you want to store the original text
            # in addition to the generated embeddings
            # you must create a separate column.
            "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.TEXT
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions,
    TableScalarColumnTypeDescriptor,
    ColumnType
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Add a vector column and configure an embedding provider
table.alter(
    AlterTableAddColumns(
        columns={
            # This column will store vector embeddings.
            # The NVIDIA integration
            # will automatically generate vector embeddings
            # for any text inserted to this column.
            "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
                service=VectorServiceOptions(
                    provider="nvidia",
                    model_name="nvidia/nv-embedqa-e5-v5",
                ),
            ),
            # If you want to store the original text
            # in addition to the generated embeddings
            # you must create a separate column.
            "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.TEXT
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions,
    TableScalarColumnTypeDescriptor,
    ColumnType
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Add a vector column and configure an embedding provider
table.alter(
    AlterTableAddColumns(
        columns={
            # This column will store vector embeddings.
            # The OpenAI integration
            # will automatically generate vector embeddings
            # for any text inserted to this column.
            "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
                dimension=MODEL_DIMENSIONS,
                service=VectorServiceOptions(
                    provider="openai",
                    model_name="MODEL_NAME",
                    authentication={
                        "providerKey": "API_KEY_NAME",
                    },
                    parameters={
                        "organizationId": "ORGANIZATION_ID",
                        "projectId": "PROJECT_ID",
                    },
                ),
            ),
            # If you want to store the original text
            # in addition to the generated embeddings
            # you must create a separate column.
            "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.TEXT
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

  • ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.

  • PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions,
    TableScalarColumnTypeDescriptor,
    ColumnType
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Add a vector column and configure an embedding provider
table.alter(
    AlterTableAddColumns(
        columns={
            # This column will store vector embeddings.
            # The Upstage integration
            # will automatically generate vector embeddings
            # for any text inserted to this column.
            "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
                dimension=MODEL_DIMENSIONS,
                service=VectorServiceOptions(
                    provider="upstageAI",
                    model_name="MODEL_NAME",
                    authentication={
                        "providerKey": "API_KEY_NAME",
                    },
                ),
            ),
            # If you want to store the original text
            # in addition to the generated embeddings
            # you must create a separate column.
            "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.TEXT
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddColumns,
    TableVectorColumnTypeDescriptor,
    VectorServiceOptions,
    TableScalarColumnTypeDescriptor,
    ColumnType
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Add a vector column and configure an embedding provider
table.alter(
    AlterTableAddColumns(
        columns={
            # This column will store vector embeddings.
            # The Voyage AI integration
            # will automatically generate vector embeddings
            # for any text inserted to this column.
            "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
                dimension=MODEL_DIMENSIONS,
                service=VectorServiceOptions(
                    provider="voyageAI",
                    model_name="MODEL_NAME",
                    authentication={
                        "providerKey": "API_KEY_NAME",
                    },
                ),
            ),
            # If you want to store the original text
            # in addition to the generated embeddings
            # you must create a separate column.
            "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
                column_type=ColumnType.TEXT
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embedding_api_key parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Add a vector column and configure an embedding provider
(async function () {
  await table.alter({
    operation: {
      add: {
        columns: {
          // This column will store vector embeddings.
          // The Azure OpenAI integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            type: "vector",
            dimension: MODEL_DIMENSIONS,
            service: {
              provider: 'azureOpenAI',
              modelName: 'MODEL_NAME',
              authentication: {
                providerKey: 'API_KEY_NAME',
              },
              parameters: {
                resourceName: 'RESOURCE_NAME',
                deploymentId: 'DEPLOYMENT_ID',
              },
            },
          },
          // If you want to store the original text
          // in addition to the generated embeddings
          // you must create a separate column.
          TEXT_COLUMN_NAME: "text",
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

    For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

  • RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.

  • DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Add a vector column and configure an embedding provider
(async function () {
  await table.alter({
    operation: {
      add: {
        columns: {
          // This column will store vector embeddings.
          // The Hugging Face Dedicated integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            type: "vector",
            dimension: MODEL_DIMENSIONS,
            service: {
              provider: 'huggingfaceDedicated',
              modelName: 'endpoint-defined-model',
              authentication: {
                providerKey: 'API_KEY_NAME',
              },
              parameters: {
                endpointName: 'ENDPOINT_NAME',
                regionName: 'REGION_NAME',
                cloudName: 'CLOUD_NAME',
              },
            },
          },
          // If you want to store the original text
          // in addition to the generated embeddings
          // you must create a separate column.
          TEXT_COLUMN_NAME: "text",
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.

    For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

    You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

  • ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.

  • REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.

  • CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Add a vector column and configure an embedding provider
(async function () {
  await table.alter({
    operation: {
      add: {
        columns: {
          // This column will store vector embeddings.
          // The Hugging Face Serverless integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            type: "vector",
            dimension: MODEL_DIMENSIONS,
            service: {
              provider: 'huggingface',
              modelName: 'MODEL_NAME',
              authentication: {
                providerKey: 'API_KEY_NAME',
              },
            },
          },
          // If you want to store the original text
          // in addition to the generated embeddings
          // you must create a separate column.
          TEXT_COLUMN_NAME: "text",
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Add a vector column and configure an embedding provider
(async function () {
  await table.alter({
    operation: {
      add: {
        columns: {
          // This column will store vector embeddings.
          // The Jina AI integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            type: "vector",
            dimension: MODEL_DIMENSIONS,
            service: {
              provider: 'jinaAI',
              modelName: 'MODEL_NAME',
              authentication: {
                providerKey: 'API_KEY_NAME',
              },
            },
          },
          // If you want to store the original text
          // in addition to the generated embeddings
          // you must create a separate column.
          TEXT_COLUMN_NAME: "text",
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
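
As noted above, you can query the Data API for the embedding providers available to your database, including each model's parameters and dimension ranges. A minimal sketch with the TypeScript client, assuming your application token can read database metadata:

import { DataAPIClient } from "@datastax/astra-db-ts";

const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");

(async function () {
  // List the embedding providers enabled for this database, including
  // each provider's models, parameters, and supported dimensions.
  const result = await database.admin().findEmbeddingProviders();
  console.log(result.embeddingProviders);
})();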

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Add the new columns to the table
(async function () {
  await table.alter({
    operation: {
      add: {
        columns: {
          // This column will store vector embeddings.
          // The Mistral AI integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            type: "vector",
            dimension: MODEL_DIMENSIONS,
            service: {
              provider: 'mistral',
              modelName: 'MODEL_NAME',
              authentication: {
                providerKey: 'API_KEY_NAME',
              },
            },
          },
          // If you want to store the original text
          // in addition to the generated embeddings
          // you must create a separate column.
          TEXT_COLUMN_NAME: "text",
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
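
Once an alteration like the Mistral AI example above succeeds, writing a string to the vectorize-enabled vector column triggers embedding generation on the server. The following is a minimal sketch, assuming the existing table has a text primary key column named id (hypothetical) and uses the placeholder column names from the example:

import { DataAPIClient } from "@datastax/astra-db-ts";

const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

(async function () {
  await table.insertOne({
    // Hypothetical primary key column; use your table's actual key columns.
    id: "review-001",
    // The integration embeds this text and stores the resulting vector.
    VECTOR_COLUMN_NAME: "A sentence to embed with the configured model.",
    // The separate text column keeps the original text, if you created it.
    TEXT_COLUMN_NAME: "A sentence to embed with the configured model.",
  });
})();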

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Add the new columns to the table
(async function () {
  await table.alter({
    operation: {
      add: {
        columns: {
          // This column will store vector embeddings.
          // The NVIDIA integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            type: "vector",
            service: {
              provider: 'nvidia',
              modelName: 'nvidia/nv-embedqa-e5-v5',
            },
          },
          // If you want to store the original text
          // in addition to the generated embeddings
          // you must create a separate column.
          TEXT_COLUMN_NAME: "text",
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Add the new columns to the table
(async function () {
  await table.alter({
    operation: {
      add: {
        columns: {
          // This column will store vector embeddings.
          // The OpenAI integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            type: "vector",
            dimension: MODEL_DIMENSIONS,
            service: {
              provider: 'openai',
              modelName: 'MODEL_NAME',
              authentication: {
                providerKey: 'API_KEY_NAME',
              },
              parameters: {
                organizationId: 'ORGANIZATION_ID',
                projectId: 'PROJECT_ID',
              },
            },
          },
          // If you want to store the original text
          // in addition to the generated embeddings
          // you must create a separate column.
          TEXT_COLUMN_NAME: "text",
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

  • ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.

  • PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot be the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
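
After embeddings are generated, you can run a vector search against the same column by sorting on it with a query string; the integration embeds the query before the similarity search runs. A minimal sketch, assuming the placeholder column name from the OpenAI example above:

import { DataAPIClient } from "@datastax/astra-db-ts";

const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

(async function () {
  // Sort by similarity to the embedded query text and return the top rows.
  const cursor = table.find(
    {},
    {
      sort: { VECTOR_COLUMN_NAME: "natural-language query text" },
      limit: 5,
    },
  );
  for await (const row of cursor) {
    console.log(row);
  }
})();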

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Add the new columns to the table
(async function () {
  await table.alter({
    operation: {
      add: {
        columns: {
          // This column will store vector embeddings.
          // The Upstage integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            type: "vector",
            dimension: MODEL_DIMENSIONS,
            service: {
              provider: 'upstageAI',
              modelName: 'MODEL_NAME',
              authentication: {
                providerKey: 'API_KEY_NAME',
              },
            },
          },
          // If you want to store the original text
          // in addition to the generated embeddings
          // you must create a separate column.
          TEXT_COLUMN_NAME: "text",
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
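
If you later decide you no longer need the added columns, the same alter method can drop them. A minimal sketch, assuming the drop operation mirrors the add operation shown above and takes a list of column names:

import { DataAPIClient } from "@datastax/astra-db-ts";

const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

(async function () {
  await table.alter({
    operation: {
      drop: {
        // Column names to remove from the table.
        columns: ["VECTOR_COLUMN_NAME", "TEXT_COLUMN_NAME"],
      },
    },
  });
})();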

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Add the new columns to the table
(async function () {
  await table.alter({
    operation: {
      add: {
        columns: {
          // This column will store vector embeddings.
          // The Voyage AI integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            type: "vector",
            dimension: MODEL_DIMENSIONS,
            service: {
              provider: 'voyageAI',
              modelName: 'MODEL_NAME',
              authentication: {
                providerKey: 'API_KEY_NAME',
              },
            },
          },
          // If you want to store the original text
          // in addition to the generated embeddings
          // you must create a separate column.
          TEXT_COLUMN_NAME: "text",
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingApiKey parameter when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.
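
You can also turn off automatic embedding generation for a column without dropping the column itself. A minimal sketch, assuming the TypeScript client mirrors the Data API's dropVectorize operation and takes a list of column names:

import { DataAPIClient } from "@datastax/astra-db-ts";

const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

(async function () {
  await table.alter({
    operation: {
      dropVectorize: {
        // Stop generating embeddings for this column; the column remains.
        columns: ["VECTOR_COLUMN_NAME"],
      },
    },
  });
})();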

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Define parameters for the embedding provider
    Map<String, Object> params = new HashMap<>();
    params.put("resourceName", "RESOURCE_NAME");
    params.put("deploymentId", "DEPLOYMENT_ID");

    // Add a vector column and configure an embedding provider
    AlterTableAddColumns alterOperation = new AlterTableAddColumns()
        // This column will store vector embeddings.
        // The Azure OpenAI integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        .addColumnVector(
            "VECTOR_COLUMN_NAME",
            new TableColumnDefinitionVector()
                .dimension(MODEL_DIMENSIONS)
                .metric(SimilarityMetric.SIMILARITY_METRIC)
                .service(
                    new VectorServiceOptions()
                        .provider("azureOpenAI")
                        .modelName("MODEL_NAME")
                        .authentication(Map.of("providerKey", "API_KEY_NAME"))
                        .parameters(params)
                )
        )
        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        .addColumnText("TEXT_COLUMN_NAME");

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

    For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

  • RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.

  • DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Define parameters for the embedding provider
    Map<String, Object> params = new HashMap<>();
    params.put("endpointName", "ENDPOINT_NAME");
    params.put("regionName", "REGION_NAME");
    params.put("cloudName", "CLOUD_NAME");

    // Add a vector column and configure an embedding provider
    AlterTableAddColumns alterOperation = new AlterTableAddColumns()
        // This column will store vector embeddings.
        // The Hugging Face Dedicated integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        .addColumnVector(
            "VECTOR_COLUMN_NAME",
            new TableColumnDefinitionVector()
                .dimension(MODEL_DIMENSIONS)
                .metric(SimilarityMetric.SIMILARITY_METRIC)
                .service(
                    new VectorServiceOptions()
                        .provider("huggingfaceDedicated")
                        .modelName("endpoint-defined-model")
                        .authentication(Map.of("providerKey", "API_KEY_NAME"))
                )
        )
        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        .addColumnText("TEXT_COLUMN_NAME");

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.

    For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

    You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

  • ENDPOINT_NAME: The programmatically generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.

  • REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.

  • CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Add a vector column and configure an embedding provider
    AlterTableAddColumns alterOperation = new AlterTableAddColumns()
        // This column will store vector embeddings.
        // The Hugging Face Serverless integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        .addColumnVector(
            "VECTOR_COLUMN_NAME",
            new TableColumnDefinitionVector()
                .dimension(MODEL_DIMENSIONS)
                .metric(SimilarityMetric.SIMILARITY_METRIC)
                .service(
                    new VectorServiceOptions()
                        .provider("huggingface")
                        .modelName("MODEL_NAME")
                        .authentication(Map.of("providerKey", "API_KEY_NAME"))
                )
        )
        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        .addColumnText("TEXT_COLUMN_NAME");

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Add a vector column and configure an embedding provider
    AlterTableAddColumns alterOperation = new AlterTableAddColumns()
        // This column will store vector embeddings.
        // The Jina AI integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        .addColumnVector(
            "VECTOR_COLUMN_NAME",
            new TableColumnDefinitionVector()
                .dimension(MODEL_DIMENSIONS)
                .metric(SimilarityMetric.SIMILARITY_METRIC)
                .service(
                    new VectorServiceOptions()
                        .provider("jinaAI")
                        .modelName("MODEL_NAME")
                        .authentication(Map.of("providerKey", "API_KEY_NAME"))
                )
        )
        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        .addColumnText("TEXT_COLUMN_NAME");

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Add a vector column and configure an embedding provider
    AlterTableAddColumns alterOperation = new AlterTableAddColumns()
        // This column will store vector embeddings.
        // The Mistral AI integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        .addColumnVector(
            "VECTOR_COLUMN_NAME",
            new TableColumnDefinitionVector()
                .dimension(MODEL_DIMENSIONS)
                .metric(SimilarityMetric.SIMILARITY_METRIC)
                .service(
                    new VectorServiceOptions()
                        .provider("mistral")
                        .modelName("MODEL_NAME")
                        .authentication(Map.of("providerKey", "API_KEY_NAME"))
                )
        )
        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        .addColumnText("TEXT_COLUMN_NAME");

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Add a vector column and configure an embedding provider
    AlterTableAddColumns alterOperation = new AlterTableAddColumns()
        // This column will store vector embeddings.
        // The NVIDIA integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        .addColumnVector(
            "VECTOR_COLUMN_NAME",
            new TableColumnDefinitionVector()
                .metric(SimilarityMetric.COSINE)
                .service(
                    new VectorServiceOptions()
                        .provider("nvidia")
                        .modelName("nvidia/nv-embedqa-e5-v5")
                )
        )
        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        .addColumnText("TEXT_COLUMN_NAME");

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Define parameters for the embedding provider
    Map<String, Object> params = new HashMap<>();
    params.put("organizationId", "ORGANIZATION_ID");
    params.put("projectId", "PROJECT_ID");

    // Add a vector column and configure an embedding provider
    AlterTableAddColumns alterOperation = new AlterTableAddColumns()
        // This column will store vector embeddings.
        // The OpenAI integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        .addColumnVector(
            "VECTOR_COLUMN_NAME",
            new TableColumnDefinitionVector()
                .dimension(MODEL_DIMENSIONS)
                .metric(SimilarityMetric.SIMILARITY_METRIC)
                .service(
                    new VectorServiceOptions()
                        .provider("openai")
                        .modelName("MODEL_NAME")
                        .authentication(Map.of("providerKey", "API_KEY_NAME"))
                        .parameters(params)
                )
        )
        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        .addColumnText("TEXT_COLUMN_NAME");

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

  • ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.

  • PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot be the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Add a vector column and configure an embedding provider
    AlterTableAddColumns alterOperation = new AlterTableAddColumns()
        // This column will store vector embeddings.
        // The Upstage integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        .addColumnVector(
            "VECTOR_COLUMN_NAME",
            new TableColumnDefinitionVector()
                .dimension(MODEL_DIMENSIONS)
                .metric(SimilarityMetric.SIMILARITY_METRIC)
                .service(
                    new VectorServiceOptions()
                        .provider("upstageAI")
                        .modelName("MODEL_NAME")
                        .authentication(Map.of("providerKey", "API_KEY_NAME"))
                )
        )
        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        .addColumnText("TEXT_COLUMN_NAME");

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.columns.TableColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.commands.AlterTableAddColumns;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Add a vector column and configure an embedding provider
    AlterTableAddColumns alterOperation = new AlterTableAddColumns()
        // This column will store vector embeddings.
        // The Voyage AI integration
        // will automatically generate vector embeddings
        // for any text inserted to this column.
        .addColumnVector(
            "VECTOR_COLUMN_NAME",
            new TableColumnDefinitionVector()
                .dimension(MODEL_DIMENSIONS)
                .metric(SimilarityMetric.SIMILARITY_METRIC)
                .service(
                    new VectorServiceOptions()
                        .provider("voyageAI")
                        .modelName("MODEL_NAME")
                        .authentication(Map.of("providerKey", "API_KEY_NAME"))
                )
        )
        // If you want to store the original text
        // in addition to the generated embeddings
        // you must create a separate column.
        .addColumnText("TEXT_COLUMN_NAME");

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in the embeddingAuthProvider() method of CreateTableOptions when you instantiate a Table object with the commands to create a table or get a table. The client will send the x-embedding-api-key header with the specified key to any underlying HTTP request that requires vectorize authentication. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. You can use this authentication method only if all affected columns use the same embedding provider.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "add": {
        "columns": {
          # This column will store vector embeddings.
          # The Azure OpenAI integration
          # will automatically generate vector embeddings
          # for any text inserted to this column.
          "VECTOR_COLUMN_NAME": {
            "type": "vector",
            "dimension": MODEL_DIMENSIONS,
            "service": {
              "provider": "azureOpenAI",
              "modelName": "MODEL_NAME",
              "authentication": {
                "providerKey": "API_KEY_NAME"
              },
              "parameters": {
                "resourceName": "RESOURCE_NAME",
                "deploymentId": "DEPLOYMENT_ID"
              }
            }
          },
          # If you want to store the original text
          # in addition to the generated embeddings
          # you must create a separate column.
          "TEXT_COLUMN_NAME": "text"
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • KEYSPACE_NAME: The name of the keyspace that contains your table.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Azure OpenAI API key that you want to use. Must be the name of an existing Azure OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

    For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

  • RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.

  • DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "add": {
        "columns": {
          # This column will store vector embeddings.
          # The Hugging Face Dedicated integration
          # will automatically generate vector embeddings
          # for any text inserted to this column.
          "VECTOR_COLUMN_NAME": {
            "type": "vector",
            "dimension": MODEL_DIMENSIONS,
            "service": {
              "provider": "huggingfaceDedicated",
              "modelName": "endpoint-defined-model",
              "authentication": {
                "providerKey": "API_KEY_NAME"
              },
              "parameters": {
                "endpointName": "ENDPOINT_NAME",
                "regionName": "REGION_NAME",
                "cloudName": "CLOUD_NAME"
              }
            }
          },
          # If you want to store the original text
          # in addition to the generated embeddings
          # you must create a separate column.
          "TEXT_COLUMN_NAME": "text"
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Hugging Face Dedicated user access token that you want to use. Must be the name of an existing Hugging Face Dedicated user access token in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.

    For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

    You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

  • ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.

  • REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.

  • CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "add": {
        "columns": {
          # This column will store vector embeddings.
          # The Hugging Face Serverless integration
          # will automatically generate vector embeddings
          # for any text inserted to this column.
          "VECTOR_COLUMN_NAME": {
            "type": "vector",
            "dimension": MODEL_DIMENSIONS,
            "service": {
              "provider": "huggingface",
              "modelName": "MODEL_NAME",
              "authentication": {
                "providerKey": "API_KEY_NAME"
              }
            }
          },
          # If you want to store the original text
          # in addition to the generated embeddings
          # you must create a separate column.
          "TEXT_COLUMN_NAME": "text"
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Hugging Face Serverless user access token that you want to use. Must be the name of an existing Hugging Face Serverless user access token in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "add": {
        "columns": {
          # This column will store vector embeddings.
          # The Jina AI integration
          # will automatically generate vector embeddings
          # for any text inserted to this column.
          "VECTOR_COLUMN_NAME": {
            "type": "vector",
            "dimension": MODEL_DIMENSIONS,
            "service": {
              "provider": "jinaAI",
              "modelName": "MODEL_NAME",
              "authentication": {
                "providerKey": "API_KEY_NAME"
              }
            }
          },
          # If you want to store the original text
          # in addition to the generated embeddings
          # you must create a separate column.
          "TEXT_COLUMN_NAME": "text"
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Jina AI API key that you want to use. Must be the name of an existing Jina AI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "add": {
        "columns": {
          # This column will store vector embeddings.
          # The Mistral AI integration
          # will automatically generate vector embeddings
          # for any text inserted to this column.
          "VECTOR_COLUMN_NAME": {
            "type": "vector",
            "dimension": MODEL_DIMENSIONS,
            "service": {
              "provider": "mistral",
              "modelName": "MODEL_NAME",
              "authentication": {
                "providerKey": "API_KEY_NAME"
              }
            }
          },
          # If you want to store the original text
          # in addition to the generated embeddings
          # you must create a separate column.
          "TEXT_COLUMN_NAME": "text"
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Mistral AI API key that you want to use. Must be the name of an existing Mistral AI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "add": {
        "columns": {
          # This column will store vector embeddings.
          # The NVIDIA integration
          # will automatically generate vector embeddings
          # for any text inserted to this column.
          "VECTOR_COLUMN_NAME": {
            "type": "vector",
            "service": {
              "provider": "nvidia",
              "modelName": "nvidia/nv-embedqa-e5-v5"
            }
          },
          # If you want to store the original text
          # in addition to the generated embeddings
          # you must create a separate column.
          "TEXT_COLUMN_NAME": "text"
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "add": {
        "columns": {
          # This column will store vector embeddings.
          # The OpenAI integration
          # will automatically generate vector embeddings
          # for any text inserted to this column.
          "VECTOR_COLUMN_NAME": {
            "type": "vector",
            "dimension": MODEL_DIMENSIONS,
            "service": {
              "provider": "openai",
              "modelName": "MODEL_NAME",
              "authentication": {
                "providerKey": "API_KEY_NAME"
              },
              "parameters": {
                "organizationId": "ORGANIZATION_ID",
                "projectId": "PROJECT_ID"
              }
            }
          },
          # If you want to store the original text
          # in addition to the generated embeddings
          # you must create a separate column.
          "TEXT_COLUMN_NAME": "text"
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the OpenAI API key that you want to use. Must be the name of an existing OpenAI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search. For an example write request that uses the header, see the sketch after this list.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

  • ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.

  • PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
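
If you use header authentication, omit the authentication block from the column definition and pass the provider API key with every request that triggers embedding generation. The following is a minimal sketch of an insert that would generate embeddings for the new column. It assumes the table and column names above, uses EMBEDDING_PROVIDER_API_KEY as a placeholder for your OpenAI API key, and omits the table's primary key columns, which you must also include in a real insert:

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "x-embedding-api-key: EMBEDDING_PROVIDER_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
  "insertOne": {
    "document": {
      "TEXT_COLUMN_NAME": "Text to embed",
      "VECTOR_COLUMN_NAME": "Text to embed"
    }
  }
}'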

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "add": {
        "columns": {
          # This column will store vector embeddings.
          # The Upstage integration
          # will automatically generate vector embeddings
          # for any text inserted to this column.
          "VECTOR_COLUMN_NAME": {
            "type": "vector",
            "dimension": MODEL_DIMENSIONS,
            "service": {
              "provider": "upstageAI",
              "modelName": "MODEL_NAME",
              "authentication": {
                "providerKey": "API_KEY_NAME"
              }
            }
          },
          # If you want to store the original text
          # in addition to the generated embeddings
          # you must create a separate column.
          "TEXT_COLUMN_NAME": "text"
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Upstage API key that you want to use. Must be the name of an existing Upstage API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "add": {
        "columns": {
          # This column will store vector embeddings.
          # The Voyage AI integration
          # will automatically generate vector embeddings
          # for any text inserted to this column.
          "VECTOR_COLUMN_NAME": {
            "type": "vector",
            "dimension": MODEL_DIMENSIONS,
            "service": {
              "provider": "voyageAI",
              "modelName": "MODEL_NAME",
              "authentication": {
                "providerKey": "API_KEY_NAME"
              }
            }
          },
          # If you want to store the original text
          # in addition to the generated embeddings
          # you must create a separate column.
          "TEXT_COLUMN_NAME": "text"
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • TEXT_COLUMN_NAME: The name for the text column that will store the original text. Omit this column if you won’t store the original text in addition to the generated embeddings.

  • API_KEY_NAME: The name of the Voyage AI API key that you want to use. Must be the name of an existing Voyage AI API key in the Astra Portal. For more information, see Embedding provider authentication.

    Alternatively, you can omit this parameter and instead provide the authentication key in an x-embedding-api-key header. Header authentication overrides the API_KEY_NAME parameter if you set both. If you use the header instead of specifying the API_KEY_NAME parameter, you must include the header in every command that uses vectorize, including writes and vector search.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.

  • MODEL_DIMENSIONS: The number of dimensions that you want the generated vectors to have. Your chosen embedding model must support the specified number of dimensions.

    If you omit dimension, Astra can use a default dimension value. However, some models don’t have default dimensions. You can use the Data API to find supported embedding providers and their configuration parameters, including dimensions ranges and default dimensions.

Drop columns from a table

Dropping columns produces tombstones. Excessive tombstones can impact query performance.

  • Python

  • TypeScript

  • Java

  • curl

The following example uses untyped documents or rows, but you can define a client-side type for your collection to help statically catch errors. For examples, see Typing support.

from astrapy import DataAPIClient
from astrapy.info import AlterTableDropColumns

# Get an existing table
client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")
table = database.get_table("TABLE_NAME")

# Drop columns
table.alter(
    AlterTableDropColumns(
        columns=["is_summer_reading", "library_branch"],
    ),
)
import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
  token: "APPLICATION_TOKEN",
});
const table = database.table("TABLE_NAME");

// Drop columns
(async function () {
  await table.alter({
    operation: {
      drop: {
        columns: ["is_summer_reading", "library_branch"],
      },
    },
  });
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.commands.AlterTableDropColumns;
import com.datastax.astra.client.tables.definition.rows.Row;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Drop columns
    AlterTableDropColumns alterOperation =
        new AlterTableDropColumns("is_summer_reading", "library_branch");
    table.alter(alterOperation);
  }
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "drop": {
        "columns": ["is_summer_reading", "library_branch"]
      }
    }
  }
}'

Add automatic embedding generation to existing vector columns

You can configure an embedding provider integration for an existing vector column. The integration will automatically generate vector embeddings for any data inserted into the column.

The configuration depends on the embedding provider. For the configuration and an example for each provider, see Supported embedding providers.

If your vector column already includes vector data, make sure the service options are compatible with the existing embeddings. This ensures accurate vector search results.
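
For example, you can retrieve the table’s current definition, including each vector column’s dimension, and compare it against the model that you plan to use. The following is a minimal sketch using the Data API listTables command with the explain option; the exact shape of the response can vary by API version:

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "listTables": {
    "options": {
      "explain": true
    }
  }
}'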

  • Python

  • TypeScript

  • Java

  • curl

The following example uses untyped documents or rows, but you can define a client-side type for your collection to help statically catch errors. For examples, see Typing support.

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddVectorize,
    VectorServiceOptions,
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Configure an embedding provider for a column
table.alter(
    AlterTableAddVectorize(
        columns={
            "VECTOR_COLUMN_NAME": VectorServiceOptions(
                provider="azureOpenAI",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
                parameters={
                    "resourceName": "RESOURCE_NAME",
                    "deploymentId": "DEPLOYMENT_ID",
                },
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

    For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.

  • RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.

  • DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddVectorize,
    VectorServiceOptions,
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Configure an embedding provider for a column
table.alter(
    AlterTableAddVectorize(
        columns={
        "VECTOR_COLUMN_NAME": VectorServiceOptions(
                provider="huggingfaceDedicated",
                model_name="endpoint-defined-model",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
                parameters={
                    "endpointName": "ENDPOINT_NAME",
                    "regionName": "REGION_NAME",
                    "cloudName": "CLOUD_NAME",
                },
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.

    For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

    You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.

  • ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.

  • REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.

  • CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddVectorize,
    VectorServiceOptions,
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Configure an embedding provider for a column
table.alter(
    AlterTableAddVectorize(
        columns={
            "VECTOR_COLUMN_NAME": VectorServiceOptions(
                provider="huggingface",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddVectorize,
    VectorServiceOptions,
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Configure an embedding provider for a column
table.alter(
    AlterTableAddVectorize(
        columns={
            "VECTOR_COLUMN_NAME": VectorServiceOptions(
                provider="jinaAI",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddVectorize,
    VectorServiceOptions,
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Configure an embedding provider for a column
table.alter(
    AlterTableAddVectorize(
        columns={
            "VECTOR_COLUMN_NAME": VectorServiceOptions(
                provider="mistral",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddVectorize,
    VectorServiceOptions,
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Configure an embedding provider for a column
table.alter(
    AlterTableAddVectorize(
        columns={
            "VECTOR_COLUMN_NAME": VectorServiceOptions(
                provider="nvidia",
                model_name="nvidia/nv-embedqa-e5-v5",
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddVectorize,
    VectorServiceOptions,
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Configure an embedding provider for a column
table.alter(
    AlterTableAddVectorize(
        columns={
            "VECTOR_COLUMN_NAME": VectorServiceOptions(
                provider="openai",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
                parameters={
                    "organizationId": "ORGANIZATION_ID",
                    "projectId": "PROJECT_ID",
                },
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

  • ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.

  • PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddVectorize,
    VectorServiceOptions,
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Configure an embedding provider for a column
table.alter(
    AlterTableAddVectorize(
        columns={
            "VECTOR_COLUMN_NAME": VectorServiceOptions(
                provider="upstageAI",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

import os
from astrapy import DataAPIClient
from astrapy.info import (
    AlterTableAddVectorize,
    VectorServiceOptions,
)

# Get an existing table
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")
table = database.get_table("TABLE_NAME")

# Configure an embedding provider for a column
table.alter(
    AlterTableAddVectorize(
        columns={
            "VECTOR_COLUMN_NAME": VectorServiceOptions(
                provider="voyageAI",
                model_name="MODEL_NAME",
                authentication={
                    "providerKey": "API_KEY_NAME",
                },
            ),
        },
    )
)

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Configure an embedding provider for a column
(async function () {
  await table.alter({
    operation: {
      addVectorize: {
        columns: {
          // This column will store vector embeddings.
          // The Azure OpenAI integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            provider: 'azureOpenAI',
            modelName: 'MODEL_NAME',
            authentication: {
              providerKey: 'API_KEY_NAME',
            },
            parameters: {
              resourceName: 'RESOURCE_NAME',
              deploymentId: 'DEPLOYMENT_ID',
            },
          },
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

    For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.

  • RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.

  • DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Configure an embedding provider for a column
(async function () {
  await table.alter({
    operation: {
      addVectorize: {
        columns: {
          // This column will store vector embeddings.
          // The Hugging Face Dedicated integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            provider: 'huggingfaceDedicated',
            modelName: 'endpoint-defined-model',
            authentication: {
              providerKey: 'API_KEY_NAME',
            },
            parameters: {
              endpointName: 'ENDPOINT_NAME',
              regionName: 'REGION_NAME',
              cloudName: 'CLOUD_NAME',
            },
          },
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.

    For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

    You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.

  • ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.

  • REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.

  • CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Configure an embedding provider for a column
(async function () {
  await table.alter({
    operation: {
      addVectorize: {
        columns: {
          // This column will store vector embeddings.
          // The Hugging Face Serverless integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            provider: 'huggingface',
            modelName: 'MODEL_NAME',
            authentication: {
              providerKey: 'API_KEY_NAME',
            },
          },
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Configure an embedding provider for a column
(async function () {
  await table.alter({
    operation: {
      addVectorize: {
        columns: {
          // This column will store vector embeddings.
          // The Jina AI integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            provider: 'jinaAI',
            modelName: 'MODEL_NAME',
            authentication: {
              providerKey: 'API_KEY_NAME',
            },
          },
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Configure an embedding provider for a column
(async function () {
  await table.alter({
    operation: {
      addVectorize: {
        columns: {
          // This column will store vector embeddings.
          // The Mistral AI integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            provider: 'mistral',
            modelName: 'MODEL_NAME',
            authentication: {
              providerKey: 'API_KEY_NAME',
            },
          },
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Configure an embedding provider for a column
(async function () {
  await table.alter({
    operation: {
      addVectorize: {
        columns: {
          // This column will store vector embeddings.
          // The NVIDIA integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            provider: 'nvidia',
            modelName: 'nvidia/nv-embedqa-e5-v5',
          },
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Configure an embedding provider for a column
(async function () {
  await table.alter({
    operation: {
      addVectorize: {
        columns: {
          // This column will store vector embeddings.
          // The OpenAI integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            provider: 'openai',
            modelName: 'MODEL_NAME',
            authentication: {
              providerKey: 'API_KEY_NAME',
            },
            parameters: {
              organizationId: 'ORGANIZATION_ID',
              projectId: 'PROJECT_ID',
            },
          },
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

  • ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.

  • PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot use the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Configure an embedding provider for a column
(async function () {
  await table.alter({
    operation: {
      addVectorize: {
        columns: {
          // This column will store vector embeddings.
          // The Upstage integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            provider: 'upstageAI',
            modelName: 'MODEL_NAME',
            authentication: {
              providerKey: 'API_KEY_NAME',
            },
          },
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient("APPLICATION_TOKEN");
const database = client.db("API_ENDPOINT");
const table = database.table("TABLE_NAME");

// Configure an embedding provider for a column
(async function () {
  await table.alter({
    operation: {
      addVectorize: {
        columns: {
          // This column will store vector embeddings.
          // The Voyage AI integration
          // will automatically generate vector embeddings
          // for any text inserted to this column.
          VECTOR_COLUMN_NAME: {
            provider: 'voyageAI',
            modelName: 'MODEL_NAME',
            authentication: {
              providerKey: 'API_KEY_NAME',
            },
          },
        },
      },
    },
  });
})();

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Define parameters for the embedding provider
    Map<String, Object> params = new HashMap<>();
    params.put("resourceName", "RESOURCE_NAME");
    params.put("deploymentId", "DEPLOYMENT_ID");


    // Configure an embedding provider for a column
    AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
        .columns(
            Map.of(
                "VECTOR_COLUMN_NAME",
                new VectorServiceOptions()
                    .provider("azureOpenAI")
                    .modelName("MODEL_NAME")
                    .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    .parameters(params)
            )
        );

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

    For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.

  • RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.

  • DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Define parameters for the embedding provider
    Map<String, Object> params = new HashMap<>();
    params.put("endpointName", "ENDPOINT_NAME");
    params.put("regionName", "REGION_NAME");
    params.put("cloudName", "CLOUD_NAME");

    // Configure an embedding provider for a column
    AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
        .columns(
            Map.of(
                "VECTOR_COLUMN_NAME",
                new VectorServiceOptions()
                    .provider("huggingfaceDedicated")
                    .modelName("endpoint-defined-model")
                    .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    .parameters(params)
            )
        );

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.

    For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

    You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.

  • ENDPOINT_NAME: The programmatically-generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.

  • REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.

  • CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Configure an embedding provider for a column
    AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
        .columns(
            Map.of(
                "VECTOR_COLUMN_NAME",
                new VectorServiceOptions()
                    .provider("huggingface")
                    .modelName("MODEL_NAME")
                    .authentication(Map.of("providerKey", "API_KEY_NAME"))
            )
        );

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Configure an embedding provider for a column
    AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
        .columns(
            Map.of(
                "VECTOR_COLUMN_NAME",
                new VectorServiceOptions()
                    .provider("jinaAI")
                    .modelName("MODEL_NAME")
                    .authentication(Map.of("providerKey", "API_KEY_NAME"))
            )
        );

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Configure an embedding provider for a column
    AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
        .columns(
            Map.of(
                "VECTOR_COLUMN_NAME",
                new VectorServiceOptions()
                    .provider("mistral")
                    .modelName("MODEL_NAME")
                    .authentication(Map.of("providerKey", "API_KEY_NAME"))
            )
        );

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Configure an embedding provider for a column
    AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
        .columns(
            Map.of(
                "VECTOR_COLUMN_NAME",
                new VectorServiceOptions()
                    .provider("nvidia")
                    .modelName("nvidia/nv-embedqa-e5-v5")
            )
        );

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Define parameters for the embedding provider
    Map<String, Object> params = new HashMap<>();
    params.put("organizationId", "ORGANIZATION_ID");
    params.put("projectId", "PROJECT_ID");

    // Configure an embedding provider for a column
    AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
        .columns(
            Map.of(
                "VECTOR_COLUMN_NAME",
                new VectorServiceOptions()
                    .provider("openai")
                    .modelName("MODEL_NAME")
                    .authentication(Map.of("providerKey", "API_KEY_NAME"))
                    .parameters(params)
            )
        );

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

  • ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.

  • PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot be the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
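
If neither optional parameter applies to your account, you can omit the parameter map and the .parameters() call entirely. A minimal sketch of the relevant part of the example above:

    // No organizationId or projectId required for this account
    AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
        .columns(
            Map.of(
                "VECTOR_COLUMN_NAME",
                new VectorServiceOptions()
                    .provider("openai")
                    .modelName("MODEL_NAME")
                    .authentication(Map.of("providerKey", "API_KEY_NAME"))
            )
        );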

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Configure an embedding provider for a column
    AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
        .columns(
            Map.of(
                "VECTOR_COLUMN_NAME",
                new VectorServiceOptions()
                    .provider("upstageAI")
                    .modelName("MODEL_NAME")
                    .authentication(Map.of("providerKey", "API_KEY_NAME"))
            )
        );

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.tables.commands.AlterTableAddVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;
import com.datastax.astra.client.tables.Table;

import java.util.HashMap;
import java.util.Map;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Configure an embedding provider for a column
    AlterTableAddVectorize alterOperation = new AlterTableAddVectorize()
        .columns(
            Map.of(
                "VECTOR_COLUMN_NAME",
                new VectorServiceOptions()
                    .provider("voyageAI")
                    .modelName("MODEL_NAME")
                    .authentication(Map.of("providerKey", "API_KEY_NAME"))
            )
        );

    table.alter(alterOperation);
  }
}

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.

Azure OpenAI

For more detailed instructions, see Integrate Azure OpenAI as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "addVectorize": {
        "columns": {
          "VECTOR_COLUMN_NAME": {
            "provider": "azureOpenAI",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            },
            "parameters": {
              "resourceName": "RESOURCE_NAME",
              "deploymentId": "DEPLOYMENT_ID"
            }
          }
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

    For Azure OpenAI, you must select the model that matches the one deployed to your DEPLOYMENT_ID in Azure.

  • RESOURCE_NAME: The name of your Azure OpenAI Service resource, as defined in the resource’s Instance details. For more information, see the Azure OpenAI documentation.

  • DEPLOYMENT_ID: Your Azure OpenAI resource’s Deployment name. For more information, see the Azure OpenAI documentation.

Hugging Face - Dedicated

For more detailed instructions, see Integrate Hugging Face Dedicated as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "addVectorize": {
        "columns": {
          "VECTOR_COLUMN_NAME": {
            "provider": "huggingfaceDedicated",
            "modelName": "endpoint-defined-model",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            },
            "parameters": {
              "endpointName": "ENDPOINT_NAME",
              "regionName": "REGION_NAME",
              "cloudName": "CLOUD_NAME"
            }
          }
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: endpoint-defined-model.

    For Hugging Face Dedicated, you must deploy the model as a text embeddings inference (TEI) container.

    You must set MODEL_NAME to endpoint-defined-model because this integration uses the model specified in your dedicated endpoint configuration.

  • ENDPOINT_NAME: The programmatically generated name of your Hugging Face Dedicated endpoint. This is the first part of the endpoint URL. For example, if your endpoint URL is https://mtp1x7muf6qyn3yh.us-east-2.aws.endpoints.huggingface.cloud, the endpoint name is mtp1x7muf6qyn3yh.

  • REGION_NAME: The cloud provider region your Hugging Face Dedicated endpoint is deployed to. For example, us-east-2.

  • CLOUD_NAME: The cloud provider your Hugging Face Dedicated endpoint is deployed to. For example, aws.
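
For example, with the sample endpoint URL above, the parameters object in the request body would look like this (illustrative values only):

"parameters": {
  "endpointName": "mtp1x7muf6qyn3yh",
  "regionName": "us-east-2",
  "cloudName": "aws"
}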

Hugging Face - Serverless

For more detailed instructions, see Integrate Hugging Face Serverless as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "addVectorize": {
        "columns": {
          "VECTOR_COLUMN_NAME": {
            "provider": "huggingface",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: sentence-transformers/all-MiniLM-L6-v2, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large-instruct, BAAI/bge-small-en-v1.5, BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5.

Jina AI

For more detailed instructions, see Integrate Jina AI as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "addVectorize": {
        "columns": {
          "VECTOR_COLUMN_NAME": {
            "provider": "jinaAI",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: jina-embeddings-v2-base-en, jina-embeddings-v2-base-de, jina-embeddings-v2-base-es, jina-embeddings-v2-base-code, jina-embeddings-v2-base-zh.

Mistral AI

For more detailed instructions, see Integrate Mistral AI as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "addVectorize": {
        "columns": {
          "VECTOR_COLUMN_NAME": {
            "provider": "mistral",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: mistral-embed.

NVIDIA

For more detailed instructions, see Integrate NVIDIA as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "addVectorize": {
        "columns": {
          "VECTOR_COLUMN_NAME": {
            "provider": "nvidia",
            "modelName": "nvidia/nv-embedqa-e5-v5"
          }
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

OpenAI

For more detailed instructions, see Integrate OpenAI as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "addVectorize": {
        "columns": {
          "VECTOR_COLUMN_NAME": {
            "provider": "openai",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            },
            "parameters": {
              "organizationId": "ORGANIZATION_ID",
              "projectId": "PROJECT_ID"
            }
          }
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.

  • ORGANIZATION_ID: Optional. The ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about organization IDs, see the OpenAI API reference.

  • PROJECT_ID: Optional. The ID of the OpenAI project that owns the API key. This cannot be the default project. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information about project IDs, see the OpenAI API reference.
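
If neither optional parameter applies to your account, omit the parameters object from the request body, for example:

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "addVectorize": {
        "columns": {
          "VECTOR_COLUMN_NAME": {
            "provider": "openai",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        }
      }
    }
  }
}'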

Upstage

For more detailed instructions, see Integrate Upstage as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "addVectorize": {
        "columns": {
          "VECTOR_COLUMN_NAME": {
            "provider": "upstageAI",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: solar-embedding-1-large.

Voyage AI

For more detailed instructions, see Integrate Voyage AI as an embedding provider.

curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "addVectorize": {
        "columns": {
          "VECTOR_COLUMN_NAME": {
            "provider": "voyageAI",
            "modelName": "MODEL_NAME",
            "authentication": {
              "providerKey": "API_KEY_NAME"
            }
          }
        }
      }
    }
  }
}'

Replace the following:

  • APPLICATION_TOKEN: A secure reference to your application token.

  • API_ENDPOINT: Your database’s endpoint.

  • TABLE_NAME: The name for your table.

  • VECTOR_COLUMN_NAME: The name for your vector column.

  • MODEL_NAME: The model that you want to use to generate embeddings. The available models are: voyage-2, voyage-code-2, voyage-finance-2, voyage-large-2, voyage-large-2-instruct, voyage-law-2, voyage-multilingual-2.

Remove automatic embedding generation from vector columns

You can remove automatic embedding generation for one or more vector columns. Removing a vectorize integration from a column does not remove the vector embeddings stored in the column.

  • Python

  • TypeScript

  • Java

  • curl

The following example uses untyped rows, but you can define a client-side type for your table to help statically catch errors; a typed sketch follows the Python example. For more examples, see Typing support.

from astrapy import DataAPIClient
from astrapy.info import AlterTableDropVectorize

# Get an existing table
client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")
table = database.get_table("TABLE_NAME")

# Remove automatic embedding generation
table.alter(
    AlterTableDropVectorize(
        columns=["plot_synopsis"],
    ),
)
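
If you prefer typed rows, the following sketch shows one way to do it. The MovieRow schema is hypothetical, and the example assumes get_table accepts a row_type specifier, as described in Typing support:

from typing import TypedDict

from astrapy import DataAPIClient

# Hypothetical row schema, used only for illustration
class MovieRow(TypedDict):
    title: str
    plot_synopsis: str

client = DataAPIClient()
database = client.get_database("API_ENDPOINT", token="APPLICATION_TOKEN")

# Assumption: get_table accepts a row_type specifier, as described in Typing support
typed_table = database.get_table("TABLE_NAME", row_type=MovieRow)
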
import { DataAPIClient } from "@datastax/astra-db-ts";

// Get an existing table
const client = new DataAPIClient();
const database = client.db("API_ENDPOINT", {
  token: "APPLICATION_TOKEN",
});
const table = database.table("TABLE_NAME");

// Remove automatic embedding generation
(async function () {
  await table.alter({
    operation: {
      dropVectorize: {
        columns: ["plot_synopsis"],
      },
    },
  });
})();
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.commands.AlterTableDropVectorize;
import com.datastax.astra.client.tables.definition.rows.Row;

public class Example {

  public static void main(String[] args) {
    // Get an existing table
    Table<Row> table =
        new DataAPIClient("APPLICATION_TOKEN")
            .getDatabase("API_ENDPOINT")
            .getTable("TABLE_NAME");

    // Remove automatic embedding generation
    AlterTableDropVectorize alterOperation = new AlterTableDropVectorize("plot_synopsis");
    table.alter(alterOperation);
  }
}
curl -sS -L -X POST "API_ENDPOINT/api/json/v1/KEYSPACE_NAME/TABLE_NAME" \
  --header "Token: APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "alterTable": {
    "operation": {
      "dropVectorize": {
        "columns": ["plot_synopsis"]
      }
    }
  }
}'

Client reference

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the client reference.

For more information, see the client reference.

For more information, see the client reference.

Client reference documentation is not applicable for HTTP.
