Auto-generate embeddings with vectorize

For Serverless (Vector) databases, you can bring your own embeddings or use Astra DB vectorize to automatically generate embeddings through an embedding provider integration:

Bring your own embeddings	Generate your own embeddings on the client side, and then import them when you insert data.
Auto-generate embeddings with vectorize	Configure an embedding provider integration, and then use vectorize to automatically generate embeddings from text for any operation that requires a vector. Enabling a vectorize integration also enables the Unstructured data loader integration. Vectorize doesn’t support embeddings for images. If you need embeddings for image data, you must bring your own embeddings.
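
The difference between the two options shows up in the shape of the data you insert. The following is a minimal sketch that builds request bodies as plain Python dictionaries. It assumes the Data API `insertOne` command and the reserved `$vector` and `$vectorize` fields for collections. The document contents and the three-dimensional vector are invented for illustration.

```python
import json

# Option 1: bring your own embeddings. The client computes the vector
# and sends it in the reserved $vector field.
doc_with_own_vector = {
    "title": "Intro to vector search",
    "$vector": [0.12, -0.08, 0.33],  # toy vector, for illustration only
}

# Option 2: vectorize. The client sends text in the reserved $vectorize
# field, and the configured embedding provider generates the vector.
doc_with_vectorize = {
    "title": "Intro to vector search",
    "$vectorize": "Vector search finds items by semantic similarity.",
}


def insert_one_payload(document: dict) -> dict:
    """Wrap a document in a Data API insertOne command body."""
    return {"insertOne": {"document": document}}


print(json.dumps(insert_one_payload(doc_with_vectorize), indent=2))
```

The second form only works if the collection was created with an embedding provider integration. Otherwise there is nothing to turn the text into a vector.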

Learn more about embedding providers

Embedding providers are services that help you generate embeddings for your data to perform vector search queries.

The provider handles infrastructure, model maintenance, and other tasks necessary to generate embeddings from embedding models.

Providers may use one or more models to generate embeddings. When choosing an embedding provider, consider factors like the available embedding models, vector dimensions, supported data types, quality, accuracy, and scalability.

Supported embedding providers

Vectorize generates embeddings through integrations with supported embedding providers:

External providers	Integrate your embedding provider account with Astra DB.
Astra-hosted providers	Use a DataStax-managed embedding provider hosted within Astra DB. Databases in supported regions can configure collections and tables to automatically use these integrations.

Astra DB Serverless offers the following embedding provider integrations:

Embedding provider	External	Astra-hosted	Docs
Azure OpenAI			Get started
Hugging Face - Dedicated			Get started
Hugging Face - Serverless			Get started
Jina AI			Get started
Mistral AI			Get started
NVIDIA			Get started
OpenAI			Get started
Upstage			Get started
Voyage AI			Get started

External embedding provider integrations

To use an external embedding provider with Astra DB vectorize, first add the embedding provider integration to your Astra organization. You can then select the embedding provider when you create a collection or when you use the Data API to add a vector column to a table.

All providers follow the same general integration process. However, each provider has specific configuration options, such as models, dimensions, credentials, and other parameters. For complete instructions, see the documentation for your embedding provider.

Your external embedding provider integration uses your embedding provider account to generate embeddings. Your provider might bill you for this usage according to your agreement with the provider.

Embedding provider authentication

Astra DB needs authorization to request embeddings from your embedding provider. To do this, you provide credentials that authenticate Astra DB to your embedding provider account.

Initial configuration

1. Add the external embedding provider integration to your Astra organization.
2. Add embedding provider credentials, such as API keys or access tokens.
3. Add databases to the credentials' scoped databases. This makes the integration available to new collections and tables in the scoped databases.
4. Attach the integration to a collection or table:
   - Collections: Create a new collection in a scoped database, select the embedding provider integration, and then select a credential from the pool of credentials that are available to the database. If you cannot select your integration or credential when you create a collection in the Astra Portal, see Potential scope delay.
   - Tables: Use the Data API to add the embedding provider integration to a vector column in a new or existing table. You can do this when you create a table, or you can alter an existing table to either add a vector column or attach a vectorize integration to an existing vector column.
5. Insert data into the collection or table to auto-generate embeddings with vectorize. Astra DB vectorize uses the selected credential to request embeddings from your embedding provider.

For complete instructions, see the documentation for your embedding provider.
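
As a rough illustration of the "attach the integration to a collection" step, the sketch below builds a Data API `createCollection` request body. The field layout (`options.vector.service` with `provider`, `modelName`, and `authentication.providerKey`) reflects the Data API as commonly documented, but treat it as an assumption and confirm it against your provider's instructions. `EXAMPLE_PROVIDER`, `EXAMPLE_MODEL`, and `my-credential-name` are placeholders, not real values.

```python
import json
from typing import Optional


def create_collection_payload(
    name: str,
    provider: str,
    model_name: str,
    credential_name: str,
    dimension: Optional[int] = None,
    metric: str = "cosine",
) -> dict:
    """Build a createCollection body that attaches a vectorize integration."""
    vector_options = {
        "metric": metric,
        "service": {
            "provider": provider,
            "modelName": model_name,
            # The credential is referenced by the name you gave it in
            # Astra DB, not by the secret value itself.
            "authentication": {"providerKey": credential_name},
        },
    }
    if dimension is not None:
        vector_options["dimension"] = dimension
    return {"createCollection": {"name": name, "options": {"vector": vector_options}}}


payload = create_collection_payload(
    name="articles",
    provider="EXAMPLE_PROVIDER",           # placeholder
    model_name="EXAMPLE_MODEL",            # placeholder
    credential_name="my-credential-name",  # placeholder credential name
    dimension=1024,                        # illustrative; must match the model
)
print(json.dumps(payload, indent=2))
```

The request fails unless the database is already in the named credential's scope, which is why scoping comes before this step in the list above.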

Manage credentials for existing integrations

After you configure an embedding provider integration, you can manage the integration’s credentials and scoped databases in the Astra Portal:

1. In the Astra Portal header, click Settings.
2. In the Settings navigation menu, click the name of the active organization, and then select the organization where you want to manage an integration.

   If the organization belongs to an enterprise, select the enterprise, and then select the organization in the Organizations list.
3. In the Settings navigation menu, click Integrations.
4. Click the embedding provider integration that you want to manage, and then add credentials, remove credentials, or change the scoped databases for each credential.

When you remove a credential from Astra DB, the credential is not removed from your embedding provider account. Make sure to delete the credential from your embedding provider account if you no longer need it.

Removing a credential immediately disables vectorize embedding generation in all collections and tables that use the removed credential.

Multiple credentials

For greater access control, you can add multiple credentials, and each credential can have different scoped databases. You can also add the same database to multiple credential scopes.

However, regardless of the number of integrations or credentials that are available to the database, there is a one-to-one relationship between embedding provider integrations and credentials when attached to a collection or column in a table:

Collections: When you create a collection, you select only one integration and one credential for the collection, and this selection is locked for the life of the collection.
Tables: With the Data API, you can attach one integration and one credential to each vector column in a table. If the table has multiple vector columns, you can select a different integration and credential for each column. Additionally, you can add and remove embedding provider integrations from vector columns at any time.
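
To make the per-column relationship concrete, here is a hedged sketch of a Data API `createTable` body in which two vector columns each carry their own integration and credential. The `service` object on a vector column follows the Data API tables format as I understand it. The table name, column names, providers, models, dimensions, and credential names are all hypothetical.

```python
import json
from typing import Optional


def vector_column(
    dimension: int,
    provider: str,
    model_name: str,
    credential_name: Optional[str] = None,
) -> dict:
    """Describe one vector column with exactly one integration and credential."""
    service = {"provider": provider, "modelName": model_name}
    if credential_name is not None:
        service["authentication"] = {"providerKey": credential_name}
    return {"type": "vector", "dimension": dimension, "service": service}


create_table_payload = {
    "createTable": {
        "name": "articles",
        "definition": {
            "columns": {
                "id": {"type": "text"},
                # Each vector column has its own provider and credential.
                "title_embedding": vector_column(1536, "PROVIDER_A", "MODEL_A", "key-a"),
                "body_embedding": vector_column(1024, "PROVIDER_B", "MODEL_B", "key-b"),
            },
            "primaryKey": "id",
        },
    }
}
print(json.dumps(create_table_payload, indent=2))
```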

Scoped databases

Embedding provider integrations aren’t inherently available to all databases. You decide the databases, collections, and tables that can use each integration.

To make an external embedding provider integration available to a database, you must link the database to an embedding provider credential. When you do this, you add the database to the credential’s scoped databases. When you create a collection or use the Data API to create or alter a table in a scoped database, you can choose from the integrations and credentials that are available to the database.

You can add a new database to a credential’s scope as soon as you create the database, even while the database is initializing. Once the database becomes active, you can create collections and tables that use the external embedding provider integration.
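
Scoping can be pictured as a mapping from each credential to the set of databases it serves. The following sketch is a mental model only, not an Astra DB API. The credential and database names are made up.

```python
from typing import Dict, List, Set

# Hypothetical credential names mapped to the databases in each scope.
credential_scopes: Dict[str, Set[str]] = {
    "openai-prod-key": {"orders_db", "catalog_db"},
    "openai-dev-key": {"sandbox_db", "catalog_db"},  # catalog_db is in two scopes
}


def credentials_available_to(database: str, scopes: Dict[str, Set[str]]) -> List[str]:
    """Return the credentials whose scoped databases include the database."""
    return sorted(name for name, databases in scopes.items() if database in databases)


print(credentials_available_to("catalog_db", credential_scopes))
# ['openai-dev-key', 'openai-prod-key']
print(credentials_available_to("analytics_db", credential_scopes))
# []
```

An empty result corresponds to the case where the integration cannot be selected for a collection or table in that database.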

Potential scope delay

Usually, when you add a database to a credential’s scope, the integration is almost immediately available for use in the database. Rarely, Astra DB can take a few minutes to propagate a scope change. Typically, this delay occurs when you add a new embedding provider integration while creating a collection in the Astra Portal because it can take time to activate the integration in your organization and the scoped databases.

If you cannot select your integration when you create a collection or use the Data API to create or alter a vector column in a table, make sure the database is scoped to at least one of the provider’s credentials in Astra DB. If the database is in scope and you recently added the integration or credential, wait a few minutes and try again.
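
If you script collection or table creation, the "wait a few minutes and try again" advice can be expressed as a small retry helper. This is a generic sketch, not part of any Astra DB client. `operation` stands in for whatever call you make to create the collection or alter the table, and the attempt count and wait time are arbitrary defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_after_scope_change(
    operation: Callable[[], T],
    attempts: int = 3,
    wait_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, waiting between failed attempts for the scope to propagate."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            # Re-raise the last failure so the caller sees the real error.
            if attempt == attempts:
                raise
            sleep(wait_seconds)
```

The helper catches every exception to stay short. In real code, catch only the error that your client raises for an unavailable integration, so that unrelated failures surface immediately.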

Change providers or credentials

Changing the provider or credentials is different for collections and tables.

Change providers or credentials for a collection

When you create a vector-enabled collection, you select the collection’s embedding generation method, which can include an embedding provider integration and credential. These selections are permanent for the life of the collection.

To change a collection’s embedding provider, you must create a new collection that uses the new provider’s integration.

To rotate credentials without changing embedding providers, you must remove the credential, and then recreate it with the same name and scoped databases:

1. In your embedding provider account, delete the old credential and create a new credential.

   Removing a credential from either your embedding provider or Astra DB immediately disables $vectorize embedding generation for any collection that uses that credential. Vectorize remains unavailable until you add the new credential to the embedding provider integration.
2. In the Astra Portal header, click Settings.
3. In the Settings navigation menu, click the name of the active organization, and then select the organization where you want to manage an integration.

   If the organization belongs to an enterprise, select the enterprise, and then select the organization in the Organizations list.
4. In the Settings navigation menu, click Integrations.
5. Click the embedding provider integration that you want to manage.
6. In the API keys section, find the credential that you want to remove. Make a note of the credential’s name and scoped databases. When you recreate the credential, it must have the exact same name and scope.
7. Click More, and then select Remove API key. In the confirmation dialog, enter the API key name, and then click Remove key.
8. Click Add API key to add a new credential with the same API key name as the removed credential.

   If the name doesn’t match, any collections that used the removed credential cannot detect the replacement.
9. Add all relevant databases to the new credential’s scoped databases.

   At minimum, you must add all databases that used the removed credential so that the collections in those databases can detect the replacement. To ensure that you don’t miss any databases, DataStax recommends adding all of the databases that were in the removed credential’s scope.
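
Before you remove the old credential, it can help to record its name and scope so that you can check the replacement afterward. The sketch below models that check in plain Python. `CredentialRecord` is a hypothetical structure for your own notes, not an Astra DB object, and the names are invented.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CredentialRecord:
    """Your own note of a credential's name and scoped databases."""
    name: str
    scoped_databases: FrozenSet[str]


def rotation_problems(removed: CredentialRecord, replacement: CredentialRecord) -> List[str]:
    """List the reasons existing collections might not detect the replacement."""
    problems = []
    if replacement.name != removed.name:
        problems.append(
            f"name mismatch: expected {removed.name!r}, got {replacement.name!r}"
        )
    missing = removed.scoped_databases - replacement.scoped_databases
    if missing:
        problems.append(f"missing scoped databases: {sorted(missing)}")
    return problems


old = CredentialRecord("provider-key", frozenset({"orders_db", "catalog_db"}))
new = CredentialRecord("provider-key", frozenset({"orders_db"}))
print(rotation_problems(old, new))
# ["missing scoped databases: ['catalog_db']"]
```

An empty list means the replacement has the same name and covers every database that was in the removed credential's scope.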

Change providers or credentials for a table

If you want to change the embedding provider integration or credential for a column in a table, you alter the table. You must first remove the embedding provider integration from the vector column, and then reattach the embedding provider integration with the new credentials.
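
The remove-then-reattach sequence can be sketched as two Data API `alterTable` request bodies, sent in order to the table's endpoint. The `dropVectorize` and `addVectorize` operation names and their layout are my understanding of the Data API tables format, so verify them against the Data API reference before use. The column, provider, model, and credential names are placeholders.

```python
import json
from typing import List


def swap_vectorize_commands(
    column: str,
    provider: str,
    model_name: str,
    credential_name: str,
) -> List[dict]:
    """Build the two alterTable bodies that replace a column's integration."""
    # Step 1: detach the current integration from the vector column.
    drop = {"alterTable": {"operation": {"dropVectorize": {"columns": [column]}}}}
    # Step 2: reattach an integration that uses the new credential.
    add = {
        "alterTable": {
            "operation": {
                "addVectorize": {
                    "columns": {
                        column: {
                            "provider": provider,
                            "modelName": model_name,
                            "authentication": {"providerKey": credential_name},
                        }
                    }
                }
            }
        }
    }
    return [drop, add]


for command in swap_vectorize_commands(
    "body_embedding", "EXAMPLE_PROVIDER", "EXAMPLE_MODEL", "new-credential-name"
):
    print(json.dumps(command))
```

Between the two commands the column has no integration, so requests that rely on generating an embedding for that column are expected to fail until the second command completes.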

Troubleshoot vectorize

When working with vectorize, including the $vectorize reserved field in the Data API, errors can occur from two sources:

- Astra DB: There is an issue within Astra DB, including the Astra platform, the Data API server, Data API clients, or another Astra component.

  Some of the most common Astra DB vectorize errors are related to scoped databases. In your vectorize integration settings, make sure your database is in the scope of the credential that you want to use. Scoped database errors don’t apply to the NVIDIA Astra-hosted embedding provider integration.

  When using the Data API with collections, make sure you don’t use $vector and $vectorize in the same query. For more information, see the Data API reference for collections.

  When using the Data API with tables, you can only run a vector search on one vector column at a time. To generate an embedding from a string, the target vector column must have a defined embedding provider integration. For more information, see the Data API tables references, such as Vector type and Sort clauses for tables.
- The embedding provider: The embedding provider encountered an issue while processing the embedding generation request. Astra DB passes these errors to you through the Astra Portal or Data API with a qualifying statement such as "The embedding provider returned a HTTP client error".

  Possible embedding provider errors include rate limiting, billing or account funding issues, and chunk or token size limits. For more information about these errors, see the embedding provider’s documentation, including the documentation for your chosen model.

Carefully read all error messages to determine the source and possible cause for the issue.
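
Two of the checks above are easy to automate on the client side. The following sketch rejects a collection sort clause that mixes `$vector` and `$vectorize`, and makes a rough guess at the error source from the message text. The marker string is an assumption based on the example message quoted above. Real messages can differ, so treat the classification as a hint only.

```python
# Assumed prefix, taken from the example message in this section.
EMBEDDING_PROVIDER_MARKER = "The embedding provider returned"


def check_sort_clause(sort: dict) -> None:
    """Raise ValueError if a sort clause uses both $vector and $vectorize."""
    if "$vector" in sort and "$vectorize" in sort:
        raise ValueError("Use either $vector or $vectorize in a query, not both.")


def error_source(message: str) -> str:
    """Guess whether an error came from the embedding provider or Astra DB."""
    if EMBEDDING_PROVIDER_MARKER in message:
        return "embedding provider"
    return "Astra DB"


check_sort_clause({"$vectorize": "documents about billing"})  # passes silently
print(error_source("The embedding provider returned a HTTP client error"))
# embedding provider
```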
