Auto-generate embeddings with vectorize

For Serverless (Vector) databases, you can bring your own embeddings or use Astra DB vectorize to automatically generate embeddings through an embedding provider integration:

Bring your own embeddings

Generate your own embeddings on the client side, and then import them when you load data.

Auto-generate embeddings with vectorize

Configure an embedding provider integration, and then use vectorize to automatically generate embeddings from text for any operation that requires a vector. Enabling a vectorize integration also enables the Unstructured data loader integration.

Vectorize doesn’t support embeddings for images. If you need embeddings for image data, you must bring your own embeddings.
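To make the two approaches concrete, the following minimal sketch builds two Data API `insertOne` request bodies for a collection: one carries a client-generated embedding in the `$vector` reserved field, and the other carries text in the `$vectorize` reserved field. The `content` field name, the sample text, and the three-dimension vector are placeholders chosen for illustration. Check the Data API reference for the exact request format.

```python
import json


def insert_with_own_embedding(text, embedding):
    """Build an insertOne body that carries a client-generated embedding."""
    # "content" is a hypothetical field name; $vector holds the embedding.
    return {"insertOne": {"document": {"content": text, "$vector": list(embedding)}}}


def insert_with_vectorize(text):
    """Build an insertOne body that asks vectorize to generate the embedding."""
    # $vectorize holds the text that the embedding provider embeds.
    return {"insertOne": {"document": {"$vectorize": text}}}


# Placeholder values: a real embedding has the dimension of your model.
own = insert_with_own_embedding("A short example sentence.", [0.12, -0.03, 0.48])
auto = insert_with_vectorize("A short example sentence.")

print(json.dumps(own, indent=2))
print(json.dumps(auto, indent=2))
```

The second body only works in a collection that has an embedding provider integration attached.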

Learn more about embedding providers

Embedding providers are services that help you generate embeddings for your data to perform vector search queries.

The provider handles infrastructure, model maintenance, and other tasks necessary to generate embeddings from embedding models.

Providers may use one or more models to generate embeddings. When choosing an embedding provider, consider factors like the available embedding models, vector dimensions, supported data types, quality, accuracy, and scalability.

Supported embedding providers

Vectorize generates embeddings through integrations with supported embedding providers:

External providers

Integrate your embedding provider account with Astra DB.

Astra-hosted providers

Use a DataStax-managed embedding provider hosted within Astra DB. Databases in supported regions can configure collections and tables to automatically use these integrations.

Astra DB Serverless offers the following embedding provider integrations:

  • Azure OpenAI

  • Hugging Face - Dedicated

  • Hugging Face - Serverless

  • Jina AI

  • Mistral AI

  • NVIDIA

  • OpenAI

  • Upstage

  • Voyage AI

External embedding provider integrations

To use an external embedding provider with Astra DB vectorize, you must first add the embedding provider integration to your Astra DB organization. Then, you can select the embedding provider when you create a collection or when you use the Data API to add a vector column to a table.

All providers follow the same general integration process. However, each provider has specific configuration options, such as models, dimensions, credentials, and other parameters. For complete instructions, see the documentation for your embedding provider.

Your external embedding provider integration uses your embedding provider account to generate embeddings. Your provider might bill you for this usage according to your agreement with them.

Embedding provider authentication

Astra DB needs authorization to request embeddings from your embedding provider. To do this, you provide credentials that authenticate Astra DB to your embedding provider account:

  1. Add the external embedding provider integration to your Astra DB organization.

  2. Add embedding provider credentials, such as API keys or access tokens.

  3. Add databases to the credentials' scoped databases. This makes the integration available to new collections and tables in the scoped databases.

  4. Attach the integration to a collection or table:

    • Collections: Create a new collection in a scoped database, select the embedding provider integration, and then select a credential from the pool of credentials that are available to the database.

      If you can’t select your integration or credential when you create a collection in the Astra Portal, see Potential scope delay.

    • Tables: Use the Data API to attach the embedding provider integration to a vector column in a new or existing table. You can do this when you create a table, or you can alter an existing table to either add a new vector column or attach a vectorize integration to an existing vector column.

  5. Load data into the collection or table to auto-generate embeddings with vectorize. Astra DB vectorize uses the selected credential to request embeddings from your embedding provider.

For complete instructions, see the documentation for your embedding provider.
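As a rough illustration of step 4 for collections, the following sketch builds a Data API `createCollection` request body that names an embedding provider, a model, and a credential. The nesting under `options.vector.service` and the `providerKey` field are assumptions about the request format, and every value in capital letters is a placeholder. Confirm the exact format in the documentation for your embedding provider.

```python
import json


def create_collection_body(name, provider, model_name, credential_name, metric="cosine"):
    """Build a createCollection body that attaches a vectorize integration.

    The payload shape is an assumption; verify it against the Data API reference.
    """
    return {
        "createCollection": {
            "name": name,
            "options": {
                "vector": {
                    "metric": metric,
                    "service": {
                        "provider": provider,
                        "modelName": model_name,
                        # Name of a credential whose scope includes this database.
                        "authentication": {"providerKey": credential_name},
                    },
                }
            },
        }
    }


# Placeholder values, not real identifiers.
body = create_collection_body(
    name="COLLECTION_NAME",
    provider="PROVIDER_NAME",
    model_name="MODEL_NAME",
    credential_name="API_KEY_NAME",
)
print(json.dumps(body, indent=2))
```

The credential is referenced by its name in Astra DB, not by the secret value, which is why the credential name matters when you later rotate credentials.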

After you configure an embedding provider integration, you can manage the integration’s credentials and scoped databases in the Astra Portal:

  1. In the Astra Portal header, click Settings.

  2. In the Settings navigation menu, click the name of the current organization, and then select the organization where you want to manage an integration.

  3. In the Settings navigation menu, click Integrations.

  4. Click the embedding provider integration that you want to manage, and then add credentials, remove credentials, or change the scoped databases for each credential.

When you remove a credential from Astra DB, the credential is not removed from your embedding provider account. Make sure to delete the credential from your embedding provider account if you no longer need it.

Removing a credential immediately disables vectorize embedding generation in all collections and tables that use the removed credential.

For greater access control, you can add multiple credentials, and each credential can have different scoped databases. You can also add the same database to multiple credential scopes.

However, regardless of the number of integrations or credentials that are available to the database, there is a one-to-one relationship between embedding provider integrations and credentials when attached to a collection or column in a table:

  • Collections: When you create a collection, you select only one integration and one credential for the collection, and this selection is locked for the life of the collection.

  • Tables: With the Data API, you can attach one integration and one credential to each vector column in a table. If the table has multiple vector columns, you can select a different integration and credential for each column. Additionally, you can add and remove embedding provider integrations from vector columns at any time.
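For tables, the per-column relationship can be pictured with a sketch like the following, which builds a Data API `createTable` request body with two vector columns that each use a different integration and credential. The column definition shape (`type`, `dimension`, `service`) is an assumption about the request format, and the table name, column names, dimensions, providers, models, and credential names are all placeholders.

```python
import json


def vector_column(dimension, provider, model_name, credential_name):
    """Describe one vector column with exactly one integration and one credential.

    The field names are assumptions; verify them against the Data API reference.
    """
    return {
        "type": "vector",
        "dimension": dimension,
        "service": {
            "provider": provider,
            "modelName": model_name,
            "authentication": {"providerKey": credential_name},
        },
    }


# Placeholder table with two independently configured vector columns.
create_table = {
    "createTable": {
        "name": "TABLE_NAME",
        "definition": {
            "columns": {
                "id": {"type": "text"},
                "title_embedding": vector_column(1024, "PROVIDER_A", "MODEL_A", "KEY_A"),
                "body_embedding": vector_column(768, "PROVIDER_B", "MODEL_B", "KEY_B"),
            },
            "primaryKey": "id",
        },
    }
}
print(json.dumps(create_table, indent=2))
```

A collection has no equivalent of the second column: it accepts one integration and one credential, chosen once at creation.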

Scoped databases

Embedding provider integrations aren’t inherently available to all databases. You decide which databases, collections, and tables can use each integration.

To make an external embedding provider integration available to a database, you must link the database to an embedding provider credential. When you do this, you add the database to the credential’s scoped databases. When you create a collection or use the Data API to create or alter a table in a scoped database, you can choose from the integrations and credentials that are available to the database.

You can add a new database to a credential’s scope as soon as you create the database, even while the database is initializing. Once the database becomes active, you can create collections and tables that use the external embedding provider integration.

Potential scope delay

Usually, when you add a database to a credential’s scope, the integration is almost immediately available for use in the database. Rarely, Astra DB can take a few minutes to propagate a scope change. Typically, this delay occurs when you add a new embedding provider integration while creating a collection in the Astra Portal because it can take time to activate the integration in your organization and the scoped databases.

If you can’t select your integration when you create a collection or use the Data API to create or alter a vector column in a table, make sure the database is scoped to at least one of the provider’s credentials in Astra DB. If the database is in scope and you recently added the integration or credential, wait a few minutes and try again.
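If you script collection or table creation, the "wait a few minutes and try again" advice can be sketched as a small retry helper. The helper below is illustrative only: `wait_until_available` and its parameters are hypothetical names, and `is_available` stands in for whatever check you use, such as attempting the create request and reporting whether it succeeded.

```python
import time


def wait_until_available(is_available, attempts=6, delay_seconds=30.0, sleep=time.sleep):
    """Call is_available() up to `attempts` times, pausing between tries.

    Returns the number of the attempt that succeeded, or None if none did.
    """
    for attempt in range(1, attempts + 1):
        if is_available():
            return attempt
        if attempt < attempts:
            # Give Astra DB time to propagate the scope change.
            sleep(delay_seconds)
    return None


# Demonstration with a stubbed check and no real waiting:
# the stub reports "not available" twice, then "available".
responses = iter([False, False, True])
attempt = wait_until_available(lambda: next(responses), sleep=lambda seconds: None)
print(attempt)
```

If the helper returns `None`, confirm that the database is in the credential’s scope before retrying further.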

Change providers or credentials

The process for changing providers or credentials differs between collections and tables.

Change providers or credentials for a collection

When you create a vector-enabled collection, you select the collection’s embedding generation method, which can include an embedding provider integration and credential. These selections are permanent for the life of the collection.

To change a collection’s embedding provider, you must create a new collection that uses the new provider’s integration.

To rotate credentials without changing embedding providers, you must remove the credential, and then recreate it with the same name and scoped databases:

  1. In your embedding provider account, delete the old credential and create a new credential.

    Removing a credential from either your embedding provider or Astra DB immediately disables $vectorize embedding generation for any collection that uses that credential. Vectorize remains unavailable until you add the new credential to the embedding provider integration.

  2. In the Astra Portal header, click Settings.

  3. In the Settings navigation menu, click the name of the current organization, and then select the organization where you want to manage an integration.

  4. In the Settings navigation menu, click Integrations.

  5. Click the embedding provider integration that you want to manage.

  6. In the API keys section, locate the credential that you want to remove. Make a note of the credential’s name and scoped databases. When you recreate the credential, it must have the exact same name and scope.

  7. Click More, and then select Remove API key. In the confirmation dialog, enter the API key name, and then click Remove key.

  8. Click Add API key to add a new credential with the same API key name as the removed credential.

    If the name doesn’t match, any collections that used the removed credential can’t detect the replacement.

  9. Add all relevant databases to the new credential’s scoped databases.

    At minimum, you must add all databases that used the removed credential so that the collections in those databases can detect the replacement. To ensure that you don’t miss any databases, DataStax recommends adding all of the databases that were in the removed credential’s scope.
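Because the replacement must match the removed credential’s name and cover its scope, it can help to record both before you remove anything and compare them afterward. The following sketch shows one way to express that check. The `Credential` class and `replacement_problems` function are hypothetical helpers for your own notes, not part of Astra DB, and the names and database identifiers are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """Hypothetical record of a credential's name and scoped databases."""
    name: str
    scoped_databases: frozenset


def replacement_problems(removed, replacement):
    """List the reasons collections might not detect the replacement credential."""
    problems = []
    if replacement.name != removed.name:
        problems.append(
            f"name mismatch: expected {removed.name!r}, got {replacement.name!r}"
        )
    missing = removed.scoped_databases - replacement.scoped_databases
    if missing:
        problems.append("missing scoped databases: " + ", ".join(sorted(missing)))
    return problems


# Placeholder values noted in step 6 and entered in steps 8 and 9.
removed = Credential("prod-key", frozenset({"db-1", "db-2"}))
replacement = Credential("prod-key", frozenset({"db-1"}))

problems = replacement_problems(removed, replacement)
for problem in problems:
    print(problem)
```

An empty list means the replacement has the same name and at least the same scope as the removed credential.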

Change providers or credentials for a table

If you want to change the embedding provider integration or credential for a column in a table, you alter the table. You must first remove the embedding provider integration from the vector column, and then reattach the embedding provider integration with the new credentials.
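The remove-then-reattach sequence might look like the following pair of Data API `alterTable` request bodies. The `dropVectorize` and `addVectorize` operation names and their nesting are assumptions about the request format, and the column, provider, model, and credential names are placeholders. Verify the exact format in the Data API tables reference before you use it.

```python
import json


def drop_vectorize_body(column):
    """Build an alterTable body that removes the integration from a vector column."""
    # Assumed shape: dropVectorize takes a list of column names.
    return {"alterTable": {"operation": {"dropVectorize": {"columns": [column]}}}}


def add_vectorize_body(column, provider, model_name, credential_name):
    """Build an alterTable body that attaches an integration to a vector column."""
    # Assumed shape: addVectorize maps each column name to a service definition.
    service = {
        "provider": provider,
        "modelName": model_name,
        "authentication": {"providerKey": credential_name},
    }
    return {"alterTable": {"operation": {"addVectorize": {"columns": {column: service}}}}}


# Send these in order: first remove the old integration, then reattach.
steps = [
    drop_vectorize_body("COLUMN_NAME"),
    add_vectorize_body("COLUMN_NAME", "PROVIDER_NAME", "MODEL_NAME", "NEW_API_KEY_NAME"),
]
for step in steps:
    print(json.dumps(step, indent=2))
```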

Troubleshoot vectorize

When working with vectorize, including the $vectorize reserved field in the Data API, errors can occur from two sources:

Astra DB

There is an issue within Astra DB, such as the Astra DB platform, the Data API server, the Data API clients, or another Astra DB component.

Some of the most common Astra DB vectorize errors are related to scoped databases. In your vectorize integration settings, make sure your database is in the scope of the credential that you want to use. Scoped database errors don’t apply to the NVIDIA Astra-hosted embedding provider integration.

When using the Data API with collections, make sure you don’t use $vector and $vectorize in the same query. For more information, see the Data API collections references, such as Vector and vectorize, Insert many documents, and Sort clauses for documents.

When using the Data API with tables, you can only run a vector search on one vector column at a time. To generate an embedding from a string, the target vector column must have a defined embedding provider integration. For more information, see the Data API tables references, such as Vector type and Sort clauses for tables.
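For collections, a client-side guard can catch the `$vector` and `$vectorize` conflict before a request is sent. The following sketch is one way to do that. The `check_vector_fields` function is a hypothetical helper, not part of any Data API client, and it only inspects the top-level keys of a document or sort clause.

```python
def check_vector_fields(payload):
    """Reject a document or sort clause that sets both $vector and $vectorize."""
    if "$vector" in payload and "$vectorize" in payload:
        raise ValueError("Use either $vector or $vectorize, not both")
    return payload


# A sort clause that uses only $vectorize passes the check.
sort_clause = check_vector_fields({"$vectorize": "text to search for"})

# A document that sets both reserved fields is rejected.
try:
    check_vector_fields({"$vector": [0.1, 0.2], "$vectorize": "some text"})
    rejected = False
except ValueError:
    rejected = True

print(sort_clause, rejected)
```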

The embedding provider

The embedding provider encountered an issue while processing the embedding generation request. Astra DB passes these errors to you through the Astra Portal or Data API with a qualifying statement such as "The embedding provider returned a HTTP client error."

Possible embedding provider errors include rate limiting, billing or account funding issues, and chunk or token size limits. For more information about these errors, see the embedding provider’s documentation, including the documentation for your chosen model.

Carefully read all error messages to determine the source and possible cause for the issue.
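If you log vectorize errors, a first-pass triage by message text can be sketched as follows. This is only a heuristic: it assumes that provider errors contain the phrase "The embedding provider returned", which is based on the single example statement above, and real messages can be worded differently. The `error_source` function is a hypothetical helper.

```python
# Assumed marker, taken from the example qualifying statement.
PROVIDER_MARKER = "the embedding provider returned"


def error_source(message):
    """Guess whether an error message came from the provider or from Astra DB."""
    if PROVIDER_MARKER in message.lower():
        return "embedding provider"
    return "Astra DB"


# The second message is an invented example for illustration.
examples = [
    "The embedding provider returned a HTTP client error",
    "Example message without the provider marker",
]
sources = [error_source(message) for message in examples]
print(sources)
```

Treat the result as a starting point for investigation, and still read the full message.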


© 2025 DataStax
