Generate and store embeddings in Astra DB Serverless databases
There are two ways to load embeddings into Serverless (vector) databases:
- Bring your own embeddings: Generate your own embeddings on the client side, and then import them when you insert data or perform a vector search.
- Auto-generate embeddings with vectorize: Configure a supported embedding provider integration, and then use vectorize to automatically generate embeddings when you insert data or perform a vector search. Enabling a vectorize integration also enables the Unstructured data loader integration.
For supported embedding providers, see Astra-hosted integrations for vectorize and External integrations for vectorize. To use a provider, model, or model setting that isn’t available through Astra vectorize, you must bring your own embeddings.
Vectorize integrations generate embeddings for text data only. If you need embeddings for images or other media, you must bring your own embeddings.
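The difference between the two approaches shows up in the shape of the documents and sort clauses you send. The following is a minimal sketch, not a definitive implementation. It assumes the Data API's reserved `$vector` and `$vectorize` fields. The `toy_embedding` function is a hypothetical stand-in for a real client-side embedding model, and the helper names are illustrative only. The sketch only builds payloads and doesn't connect to a database.

```python
import hashlib


def toy_embedding(text, dimension=8):
    """Hypothetical stand-in for a real embedding model.

    Hashes the text and scales the first `dimension` bytes into [0, 1].
    A real application would call an embedding model here instead.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dimension]]


def byo_document(text, embed=toy_embedding):
    """Bring your own embeddings: the client computes the vector."""
    return {"text": text, "$vector": embed(text)}


def vectorize_document(text):
    """Vectorize: send the text and let the integration embed it."""
    return {"$vectorize": text}


def byo_sort(query, embed=toy_embedding):
    """Vector search sort clause using a client-side query vector."""
    return {"$vector": embed(query)}


def vectorize_sort(query):
    """Vector search sort clause where the query text is embedded server-side."""
    return {"$vectorize": query}


doc_a = byo_document("Astra DB stores vectors.")
doc_b = vectorize_document("Astra DB stores vectors.")
print(sorted(doc_a))  # field names present in the bring-your-own document
print(doc_b)
```

With a client library, payloads like these are what you pass to the insert and find operations. In the bring-your-own case, the vector length must match the dimension configured for the collection or table.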
What are embedding providers?
Embedding providers are services that help you generate embeddings for your data to perform vector search queries. The provider handles infrastructure, model maintenance, and other tasks necessary to generate embeddings from embedding models. Providers may use one or more models to generate embeddings. When choosing an embedding provider, consider factors like the available embedding models, vector dimensions, supported data types, quality, accuracy, and scalability.
Astra-hosted integrations for vectorize
Vectorize generates embeddings through supported embedding provider integrations.
DataStax-managed embedding provider integrations are hosted within Astra. These integrations don’t require your own embedding provider account or credentials because they are managed by Astra. However, there are restrictions on the available regions and configuration options.
Databases in supported regions can configure collections and tables to automatically use these integrations:
| Embedding provider | Documentation |
|---|---|
| NVIDIA | |
External integrations for vectorize
An external embedding provider integration uses your embedding provider account to generate embeddings. Your provider might charge you for this use according to your agreement with that provider.
To use an external embedding provider with Astra vectorize, you must first enable the embedding provider integration in your Astra organization, which attaches your embedding provider account to the organization. Then, you can attach the embedding provider integration to a collection or table.
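As a rough illustration of attaching an integration to a collection, the following sketch builds a Data API `createCollection` request body. The payload shape (`options.vector.service` with `provider`, `modelName`, and `authentication.providerKey`) is an assumption based on the Data API, and the provider, model, and key names are example values only. Check your provider's documentation for the exact settings it requires. The helper `create_collection_payload` is hypothetical and doesn't send any request.

```python
import json


def create_collection_payload(name, provider, model_name,
                              key_name=None, metric="cosine"):
    """Build a createCollection body that attaches a vectorize integration.

    key_name is the name of the credential stored in the Astra organization
    (assumed field: authentication.providerKey). Leave it as None for an
    Astra-hosted integration, which doesn't need your own credentials.
    """
    service = {"provider": provider, "modelName": model_name}
    if key_name is not None:
        service["authentication"] = {"providerKey": key_name}
    return {
        "createCollection": {
            "name": name,
            "options": {"vector": {"metric": metric, "service": service}},
        }
    }


# External integration: references a credential by name (example values).
external = create_collection_payload(
    "articles", "openai", "text-embedding-3-small", key_name="my-openai-key"
)

# Astra-hosted integration: no credential reference (example values).
hosted = create_collection_payload("articles", "nvidia", "NV-Embed-QA")

print(json.dumps(external, indent=2))
```

The only structural difference between the two payloads is the `authentication` block, which reflects that external integrations use your own provider account while Astra-hosted integrations don't.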
All providers follow the same general integration process. However, each provider has specific configuration options, such as models, dimensions, credentials, and other parameters. For complete instructions, see the documentation for your embedding provider:
| Embedding provider | Documentation |
|---|---|
| Azure OpenAI | |
| Hugging Face - Dedicated | |
| Hugging Face - Serverless | |
| Jina AI | |
| Mistral AI | |
| OpenAI | |
| Upstage | |
| Voyage AI | |