Create an Astra DB component in Langflow
The Astra DB component creates a vector store that uses Astra DB as the underlying database to store and retrieve documents.
The vector store can be used for similarity search, document retrieval, embeddings storage, and other vector-based operations.
The Vector Store RAG guide demonstrates how to use the Astra DB component in a vector RAG template flow, but you can use the Astra DB component as the vector store in any Langflow application.
This guide shows you how to create and configure an Astra DB component in Langflow to store and retrieve embeddings.
Prerequisites
-
An active Astra DB database
Open Langflow and start a new project
-
In the Astra Portal header, switch your active app from Astra DB Serverless to Langflow.
-
In Langflow, click New Flow, and then select Blank Flow. A blank workspace opens where you can build your flow.
Create an Astra DB component in Langflow
The Astra DB component initializes an Astra DB vector store with vector indices to efficiently store and retrieve documents.
To add the Astra DB component to your Langflow Workspace:
-
In Langflow’s Components menu, select Vector Stores, and then click Astra DB.
-
Click and drag the component to the Workspace.
-
Configure the Astra DB component.
-
In the Astra DB Application Token field, add your Astra DB application token. The component connects to your database and populates the menus with existing databases and collections.
-
Select your Database. If you don’t have a database, select New database.
Astra organizations on the Free plan can create up to five databases. If you reach the limit, the New database option becomes inactive.
To re-enable database creation, either terminate an existing database or upgrade your plan.
Complete the Name, Cloud provider, and Region fields, and then click Create. Database creation takes a few minutes.
-
Select your Collection. Collections are created in your Astra DB Serverless deployment for storing vector data.
If you select a collection embedded with NVIDIA through Astra’s vectorize service, the Embedding Model port is removed because you have already generated embeddings for this collection with the NVIDIA NV-Embed-QA model. The component fetches the data from the collection and uses the same embeddings for queries.
-
If you don’t have a collection, create a new one within the component.
-
Select New collection.
-
Complete the Name, Embedding generation method, Embedding model, and Dimensions fields, and then click Create. Your choice for the Embedding generation method and Embedding model depends on whether you want to use embeddings generated by a provider through Astra’s vectorize service, or generated by a component in Langflow.
-
To use embeddings generated by a provider through Astra’s vectorize service, select the provider from the Embedding generation method dropdown menu, and then select the model from the Embedding model dropdown menu.
-
To use embeddings generated by a component in Langflow, select Bring your own for both the Embedding generation method and Embedding model fields. In this guide, the embeddings are generated by the OpenAI Embeddings component connected to the Astra DB component.
-
The Dimensions value must match the dimensions of the vectors in your collection, which in turn must match the output dimensions of your embedding model. This field is not required if you use embeddings generated through Astra’s vectorize service. You can find this value on the collection in your Astra DB deployment. For more information, see the DataStax Astra DB Serverless documentation.
-
In the Search Input field, you can either enter a query to search for documents in the Astra DB collection, or connect an Input component to receive user queries directly from the Playground.
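To make the search step more concrete, the following is a minimal sketch of what a similarity search over stored embeddings does conceptually. It is not the Astra DB or Langflow implementation: it assumes cosine similarity as the metric, uses tiny hand-written three-dimensional vectors in place of real embeddings, and the names cosine_similarity, top_k, and documents are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=2):
    """Score every (text, vector) pair against the query and keep the k best."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical stored documents with made-up 3-dimensional "embeddings".
documents = [
    ("Astra DB stores vectors", [0.9, 0.1, 0.0]),
    ("Langflow builds flows", [0.1, 0.9, 0.0]),
    ("Bananas are yellow", [0.0, 0.1, 0.9]),
]

# Hypothetical embedding of the user's search query.
query = [0.8, 0.2, 0.0]

results = top_k(query, documents, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

In a real flow, the query vector comes from the same embedding model that produced the stored vectors, which is why the component reuses the collection's embedding configuration for queries.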
Load data into Astra DB
The Astra DB component can load data into the vector store using embeddings from an external embeddings model, such as OpenAI, or from Astra’s vectorize service. To create a data flow that loads data from a local file into Astra DB as embeddings:
-
Click Data, select the File component, and then drag it to the canvas. The File component loads a file from your local machine.
-
Click Processing, select the Split Text component, and then drag it to the canvas. The Split Text component splits the loaded text into smaller chunks.
-
Click Embeddings, select the OpenAI Embeddings component, and then drag it to the canvas. The OpenAI Embeddings component generates embeddings for the user’s input, which are compared to the vector data in the database.
-
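Connect the new components: the File component to the Split Text component, the Split Text component to the Astra DB component’s data input, and the OpenAI Embeddings component to the Astra DB component’s Embedding Model port.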
-
Configure the embeddings component. The Astra DB component can store embeddings generated by an external model, such as OpenAI, Hugging Face, or any other embedding model that Langflow supports. This example uses the OpenAI Embeddings component to generate embeddings for documents and store them in Astra DB.
-
In the OpenAI Embeddings component, enter your OpenAI API key, or select it from the dropdown list if you’ve created a global variable.
-
In the OpenAI Embeddings component, select the embeddings Model. The embeddings model’s vector dimensions must match the dimensions of the Astra DB collection. If they do not match, you will receive an error when you try to store the embeddings.
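The dimension requirement above can be checked before you run the flow. The following is a minimal sketch, not part of Langflow or Astra DB: the function name check_dimensions and the lookup table are illustrative, and the table lists the default output dimensions of three OpenAI embedding models. It assumes you have already looked up the collection's dimensions in your Astra DB deployment.

```python
# Default output dimensions of some OpenAI embedding models.
OPENAI_EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_dimensions(model_name, collection_dimensions):
    """Raise ValueError if the model's vectors would not fit the collection."""
    model_dimensions = OPENAI_EMBEDDING_DIMENSIONS.get(model_name)
    if model_dimensions is None:
        raise KeyError(f"Unknown model in this sketch: {model_name}")
    if model_dimensions != collection_dimensions:
        raise ValueError(
            f"{model_name} produces {model_dimensions}-dimensional vectors, "
            f"but the collection expects {collection_dimensions}."
        )
    return model_dimensions


# A 1536-dimension collection accepts text-embedding-3-small vectors.
print(check_dimensions("text-embedding-3-small", 1536))
```

A mismatch, such as pairing text-embedding-3-large with a 1536-dimension collection, raises a ValueError in this sketch, which mirrors the error you would see when the flow tries to store the embeddings.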
Run the flow
-
In the File component, select a file to load.
The File loader has a file size limit of 100 MB. For a list of supported file formats, see the File component documentation.
-
To load the file, in the Astra DB component, click Play. The file passes through the Split Text component, which splits the text into smaller chunks. These chunks become units of meaning when they are embedded as vectors into the database.
-
In the Astra Portal header, switch your active app from Langflow to Astra DB Serverless.
-
Navigate to your database and click Data Explorer.
-
Select your collection to see the stored embeddings.
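Each document you see in the collection corresponds to one chunk produced by the Split Text step. As a rough illustration of that step, the following sketch splits text into fixed-size character chunks with overlap. It is an assumption-laden simplification, not the Split Text component's actual logic: the function name split_text and its chunk_size and chunk_overlap parameters are hypothetical, and the sketch ignores separators.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into character chunks that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Small values make the overlap easy to see.
for chunk in split_text("abcdefghij", chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Overlap keeps some context shared between neighboring chunks, so a sentence that straddles a chunk boundary is still partly represented in both embeddings.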