Create an Astra DB component in Langflow

The Astra DB component creates a vector store that uses Astra DB as the underlying database to store and retrieve documents.

The vector store can be used for similarity search, document retrieval, embeddings storage, and other vector-based operations.
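To make similarity search concrete, here is a minimal in-memory sketch in plain Python. It is not how Astra DB implements search. The toy three-dimensional vectors, the document texts, and the function names `cosine_similarity` and `similarity_search` are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def similarity_search(query_vector, documents, k=2):
    """Return the texts of the k documents most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Hypothetical toy "embeddings"; real embeddings have far more dimensions.
documents = [
    ("cats purr", [0.9, 0.1, 0.0]),
    ("dogs bark", [0.8, 0.3, 0.1]),
    ("stock prices fell", [0.0, 0.2, 0.9]),
]

top = similarity_search([1.0, 0.0, 0.0], documents, k=2)
print(top)
```

A vector database performs the same kind of ranking, but over indexed vectors so that it does not have to score every stored document.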

The Vector Store RAG guide demonstrates how to use the Astra DB component in a vector RAG template flow, but you can use the Astra components as the vector store in any Langflow application.

This guide shows you how to create and configure an Astra DB component in Langflow to store and retrieve embeddings.

Open Langflow and start a new project

  1. In the Astra Portal header, switch your active app from Astra DB Serverless to Langflow.

  2. In Langflow, click New Flow, and then select Blank Flow. A blank workspace opens where you can build your flow.

Create an Astra DB component in Langflow

The Astra DB component initializes an Astra DB vector store with vector indices to efficiently store and retrieve documents.

To add the Astra DB component to your Langflow Workspace:

  1. In Langflow’s Components menu, select Vector Stores, and then click Astra DB.

  2. Click and drag the component to the Workspace.

    [Figure: Astra DB component in Langflow]

    In the Astra DB component, enter the following information:

  3. In the Astra DB Application Token field, enter your Astra DB authentication token, or select it from the dropdown list if you’ve created a global variable.

  4. In the Database field, select your existing database from the dropdown menu, or select Create new database. For more information, see Create a database.

    Astra organizations on the Free plan can create up to five databases. If you reach the limit, the Create new database option becomes inactive.

    To re-enable database creation, either terminate an existing database or upgrade your plan.

  5. In the Collection field, select your existing collection from the dropdown menu, or select Create new collection.

    1. To create a new collection, enter a name for the collection.

    2. Select the new collection’s vector dimensions. The collection’s vector dimensions must match the dimensions of the embeddings generated by the OpenAI Embeddings component. If you select an embeddings component other than OpenAI, the vector dimensions must match the dimensions of the embeddings generated by that provider. For more on embeddings models, see Embedding models.

    3. Select the new collection’s similarity metric. The default Similarity Metric value is cosine, but other options are supported. For more on similarity metrics, see similarity metrics.

  6. In the Search Input field, you can either enter a query to search for documents in the Astra DB collection, or connect an Input component to receive user queries directly from the Playground.
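The collection settings chosen in step 5 (name, vector dimensions, similarity metric) can also be written down as data. The sketch below builds a dictionary shaped like a Data API `createCollection` command and validates the inputs. The command shape, the metric names, and the collection name `langflow_docs` are assumptions for illustration, so check the Data API reference before relying on them. Nothing is sent over the network.

```python
import json

# Assumed metric names; verify against the options shown in the Collection dialog.
SUPPORTED_METRICS = {"cosine", "dot_product", "euclidean"}

def build_create_collection_command(name, dimension, metric="cosine"):
    """Build (but do not send) a createCollection-style command as a dict."""
    if not name:
        raise ValueError("collection name must not be empty")
    if not isinstance(dimension, int) or dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported similarity metric: {metric!r}")
    return {
        "createCollection": {
            "name": name,
            "options": {"vector": {"dimension": dimension, "metric": metric}},
        }
    }

# Hypothetical collection sized for a 1536-dimension embeddings model.
command = build_create_collection_command("langflow_docs", 1536)
print(json.dumps(command, indent=2))
```

Validating the dimension and metric before creating the collection catches mismatches early, because these settings are fixed once the collection exists.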

Load data into Astra DB

The Astra DB component can load data into the vector store using embeddings from an external embeddings model, such as OpenAI, or from Astra vectorize. For the vectorize option, see Create a collection with Astra vectorize. To create a flow that loads data from a local file into Astra DB as embeddings:

  1. Click Data, select the File component, and then drag it to the canvas. The File component loads a file from your local machine.

  2. Click Processing, select the Split Text component, and then drag it to the canvas. The Split Text component splits the loaded text into smaller chunks.

  3. Click Embeddings, select the OpenAI Embeddings component, and then drag it to the canvas. The OpenAI Embeddings component generates embeddings for the text chunks so that they can be stored as vectors in the database.

  4. Connect the new components so your flow looks like this:

    [Figure: Astra DB load data flow]

  5. Configure the embeddings component. The Astra DB component can store embeddings generated by an external model, such as OpenAI, Hugging Face, or any other embedding model that Langflow supports. This example uses the OpenAI Embeddings component to generate embeddings for documents and store them in Astra DB.

  6. In the OpenAI Embeddings component, enter your OpenAI API key, or select it from the dropdown list if you’ve created a global variable.

  7. In the OpenAI Embeddings component, select the embeddings Model. The embeddings model’s vector dimensions must match the dimensions of the Astra DB collection. If they do not match, you will receive an error when you try to store the embeddings.
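The dimension requirement in step 7 can be checked before you run the flow. The sketch below assumes the default output dimensions of three OpenAI embeddings models. The helper name `dimensions_match` is hypothetical, and the table does not account for models configured with a custom output size.

```python
# Default output dimensions of OpenAI embeddings models
# (assumes no custom `dimensions` setting is applied).
OPENAI_EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def dimensions_match(model, collection_dimension):
    """Return True if the model's default dimension equals the collection's."""
    if model not in OPENAI_EMBEDDING_DIMENSIONS:
        raise KeyError(f"unknown embeddings model: {model!r}")
    return OPENAI_EMBEDDING_DIMENSIONS[model] == collection_dimension

# A collection created with 1536 dimensions accepts the small model only.
small_ok = dimensions_match("text-embedding-3-small", 1536)
large_ok = dimensions_match("text-embedding-3-large", 1536)
print(small_ok, large_ok)
```

If the check fails, either pick a model whose dimensions match the collection or create a new collection with the right dimensions.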

Create a collection with Astra vectorize

You can’t change a collection’s embedding provider or embedding generation method after you create it. To use a different embedding provider, you must create a new collection with a different vectorize embedding provider integration.

The Astra DB component includes Astra vectorize, which generates server-side embeddings for documents and stores the vectors in Astra DB.

Using Astra vectorize requires you to initialize the collection in Astra DB Serverless with the Astra vectorize option enabled and the embeddings provider selected.

To create a collection with Astra vectorize and the NVIDIA provider, follow the steps in Integrate NVIDIA as an embedding provider.

To use the collection created with Astra vectorize in the Astra DB component:

  1. In the Astra DB Application Token field, enter your Astra DB authentication token, or select it from the dropdown list if you’ve created a global variable.

  2. In the Database field, select your existing database from the dropdown menu, or select Create new database.

  3. In the Collection field, select the collection you created with Astra vectorize.

    If you lose track of your collection’s embedding provider, model, or vector dimensions, you can find this information in the Astra DB console under Data Explorer.

  4. In the Embedding Model or Astra Vectorize field, select Astra Vectorize.

  5. In the Embedding Provider field, select the provider you used to create the collection, which in this case is NVIDIA.

  6. In the Model field, select the model you used to create the collection, which in this case is NV-Embed-QA.

    The Astra DB component is now configured to load data into the collection created with Astra vectorize.

    [Figure: Astra DB component with vectorize]
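Because a collection's embedding provider and model are fixed at creation, the values selected in steps 5 and 6 must match the collection's stored settings. The sketch below illustrates that comparison on a plain dictionary. The layout of `collection_options` (a `service` entry with `provider` and `modelName` keys) and the provider string `nvidia` are assumptions for illustration, not a confirmed API response.

```python
def vectorize_settings_match(collection_options, provider, model):
    """Check whether the chosen provider and model match the collection's settings."""
    service = collection_options.get("vector", {}).get("service")
    if service is None:
        # No service entry: the collection was not created with vectorize.
        return False
    return service.get("provider") == provider and service.get("modelName") == model

# Hypothetical settings for a collection created with the NVIDIA provider.
collection_options = {
    "vector": {
        "metric": "cosine",
        "service": {"provider": "nvidia", "modelName": "NV-Embed-QA"},
    }
}

matches = vectorize_settings_match(collection_options, "nvidia", "NV-Embed-QA")
mismatch = vectorize_settings_match(collection_options, "openai", "text-embedding-3-small")
print(matches, mismatch)
```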

Run the flow

  1. In the File component, select a file to load.

    The File loader has a file size limit of 100 MB. For a list of supported file formats, see the File component documentation.

  2. To load the file, in the Astra DB component, click Play. The file passes through the Split Text component, which splits the text into smaller chunks. These chunks become units of meaning when they are embedded as vectors into the database.

  3. In the Astra Portal header, switch your active app from Langflow to Astra DB Serverless.

  4. Navigate to your database and click Data Explorer.

  5. Select your collection to see the stored embeddings.
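The chunking that happens in step 2 can be pictured with a simple fixed-size splitter. This is a sketch of the general idea, not the Split Text component's actual algorithm. The function name `split_text` and the parameters `chunk_size` and `chunk_overlap` are assumptions for illustration.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character chunks that overlap their neighbors."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

# Tiny values so the overlap is easy to see: each chunk repeats
# the last character of the previous one.
chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk is embedded separately, so the number of documents you see in Data Explorer reflects the number of chunks, not the number of files.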


© 2024 DataStax | Privacy policy | Terms of use
