Quickstart for DataStax Langflow

Learn how to build a Retrieval Augmented Generation (RAG) application using Astra DB and Langflow.

In this quickstart, you create a document ingestion flow that uses Astra DB as a vector store, and a RAG application flow that uses the documents stored in Astra DB to generate responses to your queries.

This quickstart highlights the use of Astra DB and Astra DB Search in a vector RAG project, but you can adapt these components to serve as the vector database for any Langflow project. For more information, see Astra components in Langflow.

Create an Astra DB database

  1. Create an Astra account or sign in to an existing Astra account.

  2. In the Astra Portal, select Databases in the main navigation.

  3. Select Create Database.

  4. In the Create Database dialog, select the Serverless (Vector) deployment type.

  5. In Configuration, enter a meaningful Database name.

    You can’t change a database name after creation, so make sure the name is human-readable and meaningful. Database names must start and end with an alphanumeric character, and can contain the following special characters: & + - _ ( ) < > . , @.

  6. Select your preferred Provider and Region.

    You can select from a limited number of regions if you’re on the Free plan. Regions with a lock icon require that you upgrade to a Pay As You Go plan.

  7. Click Create Database.

    You are redirected to your new database’s Overview screen. Your database starts in Pending status before transitioning to Initializing. You’ll receive a notification once your database is initialized.

Create an OpenAI API key

Create an OpenAI API key and save it for later use.

Enable Langflow and start a new project

  1. In the Astra Portal, select Langflow > Enable Langflow.

  2. Click New Project, and then choose the Vector Store RAG project.

This opens a starter project with the necessary components to run a RAG application using Astra DB.

This project consists of two flows: the document ingestion flow and the RAG application flow.

Run the document ingestion flow

The document ingestion flow loads documents into the Astra DB database.

The ingestion flow consists of the following components:

  • A Files component that uploads a text file to Langflow.

  • A Recursive Character Text Splitter component that splits the text into smaller chunks.

  • An OpenAIEmbeddings component that generates embeddings for the text chunks.

  • An Astra DB component that stores the text chunks in the Astra DB database. For more information, see Astra components in Langflow.
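To make the splitting step concrete, here is a minimal, hypothetical sketch of what recursive character splitting does. It is not the actual Langflow component (which also supports chunk overlap and configurable separators); it only illustrates the idea of trying progressively finer separators until every chunk fits under the size limit.

```python
# Simplified, hypothetical sketch of recursive character text splitting.
# Not the real Langflow component -- for illustration only.

def split_text(text: str, chunk_size: int = 100,
               separators=("\n\n", "\n", " ")) -> list[str]:
    """Recursively split `text` into chunks of at most `chunk_size` characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    for sep in separators:
        if sep in text:
            parts = text.split(sep)
            chunks, current = [], ""
            for part in parts:
                candidate = current + sep + part if current else part
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    if len(part) > chunk_size:
                        # A single part can still be too long; recurse on it
                        # with the remaining (finer-grained) separators.
                        chunks.extend(split_text(part, chunk_size, separators))
                        current = ""
                    else:
                        current = part
            if current:
                chunks.append(current)
            return chunks
    # No separator found: fall back to hard fixed-size cuts.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Smaller chunks keep each embedding focused on one idea, which tends to improve retrieval relevance later in the RAG flow.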

To configure and run the document ingestion flow:

  1. Add your credentials to the OpenAI components. The fastest way to complete these fields is with Langflow’s Global Variables.

    1. In the OpenAI API Key field, click the language icon > Add New Variable. Alternatively, click your username in the top right corner, select Settings, and then click Global Variables > Add New.

    2. Name your variable. Paste your OpenAI API key (sk-…​) in the Value field.

    3. In the Apply To Fields field, select the OpenAI API Key field to apply this variable to all OpenAI Embeddings components.

  2. Add your credentials to the Astra DB and Astra DB Search components using the same Global Variables process.

    1. In the Token field, click the language icon > Add New Variable. Alternatively, click your username in the top right corner and select Settings.

    2. Name your variable. Paste your Astra application token (AstraCS:…​) in the Value field.

    3. In the Apply To Fields field, select the Astra DB Application Token field to apply this variable to all Astra components.

  3. Select the Database for the Astra DB and Astra DB Search components.

  4. Select the Collection for the Astra DB and Astra DB Search components. If you don’t have a collection, click Create Collection to create one.

  5. Select more_horiz Advanced and paste your API Endpoint value into the API Endpoint field.

  6. In the File component, upload a text file from your local machine with the data you want to ingest into the Astra DB database.

  7. Click the play_arrow Play button in the Astra DB component to start the ingestion flow. Your file passes through the Recursive Character Text Splitter component, which splits the text into smaller chunks. The OpenAI Embeddings component then generates an embedding for each chunk, and the chunks and their embeddings are stored in the Astra DB database.

Run the RAG application flow

The RAG application flow generates responses to your queries from the embedded documents. This flow covers every step from receiving the user’s input to generating a response and displaying it in the Playground.

The RAG application flow consists of the following:

  • A Chat Input component that receives the user’s input from the Playground.

  • An OpenAI Embeddings component that generates embeddings from the user input.

  • An Astra DB Search component that retrieves the most relevant records from the Astra DB database.

  • A Text Output component that concatenates the retrieved records into a single text block and displays it in the Playground. This component is named Extracted Chunks in the example, and that is how it appears in the Playground, but it is a Text Output component in the flow.

  • A Prompt component that takes in the user input and the retrieved records as text and builds a prompt for the OpenAI model.

  • An OpenAI component that generates a response to the prompt.

  • A Chat Output component that displays the response in the Playground.
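Conceptually, the Astra DB Search step ranks stored records by how similar their embeddings are to the embedding of your query. The following hypothetical, in-memory sketch shows that idea with cosine similarity; Astra DB performs the equivalent ranking server-side and at scale, and the sample records and three-dimensional vectors here are purely illustrative.

```python
# Hypothetical in-memory illustration of vector similarity search.
# Astra DB does the equivalent server-side; vectors here are toy examples.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vector: list[float], records: list[dict], top_k: int = 2) -> list[dict]:
    """Return the top_k records whose embeddings are most similar to the query."""
    ranked = sorted(records,
                    key=lambda r: cosine_similarity(query_vector, r["vector"]),
                    reverse=True)
    return ranked[:top_k]

# Toy records standing in for embedded document chunks.
records = [
    {"text": "Astra DB is a vector database.", "vector": [1.0, 0.0, 0.0]},
    {"text": "Langflow builds LLM flows.",     "vector": [0.0, 1.0, 0.0]},
    {"text": "RAG retrieves relevant chunks.", "vector": [0.9, 0.1, 0.0]},
]
results = search([1.0, 0.0, 0.0], records, top_k=2)
```

Because the query and the documents are embedded by the same model, semantically related text ends up with nearby vectors, so the top-ranked records are the most relevant context for the prompt.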

The RAG application flow components should already be set up with the necessary credentials because you used Langflow’s Global Variables feature.

To run the RAG application flow:

  1. Click the play_arrow Play button in the Chat Output component to start the RAG application flow.

  2. Once the flow has run, click the Playground icon to start a chat session. Because this flow has a Chat Input and a Text Output component, the Playground displays a chat input field and an Extracted Chunks output section.

  3. Type a query into the chat session, and see how much your bot knows about your uploaded data. With each query, the Extracted Chunks section is updated to display the retrieved records.
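Under the hood, the Prompt component fills a template with the retrieved chunks and the user’s question before handing the result to the OpenAI component. The sketch below is a simplified, hypothetical equivalent; the actual component uses a configurable template, and the template text and variable names here are illustrative.

```python
# Hypothetical sketch of prompt assembly. The real Prompt component uses a
# configurable template; this template and its variable names are examples.
TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(chunks: list[str], question: str) -> str:
    """Join retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    ["Astra DB is a serverless vector database."],
    "What kind of database is Astra DB?",
)
```

Grounding the model in retrieved context this way is what lets the bot answer questions about your uploaded data rather than relying only on its training data.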

Troubleshooting

If something goes wrong:

  1. To view Logs, click Options > Logs.

  2. Check the Inputs and Outputs tabs in the Playground and ensure that the data is being passed correctly between components.

Next steps

For more information, see the Langflow OSS documentation.


© 2024 DataStax | Privacy policy | Terms of use
