Forward-Looking Active REtrieval (FLARE)

FLARE is an advanced retrieval technique that interleaves retrieval with generation. It improves response accuracy by iteratively predicting the upcoming sentence and, whenever that prediction contains tokens the model is uncertain about, using the predicted sentence as a query to retrieve relevant documents before continuing.

For more, see the FLARE GitHub repository.

The basic workflow is:

  1. Send a query.

  2. The model generates tokens while iteratively predicting the upcoming sentence.

  3. If the model sees a token with a low confidence level, it uses the predicted sentence as a query to retrieve new, relevant documents.

  4. The upcoming sentence is regenerated using the retrieved documents.

  5. Repeat steps 2-4 until the response is complete.
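Conceptually, the loop looks like the sketch below. This is an illustrative outline only; generate_lookahead() and retrieve() are hypothetical placeholder functions, not part of LangChain or any other library.

    # Conceptual sketch of the FLARE loop (not library code).
    # generate_lookahead() and retrieve() are hypothetical placeholders.
    def flare_answer(query: str, min_prob: float = 0.3, max_steps: int = 10) -> str:
        answer = ""
        for _ in range(max_steps):
            # Tentatively generate the next sentence plus per-token probabilities.
            sentence, token_probs = generate_lookahead(query, answer)
            if not sentence:
                break  # generation is complete
            # If any token falls below the confidence threshold, retrieve supporting
            # documents using the predicted sentence as the query, then regenerate it.
            if min(token_probs) < min_prob:
                docs = retrieve(sentence)
                sentence, _ = generate_lookahead(query, answer, context=docs)
            answer += sentence
        return answer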

In this tutorial, you will use an Astra DB Serverless vector store, an OpenAI embedding model, an OpenAI LLM, and LangChain to orchestrate FLARE in a RAG pipeline.

Prerequisites

You will need a vector-enabled Astra DB Serverless database and an OpenAI account.

See the Notebook Prerequisites page for more details.

  1. Create a vector-enabled Astra DB Serverless database.

  2. Create an OpenAI account.

  3. Within your database, create an Astra DB keyspace.

  4. Within your database, create an Astra DB Access Token with Database Administrator permissions.

  5. Get your Astra DB Serverless API Endpoint: https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com

  6. Initialize the environment variables in a .env file.

    ASTRA_DB_APPLICATION_TOKEN=AstraCS:...
    ASTRA_DB_API_ENDPOINT=https://9d9b9999-999e-9999-9f9a-9b99999dg999-us-east-2.apps.astra.datastax.com
    ASTRA_DB_COLLECTION=test
    OPENAI_API_KEY=sk-f99...
  7. Enter your settings for Astra DB Serverless and OpenAI:

    import os
    from dotenv import load_dotenv

    load_dotenv()  # read the values from your .env file
    astra_token = os.getenv("ASTRA_DB_APPLICATION_TOKEN")
    astra_endpoint = os.getenv("ASTRA_DB_API_ENDPOINT")
    collection = os.getenv("ASTRA_DB_COLLECTION")
    openai_api_key = os.getenv("OPENAI_API_KEY")
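Optionally, you can fail fast if any of these settings is missing. This check is not part of the tutorial; it is just a small guard you may find useful:

    # Optional guard: raise immediately if a required setting is missing.
    required = {
        "ASTRA_DB_APPLICATION_TOKEN": astra_token,
        "ASTRA_DB_API_ENDPOINT": astra_endpoint,
        "ASTRA_DB_COLLECTION": collection,
        "OPENAI_API_KEY": openai_api_key,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")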

Setup

ragstack-ai includes all the packages you need to build a FLARE pipeline.

  1. Install the ragstack-ai package.

    pip install ragstack-ai
  2. Import dependencies.

    import os
    from dotenv import load_dotenv
    from langchain_astradb import AstraDBVectorStore
    from langchain_openai import OpenAIEmbeddings
    from langchain_community.document_loaders import TextLoader
    from langchain.globals import set_verbose
    from langchain_openai import ChatOpenAI
    from langchain_openai import OpenAI
    from langchain.chains import FlareChain
    from langchain.chains.flare.base import QuestionGeneratorChain

Configure embedding model and load vector store

  1. Configure your embedding model and vector store:

    embedding = OpenAIEmbeddings()
    vstore = AstraDBVectorStore(
            collection_name=collection,
            embedding=embedding,
            token=astra_token,
            api_endpoint=astra_endpoint
        )
    print("Astra vector store configured")
  2. Retrieve the text of a short story that will be indexed in the vector store:

    curl https://raw.githubusercontent.com/CassioML/cassio-website/main/docs/frameworks/langchain/texts/amontillado.txt --output amontillado.txt
    input_file = "amontillado.txt"  # path to the downloaded story
  3. Create embeddings by inserting your documents into the vector store. The final print statement verifies that the documents were embedded; an optional retrieval check follows this list.

    loader = TextLoader(input_file)
    documents = loader.load_and_split()
    
    inserted_ids = vstore.add_documents(documents)
    print(f"\nInserted {len(inserted_ids)} documents.")
    
    # Inspect the raw documents stored in the collection.
    print(vstore.astra_db.collection(collection).find())
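To confirm the documents are retrievable before building the FLARE chain, you can run a quick similarity search against the vector store. The query string below is only an example:

    # Optional check: retrieve the chunk most similar to a sample question.
    results = vstore.similarity_search("Who is Fortunato?", k=1)
    for doc in results:
        print(doc.page_content[:200])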

Create a FLARE chain

Using LangChain’s FLARE chain with verbose mode on, we can see exactly what is happening under the hood.

  1. Set verbose mode and configure FLARE chain:

    from langchain.globals import set_verbose # already imported, just for clarity
    set_verbose(True)
    
    retriever = vstore.as_retriever()
    
    flare = FlareChain.from_llm(
        llm=ChatOpenAI(temperature=0),
        retriever=retriever,
        max_generation_len=256,  # maximum tokens generated per look-ahead step
        min_prob=0.3,  # tokens below this probability count as uncertain and trigger retrieval
    )
  2. Run the FLARE chain with a query:

    query = "Who is Luchesi in relation to Antonio?"
    flare.run(query)
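On recent LangChain releases, run() is deprecated in favor of invoke(); the call below assumes such a version is installed. You can also adjust min_prob to control how readily retrieval is triggered.

    # invoke() replaces the deprecated run() on newer LangChain versions.
    # A higher min_prob marks more tokens as uncertain, triggering retrieval more often.
    result = flare.invoke(query)
    print(result)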

You now have a fully functioning RAG pipeline using the FLARE technique! FLARE is one of many ways to improve RAG.

See our other examples for advanced RAG techniques, as well as evaluation examples that compare results using multiple RAG techniques.
