Forward-Looking Active REtrieval augmented generation (FLARE)


FLARE is an advanced retrieval technique that interleaves retrieval and generation in LLMs. It improves response accuracy by iteratively predicting the upcoming sentence to anticipate future content; when the model encounters a token it is uncertain about, it retrieves relevant documents and regenerates that sentence.

For more, see the FLARE GitHub repository.

The basic workflow is:

  1. Send a query.

  2. The model generates tokens while iteratively predicting the upcoming sentence.

  3. If the model sees a token with a low confidence level, it uses the predicted sentence as a query to retrieve new, relevant documents.

  4. The upcoming sentence is regenerated using the retrieved documents.

  5. Repeat steps 2-4 until the response is complete.
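The steps above can be sketched as a small, runnable loop. The stub `generate` and `retrieve` functions below are made up for illustration; they stand in for the LLM and the vector store and are not the real FlareChain internals:

```python
# Toy sketch of the FLARE loop: generate a lookahead sentence, and if
# its confidence falls below min_prob, retrieve with that sentence and
# regenerate it before committing it to the answer.

def flare_answer(query, generate, retrieve, min_prob=0.3, max_steps=10):
    answer = []
    docs = retrieve(query)  # step 1: retrieve for the initial query
    for _ in range(max_steps):
        # step 2: predict the upcoming sentence and its confidence
        sentence, confidence = generate(query, answer, docs)
        if sentence is None:  # generation finished
            break
        if confidence < min_prob:
            # step 3: use the low-confidence sentence as a new query
            docs = retrieve(sentence)
            # step 4: regenerate the sentence with the retrieved docs
            sentence, confidence = generate(query, answer, docs)
        answer.append(sentence)  # step 5: repeat until complete
    return " ".join(answer)

# Deterministic stubs: the second sentence is uncertain until the
# loop has retrieved documents for it.
def retrieve(q):
    return ["doc for: " + q]

def generate(query, so_far, docs):
    if len(so_far) == 0:
        return "Fortunato insulted Montresor.", 0.9
    if len(so_far) == 1:
        draft = "Luchesi is his rival."
        confident = any(draft in d for d in docs)
        return draft, 0.9 if confident else 0.1
    return None, 1.0

answer = flare_answer("Who is Luchesi?", generate, retrieve)
print(answer)
```

Note how the second sentence is only accepted after the loop retrieves for it and regenerates it, which is exactly the behavior the real chain shows in verbose mode later in this tutorial.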

In this tutorial, you will use an Astra DB Serverless vector store, an OpenAI embedding model, an OpenAI LLM, and LangChain to orchestrate FLARE in a RAG pipeline.

Prerequisites

You will need a vector-enabled Astra DB Serverless database and an OpenAI account.

See the Notebook Prerequisites page for more details.

  1. Create a vector-enabled Astra DB Serverless database.

  2. Create an OpenAI account.

  3. Within your database, create an Astra DB keyspace.

  4. Within your database, create an Astra DB Access Token with Database Administrator permissions.

  5. Get your Astra DB Serverless API Endpoint: https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com

  6. Initialize the environment variables in a .env file.

    ASTRA_DB_APPLICATION_TOKEN=AstraCS:...
    ASTRA_DB_API_ENDPOINT=https://9d9b9999-999e-9999-9f9a-9b99999dg999-us-east-2.apps.astra.datastax.com
    ASTRA_DB_COLLECTION=test
    OPENAI_API_KEY=sk-f99...
  7. Load the .env file and enter your settings for Astra DB Serverless and OpenAI:

    import os
    from dotenv import load_dotenv

    load_dotenv()

    astra_token = os.getenv("ASTRA_DB_APPLICATION_TOKEN")
    astra_endpoint = os.getenv("ASTRA_DB_API_ENDPOINT")
    collection = os.getenv("ASTRA_DB_COLLECTION")
    openai_api_key = os.getenv("OPENAI_API_KEY")

Setup

ragstack-ai includes all the packages you need to build a FLARE pipeline.

  1. Install the ragstack-ai package.

    pip install ragstack-ai
  2. Import dependencies.

    import os
    from dotenv import load_dotenv
    from langchain_astradb import AstraDBVectorStore
    from langchain_openai import ChatOpenAI, OpenAI, OpenAIEmbeddings
    from langchain_community.document_loaders import TextLoader
    from langchain.globals import set_verbose
    from langchain.chains import FlareChain
    from langchain.chains.flare.base import QuestionGeneratorChain

Configure embedding model and load vector store

  1. Configure your embedding model and vector store:

    embedding = OpenAIEmbeddings()
    vstore = AstraDBVectorStore(
        collection_name=collection,
        embedding=embedding,
        token=astra_token,
        api_endpoint=astra_endpoint,
    )
    print("Astra vector store configured")
  2. Retrieve the text of a short story that will be indexed in the vector store. Run this command in your shell:

    curl https://raw.githubusercontent.com/CassioML/cassio-website/main/docs/frameworks/langchain/texts/amontillado.txt --output amontillado.txt
  3. Create embeddings by inserting your documents into the vector store. The final print statement verifies that the documents were embedded.

    input_file = "amontillado.txt"
    loader = TextLoader(input_file)
    documents = loader.load_and_split()

    inserted_ids = vstore.add_documents(documents)
    print(f"\nInserted {len(inserted_ids)} documents.")

    print(vstore.astra_db.collection(collection).find())
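Retrieval against the vector store rests on embedding similarity: the query embedding is compared against document embeddings, typically by cosine similarity. A toy illustration of that comparison, using made-up 3-dimensional vectors in place of real OpenAI embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-d vectors standing in for OpenAI embeddings.
docs = {
    "Luchesi passage": [0.9, 0.1, 0.0],
    "catacombs passage": [0.1, 0.8, 0.3],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "Who is Luchesi?"

# The most similar document is the one the retriever would return first.
best = max(docs, key=lambda name: cosine(query_vec, docs[name]))
print(best)
```

Real OpenAI embeddings have 1,536 dimensions rather than 3, and the vector store performs this search server-side, but the ranking principle is the same.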

Create a FLARE chain

Using LangChain’s FLARE chain with verbose mode on, we can see exactly what is happening under the hood.

  1. Set verbose mode and configure FLARE chain:

    from langchain.globals import set_verbose # already imported, just for clarity
    set_verbose(True)
    
    retriever = vstore.as_retriever()
    
    flare = FlareChain.from_llm(
        llm=ChatOpenAI(temperature=0),
        retriever=retriever,
        max_generation_len=256,
        min_prob=0.3,
    )
  2. Run the FLARE chain with a query:

    query = "Who is Luchesi in relation to Montresor?"
    flare.run(query)
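The `min_prob` argument controls when retrieval is triggered: the chain converts the per-token log probabilities returned by the OpenAI API into probabilities and flags tokens that fall below the threshold. A small sketch of that check, using hypothetical logprob values:

```python
import math

min_prob = 0.3
# Hypothetical per-token log probabilities from an LLM response.
token_logprobs = {"Luchesi": -0.05, "rival": -1.6}

# A token is low-confidence when exp(logprob) < min_prob.
low_confidence = [
    token for token, lp in token_logprobs.items()
    if math.exp(lp) < min_prob
]
print(low_confidence)
```

Here exp(-1.6) is roughly 0.20, so "rival" would trigger a retrieval step, while "Luchesi" (probability about 0.95) would not. Raising `min_prob` makes the chain retrieve more often; lowering it makes the chain trust its own generations more.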

You now have a fully functioning RAG pipeline using the FLARE technique! FLARE is one of many ways to improve RAG.

See our other examples for advanced RAG techniques, as well as evaluation examples that compare results using multiple RAG techniques.


© 2024 DataStax | Privacy policy | Terms of use
