RAG with LlamaIndex and Astra DB Serverless
Build a RAG pipeline with RAGStack, Astra DB Serverless, and LlamaIndex.
Prerequisites
You will need a vector-enabled Astra DB Serverless database.
- Create a vector-enabled Astra DB Serverless database.
- Within your database, create an Astra DB Access Token with Database Administrator permissions.
- Get your Astra DB Serverless API Endpoint:
https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com
- Install the following dependencies:
pip install ragstack-ai python-dotenv
See the Prerequisites page for more details.
Set up your local environment
Create a .env file in your application directory with the following environment variables:
ASTRA_DB_APPLICATION_TOKEN=AstraCS: ...
ASTRA_DB_API_ENDPOINT=https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com
OPENAI_API_KEY=sk-...
If you’re using Google Colab, you’ll be prompted for these values in the Colab environment.
See the Prerequisites page for more details.
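Before running the pipeline, you can confirm the variables are visible to your application. A minimal sanity-check sketch using python-dotenv (the variable names match the .env file above; the check itself is illustrative, not part of the original tutorial):
import os
from dotenv import load_dotenv

load_dotenv()

# Fail fast if any required variable is missing
for var in ("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT", "OPENAI_API_KEY"):
    if not os.getenv(var):
        raise EnvironmentError(f"Missing required environment variable: {var}")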
Create a RAG pipeline with LlamaIndex
- Import dependencies and load environment variables.
import os
from dotenv import load_dotenv
from llama_index.core.llama_dataset import download_llama_dataset
from llama_index.vector_stores.astra_db import AstraDBVectorStore
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
)

load_dotenv()
- Download a sample dataset. The dataset is downloaded to the ./data directory.
dataset = download_llama_dataset(
    "PaulGrahamEssayDataset", "./data"
)
- Create the vector store, populate the vector store with the dataset, and create the index.
documents = SimpleDirectoryReader("./data/source_files").load_data()
print(f"Total documents: {len(documents)}")
print(f"First document, id: {documents[0].doc_id}")
print(f"First document, hash: {documents[0].hash}")
print(
    "First document, text"
    f" ({len(documents[0].text)} characters):\n"
    f"{'=' * 20}\n"
    f"{documents[0].text[:360]} ..."
)

# Create a vector store instance
astra_db_store = AstraDBVectorStore(
    token=os.getenv("ASTRA_DB_APPLICATION_TOKEN"),
    api_endpoint=os.getenv("ASTRA_DB_API_ENDPOINT"),
    collection_name="test",
    embedding_dimension=1536,
)

# Create a default storage context for the vector store
storage_context = StorageContext.from_defaults(vector_store=astra_db_store)

# Create a vector index from your documents
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
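The embedding_dimension of 1536 matches OpenAI's text-embedding-ada-002 model, which LlamaIndex uses by default when OPENAI_API_KEY is set. If you prefer to pin the models explicitly rather than rely on defaults, a minimal sketch (the model choices here are assumptions, not requirements of the tutorial):
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# text-embedding-ada-002 produces 1536-dimensional vectors, matching
# embedding_dimension=1536 in the vector store above
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
Settings.llm = OpenAI(model="gpt-3.5-turbo")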
- Query the vector store index for the most relevant answer to your prompt, "Why did the author choose to work on AI?"
# single query for most relevant result
query_engine = index.as_query_engine()
query_string_1 = "Why did the author choose to work on AI?"
response = query_engine.query(query_string_1)
print(query_string_1)
print(response.response)
- Retrieve results from your vector store index based on your prompt. This retrieves three nodes with their relevance scores.
# similarity search with scores
retriever = index.as_retriever(
    vector_store_query_mode="default",
    similarity_top_k=3,
)
nodes_with_scores = retriever.retrieve(query_string_1)
print(query_string_1)
print(f"Found {len(nodes_with_scores)} nodes.")
for idx, node_with_score in enumerate(nodes_with_scores):
    print(f" [{idx}] score = {node_with_score.score}")
    print(f" id = {node_with_score.node.node_id}")
    print(f" text = {node_with_score.node.text[:90]} ...")
- Set the retriever to sort results by Maximal Marginal Relevance (MMR) instead of the default similarity search.
# MMR
retriever = index.as_retriever(
    vector_store_query_mode="mmr",
    similarity_top_k=3,
    vector_store_kwargs={"mmr_prefetch_factor": 4},
)
nodes_with_scores = retriever.retrieve(query_string_1)
print(query_string_1)
print(f"Found {len(nodes_with_scores)} nodes.")
for idx, node_with_score in enumerate(nodes_with_scores):
    print(f" [{idx}] score = {node_with_score.score}")
    print(f" id = {node_with_score.node.node_id}")
    print(f" text = {node_with_score.node.text[:90]} ...")
- Send the prompt again. With MMR, the top result is the most relevant (positive score), while the other results are less relevant (negative scores). See the sketch below for one way to inspect the scores.
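A minimal sketch of re-issuing the prompt through the MMR retriever from the previous step and labeling each result by the sign of its score (the labeling logic here is illustrative, not part of the original tutorial):
# Re-run the retrieval and label each node by the sign of its score
for node_with_score in retriever.retrieve(query_string_1):
    label = "most relevant" if node_with_score.score > 0 else "less relevant"
    print(f"{node_with_score.score:+.4f} {label}")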
Cleanup
Be a good digital citizen and clean up after yourself.
To clear data from your vector database but keep the collection, use the astra_db_store.clear() method.
To delete the collection from your vector database, use the astra_db_store.delete_collection() method.
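For example, with the astra_db_store instance created earlier in this tutorial:
# Remove all documents but keep the "test" collection
astra_db_store.clear()

# Or drop the "test" collection entirely
astra_db_store.delete_collection()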
Alternatively, you can use the Data API to delete the collection:
curl -v -s --location \
--request POST https://${ASTRA_DB_ID}-${ASTRA_DB_REGION}.apps.astra.datastax.com/api/json/v1/default_keyspace \
--header "X-Cassandra-Token: $ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"deleteCollection": {
"name": "test"
}
}'
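If you prefer to stay in Python, the same deleteCollection operation can be performed with the astrapy client. A sketch, assuming the astrapy package is available in your environment (it is a dependency of the RAGStack stack, but verify your installed version's API):
import os
from astrapy import DataAPIClient

# Connect with the same credentials used for the vector store
client = DataAPIClient(os.getenv("ASTRA_DB_APPLICATION_TOKEN"))
db = client.get_database(os.getenv("ASTRA_DB_API_ENDPOINT"))

# Drop the "test" collection created earlier
db.drop_collection("test")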
Complete code
Python
import os
from dotenv import load_dotenv
from llama_index.core.llama_dataset import download_llama_dataset
from llama_index.vector_stores.astra_db import AstraDBVectorStore
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
)

load_dotenv()

# Download the sample dataset to the ./data directory
dataset = download_llama_dataset(
    "PaulGrahamEssayDataset", "./data"
)

# Load the documents from the dataset into memory
documents = SimpleDirectoryReader("./data/source_files").load_data()
print(f"Total documents: {len(documents)}")
print(f"First document, id: {documents[0].doc_id}")
print(f"First document, hash: {documents[0].hash}")
print(
    "First document, text"
    f" ({len(documents[0].text)} characters):\n"
    f"{'=' * 20}\n"
    f"{documents[0].text[:360]} ..."
)

# Create a vector store instance
astra_db_store = AstraDBVectorStore(
    token=os.getenv("ASTRA_DB_APPLICATION_TOKEN"),
    api_endpoint=os.getenv("ASTRA_DB_API_ENDPOINT"),
    collection_name="test",
    embedding_dimension=1536,
)

# Create a default storage context for the vector store
storage_context = StorageContext.from_defaults(vector_store=astra_db_store)

# Create a vector index from your documents
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Query the index for the most relevant answer to the prompt
query_engine = index.as_query_engine()
query_string_1 = "Why did the author choose to work on AI?"
response = query_engine.query(query_string_1)
print(query_string_1)
print(response.response)

# Retrieve results sorted by Maximal Marginal Relevance (MMR)
retriever = index.as_retriever(
    vector_store_query_mode="mmr",
    similarity_top_k=3,
    vector_store_kwargs={"mmr_prefetch_factor": 4},
)
nodes_with_scores = retriever.retrieve(query_string_1)
print(query_string_1)
print(f"Found {len(nodes_with_scores)} nodes.")
for idx, node_with_score in enumerate(nodes_with_scores):
    print(f" [{idx}] score = {node_with_score.score}")
    print(f" id = {node_with_score.node.node_id}")
    print(f" text = {node_with_score.node.text[:90]} ...")