RAG with LlamaParse and Astra DB Serverless

Build a RAG pipeline with RAGStack, Astra DB Serverless, and LlamaIndex.

This example demonstrates loading and parsing a PDF document with LlamaParse into an Astra DB Serverless vector store, then querying the index with LlamaIndex.

Prerequisites

You will need a vector-enabled Astra DB Serverless database.

  • Create an Astra vector database.

  • Within your database, create an Astra DB Access Token with Database Administrator permissions.

  • Get your Astra DB Serverless API Endpoint:

    • https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com

  • Create an API key at LlamaIndex.ai.

Install the following dependencies:

pip install ragstack-ai

See the Prerequisites page for more details.

Set up your local environment

Create a .env file in your application directory with the following environment variables:

LLAMA_CLOUD_API_KEY=llx-...
ASTRA_DB_API_ENDPOINT=https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com
ASTRA_DB_APPLICATION_TOKEN=AstraCS:...
OPENAI_API_KEY=sk-...

If you’re using Google Colab, you’ll be prompted for these values in the Colab environment.
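Before running the pipeline, it can help to confirm that all four variables are set and that the endpoint matches the template shown above. The following is a minimal sketch using only the standard library. The helper names `missing_vars` and `endpoint_looks_valid` are hypothetical, and the regular expression is an assumption derived from the endpoint template on this page, not an official validation rule.

```python
import os
import re

# The four variables this example's .env file defines.
REQUIRED_VARS = (
    "LLAMA_CLOUD_API_KEY",
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_APPLICATION_TOKEN",
    "OPENAI_API_KEY",
)

# Assumed shape, based on the template
# https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com
ENDPOINT_PATTERN = re.compile(r"^https://[^./\s]+\.apps\.astra\.datastax\.com/?$")


def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def endpoint_looks_valid(endpoint):
    """Check that the endpoint has the expected scheme and host suffix."""
    return bool(ENDPOINT_PATTERN.match(endpoint or ""))


if __name__ == "__main__":
    # Run this after load_dotenv() so that values from .env are visible.
    print("Missing variables:", missing_vars())
    print(
        "Endpoint looks valid:",
        endpoint_looks_valid(os.getenv("ASTRA_DB_API_ENDPOINT")),
    )
```

A failed check here is easier to diagnose than an authentication error raised later from inside one of the client libraries.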


Create RAG pipeline

  1. Import dependencies and load environment variables.

    import os
    import requests
    from dotenv import load_dotenv
    from llama_parse import LlamaParse
    from llama_index.vector_stores.astra_db import AstraDBVectorStore
    from llama_index.core.node_parser import SimpleNodeParser
    from llama_index.core import VectorStoreIndex, StorageContext, Settings
    from llama_index.llms.openai import OpenAI
    from llama_index.embeddings.openai import OpenAIEmbedding
    
    load_dotenv()
    
    llama_cloud_api_key = os.getenv("LLAMA_CLOUD_API_KEY")
    api_endpoint = os.getenv("ASTRA_DB_API_ENDPOINT")
    token = os.getenv("ASTRA_DB_APPLICATION_TOKEN")
    openai_api_key = os.getenv("OPENAI_API_KEY")
  2. Configure global settings for LlamaIndex. As of LlamaIndex v0.10.0, the Settings component replaces ServiceContext. For more information, see the LlamaIndex documentation.

    Settings.llm = OpenAI(model="gpt-4", temperature=0.1)
    Settings.embed_model = OpenAIEmbedding(
        model="text-embedding-3-small", embed_batch_size=100
    )
  3. Download a PDF about attention mechanisms in transformer model architectures.

    url = "https://arxiv.org/pdf/1706.03762.pdf"
    file_path = "./attention.pdf"
    
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        with open(file_path, "wb") as file:
            file.write(response.content)
        print("Download complete.")
    else:
        print("Error downloading the file.")
  4. Load the downloaded PDF with LlamaParse as a text Document for indexing. LlamaParse also supports Markdown-type Documents with result_type="markdown".

    documents = LlamaParse(result_type="text").load_data(file_path)
    print(documents[0].get_content()[10000:11000])
  5. Create an Astra DB Serverless vector store instance.

    astra_db_store = AstraDBVectorStore(
        token=token,
        api_endpoint=api_endpoint,
        collection_name="astra_v_table_llamaparse",
        embedding_dimension=1536
    )
  6. Parse Documents into nodes and set up storage context to use Astra DB Serverless.

    node_parser = SimpleNodeParser()
    nodes = node_parser.get_nodes_from_documents(documents)
    print(nodes[0].get_content())
    
    storage_context = StorageContext.from_defaults(vector_store=astra_db_store)
  7. Create a vector store index and query engine from your nodes and storage context.

    index = VectorStoreIndex(nodes=nodes, storage_context=storage_context)
    query_engine = index.as_query_engine(similarity_top_k=15)
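The `similarity_top_k=15` argument asks the query engine to retrieve the 15 stored nodes whose embeddings are most similar to the query embedding. The sketch below illustrates that idea in plain Python with toy 3-dimensional vectors. It assumes cosine similarity as the metric, and it is only an illustration: in the pipeline above, the similarity search runs inside Astra DB, not in your application code. The function names and sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k):
    """Return the k (node_id, score) pairs most similar to query_vec, best first."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors; the embeddings in this example have 1536 dimensions.
stored_vectors = {
    "node-a": [1.0, 0.0, 0.0],
    "node-b": [0.9, 0.1, 0.0],
    "node-c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], stored_vectors, k=2)
print(results)
```

A larger `k` gives the LLM more context to draw on, at the cost of a longer prompt and possibly less relevant passages mixed in.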

Execute a query

  1. Query the Astra DB Serverless vector store with a question that the document can answer. This query should return a relevant response.

    query = "What is Multi-Head Attention also known as?"
    response_1 = query_engine.query(query)
    print("\n***********New LlamaParse+ Basic Query Engine***********")
    print(response_1)
  2. Query the Astra DB Serverless vector store with a question that the document cannot answer. Because the document contains no information about the color of the sky, the response should be similar to "The context does not provide information about the color of the sky."

    query = "What is the color of the sky?"
    response_2 = query_engine.query(query)
    print("\n***********New LlamaParse+ Basic Query Engine***********")
    print(response_2)
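To see why the two queries behave differently, it can be useful to look at which nodes were retrieved and how they scored. The sketch below assumes the response object exposes a `source_nodes` list whose items have a `score` attribute and a `node` with a `get_content()` method, as LlamaIndex response objects generally do. Verify this against your installed version. The helper `summarize_sources` is hypothetical, and the stand-in objects exist only so the sketch runs without calling any API.

```python
from types import SimpleNamespace


def summarize_sources(response, limit=3, width=80):
    """Return (score, snippet) pairs for the first `limit` retrieved nodes."""
    summary = []
    for item in list(getattr(response, "source_nodes", []))[:limit]:
        # Collapse newlines so each snippet prints on a single line.
        text = item.node.get_content().replace("\n", " ")
        summary.append((item.score, text[:width]))
    return summary


# Stand-in objects that mimic the assumed response shape.
fake_node = SimpleNamespace(
    get_content=lambda: "Multi-head attention\nruns several attention layers in parallel."
)
fake_response = SimpleNamespace(
    source_nodes=[SimpleNamespace(node=fake_node, score=0.91)]
)

for score, snippet in summarize_sources(fake_response):
    print(score, snippet)
```

With the real pipeline, you would pass `response_1` or `response_2` in place of `fake_response` and compare the retrieved passages for the two questions.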

Complete code

Python
import os
import requests
from dotenv import load_dotenv
from llama_parse import LlamaParse
from llama_index.vector_stores.astra_db import AstraDBVectorStore
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.core import VectorStoreIndex, StorageContext, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Load environment variables
load_dotenv()

# Get all required API keys and parameters
llama_cloud_api_key = os.getenv("LLAMA_CLOUD_API_KEY")
api_endpoint = os.getenv("ASTRA_DB_API_ENDPOINT")
token = os.getenv("ASTRA_DB_APPLICATION_TOKEN")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Configure global Settings
Settings.llm = OpenAI(model="gpt-4", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small", embed_batch_size=100
)

# Download a PDF for indexing
url = "https://arxiv.org/pdf/1706.03762.pdf"
file_path = "./attention.pdf"
response = requests.get(url, timeout=30)
if response.status_code == 200:
    with open(file_path, "wb") as file:
        file.write(response.content)
    print("Download complete.")
else:
    print("Error downloading the file.")

# Load and parse the document
documents = LlamaParse(result_type="text").load_data(file_path)
print(documents[0].get_content()[10000:11000])

# Set up the Astra DB Serverless vector store
astra_db_store = AstraDBVectorStore(
    token=token,
    api_endpoint=api_endpoint,
    collection_name="astra_v_table_llamaparse",
    embedding_dimension=1536
)

# Parse nodes from documents and output a snippet for verification
node_parser = SimpleNodeParser()
nodes = node_parser.get_nodes_from_documents(documents)
print(nodes[0].get_content())

# Set up the storage context
storage_context = StorageContext.from_defaults(vector_store=astra_db_store)

# Indexing and query engine setup
index = VectorStoreIndex(nodes=nodes, storage_context=storage_context)
query_engine = index.as_query_engine(similarity_top_k=15)

# Execute a query
query = "What is Multi-Head Attention also known as?"
response_1 = query_engine.query(query)
print("\n***********New LlamaParse+ Basic Query Engine***********")
print(response_1)

# Query for an example with expected lack of context
query = "What is the color of the sky?"
response_2 = query_engine.query(query)
print("\n***********New LlamaParse+ Basic Query Engine***********")
print(response_2)


© 2025 DataStax | Privacy policy | Terms of use
