NVIDIA Embeddings and Models
This notebook demonstrates how to set up a simple RAG pipeline using NVIDIA AI Foundation Models. At the end of this notebook, you will have a functioning Question/Answer pipeline that can answer questions using your supplied documents, powered by Astra DB Serverless, LangChain, and NVIDIA.
Prerequisites
You will need a vector-enabled Astra DB Serverless database and an NVIDIA NGC account.
- Create an Astra vector database.
- Within your database, create an Astra DB access token with Database Administrator permissions.
- Get your Astra DB Serverless API endpoint. It has the form:
  https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com
- Create an NVIDIA NGC account.
- Once signed in, navigate to Catalog > AI Foundation Models > (Model).
- On the model page, select the API tab, then select Generate Key.
- Install the following dependencies:

pip install -qU ragstack-ai langchain-nvidia-ai-endpoints datasets

ragstack-ai includes all the packages you need to build a RAG pipeline. langchain-nvidia-ai-endpoints includes the NVIDIA models. datasets is used to import a sample dataset.

See the Prerequisites page for more details.
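If you want to confirm the installation before continuing, a quick version check works in any Python 3.8+ environment. This is an optional sketch, not part of the official steps:

from importlib.metadata import version

# Print the installed version of each dependency.
for pkg in ("ragstack-ai", "langchain-nvidia-ai-endpoints", "datasets"):
    print(pkg, version(pkg))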
Configure Astra DB Serverless and NVIDIA NGC credentials
Export these values in the terminal where you’re running this application. If you’re using Google Colab, you’ll be prompted for these values in the Colab environment.
export ASTRA_DB_APPLICATION_TOKEN=AstraCS:...
export ASTRA_DB_API_ENDPOINT=https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com
export NVIDIA_API_KEY=nvapi-...
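If you're working in a notebook where exporting shell variables is awkward, one option (a sketch using only the standard library, not a required step) is to prompt for any missing values and set them on os.environ:

import os
from getpass import getpass

# Prompt for any credential that isn't already set in the environment.
for var in ("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT", "NVIDIA_API_KEY"):
    if not os.getenv(var):
        os.environ[var] = getpass(f"Enter {var}: ")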
Create a RAG pipeline
Configure embedding model and populate vector store
- Create an embedding model using the NVIDIA API key you generated in the prerequisites.

import os

from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embedding = NVIDIAEmbeddings(
    nvidia_api_key=os.getenv("NVIDIA_API_KEY"),
    model="nvolveqa_40k",
)
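To confirm the model is reachable before wiring it into a vector store, you can embed a test string. embed_query is part of LangChain's standard Embeddings interface; the vector length depends on the model, so treat this as a smoke test rather than an expected value:

# Smoke test: embed one string and inspect the resulting vector.
test_vector = embedding.embed_query("What is the meaning of life?")
print(f"Embedding dimension: {len(test_vector)}")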
- Create a vector store using the embedding model and your Astra DB Serverless credentials.

import os

from langchain_astradb import AstraDBVectorStore

collection_name = "test"

vstore = AstraDBVectorStore(
    collection_name=collection_name,
    embedding=embedding,
    token=os.getenv("ASTRA_DB_APPLICATION_TOKEN"),
    api_endpoint=os.getenv("ASTRA_DB_API_ENDPOINT"),
)
print("Astra vector store configured")
- Load a sample dataset.

from datasets import load_dataset

philo_dataset = load_dataset("datastax/philosopher-quotes")["train"]
print("An example entry:")
print(philo_dataset[16])
- Construct a set of Documents from your data. Documents can be used as inputs to your vector store.

from langchain.schema import Document

docs = []
for entry in philo_dataset:
    metadata = {"author": entry["author"]}
    if entry["tags"]:
        # Add metadata tags to the metadata dictionary
        for tag in entry["tags"].split(";"):
            metadata[tag] = "y"
    # Create a LangChain document with the quote and metadata tags
    doc = Document(page_content=entry["quote"], metadata=metadata)
    docs.append(doc)
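To sanity-check the transformation, you can inspect one of the constructed Documents. This is an optional check; index 16 corresponds to the example entry printed earlier:

# The quote becomes page_content; the author and tags become metadata.
print(docs[16].page_content)
print(docs[16].metadata)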
- Create embeddings by inserting your documents into the vector store. Print the contents of the collection to verify the documents were embedded.

inserted_ids = vstore.add_documents(docs)
print(f"\nInserted {len(inserted_ids)} documents.")
print(vstore.astra_db.collection(collection_name).find())
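Beyond printing the raw collection, you can exercise the store with a direct similarity search. The sketch below assumes the filter keyword of AstraDBVectorStore.similarity_search, which matches on the metadata fields added earlier:

# Optional: search the store directly, filtering on metadata.
results = vstore.similarity_search(
    "What is the nature of love?",
    k=3,
    filter={"author": "aristotle"},
)
for res in results:
    print(f"* {res.page_content} {res.metadata}")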
Create a QA retrieval chain
- Retrieve context from your vector store and pass it to the NVIDIA model with a prompt.

from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain_nvidia_ai_endpoints import ChatNVIDIA

retriever = vstore.as_retriever(search_kwargs={"k": 3})

prompt_template = """
Answer the question based only on the supplied context. If you don't know the answer, say you don't know the answer.
Context: {context}
Question: {question}
Your answer:
"""

prompt = ChatPromptTemplate.from_template(prompt_template)
model = ChatNVIDIA(model="mixtral_8x7b")

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

result = chain.invoke("In the given context, what subject are philosophers most concerned with?")
print(result)
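If an answer looks off, it helps to check what the retriever returns on its own before it reaches the prompt. A small debugging sketch using the retriever defined above:

# Debugging aid: inspect the documents the retriever surfaces for a query.
retrieved_docs = retriever.get_relevant_documents(
    "In the given context, what subject are philosophers most concerned with?"
)
for doc in retrieved_docs:
    print(doc.page_content)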
- Optionally, modify the chain invocation to ask your own question.

# Add your questions here!
result = chain.invoke("<your question>")
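Because the chain is a LangChain Runnable, you can also answer several questions in one call with batch. A small sketch (the questions here are just placeholders):

# Ask multiple questions in a single call.
questions = [
    "What do philosophers say about happiness?",
    "What role does virtue play in a good life?",
]
for answer in chain.batch(questions):
    print(answer)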
- Run the code you created previously. It should print output similar to the following:

Astra vector store configured
An example entry:
{'author': 'aristotle', 'quote': 'Love well, be loved and do something of value.', 'tags': 'love;ethics'}

Inserted 450 documents.

Based on the provided context, philosophers are most concerned with the subject of wonder. This is mentioned twice in documents attributed to Aristotle, stating 'Philosophy begins with wonder.' There is no information provided in the context that suggests philosophers are more concerned with any other subject.
You now have a functional RAG pipeline powered by NVIDIA! NVIDIA offers many different model types suited for different problems. Check out the catalog for more.
Cleanup
Add the following code to the end of your script to delete the collection and all documents in the collection.
vstore.delete_collection()
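If you run this as a standalone script, one way to guarantee cleanup even when an earlier step fails is a try/finally around the pipeline. This is an optional pattern, not part of the original steps:

try:
    inserted_ids = vstore.add_documents(docs)
    print(chain.invoke("In the given context, what subject are philosophers most concerned with?"))
finally:
    # Runs even if insertion or the chain raises, so no stray collections are left behind.
    vstore.delete_collection()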
Complete code
Python
from datasets import load_dataset
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA
from langchain_astradb import AstraDBVectorStore
from langchain.schema import Document
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
import os
# Configuration for NVIDIA Embeddings
nvidia_api_key = os.getenv("NVIDIA_API_KEY")
embedding = NVIDIAEmbeddings(nvidia_api_key=nvidia_api_key, model="nvolveqa_40k")
# AstraDB Vector Store setup
collection_name = "test"
astra_token = os.getenv("ASTRA_DB_APPLICATION_TOKEN")
astra_api_endpoint = os.getenv("ASTRA_DB_API_ENDPOINT")
vstore = AstraDBVectorStore(
    collection_name=collection_name,
    embedding=embedding,
    token=astra_token,
    api_endpoint=astra_api_endpoint,
)
print("Astra vector store configured")
# Load a sample dataset
philo_dataset = load_dataset("datastax/philosopher-quotes")["train"]
print("An example entry:")
print(philo_dataset[16])
# Construct documents from dataset
docs = []
for entry in philo_dataset:
    metadata = {"author": entry["author"]}
    if entry["tags"]:
        for tag in entry["tags"].split(";"):
            metadata[tag] = "y"
    doc = Document(page_content=entry["quote"], metadata=metadata)
    docs.append(doc)
# Insert documents into vector store
inserted_ids = vstore.add_documents(docs)
print(f"\nInserted {len(inserted_ids)} documents.")
# Setup LangChain Chat Prompt
retriever = vstore.as_retriever(search_kwargs={"k": 3})
prompt_template = """
Answer the question based only on the supplied context. If you don't know the answer, say you don't know the answer.
Context: {context}
Question: {question}
Your answer:
"""
prompt = ChatPromptTemplate.from_template(prompt_template)
model = ChatNVIDIA(model="mixtral_8x7b", nvidia_api_key=nvidia_api_key)
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
# Invoke the chain with a query and print result
result = chain.invoke("In the given context, what subject are philosophers most concerned with?")
print(result)
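# Optionally, clean up by deleting the collection, as described in the Cleanup section:
# vstore.delete_collection()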