Integrate LangChain with Astra DB Serverless (Vector)
The DataStax Astra DB Serverless (Vector) documentation site is currently in Public Preview and is provided on an “AS IS” basis, without warranty or indemnity of any kind. For more, see the DataStax Preview Terms.
LangChain can use Astra DB Serverless (Vector) to store and retrieve vectors for ML applications.
To get started, you need an active Astra account.
Set up the integration
Install dependencies
-
Verify that pip is version 23.0 or higher.
pip --version
-
Upgrade pip if needed.
python -m pip install --upgrade pip
-
Install all of the dependencies, including python-dotenv for loading the .env file created later. You must have Python 3.8.1 or higher.
pip install "langchain==0.0.339" "astrapy==0.6.0" \
  "datasets==2.14.7" "openai==1.3.0" "pypdf==3.17.1" \
  "tiktoken==0.5.1" "python-dotenv"
Create a vector database
-
Create a vector-enabled Astra database in the Astra Portal, and note your database’s API endpoint URL.
-
Create a token with Database Administrator permissions in the Astra Connect tab.
-
Set your environment variables.
-
Local install
-
Google Colab
Create a .env file in the root of your program with the values from your Astra Connect tab.
.env
ASTRA_DB_APPLICATION_TOKEN="<AstraCS:...>"
ASTRA_DB_API_ENDPOINT="<Astra DB API endpoint>"
OPENAI_API_KEY="sk-..."
import os
from getpass import getpass

os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass("ASTRA_DB_APPLICATION_TOKEN = ")
os.environ["ASTRA_DB_API_ENDPOINT"] = input("ASTRA_DB_API_ENDPOINT = ")
os.environ["OPENAI_API_KEY"] = getpass("OPENAI_API_KEY = ")
The endpoint format is https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com.
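If a connection fails, one quick sanity check is to decompose the endpoint. The helper below is a hypothetical sketch (not part of the integration) that assumes the format above: a 36-character database ID followed by the region.

```python
import re

# Hypothetical helper: split an Astra DB API endpoint into database ID and region,
# assuming the https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com format.
ENDPOINT_RE = re.compile(
    r"^https://(?P<db_id>[0-9a-f-]{36})-(?P<region>[a-z0-9-]+)"
    r"\.apps\.astra\.datastax\.com$"
)

def parse_endpoint(endpoint):
    match = ENDPOINT_RE.match(endpoint)
    if match is None:
        raise ValueError(f"Unexpected endpoint format: {endpoint}")
    return match.group("db_id"), match.group("region")

db_id, region = parse_endpoint(
    "https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com"
)
print(db_id, region)  # 01234567-89ab-cdef-0123-456789abcdef us-east1
```

The example database ID and region are placeholders; substitute the endpoint from your Astra Connect tab.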
-
Import your dependencies.
-
Local install
-
Google Colab
integrate.py
import os

import langchain.vectorstores
from langchain.schema import Document
from langchain.embeddings import OpenAIEmbeddings
from datasets import load_dataset
from dotenv import load_dotenv
import langchain.vectorstores
from langchain.schema import Document
from langchain.embeddings import OpenAIEmbeddings
from datasets import load_dataset
-
-
Load your environment variables.
-
Local install
-
Google Colab
load_dotenv()

ASTRA_DB_APPLICATION_TOKEN = os.environ.get("ASTRA_DB_APPLICATION_TOKEN")
ASTRA_DB_API_ENDPOINT = os.environ.get("ASTRA_DB_API_ENDPOINT")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
ASTRA_DB_APPLICATION_TOKEN = os.environ.get("ASTRA_DB_APPLICATION_TOKEN")
ASTRA_DB_API_ENDPOINT = os.environ.get("ASTRA_DB_API_ENDPOINT")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
See Advanced configuration for Azure OpenAI values.
Don’t name the file langchain.py; that filename shadows the langchain package and causes a namespace collision on import.
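Because os.environ.get returns None for an unset variable, a missing key often surfaces later as a confusing authentication error. A small, hypothetical guard (not part of the integration) can fail fast instead:

```python
import os

def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Example usage once the variables are set:
# ASTRA_DB_APPLICATION_TOKEN = require_env("ASTRA_DB_APPLICATION_TOKEN")
```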
Create embeddings from text
-
Use LangChain to create embeddings from text.
integrate.py
embedding = OpenAIEmbeddings()
vstore = langchain.vectorstores.AstraDB(
    embedding=embedding,
    collection_name="test",
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
)
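Behind vstore, each document is stored as a vector, and retrieval ranks stored vectors by closeness to the query vector. As a library-free illustration of a common similarity measure for embeddings (toy three-dimensional vectors here, not real embeddings), cosine similarity can be sketched as:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0, 0.0], [1.0, 2.0, 0.0]))  # approximately 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```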
-
Load a small dataset of philosophical quotes with the Python datasets module.
integrate.py
philo_dataset = load_dataset("datastax/philosopher-quotes")["train"]
print("An example entry:")
print(philo_dataset[16])
-
Process metadata and convert to LangChain documents.
integrate.py
docs = []
for entry in philo_dataset:
    metadata = {"author": entry["author"]}
    if entry["tags"]:
        # Add metadata tags to the metadata dictionary
        for tag in entry["tags"].split(";"):
            metadata[tag] = "y"
    # Add a LangChain document with the quote and metadata tags
    doc = Document(page_content=entry["quote"], metadata=metadata)
    docs.append(doc)
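The tag handling in this loop can be exercised on its own. The function and sample entry below are a hypothetical sketch that mirrors the dataset’s author/quote/tags shape:

```python
def entry_to_metadata(entry):
    """Build the metadata dictionary used above: author plus one "y" flag per tag."""
    metadata = {"author": entry["author"]}
    if entry["tags"]:
        for tag in entry["tags"].split(";"):
            metadata[tag] = "y"
    return metadata

# Hypothetical entry in the dataset's shape.
sample = {"author": "plato", "quote": "Wise men speak...", "tags": "knowledge;ethics"}
print(entry_to_metadata(sample))
# → {'author': 'plato', 'knowledge': 'y', 'ethics': 'y'}
```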
-
Compute embeddings for each document and store in the vector database.
integrate.py
inserted_ids = vstore.add_documents(docs)
print(f"\nInserted {len(inserted_ids)} documents.")
Verify integration
Show quotes that are similar to a specific quote.
integrate.py
results = vstore.similarity_search("Our life is what we make of it", k=3)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
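The metadata stored earlier can also narrow a search. The snippet below is a sketch using the filter argument of LangChain’s AstraDB vector store; the author value is only an example, and running it requires the vstore and credentials set up above.

```python
# Restrict the search to documents whose metadata matches the filter.
# Requires the `vstore` created earlier and live Astra DB credentials.
results = vstore.similarity_search(
    "Our life is what we make of it",
    k=3,
    filter={"author": "plato"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
```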
Run the code
Run the code you defined above.
python integrate.py
Advanced configuration
If you’re using Azure OpenAI, include these additional environment variables:
OPENAI_API_TYPE="azure"
OPENAI_API_VERSION="2023-05-15"
OPENAI_API_BASE="https://<your resource name>.openai.azure.com"
OPENAI_API_KEY="<openai-api-key>"
Next steps
-
Build a chatbot with LangChain tutorial
Learn how to use Astra DB Serverless (Vector) with LangChain to do retrieval-augmented generation (RAG) on a documentation site.