Integrate LangChain with Astra DB Serverless
LangChain can use Astra DB Serverless to store documents and run similarity searches for GenAI applications.
You can build and run a script locally or use this guide's Colab notebook. For complete script examples, see the mini-demo-astradb-langchain GitHub repository.
Prerequisites

This guide requires the following:

- An active Serverless (Vector) database.

- An OpenAI account and an OpenAI API key.

  This guide uses OpenAI to generate embeddings. You can get embeddings directly from OpenAI, or you can use Astra DB's built-in OpenAI embedding provider integration.

  If you want to use the built-in OpenAI integration, you must configure the OpenAI embedding provider integration before you begin. In the integration settings, note the API key name, and make sure that your database is in the key's scope.

- Python 3.9 or later, pip 23.0 or later, and the required LangChain Python packages:

  pip install \
    "langchain>=0.3,<0.4" \
    "langchain-astradb>=0.6,<0.7" \
    "langchain-openai>=0.3,<0.4"
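The pinned ranges above follow a `>=X.Y,<X.(Y+1)` pattern, which accepts patch releases within one minor series but excludes the next minor version. As a rough illustration of what such a constraint admits (a simplified sketch; real pip uses full PEP 440 specifier matching, not this helper):

```python
def satisfies_range(version: str, lower: str, upper: str) -> bool:
    """Check lower <= version < upper by comparing numeric version tuples.

    Simplified sketch: handles plain "X.Y.Z"-style versions only.
    """
    def as_tuple(v: str) -> tuple:
        return tuple(int(part) for part in v.split("."))

    return as_tuple(lower) <= as_tuple(version) < as_tuple(upper)

# "langchain>=0.3,<0.4" accepts a 0.3.x release but not 0.4.0:
print(satisfies_range("0.3.27", "0.3", "0.4"))  # True
print(satisfies_range("0.4.0", "0.3", "0.4"))   # False
```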
Connect to your Astra DB database

- Get an application token and Data API endpoint for your database:

  - In the Astra Portal navigation menu, click Databases, and then click the name of your Serverless (Vector) database.

  - On the Overview tab, find the Database Details section.

  - In API Endpoint, click Copy to get your database's Data API endpoint in the form of https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com.

  - Click Generate Token to create an application token scoped to your database.
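Because the endpoint embeds your database ID and region, you can sanity-check it before use. A minimal sketch, assuming the database ID is a UUID as in Astra endpoints (the parse_endpoint helper is hypothetical, not part of any library):

```python
import re

# Matches https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com,
# where the database ID is a UUID and the region follows it after a hyphen.
ENDPOINT_PATTERN = re.compile(
    r"^https://"
    r"(?P<db_id>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"-(?P<region>[a-z0-9-]+)"
    r"\.apps\.astra\.datastax\.com/?$"
)

def parse_endpoint(endpoint: str) -> dict:
    """Hypothetical helper: split a Data API endpoint into ID and region."""
    match = ENDPOINT_PATTERN.match(endpoint)
    if match is None:
        raise ValueError(f"Not a Data API endpoint: {endpoint!r}")
    return match.groupdict()

parts = parse_endpoint(
    "https://01234567-89ab-cdef-0123-456789abcdef-us-east-2.apps.astra.datastax.com"
)
print(parts["region"])  # us-east-2
```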
- Set secrets and connection parameters.

  Local install:

  Create a .env file in the folder where you will create your Python script:

  .env
  ASTRA_DB_APPLICATION_TOKEN="TOKEN"
  ASTRA_DB_API_ENDPOINT="API_ENDPOINT"
  # Optional. A keyspace that exists in the database.
  ASTRA_DB_KEYSPACE="default_keyspace"
  # At least one of:
  OPENAI_API_KEY="API_KEY"
  ASTRA_DB_API_KEY_NAME="ASTRA_KMS_API_KEY_NAME"

  Replace the following:

  - TOKEN: Your Astra DB application token.

  - API_ENDPOINT: Your database's Data API endpoint.

  - API_KEY or ASTRA_KMS_API_KEY_NAME: Provide at least one of these values to connect to OpenAI and generate embeddings.

    - If you want to get embeddings directly from OpenAI, replace API_KEY with a secure reference to your OpenAI API key.

    - If you want to use Astra DB's built-in OpenAI integration, replace ASTRA_KMS_API_KEY_NAME with the API Key Name from the OpenAI embedding provider integration settings.

  - default_keyspace: If you want to use a different keyspace, replace this with the name of another keyspace in your database.

  Google Colab:

  import os
  from getpass import getpass

  os.environ["ASTRA_DB_API_ENDPOINT"] = input("ASTRA_DB_API_ENDPOINT =")
  os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass("ASTRA_DB_APPLICATION_TOKEN =")
  if _keyspace := input("ASTRA_DB_KEYSPACE (optional) ="):
      os.environ["ASTRA_DB_KEYSPACE"] = _keyspace
  os.environ["ASTRA_DB_API_KEY_NAME"] = input("ASTRA_DB_API_KEY_NAME (required for 'vectorize') =")
  os.environ["OPENAI_API_KEY"] = getpass("OPENAI_API_KEY (required for explicit embeddings) =")
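For reference, a .env file is just KEY=VALUE lines, with optional quotes and # comments. A toy parser sketch illustrating the format that load_dotenv reads (this is a simplification, not the python-dotenv implementation):

```python
def parse_env_lines(lines):
    """Toy .env parser: KEY="VALUE" or KEY=VALUE lines; '#' lines are comments."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values

sample = [
    'ASTRA_DB_KEYSPACE="default_keyspace"',
    "# At least one of:",
    'OPENAI_API_KEY="API_KEY"',
]
print(parse_env_lines(sample))
```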
- Import your dependencies.

  To avoid a namespace collision, don't name the file langchain.py.

  Local install:

  integrate.py
  import os

  import requests
  from astrapy.info import VectorServiceOptions
  from dotenv import load_dotenv
  from langchain_astradb import AstraDBVectorStore
  from langchain_core.documents import Document
  from langchain_openai import OpenAIEmbeddings

  Google Colab:

  import requests
  from astrapy.info import VectorServiceOptions
  from langchain_astradb import AstraDBVectorStore
  from langchain_core.documents import Document
  from langchain_openai import OpenAIEmbeddings
- Load your environment variables.

  Local install:

  integrate.py
  load_dotenv()

  ASTRA_DB_APPLICATION_TOKEN = os.environ["ASTRA_DB_APPLICATION_TOKEN"]
  ASTRA_DB_API_ENDPOINT = os.environ["ASTRA_DB_API_ENDPOINT"]
  ASTRA_DB_KEYSPACE = os.environ.get("ASTRA_DB_KEYSPACE")
  ASTRA_DB_API_KEY_NAME = os.environ.get("ASTRA_DB_API_KEY_NAME") or None
  OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") or None

  Google Colab:

  ASTRA_DB_APPLICATION_TOKEN = os.environ["ASTRA_DB_APPLICATION_TOKEN"]
  ASTRA_DB_API_ENDPOINT = os.environ["ASTRA_DB_API_ENDPOINT"]
  ASTRA_DB_KEYSPACE = os.environ.get("ASTRA_DB_KEYSPACE")
  ASTRA_DB_API_KEY_NAME = os.environ.get("ASTRA_DB_API_KEY_NAME") or None
  OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") or None
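One subtlety in the `os.environ.get(...) or None` idiom above: it normalizes both a missing variable and an empty string (for example, a skipped Colab prompt) to None, so later code can reliably compare with `is None`. A quick demonstration:

```python
import os

# Present but empty, e.g. the user pressed Enter at an optional prompt
os.environ["DEMO_EMPTY_VAR"] = ""

plain_get = os.environ.get("DEMO_EMPTY_VAR")          # returns ""
coerced = os.environ.get("DEMO_EMPTY_VAR") or None    # "" is falsy, so -> None

print(repr(plain_get))  # ''
print(repr(coerced))    # None
```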
Create embeddings from text

- Create a vector store.

  You must specify your database and a collection name. The collection is created automatically if it does not exist.

  Use the built-in OpenAI integration:

  The following statements create a collection that uses Astra DB's built-in OpenAI integration to generate embeddings:

  integrate.py
  vectorize_options = VectorServiceOptions(
      provider="openai",
      model_name="text-embedding-3-small",
      authentication={"providerKey": ASTRA_DB_API_KEY_NAME},
  )

  vector_store = AstraDBVectorStore(
      collection_name="langchain_integration_demo_vectorize",
      token=ASTRA_DB_APPLICATION_TOKEN,
      api_endpoint=ASTRA_DB_API_ENDPOINT,
      namespace=ASTRA_DB_KEYSPACE,
      collection_vector_service_options=vectorize_options,
  )

  If you want to use a different collection name, replace langchain_integration_demo_vectorize with your desired collection name.

  Get embeddings directly from OpenAI:

  The following statements create a collection that doesn't use Astra DB's built-in OpenAI integration:

  integrate.py
  embedding = OpenAIEmbeddings()
  vector_store = AstraDBVectorStore(
      collection_name="langchain_integration_demo",
      embedding=embedding,
      token=ASTRA_DB_APPLICATION_TOKEN,
      api_endpoint=ASTRA_DB_API_ENDPOINT,
      namespace=ASTRA_DB_KEYSPACE,
  )
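A script that supports both embedding modes can branch on which secret is configured, mirroring the "at least one of" rule from the environment setup. A sketch of that selection logic (the choose_embedding_mode helper is illustrative, not part of langchain-astradb):

```python
def choose_embedding_mode(openai_api_key, astra_api_key_name):
    """Pick an embedding mode from the configured secrets.

    Returns "vectorize" when an Astra KMS API key name is set (built-in
    OpenAI integration) and "explicit" when an OpenAI key is set.
    Illustrative helper only.
    """
    if astra_api_key_name:
        return "vectorize"
    if openai_api_key:
        return "explicit"
    raise ValueError("Set OPENAI_API_KEY or ASTRA_DB_API_KEY_NAME")

print(choose_embedding_mode(None, "my_kms_key_name"))  # vectorize
print(choose_embedding_mode("sk-...", None))           # explicit
```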
- Load a small dataset of philosophical quotes from the mini-demo-astradb-langchain repository:

  integrate.py
  philo_dataset = requests.get(
      "https://raw.githubusercontent.com/"
      "datastaxdevs/mini-demo-astradb-langchain/"
      "refs/heads/main/data/philosopher-quotes.json"
  ).json()

  print("An example entry:")
  print(philo_dataset[16])
- Process the dataset, transforming it into ready-to-insert LangChain Document objects:

  integrate.py
  documents_to_insert = []
  for entry_idx, entry in enumerate(philo_dataset):
      metadata = {
          "author": entry["author"],
          **entry["metadata"],
      }
      # Construct the Document, with the quote and metadata tags
      new_document = Document(
          id=entry["_id"],
          page_content=entry["quote"],
          metadata=metadata,
      )
      documents_to_insert.append(new_document)

  print(f"Ready to insert {len(documents_to_insert)} documents.")
  print(f"Example document: {documents_to_insert[16]}")
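The metadata construction in this step merges the author field with each entry's existing metadata tags via dict unpacking. The same transformation applied to a plain-dict sample entry (the sample values are illustrative; the real dataset entries have the same `_id`, `author`, `quote`, and `metadata` fields):

```python
entry = {
    "_id": "q001",
    "author": "aristotle",
    "quote": "Quality is not an act, it is a habit.",
    "metadata": {"knowledge": "y", "ethics": "y"},
}

# "author" plus the unpacked per-entry tags, flattened into one dict
metadata = {"author": entry["author"], **entry["metadata"]}

print(metadata)  # {'author': 'aristotle', 'knowledge': 'y', 'ethics': 'y'}
```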
- Insert the documents. This step generates the vector embeddings, and then saves all entries in the vector store (your collection).

  integrate.py
  inserted_ids = vector_store.add_documents(documents_to_insert)
  print(f"\nInserted {len(inserted_ids)} documents: {', '.join(inserted_ids[:3])} ...")
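add_documents accepts the whole list and the library handles batching internally. If you ever want explicit control over batch size for a very large dataset, a generic chunking sketch (the chunked helper is illustrative, not a langchain-astradb API):

```python
def chunked(items, size):
    """Yield successive fixed-size slices of a list. Illustrative helper."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

docs = list(range(95))  # stand-in for documents_to_insert
batches = list(chunked(docs, 20))
print([len(batch) for batch in batches])  # [20, 20, 20, 20, 15]
```

Each batch could then be passed to vector_store.add_documents in turn.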
Verify the integration
To test the integration, find quotes semantically similar to a given input query:

integrate.py
results = vector_store.similarity_search("Our life is what we make of it", k=3)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
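Under the hood, similarity search embeds the query and ranks stored vectors by similarity; for Astra DB vector collections the default metric is cosine. A pure-Python sketch of cosine ranking over toy 3-dimensional vectors (real embeddings such as text-embedding-3-small have 1,536 dimensions):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0, 1.0]
stored = {
    "quote_a": [0.9, 0.1, 0.8],  # nearly parallel to the query
    "quote_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Rank stored vectors by similarity to the query, best match first
ranked = sorted(stored, key=lambda k: cosine_similarity(query, stored[k]), reverse=True)
print(ranked)  # ['quote_a', 'quote_b']
```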
Run the code
Run your complete script:
python integrate.py
If you encounter an error, make sure your credentials are valid, and compare your code to the sample scripts in the mini-demo-astradb-langchain GitHub repository.
If you are using the built-in OpenAI integration, make sure your database is in the API key’s scope in the integration settings.