Integrate LangChain with Astra DB Serverless (Vector)


The DataStax Astra DB Serverless (Vector) documentation site is currently in Public Preview and is provided on an “AS IS” basis, without warranty or indemnity of any kind. For more, see the DataStax Preview Terms.

LangChain can use Astra DB Serverless (Vector) to store and retrieve vectors for ML applications.

To get started, you need an active Astra account.

Set up the integration

Install dependencies

  1. Verify that pip is version 23.0 or higher.

    pip --version
  2. Upgrade pip if needed.

    python -m pip install --upgrade pip
  3. Install the dependencies. The pinned versions below require Python 3.8 or higher.

    pip install "langchain==0.0.339" "astrapy==0.6.0" \
        "datasets==2.14.7" "openai==1.3.0" "pypdf==3.17.1" \
        "tiktoken==0.5.1"
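As an optional sanity check (a sketch, not part of the tutorial code), you can confirm that each dependency is importable on your Python path before continuing:

```python
import importlib.util

# The packages installed by the pip command above
PACKAGES = ["langchain", "astrapy", "datasets", "openai", "pypdf", "tiktoken"]

# find_spec returns None when a package cannot be located
status = {pkg: importlib.util.find_spec(pkg) is not None for pkg in PACKAGES}
for pkg, ok in status.items():
    print(f"{pkg}: {'found' if ok else 'MISSING'}")
```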

Create a vector database

  1. Create a vector-enabled Astra database in the Astra Portal, and note your database’s API endpoint URL.

  2. Create a token with Database Administrator permissions in the Astra Connect tab.

  3. Set your environment variables.

    • Local install

      Create a .env file in the root of your project with the values from your Astra Connect tab.

      .env
      ASTRA_DB_APPLICATION_TOKEN="<AstraCS:...>"
      ASTRA_DB_API_ENDPOINT="<Astra DB API endpoint>"
      OPENAI_API_KEY="sk-..."

    • Google Colab

      Set the values in your notebook environment, entering them when prompted.

      import os
      from getpass import getpass

      os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass("ASTRA_DB_APPLICATION_TOKEN = ")
      os.environ["ASTRA_DB_API_ENDPOINT"] = input("ASTRA_DB_API_ENDPOINT = ")
      os.environ["OPENAI_API_KEY"] = getpass("OPENAI_API_KEY = ")

    The endpoint format is https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com.

  4. Import your dependencies.

    • Local install

      integrate.py
      import os
      import langchain.vectorstores
      from langchain.schema import Document
      from langchain.embeddings import OpenAIEmbeddings

      from datasets import load_dataset
      from dotenv import load_dotenv

    • Google Colab

      import langchain.vectorstores
      from langchain.schema import Document
      from langchain.embeddings import OpenAIEmbeddings

      from datasets import load_dataset
  5. Load your environment variables.

    • Local install

      load_dotenv()

      ASTRA_DB_APPLICATION_TOKEN = os.environ.get("ASTRA_DB_APPLICATION_TOKEN")
      ASTRA_DB_API_ENDPOINT = os.environ.get("ASTRA_DB_API_ENDPOINT")
      OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

    • Google Colab

      The variables were set directly in os.environ in step 3, so no file loading is needed.

      ASTRA_DB_APPLICATION_TOKEN = os.environ.get("ASTRA_DB_APPLICATION_TOKEN")
      ASTRA_DB_API_ENDPOINT = os.environ.get("ASTRA_DB_API_ENDPOINT")
      OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

    See Advanced configuration for Azure OpenAI values.

    Don’t name your file langchain.py; it would shadow the langchain package and cause a namespace collision.
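Either way, the same three variables end up in scope. As an optional sketch (the helper names here are our own, not part of the integration), you can fail fast on missing values and loosely check the endpoint against the documented shape:

```python
import os
import re

# Documented endpoint shape:
# https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com
ENDPOINT_RE = re.compile(
    r"^https://[0-9a-f-]+-[a-z0-9-]+\.apps\.astra\.datastax\.com$"
)

def missing_vars(names):
    """Return the required environment variables that are unset or empty."""
    return [n for n in names if not os.environ.get(n)]

def looks_like_astra_endpoint(url):
    """Loose check that url matches the documented API endpoint shape."""
    return bool(ENDPOINT_RE.match(url))

REQUIRED = ["ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT", "OPENAI_API_KEY"]
print("missing:", missing_vars(REQUIRED))
```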

Create embeddings from text

  1. Use LangChain to create embeddings from text.

    integrate.py
    embedding = OpenAIEmbeddings()
    vstore = langchain.vectorstores.AstraDB(
        embedding=embedding,
        collection_name="test",
        token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
        api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    )
  2. Load a small dataset of philosophical quotes with the Hugging Face datasets library.

    integrate.py
    philo_dataset = load_dataset("datastax/philosopher-quotes")["train"]
    print("An example entry:")
    print(philo_dataset[16])
  3. Process metadata and convert to LangChain documents.

    integrate.py
    docs = []
    for entry in philo_dataset:
        metadata = {"author": entry["author"]}
        if entry["tags"]:
            # Add metadata tags to the metadata dictionary
            for tag in entry["tags"].split(";"):
                metadata[tag] = "y"
        # Add a LangChain document with the quote and metadata tags
        doc = Document(page_content=entry["quote"], metadata=metadata)
        docs.append(doc)
  4. Compute embeddings for each document and store in the vector database.

    integrate.py
    inserted_ids = vstore.add_documents(docs)
    print(f"\nInserted {len(inserted_ids)} documents.")
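The tag handling in step 3 is easiest to see on a single row. Here is a sketch with a hypothetical entry shaped like the dataset’s rows (no download required):

```python
# Hypothetical entry shaped like a philosopher-quotes row
entry = {"author": "aristotle", "quote": "An example quote.", "tags": "knowledge;ethics"}

metadata = {"author": entry["author"]}
if entry["tags"]:
    # Each semicolon-separated tag becomes its own metadata key with value "y"
    for tag in entry["tags"].split(";"):
        metadata[tag] = "y"

print(metadata)
# {'author': 'aristotle', 'knowledge': 'y', 'ethics': 'y'}
```

Storing each tag as its own key makes the tags usable later as metadata filters on similarity searches.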

Verify integration

Show quotes that are similar to a specific quote.

integrate.py
results = vstore.similarity_search("Our life is what we make of it", k=3)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

Run the code

Run the code you defined above.

python integrate.py

Advanced configuration

If you’re using Azure OpenAI, include these additional environment variables:

OPENAI_API_TYPE="azure"
OPENAI_API_VERSION="2023-05-15"
OPENAI_API_BASE="https://<your resource name>.openai.azure.com"
OPENAI_API_KEY="<openai-api-key>"

Next steps

  • Tutorial: Build a chatbot with LangChain

    Learn how to use Astra DB Serverless (Vector) with LangChain to do retrieval augmented generation (RAG) on a documentation site.
