Integrate LangChain with Astra DB Serverless


LangChain can use Astra DB Serverless to store documents and run similarity searches for GenAI applications.

You can build and run a script locally or use this guide’s Colab notebook.

For complete script examples, see the mini-demo-astradb-langchain GitHub repository. Two demo scripts are provided: one uses Astra DB’s built-in embedding provider integration, also known as a vectorize integration, and the other gets embeddings directly from the embedding provider.

Prerequisites

This guide requires the following:

  • An active Serverless (Vector) database.

  • An OpenAI account and an OpenAI API key.

    This guide uses OpenAI to generate embeddings. You can get embeddings directly from OpenAI, or you can use Astra DB’s built-in OpenAI embedding provider integration.

    If you want to use the built-in OpenAI integration, you must configure the OpenAI embedding provider integration before you begin. In the integration settings, note the API key name, and make sure that your database is in the key’s scope.

  • Python 3.9 or later, pip 23.0 or later, and the required LangChain Python packages:

    pip install \
        "langchain>=0.3,<0.4" \
        "langchain-astradb>=0.6,<0.7" \
        "langchain-openai>=0.3,<0.4"
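Before installing, you can confirm that your interpreter meets the Python 3.9 requirement. A minimal standard-library check (this snippet is illustrative and not part of the demo scripts):

```python
import sys

# The LangChain packages above require Python 3.9 or later; fail fast if not met.
major, minor = sys.version_info[:2]
if (major, minor) < (3, 9):
    raise SystemExit(f"Python 3.9+ required, found {major}.{minor}")
print(f"Python {major}.{minor} OK")
```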

Connect to your Astra DB database

  1. Get an application token and Data API endpoint for your database:

    1. In the Astra Portal navigation menu, click Databases, and then click the name of your Serverless (Vector) database.

    2. On the Overview tab, find the Database Details section.

    3. In API Endpoint, click Copy to get your database’s Data API endpoint in the form of https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com.

    4. Click Generate Token to create an application token scoped to your database.

  2. Set secrets and connection parameters.

    • Local install

      Create a .env file in the folder where you will create your Python script:

      .env
      ASTRA_DB_APPLICATION_TOKEN="TOKEN"
      ASTRA_DB_API_ENDPOINT="API_ENDPOINT"

      # Optional. A keyspace that exists in the database.
      ASTRA_DB_KEYSPACE="default_keyspace"

      # At least one of:
      OPENAI_API_KEY="API_KEY"
      ASTRA_DB_API_KEY_NAME="ASTRA_KMS_API_KEY_NAME"

      Replace the following:

      • TOKEN: Your Astra DB application token

      • API_ENDPOINT: Your database’s Data API endpoint

      • API_KEY or ASTRA_KMS_API_KEY_NAME: Provide at least one of these values to connect to OpenAI and generate embeddings.

        • If you want to get embeddings directly from OpenAI, replace API_KEY with a secure reference to your OpenAI API key.

        • If you want to use Astra DB’s built-in OpenAI integration, replace ASTRA_KMS_API_KEY_NAME with the API Key Name from the OpenAI embedding provider integration settings.

      • default_keyspace: If you want to use a different keyspace, replace this with the name of another keyspace in your database.

    • Google Colab

      In Colab, set the values interactively:

      import os
      from getpass import getpass

      os.environ["ASTRA_DB_API_ENDPOINT"] = input("ASTRA_DB_API_ENDPOINT =")
      os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass("ASTRA_DB_APPLICATION_TOKEN =")

      if _keyspace := input("ASTRA_DB_KEYSPACE (optional) ="):
          os.environ["ASTRA_DB_KEYSPACE"] = _keyspace

      os.environ["ASTRA_DB_API_KEY_NAME"] = input("ASTRA_DB_API_KEY_NAME (required for 'vectorize') =")
      os.environ["OPENAI_API_KEY"] = getpass("OPENAI_API_KEY (required for explicit embeddings) =")
  3. Import your dependencies.

    • Local install

      To avoid a namespace collision, don’t name the file langchain.py.

      integrate.py
      import os
      import requests

      from astrapy.info import VectorServiceOptions
      from langchain_astradb import AstraDBVectorStore

      from langchain_core.documents import Document
      from langchain_openai import OpenAIEmbeddings

      from dotenv import load_dotenv

    • Google Colab

      import requests

      from astrapy.info import VectorServiceOptions
      from langchain_astradb import AstraDBVectorStore

      from langchain_core.documents import Document
      from langchain_openai import OpenAIEmbeddings
  4. Load your environment variables.

    • Local install

      integrate.py
      load_dotenv()

      ASTRA_DB_APPLICATION_TOKEN = os.environ["ASTRA_DB_APPLICATION_TOKEN"]
      ASTRA_DB_API_ENDPOINT = os.environ["ASTRA_DB_API_ENDPOINT"]
      ASTRA_DB_KEYSPACE = os.environ.get("ASTRA_DB_KEYSPACE")
      ASTRA_DB_API_KEY_NAME = os.environ.get("ASTRA_DB_API_KEY_NAME") or None
      OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") or None

    • Google Colab

      ASTRA_DB_APPLICATION_TOKEN = os.environ["ASTRA_DB_APPLICATION_TOKEN"]
      ASTRA_DB_API_ENDPOINT = os.environ["ASTRA_DB_API_ENDPOINT"]
      ASTRA_DB_KEYSPACE = os.environ.get("ASTRA_DB_KEYSPACE")
      ASTRA_DB_API_KEY_NAME = os.environ.get("ASTRA_DB_API_KEY_NAME") or None
      OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") or None
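Because only one of OPENAI_API_KEY and ASTRA_DB_API_KEY_NAME is strictly required, it can help to fail fast when neither is set. A minimal sketch of such a guard (the helper name check_embedding_config is hypothetical and not part of the demo scripts):

```python
def check_embedding_config(openai_api_key, astra_api_key_name):
    """Return which embedding path is configured, or raise if neither is set."""
    if astra_api_key_name:
        return "vectorize"   # use Astra DB's built-in OpenAI integration
    if openai_api_key:
        return "explicit"    # fetch embeddings directly from OpenAI
    raise RuntimeError(
        "Set OPENAI_API_KEY or ASTRA_DB_API_KEY_NAME to generate embeddings."
    )

# Example with hypothetical values:
print(check_embedding_config("sk-example", None))  # explicit
```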

Create embeddings from text

  1. Create a vector store.

    You must specify your database and a collection name. The collection is created automatically if it does not exist.

    • Use the built-in OpenAI integration

    • Get embeddings directly from OpenAI

    The following statements create a collection that uses Astra DB’s built-in OpenAI integration to generate embeddings:

    integrate.py
    vectorize_options = VectorServiceOptions(
        provider="openai",
        model_name="text-embedding-3-small",
        authentication={"providerKey": ASTRA_DB_API_KEY_NAME},
    )
    vector_store = AstraDBVectorStore(
        collection_name="langchain_integration_demo_vectorize",
        token=ASTRA_DB_APPLICATION_TOKEN,
        api_endpoint=ASTRA_DB_API_ENDPOINT,
        namespace=ASTRA_DB_KEYSPACE,
        collection_vector_service_options=vectorize_options,
    )

    If you want to use a different collection name, replace langchain_integration_demo_vectorize with your desired collection name.

    The following statements create a collection that doesn’t use Astra DB’s built-in OpenAI integration:

    integrate.py
    embedding = OpenAIEmbeddings()
    vector_store = AstraDBVectorStore(
        collection_name="langchain_integration_demo",
        embedding=embedding,
        token=ASTRA_DB_APPLICATION_TOKEN,
        api_endpoint=ASTRA_DB_API_ENDPOINT,
        namespace=ASTRA_DB_KEYSPACE,
    )
  2. Load a small dataset of philosophical quotes from the mini-demo-astradb-langchain repository:

    integrate.py
    philo_dataset = requests.get(
        "https://raw.githubusercontent.com/"
        "datastaxdevs/mini-demo-astradb-langchain/"
        "refs/heads/main/data/philosopher-quotes.json"
    ).json()
    
    print("An example entry:")
    print(philo_dataset[16])
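Each entry in the downloaded list is a JSON object whose shape can be inferred from the processing step that follows. The sketch below shows a hypothetical entry with made-up values; only the field names (_id, author, quote, metadata) are taken from the guide:

```python
# Hypothetical entry illustrating the dataset's shape; the field names match
# those read in the processing step, but the values here are invented.
example_entry = {
    "_id": "example_001",
    "author": "some_author",
    "quote": "An illustrative quote string.",
    "metadata": {"knowledge": "y"},  # tag names in the real data may differ
}

# The processing step relies on exactly these top-level keys:
print(sorted(example_entry))
```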
  3. Process the dataset, transforming it into ready-to-insert LangChain Document objects:

    integrate.py
    documents_to_insert = []
    
    for entry_idx, entry in enumerate(philo_dataset):
        metadata = {
            "author": entry["author"],
            **entry["metadata"],
        }
        # Construct the Document, with the quote and metadata tags
        new_document = Document(
            id=entry["_id"],
            page_content=entry["quote"],
            metadata=metadata,
        )
        documents_to_insert.append(new_document)
    
    print(f"Ready to insert {len(documents_to_insert)} documents.")
    print(f"Example document: {documents_to_insert[16]}")
  4. Insert the documents. This step generates vector embeddings and then saves all entries in the vector store (your collection).

    integrate.py
    inserted_ids = vector_store.add_documents(documents_to_insert)
    
    print(f"\nInserted {len(inserted_ids)} documents: {', '.join(inserted_ids[:3])} ...")

Verify the integration

To test the integration, find quotes semantically similar to a given input query:

integrate.py
results = vector_store.similarity_search("Our life is what we make of it", k=3)

for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
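Beyond printing, you can post-process results using their metadata. The sketch below groups matches by their author tag; it uses plain dicts with made-up values as stand-ins for the Document objects that similarity_search actually returns:

```python
from collections import defaultdict

# Stand-ins for returned Document objects; page_content and metadata mirror
# the fields used above, but the values here are invented.
search_results = [
    {"page_content": "quote one", "metadata": {"author": "plato"}},
    {"page_content": "quote two", "metadata": {"author": "plato"}},
    {"page_content": "quote three", "metadata": {"author": "kant"}},
]

# Group matched quotes by author.
by_author = defaultdict(list)
for res in search_results:
    by_author[res["metadata"]["author"]].append(res["page_content"])

for author, quotes in sorted(by_author.items()):
    print(f"{author}: {len(quotes)} match(es)")
```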

Run the code

Run your complete script:

python integrate.py

If you encounter an error, make sure your credentials are valid, and compare your code to the sample scripts in the mini-demo-astradb-langchain GitHub repository.

If you are using the built-in OpenAI integration, make sure your database is in the API key’s scope in the integration settings.
