Store Embeddings

We recommend LangChain’s OpenAIEmbeddings class for generating the embeddings that you store in a vector store.

We recommend DataStax Astra DB Serverless to store your embeddings. Astra DB Serverless integrates with LangChain as a vector store using the AstraPy client.
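Conceptually, a vector store keeps each document's embedding next to its text and answers a query by returning the stored documents whose vectors are closest to the query vector. The following is a minimal, self-contained sketch of that idea. It assumes cosine similarity as the closeness measure and uses tiny hand-written vectors in place of real OpenAI embeddings. `TinyVectorStore` is a hypothetical illustration, not the Astra DB or LangChain API.

```python
import math
import uuid


class TinyVectorStore:
    """Hypothetical in-memory vector store, for illustration only."""

    def __init__(self):
        self._rows = []  # list of (id, text, vector) tuples

    def add(self, texts, vectors):
        """Store each text with its vector and return one generated ID per text."""
        ids = []
        for text, vector in zip(texts, vectors):
            row_id = str(uuid.uuid4())
            self._rows.append((row_id, text, vector))
            ids.append(row_id)
        return ids

    @staticmethod
    def _cosine(a, b):
        # Cosine similarity: dot product divided by the product of the vector norms.
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query_vector, k=1):
        """Return the k stored texts most similar to query_vector."""
        ranked = sorted(
            self._rows,
            key=lambda row: self._cosine(query_vector, row[2]),
            reverse=True,
        )
        return [text for _, text, _ in ranked[:k]]


# Toy 3-dimensional "embeddings"; real OpenAI embeddings have far more dimensions.
store = TinyVectorStore()
ids = store.add(
    ["cats purr", "dogs bark", "stocks fell"],
    [[0.9, 0.1, 0.0], [0.8, 0.3, 0.0], [0.0, 0.1, 0.9]],
)
print(len(ids))
print(store.search([0.0, 0.2, 0.95], k=1))
```

In the real setup, OpenAIEmbeddings produces the vectors, and Astra DB Serverless stores them and runs the similarity search on the server.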

Prerequisites

You will need a vector-enabled Astra DB Serverless database and an OpenAI account.

See the Notebook Prerequisites page for more details.

  1. Create a vector-enabled Astra DB Serverless database.

  2. Create an OpenAI account.

  3. Within your database, create an Astra DB keyspace.

  4. Within your database, create an Astra DB Access Token with Database Administrator permissions.

  5. Get your Astra DB Serverless API Endpoint: https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com

  6. Initialize the environment variables in a .env file.

    ASTRA_DB_APPLICATION_TOKEN=AstraCS:...
    ASTRA_DB_API_ENDPOINT=https://9d9b9999-999e-9999-9f9a-9b99999dg999-us-east-2.apps.astra.datastax.com
    ASTRA_DB_COLLECTION=test
    OPENAI_API_KEY=sk-f99...
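
  7. In your Python code, load your settings for Astra DB Serverless and OpenAI from the environment:

    import os
    from dotenv import load_dotenv

    load_dotenv()  # read the variables from the .env file into the environment
    astra_token = os.getenv("ASTRA_DB_APPLICATION_TOKEN")
    astra_endpoint = os.getenv("ASTRA_DB_API_ENDPOINT")
    collection = os.getenv("ASTRA_DB_COLLECTION")
    openai_api_key = os.getenv("OPENAI_API_KEY")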
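Before connecting, it can help to fail fast when a setting is missing or when the endpoint does not match the https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com pattern from step 5. The following standard-library-only sketch does both. The helper names `check_settings` and `parse_endpoint` are hypothetical, and the endpoint regular expression assumes the database ID is a lowercase hexadecimal UUID; it is not an official validation rule.

```python
import os
import re

# Variable names taken from the .env file above.
REQUIRED_VARS = (
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_COLLECTION",
    "OPENAI_API_KEY",
)

# Assumed endpoint shape: https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com,
# where ASTRA_DB_ID is a UUID (8-4-4-4-12 lowercase hex digits).
ENDPOINT_RE = re.compile(
    r"^https://(?P<db_id>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"-(?P<region>[a-z0-9-]+)\.apps\.astra\.datastax\.com/?$"
)


def check_settings(env=os.environ):
    """Return the required settings, raising if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing settings: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}


def parse_endpoint(url):
    """Split an Astra DB API endpoint into its (database ID, region) parts."""
    match = ENDPOINT_RE.match(url)
    if match is None:
        raise ValueError(f"Unexpected endpoint format: {url}")
    return match.group("db_id"), match.group("region")
```

For example, call `check_settings()` right after `load_dotenv()` so that a missing key produces a clear message instead of a connection or authentication error later.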

Store embeddings in the vector-enabled Astra DB Serverless database

The following code embeds the Documents produced in the Split Documents example with OpenAIEmbeddings and stores the resulting embeddings in the Astra DB Serverless vector store.

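    import os

    from dotenv import load_dotenv
    from langchain_astradb import AstraDBVectorStore
    from langchain_openai import OpenAIEmbeddings

    # Load ASTRA_DB_* and OPENAI_API_KEY from the .env file.
    load_dotenv()

    ASTRA_DB_COLLECTION = os.environ["ASTRA_DB_COLLECTION"]

    # OpenAIEmbeddings reads OPENAI_API_KEY from the environment.
    embedding = OpenAIEmbeddings()
    vstore = AstraDBVectorStore(
        embedding=embedding,
        collection_name=ASTRA_DB_COLLECTION,  # same collection that is queried below
        token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
        api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    )

    # Replace with the split Documents from the Split Documents example.
    docs = []
    # Embed each Document and write it to the collection; returns one ID per Document.
    inserted_ids = vstore.add_documents(docs)
    print(f"\nInserted {len(inserted_ids)} documents.")

    # Inspect the stored documents (raw response of the collection's find() call).
    print(vstore.astra_db.collection(ASTRA_DB_COLLECTION).find())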
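To confirm that the stored embeddings are searchable, you can run a similarity query against the store. The sketch below assumes the `vstore` object from the example above and uses `similarity_search(query, k=...)`, a method of LangChain's `VectorStore` interface that returns a list of `Document` objects. The helper name `preview_matches` is hypothetical.

```python
def preview_matches(vstore, query, k=3):
    """Run a similarity search and return the texts of the matching Documents.

    vstore is expected to be a LangChain vector store, such as the
    AstraDBVectorStore created above; only similarity_search is used.
    """
    results = vstore.similarity_search(query, k=k)
    for rank, doc in enumerate(results, start=1):
        # Print only the first 80 characters of each matching chunk.
        print(f"{rank}. {doc.page_content[:80]}")
    return [doc.page_content for doc in results]
```

For example, `preview_matches(vstore, "What is a vector database?")`. Each call embeds the query text with OpenAIEmbeddings before searching, so it needs a valid OPENAI_API_KEY.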


© 2024 DataStax
