Integrate LangChain.js with Astra DB Serverless


LangChain.js can use Astra DB Serverless to store and retrieve vectors for ML applications.

Prerequisites

This guide requires the following:

  * An active Astra DB Serverless database, along with its API endpoint and an application token.
  * An OpenAI API key.
  * Node.js and npm, with the TypeScript compiler (tsc) available.
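
The exact package set depends on your project, but based on the imports used in this guide, an install along these lines should work. (@datastax/astra-db-ts is the Data API client that the LangChain Astra DB vector store builds on.)

    npm install langchain @langchain/core @langchain/openai @langchain/community @datastax/astra-db-ts
    npm install --save-dev typescript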

Connect to the database

  1. Export the environment variables.

    export ASTRA_DB_API_ENDPOINT=API_ENDPOINT
    export ASTRA_DB_APPLICATION_TOKEN=TOKEN
    export ASTRA_DB_KEYSPACE=default_keyspace # A keyspace that exists in your database
    export ASTRA_DB_COLLECTION=COLLECTION_NAME # Your database collection
    export OPENAI_API_KEY=API_KEY

    The endpoint format is https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com. An optional sanity check for these variables is sketched at the end of this section.

  2. Create load.ts in the src/ directory.

  3. Import your dependencies.

    load.ts
    import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
    import { TextLoader } from "langchain/document_loaders/fs/text";
    import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
    import { OpenAIEmbeddings } from "@langchain/openai";
    import {
      AstraDBVectorStore,
      AstraLibArgs,
    } from "@langchain/community/vectorstores/astradb";
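
    Optionally, you can fail fast when a required environment variable is missing. The following is a minimal sketch (not part of the original imports); place it near the top of load.ts if you want the check.

    // Optional: verify required environment variables up front so a
    // missing value fails fast with a clear message instead of a
    // confusing error deeper in the run.
    for (const name of [
      "ASTRA_DB_API_ENDPOINT",
      "ASTRA_DB_APPLICATION_TOKEN",
      "OPENAI_API_KEY",
    ]) {
      if (!process.env[name]) {
        throw new Error(`Missing required environment variable: ${name}`);
      }
    }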

Split and embed text files

  1. Create a load_docs function to load and split text files with LangChain.

    Declaring the function as async allows the use of await to make non-blocking calls while loading and splitting the files.

    This function loads .txt files from the FILE_PATH directory and splits them into chunks of 1000 characters with a 15-character overlap.

  2. Download the text of Edgar Allan Poe’s "The Cask of Amontillado" as a sample document, saving it into the src/sample directory that load.ts reads from.

    mkdir -p src/sample
    curl https://raw.githubusercontent.com/CassioML/cassio-website/main/docs/frameworks/langchain/texts/amontillado.txt \
      --output src/sample/amontillado.txt
  3. Complete load.ts with the following code.

    load.ts
    const FILE_PATH = "./src/sample";
    const OPENAI_API_KEY = process.env['OPENAI_API_KEY'];
    
    async function load_docs() {
      const loader = new DirectoryLoader(FILE_PATH, {
        ".txt": (path) => new TextLoader(path),
      });
      const docs = await loader.load();
    
      const splitter = new RecursiveCharacterTextSplitter({
        chunkSize: 1000,
        chunkOverlap: 15,
      });
    
      const texts = await splitter.splitDocuments(docs);
      console.log("Loaded ", texts.length, " documents.");
      return texts;
    }
    
    load_docs().catch(console.error);
  4. Create a getVectorStore function to embed the splits in a vector store.

    Exporting the getVectorStore function and storing vectorStore as a promise allows you to use the vectorStore later without initializing it again.

    load.ts
    let vectorStorePromise;
    
    export async function getVectorStore() {
      if (!vectorStorePromise) {
        vectorStorePromise = (async () => {
          try {
            const texts = await load_docs();
    
            // Specify the database and collection to use.
            // If the collection does not exist, it is created automatically.
            const astraConfig: AstraLibArgs = {
              token: process.env.ASTRA_DB_APPLICATION_TOKEN as string,
              endpoint: process.env.ASTRA_DB_API_ENDPOINT as string,
              namespace: process.env.ASTRA_DB_KEYSPACE as string,
              collection: process.env.ASTRA_DB_COLLECTION ?? "vector_test",
              collectionOptions: {
                vector: {
                  dimension: 1536,
                  metric: "cosine",
                },
              },
            };
    
            // Initialize the vector store. fromDocuments embeds the
            // documents and writes them to the collection, so no
            // separate addDocuments call is needed afterward.
            const vectorStore = await AstraDBVectorStore.fromDocuments(
              texts,
              new OpenAIEmbeddings({ openAIApiKey: OPENAI_API_KEY, batchSize: 512 }),
              astraConfig
            );
            console.log(`Vector store initialized with ${texts.length} documents.`);
            return vectorStore;
          } catch (error) {
            console.error("Error initializing vector store:", error);
            throw error;
          }
        })();
      }
      return vectorStorePromise;
    }
  5. Compile and run the code.

    tsc src/load.ts
    node src/load.js

Query your documents

  1. Create query.ts in the src/ directory.

  2. Create a query function so you can ask your documents a question. This query is defined in a separate file so you can tune your query and prompt separately from data loading.

  3. Import the required libraries. The getVectorStore function you created earlier loads your documents, stores their embeddings in the vector database, and returns the vector store for querying.

    query.ts
    import { ChatOpenAI } from "@langchain/openai";
    import { formatDocumentsAsString } from "langchain/util/document";
    import { RunnablePassthrough, RunnableSequence } from "@langchain/core/runnables";
    import { StringOutputParser } from "@langchain/core/output_parsers";
    import {
      ChatPromptTemplate,
      HumanMessagePromptTemplate,
      SystemMessagePromptTemplate,
    } from "@langchain/core/prompts";
    
    import { getVectorStore } from './load';
  4. Complete the query function. This function sets the vector store as the retriever, defines a prompt, runs the chain with your query and context from your vector database, and logs the response.

    query.ts
    async function query() {
      const vectorStore = await getVectorStore();
      const vectorStoreRetriever = vectorStore.asRetriever();
      const model = new ChatOpenAI({});
      const SYSTEM_TEMPLATE = `Use the following pieces of context to answer the question at the end.
      If you don't know the answer, just say that you don't know, don't try to make up an answer.
      ----------------
      {context}`;
    
      const messages = [
        SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
        HumanMessagePromptTemplate.fromTemplate("{question}"),
      ];
    
      const prompt = ChatPromptTemplate.fromMessages(messages);
    
      const chain = RunnableSequence.from([
        {
          context: vectorStoreRetriever.pipe(formatDocumentsAsString),
          question: new RunnablePassthrough(),
        },
        prompt,
        model,
        new StringOutputParser(),
      ]);
    
      const answer = await chain.invoke("What is this story about?");
    
      console.log({ answer });
    }
    
    query().catch(console.error);
  5. Compile and run the code.

    tsc src/query.ts
    node src/query.js

    If you get a TOO_MANY_COLLECTIONS error, your database has reached its collection limit. Use the Data API command below, or see the documentation on deleting an existing collection, to delete a collection and make room.

    curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
      --header "Token: ASTRA_DB_APPLICATION_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
        "deleteCollection": {
          "name": "COLLECTION_NAME"
        }
      }'
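
    If you prefer to stay in TypeScript, a rough equivalent using the @datastax/astra-db-ts client is sketched below. The file name delete-collection.ts is arbitrary, and you should verify the client calls against the version you have installed.

    // delete-collection.ts -- drop a collection to free up room.
    // Sketch only; assumes the same environment variables as above.
    import { DataAPIClient } from "@datastax/astra-db-ts";

    const client = new DataAPIClient(process.env.ASTRA_DB_APPLICATION_TOKEN as string);
    const db = client.db(process.env.ASTRA_DB_API_ENDPOINT as string);

    async function dropCollection(name: string) {
      await db.dropCollection(name);
      console.log(`Dropped collection: ${name}`);
    }

    dropCollection(process.env.ASTRA_DB_COLLECTION as string).catch(console.error);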

Complete code examples

load.ts
// Importing necessary modules and classes
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "@langchain/openai";
import { AstraDBVectorStore, AstraLibArgs } from "@langchain/community/vectorstores/astradb";

// Constants and Environment Variables
const FILE_PATH = "./src/sample";
const OPENAI_API_KEY = process.env['OPENAI_API_KEY'];

/**
 * Load and split documents from the local directory.
 * @returns {Promise<Array<Document>>} An array of split documents.
 */
async function loadDocs() {
  try {
    const loader = new DirectoryLoader(FILE_PATH, { ".txt": path => new TextLoader(path) });
    const docs = await loader.load();

    const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 15 });
    const texts = await splitter.splitDocuments(docs);

    console.log(`Loaded ${texts.length} documents.`);
    return texts;
  } catch (error) {
    console.error('Error loading documents:', error);
    throw error;
  }
}

// Load documents and handle any errors
loadDocs().catch(error => console.error('Failed to load documents:', error));

// Variable to store the vector store promise
let vectorStorePromise;

/**
 * Initialize and get the vector store as a promise.
 * @returns {Promise<AstraDBVectorStore>} A promise that resolves to the AstraDBVectorStore.
 */
export async function getVectorStore() {
  if (!vectorStorePromise) {
    vectorStorePromise = initVectorStore();
  }
  return vectorStorePromise;
}

async function initVectorStore() {
  try {
    const texts = await loadDocs();
    const astraConfig = getAstraConfig();

    // Initialize the vector store. fromDocuments embeds the documents
    // and writes them to the collection, so no separate addDocuments
    // call is needed afterward.
    const vectorStore = await AstraDBVectorStore.fromDocuments(
      texts, new OpenAIEmbeddings({ openAIApiKey: OPENAI_API_KEY, batchSize: 512 }), astraConfig
    );

    console.log(`Vector store initialized with ${texts.length} documents.`);
    return vectorStore;
  } catch (error) {
    console.error('Error initializing vector store:', error);
    throw error;
  }
}

// Specify the database and collection to use.
// If the collection does not exist, it is created automatically.
function getAstraConfig() {
  return {
    token: process.env.ASTRA_DB_APPLICATION_TOKEN as string,
    endpoint: process.env.ASTRA_DB_API_ENDPOINT as string,
    namespace: process.env.ASTRA_DB_KEYSPACE as string,
    collection: process.env.ASTRA_DB_COLLECTION ?? "vector_test",
    collectionOptions: {
      vector: {
        dimension: 1536,
        metric: "cosine",
      },
    },
  } as AstraLibArgs;
}
query.ts
// Importing necessary modules and classes
import { ChatOpenAI } from "@langchain/openai";
import { formatDocumentsAsString } from "langchain/util/document";
import { RunnablePassthrough, RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import {
  ChatPromptTemplate,
  HumanMessagePromptTemplate,
  SystemMessagePromptTemplate,
} from "@langchain/core/prompts";
import { getVectorStore } from './load';

// Constants for templates
const SYSTEM_TEMPLATE = `Use the following pieces of context to answer the question at the end.
  If you don't know the answer, just say that you don't know, don't try to make up an answer.
  ----------------
  {context}`;

/**
 * Query the vector store for the context and pass the context with your query to the LLM.
 */
async function queryVectorStore() {
  try {
    const vectorStore = await getVectorStore();
    const retriever = vectorStore.asRetriever();
    const openAIModel = new ChatOpenAI({});

    const messages = [
      SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
      HumanMessagePromptTemplate.fromTemplate("{question}"),
    ];

    const prompt = ChatPromptTemplate.fromMessages(messages);

    const chain = RunnableSequence.from([
      {
        context: retriever.pipe(formatDocumentsAsString),
        question: new RunnablePassthrough(),
      },
      prompt,
      openAIModel,
      new StringOutputParser(),
    ]);

    const query = "What is this story about?";
    const answer = await chain.invoke(query);

    console.log({ answer });
  } catch (error) {
    console.error('Error during vector store query:', error);
    throw error;
  }
}

// Run the query function and handle errors
queryVectorStore().catch(error => console.error('Failed to run query:', error));
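
As a quick way to inspect what the retriever returns, you can also query the vector store directly, without the LLM chain. This optional sketch uses LangChain's standard similaritySearch method; the file name similarity-check.ts is arbitrary.

// similarity-check.ts -- inspect raw retrieval results without the LLM.
import { getVectorStore } from './load';

async function inspectMatches() {
  const vectorStore = await getVectorStore();
  // Fetch the four chunks closest to the query; no LLM call is involved.
  const results = await vectorStore.similaritySearch("What is this story about?", 4);
  for (const doc of results) {
    console.log(doc.pageContent.slice(0, 120), "...");
  }
}

inspectMatches().catch(console.error);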
