Integrate LangChain.js with Astra DB Serverless
LangChain.js can use Astra DB Serverless to store and retrieve vectors for ML applications.
Prerequisites
This guide requires the following:
- An active Astra account
- An active Serverless (Vector) database
- An application token with the Database Administrator role
- Node.js 16.20.2 or later, and the required dependencies:

  ```bash
  npm install @datastax/astra-db-ts@latest \
    @langchain/openai@0.0.10 \
    langchain@0.1.1 \
    tsx
  ```
Connect to the database
- Export the environment variables.

  ```bash
  export ASTRA_DB_API_ENDPOINT=API_ENDPOINT
  export ASTRA_DB_APPLICATION_TOKEN=TOKEN
  export ASTRA_DB_KEYSPACE=default_keyspace # A keyspace that exists in your database
  export ASTRA_DB_COLLECTION=COLLECTION_NAME # Your database collection
  export OPENAI_API_KEY=API_KEY
  ```

  The endpoint format is `https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com`.
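  If you want the scripts to fail fast when a variable is missing, you can validate these values at startup. This is a minimal sketch, not part of this guide's files; the `requireEnv` helper name is hypothetical.

  ```typescript
  // Hypothetical startup check: the variable names match the exports above.
  function requireEnv(name: string): string {
    const value = process.env[name];
    if (!value) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  }

  const endpoint = requireEnv("ASTRA_DB_API_ENDPOINT");
  const token = requireEnv("ASTRA_DB_APPLICATION_TOKEN");
  console.log(`Connecting to ${endpoint}`);
  ```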
- Create `load.ts` in the `src/` directory.
- Import your dependencies.

  load.ts
  ```typescript
  import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
  import { TextLoader } from "langchain/document_loaders/fs/text";
  import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
  import { OpenAIEmbeddings } from "@langchain/openai";
  import {
    AstraDBVectorStore,
    AstraLibArgs,
  } from "@langchain/community/vectorstores/astradb";
  ```
Split and embed text files
- Create a `load_docs` function to load and split text files with LangChain. Wrapping the function in `async` allows the use of `await` to execute non-blocking calls to the database. This function loads .txt files from the `FILE_PATH` directory and splits them into chunks of 1000 characters with a 15-character overlap.

- Download the text of Edgar Allan Poe's "The Cask of Amontillado" for a sample document, and place it in the `./src/sample` directory so `load_docs` can find it.
  ```bash
  curl https://raw.githubusercontent.com/CassioML/cassio-website/main/docs/frameworks/langchain/texts/amontillado.txt \
    --output amontillado.txt
  ```
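  Before loading the real file, you can see how `chunkSize` and `chunkOverlap` interact with a small standalone sketch. This is not part of the guide's files; the tiny chunk values are chosen only to make the overlap visible.

  ```typescript
  import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

  async function demo() {
    // Exaggeratedly small chunks so the 5-character overlap is easy to see.
    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 20,
      chunkOverlap: 5,
    });
    const docs = await splitter.createDocuments([
      "The thousand injuries of Fortunato I had borne as I best could.",
    ]);
    docs.forEach((doc, i) => console.log(i, JSON.stringify(doc.pageContent)));
  }

  demo().catch(console.error);
  ```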
- Complete `load.ts` with the following code.

  load.ts
  ```typescript
  const FILE_PATH = "./src/sample";
  const OPENAI_API_KEY = process.env['OPENAI_API_KEY'];

  async function load_docs() {
    // Load every .txt file in the sample directory.
    const loader = new DirectoryLoader(FILE_PATH, {
      ".txt": (path) => new TextLoader(path),
    });
    const docs = await loader.load();

    // Split the documents into 1000-character chunks with a 15-character overlap.
    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 15,
    });
    const texts = await splitter.splitDocuments(docs);
    console.log("Loaded ", texts.length, " documents.");
    return texts;
  }

  load_docs().catch(console.error);
  ```
- Create a `getVectorStore` function to embed the splits in a vector store. Exporting the `getVectorStore` function and storing `vectorStore` as a promise allows you to use the `vectorStore` later without initializing it again.

  load.ts
  ```typescript
  let vectorStorePromise;

  export async function getVectorStore() {
    if (!vectorStorePromise) {
      vectorStorePromise = (async () => {
        try {
          const texts = await load_docs();

          // Specify the database and collection to use.
          // If the collection does not exist, it is created automatically.
          const astraConfig: AstraLibArgs = {
            token: process.env.ASTRA_DB_APPLICATION_TOKEN as string,
            endpoint: process.env.ASTRA_DB_API_ENDPOINT as string,
            namespace: process.env.ASTRA_DB_KEYSPACE as string,
            collection: process.env.ASTRA_DB_COLLECTION ?? "vector_test",
            collectionOptions: {
              vector: {
                dimension: 1536,
                metric: "cosine",
              },
            },
          };

          // Initialize the vector store. fromDocuments generates embeddings
          // from the documents and stores them, so a separate addDocuments
          // call is not needed.
          const vectorStore = await AstraDBVectorStore.fromDocuments(
            texts,
            new OpenAIEmbeddings({ openAIApiKey: OPENAI_API_KEY, batchSize: 512 }),
            astraConfig
          );
          console.log(vectorStore);
          return vectorStore;
        } catch (error) {
          console.error("Error initializing vector store:", error);
          throw error;
        }
      })();
    }
    return vectorStorePromise;
  }
  ```
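  At this point you can already query the store directly, without an LLM in the loop. Here is a minimal sketch of a hypothetical scratch file, not part of this guide; `embedQuery` and `similaritySearch` are standard LangChain embeddings and vector store methods.

  ```typescript
  // check.ts — hypothetical sanity check.
  import { OpenAIEmbeddings } from "@langchain/openai";
  import { getVectorStore } from "./load";

  async function check() {
    // The collection was created with dimension 1536, which matches the
    // output size of the default OpenAI embedding model.
    const embeddings = new OpenAIEmbeddings({
      openAIApiKey: process.env["OPENAI_API_KEY"],
    });
    const vector = await embeddings.embedQuery("sanity check");
    console.log("Embedding dimension:", vector.length); // expect 1536

    // Return the 3 chunks most similar to the query text.
    const store = await getVectorStore();
    const results = await store.similaritySearch("the catacombs", 3);
    for (const doc of results) {
      console.log(doc.pageContent.slice(0, 80));
    }
  }

  check().catch(console.error);
  ```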
- Compile and run the code. Alternatively, because `tsx` is installed, you can run the TypeScript file directly with `npx tsx src/load.ts`.

  ```bash
  tsc src/load.ts
  node src/load.js
  ```
Query your documents
- Create `query.ts` in the `src/` directory.

- Create a `query` function so you can ask your documents a question. This query is defined in a separate file so you can tune your query and prompt separately from data loading.
- Import the required libraries. The `getVectorStore` function you created earlier is used to add your documents and their embeddings to the vector database.

  query.ts
  ```typescript
  import { ChatOpenAI } from "@langchain/openai";
  import { formatDocumentsAsString } from "langchain/util/document";
  import { RunnablePassthrough, RunnableSequence } from "@langchain/core/runnables";
  import { StringOutputParser } from "@langchain/core/output_parsers";
  import {
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
  } from "@langchain/core/prompts";
  import { getVectorStore } from './load';
  ```
- Complete the `query` function. This function sets the vector store as the retriever, defines a prompt, runs the chain with your query and context from your vector database, and logs the response.

  query.ts
  ```typescript
  async function query() {
    const vectorStore = await getVectorStore();
    const vectorStoreRetriever = vectorStore.asRetriever();
    const model = new ChatOpenAI({});

    const SYSTEM_TEMPLATE = `Use the following pieces of context to answer the question at the end.
  If you don't know the answer, just say that you don't know, don't try to make up an answer.
  ----------------
  {context}`;

    const messages = [
      SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
      HumanMessagePromptTemplate.fromTemplate("{question}"),
    ];
    const prompt = ChatPromptTemplate.fromMessages(messages);

    // Retrieve context from the vector store, fill in the prompt,
    // call the model, and parse the response to a string.
    const chain = RunnableSequence.from([
      {
        context: vectorStoreRetriever.pipe(formatDocumentsAsString),
        question: new RunnablePassthrough(),
      },
      prompt,
      model,
      new StringOutputParser(),
    ]);

    const answer = await chain.invoke("What is this story about?");
    console.log({ answer });
  }

  query().catch(console.error);
  ```
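  Because the query lives in its own file, you can tune retrieval without touching the loader. For example, `asRetriever` accepts the number of chunks to fetch per query. A minimal sketch of a hypothetical scratch file; the question string is illustrative.

  ```typescript
  // tune.ts — hypothetical file for experimenting with retrieval settings.
  import { formatDocumentsAsString } from "langchain/util/document";
  import { getVectorStore } from "./load";

  async function main() {
    const vectorStore = await getVectorStore();

    // Fetch the 4 most similar chunks instead of the default.
    const retriever = vectorStore.asRetriever(4);

    const docs = await retriever.getRelevantDocuments("Who is Fortunato?");
    console.log(formatDocumentsAsString(docs));
  }

  main().catch(console.error);
  ```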
- Compile and run the code.

  ```bash
  tsc src/query.ts
  node src/query.js
  ```
If you get a `TOO_MANY_COLLECTIONS` error, use the Data API command below or see delete an existing collection to delete a collection and make room.

```bash
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE" \
  --header "Token: ASTRA_DB_APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "deleteCollection": {
      "name": "COLLECTION_NAME"
    }
  }'
```
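You can also drop a collection from code with the `@datastax/astra-db-ts` client installed earlier. This is a minimal sketch assuming the client's `DataAPIClient` and `dropCollection` API; the client API has changed across releases, so check the docs for your installed version.

```typescript
// drop-collection.ts — hypothetical cleanup script.
import { DataAPIClient } from "@datastax/astra-db-ts";

const client = new DataAPIClient(process.env.ASTRA_DB_APPLICATION_TOKEN as string);
const db = client.db(process.env.ASTRA_DB_API_ENDPOINT as string, {
  namespace: process.env.ASTRA_DB_KEYSPACE,
});

async function main() {
  // Drop the collection named in ASTRA_DB_COLLECTION to free up room.
  await db.dropCollection(process.env.ASTRA_DB_COLLECTION as string);
  console.log("Collection dropped.");
}

main().catch(console.error);
```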
Complete code examples
load.ts

```typescript
// Importing necessary modules and classes
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "@langchain/openai";
import { AstraDBVectorStore, AstraLibArgs } from "@langchain/community/vectorstores/astradb";

// Constants and Environment Variables
const FILE_PATH = "./src/sample";
const OPENAI_API_KEY = process.env['OPENAI_API_KEY'];

/**
 * Load and split documents from the local directory.
 * @returns {Promise<Array<Document>>} An array of split documents.
 */
async function loadDocs() {
  try {
    const loader = new DirectoryLoader(FILE_PATH, { ".txt": path => new TextLoader(path) });
    const docs = await loader.load();
    const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 15 });
    const texts = await splitter.splitDocuments(docs);
    console.log(`Loaded ${texts.length} documents.`);
    return texts;
  } catch (error) {
    console.error('Error loading documents:', error);
    throw error;
  }
}

// Load documents and handle any errors
loadDocs().catch(error => console.error('Failed to load documents:', error));

// Variable to store the vector store promise
let vectorStorePromise;

/**
 * Initialize and get the vector store as a promise.
 * @returns {Promise<AstraDBVectorStore>} A promise that resolves to the AstraDBVectorStore.
 */
export async function getVectorStore() {
  if (!vectorStorePromise) {
    vectorStorePromise = initVectorStore();
  }
  return vectorStorePromise;
}

async function initVectorStore() {
  try {
    const texts = await loadDocs();
    const astraConfig = getAstraConfig();

    // Initialize the vector store. fromDocuments generates embeddings from
    // the documents and stores them, so a separate addDocuments call is not needed.
    const vectorStore = await AstraDBVectorStore.fromDocuments(
      texts, new OpenAIEmbeddings({ openAIApiKey: OPENAI_API_KEY, batchSize: 512 }), astraConfig
    );
    console.log(vectorStore);
    return vectorStore;
  } catch (error) {
    console.error('Error initializing vector store:', error);
    throw error;
  }
}

// Specify the database and collection to use.
// If the collection does not exist, it is created automatically.
function getAstraConfig() {
  return {
    token: process.env.ASTRA_DB_APPLICATION_TOKEN as string,
    endpoint: process.env.ASTRA_DB_API_ENDPOINT as string,
    namespace: process.env.ASTRA_DB_KEYSPACE as string,
    collection: process.env.ASTRA_DB_COLLECTION ?? "vector_test",
    collectionOptions: {
      vector: {
        dimension: 1536,
        metric: "cosine",
      },
    },
  } as AstraLibArgs;
}
```
query.ts

```typescript
// Importing necessary modules and classes
import { ChatOpenAI } from "@langchain/openai";
import { formatDocumentsAsString } from "langchain/util/document";
import { RunnablePassthrough, RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import {
  ChatPromptTemplate,
  HumanMessagePromptTemplate,
  SystemMessagePromptTemplate,
} from "@langchain/core/prompts";
import { getVectorStore } from './load';

// Constants for templates
const SYSTEM_TEMPLATE = `Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}`;

/**
 * Query the vector store for the context and pass the context with your query to the LLM.
 */
async function queryVectorStore() {
  try {
    const vectorStore = await getVectorStore();
    const retriever = vectorStore.asRetriever();
    const openAIModel = new ChatOpenAI({});
    const messages = [
      SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
      HumanMessagePromptTemplate.fromTemplate("{question}"),
    ];
    const prompt = ChatPromptTemplate.fromMessages(messages);
    const chain = RunnableSequence.from([
      {
        context: retriever.pipe(formatDocumentsAsString),
        question: new RunnablePassthrough(),
      },
      prompt,
      openAIModel,
      new StringOutputParser(),
    ]);
    const query = "What is this story about?";
    const answer = await chain.invoke(query);
    console.log({ answer });
  } catch (error) {
    console.error('Error during vector store query:', error);
    throw error;
  }
}

// Run the query function and handle errors
queryVectorStore().catch(error => console.error('Failed to run query:', error));
```