Integrate Griptape with Astra DB Serverless


Griptape can use the vector capabilities of Astra DB Serverless with the dedicated Astra DB Vector Store Driver.

Prerequisites

This guide requires the following:

  • An active Astra account.

  • An active Serverless (Vector) database.

    To run the sample code in this guide as written, your database must have a vector-enabled collection named griptape_integration, with the Embedding generation method set to Bring my own and Dimensions set to 1536. If you prefer to create this collection programmatically, see the optional sketch after this list.

  • An application token with the Database Administrator role, and your database’s API endpoint in the form of https://DATABASE_ID-REGION.apps.astra.datastax.com.

    You can get both from your database’s Overview page: in the Astra Portal navigation menu, select your database, and then use the Database Details section to copy the API Endpoint and generate a token.

  • An OpenAI API key.

  • Python 3.9 or later, pip 23.0 or later, and the required Python packages:

    pip install --upgrade pip
    pip install --upgrade setuptools
    
    pip install \
        "griptape[drivers-vector-astra-db,drivers-web-scraper-trafilatura]" \
        "python-dotenv==1.0.1"

Connect to your database

You can build and run a Python script locally or use this tutorial’s Colab notebook.

  1. Set the environment variables that your code uses to connect to the database.

    The ASTRA_DB_NAMESPACE is the namespace associated with your griptape_integration collection. The default namespace is default_keyspace.

    For information about the other environment variable values, see the Prerequisites.

    • Local installation

    • Google Colab

    Create a .env file in the folder where you will create your Python script:

    .env
    ASTRA_DB_APPLICATION_TOKEN="APPLICATION_TOKEN"
    ASTRA_DB_API_ENDPOINT="DATABASE_API_ENDPOINT"
    ASTRA_DB_NAMESPACE="default_keyspace"
    GRIPTAPE_COLLECTION_NAME="griptape_integration"
    OPENAI_API_KEY="API_KEY"

    Define the secrets in the Google Colab environment:

    import os
    from getpass import getpass
    
    
    os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass(
        "ASTRA_DB_APPLICATION_TOKEN = "
    )
    os.environ["ASTRA_DB_API_ENDPOINT"] = input("ASTRA_DB_API_ENDPOINT = ")
    if _desired_namespace := input("ASTRA_DB_NAMESPACE (optional) = "):
        os.environ["ASTRA_DB_NAMESPACE"] = _desired_namespace
    
    default_collection_name = "griptape_integration"
    os.environ["GRIPTAPE_COLLECTION_NAME"] = (
        input("GRIPTAPE_COLLECTION_NAME (empty for default) = ")
        or default_collection_name
    )
    os.environ["OPENAI_API_KEY"] = getpass("OPENAI_API_KEY = ")
  2. Create a Python script file. To avoid shadowing the griptape package, don’t name the file griptape.py.

  3. Import the dependencies.

    • Local installation

    • Google Colab

    integrate.py
    import os
    from dotenv import load_dotenv
    
    from griptape.drivers import (
        AstraDbVectorStoreDriver,
        OpenAiChatPromptDriver,
        OpenAiEmbeddingDriver,
    )
    from griptape.engines.rag import RagEngine
    from griptape.engines.rag.modules import (
        PromptResponseRagModule,
        VectorStoreRetrievalRagModule,
    )
    from griptape.engines.rag.stages import ResponseRagStage, RetrievalRagStage
    from griptape.loaders import WebLoader
    from griptape.structures import Agent
    from griptape.tools import RagTool

    In Google Colab, import the same dependencies; os is already imported and python-dotenv is not needed:

    from griptape.drivers import (
        AstraDbVectorStoreDriver,
        OpenAiChatPromptDriver,
        OpenAiEmbeddingDriver,
    )
    from griptape.engines.rag import RagEngine
    from griptape.engines.rag.modules import (
        PromptResponseRagModule,
        VectorStoreRetrievalRagModule,
    )
    from griptape.engines.rag.stages import ResponseRagStage, RetrievalRagStage
    from griptape.loaders import WebLoader
    from griptape.structures import Agent
    from griptape.tools import RagTool
  4. Load the environment variables. (An optional sanity-check sketch follows this list.)

    • Local installation

    • Google Colab

    integrate.py
    load_dotenv()
    
    ASTRA_DB_APPLICATION_TOKEN = os.environ["ASTRA_DB_APPLICATION_TOKEN"]
    ASTRA_DB_API_ENDPOINT = os.environ["ASTRA_DB_API_ENDPOINT"]
    ASTRA_DB_NAMESPACE = os.environ.get("ASTRA_DB_NAMESPACE")
    GRIPTAPE_COLLECTION_NAME = os.environ["GRIPTAPE_COLLECTION_NAME"]

    In Google Colab, read the environment variables directly; there is no .env file to load:

    ASTRA_DB_APPLICATION_TOKEN = os.environ["ASTRA_DB_APPLICATION_TOKEN"]
    ASTRA_DB_API_ENDPOINT = os.environ["ASTRA_DB_API_ENDPOINT"]
    ASTRA_DB_NAMESPACE = os.environ.get("ASTRA_DB_NAMESPACE")
    GRIPTAPE_COLLECTION_NAME = os.environ["GRIPTAPE_COLLECTION_NAME"]
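
Optionally, before initializing any drivers, you can add a quick sanity check to the script. This is a minimal sketch; the endpoint pattern check is illustrative only and mirrors the https://DATABASE_ID-REGION.apps.astra.datastax.com format described in the Prerequisites.

    import re

    # Optional sanity checks (illustrative only): fail fast on obvious configuration issues.
    if not os.environ.get("OPENAI_API_KEY"):
        # The OpenAI drivers read this variable from the environment by default.
        raise RuntimeError("OPENAI_API_KEY is not set")

    if not re.match(
        r"^https://[0-9a-f-]+-[a-z0-9-]+\.apps\.astra\.datastax\.com/?$",
        ASTRA_DB_API_ENDPOINT,
    ):
        print("Warning: ASTRA_DB_API_ENDPOINT does not look like an Astra DB API endpoint")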

Initialize the vector store and RAG engine

  1. Initialize the vector store driver and pass it to the RAG Engine, which is a Griptape component that drives RAG pipelines.

    A Griptape namespace is not the same as an Astra DB namespace. When working with the Griptape AstraDbVectorStoreDriver, the Griptape namespace is a label for entries in the vector store (within an Astra DB collection), while the astra_db_namespace attribute is your Astra DB namespace.

    integrate.py
    namespace = "datastax_blog"
    
    vector_store_driver = AstraDbVectorStoreDriver(
        embedding_driver=OpenAiEmbeddingDriver(),
        api_endpoint=ASTRA_DB_API_ENDPOINT,
        token=ASTRA_DB_APPLICATION_TOKEN,
        collection_name=GRIPTAPE_COLLECTION_NAME,
        astra_db_namespace=ASTRA_DB_NAMESPACE,
    )
    
    engine = RagEngine(
        retrieval_stage=RetrievalRagStage(
            retrieval_modules=[
                VectorStoreRetrievalRagModule(
                    vector_store_driver=vector_store_driver,
                    query_params={
                        "count": 2,
                        "namespace": namespace,
                    },
                )
            ]
        ),
        response_stage=ResponseRagStage(
            response_modules=[
                PromptResponseRagModule(
                    prompt_driver=OpenAiChatPromptDriver(model="gpt-4o"),
                ),
            ],
        ),
    )
  2. Ingest a web page into the vector store (an optional ingestion-verification sketch follows this list):

    integrate.py
    input_blogpost = (
        "www.datastax.com/blog/indexing-all-of-wikipedia-on-a-laptop"
    )
    
    vector_store_driver.upsert_text_artifacts(
        {namespace: WebLoader(max_tokens=256).load(input_blogpost)}
    )
  3. Wrap the RAG Engine in a RagTool, and then pass it to a Griptape agent as a tool:

    integrate.py
    rag_tool = RagTool(
        description="A DataStax blog post",
        rag_engine=engine,
    )
    agent = Agent(tools=[rag_tool])
  4. Run a RAG-powered question-and-answer process based on the ingested content:

    integrate.py
    agent.run(
        "what engine did DataStax develop to index such an amount of data on a "
        "laptop? Please summarize its main features."
    )
    
    answer = agent.output_task.output.value
    
    print(answer)
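
Optionally, you can confirm that the blog post chunks were ingested by querying the vector store driver directly, outside the RAG pipeline. This is a minimal sketch that assumes Griptape’s generic vector store query interface; the fields available on the returned entries (such as score and meta) can vary between Griptape versions.

    # Optional: query the vector store directly to verify the ingestion.
    entries = vector_store_driver.query(
        "indexing Wikipedia on a laptop",
        count=2,
        namespace=namespace,
    )
    for entry in entries:
        print(entry.id, entry.score)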

Run the code

Run the script:

python integrate.py
Response

This sample response is truncated for clarity.

$> python integrate.py
[08/21/24 00:47:28] INFO     ToolkitTask 09ca0fb83cd24f2590155aab415af651
                             Input: what engine did DataStax develop to index such an amount of data on a laptop?
                             Please summarize its main features.
[08/21/24 00:47:29] INFO     Subtask bec12f6a6d72467bbf2ab059f5f5a59a
                             Actions: [
                               {
                                 "tag": "call_JuuW9d9xQxf6M5DRcGDGNhqW",
                                 "name": "RagTool",
                                 "path": "search",
                                 "input": {
                                   "values": {
                                     "query": "DataStax engine to index large amounts of data on a laptop"
                                   }
                                 }
                               }
                             ]
[08/21/24 00:47:31] INFO     Subtask bec12f6a6d72467bbf2ab059f5f5a59a
                             Response: DataStax Astra DB uses the [...]
[08/21/24 00:47:33] INFO     ToolkitTask 09ca0fb83cd24f2590155aab415af651
                             Output: DataStax developed the [...]

DataStax developed the **JVector library** to index large amounts of data
on a laptop. Here are its main features:

1. **Support for Larger-than-Memory Datasets**: JVector can handle datasets that
exceed the available memory by using compressed vectors.
2. **Efficient Construction-Related Searches**: It performs searches efficiently
even with compressed data.
3. **Memory Optimization**: The edge lists fit in memory, while the uncompressed
vectors do not, optimizing the use of available memory resources.

This approach makes it feasible to index large datasets, such as Wikipedia,
on a laptop.
