Integrate Griptape with Astra DB Serverless
Griptape can use the vector capabilities of Astra DB Serverless with the dedicated Astra DB Vector Store Driver.
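At a glance, the driver needs your database's API endpoint, an application token, the name of the target collection, and an embedding driver that produces the vectors. The snippet below is only a configuration sketch with placeholder values; the rest of this guide builds the full, runnable setup step by step.

from griptape.drivers import AstraDbVectorStoreDriver, OpenAiEmbeddingDriver

# Configuration sketch with placeholder values; see the steps below for the
# full, runnable setup that reads these values from environment variables.
vector_store_driver = AstraDbVectorStoreDriver(
    embedding_driver=OpenAiEmbeddingDriver(),  # produces the vectors to store and search
    api_endpoint="https://DATABASE_ID-REGION.apps.astra.datastax.com",
    token="APPLICATION_TOKEN",
    collection_name="griptape_integration",
    astra_db_namespace="default_keyspace",
)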
Prerequisites
This guide requires the following:
- An active Astra account.
- An active Serverless (Vector) database. To run the sample code in this guide as written, your database must have a vector-enabled collection named griptape_integration, with the Embedding generation method set to Bring my own and Dimensions of 1536. (If you prefer to create the collection programmatically, see the optional sketch after this list.)
- An application token with the Database Administrator role, and your database's API endpoint in the form https://DATABASE_ID-REGION.apps.astra.datastax.com. You can get both from your database's Overview page: in the Astra Portal navigation menu, select your database, and then locate the Database Details section, where you can copy the API endpoint and generate a token.
- An OpenAI API key.
- Python 3.9 or later, pip 23.0 or later, and the required Python packages:
pip install --upgrade pip
pip install --upgrade setuptools
pip install \
  "griptape[drivers-vector-astra-db,drivers-web-scraper-trafilatura]" \
  "python-dotenv==1.0.1"
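If you would rather create the griptape_integration collection programmatically than in the Astra Portal, the following sketch shows one way to do it with the astrapy client. This is an optional alternative and assumes the astrapy 1.x Data API client (pip install astrapy); adjust the calls to the client version you use.

from astrapy import DataAPIClient

# Optional: create the vector-enabled collection with astrapy instead of the
# Astra Portal. Assumes the astrapy 1.x Data API client; the collection name,
# dimension, and bring-your-own-embeddings setup match this guide's prerequisites.
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("DATABASE_API_ENDPOINT")

database.create_collection(
    "griptape_integration",
    dimension=1536,  # matches the OpenAI embedding size used in this guide
    metric="cosine",
)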
Connect to your database
You can build and run a Python script locally or use this tutorial’s Colab notebook.
- Import libraries and connect to the database.
  The ASTRA_DB_NAMESPACE is the namespace associated with your griptape_integration collection. The default namespace is default_keyspace. For information about the other environment variable values, see the Prerequisites.
  - Local installation: Create a .env file in the folder where you will create your Python script:

.env
ASTRA_DB_APPLICATION_TOKEN="APPLICATION_TOKEN"
ASTRA_DB_API_ENDPOINT="DATABASE_API_ENDPOINT"
ASTRA_DB_NAMESPACE="default_keyspace"
GRIPTAPE_COLLECTION_NAME="griptape_integration"
OPENAI_API_KEY="API_KEY"

  - Google Colab: Define the secrets in the Google Colab environment:

import os
from getpass import getpass

os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass(
    "ASTRA_DB_APPLICATION_TOKEN = "
)
os.environ["ASTRA_DB_API_ENDPOINT"] = input("ASTRA_DB_API_ENDPOINT = ")
if _desired_namespace := input("ASTRA_DB_NAMESPACE (optional) = "):
    os.environ["ASTRA_DB_NAMESPACE"] = _desired_namespace
default_collection_name = "griptape_integration"
os.environ["GRIPTAPE_COLLECTION_NAME"] = (
    input("GRIPTAPE_COLLECTION_NAME (empty for default) = ")
    or default_collection_name
)
os.environ["OPENAI_API_KEY"] = getpass("OPENAI_API_KEY = ")
- Create a Python script file. To avoid a namespace collision, don't name the file griptape.py.
- Import the dependencies.
  - Local installation:

integrate.py
import os

from dotenv import load_dotenv

from griptape.drivers import (
    AstraDbVectorStoreDriver,
    OpenAiChatPromptDriver,
    OpenAiEmbeddingDriver,
)
from griptape.engines.rag import RagEngine
from griptape.engines.rag.modules import (
    PromptResponseRagModule,
    VectorStoreRetrievalRagModule,
)
from griptape.engines.rag.stages import ResponseRagStage, RetrievalRagStage
from griptape.loaders import WebLoader
from griptape.structures import Agent
from griptape.tools import RagTool

  - Google Colab:

from griptape.drivers import (
    AstraDbVectorStoreDriver,
    OpenAiChatPromptDriver,
    OpenAiEmbeddingDriver,
)
from griptape.engines.rag import RagEngine
from griptape.engines.rag.modules import (
    PromptResponseRagModule,
    VectorStoreRetrievalRagModule,
)
from griptape.engines.rag.stages import ResponseRagStage, RetrievalRagStage
from griptape.loaders import WebLoader
from griptape.structures import Agent
from griptape.tools import RagTool
- Load the environment variables.
  - Local installation:

integrate.py
load_dotenv()

ASTRA_DB_APPLICATION_TOKEN = os.environ["ASTRA_DB_APPLICATION_TOKEN"]
ASTRA_DB_API_ENDPOINT = os.environ["ASTRA_DB_API_ENDPOINT"]
ASTRA_DB_NAMESPACE = os.environ.get("ASTRA_DB_NAMESPACE")
GRIPTAPE_COLLECTION_NAME = os.environ["GRIPTAPE_COLLECTION_NAME"]

  - Google Colab:

ASTRA_DB_APPLICATION_TOKEN = os.environ["ASTRA_DB_APPLICATION_TOKEN"]
ASTRA_DB_API_ENDPOINT = os.environ["ASTRA_DB_API_ENDPOINT"]
ASTRA_DB_NAMESPACE = os.environ.get("ASTRA_DB_NAMESPACE")
GRIPTAPE_COLLECTION_NAME = os.environ["GRIPTAPE_COLLECTION_NAME"]
Initialize the vector store and RAG engine
- Initialize the vector store driver and pass it to the RAG Engine, which is a Griptape component that drives RAG pipelines.
  A Griptape namespace is not the same as an Astra DB namespace. When working with the Griptape astradb_vector_store_driver, the Griptape namespace is a label for entries in a vector store (within an Astra DB collection), and the astra_db_namespace attribute is your Astra DB namespace.

integrate.py
namespace = "datastax_blog"

vector_store_driver = AstraDbVectorStoreDriver(
    embedding_driver=OpenAiEmbeddingDriver(),
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
    collection_name=GRIPTAPE_COLLECTION_NAME,
    astra_db_namespace=ASTRA_DB_NAMESPACE,
)

engine = RagEngine(
    retrieval_stage=RetrievalRagStage(
        retrieval_modules=[
            VectorStoreRetrievalRagModule(
                vector_store_driver=vector_store_driver,
                query_params={
                    "count": 2,
                    "namespace": namespace,
                },
            )
        ]
    ),
    response_stage=ResponseRagStage(
        response_modules=[
            PromptResponseRagModule(
                prompt_driver=OpenAiChatPromptDriver(model="gpt-4o"),
            ),
        ],
    ),
)
- Ingest a web page into the vector store (an optional sketch for inspecting the ingested entries follows this procedure):

integrate.py
input_blogpost = (
    "www.datastax.com/blog/indexing-all-of-wikipedia-on-a-laptop"
)

vector_store_driver.upsert_text_artifacts(
    {namespace: WebLoader(max_tokens=256).load(input_blogpost)}
)
- Wrap the RAG Engine in a RagTool, and then pass it to a Griptape agent as a tool:

integrate.py
rag_tool = RagTool(
    description="A DataStax blog post",
    rag_engine=engine,
)

agent = Agent(tools=[rag_tool])
- Run a RAG-powered question-and-answer process based on the ingested content:

integrate.py
agent.run(
    "what engine did DataStax develop to index such an amount of data on a "
    "laptop? Please summarize its main features."
)

answer = agent.output_task.output.value
print(answer)
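Optionally, you can check what the ingestion step stored by querying the vector store driver directly under the Griptape namespace. This is only a quick sanity-check sketch: it assumes the standard Griptape BaseVectorStoreDriver.query method, and the fields available on the returned entries can differ between Griptape versions.

# Optional sanity check: query the driver directly to inspect what the
# ingestion step stored under the Griptape namespace. This is the same kind
# of lookup the retrieval stage performs. Assumes the standard
# BaseVectorStoreDriver.query signature; entry fields may vary by version.
entries = vector_store_driver.query(
    "indexing Wikipedia on a laptop",
    count=2,
    namespace=namespace,
)
for entry in entries:
    print(entry.id, entry.score)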
Run the code
Run the script:
python integrate.py
Response
This sample response is truncated for clarity.
$> python integrate.py
[08/21/24 00:47:28] INFO ToolkitTask 09ca0fb83cd24f2590155aab415af651
Input: what engine did DataStax develop to index such an amount of data on a laptop?
Please summarize its main features.
[08/21/24 00:47:29] INFO Subtask bec12f6a6d72467bbf2ab059f5f5a59a
Actions: [
{
"tag": "call_JuuW9d9xQxf6M5DRcGDGNhqW",
"name": "RagTool",
"path": "search",
"input": {
"values": {
"query": "DataStax engine to index large amounts of data on a laptop"
}
}
}
]
[08/21/24 00:47:31] INFO Subtask bec12f6a6d72467bbf2ab059f5f5a59a
Response: DataStax Astra DB uses the [...]
[08/21/24 00:47:33] INFO ToolkitTask 09ca0fb83cd24f2590155aab415af651
Output: DataStax developed the [...]
DataStax developed the **JVector library** to index large amounts of data
on a laptop. Here are its main features:
1. **Support for Larger-than-Memory Datasets**: JVector can handle datasets that
exceed the available memory by using compressed vectors.
2. **Efficient Construction-Related Searches**: It performs searches efficiently
even with compressed data.
3. **Memory Optimization**: The edge lists fit in memory, while the uncompressed
vectors do not, optimizing the use of available memory resources.
This approach makes it feasible to index large datasets, such as Wikipedia,
on a laptop.