OpenAI Assistants with persistent vector store
The Astra Assistants API is a drop-in replacement for the OpenAI Assistants API. It uses a Serverless (Vector) database for persistence, and it supports the following features:

- Full compatibility with the OpenAI Assistants API v2, including messages, assistants, threads, runs, vector_stores, and files.
- Third-party embedding and completion models across hundreds of LLMs, including Anthropic, Gemini, Mistral, Groq, Llama, and Cohere, powered by LiteLLM.
- Ollama support for local models.
- Open source, with options for a managed service or self-hosting.
- Function calling and file search.
The database stores and queries embeddings for retrieval augmented generation (RAG). For large language model (LLM) tasks, such as embedding generation and chat completion, the database calls OpenAI or other LLMs.
Users interact with the service through the OpenAI SDKs. You can store your proprietary data and run Assistants API examples on your own Astra DB Serverless database, which you can manage, access, and secure yourself.
Prerequisites
This tutorial requires the following:

- An active Astra account
- A paid OpenAI account
- Python 3.10 or later
- An application token with the Database Administrator role
- Familiarity with running Python scripts
Run an Assistant API example
- Create a `.env` file with the environment variables for your selected model:

OpenAI

```bash
#!/bin/bash
# Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
export APPLICATION_TOKEN=
# Go to https://platform.openai.com/api-keys to create a secret key.
export OPENAI_API_KEY=
# The following might be required for certain models
export OPENAI_ORGANIZATION=""
export OPENAI_API_BASE=""
```

Perplexity

```bash
#!/bin/bash
# Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
export APPLICATION_TOKEN=
# Go to https://platform.openai.com/api-keys to create a secret key.
export OPENAI_API_KEY=
# Go to https://www.perplexity.ai/settings/api to generate a secret key.
export PERPLEXITYAI_API_KEY=
```

Cohere

```bash
#!/bin/bash
# Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
export APPLICATION_TOKEN=
# Go to https://platform.openai.com/api-keys to create a secret key.
export OPENAI_API_KEY=
# Go to https://dashboard.cohere.com/api-keys to create an API key.
export COHERE_API_KEY=
```

Bedrock

```bash
#!/bin/bash
# Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
export APPLICATION_TOKEN=
# Go to https://platform.openai.com/api-keys to create a secret key.
export OPENAI_API_KEY=
# Bedrock models: https://docs.aws.amazon.com/bedrock/latest/userguide/setting-up.html
export AWS_REGION_NAME=
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
```

Vertex

```bash
#!/bin/bash
# Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
export APPLICATION_TOKEN=
# Go to https://platform.openai.com/api-keys to create a secret key.
export OPENAI_API_KEY=
# Required environment variables depend on your project configuration and the model you want to use.
# Some variables only apply when accessing private models or models hosted by third-party providers.
# Core variables for https://console.cloud.google.com/vertex-ai
export GOOGLE_JSON_PATH=
export GOOGLE_PROJECT_ID=
# If using a third-party SDK that doesn't recognize GOOGLE_PROJECT_ID:
export VERTEXAI_PROJECT=""
# If required by a third-party SDK or you need to specify a region-specific Vertex endpoint:
export VERTEXAI_LOCATION=""
# If required, not auto-detected from your environment, or not using GOOGLE_JSON_PATH:
export GOOGLE_APPLICATION_CREDENTIALS=""
```

Other models

```bash
# Anthropic Claude models - https://console.anthropic.com/settings/keys
export ANTHROPIC_API_KEY=""
# AI21 models
export AI21_API_KEY=""
# Aleph Alpha models
export ALEPHALPHA_API_KEY=""
# Anyscale models
export ANYSCALE_API_KEY=""
# Azure models
export AZURE_API_KEY=""
export AZURE_API_BASE=""
export AZURE_API_VERSION=""
export AZURE_AD_TOKEN=""
export AZURE_API_TYPE=""
# Baseten models
export BASETEN_API_KEY=""
# Cloudflare Workers models
export CLOUDFLARE_API_KEY=""
export CLOUDFLARE_ACCOUNT_ID=""
# DeepInfra models
export DEEPINFRA_API_KEY=""
# DeepSeek models
export DEEPSEEK_API_KEY=""
# Fireworks AI models
export FIREWORKS_AI_API_KEY=""
# Gemini models - https://makersuite.google.com/app/apikey
export GEMINI_API_KEY=""
# Groq models - https://console.groq.com/keys
export GROQ_API_KEY=""
# Hugging Face models
export HUGGINGFACE_API_KEY=""
export HUGGINGFACE_API_BASE=""
# Mistral models
export MISTRAL_API_KEY=""
# NLP Cloud models
export NLP_CLOUD_API_KEY=""
# OpenRouter models
export OPENROUTER_API_KEY=""
export OR_SITE_URL=""
export OR_APP_NAME=""
# PaLM models
export PALM_API_KEY=""
# Replicate models
export REPLICATE_API_KEY=""
# TogetherAI models
export TOGETHERAI_API_KEY=""
# Voyage models
export VOYAGE_API_KEY=""
# WatsonX models
export WATSONX_URL=""
export WATSONX_APIKEY=""
export WATSONX_TOKEN=""
export WATSONX_PROJECT_ID=""
export WATSONX_DEPLOYMENT_SPACE_ID=""
# XInference models
export XINFERENCE_API_BASE=""
export XINFERENCE_API_KEY=""
```
- Install `poetry`:

```bash
curl -sSL https://install.python-poetry.org | python3 -
```

- Add the dependencies (`poetry add` installs packages into the project; `poetry install` does not accept package names):

```bash
poetry add astra-assistants openai python-dotenv
```
Build the Assistants API-powered application
- Import and patch your client:

```python
from openai import OpenAI
from astra_assistants import patch

client = patch(OpenAI())
```
Using your token, the system creates an Astra DB Serverless database named assistant_api_db.
The first request can take a few minutes to create your database.
- Create your assistant:

```python
assistant = client.beta.assistants.create(
    instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
    model="gpt-4o",
)
```
By default, the service uses Astra DB Serverless as the vector store and OpenAI for embeddings and chat completion.
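To exercise the assistant end to end, you can follow the standard OpenAI Assistants v2 flow: create a thread, add a user message, run the assistant, and read the reply. This sketch assumes the `client` and `assistant` objects from the steps above and requires valid credentials in your environment:

```python
# Create a conversation thread and add a user message to it.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is the square root of 1024?",
)

# Run the assistant on the thread and block until the run finishes.
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

if run.status == "completed":
    # Messages are returned newest first; print the assistant's reply.
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    print(messages.data[0].content[0].text.value)
else:
    print(f"Run ended with status: {run.status}")
```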
Third-party LLM support
Astra DB supports many third-party models for embeddings and completion through LiteLLM.
You must pass your service’s API key in the `api-key` header and, for embeddings, the model name in the `embedding-model` header.
You can pass different models with the corresponding API key in your environment:
OpenAI GPT-4o

```python
model = "gpt-4o"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

OpenAI GPT-4o-mini

```python
model = "openai/gpt-4o-mini"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Cohere Command R+

```python
model = "cohere/command-r-plus"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Perplexity Mixtral-8x7B

```python
model = "perplexity/mixtral-8x7b-instruct"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Perplexity pplx-70b-online

```python
model = "perplexity/pplx-70b-online"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Anthropic Claude

```python
model = "anthropic/claude-3-5-sonnet"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Google Gemini

```python
model = "gemini/gemini-1.5-flash"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```
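If you need to set the `api-key` and `embedding-model` headers yourself rather than relying on environment variables, one way is the OpenAI SDK's `default_headers` option. This is a hypothetical sketch, assuming those headers are plain HTTP headers on the patched client; check the astra-assistants documentation for the exact header names your version expects:

```python
import os

from openai import OpenAI
from astra_assistants import patch

# Assumption: the api-key header carries the third-party service key, and
# embedding-model selects the LiteLLM embedding model. COHERE_API_KEY and the
# model string here are illustrative values.
client = patch(OpenAI(
    default_headers={
        "api-key": os.environ["COHERE_API_KEY"],
        "embedding-model": "cohere/embed-english-v3.0",
    },
))
```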
For third-party embedding models, DataStax supports the `embedding_model` parameter in `client.files.create`:

```python
file = client.files.create(
    file=open(
        "./test/language_models_are_unsupervised_multitask_learners.pdf",
        "rb",
    ),
    purpose="assistants",
    embedding_model="text-embedding-3-large",
)
```
By default, the API uses your Astra DB Serverless database as the vector store and OpenAI for the embeddings and chat completion.
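To make the uploaded file searchable, the standard OpenAI Assistants v2 pattern attaches it to a vector store and enables the `file_search` tool. A hypothetical continuation, assuming that pattern applies here unchanged and reusing the `file` and `assistant` objects from above (the vector store name is illustrative):

```python
# Create a vector store and add the uploaded file to it.
vector_store = client.beta.vector_stores.create(name="papers")
client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file.id,
)

# Enable file_search on the assistant and point it at the vector store.
assistant = client.beta.assistants.update(
    assistant.id,
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```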