OpenAI Assistants with persistent vector store


The Astra Assistants API is a drop-in replacement for the OpenAI Assistants API. It uses an Astra DB Serverless (vector) database for persistence and supports the following features:

Full compatibility with the OpenAI Assistants API v2, including messages, assistants, threads, runs, vector_stores, and files.
Third-party embedding and completion models, covering hundreds of LLMs including Anthropic, Gemini, Mistral, Groq, Llama, and Cohere models, powered by LiteLLM.
Ollama support for local models.
Open source, with options for a managed service or self-hosting.
Function calling and file search.
Data privacy and protection.

The database stores and queries embeddings for retrieval-augmented generation (RAG). For large language model (LLM) tasks, such as embedding generation and chat completion, the service calls OpenAI or another LLM provider.
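As a conceptual illustration of the retrieval step in RAG, the following sketch ranks stored passages by cosine similarity to a query embedding and places the best matches in a prompt. It's a toy, in-memory stand-in: the three-dimensional vectors are hand-made rather than produced by an embedding model, and the helper names (`cosine_similarity`, `top_k`, `build_prompt`) are hypothetical. It doesn't reflect how Astra DB implements vector search, which happens inside the database.

```python
import math
from typing import List, Sequence, Tuple

# Toy vector store: (passage, embedding) pairs. Real embeddings come from an
# embedding model and have hundreds or thousands of dimensions.
Store = List[Tuple[str, Sequence[float]]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], store: Store, k: int = 2) -> List[str]:
    """Return the k passages whose embeddings are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


def build_prompt(question: str, passages: List[str]) -> str:
    """Augment the question with the retrieved passages before sending it to an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store: Store = [
    ("Cats are small domesticated felines.", [0.9, 0.1, 0.0]),
    ("Lions are large wild felines.", [0.8, 0.3, 0.1]),
    ("Tax returns are due in April.", [0.0, 0.1, 0.9]),
]
passages = top_k([1.0, 0.2, 0.0], store, k=2)
print(build_prompt("What are felines?", passages))
```

The two feline passages rank highest because their vectors point in nearly the same direction as the query vector; the tax passage is nearly orthogonal and is left out of the prompt.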

Users interact with the service through the OpenAI SDKs. You store your proprietary data and run Assistants API examples on your own Astra DB Serverless database, which you manage, access, and secure.

Prerequisites

This tutorial requires the following:

An active Astra account.
A paid OpenAI account and an OpenAI API key.
Python 3.10 or later.
An application token with the Database Administrator role. For programmatic access only, create a token with the API Administrator User role.
Familiarity with running Python scripts.

Set environment variables

Create a .env file for this tutorial.
In your .env file, set an environment variable for your Astra application token:
.env
```
#!/bin/bash

export APPLICATION_TOKEN=
```
Set an environment variable for your OpenAI API key:
.env
```
export OPENAI_API_KEY=
```
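Before you run the examples, you can check that the two required variables are actually visible to Python. The .env file only takes effect once it's loaded, for example with `source .env` in your shell or `load_dotenv()` from python-dotenv. The following sketch uses only the standard library; `missing_env_vars` is a hypothetical helper, and the variable names are the ones set above.

```python
import os
from typing import Iterable, List

# Variables this tutorial sets in .env.
REQUIRED_VARS = ("APPLICATION_TOKEN", "OPENAI_API_KEY")


def missing_env_vars(names: Iterable[str] = REQUIRED_VARS) -> List[str]:
    """Return the names that are unset or set to an empty string."""
    return [name for name in names if not os.environ.get(name)]
```

Calling `missing_env_vars()` at the top of a script and exiting when the list isn't empty gives a clear message instead of an authentication error on the first API call.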

Set any additional environment variables required to authorize specific models:

.env
```
# Amazon Bedrock models: https://docs.aws.amazon.com/bedrock/latest/userguide/setting-up.html
export AWS_REGION_NAME=
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=

# Anthropic Claude models - https://console.anthropic.com/settings/keys
export ANTHROPIC_API_KEY=""

# AI21 models
export AI21_API_KEY=""

# Aleph Alpha models
export ALEPHALPHA_API_KEY=""

# Anyscale models
export ANYSCALE_API_KEY=""

# Azure models
export AZURE_API_KEY=""
export AZURE_API_BASE=""
export AZURE_API_VERSION=""
export AZURE_AD_TOKEN=""
export AZURE_API_TYPE=""

# Baseten models
export BASETEN_API_KEY=""

# Cloudflare Workers models
export CLOUDFLARE_API_KEY=""
export CLOUDFLARE_ACCOUNT_ID=""

# Cohere - Go to https://dashboard.cohere.com/api-keys to create an API key.
export COHERE_API_KEY=

# DeepInfra models
export DEEPINFRA_API_KEY=""

# DeepSeek models
export DEEPSEEK_API_KEY=""

# Fireworks AI models
export FIREWORKS_AI_API_KEY=""

# Gemini models - https://makersuite.google.com/app/apikey
export GEMINI_API_KEY=""

# Groq models - https://console.groq.com/keys
export GROQ_API_KEY=""

# Hugging Face models
export HUGGINGFACE_API_KEY=""
export HUGGINGFACE_API_BASE=""

# Mistral models
export MISTRAL_API_KEY=""

# NLP Cloud models
export NLP_CLOUD_API_KEY=""

# OpenAI
# The following might be required for certain models
export OPENAI_ORGANIZATION=""
export OPENAI_API_BASE=""

# OpenRouter models
export OPENROUTER_API_KEY=""
export OR_SITE_URL=""
export OR_APP_NAME=""

# PaLM models
export PALM_API_KEY=""

# Perplexity - Go to https://www.perplexity.ai/settings/api to generate a secret key.
export PERPLEXITYAI_API_KEY=

# Replicate models
export REPLICATE_API_KEY=""

# TogetherAI models
export TOGETHERAI_API_KEY=""

# Vertex AI - required environment variables depend on your project configuration and the model you want to use.
# Some variables only apply when accessing private models or models hosted by third-party providers.
# Core variables for https://console.cloud.google.com/vertex-ai:
export GOOGLE_JSON_PATH=
export GOOGLE_PROJECT_ID=
# If using a third-party SDK that doesn't recognize GOOGLE_PROJECT_ID:
export VERTEXAI_PROJECT=""
# If required by a third-party SDK or you need to specify a region-specific Vertex endpoint:
export VERTEXAI_LOCATION=""
# If required, not auto-detected from your environment, or not using GOOGLE_JSON_PATH:
export GOOGLE_APPLICATION_CREDENTIALS=""

# Voyage models
export VOYAGE_API_KEY=""

# WatsonX models
export WATSONX_URL=""
export WATSONX_APIKEY=""
export WATSONX_TOKEN=""
export WATSONX_PROJECT_ID=""
export WATSONX_DEPLOYMENT_SPACE_ID=""

# XInference models
export XINFERENCE_API_BASE=""
export XINFERENCE_API_KEY=""
```

Install dependencies

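Install Poetry:
```
curl -sSL https://install.python-poetry.org | python3 -
```

Create a Poetry project if you don't have one yet, and then add the dependencies:
```
poetry init --no-interaction
poetry add astra-assistants openai python-dotenv
```

Run the scripts in this tutorial with poetry run python so that they use the project's virtual environment.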
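To confirm that the packages are importable from the environment your scripts run in, you can look up their import names with the standard library's `importlib.util.find_spec`. Two import names differ from the package names: astra-assistants is imported as `astra_assistants` and python-dotenv as `dotenv`. `missing_modules` is a hypothetical helper for this sketch.

```python
import importlib.util
from typing import Iterable, List

# Import names for astra-assistants, openai, and python-dotenv.
IMPORT_NAMES = ("astra_assistants", "openai", "dotenv")


def missing_modules(names: Iterable[str] = IMPORT_NAMES) -> List[str]:
    """Return the top-level module names that can't be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(missing_modules() or "All dependencies found.")
```

An empty list means all three packages were found by the interpreter that ran the check, so run it the same way you run the tutorial scripts, for example with poetry run python and a file name of your choice.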

Build the Assistants API-powered application

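Load your environment variables, and then import and patch your client:
```
from dotenv import load_dotenv
from openai import OpenAI
from astra_assistants import patch

load_dotenv()  # read APPLICATION_TOKEN, OPENAI_API_KEY, and any other keys from .env
client = patch(OpenAI())
```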
Using your token, the system creates an Astra DB Serverless database named assistant_api_db. The first request can take a few minutes to create your database.

Create your assistant:

```
assistant = client.beta.assistants.create(
    instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
    model="gpt-4o",
)
```
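To try the assistant, the usual next step with the OpenAI SDK is to create a thread, add a user message, run the assistant on the thread, and read its reply. The following is a minimal sketch, assuming the patched client forwards these v2 thread and run calls like the rest of the API and that your openai SDK version provides `runs.create_and_poll`. `ask` is a hypothetical helper name.

```python
from typing import Any


def ask(client: Any, assistant_id: str, question: str) -> str:
    """Send one question to an assistant and return the text of its reply."""
    # A thread holds the conversation; the service persists it.
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=question
    )
    # create_and_poll starts the run and waits until it reaches a terminal status.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=assistant_id
    )
    if run.status != "completed":
        raise RuntimeError(f"Run ended with status {run.status!r}")
    # Only messages produced by this run; the newest message comes first by default.
    messages = client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id)
    return messages.data[0].content[0].text.value
```

For example, `print(ask(client, assistant.id, "Solve 3x + 11 = 14."))` prints the assistant's answer. This sketch assumes the first content part is text; a production version would check each part's type.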


Enable third-party model support

By default, the API uses your Astra DB Serverless database as the vector store and OpenAI for embeddings and chat completion.

However, the service also supports many third-party embedding and completion models through LiteLLM.

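To enable these models, set the provider's API key as an environment variable, as described in Set environment variables, and then select the model by name, as shown in the following examples. The provider's API key and the embedding model are passed to the service in the api-key and embedding-model request headers.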
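The following sketch is a hypothetical pre-flight check that maps a model name's provider prefix to the environment variable listed in the .env section, so you can spot a missing key before calling the service. The mapping covers only the providers used in this section's examples and treats unprefixed names such as gpt-4o as OpenAI; it's an assumption for illustration, not the key-detection logic in astra-assistants or LiteLLM.

```python
import os
from typing import Optional

# Hypothetical mapping from provider prefix to the variable set in .env.
PROVIDER_KEY_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "cohere": "COHERE_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "perplexity": "PERPLEXITYAI_API_KEY",
}


def provider_for(model: str) -> str:
    """Return the part before the first "/", or "openai" for unprefixed names."""
    prefix, sep, _ = model.partition("/")
    return prefix if sep else "openai"


def key_var_for(model: str) -> str:
    """Return the environment variable holding the API key for a model's provider."""
    provider = provider_for(model)
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"No API key variable known for provider {provider!r}")
    return PROVIDER_KEY_VARS[provider]


def missing_key(model: str) -> Optional[str]:
    """Return the variable name if the model's key is unset or empty, else None."""
    var = key_var_for(model)
    return None if os.environ.get(var) else var
```

For example, `missing_key("cohere/command-r-plus")` returns `"COHERE_API_KEY"` until that variable is set. Extend the dictionary for other providers from the .env list.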

Language models

Anthropic Claude

```
model = "anthropic/claude-3-5-sonnet-20240620"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Cohere Command

```
model = "cohere/command-r-plus"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Google Gemini

```
model = "gemini/gemini-1.5-flash"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

OpenAI GPT-4o

```
model = "gpt-4o"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

OpenAI GPT-4o-mini

```
model = "openai/gpt-4o-mini"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Perplexity Mixtral 8x7B

```
model = "perplexity/mixtral-8x7b-instruct"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Perplexity llama2-70B

```
model = "perplexity/pplx-70b-online"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```
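Because these examples differ only in the model name, you can also create the assistants in a loop. This is a hypothetical sketch: `create_math_tutors` and the model tuple are illustrative, `client` is assumed to be the patched client from earlier, and each model still needs its provider's API key in the environment.

```python
from typing import Any, Dict, Iterable

TUTOR_INSTRUCTIONS = (
    "You are a personal math tutor. Answer questions briefly, in a sentence or less."
)

# Illustrative subset of model names used in this section.
MODELS = (
    "cohere/command-r-plus",
    "gemini/gemini-1.5-flash",
    "gpt-4o",
    "openai/gpt-4o-mini",
)


def create_math_tutors(client: Any, models: Iterable[str] = MODELS) -> Dict[str, Any]:
    """Create one "Math Tutor" assistant per model, keyed by model name."""
    return {
        model: client.beta.assistants.create(
            name="Math Tutor", instructions=TUTOR_INSTRUCTIONS, model=model
        )
        for model in models
    }
```

Each call creates a separate, persistent assistant, so you can compare how different models answer the same question through one API.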

Embedding models

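To choose the embedding model for a file, including a third-party model, pass the embedding_model parameter to client.files.create. This parameter isn't part of the standard OpenAI files API; the patched client supports it:

```
with open("./test/language_models_are_unsupervised_multitask_learners.pdf", "rb") as pdf:
    file = client.files.create(
        file=pdf,
        purpose="assistants",
        embedding_model="text-embedding-3-large",
    )
```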
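To make an uploaded file searchable by an assistant, the Assistants API v2 pattern is to add the file to a vector store and attach that store to an assistant with the file_search tool. The following is a sketch under these assumptions: `client` is the patched client, `file_id` is the `id` of the object returned by `client.files.create`, and your openai SDK version exposes vector stores as `client.beta.vector_stores` (some newer releases expose them as `client.vector_stores` instead). The helper name, store name, assistant name, and instructions are hypothetical.

```python
from typing import Any, Tuple


def create_file_search_assistant(
    client: Any, file_id: str, model: str = "gpt-4o"
) -> Tuple[Any, Any]:
    """Add one uploaded file to a new vector store and attach it to a file-search assistant."""
    # Adding the file to a vector store makes its contents available to file search.
    vector_store = client.beta.vector_stores.create(
        name="tutorial-papers",  # hypothetical name
        file_ids=[file_id],
    )
    # The file_search tool queries the vector stores listed in tool_resources.
    assistant = client.beta.assistants.create(
        name="Paper Assistant",
        instructions="Answer questions using the attached papers.",
        model=model,
        tools=[{"type": "file_search"}],
        tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
    )
    return vector_store, assistant
```

For example, `vector_store, assistant = create_file_search_assistant(client, file.id)` returns both objects so you can reuse the vector store with other assistants.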