OpenAI Assistants with persistent vector store
The Astra Assistants API is a drop-in replacement for the OpenAI Assistants API. It uses a Serverless (Vector) database for persistence, and it supports the following features:

- Full compatibility with the OpenAI Assistants API v2, including messages, assistants, threads, runs, vector_stores, and files.
- Third-party embedding and completion models across hundreds of LLMs, including Anthropic, Gemini, Mistral, Groq, Llama, and Cohere, powered by LiteLLM.
- Ollama support for local models.
- Open source, with options for a managed service or self-hosting.
- Function calling and file search.
The database stores and queries embeddings for retrieval augmented generation (RAG). For large language model (LLM) tasks, such as embedding generation and chat completion, the database calls OpenAI or other LLMs.
Users interact with the service through the OpenAI SDKs. You can store your proprietary data and run Assistants API examples on your own Astra DB Serverless database, which you can manage, access, and secure yourself.
Prerequisites
This tutorial requires the following:

- An active Astra account
- A paid OpenAI account
- Python 3.10 or later
- An application token with the Database Administrator role
- Familiarity with running Python scripts
Run an Assistant API example
- Create a `.env` file with the environment variables for your selected model:

OpenAI

```bash
#!/bin/bash
# Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
export APPLICATION_TOKEN=
# Go to https://platform.openai.com/api-keys to create a secret key.
export OPENAI_API_KEY=
# The following might be required for certain models
export OPENAI_ORGANIZATION=""
export OPENAI_API_BASE=""
```

Perplexity

```bash
#!/bin/bash
# Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
export APPLICATION_TOKEN=
# Go to https://platform.openai.com/api-keys to create a secret key.
export OPENAI_API_KEY=
# Go to https://www.perplexity.ai/settings/api to generate a secret key.
export PERPLEXITYAI_API_KEY=
```

Cohere

```bash
#!/bin/bash
# Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
export APPLICATION_TOKEN=
# Go to https://platform.openai.com/api-keys to create a secret key.
export OPENAI_API_KEY=
# Go to https://dashboard.cohere.com/api-keys to create an API key.
export COHERE_API_KEY=
```

Bedrock

```bash
#!/bin/bash
# Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
export APPLICATION_TOKEN=
# Go to https://platform.openai.com/api-keys to create a secret key.
export OPENAI_API_KEY=
# Bedrock models: https://docs.aws.amazon.com/bedrock/latest/userguide/setting-up.html
export AWS_REGION_NAME=
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
```

Vertex

```bash
#!/bin/bash
# Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
export APPLICATION_TOKEN=
# Go to https://platform.openai.com/api-keys to create a secret key.
export OPENAI_API_KEY=
# Required environment variables depend on your project configuration and the model you want to use.
# Some variables only apply when accessing private models or models hosted by third-party providers.
# Core variables for https://console.cloud.google.com/vertex-ai
export GOOGLE_JSON_PATH=
export GOOGLE_PROJECT_ID=
# If using a third-party SDK that doesn't recognize GOOGLE_PROJECT_ID:
export VERTEXAI_PROJECT=""
# If required by a third-party SDK or you need to specify a region-specific Vertex endpoint:
export VERTEXAI_LOCATION=""
# If required, not auto-detected from your environment, or not using GOOGLE_JSON_PATH:
export GOOGLE_APPLICATION_CREDENTIALS=""
```

Other models

```bash
# Anthropic Claude models - https://console.anthropic.com/settings/keys
export ANTHROPIC_API_KEY=""
# AI21 models
export AI21_API_KEY=""
# Aleph Alpha models
export ALEPHALPHA_API_KEY=""
# Anyscale models
export ANYSCALE_API_KEY=""
# Azure models
export AZURE_API_KEY=""
export AZURE_API_BASE=""
export AZURE_API_VERSION=""
export AZURE_AD_TOKEN=""
export AZURE_API_TYPE=""
# Baseten models
export BASETEN_API_KEY=""
# Cloudflare Workers models
export CLOUDFLARE_API_KEY=""
export CLOUDFLARE_ACCOUNT_ID=""
# DeepInfra models
export DEEPINFRA_API_KEY=""
# DeepSeek models
export DEEPSEEK_API_KEY=""
# Fireworks AI models
export FIREWORKS_AI_API_KEY=""
# Gemini models - https://makersuite.google.com/app/apikey
export GEMINI_API_KEY=""
# Groq models - https://console.groq.com/keys
export GROQ_API_KEY=""
# Hugging Face models
export HUGGINGFACE_API_KEY=""
export HUGGINGFACE_API_BASE=""
# Mistral models
export MISTRAL_API_KEY=""
# NLP Cloud models
export NLP_CLOUD_API_KEY=""
# OpenRouter models
export OPENROUTER_API_KEY=""
export OR_SITE_URL=""
export OR_APP_NAME=""
# PaLM models
export PALM_API_KEY=""
# Replicate models
export REPLICATE_API_KEY=""
# TogetherAI models
export TOGETHERAI_API_KEY=""
# Voyage models
export VOYAGE_API_KEY=""
# WatsonX models
export WATSONX_URL=""
export WATSONX_APIKEY=""
export WATSONX_TOKEN=""
export WATSONX_PROJECT_ID=""
export WATSONX_DEPLOYMENT_SPACE_ID=""
# XInference models
export XINFERENCE_API_BASE=""
export XINFERENCE_API_KEY=""
```
- Install `poetry`:

```bash
curl -sSL https://install.python-poetry.org | python3 -
```

- Add the dependencies (`poetry add` installs packages into the project; `poetry install` does not accept package names):

```bash
poetry add astra-assistants openai python-dotenv
```
Build the Assistants API-powered application
- Import and patch your client:

```python
from openai import OpenAI
from astra_assistants import patch

client = patch(OpenAI())
```
Using your token, the system creates an Astra DB Serverless database named assistant_api_db.
The first request can take a few minutes to create your database.
- Create your assistant:

```python
assistant = client.beta.assistants.create(
    instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
    model="gpt-4o",
)
```
By default, the service uses Astra DB Serverless as the vector store and OpenAI for embeddings and chat completion.
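To exercise the assistant end to end, you can follow the standard OpenAI Assistants v2 flow: create a thread, add a user message, run the assistant, and read the reply. This sketch assumes the `client` and `assistant` objects from the steps above and requires valid credentials in your environment:

```python
# Create a conversation thread and add a user message to it.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is the square root of 1024?",
)

# Run the assistant on the thread and block until the run finishes.
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

if run.status == "completed":
    # Messages are returned newest first; print the assistant's reply.
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    print(messages.data[0].content[0].text.value)
else:
    print(f"Run ended with status: {run.status}")
```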
Third-party LLM support
Astra DB supports many third-party models for embeddings and completion through LiteLLM.
You must pass your service’s API key in the `api-key` header and, for embeddings, the model name in the `embedding-model` header.
You can pass different models with the corresponding API key in your environment:
OpenAI GPT-4o

```python
model = "gpt-4o"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

OpenAI GPT-4o-mini

```python
model = "openai/gpt-4o-mini"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Cohere Command R+

```python
model = "cohere/command-r-plus"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Perplexity Mixtral-8x7B

```python
model = "perplexity/mixtral-8x7b-instruct"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Perplexity pplx-70b-online

```python
model = "perplexity/pplx-70b-online"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Anthropic Claude

```python
model = "anthropic/claude-3-5-sonnet"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```

Google Gemini

```python
model = "gemini/gemini-1.5-flash"
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
```
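If you need to set the `api-key` and `embedding-model` headers yourself rather than relying on environment variables, one way is the OpenAI SDK's `default_headers` option. This is a hypothetical sketch, assuming those headers are plain HTTP headers on the patched client; check the astra-assistants documentation for the exact header names your version expects:

```python
import os

from openai import OpenAI
from astra_assistants import patch

# Assumption: the api-key header carries the third-party service key, and
# embedding-model selects the LiteLLM embedding model. COHERE_API_KEY and the
# model string here are illustrative values.
client = patch(OpenAI(
    default_headers={
        "api-key": os.environ["COHERE_API_KEY"],
        "embedding-model": "cohere/embed-english-v3.0",
    },
))
```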
For third-party embedding models, DataStax supports the `embedding_model` parameter in `client.files.create`:

```python
file = client.files.create(
    file=open(
        "./test/language_models_are_unsupervised_multitask_learners.pdf",
        "rb",
    ),
    purpose="assistants",
    embedding_model="text-embedding-3-large",
)
```
By default, the API uses your Astra DB Serverless database as the vector store and OpenAI for the embeddings and chat completion.
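To make the uploaded file searchable, the standard OpenAI Assistants v2 pattern attaches it to a vector store and enables the `file_search` tool. A hypothetical continuation, assuming that pattern applies here unchanged and reusing the `file` and `assistant` objects from above (the vector store name is illustrative):

```python
# Create a vector store and add the uploaded file to it.
vector_store = client.beta.vector_stores.create(name="papers")
client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file.id,
)

# Enable file_search on the assistant and point it at the vector store.
assistant = client.beta.assistants.update(
    assistant.id,
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```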