OpenAI Assistants with persistent vector store


The Astra Assistants API is a drop-in replacement for the OpenAI Assistants API. It uses a Serverless (Vector) database for persistence and supports the following features:

  • Full compatibility with the OpenAI Assistants API v2, including messages, assistants, threads, runs, vector_stores, and files.

  • Third-party embedding and completion models across hundreds of LLMs, including Anthropic, Gemini, Mistral, Groq, Llama, and Cohere, powered by LiteLLM.

  • Ollama support for local models.

  • Open source, with options for a managed service or self-hosting.

  • Function calling and file search.

  • Data privacy and protection.

The database stores and queries embeddings for retrieval-augmented generation (RAG). For large language model (LLM) tasks, such as embedding generation and chat completion, the database calls OpenAI or other LLMs.

You interact with the service through the standard OpenAI SDKs, while your proprietary data stays in your own Astra DB Serverless database, where you control management, access, and security.

Prerequisites

This tutorial requires the following:

  • An active Astra account and an application token with the Administrator role.

  • An OpenAI API key, plus API keys for any other model providers you want to use.

  • A recent Python 3 installation.

Run an Assistant API example

  1. Create a .env file with the environment variables for your selected model.

    • OpenAI

    • Perplexity

    • Cohere

    • Bedrock

    • Vertex

    • Other models

    .env (OpenAI)
    #!/bin/bash
    
    # Go to https://astra.datastax.com > "Tokens" to generate an Administrator User token.
    export APPLICATION_TOKEN=
    # Go to https://platform.openai.com/api-keys to create a secret key.
    export OPENAI_API_KEY=
    
    # The following might be required for certain models
    export OPENAI_ORGANIZATION=""
    export OPENAI_API_BASE=""
    .env (Perplexity)
    #!/bin/bash
    
    # Go to https://astra.datastax.com > "Tokens" to generate an Administrator User token.
    export APPLICATION_TOKEN=
    # Go to https://platform.openai.com/api-keys to create a secret key.
    export OPENAI_API_KEY=
    
    # Go to https://www.perplexity.ai/settings/api to generate a secret key.
    export PERPLEXITYAI_API_KEY=
    .env (Cohere)
    #!/bin/bash
    
    # Go to https://astra.datastax.com > "Tokens" to generate an Administrator User token.
    export APPLICATION_TOKEN=
    # Go to https://platform.openai.com/api-keys to create a secret key.
    export OPENAI_API_KEY=
    
    # Go to https://dashboard.cohere.com/api-keys to create an API key.
    export COHERE_API_KEY=
    .env (Bedrock)
    #!/bin/bash
    
    # Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
    export APPLICATION_TOKEN=
    # Go to https://platform.openai.com/api-keys to create a secret key.
    export OPENAI_API_KEY=
    
    # Bedrock models: https://docs.aws.amazon.com/bedrock/latest/userguide/setting-up.html
    export AWS_REGION_NAME=
    export AWS_ACCESS_KEY_ID=
    export AWS_SECRET_ACCESS_KEY=
    .env (Vertex)
    #!/bin/bash
    
    # Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
    export APPLICATION_TOKEN=
    # Go to https://platform.openai.com/api-keys to create a secret key.
    export OPENAI_API_KEY=
    
    # Required environment variables depend on your project configuration and the model you want to use.
    # Some variables only apply when accessing private models or models hosted by third-party providers.
    
    # Core variables for https://console.cloud.google.com/vertex-ai
    export GOOGLE_JSON_PATH=
    export GOOGLE_PROJECT_ID=
    
    # If using a third-party SDK that doesn't recognize GOOGLE_PROJECT_ID:
    export VERTEXAI_PROJECT=""
    
    # If required by a third-party SDK or you need to specify a region-specific Vertex endpoint:
    export VERTEXAI_LOCATION=""
    
    # If required, not auto-detected from your environment, or not using GOOGLE_JSON_PATH:
    export GOOGLE_APPLICATION_CREDENTIALS=""
    .env (Other models)
    # Anthropic Claude models - https://console.anthropic.com/settings/keys
    export ANTHROPIC_API_KEY=""
    
    # AI21 models
    export AI21_API_KEY=""
    
    # Aleph Alpha models
    export ALEPHALPHA_API_KEY=""
    
    # Anyscale models
    export ANYSCALE_API_KEY=""
    
    # Azure models
    export AZURE_API_KEY=""
    export AZURE_API_BASE=""
    export AZURE_API_VERSION=""
    export AZURE_AD_TOKEN=""
    export AZURE_API_TYPE=""
    
    # Baseten models
    export BASETEN_API_KEY=""
    
    # Cloudflare Workers models
    export CLOUDFLARE_API_KEY=""
    export CLOUDFLARE_ACCOUNT_ID=""
    
    # DeepInfra models
    export DEEPINFRA_API_KEY=""
    
    # DeepSeek models
    export DEEPSEEK_API_KEY=""
    
    # Fireworks AI models
    export FIREWORKS_AI_API_KEY=""
    
    # Gemini models - https://makersuite.google.com/app/apikey
    export GEMINI_API_KEY=""
    
    # Groq models - https://console.groq.com/keys
    export GROQ_API_KEY=""
    
    # Hugging Face models
    export HUGGINGFACE_API_KEY=""
    export HUGGINGFACE_API_BASE=""
    
    # Mistral models
    export MISTRAL_API_KEY=""
    
    # NLP Cloud models
    export NLP_CLOUD_API_KEY=""
    
    # OpenRouter models
    export OPENROUTER_API_KEY=""
    export OR_SITE_URL=""
    export OR_APP_NAME=""
    
    # PaLM models
    export PALM_API_KEY=""
    
    # Replicate models
    export REPLICATE_API_KEY=""
    
    # TogetherAI models
    export TOGETHERAI_API_KEY=""
    
    # Voyage models
    export VOYAGE_API_KEY=""
    
    # WatsonX models
    export WATSONX_URL=""
    export WATSONX_APIKEY=""
    export WATSONX_TOKEN=""
    export WATSONX_PROJECT_ID=""
    export WATSONX_DEPLOYMENT_SPACE_ID=""
    
    # XInference models
    export XINFERENCE_API_BASE=""
    export XINFERENCE_API_KEY=""
  2. Install Poetry:

    curl -sSL https://install.python-poetry.org | python3 -
  3. Install the dependencies (if you don't already have a pyproject.toml, run poetry init first):

    poetry add astra-assistants openai python-dotenv

Build the Assistants API-powered application

  1. Import and patch your client:

    from openai import OpenAI
    from astra_assistants import patch
    client = patch(OpenAI())

Using your token, the service creates an Astra DB Serverless database named assistant_api_db on your behalf. The first request can take a few minutes while this database is provisioned.
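If you keep your credentials in the .env file from the previous section, load it before patching the client. A minimal sketch using python-dotenv, one of the dependencies installed earlier; python-dotenv also understands the export VAR=... lines used in the examples above:

from dotenv import load_dotenv
from openai import OpenAI
from astra_assistants import patch

# Read APPLICATION_TOKEN, OPENAI_API_KEY, and any provider keys from the
# .env file in the current working directory.
load_dotenv()

client = patch(OpenAI())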

  2. Create your assistant:

    assistant = client.beta.assistants.create(
      instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
      model="gpt-4o",
    )

By default, the service uses Astra DB Serverless as the vector store and OpenAI for embeddings and chat completion.
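Because the service is compatible with the Assistants API v2, the rest of the flow is the standard thread-and-run pattern. A minimal sketch, assuming an OpenAI SDK version that provides runs.create_and_poll; the question text is illustrative:

# Start a conversation thread and add a user message.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is the square root of 144?",
)

# create_and_poll blocks until the run reaches a terminal state.
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

if run.status == "completed":
    # Messages are returned newest first; print the assistant's reply.
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    print(messages.data[0].content[0].text.value)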

Third-party LLM support

The Astra Assistants API supports many third-party models for embeddings and completions through LiteLLM.

You must pass your provider’s API key using the api-key and embedding-model headers.

You can pass different models with the corresponding API key in your environment. Set model to the identifier for your chosen provider:

# OpenAI GPT-4o
model="gpt-4o"

# OpenAI GPT-4o-mini
model="openai/gpt-4o-mini"

# Cohere Command
model="cohere/command-r-plus"

# Perplexity Mixtral-8x7B
model="perplexity/mixtral-8x7b-instruct"

# Perplexity pplx-70b-online
model="perplexity/pplx-70b-online"

# Anthropic Claude
model="anthropic/claude-3-5-sonnet"

# Google Gemini
model="gemini/gemini-1.5-flash"

Then create the assistant the same way for any of these models:

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)

For third-party embedding models, DataStax supports the embedding_model parameter in client.files.create:

file = client.files.create(
    file=open(
        "./test/language_models_are_unsupervised_multitask_learners.pdf",
        "rb",
    ),
    purpose="assistants",
    embedding_model="text-embedding-3-large",
)

If you don't pass embedding_model, the API falls back to OpenAI for embeddings; either way, your Astra DB Serverless database remains the vector store.
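To use an uploaded file for retrieval, the standard Assistants API v2 pattern is to index it in a vector store and give the assistant the file_search tool. A sketch, assuming the OpenAI SDK's beta vector store endpoints and the file object created above; the store name is illustrative:

# Create a vector store and index the uploaded file in it.
vector_store = client.beta.vector_stores.create(name="papers")
client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=file.id,
)

# Attach the vector store to an assistant through the file_search tool.
assistant = client.beta.assistants.create(
    instructions="Answer questions using the attached papers.",
    model="gpt-4o",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)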
