OpenAI Assistants with persistent vector store

20 min

The Astra Assistants API is a drop-in replacement for OpenAI’s Assistants API that uses an Astra DB Serverless (Vector) database for persistence. It supports the following features:

  • Full compatibility with the OpenAI Assistants API v2, including messages, assistants, threads, runs, vector_stores, and files.

  • Third-party embedding and completion models, with support for hundreds of LLMs from providers including Anthropic, Gemini, Mistral, Groq, Llama, and Cohere, powered by LiteLLM.

  • Ollama support for local models.

  • Open source with options for managed service or self hosting.

  • Function calling and file search.

  • Data privacy and protection.

The database stores and queries embeddings for retrieval-augmented generation (RAG). For large language model (LLM) tasks, such as embedding generation and chat completion, the service calls OpenAI or other LLM providers.

Users interact with the service through the OpenAI SDKs. You store your proprietary data and run Assistants API examples on your own Astra DB Serverless database, which you manage, access, and secure.

Prerequisites

To complete this tutorial, you’ll need the following:

You should also be proficient in the following tasks:

  • Running a basic Python script.

Run an Assistants API example

  1. Create a .env file with the environment variables for your selected model:

    • OpenAI

    .env
    #!/bin/bash
    
    # Go to https://astra.datastax.com > "Tokens" to generate an Administrator User token.
    export ASTRA_DB_APPLICATION_TOKEN=
    # Go to https://platform.openai.com/api-keys to create a secret key.
    export OPENAI_API_KEY=

    • Perplexity

    .env
    #!/bin/bash
    
    # Go to https://astra.datastax.com > "Tokens" to generate an Administrator User token.
    export ASTRA_DB_APPLICATION_TOKEN=
    # Go to https://platform.openai.com/api-keys to create a secret key.
    export OPENAI_API_KEY=
    
    # Go to https://www.perplexity.ai/settings/api to generate a secret key.
    export PERPLEXITYAI_API_KEY=

    • Cohere

    .env
    #!/bin/bash
    
    # Go to https://astra.datastax.com > "Tokens" to generate an Administrator User token.
    export ASTRA_DB_APPLICATION_TOKEN=
    # Go to https://platform.openai.com/api-keys to create a secret key.
    export OPENAI_API_KEY=
    
    # Go to https://dashboard.cohere.com/api-keys to create an API key.
    export COHERE_API_KEY=

    • Bedrock

    .env
    #!/bin/bash
    
    # Go to https://astra.datastax.com > "Tokens" to generate an Administrator User token.
    export ASTRA_DB_APPLICATION_TOKEN=
    # Go to https://platform.openai.com/api-keys to create a secret key.
    export OPENAI_API_KEY=
    
    # Bedrock models: https://docs.aws.amazon.com/bedrock/latest/userguide/setting-up.html
    export AWS_REGION_NAME=
    export AWS_ACCESS_KEY_ID=
    export AWS_SECRET_ACCESS_KEY=

    • Vertex

    .env
    #!/bin/bash
    
    # Go to https://astra.datastax.com > "Tokens" to generate an Administrator User token.
    export ASTRA_DB_APPLICATION_TOKEN=
    # Go to https://platform.openai.com/api-keys to create a secret key.
    export OPENAI_API_KEY=
    
    # Vertex AI models: https://console.cloud.google.com/vertex-ai
    export GOOGLE_JSON_PATH=
    export GOOGLE_PROJECT_ID=
  2. Install Poetry:

    curl -sSL https://install.python-poetry.org | python3 -
  3. Add the dependencies to your Poetry project:

    poetry add astra-assistants openai python-dotenv
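Before running anything against the service, it can help to confirm that the .env file from step 1 has values filled in. The following is a minimal sketch using only the standard library; the helper names `parse_env` and `missing_vars` are hypothetical and not part of astra-assistants or python-dotenv, and the parser handles only the simple `export KEY=value` form shown above.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping comments and an optional `export ` prefix."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, comment, or the #!/bin/bash line
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_vars(text: str, required: list[str]) -> list[str]:
    """Return the required variable names that are absent or left empty."""
    env = parse_env(text)
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    # These two names appear in every .env variant in this tutorial.
    print(missing_vars(text, ["ASTRA_DB_APPLICATION_TOKEN", "OPENAI_API_KEY"]))
```

An empty list means both variables have a value; any name printed still needs to be filled in.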

Build the Assistants API-powered application

  1. Import and patch your client:

    from dotenv import load_dotenv
    from openai import OpenAI
    from astra_assistants import patch

    load_dotenv()  # read the keys from the .env file created earlier
    client = patch(OpenAI())

On the first request, the service uses your token to create an Astra DB Serverless database named assistant_api_db. This can take a few minutes.

  2. Create your assistant:

    assistant = client.beta.assistants.create(
      instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
      model="gpt-4o",
    )

By default, the service uses Astra DB Serverless as the vector store and OpenAI for embeddings and chat completion.
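Once the assistant exists, a conversation follows the usual Assistants API flow of thread, message, and run. The sketch below assumes the patched client exposes the same `client.beta.threads` methods as the OpenAI Python SDK, including `runs.create_and_poll`; the function name `ask_assistant` is hypothetical and only for illustration.

```python
def ask_assistant(client, assistant_id: str, question: str) -> str:
    """Send one question on a new thread and return the assistant's reply text."""
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=question,
    )
    # create_and_poll blocks until the run reaches a terminal state.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id,
        assistant_id=assistant_id,
    )
    if run.status != "completed":
        raise RuntimeError(f"Run ended with status {run.status!r}")
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    # Pick the first assistant message rather than relying on list order.
    reply = next(m for m in messages.data if m.role == "assistant")
    return reply.content[0].text.value
```

With the `client` and `assistant` objects from the steps above, a call might look like `print(ask_assistant(client, assistant.id, "What is 12 * 7?"))`.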

Third-party LLM support

The Astra Assistants API supports many third-party models for embeddings and completion through LiteLLM.

You must pass your model provider’s API key and embedding model using the api-key and embedding-model headers.

To use a different model, pass its name as the model and make sure the corresponding API key is set in your environment:

  • OpenAI GPT-4o

model="gpt-4o"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)

  • OpenAI GPT-4o-mini

model="openai/gpt-4o-mini"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)

  • Cohere Command

model="cohere/command-r-plus"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)

  • Perplexity mistral-7B

model="perplexity/mixtral-8x7b-instruct"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)

  • Perplexity llama2-70B

model="perplexity/pplx-70b-online"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)

  • Anthropic Claude

model="anthropic/claude-3-5-sonnet"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)

  • Google Gemini

model="gemini/gemini-1.5-flash"

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)
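Because each model string needs its provider's key in the environment, a small lookup can catch a missing key before a request is made. This is a sketch only: the OpenAI, Perplexity, and Cohere variable names come from the .env examples in this tutorial, while `ANTHROPIC_API_KEY` and `GEMINI_API_KEY` are assumed from LiteLLM's naming conventions. The helpers `key_for_model` and `first_usable_model` are hypothetical.

```python
import os

# Provider prefix -> environment variable holding that provider's API key.
# The anthropic and gemini entries are assumptions (LiteLLM conventions).
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "perplexity": "PERPLEXITYAI_API_KEY",
    "cohere": "COHERE_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def key_for_model(model: str) -> str:
    """Return the environment variable name expected for a model string."""
    provider, sep, _ = model.partition("/")
    if not sep:
        provider = "openai"  # unprefixed names such as "gpt-4o"
    try:
        return PROVIDER_KEYS[provider]
    except KeyError:
        raise ValueError(f"No known API key variable for provider {provider!r}") from None


def first_usable_model(candidates, env=os.environ):
    """Return the first candidate whose API key is set in env, or None."""
    for model in candidates:
        if env.get(key_for_model(model)):
            return model
    return None
```

For example, `first_usable_model(["cohere/command-r-plus", "gpt-4o"])` falls back to `"gpt-4o"` when only `OPENAI_API_KEY` is set.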

For third-party embedding models, the service accepts an embedding_model parameter in client.files.create:

# embedding_model is accepted by the patched client, not by the stock OpenAI SDK.
with open(
    "./test/language_models_are_unsupervised_multitask_learners.pdf",
    "rb",
) as pdf:
    file = client.files.create(
        file=pdf,
        purpose="assistants",
        embedding_model="text-embedding-3-large",
    )

By default, the API uses your Astra DB Serverless database as the vector store and OpenAI for the embeddings and chat completion.
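To make an uploaded file searchable, the Assistants API v2 attaches it to a vector store and enables the file_search tool on the assistant. The sketch below assumes the patched client exposes `client.beta.vector_stores.create` as in OpenAI Python SDK releases that ship Assistants v2 under `beta`; newer SDK releases may expose vector stores at `client.vector_stores` instead. The function name, store name, and assistant name are hypothetical.

```python
def build_file_search_assistant(client, file_id: str, model: str = "gpt-4o"):
    """Create a vector store holding one file and an assistant that can search it."""
    # Assumption: vector stores live under client.beta in the installed SDK version.
    vector_store = client.beta.vector_stores.create(
        name="tutorial-docs",
        file_ids=[file_id],
    )
    return client.beta.assistants.create(
        name="Paper Reader",
        instructions="Answer questions using the attached files.",
        model=model,
        tools=[{"type": "file_search"}],
        tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
    )
```

With the `file` object returned by client.files.create, a call might look like `assistant = build_file_search_assistant(client, file.id)`.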

See also

For more details, see this tutorial’s Colab notebook.


© 2025 DataStax