Integrate OpenAI as an embedding provider

Astra vectorize integrations can automatically generate embeddings for data in collections and tables in Serverless (vector) databases. For more information about how vectorize works, see Manage embedding provider integrations for vectorize.

This guide explains how to configure the OpenAI integration, and then add it to a collection or table.

Prepare your OpenAI account

To use the OpenAI embedding provider integration, you need the following:

  • A paid OpenAI account.

  • Your OpenAI organization ID and project ID, if your OpenAI account belongs to multiple organizations or you use a legacy user API key to access projects.

    You cannot use the default project for this integration. If necessary, create a project in your OpenAI account to use for this integration.

  • At least one OpenAI API key with unrestricted access to the API.

    Astra supports OpenAI user, project, and service account API keys. A service account API key is recommended for better security and control in production environments.

    Don’t remove or restrict the API key in your OpenAI account after you add it to Astra. This can break the integration. For more information about managing, rotating, and removing credentials, see Manage embedding provider integrations for vectorize.

Enable the integration in your Astra organization

Before you can use the OpenAI integration in a collection or table, you must enable the integration in your Astra organization and authorize a Serverless (vector) database to use it.

  1. Recommended: Create a Serverless (vector) database if you don’t already have one.

    You can enable the integration without any Serverless (vector) databases. However, you cannot add the integration to any collections or tables until you authorize a Serverless (vector) database to use the integration. This is handled through the integration’s settings, as described in the following steps. To fully enable and test a vectorize integration, you need at least one Serverless (vector) database.

  2. In the Astra Portal header, click Settings.

  3. In the Settings navigation menu, make sure the enterprise/organization filter is set to the organization that you want to manage.

    If the organization belongs to an enterprise, you must filter on the enterprise, and then click the organization name in the Organizations list.

  4. In the Settings navigation menu, click Integrations.

  5. Click the OpenAI Embedding provider tile.

  6. Click Add integration.

  7. For API key name, enter a unique, meaningful label that provides a brief, clear description of the credential. For example, prod-support-chat-created-oct-2026 or webstore-test-jira-1234.

    The name is immutable, and it is the unique identifier for the credential. You can add more credentials after you enable the integration.

    API key names must follow these rules:

    • Must start and end with a letter or number

    • Can contain letters, numbers, underscores, and hyphens

    • Must contain at least 2 characters, but no more than 50 characters

    • Must be unique within the embedding provider integration’s settings
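The first three rules can be checked locally with a single regular expression. The following sketch is illustrative only, and not part of Astra; the uniqueness rule still has to be verified against the integration’s existing credentials:

```shell
# Check a proposed API key name against the documented rules:
# 2-50 characters, letters/numbers/underscores/hyphens only,
# starting and ending with a letter or number.
validate_key_name() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9][A-Za-z0-9_-]{0,48}[A-Za-z0-9]$'
}

validate_key_name "prod-support-chat-created-oct-2026" && echo "valid"
validate_key_name "-starts-with-hyphen" || echo "invalid"
```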

  8. Enter your OpenAI API key.

  9. In the Add databases to scope section, select a Serverless (vector) database that is authorized to use the integration.

    Specifically, this authorizes the database to use the OpenAI API key to call the embedding provider’s API. This is required to add this integration to a collection or table. If a database isn’t authorized to use any credentials, then the integration isn’t available to that database.

    You can add multiple databases to each API key, you can add multiple API keys, and you can add databases after you enable the integration. New databases aren’t added to credential scopes automatically; you must add them to a credential’s scope manually. For more information, see Manage embedding provider integrations for vectorize.

  10. Click Add Integration to save the credential and activate the integration.

Add the integration to a collection

To use the OpenAI integration to generate embeddings for data in a collection, you must select the integration when you create the collection.

You cannot add a vectorize integration to an existing collection.

You cannot change a collection’s embedding provider or embedding generation method after you create it. To use a different embedding provider, you must create a new collection with a different embedding provider integration.

If you get a Collection Limit Reached or TOO_MANY_INDEXES error, you must delete a collection before you can create a new one.

Serverless (vector) databases created after June 24, 2024 can have approximately 10 collections. Databases created before this date can have approximately 5 collections. The collection limit is based on the number of indexes.

You can create a collection in the Astra Portal or with the Data API.

Use the Astra Portal

  1. In the Astra Portal, click the name of the Serverless (vector) database where you want to use the integration.

  2. Click Data Explorer.

  3. In the Keyspace field, select the keyspace where you want to create the collection.

  4. Click Create Collection.

  5. Enter a name for the collection.

    For collection name rules and more information about creating collections, see Manage collections and tables.

  6. Make sure Vector-enabled collection is enabled.

  7. For Embedding generation method, select the OpenAI embedding provider integration.

    If the integration isn’t listed, see Manage scoped databases and Troubleshoot vectorize integrations.

  8. Configure the integration settings for this collection. For information about each setting, see OpenAI settings for collections and tables.

  9. Click Create collection.

To learn how to generate embeddings and perform vector searches on your integrated collection, see Next steps.

Use the Data API

You can use the Data API to create a collection that uses the OpenAI integration.

The following example uses curl. For Data API client examples and more information, see the Data API reference documentation: Create a collection

Create a collection, specifying the new collection’s name and the embedding provider integration settings:

curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "vector": {
        "dimension": MODEL_DIMENSIONS,
        "metric": "SIMILARITY_METRIC",
        "service": {
          "provider": "openai",
          "modelName": "MODEL_NAME",
          "authentication": {
            "providerKey": "API_KEY_NAME"
          },
          "parameters": {
            "organizationId": "ORGANIZATION_ID",
            "projectId": "PROJECT_ID"
          }
        }
      }
    }
  }
}'
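As a concrete sketch of the placeholders above, the following payload assumes the text-embedding-3-small model, which produces 1536-dimensional vectors, with the cosine metric. The optional organizationId and projectId parameters are omitted, and openai_key is a hypothetical API key name. The snippet only validates the JSON locally; pass it to curl as shown above to actually create the collection:

```shell
# Example createCollection payload with the placeholders filled in,
# assuming the text-embedding-3-small model (1536 dimensions) and the
# cosine similarity metric. "openai_key" is a hypothetical credential name.
PAYLOAD='{
  "createCollection": {
    "name": "quotes",
    "options": {
      "vector": {
        "dimension": 1536,
        "metric": "cosine",
        "service": {
          "provider": "openai",
          "modelName": "text-embedding-3-small",
          "authentication": { "providerKey": "openai_key" }
        }
      }
    }
  }
}'

# Confirm the payload is well-formed JSON before sending it:
printf '%s' "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"
```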

To learn how to generate embeddings and perform vector searches on your integrated collection, see Next steps.

Add the integration to a table

You can use the Data API to add the OpenAI integration to a table in multiple ways:

  • Create a table that has a vector column with a vectorize integration.

  • Alter a table to add a vector column with a vectorize integration.

  • Alter a table to add or change a vectorize integration on an existing vector column.

The following example uses curl to create a table with a vector column that has a vectorize integration. For Data API client examples and more information, see the Data API reference documentation:

  1. Create a table with a vector column, specifying the table name, schema, and the embedding provider integration settings:

    curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
      --header "Token: $APPLICATION_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
      "createTable": {
        "name": "TABLE_NAME",
        "definition": {
          "columns": {
            # This column stores vector embeddings.
            # The OpenAI integration automatically generates
            # vector embeddings for any text inserted into this column.
            "VECTOR_COLUMN_NAME": {
              "type": "vector",
              "dimension": MODEL_DIMENSIONS,
              "service": {
                "provider": "openai",
                "modelName": "MODEL_NAME",
                "authentication": {
                  "providerKey": "API_KEY_NAME"
                },
                "parameters": {
                  "organizationId": "ORGANIZATION_ID",
                  "projectId": "PROJECT_ID"
                }
              }
            },
            # To store the original text in addition to the
            # generated embeddings, you must create a separate column.
            "TEXT_COLUMN_NAME": "text"
          },
          # You should change the primary key definition to meet the needs of your data.
          "primaryKey": "TEXT_COLUMN_NAME"
        }
      }
    }'

    The same embedding provider integration settings are used to configure a vectorize integration when you create or alter a table.

  2. Index the vector column so that you can perform a vector search on it:

    curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace/TABLE_NAME" \
      --header "Token: $APPLICATION_TOKEN" \
      --header "Content-Type: application/json" \
      --data '{
      "createVectorIndex": {
        "name": "INDEX_NAME",
        "definition": {
          "column": "VECTOR_COLUMN_NAME",
          "options": {
            "metric": "SIMILARITY_METRIC"
          }
        }
      }
    }'
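Once the vector column is indexed, vectorize generates an embedding whenever a string is written to that column. The following is an illustrative insertOne payload only; it reuses the placeholder names from the createTable example and validates the JSON locally rather than calling the API. To run it for real, you would POST it to the table’s endpoint (the keyspace endpoint path followed by the table name):

```shell
# Illustrative insertOne payload for the table above. Writing a string
# to the vector column causes the OpenAI integration to generate the
# embedding; the text column stores the original text separately.
# Placeholder names are reused from the createTable example.
INSERT_PAYLOAD='{
  "insertOne": {
    "document": {
      "TEXT_COLUMN_NAME": "A sample passage to embed.",
      "VECTOR_COLUMN_NAME": "A sample passage to embed."
    }
  }
}'
printf '%s' "$INSERT_PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"
```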

To learn how to generate embeddings and perform vector searches on your integrated vector column, see Next steps.

OpenAI settings for collections and tables

When you add the OpenAI integration to a collection or table, the following settings are available. These settings apply to the scope of one collection or one vector column on a table. Each collection and column can have different values for each setting if needed.

  • Embedding generation method (provider): The embedding provider integration to use to automatically generate embeddings.

  • API key (providerKey): The API key to use to call the embedding provider’s API when generating embeddings for this collection or vector column. If only one API key is available, it is selected by default and cannot be changed. If multiple API keys are available, select one. For more information, see Manage scoped databases.

  • Organization ID (organizationId): Optional ID of the OpenAI organization that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. For more information, see the OpenAI API reference.

  • Project ID (projectId): Optional ID of the OpenAI project that owns the API key. Only required if your OpenAI account belongs to multiple organizations or if you are using a legacy user API key to access projects. You cannot use the default project. For more information, see the OpenAI API reference.

  • Embedding model (modelName): The model to use to generate embeddings. The available models are text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002.

  • Dimensions (dimension): The dimensions of the generated vectors. This field can be set automatically based on the embedding model. You can edit this field if the model supports a range of dimensions or the integration uses an endpoint-defined model. For supported dimensions, see the documentation for your embedding model.

  • Similarity metric (metric): The method to use to calculate vector similarity scores: Cosine, Dot Product, or Euclidean.
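To illustrate how these settings map onto the Data API, the following sketch builds the vector options for a collection or column that uses text-embedding-3-large, which produces 3072-dimensional vectors by default, with the dot product metric. The openai_key credential name is hypothetical, and the optional organization and project IDs are omitted:

```shell
# Illustrative mapping of the settings above onto the Data API
# "vector" options object, assuming text-embedding-3-large at its
# default 3072 dimensions and the dot product metric.
# "openai_key" is a hypothetical API key name.
VECTOR_OPTIONS='{
  "dimension": 3072,
  "metric": "dot_product",
  "service": {
    "provider": "openai",
    "modelName": "text-embedding-3-large",
    "authentication": { "providerKey": "openai_key" }
  }
}'
printf '%s' "$VECTOR_OPTIONS" | python3 -m json.tool > /dev/null && echo "options ok"
```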


© Copyright IBM Corporation 2026
