Integrate NVIDIA as an embedding provider

Integrate the Astra-hosted NVIDIA embedding provider to enable Astra DB vectorize.

Prerequisites

To use NVIDIA as an embedding provider, you need the following:

- Permission to create collections in your Astra organization, such as those granted by the Database Administrator role.
- A Serverless (Vector) database in AWS us-east-2 or Google Cloud us-east1.

This integration is available only for databases in AWS us-east-2 or Google Cloud us-east1.

If this is your first time using Astra DB, follow the quickstart for collections (schema-less data) or the quickstart for tables (data with a schema) to create a database and connect to it with an API client.

Add the NVIDIA integration to collections and tables

You can use the NVIDIA integration in collections and tables.

Add the NVIDIA integration to a new collection

Before you can use the NVIDIA integration to generate embeddings, you must add the integration to a new collection.

You cannot change a collection’s embedding provider or embedding generation method after you create it. To use a different embedding provider, you must create a new collection with a different embedding provider integration.

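You can create the collection in the Astra Portal, with the Python, TypeScript, or Java client, or with curl. The following steps use the Astra Portal: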

1. In the Astra Portal, click the name of your Serverless (Vector) database.
2. Click Data Explorer.
3. In the Keyspace field, select the keyspace where you want to create the collection or use default_keyspace.
4. Click Create Collection.
5. In the Create collection dialog, enter a name for the collection.
   Rules for collection names:
   - Can contain letters, numbers, and underscores
   - Cannot exceed 48 characters
   - Must be unique within the keyspace
6. Enable Vector-enabled collection if it is not already enabled.
7. For Embedding generation method, select the NVIDIA embedding provider integration.
8. Complete the following fields:
   - Embedding model: The model that you want to use to generate embeddings. If only one model is available, it is selected by default.
   - Dimensions: The number of dimensions that you want the generated vectors to have. Typically, the number of dimensions is automatically determined by the model you select.
   - Similarity metric: The method you want to use to calculate vector similarity scores. The available metrics are Cosine, Dot Product, and Euclidean.
9. Click Create collection.
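
The naming rules listed in the steps above can be checked locally before you submit a name. The following is a minimal sketch, not part of any Astra client: `is_valid_collection_name` and its `existing_names` parameter are hypothetical, and it assumes that "letters" means ASCII letters.

```python
import re

# Letters, numbers, and underscores only, with a maximum length of 48.
# Assumption: "letters" is interpreted as ASCII letters.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,48}")


def is_valid_collection_name(name, existing_names=()):
    """Return True if `name` satisfies the documented naming rules.

    Uniqueness cannot be verified locally, so the caller passes the names
    that already exist in the keyspace (hypothetical parameter).
    """
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    return name not in set(existing_names)


print(is_valid_collection_name("product_docs_v2"))
print(is_valid_collection_name("product-docs"))
```

The database remains the authority on what it accepts, so treat this check as a convenience only.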

Use the Python client to create a collection that uses the NVIDIA integration.

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CollectionDefinition,
    CollectionVectorOptions,
    VectorServiceOptions,
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the collection
collection_definition = CollectionDefinition(
    vector=CollectionVectorOptions(
        metric=VectorMetric.COSINE,
        service=VectorServiceOptions(
            provider="nvidia",
            model_name="NV-Embed-QA",
        )
    )
)

# Create the collection
collection = database.create_collection(
    "COLLECTION_NAME",
    definition=collection_definition,
)

print(f"* Collection: {collection.full_name}\n")

Use the TypeScript client to create a collection that uses the NVIDIA integration.

import { DataAPIClient } from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the collection
const collection_definition = {
  vector: {
    metric: "cosine",
    service: {
      provider: "nvidia",
      modelName: "NV-Embed-QA",
    },
  },
};

(async function () {
  // Create the collection
  const collection = await database.createCollection(
    "COLLECTION_NAME",
    collection_definition
  );
})();

Use the Java client to create a collection that uses the NVIDIA integration.

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));

    // Define the collection
    CollectionDefinition collectionDefinition =
        new CollectionDefinition()
            .vectorSimilarity(SimilarityMetric.COSINE)
            .vectorize(
                "nvidia",
                "NV-Embed-QA");

    // Create the collection
    Collection<Document> collection = database.createCollection("COLLECTION_NAME", collectionDefinition);
  }
}

Use the Data API to create a collection that uses the NVIDIA integration:

curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "vector": {
        "metric": "cosine",
        "service": {
          "provider": "nvidia",
          "modelName": "NV-Embed-QA"
        }
      }
    }
  }
}'

If you get a Collection Limit Reached or TOO_MANY_INDEXES message, you must delete a collection before you can create a new one.

Serverless (Vector) databases created after June 24, 2024 can have approximately 10 collections. Databases created before this date can have approximately 5 collections. The collection limit is based on the number of indexes.

Add the NVIDIA integration to a table

You can use the Data API to add the NVIDIA integration to a table in multiple ways:

Add the integration to a vector column when you create a table.
Add the integration to an existing vector column in a table.
Add the integration when you add a vector column to an existing table.

If you are new to the Data API, see Get started with the Data API.

Add the integration to a new table

Use the Python client to create a table with a column that is integrated with NVIDIA:

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import (
    CreateTableDefinition,
    ColumnType,
    TableScalarColumnTypeDescriptor,
    TablePrimaryKeyDescriptor,
    TableVectorColumnTypeDescriptor,
    TableVectorIndexOptions,
    VectorServiceOptions,
)

# Instantiate the client
client = DataAPIClient()

# Connect to a database
database = client.get_database(
    os.environ["API_ENDPOINT"],
    token=os.environ["APPLICATION_TOKEN"]
)

# Define the columns and primary key for the table
table_definition = CreateTableDefinition(
    columns={
        # This column will store vector embeddings.
        # The NVIDIA integration
        # will automatically generate vector embeddings
        # for any text inserted to this column.
        "VECTOR_COLUMN_NAME": TableVectorColumnTypeDescriptor(
            service=VectorServiceOptions(
                provider="nvidia",
                model_name="NV-Embed-QA",
            ),
        ),
        # If you want to store the original text
        # in addition to the generated embeddings
        # you must create a separate column.
        "TEXT_COLUMN_NAME": TableScalarColumnTypeDescriptor(
            column_type=ColumnType.TEXT
        ),
    },
    # You should change the primary key definition to meet the needs of your data.
    primary_key=TablePrimaryKeyDescriptor(
        partition_by=["TEXT_COLUMN_NAME"],
        partition_sort={}
    ),
)

# Create the table
table = database.create_table(
    "TABLE_NAME",
    definition=table_definition,
)

# Index the vector column so that you can perform a vector search on it.
table.create_vector_index(
    "INDEX_NAME",
    column="VECTOR_COLUMN_NAME",
    options=TableVectorIndexOptions(
        metric=VectorMetric.COSINE,
    ),
)

Use the TypeScript client to create a table with a column that is integrated with NVIDIA:

import {
  DataAPIClient,
  InferTablePrimaryKey,
  InferTableSchema,
  Table,
} from "@datastax/astra-db-ts";

// Instantiate the client
const client = new DataAPIClient();

// Connect to a database
const database = client.db(process.env.API_ENDPOINT, {
  token: process.env.APPLICATION_TOKEN,
});

// Define the columns and primary key for the table
const tableDefinition = Table.schema({
  columns: {
    // This column will store vector embeddings.
    // The NVIDIA integration
    // will automatically generate vector embeddings
    // for any text inserted to this column.
    VECTOR_COLUMN_NAME: {
      type: "vector",
      service: {
        provider: 'nvidia',
        modelName: 'NV-Embed-QA',
      },
    },
    // If you want to store the original text
    // in addition to the generated embeddings
    // you must create a separate column.
    TEXT_COLUMN_NAME: "text",
  },
  // You should change the primary key definition to meet the needs of your data.
  primaryKey: {
    partitionBy: ["TEXT_COLUMN_NAME"],
  },
});

// Infer the TypeScript-equivalent type of the table's schema and primary key
type TableSchema = InferTableSchema<typeof tableDefinition>;
type TablePrimaryKey = InferTablePrimaryKey<typeof tableDefinition>;

(async function () {
  const table = await database.createTable<TableSchema, TablePrimaryKey>(
    'TABLE_NAME',
    { definition: tableDefinition },
  );

  // Index the vector column so that you can perform a vector search on it
  await table.createVectorIndex(
    "INDEX_NAME",
    "VECTOR_COLUMN_NAME",
    {
      options: {
        metric: 'cosine',
      },
    },
  );
})();

Use the Java client to create a table with a column that is integrated with NVIDIA:

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.core.options.DataAPIClientOptions;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.core.vectorize.VectorServiceOptions;
import com.datastax.astra.client.databases.Database;
import com.datastax.astra.client.databases.DatabaseOptions;
import com.datastax.astra.client.tables.Table;
import com.datastax.astra.client.tables.definition.TableDefinition;
import com.datastax.astra.client.tables.definition.columns.ColumnDefinitionVector;
import com.datastax.astra.client.tables.definition.indexes.TableVectorIndexDefinition;
import com.datastax.astra.client.tables.definition.rows.Row;

public class Example {

  public static void main(String[] args) {
    // Instantiate the client
    DataAPIClient client = new DataAPIClient(new DataAPIClientOptions());

    // Connect to a database
    Database database =
        client.getDatabase(
            System.getenv("API_ENDPOINT"),
            new DatabaseOptions(System.getenv("APPLICATION_TOKEN"), new DataAPIClientOptions()));
    // Define the columns and primary key for the table
    TableDefinition tableDefinition =
        new TableDefinition()
            // This column will store vector embeddings.
            // The NVIDIA integration
            // will automatically generate vector embeddings
            // for any text inserted to this column.
            .addColumnVector(
                "VECTOR_COLUMN_NAME",
                new ColumnDefinitionVector()
                    .metric(SimilarityMetric.COSINE)
                    .service(
                        new VectorServiceOptions()
                            .provider("nvidia")
                            .modelName("NV-Embed-QA")
                    )
            )
            // If you want to store the original text
            // in addition to the generated embeddings
            // you must create a separate column.
            .addColumnText("TEXT_COLUMN_NAME")
            // You should change the primary key definition to meet the needs of your data.
            .addPartitionBy("TEXT_COLUMN_NAME");

    // Create the table
    Table<Row> table = database.createTable("TABLE_NAME", tableDefinition);

    // Index the vector column so that you can perform a vector search on it.
    TableVectorIndexDefinition definition =
        new TableVectorIndexDefinition().column("VECTOR_COLUMN_NAME").metric(SimilarityMetric.COSINE);

    table.createVectorIndex("INDEX_NAME", definition);
  }
}

Use the Data API to add the NVIDIA integration to a vector column in a table.

# VECTOR_COLUMN_NAME stores vector embeddings. The NVIDIA integration
# automatically generates embeddings for any text inserted into this column.
# TEXT_COLUMN_NAME is a separate column for the original text. You need it
# only if you want to store the text in addition to the generated embeddings.
# Change the primary key definition to meet the needs of your data.
curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createTable": {
    "name": "TABLE_NAME",
    "definition": {
      "columns": {
        "VECTOR_COLUMN_NAME": {
          "type": "vector",
          "service": {
            "provider": "nvidia",
            "modelName": "NV-Embed-QA"
          }
        },
        "TEXT_COLUMN_NAME": "text"
      },
      "primaryKey": "TEXT_COLUMN_NAME"
    }
  }
}'

Index the vector column so that you can perform a vector search on it.

curl -sS -L -X POST "$API_ENDPOINT/api/json/v1/default_keyspace/TABLE_NAME" \
  --header "Token: $APPLICATION_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "createVectorIndex": {
    "name": "INDEX_NAME",
    "definition": {
      "column": "VECTOR_COLUMN_NAME",
      "options": {
        "metric": "cosine",
        "sourceModel": "NV-Embed-QA"
      }
    }
  }
}'

Add the integration to an existing table

To add the NVIDIA integration to an existing vector column, or to change the embedding provider integration or credential for a column, alter the table. Use the same column parameters as demonstrated in the previous example.
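
As a rough illustration of altering a table, the following sketch builds a request body that attaches the NVIDIA integration to an existing vector column. It assumes that the Data API `alterTable` command accepts an `addVectorize` operation keyed by column name, and that the request is posted to the table's own endpoint. Verify both against the Data API tables reference before relying on them. The helper name `build_add_vectorize_command` is hypothetical.

```python
import json


def build_add_vectorize_command(column_name,
                                provider="nvidia",
                                model_name="NV-Embed-QA"):
    """Build an alterTable request body for an existing vector column.

    Assumption: the body shape (alterTable > operation > addVectorize)
    follows the Data API tables reference. Confirm before use.
    """
    return {
        "alterTable": {
            "operation": {
                "addVectorize": {
                    "columns": {
                        column_name: {
                            "provider": provider,
                            "modelName": model_name,
                        }
                    }
                }
            }
        }
    }


body = build_add_vectorize_command("VECTOR_COLUMN_NAME")
# Assumed target URL: $API_ENDPOINT/api/json/v1/default_keyspace/TABLE_NAME
print(json.dumps(body, indent=2))
```

The provider and model values match the ones used when creating a table, so only the wrapping command differs.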

Automatically generate vector embeddings

After you add the NVIDIA integration to a collection or table, vector embeddings are automatically generated when you insert data.

For collections, vector embeddings are automatically generated when you insert a document with a $vectorize field.

For tables, vector embeddings are automatically generated when you insert a row with a string value for the column that has the NVIDIA integration added.

For more information, see Ways to insert data in Astra DB Serverless.
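
To make the two insert shapes concrete, here is a minimal sketch that prepares documents for a collection and rows for a table from plain strings. The helper names and the `text` field are hypothetical. The column names are the placeholders used in the table examples earlier on this page.

```python
def to_vectorize_documents(texts, text_field="text"):
    """Build collection documents whose $vectorize field holds the text to embed."""
    return [{text_field: t, "$vectorize": t} for t in texts]


def to_vectorize_rows(texts,
                      vector_column="VECTOR_COLUMN_NAME",
                      text_column="TEXT_COLUMN_NAME"):
    """Build table rows that pass a string to the NVIDIA-integrated vector column."""
    return [{text_column: t, vector_column: t} for t in texts]


documents = to_vectorize_documents(["Astra DB supports vector search."])
rows = to_vectorize_rows(["Astra DB supports vector search."])

# With the `collection` and `table` objects created in the earlier examples,
# these lists could then be passed to the clients' insert methods, for example:
#   collection.insert_many(documents)
#   table.insert_many(rows)
print(documents)
print(rows)
```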

Perform a vector search

After loading data, you can perform a vector search by providing a natural-language text string. Vectorize generates an embedding from your text string, and then runs the vector search.
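
A text-based vector search can be sketched as a sort on the vectorize field. The helper below is hypothetical. It assumes `target` is a collection or table object whose `find` method accepts a filter plus `sort` and `limit` keyword arguments, as the Python client's collections and tables do. For a table, pass the name of the vector column instead of `$vectorize`.

```python
def vectorize_search(target, query, *, vector_field="$vectorize", limit=5):
    """Return up to `limit` results ranked by similarity to `query`.

    target: object exposing find(filter, sort=..., limit=...).
    vector_field: "$vectorize" for collections, or the vector column
                  name for tables.
    """
    cursor = target.find({}, sort={vector_field: query}, limit=limit)
    return list(cursor)


# Example calls, assuming `collection` and `table` from the earlier examples:
#   vectorize_search(collection, "How do I reset my password?")
#   vectorize_search(table, "How do I reset my password?",
#                    vector_field="VECTOR_COLUMN_NAME")
```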

Troubleshoot vectorize

When working with vectorize, including the $vectorize reserved field in the Data API, errors can occur from two sources:

- Astra DB: There is an issue within Astra DB, including the Astra platform, the Data API server, Data API clients, or something else.

  Some of the most common Astra DB vectorize errors are related to scoped databases. In your vectorize integration settings, make sure your database is in the scope of the credential that you want to use. Scoped database errors don’t apply to the NVIDIA Astra-hosted embedding provider integration.

  When using the Data API with collections, make sure you don’t use $vector and $vectorize in the same query. For more information, see the Data API reference for collections.

  When using the Data API with tables, you can only run a vector search on one vector column at a time. To generate an embedding from a string, the target vector column must have a defined embedding provider integration. For more information, see the Data API tables references, such as Vector type and Sort clauses for tables.

- The embedding provider: The embedding provider encountered an issue while processing the embedding generation request. Astra DB passes these errors to you through the Astra Portal or Data API with a qualifying statement such as "The embedding provider returned a HTTP client error".

  Possible embedding provider errors include rate limiting, billing or account funding issues, and chunk or token size limits. For more information about these errors, see the embedding provider’s documentation, including the documentation for your chosen model.

Carefully read all error messages to determine the source and possible cause for the issue.
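
One of the collection errors described above, combining $vector and $vectorize in the same query, can be caught on the client side before a request is sent. This is a minimal sketch with a hypothetical helper name. It checks a single document or sort clause represented as a dictionary.

```python
def check_vector_fields(payload):
    """Raise ValueError if `payload` sets both $vector and $vectorize.

    `payload` is a document to insert or a sort clause, as a dict.
    Returns the payload unchanged when it is acceptable.
    """
    if "$vector" in payload and "$vectorize" in payload:
        raise ValueError(
            "Use either $vector or $vectorize in a single query, not both."
        )
    return payload


check_vector_fields({"$vectorize": "a text string to embed"})
```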

NVIDIA token limit

The model for the NVIDIA integration has a token limit of 512. When loading or querying data in collections or tables that use this integration, your $vectorize strings must not exceed this limit.
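
One way to stay under the limit is to split long text into smaller pieces before inserting it. The sketch below splits on whitespace, which only approximates token counts: a word can map to more than one token, and the exact count depends on the model's tokenizer. The default budget of 256 words is an arbitrary, conservative assumption, not a documented value.

```python
def chunk_by_words(text, max_words=256):
    """Split `text` into chunks of at most `max_words` whitespace-separated words.

    Assumption: word count is a rough proxy for token count. Lower
    `max_words` if the provider still reports a token limit error.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


chunks = chunk_by_words("one two three four five", max_words=2)
print(chunks)
```

Each chunk can then be used as a separate $vectorize string.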