Manage collections and tables

You use collections in Serverless (Vector) databases, and you use tables in Serverless (Non-Vector) databases. Typically, collections contain structured vector data, and tables contain non-vector data.

However, you can store non-vector data, including CQL table data, in collections in Serverless (Vector) databases. If you use a collection for non-vector data, don’t mix vector data in the same collection. Store vector and non-vector data in separate collections, namespaces, or databases.

Collections

To manage collections, you need an active Serverless (Vector) database, and you or your application token must have the Organization Administrator or Database Administrator role.

If you use CQL to load table data into a Serverless (Vector) database, you must continue to use CQL to manage that data. You can’t use the Data Explorer or the Data API to manage CQL data.

Create a collection

When you create a collection, you decide whether it can store structured vector data. For vector-enabled collections, you also decide how to provide embeddings: either configure the collection to automatically generate embeddings with vectorize, or provide embeddings when you load data (also known as bring your own embeddings). You make both choices when you create the collection.

You can create a collection in the Astra Portal or with the Data API.

For multi-region databases, you must use the Data API to create collections or load data into regions other than the primary region. In the Astra Portal, you can create collections and load data into the primary region only, which is the region you selected when you created the database. However, because multi-region databases follow an eventually consistent model, data loaded into any region is eventually replicated to the database’s other regions.

  • Astra Portal

  • Python

  • TypeScript

  • Java

  • curl

  1. In the Astra Portal, go to Databases, and then select your Serverless (Vector) database.

  2. Click Data Explorer.

  3. In the Namespace field, select the namespace where you want to create the collection, or use the default namespace, which is named default_keyspace.

  4. Click Create Collection.

  5. In the Create collection dialog, enter a name for the collection. Collection names can have no more than 50 characters.

  6. To store vector data in this collection, turn on Vector-enabled collection, and then select an Embedding generation method.

    If you turn off Vector-enabled collection, your collection is not vector-enabled. You can’t load vector data into a non-vector collection.

    • Bring my own embeddings

    • Use an Astra-hosted provider

    • Use an external provider

    1. If you want to generate your own embeddings and import them when you load data into your collection, select Bring my own embeddings.

    2. Enter the number of Dimensions for the vectors in your dataset. You can enter custom dimensions or select from common embedding models and dimensions.

    3. Select a Similarity metric that your embedding model will use to compare vectors. The available metrics are Cosine, Dot Product, and Euclidean.

    To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

    Currently, DataStax offers an Astra-hosted NVIDIA embedding provider integration only for databases in the AWS us-east-2 region.

    For applicable databases, the NVIDIA embedding provider is the default embedding generation method when you create vector-enabled collections:

    1. Select the NVIDIA embedding provider integration.

    2. Select the Embedding model to use to generate embeddings. If only one model is available, it is selected by default.

    3. Enter the number of Dimensions that you want the generated vectors to have. You can only edit this field if the chosen model supports a range of dimensions.

    4. Select a Similarity metric to use to calculate vector similarities. The available metrics are Cosine, Dot Product, and Euclidean.

    To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

    To use an external embedding provider, you must add an embedding provider integration to your Astra DB organization, and then you can select that embedding provider when you create a collection.

  7. Click Create collection.
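
The three similarity metrics offered in the dialog correspond to standard vector formulas. The following is a minimal sketch in plain Python, not Astra DB's implementation: the function names and sample vectors are illustrative assumptions, and Astra DB may scale these values differently when it reports similarity scores.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two illustrative 5-dimensional vectors that point in different directions.
a = [1.0, 0.0, 0.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0, 0.0, 0.0]
print(cosine_similarity(a, b), dot_product(a, b), euclidean_distance(a, b))
```

Cosine ignores vector length, dot product rewards both alignment and length, and Euclidean measures distance rather than similarity, so smaller values mean closer vectors.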

The Data API syntax depends on your embedding generation method and other configurations.

For more information, examples, and parameters, see the Collections reference.

  • Bring my own embeddings

  • Use an Astra-hosted provider

  • Use an external provider

If you choose to bring your own embeddings, you must include embeddings (vector arrays) when you load vector data into your collection.

The following Python example creates a vector-enabled collection that requires you to provide embeddings when you load data:

# Assumes `database` is an existing astrapy Database object.
from astrapy.constants import VectorMetric

# Create a collection. The default similarity metric is cosine.
# Choose dimensions that match your vector data.
# If you're not sure, use the vector dimension that your embedding model produces.
collection = database.create_collection(
    "vector_test",
    dimension=5,
    metric=VectorMetric.COSINE,  # Or just 'cosine'.
    check_exists=False,  # Optional.
)
print(f"* Collection: {collection.full_name}\n")
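
Because a bring-your-own-embeddings collection expects a vector array with each vector document, it can help to check documents before you load them. The following is a minimal sketch, assuming documents are plain dictionaries that carry the embedding in a `$vector` field and that the collection uses `dimension=5` as in the example above. `check_vector` is a hypothetical helper, not part of the Python client, and the sample document is illustrative.

```python
def check_vector(document, dimension):
    """Return the document's embedding if it has the expected length.

    Raises ValueError if the $vector field is missing or has the wrong size.
    """
    vector = document.get("$vector")
    if vector is None:
        raise ValueError("document has no $vector field")
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} dimensions, got {len(vector)}")
    return vector


# Illustrative document with a 5-dimensional embedding.
doc = {"idea": "sample text", "$vector": [0.1, 0.05, 0.08, 0.3, 0.6]}
check_vector(doc, dimension=5)
```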

To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

Currently, DataStax offers an Astra-hosted NVIDIA embedding provider integration only for databases in the AWS us-east-2 region.

The following Python example creates a collection integrated with NVIDIA:

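# Assumes `database` is an existing astrapy Database object.
from astrapy.constants import VectorMetric
from astrapy.info import CollectionVectorServiceOptions

collection = database.create_collection(
    "COLLECTION_NAME",
    metric=VectorMetric.COSINE,
    service=CollectionVectorServiceOptions(
        provider="nvidia",
        model_name="NV-Embed-QA",
    ),
)
print(f"* Collection: {collection.full_name}\n")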

To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

To use an external embedding provider, you must add an embedding provider integration to your Astra DB organization, and then you can select that embedding provider when you create a collection.

For more information and examples, see the Collections reference and the documentation for your embedding provider integration.

The Data API syntax depends on your embedding generation method and other configurations.

For more information, examples, and parameters, see the Collections reference.

  • Bring my own embeddings

  • Use an Astra-hosted provider

  • Use an external provider

If you choose to bring your own embeddings, you must include embeddings (vector arrays) when you load vector data into your collection.

The following TypeScript example creates a vector-enabled collection that requires you to provide embeddings when you load data:

import { VectorDoc } from '@datastax/astra-db-ts';

// Assumes `db` is an existing Db instance from the TypeScript client.

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  idea: string,
}

(async function () {
  // Create a collection. The default similarity metric is cosine.
  // Choose dimensions that match your vector data.
  // If you're not sure, use the vector dimension that your embedding model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    vector: {
      dimension: 5,
      metric: 'cosine',
    },
    checkExists: false, // Optional
  });
  console.log(`* Created collection ${collection.namespace}.${collection.collectionName}`);
})();

To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

Currently, DataStax offers an Astra-hosted NVIDIA embedding provider integration only for databases in the AWS us-east-2 region.

The following TypeScript example creates a collection integrated with NVIDIA:

// Assumes `db` is an existing Db instance from the TypeScript client.
(async function () {
  const collection = await db.createCollection('COLLECTION_NAME', {
    vector: {
      service: {
        provider: 'nvidia',
        modelName: 'NV-Embed-QA',
      },
    },
  });
  console.log(`* Created collection ${collection.namespace}.${collection.collectionName}`);
})();

To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

To use an external embedding provider, you must add an embedding provider integration to your Astra DB organization, and then you can select that embedding provider when you create a collection.

For more information and examples, see the Collections reference and the documentation for your embedding provider integration.

The Data API syntax depends on your embedding generation method and other configurations.

For more information, examples, and parameters, see the Collections reference.

  • Bring my own embeddings

  • Use an Astra-hosted provider

  • Use an external provider

If you choose to bring your own embeddings, you must include embeddings (vector arrays) when you load vector data into your collection.

The following Java example creates a vector-enabled collection that requires you to provide embeddings when you load data:

    // Create a collection. The default similarity metric is cosine.
    // Choose dimensions that match your vector data.
    // If you're not sure, use the vector dimension that your embeddings model produces.
    Collection<Document> collection = db
            .createCollection("vector_test", 5, SimilarityMetric.COSINE);
    System.out.println("Created a collection");

To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

Currently, DataStax offers an Astra-hosted NVIDIA embedding provider integration only for databases in the AWS us-east-2 region.

The following Java example creates a collection integrated with NVIDIA:

CollectionOptions.CollectionOptionsBuilder builder = CollectionOptions
       .builder()
       .vectorSimilarity(SimilarityMetric.COSINE)
       .vectorize("nvidia", "NV-Embed-QA");
Collection<Document> collection = db
       .createCollection("COLLECTION_NAME", builder.build());

To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

To use an external embedding provider, you must add an embedding provider integration to your Astra DB organization, and then you can select that embedding provider when you create a collection.

For more information and examples, see the Collections reference and the documentation for your embedding provider integration.

The Data API syntax depends on your embedding generation method and other configurations.

For more information, examples, and parameters, see the Collections reference.

  • Bring my own embeddings

  • Use an Astra-hosted provider

  • Use an external provider

If you choose to bring your own embeddings, you must include embeddings (vector arrays) when you load vector data into your collection.

The following curl example creates a vector-enabled collection that requires you to provide embeddings when you load data:

# Create a collection. The default similarity metric is cosine.
# Choose dimensions that match your vector data.
# If you're not sure, use the vector dimension that your embeddings model produces.
curl -sS --location -X POST "$ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "vector_test",
    "options": {
      "vector": {
        "dimension": 5,
        "metric": "cosine"
      }
    }
  }
}' | jq
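
The same request can be assembled from Python without a client library. The following is a minimal sketch that uses only the standard library and assumes the endpoint path, `Token` header, and JSON body shown in the curl example above. `build_create_collection_request` is a hypothetical helper, and the endpoint and token in the usage line are placeholders. The sketch only builds the request; pass the result to `urllib.request.urlopen` to send it.

```python
import json
import urllib.request


def build_create_collection_request(api_endpoint, token, name, dimension, metric="cosine"):
    """Build (but do not send) a createCollection request for the Data API."""
    payload = {
        "createCollection": {
            "name": name,
            "options": {"vector": {"dimension": dimension, "metric": metric}},
        }
    }
    return urllib.request.Request(
        url=f"{api_endpoint}/api/json/v1/default_keyspace",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and token, for illustration only.
request = build_create_collection_request(
    "https://example.invalid", "PLACEHOLDER_TOKEN", "vector_test", 5
)
```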

To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

Currently, DataStax offers an Astra-hosted NVIDIA embedding provider integration only for databases in the AWS us-east-2 region.

The following curl example creates a collection integrated with NVIDIA:

curl -sS --location -X POST "$ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "vector": {
        "metric": "cosine",
        "service": {
          "provider": "nvidia",
          "modelName": "NV-Embed-QA"
        }
      }
    }
  }
}' | jq

To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

To use an external embedding provider, you must add an embedding provider integration to your Astra DB organization, and then you can select that embedding provider when you create a collection.

For more information and examples, see the Collections reference and the documentation for your embedding provider integration.

If you get a Collection Limit Reached or TOO_MANY_INDEXES message, you must delete a collection before you can create a new one.

Serverless (Vector) databases created after June 24, 2024 can have up to 10 collections. Databases created before this date can have up to 5 collections. The collection limit is based on Storage Attached Indexing (SAI).
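
The date-based limit can be expressed as a small lookup. The following is a minimal sketch, assuming you know your database's creation date. `collection_limit` and `can_create_collection` are hypothetical helpers, and treating a database created exactly on June 24, 2024 as subject to the lower limit is an assumption, because the text above doesn't specify that boundary.

```python
from datetime import date

# Cutoff date stated in the documentation above.
CUTOFF = date(2024, 6, 24)


def collection_limit(created_on):
    """Return the maximum number of collections for a database created on the given date."""
    # Assumption: a database created on the cutoff date itself gets the lower limit.
    return 10 if created_on > CUTOFF else 5


def can_create_collection(created_on, existing_collections):
    """Return True if one more collection fits under the limit."""
    return existing_collections < collection_limit(created_on)
```

For example, `can_create_collection(date(2024, 1, 15), 5)` returns `False`, which corresponds to the case where you must delete a collection before you can create a new one.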

After you create a collection, load data into the collection.

Delete a collection

Deleting a collection permanently deletes all data in the collection.

  • Astra Portal

  • Python

  • TypeScript

  • Java

  • curl

  1. In the Astra Portal, go to Databases, and then select your Serverless (Vector) database.

  2. Click Data Explorer.

  3. In the Namespace field, select the namespace that contains the collection you want to delete.

  4. In the Collections section, locate the collection you want to delete, click More, and then click Delete collection.

  5. In the Delete collection dialog, enter the collection name, and then click Delete collection.

The collection and all of its data are permanently deleted.

Use the Python client to delete a collection:

# (Optional) Delete the collection.
drop_result = collection.drop()
print(f"\nCleanup: {drop_result}\n")

Use the TypeScript client to delete a collection:

  // (Optional) Delete the collection
  await db.dropCollection('vector_test');
  console.log('* Collection dropped.');

Use the Java client to delete a collection:

    // (Optional) Delete the collection
    collection.drop();
    System.out.println("Deleted the collection");

Use the Data API to delete a collection:

# (Optional) Delete the collection
curl -sS --location -X POST "$ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "deleteCollection": {
    "name": "vector_test"
  }
}' | jq
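
Because deletion is permanent, a script that sends `deleteCollection` may benefit from the same type-the-name confirmation that the Astra Portal dialog uses. The following is a minimal sketch, assuming the request body shown in the curl example above. `build_delete_payload` and its `confirmation` argument are hypothetical and are not part of the Data API.

```python
import json


def build_delete_payload(name, confirmation):
    """Return the deleteCollection JSON body only if the caller retyped the name."""
    if confirmation != name:
        raise ValueError("confirmation does not match the collection name")
    return json.dumps({"deleteCollection": {"name": name}})


# The confirmation matches, so the payload is produced.
payload = build_delete_payload("vector_test", confirmation="vector_test")
```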

Tables

To manage tables, you need an active Serverless (Non-Vector) database, and you or your application token must have the Organization Administrator or Database Administrator role.

You can store non-vector data, including CQL table data, in collections in Serverless (Vector) databases.

Create a table

You can use the CQL shell to create a table through the Astra Portal, a CQL driver, or the standalone CQLSH client.

For information about the CQL shell and instructions for using the CQL drivers or CQLSH client to create tables, see Cassandra Query Language (CQL) quickstart.

To use the CQL shell in the Astra Portal to create an empty table, do the following:

  1. In the Astra Portal, go to Databases, and then select your database.

  2. On the Overview tab, in the Keyspaces section, note the name of the keyspace where you want to create the table.

    If you use CQL with a Serverless (Vector) database, you use a namespace name for the keyspace name.

  3. Click CQL Console, and then wait for the token@cqlsh> prompt to appear.

  4. Select the keyspace that you want to create the table in:

    use KEYSPACE_NAME;

  5. Create the table:

    CREATE TABLE users (
        firstname text,
        lastname text,
        email text,
        "favorite color" text,
        PRIMARY KEY (firstname, lastname)
    ) WITH CLUSTERING ORDER BY (lastname ASC);
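
In the statement above, the column name "favorite color" is double-quoted because it contains a space. If you generate CQL from code, a small helper can apply that quoting consistently. The following is a minimal sketch: `quote_identifier` is a hypothetical helper, it quotes any name that isn't a plain lowercase identifier so that case is preserved, and it doesn't check for CQL reserved keywords, which also require quoting.

```python
import re

# Names that CQL accepts without quotes and that keep their spelling:
# a lowercase letter followed by lowercase letters, digits, or underscores.
_UNQUOTED = re.compile(r"[a-z][a-z0-9_]*")


def quote_identifier(name):
    """Return a CQL identifier, double-quoted when it needs quoting."""
    if _UNQUOTED.fullmatch(name):
        return name
    # Inside a quoted identifier, a double quote is escaped by doubling it.
    return '"' + name.replace('"', '""') + '"'


columns = ["firstname", "lastname", "email", "favorite color"]
print(", ".join(quote_identifier(column) for column in columns))
```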

After you create a table, load data into the table.

Delete a table

You can use the CQL shell to delete a table through the Astra Portal, a CQL driver, or the standalone CQLSH client.

For information about the CQL shell and instructions for using the CQL drivers or CQLSH client to delete tables, see Cassandra Query Language (CQL) quickstart.

Deleting a table permanently deletes all data in the table.

To use the CQL shell in the Astra Portal to delete a table, do the following:

  1. In the Astra Portal, go to Databases, and then select your database.

  2. On the Overview tab, in the Keyspaces section, note the name of the keyspace that contains the table you want to delete.

    If you use CQL with a Serverless (Vector) database, you use a namespace name for the keyspace name.

  3. Click CQL Console, and then wait for the token@cqlsh> prompt to appear.

  4. Select the keyspace that contains the table you want to delete:

    use KEYSPACE_NAME;

  5. Get a list of all tables in the keyspace:

    desc tables;

  6. Delete the table and all of its data:

    drop table TABLE_NAME;

The table and its data are deleted.


© 2024 DataStax | Privacy policy | Terms of use
