Manage collections and tables

A collection is a set of structured vector data. A table is a similar structure for non-vector data. You can use collections with a Serverless (Vector) database and tables with a Serverless (Non-Vector) database.

Create a collection

Before you can load vector data, you must have an existing collection.

You can't use the Astra Portal to create a collection or load data in a specific region. The Astra Portal always uses the initial region that you selected when you created the database.

Here’s how to create an empty collection:

  • Astra Portal

  • Python

  • TypeScript

  • Java

Use the Astra Portal to create a collection.

  1. In the Astra Portal, go to Databases, and then select your Serverless (Vector) database.

  2. Click Data Explorer.

    To create a sample collection with a pre-loaded vector dataset, see Load a sample vector dataset.

  3. Optional: Use the Namespace dropdown to select the namespace where you want to create the collection. Otherwise, leave default_keyspace selected to create the collection in the default namespace.

  4. Click Create Collection.

  5. In the Create collection dialog, enter a name for the new collection in the Collection name field.

  6. Optional: Turn on Vector-enabled collection.

    Next, select an Embedding generation method:

    • Bring my own

    • Use an Astra-hosted provider

    • Use an external provider

    1. Select the Bring my own method if you plan to generate your own embeddings and import them when you add data to your collection.

    2. Enter the number of Dimensions of the vectors in your dataset. Clicking this field reveals a list of common embedding models and their dimensions. You can also enter a custom dimension.

    3. Select a Similarity metric to use to compare the vectors that your embedding model produces.

      The available metrics are cosine, dot product, and Euclidean.

    You can add an Astra-hosted embedding provider integration to your collection to automatically generate embeddings with Astra DB vectorize.

    Currently, DataStax offers an Astra-hosted NVIDIA embedding provider integration only for databases in the AWS us-east-2 region.

    If available for your database, the NVIDIA embedding provider and its related settings are selected by default when creating a new collection.

    1. Select the NVIDIA embedding provider integration.

    2. Complete the following fields:

      • Embedding model: The model that you want to use to generate embeddings. If only one model is available, it is selected by default.

      • Dimensions: The number of dimensions that you want the generated vectors to have. This field is only configurable if your chosen model supports a range of dimensions.

      • Similarity metric: The method you want to use to calculate vector similarities.

        The available metrics are cosine, dot product, and Euclidean.

    You can add an external embedding provider integration to your collection to automatically generate embeddings with Astra DB vectorize. To do this, you must add an external embedding provider integration to your Astra DB organization, and then you can select the external embedding provider when you create a collection.

    If you turn off Vector-enabled collection, the resulting collection is not vector-enabled. You cannot add vector data to a non-vector collection.

  7. Click Create collection.

    If you get a Collection Limit Reached message, you’ll need to delete a collection before you can create a new one.

An empty collection appears in the list of collections. You can now load data into this collection.
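The Similarity metric setting determines how vectors are compared during a vector search. The following minimal sketch, in plain Python, computes the three common measures (cosine similarity, dot product, and Euclidean distance) for two made-up 5-dimensional vectors. It only illustrates the math; it is not how Astra DB computes or scales its similarity scores internally.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products. Larger values mean more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by both vector lengths, so magnitude is ignored.
    Ranges from -1 (opposite direction) to 1 (same direction)."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance between the points. Smaller values mean more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up vectors: doc points the same way as query but is twice as long.
query = [0.1, 0.2, 0.3, 0.4, 0.5]
doc = [0.2, 0.4, 0.6, 0.8, 1.0]

print(round(cosine_similarity(query, doc), 6))   # 1.0
print(round(dot_product(query, doc), 6))         # 1.1
print(round(euclidean_distance(query, doc), 6))  # 0.74162
```

Cosine similarity treats these two vectors as a perfect match because it ignores length, while Euclidean distance still separates them. If your vectors are normalized to unit length, cosine similarity and dot product produce the same ranking.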

Use the Python client to create a collection. The syntax depends on whether you’re bringing your own embeddings or adding an embedding provider to enable Astra DB vectorize.

To bring your own embeddings, specify the vector dimension and similarity metric when you create the collection:

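from astrapy import DataAPIClient
from astrapy.constants import VectorMetric

# Replace APPLICATION_TOKEN and API_ENDPOINT with your database's values.
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

# Create a collection. The default similarity metric is cosine.
# Choose dimensions that match your vector data.
# If you're not sure, use the vector dimension that your embeddings model produces.
collection = database.create_collection(
    "vector_test",
    dimension=5,
    metric=VectorMetric.COSINE,  # Or just 'cosine'.
    check_exists=False,  # Optional
)
print(f"* Collection: {collection.full_name}\n")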

check_exists is optional.
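The dimension that you set when you create a collection must match the length of every vector that you load into it. A minimal client-side check can catch a mismatch before you send any data. The sketch below uses a hypothetical helper, `check_dimensions`, which is not part of astrapy; it reads the `$vector` field of each document, and the sample documents are made up.

```python
def check_dimensions(documents, dimension):
    """Raise ValueError if any document's $vector length differs from the
    collection's dimension. Hypothetical helper, not part of astrapy."""
    for index, doc in enumerate(documents):
        vector = doc.get("$vector")
        if vector is None:
            continue  # This check ignores documents without a vector
        if len(vector) != dimension:
            raise ValueError(
                f"document {index} has {len(vector)} dimensions; "
                f"the collection expects {dimension}"
            )

# Sample documents for a collection created with dimension=5
docs = [
    {"text": "first", "$vector": [0.1, 0.2, 0.3, 0.4, 0.5]},
    {"text": "second", "$vector": [0.5, 0.4, 0.3, 0.2, 0.1]},
]
check_dimensions(docs, 5)  # Passes silently

try:
    check_dimensions([{"text": "bad", "$vector": [0.1, 0.2, 0.3]}], 5)
except ValueError as err:
    print(err)  # document 0 has 3 dimensions; the collection expects 5
```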

You can add an Astra-hosted embedding provider integration to your collection to automatically generate embeddings with Astra DB vectorize.

Currently, DataStax offers an Astra-hosted NVIDIA embedding provider integration only for databases in the AWS us-east-2 region.

Create a collection integrated with NVIDIA:

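from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.info import CollectionVectorServiceOptions

# Replace APPLICATION_TOKEN and API_ENDPOINT with your database's values.
client = DataAPIClient("APPLICATION_TOKEN")
database = client.get_database("API_ENDPOINT")

# Create a collection that generates embeddings with the NVIDIA provider.
collection = database.create_collection(
    "COLLECTION_NAME",
    metric=VectorMetric.COSINE,
    service=CollectionVectorServiceOptions(
        provider="nvidia",
        model_name="NV-Embed-QA",
    ),
)
print(f"* Collection: {collection.full_name}\n")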
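The clients build a JSON `createCollection` command for the Data API on your behalf. As a rough sketch, assuming the Data API's JSON command format, the NVIDIA example above corresponds to a payload with a `service` object and no `dimension`, while the bring-your-own example sets `dimension` and omits `service`. The helper `create_collection_command` is hypothetical; normally you let the client build this command.

```python
import json

def create_collection_command(name, metric="cosine", dimension=None, service=None):
    """Build a createCollection command as a Python dict.
    Hypothetical helper; field names assume the Data API JSON format."""
    vector = {"metric": metric}
    if dimension is not None:
        vector["dimension"] = dimension  # Bring your own embeddings
    if service is not None:
        vector["service"] = service  # Astra DB vectorize provider settings
    return {"createCollection": {"name": name, "options": {"vector": vector}}}

# Vectorize with the Astra-hosted NVIDIA provider
nvidia = create_collection_command(
    "COLLECTION_NAME",
    service={"provider": "nvidia", "modelName": "NV-Embed-QA"},
)
# Bring your own 5-dimensional embeddings
byo = create_collection_command("vector_test", dimension=5)

print(json.dumps(nvidia, indent=2))
```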

You can add an external embedding provider integration to your collection to automatically generate embeddings with Astra DB vectorize. To do this, you must add an external embedding provider integration to your Astra DB organization, and then you can select the external embedding provider when you create a collection.

Use the TypeScript client to create a collection. The syntax depends on whether you’re bringing your own embeddings or adding an embedding provider to enable Astra DB vectorize.

To bring your own embeddings, specify the vector dimension and similarity metric when you create the collection:

import { DataAPIClient, VectorDoc } from '@datastax/astra-db-ts';

// Replace APPLICATION_TOKEN and API_ENDPOINT with your database's values.
const client = new DataAPIClient('APPLICATION_TOKEN');
const db = client.db('API_ENDPOINT');

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  idea: string,
}

(async function () {
  // Create a collection. The default similarity metric is cosine.
  // Choose dimensions that match your vector data.
  // If you're not sure, use the vector dimension that your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    vector: {
      dimension: 5,
      metric: 'cosine',
    },
    checkExists: false, // Optional
  });
  console.log(`* Created collection ${collection.namespace}.${collection.collectionName}`);
})();

checkExists is optional.

You can add an Astra-hosted embedding provider integration to your collection to automatically generate embeddings with Astra DB vectorize.

Currently, DataStax offers an Astra-hosted NVIDIA embedding provider integration only for databases in the AWS us-east-2 region.

Create a collection integrated with NVIDIA:

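import { DataAPIClient } from '@datastax/astra-db-ts';

// Replace APPLICATION_TOKEN and API_ENDPOINT with your database's values.
const client = new DataAPIClient('APPLICATION_TOKEN');
const db = client.db('API_ENDPOINT');

(async function () {
  const collection = await db.createCollection('COLLECTION_NAME', {
    vector: {
      service: {
        provider: 'nvidia',
        modelName: 'NV-Embed-QA',
      },
    },
  });
  console.log(`* Created collection ${collection.namespace}.${collection.collectionName}`);
})();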

You can add an external embedding provider integration to your collection to automatically generate embeddings with Astra DB vectorize. To do this, you must add an external embedding provider integration to your Astra DB organization, and then you can select the external embedding provider when you create a collection.

Use the Java client to create a collection. The syntax depends on whether you’re bringing your own embeddings or adding an embedding provider to enable Astra DB vectorize.

To bring your own embeddings, specify the vector dimension and similarity metric when you create the collection:

    // Create a collection. The default similarity metric is cosine.
    // Choose dimensions that match your vector data.
    // If you're not sure, use the vector dimension that your embeddings model produces.
    Collection<Document> collection = db
            .createCollection("vector_test", 5, SimilarityMetric.COSINE);
    System.out.println("Created a collection");

You can add an Astra-hosted embedding provider integration to your collection to automatically generate embeddings with Astra DB vectorize.

Currently, DataStax offers an Astra-hosted NVIDIA embedding provider integration only for databases in the AWS us-east-2 region.

Create a collection integrated with NVIDIA:

// Create a collection that generates embeddings with the NVIDIA provider
CollectionOptions.CollectionOptionsBuilder builder = CollectionOptions
        .builder()
        .vectorSimilarity(SimilarityMetric.COSINE)
        .defaultIdType(CollectionIdTypes.UUID)
        .vectorize("nvidia", "NV-Embed-QA");
Collection<Document> collection = db
        .createCollection("COLLECTION_NAME", builder.build());

You can add an external embedding provider integration to your collection to automatically generate embeddings with Astra DB vectorize. To do this, you must add an external embedding provider integration to your Astra DB organization, and then you can select the external embedding provider when you create a collection.

Delete a collection

You can delete a collection that you’re not using. All of the data in the collection is permanently deleted.

  • Astra Portal

  • Python

  • TypeScript

  • Java

Use the Astra Portal to delete a collection.

  1. In the Astra Portal, go to Databases, and then select your Serverless (Vector) database.

  2. Click Data Explorer.

  3. Use the Namespace dropdown to select the namespace that contains the collection you want to delete.

  4. In the Collections section, click More next to the collection you want to delete, and then select Delete collection.

  5. In the Delete collection dialog, enter the name of the collection to confirm that you want to delete it.

  6. Click Delete collection.

The collection and all of its data are deleted permanently.

Use the Python client to delete a collection.

# Delete the collection and all of its data.
# `collection` is the object returned by database.create_collection().
drop_result = collection.drop()
print(f"\nCleanup: {drop_result}\n")
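The Astra Portal asks you to type the collection name before it deletes a collection. If you delete collections from a script, you might add a similar safeguard. The sketch below uses a hypothetical `confirm_and_drop` helper and a stand-in `FakeCollection` class that only records whether `drop()` was called; with astrapy, you would pass your real `collection` object instead.

```python
def confirm_and_drop(collection, collection_name, typed_name):
    """Drop the collection only if typed_name exactly matches collection_name.
    Hypothetical helper that mirrors the Portal's confirmation step."""
    if typed_name != collection_name:
        raise ValueError(
            f"confirmation {typed_name!r} does not match {collection_name!r}; "
            "nothing was deleted"
        )
    return collection.drop()

class FakeCollection:
    """Stand-in for a real collection object, used only for this example."""
    def __init__(self):
        self.dropped = False

    def drop(self):
        self.dropped = True
        return {"ok": 1}

target = FakeCollection()
try:
    confirm_and_drop(target, "vector_test", "vector-test")  # Typo, so refused
except ValueError as err:
    print(err)
print(confirm_and_drop(target, "vector_test", "vector_test"))  # {'ok': 1}
```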

Use the TypeScript client to delete a collection.

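import { DataAPIClient } from '@datastax/astra-db-ts';

const client = new DataAPIClient('APPLICATION_TOKEN');
const db = client.db('API_ENDPOINT');

(async function () {
  // Delete the collection and all of its data
  await db.dropCollection('vector_test');
  console.log('* Collection dropped.');
  // Close the client
  await client.close();
})();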

Use the Java client to delete a collection.

    // Delete the collection and all of its data
    collection.drop();
    System.out.println("Deleted the collection");

Create a table

You can create a table with CQL by using the CQL shell in the Astra Portal, a CQL driver, or the standalone CQLSH client.

For information about the CQL shell and instructions for using the CQL drivers or CQLSH client to create tables, see Cassandra Query Language (CQL) quickstart.

To use the CQL shell in the Astra Portal to create an empty table, do the following:

  1. In the Astra Portal, go to Databases, and then select your Serverless (Non-Vector) database.

  2. In the Overview tab, note the list of available keyspaces in the Keyspaces section. You will create your table in one of these keyspaces.

  3. Click CQL Console. Wait a few seconds for the token@cqlsh> prompt to appear.

  4. Select the keyspace you want to create your table in.

    use KEYSPACE_NAME;

  5. Create your table.

    CREATE TABLE users (
        firstname text,
        lastname text,
        email text,
        "favorite color" text,
        PRIMARY KEY (firstname, lastname)
    ) WITH CLUSTERING ORDER BY (lastname ASC);

You can now load data into this table.
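In the `users` table, `PRIMARY KEY (firstname, lastname)` makes `firstname` the partition key and `lastname` a clustering column, and `CLUSTERING ORDER BY (lastname ASC)` keeps the rows in each partition sorted by last name. The following plain-Python sketch models that layout with dictionaries. It only illustrates the data model, not how Cassandra stores data on disk, and the `layout` function and sample rows are made up for this example.

```python
from collections import defaultdict

def layout(rows):
    """Group rows by partition key (firstname), then sort each partition by
    clustering column (lastname, ascending), mirroring the users table."""
    partitions = defaultdict(dict)
    for row in rows:
        # Each (firstname, lastname) pair identifies one row; in this simplified
        # model, a later row with the same pair replaces the earlier one.
        partitions[row["firstname"]][row["lastname"]] = row
    return {
        first: [by_last[last] for last in sorted(by_last)]
        for first, by_last in partitions.items()
    }

# Made-up sample rows. "favorite color" matches the quoted column name.
rows = [
    {"firstname": "Ana", "lastname": "Silva", "email": "ana.s@example.com", "favorite color": "blue"},
    {"firstname": "Ana", "lastname": "Costa", "email": "ana.c@example.com", "favorite color": "green"},
    {"firstname": "Lee", "lastname": "Park", "email": "lee@example.com", "favorite color": "red"},
]

for first, partition in layout(rows).items():
    print(first, [row["lastname"] for row in partition])
# Ana ['Costa', 'Silva']
# Lee ['Park']
```

Because `firstname` is the partition key, a query that filters on `firstname` reads a single partition, and its rows come back ordered by `lastname`.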

Delete a table

You can delete a table with CQL by using the CQL shell in the Astra Portal, a CQL driver, or the standalone CQLSH client.

For information about the CQL shell and instructions for using the CQL drivers or CQLSH client to delete tables, see Cassandra Query Language (CQL) quickstart.

Deleting a table permanently deletes all data in the table.

To use the CQL shell in the Astra Portal to delete a table, do the following:

  1. In the Astra Portal, go to Databases, and then select your Serverless (Non-Vector) database.

  2. In the Overview tab, note the list of available keyspaces in the Keyspaces section. You will delete a table from one of these keyspaces.

  3. Click CQL Console. Wait a few seconds for the token@cqlsh> prompt to appear.

  4. Select the keyspace containing the table you want to delete.

    use KEYSPACE_NAME;

  5. Get a list of all tables in this keyspace.

    desc tables;

  6. Delete the table and all of its data.

    drop table users;

The table and its data are deleted.
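If you drop tables from a script through a CQL driver, build the statement carefully, because running it permanently deletes the table and all of its data. The sketch below uses hypothetical helpers (not part of any DataStax driver) that produce a `DROP TABLE IF EXISTS` statement. They wrap the keyspace and table names in double quotes and double any embedded double quotes, following CQL's rules for quoted identifiers. Quoted identifiers are case-sensitive, so pass names exactly as they are stored; unquoted names such as `users` are stored in lowercase. The keyspace name `my_keyspace` is made up.

```python
def quote_identifier(name):
    """Wrap a CQL identifier in double quotes, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'

def drop_table_statement(keyspace, table):
    """Build a DROP TABLE statement that does nothing if the table is absent."""
    return f"DROP TABLE IF EXISTS {quote_identifier(keyspace)}.{quote_identifier(table)};"

print(drop_table_statement("my_keyspace", "users"))
# DROP TABLE IF EXISTS "my_keyspace"."users";
```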


© 2024 DataStax | Privacy policy | Terms of use
