Manage collections and tables

Collections store semi-structured data, in the form of documents, in Serverless (Vector) databases.

Tables store structured data, in the form of rows, in Serverless (Vector) and Serverless (Non-Vector) databases.

Collections and tables in Serverless (Vector) databases can store both vector and non-vector data. Consider the needs of your application, and then decide how to divide your data among collections, tables, keyspaces, and databases.

You can’t use the Data Explorer in the Astra Portal to create or manage tables in Serverless (Vector) databases.

Instead, you must use the Data API or the CQL shell.

For Serverless (Non-Vector) databases, you must use the CQL shell.

Collections

To manage collections, you must have the appropriate permissions, such as the Database Administrator role. To programmatically manage collections, you need an application token with sufficient permissions.

Create a collection

When you create a collection, you decide whether the collection can store vector data. A collection that can store vector data is known as a vector-enabled collection. For vector-enabled collections, you also decide how to provide embeddings. You can bring your own embeddings and automatically generate embeddings with vectorize. You can choose these options only when you create the collection. For more information, see Vector and vectorize.

You can create a collection in the Astra Portal or with the Data API.

For multi-region databases, you must use the Data API to create collections or load data into regions other than the primary region. In the Astra Portal, you can create collections and load data into the primary region only, which is the region you selected when you created the database. However, because multi-region databases follow an eventually consistent model, data loaded into any region is eventually replicated to the database’s other regions.

  • Astra Portal

  • Python

  • TypeScript

  • Java

  • curl

  1. In the Astra Portal, go to Databases, and then select your Serverless (Vector) database.

  2. Click Data Explorer.

  3. In the Keyspace field, select the keyspace where you want to create the collection or use default_keyspace.

  4. Click Create Collection.

  5. In the Create collection dialog, enter a name for the collection. Collection names can contain no more than 50 characters, including letters, numbers, and underscores.

  6. To store vector data in this collection, turn on Vector-enabled collection, and then select an Embedding generation method.

    If you turn off Vector-enabled collection, your collection is not vector-enabled. You can’t load vector data into a non-vector collection.

    • Bring my own embeddings

    • Use an Astra-hosted provider

    • Use an external provider

    1. If you want to generate your own embeddings and import them when you load data into your collection, select Bring my own embeddings.

    2. Enter the number of Dimensions for the vectors in your dataset. You can enter custom dimensions or select from common embedding models and dimensions.

    3. Select a Similarity metric that your embedding model will use to compare vectors. The available metrics are Cosine, Dot Product, and Euclidean.

    To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

    The Astra-hosted NVIDIA embedding provider integration is available only for databases in AWS us-east-2 or GCP us-east1.

    For applicable databases, the NVIDIA embedding provider is the default embedding generation method when you create vector-enabled collections:

    1. Select the NVIDIA embedding provider integration.

    2. Select the Embedding model to use to generate embeddings. If only one model is available, it is selected by default.

    3. Enter the number of Dimensions that you want the generated vectors to have. You can only edit this field if the chosen model supports a range of dimensions.

    4. Select a Similarity metric to use to calculate vector similarities. The available metrics are Cosine, Dot Product, and Euclidean.

    To use an external embedding provider, you must add an embedding provider integration to your Astra DB organization, and then you can select that embedding provider when you create a collection.

  7. Click Create collection.
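The naming rule in step 5 can be checked before you submit a name. The following is a minimal sketch, assuming the rule is exactly "1 to 50 characters, ASCII letters, numbers, and underscores only". The function name `is_valid_collection_name` is hypothetical, and the database might enforce additional rules that this sketch doesn't cover.

```python
import re

# Assumed rule, based on the step above: no more than 50 characters,
# consisting only of letters, numbers, and underscores.
# Treating "letters" as ASCII letters is an assumption of this sketch.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,50}")


def is_valid_collection_name(name: str) -> bool:
    """Return True if the name satisfies the assumed naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None


print(is_valid_collection_name("vector_test"))  # Letters and an underscore
print(is_valid_collection_name("bad-name"))     # Contains a hyphen
print(is_valid_collection_name("x" * 51))       # One character too long
```

A check like this only catches obvious mistakes early. The Astra Portal or the Data API remains the authority on whether a name is accepted.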

For the Python client, the Data API syntax depends on your embedding generation method and other configurations.

For more information, examples, and parameters, see Work with collections.

  • Bring my own embeddings

  • Use an Astra-hosted provider

  • Use an external provider

If you choose to bring your own embeddings, you include embeddings (vectors) when you load vector data into your collection.

The following example creates a vector-enabled collection in a Serverless (Vector) database, and it requires you to provide embeddings when you load data:

from astrapy.constants import VectorMetric

# Assumes `database` is an astrapy Database object that you already connected to.
# Create a collection. The default similarity metric is cosine.
# Choose dimensions that match your vector data.
# If you're not sure, use the vector dimension that your embeddings model produces.
collection = database.create_collection(
    "vector_test",
    dimension=5,
    metric=VectorMetric.COSINE,  # Or just 'cosine'.
)
print(f"* Collection: {collection.full_name}\n")
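To make the dimension and similarity metric choices more concrete, the following sketch computes the three metrics named in this guide (cosine, dot product, and Euclidean) for two 5-dimensional vectors, and checks that a vector matches the collection's dimension. These are the textbook definitions in plain Python. They are not necessarily the exact similarity scores that the database returns, and the helper names are hypothetical.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


def euclidean_distance(a, b):
    """Straight-line distance between the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def check_dimension(vector, dimension=5):
    """Raise ValueError if the vector length differs from the collection dimension."""
    if len(vector) != dimension:
        raise ValueError(f"Expected {dimension} values, got {len(vector)}")


# Two hypothetical embeddings that point in the same direction.
a = [1.0, 2.0, 2.0, 0.0, 0.0]
b = [2.0, 4.0, 4.0, 0.0, 0.0]

check_dimension(a)
check_dimension(b)
print("cosine:", cosine_similarity(a, b))
print("dot product:", dot_product(a, b))
print("euclidean:", euclidean_distance(a, b))
```

In this example the vectors have the same direction but different lengths, so the cosine similarity treats them as identical while the dot product and Euclidean distance do not. This is one reason to choose the metric that your embedding model was designed for.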

To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

The Astra-hosted NVIDIA embedding provider integration is available only for databases in AWS us-east-2 or GCP us-east1.

The following example creates a collection integrated with NVIDIA:

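from astrapy.constants import VectorMetric
from astrapy.info import CollectionVectorServiceOptions

# Assumes `database` is an astrapy Database object that you already connected to.
# Replace COLLECTION_NAME with a name for the collection.
# Replace SIMILARITY_METRIC with COSINE, DOT_PRODUCT, or EUCLIDEAN.
collection = database.create_collection(
    "COLLECTION_NAME",
    metric=VectorMetric.SIMILARITY_METRIC,
    service=CollectionVectorServiceOptions(
        provider="nvidia",
        model_name="NV-Embed-QA",
    ),
)
print(f"* Collection: {collection.full_name}\n")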

To use an external embedding provider, you must add an embedding provider integration to your Astra DB organization, and then you can select that embedding provider when you create a collection.

For more information and examples, see Work with collections and the documentation for your embedding provider integration.

For the TypeScript client, the Data API syntax depends on your embedding generation method and other configurations.

For more information, examples, and parameters, see Work with collections.

  • Bring my own embeddings

  • Use an Astra-hosted provider

  • Use an external provider

If you choose to bring your own embeddings, you include embeddings (vectors) when you load vector data into your collection.

The following example creates a vector-enabled collection in a Serverless (Vector) database, and it requires you to provide embeddings when you load data:

import { VectorDoc } from '@datastax/astra-db-ts';

// Assumes `db` is a Db instance that you already connected to with the client.
// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  idea: string,
}

(async function () {
  // Create a collection. The default similarity metric is cosine.
  // Choose dimensions that match your vector data.
  // If you're not sure, use the vector dimension that your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    vector: {
      dimension: 5,
      metric: 'cosine',
    },
  });
  console.log(`* Created collection ${collection.keyspace}.${collection.collectionName}`);
})();

To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

The Astra-hosted NVIDIA embedding provider integration is available only for databases in AWS us-east-2 or GCP us-east1.

The following example creates a collection integrated with NVIDIA:

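// Assumes `db` is a Db instance that you already connected to with the client.
// Replace COLLECTION_NAME with a name for the collection.
(async function () {
  const collection = await db.createCollection('COLLECTION_NAME', {
    vector: {
      service: {
        provider: 'nvidia',
        modelName: 'NV-Embed-QA',
      },
    },
  });
  console.log(`* Created collection ${collection.keyspace}.${collection.collectionName}`);
})();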

To use an external embedding provider, you must add an embedding provider integration to your Astra DB organization, and then you can select that embedding provider when you create a collection.

For more information and examples, see Work with collections and the documentation for your embedding provider integration.

For the Java client, the Data API syntax depends on your embedding generation method and other configurations.

For more information, examples, and parameters, see Work with collections.

  • Bring my own embeddings

  • Use an Astra-hosted provider

  • Use an external provider

If you choose to bring your own embeddings, you include embeddings (vectors) when you load vector data into your collection.

The following example creates a vector-enabled collection in a Serverless (Vector) database, and it requires you to provide embeddings when you load data:

    // Create a collection. The default similarity metric is cosine.
    // Choose dimensions that match your vector data.
    // If you're not sure, use the vector dimension that your embeddings model produces.
    Collection<Document> collection = db
            .createCollection("vector_test", 5, SimilarityMetric.COSINE);
    System.out.println("Created a collection");

To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

The Astra-hosted NVIDIA embedding provider integration is available only for databases in AWS us-east-2 or GCP us-east1.

The following example creates a collection integrated with NVIDIA:

CollectionOptions.CollectionOptionsBuilder builder = CollectionOptions
       .builder()
       .vectorSimilarity(SimilarityMetric.SIMILARITY_METRIC)
       .vectorize("nvidia", "NV-Embed-QA");
Collection<Document> collection = db
       .createCollection("COLLECTION_NAME", builder.build());

To use an external embedding provider, you must add an embedding provider integration to your Astra DB organization, and then you can select that embedding provider when you create a collection.

For more information and examples, see Work with collections and the documentation for your embedding provider integration.

For direct HTTP requests with curl, the Data API syntax depends on your embedding generation method and other configurations.

For more information, examples, and parameters, see Work with collections.

  • Bring my own embeddings

  • Use an Astra-hosted provider

  • Use an external provider

If you choose to bring your own embeddings, you include embeddings (vectors) when you load vector data into your collection.

The following example creates a vector-enabled collection in a Serverless (Vector) database, and it requires you to provide embeddings when you load data:

# Create a collection. The default similarity metric is cosine.
# Choose dimensions that match your vector data.
# If you're not sure, use the vector dimension that your embeddings model produces.
curl -sS -L -X POST "$ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "vector_test",
    "options": {
      "vector": {
        "dimension": 5,
        "metric": "cosine"
      }
    }
  }
}' | jq
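If you want to send the same request from a script instead of curl, the request can be assembled with the Python standard library. The following is a minimal sketch that mirrors the URL path, headers, and JSON body of the preceding curl command. The function name `build_create_collection_request` is hypothetical, the fallback endpoint and token are placeholders, and the sketch only builds the request without sending it.

```python
import json
import os
import urllib.request


def build_create_collection_request(
    endpoint, token, name, dimension, metric="cosine", keyspace="default_keyspace"
):
    """Build (but don't send) a createCollection request like the curl example."""
    payload = {
        "createCollection": {
            "name": name,
            "options": {"vector": {"dimension": dimension, "metric": metric}},
        }
    }
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/api/json/v1/{keyspace}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Read the same environment variables that the curl example uses.
# The fallback values are placeholders so the sketch runs without credentials.
request = build_create_collection_request(
    endpoint=os.environ.get("ASTRA_DB_API_ENDPOINT") or "https://example.invalid",
    token=os.environ.get("ASTRA_DB_APPLICATION_TOKEN") or "PLACEHOLDER_TOKEN",
    name="vector_test",
    dimension=5,
)
print(request.get_method(), request.full_url)
print(json.dumps(json.loads(request.data), indent=2))

# To send the request, you could call urllib.request.urlopen(request).
# That call is intentionally left out of this sketch.
```

Keeping the payload construction in one function makes it easier to review the dimension and metric before anything is sent to the database.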

To automatically generate embeddings with Astra DB vectorize, add an embedding provider integration to your collection. Embedding provider integrations are either Astra-hosted or external.

The Astra-hosted NVIDIA embedding provider integration is available only for databases in AWS us-east-2 or GCP us-east1.

The following example creates a collection integrated with NVIDIA:

curl -sS -L -X POST "$ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "createCollection": {
    "name": "COLLECTION_NAME",
    "options": {
      "vector": {
        "metric": "cosine",
        "service": {
          "provider": "nvidia",
          "modelName": "NV-Embed-QA"
        }
      }
    }
  }
}' | jq

To use an external embedding provider, you must add an embedding provider integration to your Astra DB organization, and then you can select that embedding provider when you create a collection.

For more information and examples, see Work with collections and the documentation for your embedding provider integration.

If you get a Collection Limit Reached or TOO_MANY_INDEXES message, you must delete a collection before you can create a new one.

Serverless (Vector) databases created after June 24, 2024 can have approximately 10 collections. Databases created before this date can have approximately 5 collections. The collection limit is based on the number of indexes.
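A script that creates collections can detect this condition and report it clearly. The following is a rough sketch that inspects a parsed JSON response for the messages named above. The response shape used here (an `errors` list whose items have `errorCode` and `message` fields) is an assumption of this sketch, the sample responses are invented for illustration, and the function name `hit_collection_limit` is hypothetical. Check the responses your database actually returns before relying on it.

```python
def hit_collection_limit(response: dict) -> bool:
    """Return True if the response looks like a collection limit error.

    Assumes errors arrive as a list of dicts under the "errors" key.
    """
    for error in response.get("errors") or []:
        code = str(error.get("errorCode", ""))
        message = str(error.get("message", ""))
        if code == "TOO_MANY_INDEXES" or "collection limit reached" in message.lower():
            return True
    return False


# Hypothetical responses, written by hand for this sketch.
limit_response = {"errors": [{"errorCode": "TOO_MANY_INDEXES", "message": "Too many indexes"}]}
ok_response = {"status": {"ok": 1}}

for response in (limit_response, ok_response):
    if hit_collection_limit(response):
        print("Collection limit reached: delete a collection, and then try again.")
    else:
        print("No collection limit error found.")
```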

After you create a collection, load data into the collection.

Delete a collection

Deleting a collection permanently deletes all data in the collection.

  • Astra Portal

  • Python

  • TypeScript

  • Java

  • curl

  1. In the Astra Portal, go to Databases, and then select your Serverless (Vector) database.

  2. Click Data Explorer.

  3. In the Keyspace field, select the keyspace that contains the collection you want to delete.

  4. In the Collections section, locate the collection you want to delete, click More, and then click Delete collection.

  5. In the Delete collection dialog, enter the collection name, and then click Delete collection.

The collection and all of its data are permanently deleted.

Use the Data API Python client to delete a collection:

# (Optional) Delete the collection.
drop_result = collection.drop()
print(f"\nCleanup: {drop_result}\n")

Use the Data API TypeScript client to delete a collection:

  // (Optional) Delete the collection
  await db.dropCollection('vector_test');
  console.log('* Collection dropped.');

Use the Data API Java client to delete a collection:

    // (Optional) Delete the collection
    collection.drop();
    System.out.println("Deleted the collection");

Use curl to send a Data API request that deletes a collection:

# (Optional) Delete the collection
curl -sS -L -X POST "$ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace" \
--header "Token: $ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "deleteCollection": {
    "name": "vector_test"
  }
}' | jq
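Because deletion is permanent, a script can borrow the safeguard that the Astra Portal uses, which is to require the collection name to be typed again before the request is built. The following is a minimal sketch of that idea. The function name `build_delete_collection_payload` and the confirmation parameter are hypothetical, and the JSON body mirrors the preceding curl example.

```python
import json


def build_delete_collection_payload(name: str, typed_confirmation: str) -> str:
    """Return the deleteCollection JSON body only if the name was confirmed."""
    if typed_confirmation != name:
        raise ValueError(
            f"Confirmation {typed_confirmation!r} doesn't match collection {name!r}"
        )
    return json.dumps({"deleteCollection": {"name": name}})


# The confirmation matches, so the payload is produced.
print(build_delete_collection_payload("vector_test", "vector_test"))

# A mismatched confirmation stops the script before any request is made.
try:
    build_delete_collection_payload("vector_test", "vector_tset")
except ValueError as error:
    print(f"Not deleting: {error}")
```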

Tables

You can create tables in Serverless (Non-Vector) and Serverless (Vector) databases.

To manage tables, you must have the appropriate permissions, such as the Database Administrator role. To manage tables programmatically, you need an application token with sufficient permissions.

Create a table in the Astra Portal

To use the CQL shell in the Astra Portal to create a table, do the following:

  1. In the Astra Portal navigation menu, select your database.

  2. Note the name of the keyspace where you want to create the table.

  3. Click CQL Console, and then wait for the token@cqlsh> prompt to appear.

  4. Select the keyspace that you want to create the table in:

    USE KEYSPACE_NAME;

  5. Create a table:

    CREATE TABLE users (
        firstname text,
        lastname text,
        email text,
        "favorite color" text,
        PRIMARY KEY (firstname, lastname)
    ) WITH CLUSTERING ORDER BY (lastname ASC);
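The example table includes a column name with a space, "favorite color", which must be enclosed in double quotes in CQL. If you generate CQL statements from a script, a small helper can apply the quotes only where they are needed. The following is a sketch under stated assumptions: the helper names are hypothetical, column types are passed through without validation, and reserved CQL keywords are not detected, so quote those names yourself.

```python
import re

# Unquoted CQL identifiers are case-insensitive, so this sketch leaves only
# simple lowercase names unquoted and puts quotes around everything else.
_UNQUOTED = re.compile(r"[a-z][a-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Quote a CQL identifier if it has spaces, uppercase letters, or symbols."""
    if _UNQUOTED.fullmatch(name):
        return name
    # Double quotes inside a quoted identifier are escaped by doubling them.
    return '"' + name.replace('"', '""') + '"'


def create_table_statement(table, columns, partition_key, clustering_key):
    """Build a CREATE TABLE statement shaped like the example above."""
    lines = [f"    {quote_identifier(col)} {cql_type}," for col, cql_type in columns.items()]
    lines.append(
        f"    PRIMARY KEY ({quote_identifier(partition_key)}, {quote_identifier(clustering_key)})"
    )
    body = "\n".join(lines)
    return (
        f"CREATE TABLE {quote_identifier(table)} (\n{body}\n) "
        f"WITH CLUSTERING ORDER BY ({quote_identifier(clustering_key)} ASC);"
    )


statement = create_table_statement(
    "users",
    {"firstname": "text", "lastname": "text", "email": "text", "favorite color": "text"},
    partition_key="firstname",
    clustering_key="lastname",
)
print(statement)
```

You can paste the printed statement into the CQL shell after you select a keyspace.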

After you create a table, load data into the table.

Delete a table in the Astra Portal

Deleting a table permanently deletes all data in the table.

To use the CQL shell in the Astra Portal to delete a table, do the following:

  1. In the Astra Portal, go to Databases, and then select your database.

  2. Note the name of the keyspace that contains the table you want to delete.

  3. Click CQL Console, and then wait for the token@cqlsh> prompt to appear.

  4. Select the keyspace that contains the table you want to delete:

    USE KEYSPACE_NAME;

  5. Get a list of all tables in the keyspace:

    DESC TABLES;

  6. Delete the table and all of its data:

    DROP TABLE TABLE_NAME;

The table and its data are deleted.

Manage tables programmatically

In addition to the built-in CQL shell in the Astra Portal, you can use the standalone CQL shell, a CQL driver, or the Data API to manage tables:

  • For Serverless (Vector) databases, you can use the Data API, the CQL shell, or a driver.

  • For Serverless (Non-Vector) databases, you can use the CQL shell or a driver.

For information about the CQL shell and drivers, see Cassandra Query Language (CQL) for Astra DB.

For information about the Data API and clients, see Work with tables.

© 2024 DataStax | Privacy policy | Terms of use