Quickstart

Difficulty: Beginner
Time: 15 min

Learn how to create a Serverless (Vector) database, connect to your database, load a few documents with vector embeddings, and perform a similarity search to find documents that are close to the query vector.

Create a Serverless (Vector) database

  1. Create an Astra account or sign in to an existing Astra account.

  2. In the Astra Portal, select Databases in the main navigation.

  3. Click Create Database.

  4. In the Create Database dialog, select the Serverless (Vector) deployment type.

  5. In Configuration, enter a meaningful, human-readable Database name that meets the following requirements:

    • Starts and ends with a letter or number.

    • Contains only letters, numbers, and the following special characters: & + - _ ( ) < > . , @.

      After you create a database, you can’t change its name.

  6. Select your preferred Provider and Region.

    You can select from a limited number of regions if you’re on the Free plan. Regions with a lock icon require that you upgrade to a Pay As You Go plan.

  7. Click Create Database.

    New databases start in Pending status, and then move to Initializing. Your database is ready to use when it reaches Active status.

  8. Make sure the database is in Active status, and then, in the Database Details section, click Generate Token.

  9. In the Application Token dialog, click Copy, and then store the token securely. The token format is AstraCS: followed by a unique token string.

    Application tokens created from Database Details have the Database Administrator role for the associated database.

  10. In Database Details, copy your database’s API endpoint. The endpoint format is https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com.

  11. In your terminal, assign your token and API endpoint to environment variables.

    • Linux or macOS

      export ASTRA_DB_API_ENDPOINT=API_ENDPOINT # Your database API endpoint
      export ASTRA_DB_APPLICATION_TOKEN=TOKEN # Your database application token

    • Windows

      rem Replace API_ENDPOINT and TOKEN with your database API endpoint and application token.
      rem Don't append # comments to these lines: Command Prompt stores them as part of the value.
      set ASTRA_DB_API_ENDPOINT=API_ENDPOINT
      set ASTRA_DB_APPLICATION_TOKEN=TOKEN

    • Google Colab

      import os
      os.environ["ASTRA_DB_API_ENDPOINT"] = "API_ENDPOINT" # Your database API endpoint
      os.environ["ASTRA_DB_APPLICATION_TOKEN"] = "TOKEN" # Your database application token
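
The naming rules in step 5 and the token and endpoint formats in steps 9 and 10 can be checked locally before you run any client code. The following is a minimal sketch, not part of the Astra tooling: the function names are hypothetical, "letters" is assumed to mean ASCII letters, and the endpoint check is deliberately loose (it doesn't validate the database ID or region individually).

```python
import os
import re

# Assumption: "letters" means ASCII letters; the naming rules don't say otherwise.
# A name must start and end with a letter or number, and may contain
# only letters, numbers, and & + - _ ( ) < > . , @ in between.
_NAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9&+\-_()<>.,@]*[A-Za-z0-9])?")

# Loose check only: https://<id>-<region>.apps.astra.datastax.com
_ENDPOINT_RE = re.compile(r"https://[A-Za-z0-9-]+\.apps\.astra\.datastax\.com/?")


def is_valid_database_name(name):
    """Return True if the name follows the rules listed in step 5."""
    return _NAME_RE.fullmatch(name) is not None


def looks_like_token(token):
    """Return True if the value starts with AstraCS: and has a body after it."""
    prefix = "AstraCS:"
    return token.startswith(prefix) and len(token) > len(prefix)


def looks_like_endpoint(endpoint):
    """Return True if the value has the general shape of an API endpoint."""
    return _ENDPOINT_RE.fullmatch(endpoint) is not None


def check_environment(env=os.environ):
    """Return a list of human-readable problems; an empty list means no problems found."""
    problems = []
    if not looks_like_token(env.get("ASTRA_DB_APPLICATION_TOKEN", "")):
        problems.append("ASTRA_DB_APPLICATION_TOKEN is missing or doesn't start with AstraCS:")
    if not looks_like_endpoint(env.get("ASTRA_DB_API_ENDPOINT", "")):
        problems.append("ASTRA_DB_API_ENDPOINT is missing or doesn't look like an Astra API endpoint")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Warning:", problem)
```

These checks only catch copy-and-paste mistakes such as a swapped token and endpoint. They can't confirm that the token is valid or that the database is in Active status.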

Install a client

You can interact with Astra DB Serverless in the Astra Portal or programmatically. This tutorial uses the DataStax Python, TypeScript, and Java clients. For more information, see Connection methods comparison and Get started with the Data API.

Use a package manager to install the client library for your preferred language.

  • Python

  • TypeScript

  • Java

Install the Python client with pip:

  1. Verify that pip is version 23.0 or higher:

    pip --version

  2. If needed, upgrade pip:

    python -m pip install --upgrade pip

  3. Install the astrapy package. You must have Python 3.8 or higher.

    pip install astrapy
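
To confirm the Python requirements above from inside the interpreter that will run the script, a short check like the following can help. It is a sketch that uses only the standard library; the helper names are hypothetical.

```python
import sys
from importlib import metadata

# Minimum Python version stated in the install steps.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package):
    """Return the installed version string of a package, or None if it isn't installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("astrapy version:", installed_version("astrapy") or "not installed")
```

Run it with the same python executable that you used for pip, because a package installed for one interpreter isn't visible to another.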

Install the TypeScript client:

  1. Verify that Node is version 18 or higher.

    node --version

  2. Install the TypeScript client with your preferred package manager:

    • npm

      npm install @datastax/astra-db-ts

    • Yarn

      yarn add @datastax/astra-db-ts

    • pnpm

      pnpm add @datastax/astra-db-ts

Install the Java client with Maven or Gradle.

  • Maven

  • Gradle

  1. Install Java 11+ and Maven 3.9+.

  2. Create a pom.xml file in the root of your project, and then replace VERSION with the latest version of the Java client, which is listed on Maven Central.

    pom.xml
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                                 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
    
      <groupId>com.example</groupId>
      <artifactId>test-java-client</artifactId>
      <version>1.0-SNAPSHOT</version>
    
      <!-- The Java client -->
      <dependencies>
        <dependency>
          <groupId>com.datastax.astra</groupId>
          <artifactId>astra-db-java</artifactId>
          <version>VERSION</version>
        </dependency>
      </dependencies>
    
      <build>
        <plugins>
          <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>3.0.0</version>
            <configuration>
              <executable>java</executable>
              <mainClass>com.example.Quickstart</mainClass>
            </configuration>
            <executions>
              <execution>
                <goals>
                  <goal>java</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
              <source>11</source>
              <target>11</target>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </project>

To use Gradle instead:

  1. Install Java 11+ and Gradle.

  2. Create a build.gradle file in the root of your project:

    build.gradle
    plugins {
        id 'java'
        id 'application'
    }
    
    repositories {
        mavenCentral()
    }
    
    dependencies {
        implementation 'com.datastax.astra:astra-db-java:1.+'
    }
    
    application {
        mainClass = 'com.example.Quickstart'
    }

Create a script

Copy the following quickstart script to a Python, TypeScript, or Java file.

  • Python

  • TypeScript

  • Java

To avoid a namespace collision, don’t name your Python client script files astrapy.py.

quickstart.py
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric

# Initialize the client and get a "Database" object
client = DataAPIClient(os.environ["ASTRA_DB_APPLICATION_TOKEN"])
database = client.get_database(os.environ["ASTRA_DB_API_ENDPOINT"])
print(f"* Database: {database.info().name}\n")

# Create a collection. The default similarity metric is cosine.
# Choose dimensions that match your vector data.
# If you're not sure, use the vector dimension that your embeddings model produces.
collection = database.create_collection(
    "vector_test",
    dimension=5,
    metric=VectorMetric.COSINE,  # Or just 'cosine'.
    check_exists=False, # Optional.
)
print(f"* Collection: {collection.full_name}\n")

# Insert documents with embeddings into the collection.
documents = [
    {
        "text": "Chat bot integrated sneakers that talk to you",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
        "text": "An AI quilt to help you sleep forever",
        "$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
        "text": "A deep learning display that controls your mood",
        "$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
    },
]
insertion_result = collection.insert_many(documents)
print(f"* Inserted {len(insertion_result.inserted_ids)} items.\n")

# Perform a similarity search.
query_vector = [0.15, 0.1, 0.1, 0.35, 0.55]
results = collection.find(
    sort={"$vector": query_vector},
    limit=10,
    include_similarity=True,
)
print("Vector search results:")
for document in results:
    print("    ", document)

quickstart.ts
import { DataAPIClient, VectorDoc } from '@datastax/astra-db-ts';

const { ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT } = process.env;

// Initialize the client and get a "Db" object
const client = new DataAPIClient(ASTRA_DB_APPLICATION_TOKEN);
const db = client.db(ASTRA_DB_API_ENDPOINT);

console.log(`* Connected to DB ${db.id}`);

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  idea: string,
}

(async function () {
  // Create a collection. The default similarity metric is cosine.
  // Choose dimensions that match your vector data.
  // If you're not sure, use the vector dimension that your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    vector: {
      dimension: 5,
      metric: 'cosine',
    },
    checkExists: false, // Optional
  });
  console.log(`* Created collection ${collection.namespace}.${collection.collectionName}`);

  // Insert documents with embeddings into the collection.
  const documents = [
    {
      idea: 'Chat bot integrated sneakers that talk to you',
      $vector: [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
      idea: 'An AI quilt to help you sleep forever',
      $vector: [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
      idea: 'A deep learning display that controls your mood',
      $vector: [0.1, 0.05, 0.08, 0.3, 0.6],
    },
  ];

  const inserted = await collection.insertMany(documents);
  console.log(`* Inserted ${inserted.insertedCount} items.`);

  // Perform a similarity search
  const cursor = await collection.find({}, {
    sort: { $vector: [0.15, 0.1, 0.1, 0.35, 0.55] },
    limit: 10,
    includeSimilarity: true,
  });

  console.log('* Search results:');
  for await (const doc of cursor) {
    console.log('  ', doc.idea, doc.$similarity);
  }
})();

src/main/java/com/example/Quickstart.java
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.SimilarityMetric;

public class Quickstart {

  public static void main(String[] args) {
    // Load the application token and API endpoint from environment variables
    String astraToken = System.getenv("ASTRA_DB_APPLICATION_TOKEN");
    String astraApiEndpoint = System.getenv("ASTRA_DB_API_ENDPOINT");

    // Initialize the client
    DataAPIClient client = new DataAPIClient(astraToken);
    System.out.println("Connected to AstraDB");

    Database db = client.getDatabase(astraApiEndpoint);
    System.out.println("Connected to Database.");

    // Create a collection. The default similarity metric is cosine.
    // Choose dimensions that match your vector data.
    // If you're not sure, use the vector dimension that your embeddings model produces.
    Collection<Document> collection = db
            .createCollection("vector_test", 5, SimilarityMetric.COSINE);
    System.out.println("Created a collection");

    // Insert documents with embeddings into the collection
    collection.insertMany(
            new Document("1")
                    .append("text", "Chat bot integrated sneakers that talk to you")
                    .vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
            new Document("2")
                    .append("text", "An AI quilt to help you sleep forever")
                    .vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
            new Document("3")
                    .append("text", "A deep learning display that controls your mood")
                    .vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f}));
    System.out.println("Inserted documents into the collection");

    // Perform a similarity search
    FindIterable<Document> resultsSet = collection.find(
            new float[]{0.15f, 0.1f, 0.1f, 0.35f, 0.55f},
            10
    );
    resultsSet.forEach(System.out::println);

  }
}

Explore the script

The quickstart script does the following:

  1. Initializes the client and connects to your database. For more information, see Get started with the Data API.

  2. Creates a collection named vector_test that uses the cosine similarity metric (the default) and a vector dimension of 5. For more information about collection settings, see Manage collections and tables.

  3. Loads documents (structured vector data) into the collection with pre-generated vector embeddings. With Astra DB, you can bring your own embeddings or use vectorize to automatically generate embeddings. For more information about loading data and supported data types, see Load your data.

  4. Performs a similarity search to find documents that are close to the specified query vector. This search returns a list of documents sorted by their similarity to the query vector with the most similar documents first. The calculation uses the similarity metric specified in the collection settings. For more information, see Perform a vector search.
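
To make the similarity search in step 4 concrete, the following sketch computes plain cosine similarity between the quickstart's query vector and its three sample documents, entirely in local Python with no database involved. The helper names are hypothetical. The database may report similarity scores on a different scale than the raw cosine values computed here, so compare the ordering rather than the exact numbers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# The sample documents and query vector from the quickstart script.
DOCUMENTS = {
    "Chat bot integrated sneakers that talk to you": [0.1, 0.15, 0.3, 0.12, 0.05],
    "An AI quilt to help you sleep forever": [0.45, 0.09, 0.01, 0.2, 0.11],
    "A deep learning display that controls your mood": [0.1, 0.05, 0.08, 0.3, 0.6],
}
QUERY_VECTOR = [0.15, 0.1, 0.1, 0.35, 0.55]


def rank_by_similarity(query, documents):
    """Return (text, score) pairs sorted from most to least similar."""
    scored = [(text, cosine_similarity(query, vector)) for text, vector in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for text, score in rank_by_similarity(QUERY_VECTOR, DOCUMENTS):
        print(f"{score:.4f}  {text}")
```

With these vectors, the "deep learning display" document ranks first, followed by the "AI quilt" and then the "sneakers" document. The length check also illustrates why the collection's dimension must match your vector data: vectors of different lengths can't be compared.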

Run the script

Run the quickstart script.

  • Python

    python quickstart.py

  • TypeScript

    • npm

      npx tsx quickstart.ts

    • Yarn

      yarn dlx tsx quickstart.ts

    • pnpm

      pnpm dlx tsx quickstart.ts

  • Java

    • Maven

      mvn clean compile
      mvn exec:java -Dexec.mainClass="com.example.Quickstart"

    • Gradle

      gradle build
      gradle run

