Quickstart

Beginner
15 min

Learn how to create a Serverless (Vector) database, connect to your database, load a few documents with vector embeddings, and perform a similarity search to find documents that are close to the query vector.

Create a Serverless (Vector) database

  1. Sign in or create an Astra DB account.

  2. In the Astra Portal navigation menu, click Databases.

  3. Click Create Database, and then select the Serverless (Vector) deployment type.

  4. Enter a meaningful, human-readable Database name.

    After you create a database, you can’t change its name.

    Database names must start and end with a letter or number, and contain no more than 50 characters, including letters, numbers, spaces, and the special characters & + - _ ( ) < > . , @.

  5. Select a Provider and Region to host your database.

    On the Free plan, you can access a limited set of supported regions. To access locked regions, you must upgrade your subscription plan.

    To minimize latency in production databases, select a region that is close to your application’s users.

  6. Click Create Database.

    New databases start in Pending status, and then move to Initializing. Your database is ready to use when it reaches Active status.

  7. Make sure the database is in Active status, and then, in the Database Details section, click Generate Token.

  8. In the Application Token dialog, click Copy, and then store the token securely. The token format is AstraCS: followed by a unique token string.

    Application tokens created from Database Details have the Database Administrator role for the associated database.

  9. In Database Details, copy your database’s API endpoint. The endpoint format is https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com.

  10. In your terminal, assign your token and API endpoint to environment variables.

    • Linux or macOS

        export ASTRA_DB_API_ENDPOINT=API_ENDPOINT
        export ASTRA_DB_APPLICATION_TOKEN=TOKEN

    • Windows

        set ASTRA_DB_API_ENDPOINT=API_ENDPOINT
        set ASTRA_DB_APPLICATION_TOKEN=TOKEN

    • Google Colab

        import os
        os.environ["ASTRA_DB_API_ENDPOINT"] = "API_ENDPOINT"
        os.environ["ASTRA_DB_APPLICATION_TOKEN"] = "TOKEN"
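The naming rule in step 4 and the token and endpoint formats in steps 8 and 9 can be checked locally before you run anything against the database. The following is a minimal Python sketch, not part of the Astra DB clients. The function names are hypothetical, "letters" is assumed to mean ASCII letters, and the endpoint check is deliberately loose because this page documents only the general ID-REGION shape.

```python
import os
import re
from typing import List, Mapping

# Assumption: "letters" means ASCII letters. The first and last characters must
# be alphanumeric, and the whole name is at most 50 characters long.
_NAME_PATTERN = re.compile(
    r"[A-Za-z0-9](?:[A-Za-z0-9 &+\-_()<>.,@]{0,48}[A-Za-z0-9])?"
)
_TOKEN_PREFIX = "AstraCS:"
_ENDPOINT_SUFFIX = ".apps.astra.datastax.com"


def is_valid_database_name(name: str) -> bool:
    """Check a candidate database name against the rule quoted in step 4."""
    return _NAME_PATTERN.fullmatch(name) is not None


def looks_like_token(token: str) -> bool:
    """Check only the documented prefix; this does not verify the token."""
    return token.startswith(_TOKEN_PREFIX) and len(token) > len(_TOKEN_PREFIX)


def looks_like_endpoint(endpoint: str) -> bool:
    """Loose shape check for https://ID-REGION.apps.astra.datastax.com."""
    if not (endpoint.startswith("https://") and endpoint.endswith(_ENDPOINT_SUFFIX)):
        return False
    host_prefix = endpoint[len("https://"):-len(_ENDPOINT_SUFFIX)]
    # Require a hyphen with text on both sides, and no path separators.
    return "-" in host_prefix.strip("-") and "/" not in host_prefix


def find_problems(environ: Mapping[str, str]) -> List[str]:
    """Return human-readable problems with the two environment variables."""
    problems = []
    if not looks_like_token(environ.get("ASTRA_DB_APPLICATION_TOKEN", "")):
        problems.append(
            "ASTRA_DB_APPLICATION_TOKEN is missing or does not start with AstraCS:"
        )
    if not looks_like_endpoint(environ.get("ASTRA_DB_API_ENDPOINT", "")):
        problems.append(
            "ASTRA_DB_API_ENDPOINT is missing or does not match the expected format"
        )
    return problems


if __name__ == "__main__":
    for problem in find_problems(os.environ):
        print("Problem:", problem)
```

A check like this also catches the case where the API_ENDPOINT and TOKEN placeholders were never replaced with real values, because neither placeholder matches the expected format.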

Install a client

You can interact with Astra DB Serverless in the Astra Portal or programmatically. This tutorial uses the Astra DB Python, TypeScript, and Java clients. For more information, see Compare connection methods and Get started with the Data API.

Use a package manager to install the client library for your preferred language.

  • Python

  • TypeScript

  • Java

Install the Python client with pip:

  1. Verify that pip is version 23.0 or later:

    pip --version

  2. If needed, upgrade pip:

    python -m pip install --upgrade pip

  3. Install the astrapy package. You must have Python 3.8 or later.

    pip install astrapy
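If you want to confirm the Python prerequisites from inside the interpreter that will run the quickstart, a small check like the following may help. It is a sketch that uses only the standard library. The helper names are hypothetical, and it reports only whether a distribution named astrapy is installed, not whether that version is compatible with this tutorial.

```python
import sys
from typing import Optional, Sequence

MIN_PYTHON = (3, 8)  # Minimum version stated in the install steps.


def python_is_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the stated minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    # Imported here because importlib.metadata itself requires Python 3.8+.
    from importlib import metadata

    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.8 or later is required.")
    else:
        print("astrapy version:", installed_version("astrapy") or "not installed")
```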

Install the TypeScript client:

  1. Verify that Node is version 18 or later:

    node --version

  2. Install astra-db-ts with your preferred package manager:

    • npm

        npm install @datastax/astra-db-ts

    • Yarn

        yarn add @datastax/astra-db-ts

    • pnpm

        pnpm add @datastax/astra-db-ts

Install the Java client with Maven or Gradle.

  • Maven

  1. Install Java 11 or later and Maven 3.9 or later.

  2. Create a pom.xml file in the root of your project, and then replace VERSION with the latest version of astra-db-java from Maven Central.

    pom.xml
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                                 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
    
      <groupId>com.example</groupId>
      <artifactId>test-java-client</artifactId>
      <version>1.0-SNAPSHOT</version>
    
      <!-- The Java client -->
      <dependencies>
        <dependency>
          <groupId>com.datastax.astra</groupId>
          <artifactId>astra-db-java</artifactId>
          <version>VERSION</version>
        </dependency>
      </dependencies>
    
      <build>
        <plugins>
          <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>3.0.0</version>
            <configuration>
              <executable>java</executable>
              <mainClass>com.example.Quickstart</mainClass>
            </configuration>
            <executions>
              <execution>
                <goals>
                  <goal>java</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
              <source>11</source>
              <target>11</target>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </project>

  • Gradle

  1. Install Gradle and Java 11 or later.

  2. Create a build.gradle file in the root of your project:

    build.gradle
    plugins {
        id 'java'
        id 'application'
    }
    
    repositories {
        mavenCentral()
    }
    
    dependencies {
        implementation 'com.datastax.astra:astra-db-java:1.+'
    }
    
    application {
        mainClass = 'com.example.Quickstart'
    }

Create a script

Copy the following quickstart script to a Python, TypeScript, or Java file.

  • Python

  • TypeScript

  • Java

To avoid a namespace collision, don’t name your Python client script files astrapy.py.

quickstart.py
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric

# Initialize the client and get a "Database" object
client = DataAPIClient(os.environ["ASTRA_DB_APPLICATION_TOKEN"])
database = client.get_database(os.environ["ASTRA_DB_API_ENDPOINT"])
print(f"* Database: {database.info().name}\n")

# Create a collection. The default similarity metric is cosine.
# Choose dimensions that match your vector data.
# If you're not sure, use the vector dimension that your embeddings model produces.
collection = database.create_collection(
    "vector_test",
    dimension=5,
    metric=VectorMetric.COSINE,  # Or just 'cosine'.
    check_exists=False, # Optional.
)
print(f"* Collection: {collection.full_name}\n")

# Insert documents with embeddings into the collection.
documents = [
    {
        "text": "Chat bot integrated sneakers that talk to you",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
        "text": "An AI quilt to help you sleep forever",
        "$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
        "text": "A deep learning display that controls your mood",
        "$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
    },
]
insertion_result = collection.insert_many(documents)
print(f"* Inserted {len(insertion_result.inserted_ids)} items.\n")

# Perform a similarity search.
query_vector = [0.15, 0.1, 0.1, 0.35, 0.55]
results = collection.find(
    sort={"$vector": query_vector},
    limit=10,
    include_similarity=True,
)
print("Vector search results:")
for document in results:
    print("    ", document)

quickstart.ts
import { DataAPIClient, VectorDoc } from '@datastax/astra-db-ts';

const { ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT } = process.env;

// Initialize the client and get a "Db" object
const client = new DataAPIClient(ASTRA_DB_APPLICATION_TOKEN);
const db = client.db(ASTRA_DB_API_ENDPOINT);

console.log(`* Connected to DB ${db.id}`);

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  idea: string,
}

(async function () {
  // Create a collection. The default similarity metric is cosine.
  // Choose dimensions that match your vector data.
  // If you're not sure, use the vector dimension that your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    vector: {
      dimension: 5,
      metric: 'cosine',
    },
    checkExists: false, // Optional
  });
  console.log(`* Created collection ${collection.keyspace}.${collection.collectionName}`);

  // Insert documents with embeddings into the collection.
  const documents = [
    {
      idea: 'Chat bot integrated sneakers that talk to you',
      $vector: [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
      idea: 'An AI quilt to help you sleep forever',
      $vector: [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
      idea: 'A deep learning display that controls your mood',
      $vector: [0.1, 0.05, 0.08, 0.3, 0.6],
    },
  ];

  const inserted = await collection.insertMany(documents);
  console.log(`* Inserted ${inserted.insertedCount} items.`);

  // Perform a similarity search
  const cursor = await collection.find({}, {
    sort: { $vector: [0.15, 0.1, 0.1, 0.35, 0.55] },
    limit: 10,
    includeSimilarity: true,
  });

  console.log('* Search results:');
  for await (const doc of cursor) {
    console.log('  ', doc.idea, doc.$similarity);
  }
})();

src/main/java/com/example/Quickstart.java
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.SimilarityMetric;

public class Quickstart {

  public static void main(String[] args) {
    // Loading Arguments
    String astraToken = System.getenv("ASTRA_DB_APPLICATION_TOKEN");
    String astraApiEndpoint = System.getenv("ASTRA_DB_API_ENDPOINT");

    // Initialize the client
    DataAPIClient client = new DataAPIClient(astraToken);
    System.out.println("Connected to AstraDB");

    Database db = client.getDatabase(astraApiEndpoint);
    System.out.println("Connected to Database.");

    // Create a collection. The default similarity metric is cosine.
    // Choose dimensions that match your vector data.
    // If you're not sure, use the vector dimension that your embeddings model produces.
    Collection<Document> collection = db
            .createCollection("vector_test", 5, SimilarityMetric.COSINE);
    System.out.println("Created a collection");

    // Insert documents with embeddings into the collection
    collection.insertMany(
            new Document("1")
                    .append("text", "Chat bot integrated sneakers that talk to you")
                    .vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
            new Document("2")
                    .append("text", "An AI quilt to help you sleep forever")
                    .vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
            new Document("3")
                    .append("text", "A deep learning display that controls your mood")
                    .vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f}));
    System.out.println("Inserted documents into the collection");

    // Perform a similarity search
    FindIterable<Document> resultsSet = collection.find(
            new float[]{0.15f, 0.1f, 0.1f, 0.35f, 0.55f},
            10
    );
    resultsSet.forEach(System.out::println);

  }
}
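
The comments in the scripts above advise choosing a dimension that matches your vector data. One way to catch a mismatch before inserting is to compare each document's vector length with the collection's dimension. The following Python sketch assumes documents are plain dictionaries with a "$vector" key, as in quickstart.py. The helper name is hypothetical and is not part of astrapy.

```python
from typing import List, Mapping, Sequence


def find_dimension_mismatches(
    documents: Sequence[Mapping[str, object]], dimension: int
) -> List[int]:
    """Return indexes of documents whose "$vector" is missing or has the wrong length."""
    mismatches = []
    for index, document in enumerate(documents):
        vector = document.get("$vector")
        if vector is None or len(vector) != dimension:
            mismatches.append(index)
    return mismatches


if __name__ == "__main__":
    documents = [
        {"text": "five values", "$vector": [0.1, 0.15, 0.3, 0.12, 0.05]},
        {"text": "four values", "$vector": [0.45, 0.09, 0.01, 0.2]},
    ]
    # The second document has only four values, so index 1 is reported.
    print(find_dimension_mismatches(documents, dimension=5))
```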

Explore the script

The quickstart script does the following:

  1. Initializes the client and connects to your database. For more information, see Get started with the Data API.

  2. Creates a collection named vector_test with a vector dimension of 5 and the cosine similarity metric, which is also the default. For more information about collection settings, see Manage collections and tables.

  3. Loads documents (structured vector data) into the collection with pre-generated vector embeddings. With Astra DB, you can bring your own embeddings or use vectorize to automatically generate embeddings. For more information about loading data and supported data types, see Load your data.

  4. Performs a similarity search to find documents that are close to the specified query vector. This search returns a list of documents sorted by their similarity to the query vector with the most similar documents first. The calculation uses the similarity metric specified in the collection settings. For more information, see Perform a vector search.
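
To see why the search in step 4 orders the results the way it does, you can recompute cosine similarity for the three sample documents locally. This is a plain-Python illustration of the metric, not a description of how Astra DB computes or scales the similarity score that it returns. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Same sample data as the quickstart scripts.
DOCUMENTS = [
    {"text": "Chat bot integrated sneakers that talk to you",
     "$vector": [0.1, 0.15, 0.3, 0.12, 0.05]},
    {"text": "An AI quilt to help you sleep forever",
     "$vector": [0.45, 0.09, 0.01, 0.2, 0.11]},
    {"text": "A deep learning display that controls your mood",
     "$vector": [0.1, 0.05, 0.08, 0.3, 0.6]},
]
QUERY_VECTOR = [0.15, 0.1, 0.1, 0.35, 0.55]


def rank(documents, query_vector, limit: int = 10) -> List[Tuple[float, str]]:
    """Return (score, text) pairs sorted from most to least similar."""
    scored = [
        (cosine_similarity(doc["$vector"], query_vector), doc["text"])
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    for score, text in rank(DOCUMENTS, QUERY_VECTOR):
        print(f"{score:.4f}  {text}")
```

With the sample data, the mood display ranks first (cosine of about 0.99), followed by the quilt (about 0.59) and the sneakers (about 0.51). The score reported by the database may use a different scale, but the ordering should match.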

Run the script

Run the quickstart script.

  • Python

    python quickstart.py

  • TypeScript

    • npm

        npx tsx quickstart.ts

    • Yarn

        yarn dlx tsx quickstart.ts

    • pnpm

        pnpm dlx tsx quickstart.ts

  • Java

    • Maven

        mvn clean compile
        mvn exec:java -Dexec.mainClass="com.example.Quickstart"

    • Gradle

        gradle build
        gradle run


© 2024 DataStax | Privacy policy | Terms of use
