Quickstart

Beginner

15 min

Learn how to create a Serverless (Vector) database, connect to your database, load a few documents with vector embeddings, and perform a similarity search to find documents that are close to the query vector.

Create a Serverless (Vector) database

Sign in or create an Astra DB account.
In the Astra Portal navigation menu, click Databases, and then click Create Database.
Select the Serverless (Vector) deployment type.
Enter a meaningful, human-readable Database name.

Database names are permanent: after you create a database, you can’t change its name. Names must start and end with a letter or number, and they can contain no more than 50 characters, including letters, numbers, and the special characters & + - _ ( ) < > . , @.
Select a Provider and Region to host your database.

On the Free plan, you can access a limited set of supported regions. To access locked regions, you must upgrade your subscription plan.

To minimize latency in production databases, select a region that is close to your application’s users.
Click Create Database.

New databases start in Pending status, and then move to Initializing. Your database is ready to use when it reaches Active status.
Make sure the database is in Active status, and then, in the Database Details section, click Generate Token.
In the Application Token dialog, click Copy, and then store the token securely. The token format is AstraCS: followed by a unique token string.

Application tokens created from Database Details have the Database Administrator role for the associated database.
In Database Details, copy your database’s API endpoint. The endpoint format is https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com.

In your terminal, assign your token and API endpoint to environment variables.

Linux or macOS:

export ASTRA_DB_API_ENDPOINT=API_ENDPOINT
export ASTRA_DB_APPLICATION_TOKEN=TOKEN

Windows:

set ASTRA_DB_API_ENDPOINT=API_ENDPOINT
set ASTRA_DB_APPLICATION_TOKEN=TOKEN

Google Colab:

import os
os.environ["ASTRA_DB_API_ENDPOINT"] = "API_ENDPOINT"
os.environ["ASTRA_DB_APPLICATION_TOKEN"] = "TOKEN"
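
Before you continue, it can help to confirm that both variables are set and roughly match the formats described above. The following is a minimal sketch, not part of the official quickstart: `check_astra_env` is a hypothetical helper, and the endpoint pattern is an assumption based only on the `https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com` format shown earlier.

```python
import os
import re

# Assumption: endpoints follow https://<id>-<region>.apps.astra.datastax.com,
# as described in this quickstart. Other valid formats may exist.
ENDPOINT_PATTERN = re.compile(
    r"^https://[A-Za-z0-9-]+\.apps\.astra\.datastax\.com/?$"
)


def check_astra_env(env=None):
    """Return a list of problems found in the Astra DB environment variables.

    Hypothetical helper for illustration. An empty list means that no
    problems were found.
    """
    env = os.environ if env is None else env
    problems = []

    token = env.get("ASTRA_DB_APPLICATION_TOKEN", "")
    if not token:
        problems.append("ASTRA_DB_APPLICATION_TOKEN is not set")
    elif not token.startswith("AstraCS:"):
        problems.append("ASTRA_DB_APPLICATION_TOKEN does not start with 'AstraCS:'")

    endpoint = env.get("ASTRA_DB_API_ENDPOINT", "")
    if not endpoint:
        problems.append("ASTRA_DB_API_ENDPOINT is not set")
    elif not ENDPOINT_PATTERN.match(endpoint):
        problems.append("ASTRA_DB_API_ENDPOINT does not match the expected format")

    return problems


if __name__ == "__main__":
    for problem in check_astra_env():
        print("Warning:", problem)
```

If you left the literal placeholders `TOKEN` or `API_ENDPOINT` in the commands above, this check reports both values as malformed. It only inspects the format of the values; it does not contact Astra DB or verify that the token is valid.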

Install a client

You can interact with Astra DB Serverless in the Astra Portal or programmatically. This tutorial uses the Astra DB Python, TypeScript, and Java clients. For more information, see Compare connection methods and Get started with the Data API.

Use a package manager to install the client library for your preferred language.

Python
TypeScript
Java

Install the Python client with pip:

Verify that pip is version 23.0 or later:
```
pip --version
```
If needed, upgrade pip:
```
python -m pip install --upgrade pip
```
Install the astrapy package. You must have Python 3.8 or later.
```
pip install astrapy
```
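
To confirm the Python prerequisites from a script rather than by hand, a short check such as the following may be useful. It is a sketch that uses only the standard library; `python_is_supported` and `installed_version` are hypothetical helper names, and the minimum version is the one stated in this quickstart.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)  # Minimum Python version stated in this quickstart.


def python_is_supported(version_info=None):
    """Return True if the interpreter meets the minimum version."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("astrapy version:", installed_version("astrapy") or "not installed")
```

If the second line reports that astrapy is not installed, check that `pip` and `python` refer to the same environment.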

Install the TypeScript client:

Verify that Node is version 18 or later:
```
node --version
```

Install astra-db-ts with your preferred package manager:

npm:

npm install @datastax/astra-db-ts

Yarn:

yarn add @datastax/astra-db-ts

pnpm:

pnpm add @datastax/astra-db-ts

Install the Java client with Maven or Gradle.

Maven
Gradle

Install Java 11 or later and Maven 3.9 or later.

Create a pom.xml file in the root of your project, and then replace VERSION with the latest version of astra-db-java.

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.example</groupId>
  <artifactId>test-java-client</artifactId>
  <version>1.0-SNAPSHOT</version>

  <!-- The Java client -->
  <dependencies>
    <dependency>
      <groupId>com.datastax.astra</groupId>
      <artifactId>astra-db-java</artifactId>
      <version>VERSION</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>3.0.0</version>
        <configuration>
          <executable>java</executable>
          <mainClass>com.example.Quickstart</mainClass>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>java</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>11</source>
          <target>11</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Install Gradle and Java 11 or later.

Create a build.gradle file in the root of your project:

build.gradle

plugins {
    id 'java'
    id 'application'
}

repositories {
    mavenCentral()
}

dependencies {
    implementation 'com.datastax.astra:astra-db-java:1.+'
}

application {
    mainClass = 'com.example.Quickstart'
}

Create a script

Copy the following quickstart script to a Python, TypeScript, or Java file.

Python
TypeScript
Java

To avoid a namespace collision, don’t name your Python client script files astrapy.py.

quickstart.py

import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric

# Initialize the client and get a "Database" object
client = DataAPIClient(os.environ["ASTRA_DB_APPLICATION_TOKEN"])
database = client.get_database(os.environ["ASTRA_DB_API_ENDPOINT"])
print(f"* Database: {database.info().name}\n")

# Create a collection. The default similarity metric is cosine.
# Choose dimensions that match your vector data.
# If you're not sure, use the vector dimension that your embeddings model produces.
collection = database.create_collection(
    "vector_test",
    dimension=5,
    metric=VectorMetric.COSINE,  # Or just 'cosine'.
    check_exists=False, # Optional.
)
print(f"* Collection: {collection.full_name}\n")

# Insert documents with embeddings into the collection.
documents = [
    {
        "text": "Chat bot integrated sneakers that talk to you",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
        "text": "An AI quilt to help you sleep forever",
        "$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
        "text": "A deep learning display that controls your mood",
        "$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
    },
]
insertion_result = collection.insert_many(documents)
print(f"* Inserted {len(insertion_result.inserted_ids)} items.\n")

# Perform a similarity search.
query_vector = [0.15, 0.1, 0.1, 0.35, 0.55]
results = collection.find(
    sort={"$vector": query_vector},
    limit=10,
    include_similarity=True,
)
print("Vector search results:")
for document in results:
    print("    ", document)

quickstart.ts

import { DataAPIClient, VectorDoc } from '@datastax/astra-db-ts';

const { ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT } = process.env;

// Initialize the client and get a "Db" object
const client = new DataAPIClient(ASTRA_DB_APPLICATION_TOKEN);
const db = client.db(ASTRA_DB_API_ENDPOINT);

console.log(`* Connected to DB ${db.id}`);

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  idea: string,
}

(async function () {
  // Create a collection. The default similarity metric is cosine.
  // Choose dimensions that match your vector data.
  // If you're not sure, use the vector dimension that your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    vector: {
      dimension: 5,
      metric: 'cosine',
    },
    checkExists: false, // Optional
  });
  console.log(`* Created collection ${collection.keyspace}.${collection.collectionName}`);

  // Insert documents with embeddings into the collection.
  const documents = [
    {
      idea: 'Chat bot integrated sneakers that talk to you',
      $vector: [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
      idea: 'An AI quilt to help you sleep forever',
      $vector: [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
      idea: 'A deep learning display that controls your mood',
      $vector: [0.1, 0.05, 0.08, 0.3, 0.6],
    },
  ];

  const inserted = await collection.insertMany(documents);
  console.log(`* Inserted ${inserted.insertedCount} items.`);

  // Perform a similarity search
  const cursor = await collection.find({}, {
    sort: { $vector: [0.15, 0.1, 0.1, 0.35, 0.55] },
    limit: 10,
    includeSimilarity: true,
  });

  console.log('* Search results:');
  for await (const doc of cursor) {
    console.log('  ', doc.idea, doc.$similarity);
  }
})();

src/main/java/com/example/Quickstart.java

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.SimilarityMetric;

public class Quickstart {

  public static void main(String[] args) {
    // Loading Arguments
    String astraToken = System.getenv("ASTRA_DB_APPLICATION_TOKEN");
    String astraApiEndpoint = System.getenv("ASTRA_DB_API_ENDPOINT");

    // Initialize the client
    DataAPIClient client = new DataAPIClient(astraToken);
    System.out.println("Connected to AstraDB");

    Database db = client.getDatabase(astraApiEndpoint);
    System.out.println("Connected to Database.");

    // Create a collection. The default similarity metric is cosine.
    // Choose dimensions that match your vector data.
    // If you're not sure, use the vector dimension that your embeddings model produces.
    Collection<Document> collection = db
            .createCollection("vector_test", 5, SimilarityMetric.COSINE);
    System.out.println("Created a collection");

    // Insert documents with embeddings into the collection
    collection.insertMany(
            new Document("1")
                    .append("text", "Chat bot integrated sneakers that talk to you")
                    .vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
            new Document("2")
                    .append("text", "An AI quilt to help you sleep forever")
                    .vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
            new Document("3")
                    .append("text", "A deep learning display that controls your mood")
                    .vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f}));
    System.out.println("Inserted documents into the collection");

    // Perform a similarity search
    FindIterable<Document> resultsSet = collection.find(
            new float[]{0.15f, 0.1f, 0.1f, 0.35f, 0.55f},
            10
    );
    resultsSet.forEach(System.out::println);

  }
}

Explore the script

The quickstart script does the following:

Initializes the client and connects to your database. For more information, see Get started with the Data API.
Creates a collection named vector_test that uses the default similarity metric (cosine) and a vector dimension of 5. For more information about collection settings, see Manage collections and tables.
Loads documents (structured vector data) into the collection with pre-generated vector embeddings. With Astra DB, you can bring your own embeddings or use vectorize to automatically generate embeddings. For more information about loading data and supported data types, see Load your data.
Performs a similarity search to find documents that are close to the specified query vector. This search returns a list of documents sorted by their similarity to the query vector with the most similar documents first. The calculation uses the similarity metric specified in the collection settings. For more information, see Perform a vector search.
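
To build intuition for the ranking, the cosine metric can be computed locally for the three sample documents. The following sketch uses only the standard library and does not call Astra DB; `cosine_similarity` is a hypothetical helper, and the score that the Data API reports may be scaled differently from the raw cosine value, so compare the order of the results rather than the exact numbers.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# The same sample data that the quickstart script inserts.
documents = {
    "Chat bot integrated sneakers that talk to you": [0.1, 0.15, 0.3, 0.12, 0.05],
    "An AI quilt to help you sleep forever": [0.45, 0.09, 0.01, 0.2, 0.11],
    "A deep learning display that controls your mood": [0.1, 0.05, 0.08, 0.3, 0.6],
}
query_vector = [0.15, 0.1, 0.1, 0.35, 0.55]

# Sort the document texts by similarity to the query, most similar first.
ranked = sorted(
    documents,
    key=lambda text: cosine_similarity(documents[text], query_vector),
    reverse=True,
)

for text in ranked:
    print(f"{cosine_similarity(documents[text], query_vector):.3f}  {text}")
```

With this data, the deep learning display ranks first because its vector points in nearly the same direction as the query vector, followed by the AI quilt and then the sneakers.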

Run the script

Run the quickstart script.

Python:

python quickstart.py

TypeScript (npm):

npx tsx quickstart.ts

TypeScript (Yarn):

yarn dlx tsx quickstart.ts

TypeScript (pnpm):

pnpm dlx tsx quickstart.ts

Java (Maven):

mvn clean compile
mvn exec:java -Dexec.mainClass="com.example.Quickstart"

Java (Gradle):

gradle build
gradle run

Next steps

In the Astra Portal, use the Data Explorer to view your new collection and documents.
Modify and rerun the script to load more documents.

The Python and TypeScript quickstart scripts let the database assign document IDs automatically; therefore, if you rerun them, you load the same documents again with different IDs. The Java script sets explicit IDs (1, 2, and 3), so rerunning it attempts to insert documents whose IDs already exist. To avoid returning trivial search results with literal exact matches, modify the script to load different documents, delete the collection, or delete documents from the collection.
Modify the vector search portion of the script. For more information, see Perform a vector search and Find documents using filtering options.

For example, if you want the script to return specific information about the queried documents, you can add the projection option. This is useful if you use vectorize and you want the output to include the $vectorize text or the generated embedding.
Explore the Data API reference to learn more about interacting with your Serverless (Vector) database programmatically.
Learn how to use Astra DB as the vector store for your AI applications through integration guides, code examples, and tutorials.
If you want to clean up the resources you created in this tutorial, see Terminate a database.
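
As an example of the projection option mentioned in the steps above, the Python search can be adjusted to return only selected fields. This is a minimal sketch under stated assumptions: `build_search_options` and `search` are hypothetical helpers, and `collection` is expected to be the astrapy collection object that quickstart.py creates.

```python
def build_search_options(query_vector, fields, limit=10):
    """Build keyword arguments for collection.find() with a projection.

    Hypothetical helper. `fields` lists the document fields to return.
    """
    return {
        "sort": {"$vector": query_vector},
        "limit": limit,
        "include_similarity": True,
        # Request only the named fields in each returned document.
        "projection": {field: True for field in fields},
    }


def search(collection, query_vector, fields):
    """Run the similarity search and return the matching documents as a list."""
    options = build_search_options(query_vector, fields)
    return list(collection.find(**options))
```

In quickstart.py, you could then replace the existing `collection.find(...)` call with `results = search(collection, query_vector, ["text"])`. To include the stored embedding in the output, add `"$vector"` to the list of fields.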