Quickstart

Beginner
15 min

The DataStax Astra DB Serverless (Vector) documentation site is currently in Public Preview and is provided on an “AS IS” basis, without warranty or indemnity of any kind. For more, see the DataStax Preview Terms.

Objective

Learn how to create a new database, connect to your database, load a set of vector embeddings, and perform a similarity search to find vectors that are close to the one in your query.

Tutorial overview

Create a vector database

  1. Create an Astra account or sign in to an existing Astra account.

  2. In the Astra Portal, select Databases in the main navigation.

  3. Click Create Database.

  4. In the Create Database dialog, select the Serverless (Vector) deployment type.

  5. In the Configuration section, enter a name for the new database in the Database name field.

    Since database names can’t be changed later, it’s best to name your database something meaningful. Database names must start and end with an alphanumeric character, and may contain only the following special characters: & + - _ ( ) < > . , @.

  6. Select your preferred Provider and Region.

    You can select from a limited number of regions if you’re on the Free plan. Regions with a lock icon require that you upgrade to a Pay As You Go plan.

    Not all regions may be available. If you don’t see your preferred region listed, please submit a support ticket or send us a message using our live chat in the bottom right of the Astra Portal.

  7. Click Create Database.

    You are redirected to your new database’s Overview screen. Your database starts in Pending status before transitioning to Initializing. You’ll receive a notification once your database is initialized.

  8. Generate an application token.

    Once your database is initialized, go to your database’s Overview screen. Under Database Details > Application Tokens, click Generate Token. In the Application Token dialog, click the clipboard icon to copy the token (e.g. AstraCS:WSnyFUhRxsrg…​). Make sure to store the token in a secure location before closing the dialog.

    Your token is automatically assigned the Database Administrator role.

  9. Copy your database’s API endpoint, located under Database Details > API Endpoint (e.g. https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com).

  10. Assign your token and API endpoint to environment variables in your terminal.

    In each of the following, replace <Astra DB API endpoint> with your API endpoint and <AstraCS:...> with your application token.

    • Linux or macOS

    export ASTRA_DB_API_ENDPOINT="<Astra DB API endpoint>"
    export ASTRA_DB_APPLICATION_TOKEN="<AstraCS:...>"

    • Windows

    set ASTRA_DB_API_ENDPOINT=<Astra DB API endpoint>
    set ASTRA_DB_APPLICATION_TOKEN=<AstraCS:...>

    • Google Colab

    import os
    os.environ["ASTRA_DB_API_ENDPOINT"] = "<Astra DB API endpoint>"
    os.environ["ASTRA_DB_APPLICATION_TOKEN"] = "<AstraCS:...>"
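Before moving on, it can help to confirm that both variables are set and look plausible. The following is a minimal sketch, not part of any Astra tooling: the helper name check_astra_env is hypothetical, and the checks only mirror the example formats shown in the steps above (a token beginning with AstraCS: and an endpoint ending in .apps.astra.datastax.com).

```python
import os

# Hypothetical helper for illustration; it is not part of any Astra client.
# The expected formats are taken from the examples in the steps above.
EXPECTED_TOKEN_PREFIX = "AstraCS:"
EXPECTED_ENDPOINT_SUFFIX = ".apps.astra.datastax.com"


def check_astra_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []

    token = env.get("ASTRA_DB_APPLICATION_TOKEN", "")
    endpoint = env.get("ASTRA_DB_API_ENDPOINT", "").rstrip("/")

    if not token:
        problems.append("ASTRA_DB_APPLICATION_TOKEN is not set")
    elif not token.startswith(EXPECTED_TOKEN_PREFIX):
        problems.append("ASTRA_DB_APPLICATION_TOKEN does not start with 'AstraCS:'")

    if not endpoint:
        problems.append("ASTRA_DB_API_ENDPOINT is not set")
    elif not (
        endpoint.startswith("https://")
        and endpoint.endswith(EXPECTED_ENDPOINT_SUFFIX)
    ):
        problems.append("ASTRA_DB_API_ENDPOINT does not look like an Astra DB API endpoint")

    return problems


if __name__ == "__main__":
    for problem in check_astra_env():
        print("Warning:", problem)
```

A check like this also catches the two values being assigned to each other's variables, since an endpoint does not start with AstraCS: and a token does not start with https://.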

Set up the client

Install the library for the language and package manager you’re using.

  • Python

    1. Verify that pip is version 23.0 or higher.

      pip --version

    2. Upgrade pip if needed.

      python -m pip install --upgrade pip

    3. Install the AstraPy package. You must have Python 3.7 or higher.

      pip install astrapy

  • TypeScript

    1. Verify that Node is version 14 or higher.

      node --version

    2. Install the tsx package globally.

      npm install -g tsx

    3. Install the astra-db-ts package.

      npm install @datastax/astra-db-ts@latest

  • Java

    Requires Java 11+ and Maven 3.9+ (or Gradle).

pom.xml (Maven)
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.example</groupId>
  <artifactId>test-java-client</artifactId>
  <version>1.0-SNAPSHOT</version>

  <!-- The Java client -->
  <dependencies>
    <dependency>
      <groupId>com.datastax.astra</groupId>
      <artifactId>astra-db-client</artifactId>
      <version>1.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>3.0.0</version>
        <configuration>
          <executable>java</executable>
          <mainClass>com.example.Quickstart</mainClass>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>java</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>11</source>
          <target>11</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
build.gradle (Gradle)
plugins {
    id 'java'
    id 'application'
}

repositories {
    mavenCentral()
}

dependencies {
    implementation 'com.datastax.astra:astra-db-client:1.0'
}

application {
    mainClassName = 'com.example.Quickstart'
}
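
For the Python path, the interpreter version and the package install can also be checked from Python itself. This is a small sketch under stated assumptions: check_python_setup is a hypothetical helper name, and the minimum version (3.7) is the one given in the Python install step above.

```python
import importlib.util
import sys

# Minimum version taken from the Python install step above.
MIN_PYTHON = (3, 7)


def check_python_setup(min_python=MIN_PYTHON):
    """Report whether the interpreter is new enough and astrapy is importable."""
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        # find_spec returns None when the package cannot be found.
        "astrapy_installed": importlib.util.find_spec("astrapy") is not None,
    }


if __name__ == "__main__":
    status = check_python_setup()
    print("Python version OK:", status["python_ok"])
    print("astrapy installed:", status["astrapy_installed"])
```

If astrapy is reported as missing right after installing it, pip and python may be pointing at different environments; running python -m pip install astrapy installs into the interpreter you are actually using.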

Paste the following code into a new file on your computer.

  • Python

  • TypeScript

  • Java

quickstart.py
import os

from astrapy.db import AstraDB

# Initialization
db = AstraDB(
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
)

# Create collection
col = db.create_collection("vector_test", dimension=5, metric="cosine")

To avoid a namespace collision with the installed package, don’t name the file astrapy.py.
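
The collision arises because Python looks in the script's own directory first when resolving imports, so a local astrapy.py would be imported in place of the installed package. A minimal sketch for checking a folder for such a file follows; find_shadowing_file is a hypothetical helper, not part of AstraPy.

```python
from pathlib import Path


def find_shadowing_file(directory, module_name="astrapy"):
    """Return the path of a local <module_name>.py in `directory`, or None."""
    candidate = Path(directory) / f"{module_name}.py"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    clash = find_shadowing_file(".")
    if clash is not None:
        print(f"Rename {clash}: it would be imported instead of the installed package")
```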

quickstart.ts
import { AstraDB } from "@datastax/astra-db-ts";

async function main() {

  const { ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT } = process.env;

  const db = new AstraDB(ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT);

  // Create a collection
  await db.createCollection(
    "vector_test",
    {
      "vector": {
        "dimension": 5,
        "metric": "cosine"
      }
    }
  );
  const col = await db.collection("vector_test");

  // ...
src/main/java/com/example/Quickstart.java
package com.example;

import com.dtsx.astra.sdk.AstraDB;
import io.stargate.sdk.json.CollectionClient;
import io.stargate.sdk.json.domain.JsonDocument;
import io.stargate.sdk.json.domain.JsonResult;
import io.stargate.sdk.json.domain.SimilarityMetric;
import io.stargate.sdk.json.domain.CollectionDefinition;

import java.util.List;

public class Quickstart {

  public static void main(String[] args) {
    // Loading Arguments
    String astraToken = System.getenv("ASTRA_DB_APPLICATION_TOKEN");
    String astraApiEndpoint = System.getenv("ASTRA_DB_API_ENDPOINT");

    // Initialization
    AstraDB db = new AstraDB(astraToken, astraApiEndpoint);

    // Create a collection
    CollectionDefinition colDefinition = CollectionDefinition.builder()
        .name("vector_test")
        .vector(5, SimilarityMetric.cosine)
        .build();
    db.createCollection(colDefinition);
    CollectionClient col = db.collection("vector_test");

    // ...

Load vector embeddings

Insert a few documents with embeddings into the vector database.

  • Python

  • TypeScript

  • Java

quickstart.py
documents = [
    {
        "_id": "1",
        "text": "ChatGPT integrated sneakers that talk to you",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
        "_id": "2",
        "text": "An AI quilt to help you sleep forever",
        "$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
        "_id": "3",
        "text": "A deep learning display that controls your mood",
        "$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
    },
]
res = col.insert_many(documents)
quickstart.ts
  // ...

  const documents = [
      {
          "_id": "1",
          "text": "ChatGPT integrated sneakers that talk to you",
          "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
      },
      {
          "_id": "2",
          "text": "An AI quilt to help you sleep forever",
          "$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
      },
      {
          "_id": "3",
          "text": "A deep learning display that controls your mood",
          "$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
      }
  ];
  const results = await col.insertMany(documents);

  // ...
src/main/java/com/example/Quickstart.java
      // ...

      col.insertMany(List.of(
          new JsonDocument()
              .id("1")
              .put("text", "ChatGPT integrated sneakers that talk to you")
              .vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
          new JsonDocument()
              .id("2")
              .put("text", "An AI quilt to help you sleep forever")
              .vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
          new JsonDocument()
              .id("3")
              .put("text", "A deep learning display that controls your mood")
              .vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f})
      ));

      // ...

Use the Data Explorer to inspect your loaded data.
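
Each $vector is expected to have as many elements as the dimension chosen when the collection was created (5 in this tutorial). As a hedged sketch, the following checks documents locally before they are sent; find_dimension_mismatches is a hypothetical helper, not a client API, and the document with _id "bad" is made up to show a failure.

```python
def find_dimension_mismatches(documents, dimension=5):
    """Return the _id of every document whose $vector length differs from `dimension`.

    Documents without a $vector field are skipped rather than flagged.
    """
    return [
        doc.get("_id")
        for doc in documents
        if "$vector" in doc and len(doc["$vector"]) != dimension
    ]


documents = [
    {"_id": "1", "$vector": [0.1, 0.15, 0.3, 0.12, 0.05]},
    {"_id": "2", "$vector": [0.45, 0.09, 0.01, 0.2, 0.11]},
    {"_id": "bad", "$vector": [0.1, 0.2]},  # deliberately too short
]

print(find_dimension_mismatches(documents))  # prints ['bad']
```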

Find documents that are close to a specific vector embedding.

  • Python

  • TypeScript

  • Java

quickstart.py
query = [0.15, 0.1, 0.1, 0.35, 0.55]
results = col.vector_find(query, limit=2, fields=["text", "$vector"])

for document in results:
    print(document)
quickstart.ts
  // ...

  interface Document {
    _id: string;
    text: string;
    $vector: number[];
  }

  const options = {
      sort: {
          "$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
      },
      limit: 5
  };

  const document_list = await col.find({}, options).toArray();
  document_list.forEach((doc: Document) => console.log(doc));
}

main().catch(console.error);
src/main/java/com/example/Quickstart.java
      // ...

      List<JsonResult> resultsSet = col.similaritySearch(
          new float[]{0.15f, 0.1f, 0.1f, 0.35f, 0.55f},
          10
      );
      resultsSet.stream().forEach(System.out::println);
    }
}

Run the code

Run the code you defined above.

  • Python

    python quickstart.py

  • TypeScript

    tsx quickstart.ts

  • Java

    Maven

    mvn clean compile
    mvn exec:java -Dexec.mainClass="com.example.Quickstart"

    Gradle

    gradle build
    gradle run

The output is a list of the documents you inserted, up to the limit set in the query. The database sorts documents by their similarity to the query vector, with the most similar documents first. The calculation uses cosine similarity by default, which is also the metric set when the collection was created.
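
To see where the ordering comes from, the ranking can be reproduced locally with plain Python. This is an illustrative sketch, not how the database computes results internally: cosine_similarity and rank_by_similarity are hypothetical helper names, and the vectors are the ones used in this tutorial.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents, limit):
    """Return up to `limit` (_id, similarity) pairs, most similar first."""
    scored = [
        (doc["_id"], cosine_similarity(query, doc["$vector"]))
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


documents = [
    {"_id": "1", "$vector": [0.1, 0.15, 0.3, 0.12, 0.05]},
    {"_id": "2", "$vector": [0.45, 0.09, 0.01, 0.2, 0.11]},
    {"_id": "3", "$vector": [0.1, 0.05, 0.08, 0.3, 0.6]},
]
query = [0.15, 0.1, 0.1, 0.35, 0.55]

for doc_id, score in rank_by_similarity(query, documents, limit=3):
    print(doc_id, round(score, 3))
```

With the tutorial's data, document 3 ranks first, followed by 2 and then 1. The similarity score the database reports may be scaled differently from the raw cosine value, so compare the order rather than the exact numbers.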

Next steps

Learn more with one of the following tutorials:

  • Load your data

    Learn how to load your own data. Use the Astra Portal or any of our clients.
