Quickstart

Beginner
15 min

Objective

Learn how to create a new database, connect to your database, load a set of vector embeddings, and perform a similarity search to find vectors that are close to the one in your query.

Create a Serverless (Vector) database

  1. Create an Astra account or sign in to an existing Astra account.

  2. In the Astra Portal, select Databases in the main navigation.

  3. Select Create Database.

  4. In the Create Database dialog, select the Serverless (Vector) deployment type.

  5. In Configuration, enter a meaningful Database name.

    You can’t change a database name after you create the database, so choose one that is human-readable and meaningful. Database names must start and end with an alphanumeric character, and can contain the following special characters: & + - _ ( ) < > . , @.

  6. Select your preferred Provider and Region.

    You can select from a limited number of regions if you’re on the Free plan. Regions with a lock icon require that you upgrade to a Pay As You Go plan.

  7. Click Create Database.

    You are redirected to your new database’s Overview screen. Your database starts in Pending status before transitioning to Initializing. You’ll receive a notification once your database is initialized.

  8. Ensure the database is in Active status, and then select Generate Token. In the Application Token dialog, click the clipboard icon to copy the token (e.g. AstraCS:WSnyFUhRxsrg…​). Store the token in a secure location before closing the dialog.

    Your token is automatically assigned the Database Administrator role.

  9. Copy your database’s API endpoint, located under Database Details > API Endpoint (e.g. https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com).

  10. Assign your token and API endpoint to environment variables in your terminal.

    • Linux or macOS

      export ASTRA_DB_API_ENDPOINT=API_ENDPOINT # Your database API endpoint
      export ASTRA_DB_APPLICATION_TOKEN=TOKEN # Your database application token

    • Windows

      rem Replace API_ENDPOINT and TOKEN with your own values.
      rem Do not add a trailing "# comment": set would store it as part of the value.
      set ASTRA_DB_API_ENDPOINT=API_ENDPOINT
      set ASTRA_DB_APPLICATION_TOKEN=TOKEN

    • Google Colab

      import os
      os.environ["ASTRA_DB_API_ENDPOINT"] = "API_ENDPOINT" # Your database API endpoint
      os.environ["ASTRA_DB_APPLICATION_TOKEN"] = "TOKEN" # Your database application token
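
Before moving on, it can help to confirm that both variables are visible to Python and look plausible. The following is a minimal sketch, not part of the Astra tooling: check_astra_env is a hypothetical helper, the endpoint pattern is an assumption derived from the example endpoint in step 9, and the AstraCS: prefix is an assumption taken from the example token in step 8.

```python
import os
import re

# Assumed shape, based on the example endpoint shown in step 9:
# https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com
ENDPOINT_PATTERN = re.compile(r"^https://[A-Za-z0-9-]+\.apps\.astra\.datastax\.com/?$")

REQUIRED_VARS = ("ASTRA_DB_API_ENDPOINT", "ASTRA_DB_APPLICATION_TOKEN")


def check_astra_env(env=None):
    """Return a list of problems; an empty list means the settings look usable."""
    env = os.environ if env is None else env
    problems = []

    # Both variables must be present and non-empty.
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    # A leftover placeholder such as API_ENDPOINT fails this check.
    endpoint = env.get("ASTRA_DB_API_ENDPOINT", "")
    if endpoint and not ENDPOINT_PATTERN.match(endpoint):
        problems.append("ASTRA_DB_API_ENDPOINT does not look like an Astra API endpoint")

    # Assumption: application tokens start with "AstraCS:" as in the example.
    token = env.get("ASTRA_DB_APPLICATION_TOKEN", "")
    if token and not token.startswith("AstraCS:"):
        problems.append("ASTRA_DB_APPLICATION_TOKEN does not start with 'AstraCS:'")

    return problems


if __name__ == "__main__":
    for problem in check_astra_env():
        print(problem)
```

If the script prints nothing, both variables are set and match the assumed shapes. It does not contact the database, so it cannot tell you whether the token is valid.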

Install the client

Install the library for the language and package manager you’re using.

  • Python

  • TypeScript

  • Java

To install the Python client with pip:

  1. Verify that pip is version 23.0 or higher.

    pip --version
  2. Upgrade pip if needed.

    python -m pip install --upgrade pip
  3. Install the astrapy package. You must have Python 3.8 or higher.

    pip install astrapy
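
The version requirements in the steps above (pip 23.0 or higher, Python 3.8 or higher) can also be checked from Python itself. This is a rough sketch using only the standard library; parse_version, meets_minimum, and environment_report are hypothetical helper names, and the parser only handles simple dotted version strings.

```python
import sys

MIN_PYTHON = (3, 8)  # from step 3
MIN_PIP = (23, 0)    # from step 1


def parse_version(text):
    """Turn '23.0.1' into (23, 0, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version_text, minimum):
    """Compare a dotted version string against a minimum tuple such as (23, 0)."""
    parsed = parse_version(version_text)
    # Pad short versions so that '23' is treated like '23.0'.
    padded = parsed + (0,) * (len(minimum) - len(parsed))
    return padded >= minimum


def environment_report():
    """Report whether the running Python and the installed pip meet the minimums."""
    report = {"python_ok": sys.version_info[:2] >= MIN_PYTHON, "pip_ok": False}
    if report["python_ok"]:
        from importlib import metadata  # standard library in Python 3.8+
        try:
            report["pip_ok"] = meets_minimum(metadata.version("pip"), MIN_PIP)
        except metadata.PackageNotFoundError:
            report["pip_ok"] = False
    return report


if __name__ == "__main__":
    print(environment_report())
```

The report covers the interpreter that runs the script, so run it with the same python command you plan to use for the quickstart.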

To install the TypeScript client:

  1. Verify that Node is version 14 or higher.

    node --version
  2. Use npm or Yarn to install the TypeScript client.

    • npm

    • Yarn

    To install the TypeScript client with npm:

    npm install @datastax/astra-db-ts@latest

    To install the TypeScript client with Yarn:

    1. Verify that Yarn is version 2.0 or higher.

      yarn --version
    2. Install the astra-db-ts package.

      yarn add @datastax/astra-db-ts@latest

Use Maven or Gradle to install the Java client.

  • Maven

  • Gradle

To install the Java client with Maven:

  1. Install Java 11+ and Maven 3.9+.

  2. Create a pom.xml file in the root of your project.

    pom.xml
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                                 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
    
      <groupId>com.example</groupId>
      <artifactId>test-java-client</artifactId>
      <version>1.0-SNAPSHOT</version>
    
      <!-- The Java client -->
      <dependencies>
        <dependency>
          <groupId>com.datastax.astra</groupId>
          <artifactId>astra-db-client</artifactId>
          <version>1.2.3</version>
        </dependency>
      </dependencies>
    
      <build>
        <plugins>
          <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>3.0.0</version>
            <configuration>
              <executable>java</executable>
              <mainClass>com.example.Quickstart</mainClass>
            </configuration>
            <executions>
              <execution>
                <goals>
                  <goal>java</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
              <source>11</source>
              <target>11</target>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </project>

To install the Java client with Gradle:

  1. Install Java 11+ and Gradle.

  2. Create a build.gradle file in the root of your project.

    build.gradle
    plugins {
        id 'java'
        id 'application'
    }
    
    repositories {
        mavenCentral()
    }
    
    dependencies {
        implementation 'com.datastax.astra:astra-db-client:1.2.3'
    }
    
    application {
        mainClassName = 'com.example.Quickstart'
    }

Initialize the client

Connect to your database and delete the vector_test collection. The later sections add code that creates the collection and loads data ahead of this delete step.

Typically, you do not delete a collection immediately after creating it. This code ends with a delete so that you can run it repeatedly while trying different things.

  • Python

  • TypeScript

  • Java

quickstart.py
import os

from astrapy.db import AstraDB

# Initialize the client. The namespace parameter is optional if you use
# "default_keyspace".
db = AstraDB(
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    namespace="default_keyspace",
)
print(db)

# Delete the collection
res = db.delete_collection(collection_name="vector_test")
print(res)

To avoid a namespace collision with the astrapy package, don’t name your file astrapy.py.

quickstart.ts
import { AstraDB } from "@datastax/astra-db-ts";

async function main() {

  const { ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT } = process.env;

  // Initialize the client. The keyspace parameter is optional if you use
  // "default_keyspace".
  const db = new AstraDB(
      ASTRA_DB_APPLICATION_TOKEN,
      ASTRA_DB_API_ENDPOINT,
      "default_keyspace"
  );

  // Delete the collection
  const response = await db.dropCollection("vector_test");
  console.log(response);
}

main().catch(console.error);

src/main/java/com/example/Quickstart.java
package com.example;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.JsonDocument;
import io.stargate.sdk.data.domain.JsonDocumentResult;
import io.stargate.sdk.data.domain.SimilarityMetric;
import io.stargate.sdk.data.domain.CollectionDefinition;
import java.util.List;
import java.util.stream.Stream;

public class Quickstart {

  public static void main(String[] args) {
    // Loading Arguments
    String astraToken = System.getenv("ASTRA_DB_APPLICATION_TOKEN");
    String astraApiEndpoint = System.getenv("ASTRA_DB_API_ENDPOINT");

    // Initialize the client. The keyspace parameter is optional if you use
    // "default_keyspace".
    AstraDB db = new AstraDB(astraToken, astraApiEndpoint, "default_keyspace");
    System.out.println("Connected to AstraDB");

    // Delete the collection
    db.deleteCollection("vector_test");
    System.out.println("Deleted the collection");
  }
}

Run the code

Run the code you defined above.

  • Python

    python quickstart.py

  • TypeScript

    With npm:

    npx tsx quickstart.ts

    With Yarn:

    yarn dlx tsx quickstart.ts

  • Java

    With Maven:

    mvn clean compile
    mvn exec:java -Dexec.mainClass="com.example.Quickstart"

    With Gradle:

    gradle build
    gradle run

Create a collection

Create a collection in your database. Choose dimensions that match your vector data and pick an appropriate similarity metric: cosine (default), dot_product, or euclidean.
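
To get a feel for how the three metrics differ, the sketch below computes the textbook definitions of each one in plain Python. It does not call the database, and it makes no claim about how Astra DB scales or reports these values internally; the function names are illustrative only.

```python
import math


def dot_product(a, b):
    """Sum of the element-wise products; grows with both alignment and length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the normalized vectors; depends on direction only."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean_distance(a, b):
    """Straight-line distance between the two points; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 1.0]  # perpendicular to a

print(cosine_similarity(a, b))   # 1.0: identical direction, length ignored
print(dot_product(a, b))         # 2.0: length matters
print(euclidean_distance(a, b))  # 1.0: the points are one unit apart
print(cosine_similarity(a, c))   # 0.0: perpendicular vectors
```

Vectors a and b point the same way, so cosine treats them as identical, while dot product and Euclidean distance both register the difference in length. That difference is the main thing to weigh when you pick a metric for your own embeddings.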

  • Python

  • TypeScript

  • Java

quickstart.py
import os

from astrapy.db import AstraDB

# Initialize the client. The namespace parameter is optional if you use
# "default_keyspace".
db = AstraDB(
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    namespace="default_keyspace",
)
print(db)

# ⬇️ NEW CODE

# Create a collection. The default similarity metric is "cosine".
collection = db.create_collection("vector_test", dimension=5, metric="cosine")
print(collection)

# ⬆️ NEW CODE

# Delete the collection
res = db.delete_collection(collection_name="vector_test")
print(res)

quickstart.ts
import { AstraDB } from "@datastax/astra-db-ts";

async function main() {

  const { ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT } = process.env;

  // Initialize the client. The keyspace parameter is optional if you use
  // "default_keyspace".
  const db = new AstraDB(
      ASTRA_DB_APPLICATION_TOKEN,
      ASTRA_DB_API_ENDPOINT,
      "default_keyspace"
  );

  // ⬇️ NEW CODE

  // Create a collection. The default similarity metric is "cosine".
  await db.createCollection(
    "vector_test",
    {
      "vector": {
        "dimension": 5,
        "metric": "cosine"
      }
    }
  );
  const collection = await db.collection("vector_test");
  console.log(collection);

  // ⬆️ NEW CODE

  // Delete the collection
  const response = await db.dropCollection("vector_test");
  console.log(response);
}

main().catch(console.error);

src/main/java/com/example/Quickstart.java
package com.example;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.JsonDocument;
import io.stargate.sdk.data.domain.JsonDocumentResult;
import io.stargate.sdk.data.domain.SimilarityMetric;
import io.stargate.sdk.data.domain.CollectionDefinition;
import java.util.List;
import java.util.stream.Stream;

public class Quickstart {

  public static void main(String[] args) {
    // Loading Arguments
    String astraToken = System.getenv("ASTRA_DB_APPLICATION_TOKEN");
    String astraApiEndpoint = System.getenv("ASTRA_DB_API_ENDPOINT");

    // Initialize the client. The keyspace parameter is optional if you use
    // "default_keyspace".
    AstraDB db = new AstraDB(astraToken, astraApiEndpoint, "default_keyspace");
    System.out.println("Connected to AstraDB");

    // ⬇️ NEW CODE

    // Create a collection. The default similarity metric is cosine.
    CollectionDefinition colDefinition = CollectionDefinition.builder()
        .name("vector_test")
        .vector(5, SimilarityMetric.cosine)
        .build();
    db.createCollection(colDefinition);
    AstraDBCollection collection = db.collection("vector_test");
    System.out.println("Created a collection");

    // ⬆️ NEW CODE

    // Delete the collection
    db.deleteCollection("vector_test");
    System.out.println("Deleted the collection");
  }
}

Load vector embeddings

Insert a few documents with embeddings into the collection.

  • Python

  • TypeScript

  • Java

quickstart.py
import os

from astrapy.db import AstraDB

# Initialize the client. The namespace parameter is optional if you use
# "default_keyspace".
db = AstraDB(
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    namespace="default_keyspace",
)
print(db)

# Create a collection. The default similarity metric is "cosine".
collection = db.create_collection("vector_test", dimension=5, metric="cosine")
print(collection)

# ⬇️ NEW CODE

# Insert documents into the collection
documents = [
    {
        "_id": "1",
        "text": "ChatGPT integrated sneakers that talk to you",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
        "_id": "2",
        "text": "An AI quilt to help you sleep forever",
        "$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
        "_id": "3",
        "text": "A deep learning display that controls your mood",
        "$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
    },
]
res = collection.insert_many(documents)
print(res)

# ⬆️ NEW CODE

# Delete the collection
res = db.delete_collection(collection_name="vector_test")
print(res)

quickstart.ts
import { AstraDB } from "@datastax/astra-db-ts";

async function main() {

  const { ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT } = process.env;

  // Initialize the client. The keyspace parameter is optional if you use
  // "default_keyspace".
  const db = new AstraDB(
      ASTRA_DB_APPLICATION_TOKEN,
      ASTRA_DB_API_ENDPOINT,
      "default_keyspace"
  );

  // Create a collection. The default similarity metric is "cosine".
  await db.createCollection(
    "vector_test",
    {
      "vector": {
        "dimension": 5,
        "metric": "cosine"
      }
    }
  );
  const collection = await db.collection("vector_test");
  console.log(collection);

  // ⬇️ NEW CODE

  // Insert documents into the collection
  const documents = [
      {
          "_id": "1",
          "text": "ChatGPT integrated sneakers that talk to you",
          "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
      },
      {
          "_id": "2",
          "text": "An AI quilt to help you sleep forever",
          "$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
      },
      {
          "_id": "3",
          "text": "A deep learning display that controls your mood",
          "$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
      }
  ];
  const results = await collection.insertMany(documents);
  console.log(results);

  // ⬆️ NEW CODE

  // Delete the collection
  const response = await db.dropCollection("vector_test");
  console.log(response);
}

main().catch(console.error);

src/main/java/com/example/Quickstart.java
package com.example;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.JsonDocument;
import io.stargate.sdk.data.domain.JsonDocumentResult;
import io.stargate.sdk.data.domain.SimilarityMetric;
import io.stargate.sdk.data.domain.CollectionDefinition;
import java.util.List;
import java.util.stream.Stream;

public class Quickstart {

  public static void main(String[] args) {
    // Loading Arguments
    String astraToken = System.getenv("ASTRA_DB_APPLICATION_TOKEN");
    String astraApiEndpoint = System.getenv("ASTRA_DB_API_ENDPOINT");

    // Initialize the client. The keyspace parameter is optional if you use
    // "default_keyspace".
    AstraDB db = new AstraDB(astraToken, astraApiEndpoint, "default_keyspace");
    System.out.println("Connected to AstraDB");

    // Create a collection. The default similarity metric is cosine.
    CollectionDefinition colDefinition = CollectionDefinition.builder()
        .name("vector_test")
        .vector(5, SimilarityMetric.cosine)
        .build();
    db.createCollection(colDefinition);
    AstraDBCollection collection = db.collection("vector_test");
    System.out.println("Created a collection");

    // ⬇️ NEW CODE

    // Insert documents into the collection
    collection.insertMany(List.of(
        new JsonDocument()
            .id("1")
            .put("text", "ChatGPT integrated sneakers that talk to you")
            .vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
        new JsonDocument()
            .id("2")
            .put("text", "An AI quilt to help you sleep forever")
            .vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
        new JsonDocument()
            .id("3")
            .put("text", "A deep learning display that controls your mood")
            .vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f})
        ));
    System.out.println("Inserted documents into the collection");

    // ⬆️ NEW CODE

    // Delete the collection
    db.deleteCollection("vector_test");
    System.out.println("Deleted the collection");
  }
}

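After you run this code, use the Data Explorer to inspect your loaded data. Because the script deletes the collection as its last step, comment out the delete step first if you want the data to still be there when you look.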
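
The collection in this quickstart is created with a dimension of 5, and every sample document carries a five-value $vector. When you substitute your own data, a quick local check of vector lengths can catch mismatches before you send anything. This is a minimal sketch in plain Python with no client calls; find_dimension_mismatches is a hypothetical helper, and how the database itself responds to a mismatched or missing vector is not covered here.

```python
def find_dimension_mismatches(documents, dimension):
    """Return the _id of every document that lacks a "$vector" of the expected length."""
    mismatched_ids = []
    for doc in documents:
        vector = doc.get("$vector")
        if vector is None or len(vector) != dimension:
            mismatched_ids.append(doc.get("_id"))
    return mismatched_ids


documents = [
    {"_id": "1", "$vector": [0.1, 0.15, 0.3, 0.12, 0.05]},
    {"_id": "2", "$vector": [0.45, 0.09, 0.01, 0.2]},  # only four values
    {"_id": "3"},                                       # no vector at all
]

print(find_dimension_mismatches(documents, dimension=5))  # ['2', '3']
```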

Perform a similarity search

Find documents that are close to a specific vector embedding.

  • Python

  • TypeScript

  • Java

quickstart.py
import os

from astrapy.db import AstraDB

# Initialize the client. The namespace parameter is optional if you use
# "default_keyspace".
db = AstraDB(
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    namespace="default_keyspace",
)
print(db)

# Create a collection. The default similarity metric is "cosine".
collection = db.create_collection("vector_test", dimension=5, metric="cosine")
print(collection)

# Insert documents into the collection
documents = [
    {
        "_id": "1",
        "text": "ChatGPT integrated sneakers that talk to you",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
        "_id": "2",
        "text": "An AI quilt to help you sleep forever",
        "$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
        "_id": "3",
        "text": "A deep learning display that controls your mood",
        "$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
    },
]
res = collection.insert_many(documents)
print(res)

# ⬇️ NEW CODE

# Perform a similarity search
query = [0.15, 0.1, 0.1, 0.35, 0.55]
results = collection.vector_find(query, limit=2, fields=["text", "$vector"])

for document in results:
    print(document)

# ⬆️ NEW CODE

# Delete the collection
res = db.delete_collection(collection_name="vector_test")
print(res)

quickstart.ts
import { AstraDB } from "@datastax/astra-db-ts";

async function main() {

  const { ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT } = process.env;

  // Initialize the client. The keyspace parameter is optional if you use
  // "default_keyspace".
  const db = new AstraDB(
      ASTRA_DB_APPLICATION_TOKEN,
      ASTRA_DB_API_ENDPOINT,
      "default_keyspace"
  );

  // Create a collection. The default similarity metric is "cosine".
  await db.createCollection(
    "vector_test",
    {
      "vector": {
        "dimension": 5,
        "metric": "cosine"
      }
    }
  );
  const collection = await db.collection("vector_test");
  console.log(collection);

  // Insert documents into the collection
  const documents = [
      {
          "_id": "1",
          "text": "ChatGPT integrated sneakers that talk to you",
          "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
      },
      {
          "_id": "2",
          "text": "An AI quilt to help you sleep forever",
          "$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
      },
      {
          "_id": "3",
          "text": "A deep learning display that controls your mood",
          "$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
      }
  ];
  const results = await collection.insertMany(documents);
  console.log(results);

  // ⬇️ NEW CODE

  // Define the search options
  const options = {
      sort: {
          "$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
      },
      limit: 5
  };

  // Perform a similarity search
  const docs = await collection.find({}, options).toArray();
  docs.forEach(doc => console.log(doc));

  // ⬆️ NEW CODE

  // Delete the collection
  const response = await db.dropCollection("vector_test");
  console.log(response);
}

main().catch(console.error);

src/main/java/com/example/Quickstart.java
package com.example;

import com.dtsx.astra.sdk.AstraDB;
import com.dtsx.astra.sdk.AstraDBCollection;
import io.stargate.sdk.data.domain.JsonDocument;
import io.stargate.sdk.data.domain.JsonDocumentResult;
import io.stargate.sdk.data.domain.SimilarityMetric;
import io.stargate.sdk.data.domain.CollectionDefinition;
import java.util.List;
import java.util.stream.Stream;

public class Quickstart {

  public static void main(String[] args) {
    // Loading Arguments
    String astraToken = System.getenv("ASTRA_DB_APPLICATION_TOKEN");
    String astraApiEndpoint = System.getenv("ASTRA_DB_API_ENDPOINT");

    // Initialize the client. The keyspace parameter is optional if you use
    // "default_keyspace".
    AstraDB db = new AstraDB(astraToken, astraApiEndpoint, "default_keyspace");
    System.out.println("Connected to AstraDB");

    // Create a collection. The default similarity metric is cosine.
    CollectionDefinition colDefinition = CollectionDefinition.builder()
        .name("vector_test")
        .vector(5, SimilarityMetric.cosine)
        .build();
    db.createCollection(colDefinition);
    AstraDBCollection collection = db.collection("vector_test");
    System.out.println("Created a collection");

    // Insert documents into the collection
    collection.insertMany(List.of(
        new JsonDocument()
            .id("1")
            .put("text", "ChatGPT integrated sneakers that talk to you")
            .vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
        new JsonDocument()
            .id("2")
            .put("text", "An AI quilt to help you sleep forever")
            .vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
        new JsonDocument()
            .id("3")
            .put("text", "A deep learning display that controls your mood")
            .vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f})
        ));
    System.out.println("Inserted documents into the collection");

    // ⬇️ NEW CODE

    // Perform a similarity search
    Stream<JsonDocumentResult> resultsSet = collection.findVector(
        new float[]{0.15f, 0.1f, 0.1f, 0.35f, 0.55f},
        10
    );
    resultsSet.forEach(System.out::println);

    // ⬆️ NEW CODE

    // Delete the collection
    db.deleteCollection("vector_test");
    System.out.println("Deleted the collection");
  }
}

The search returns the documents you inserted, sorted by their similarity to the query vector, with the most similar documents first. Because the Python example sets limit=2, it returns only the two closest documents. The calculation uses cosine similarity by default.
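
To see why the results come back in that order, you can reproduce the ranking locally. The sketch below applies the textbook cosine similarity formula to the three sample vectors and the query vector from this quickstart. It runs without a database connection, and the similarity values the database reports may be scaled differently from the raw cosine values computed here.

```python
import math

# Sample vectors and query vector copied from the quickstart code.
documents = {
    "1": [0.1, 0.15, 0.3, 0.12, 0.05],
    "2": [0.45, 0.09, 0.01, 0.2, 0.11],
    "3": [0.1, 0.05, 0.08, 0.3, 0.6],
}
query = [0.15, 0.1, 0.1, 0.35, 0.55]


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Sort document ids from most similar to least similar.
ranked = sorted(
    documents,
    key=lambda doc_id: cosine_similarity(query, documents[doc_id]),
    reverse=True,
)
print(ranked)  # ['3', '2', '1']
```

Document 3 ranks first because its vector points in nearly the same direction as the query (cosine similarity of about 0.99), followed by document 2 and then document 1.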

Next steps

Learn more with one of the following tutorials:

  • Load your data

    Learn how to load your own data. Use the Astra Portal or any of our clients.

© 2024 DataStax | Privacy policy | Terms of use
