Vector database quickstart with Data API

Beginner
15 min

Objective

Learn how to create a DSE namespace, connect to your namespace, load a set of vector embeddings using vectorize, and perform a similarity search to find vectors that are close to the one in your query.

Install DSE

Install DSE if you haven’t already, along with the Data API. For exploration, use the Docker installation with Data API.

Install a terminal to run your client

The clients can be tested by running them in a terminal. You’ll want Xterm, Terminal, or another terminal emulator.

Identify your credentials

  1. Identify the credentials, or token, for your database. For initial exploration, you can use the default superuser credentials set in the database: username cassandra and password cassandra. The client passes these values to a token provider, which generates the TOKEN used to authenticate the client.

    1. Before going much further, you should create a new user with a secure password. This new user, once created, will be used to generate a token for authentication.

  2. You need the API endpoint that your namespaces will connect to. The API endpoint is the URL of the database you installed. The port number is 8181 by default. For example, if you installed the database using a Docker container, the API endpoint is http://localhost:8181. This value will be your DB_API_ENDPOINT.

  3. You may either assign your username/password and API endpoint to environment variables in your terminal, or modify the client code to include them directly, as shown in the examples below. Another value that you will want to set is the OPENAI_API_KEY. This is the API key that you received when you signed up for the OpenAI API. This key is used to authenticate your requests to the OpenAI API, and the clients use it to vectorize the text that you provide.

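    • Linux or macOS

    export DB_USERNAME=USERNAME # Your database username
    export DB_PASSWORD=PASSWORD # Your database password
    export DB_API_ENDPOINT=DB_API_ENDPOINT # Your database API endpoint
    export OPENAI_API_KEY=API_KEY # Your OpenAI API key

    • Windows

    set DB_USERNAME=USERNAME
    set DB_PASSWORD=PASSWORD
    set DB_API_ENDPOINT=DB_API_ENDPOINT
    set OPENAI_API_KEY=API_KEY

    • Google Colab

    import os
    os.environ["DB_USERNAME"] = "USERNAME" # Your database username
    os.environ["DB_PASSWORD"] = "PASSWORD" # Your database password
    os.environ["DB_API_ENDPOINT"] = "DB_API_ENDPOINT" # Your database API endpoint
    os.environ["OPENAI_API_KEY"] = "API_KEY" # Your OpenAI API key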
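A client script can read these settings once at startup and fall back to the quickstart defaults when a variable is missing. The following is a minimal sketch using only the Python standard library; the helper name load_settings is hypothetical and is not part of any client library.

```python
import os
from typing import Mapping, Optional

# Quickstart defaults taken from this page; override them with environment variables.
DEFAULTS = {
    "DB_USERNAME": "cassandra",
    "DB_PASSWORD": "cassandra",
    "DB_API_ENDPOINT": "http://localhost:8181",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect connection settings (hypothetical helper, not a library call)."""
    env = os.environ if env is None else env
    # Use the exported value when present, otherwise the quickstart default.
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # The OpenAI key has no sensible default, so fail early if it is missing.
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running the client.")
    settings["OPENAI_API_KEY"] = api_key
    return settings
```

Calling load_settings() with no arguments reads from the current process environment. Passing a plain dictionary instead makes the helper easy to exercise without touching real credentials.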

Install the client

Install the library for the language and package manager you’re using.

  • Python

  • TypeScript

  • Java

To install the Python client with pip:

  1. Verify that pip is version 23.0 or higher.

    pip --version
  2. Upgrade pip if needed.

    python -m pip install --upgrade pip
  3. Install the astrapy package. You must have Python 3.8 or higher.

    pip install astrapy
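If you want to script the pip version check, for example in a setup script, the comparison can be done on the text that pip --version prints. This is a small sketch, assuming the usual output shape "pip X.Y.Z from ..."; the function names are hypothetical.

```python
import re
from typing import Tuple


def parse_pip_version(output: str) -> Tuple[int, ...]:
    """Extract the numeric version from the text printed by `pip --version`."""
    match = re.match(r"pip (\d+(?:\.\d+)*)", output.strip())
    if match is None:
        raise ValueError(f"Unrecognized pip version output: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def pip_is_recent_enough(output: str, minimum: Tuple[int, ...] = (23, 0)) -> bool:
    """Return True when the reported pip version is at least `minimum`."""
    version = parse_pip_version(output)
    # Pad short versions with zeros so that (23,) compares equal to (23, 0).
    padded = version + (0,) * max(0, len(minimum) - len(version))
    return padded >= minimum
```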

To install the TypeScript client:

  1. Verify that Node is version 14 or higher.

    node --version
  2. Use npm or Yarn to install the TypeScript client.

    • npm

    • Yarn

    To install the TypeScript client with npm:

    npm install @datastax/astra-db-ts

    To install the TypeScript client with Yarn:

    1. Verify that Yarn is version 2.0 or higher.

      yarn --version
    2. Install the astra-db-ts package.

      yarn add @datastax/astra-db-ts

Use Maven or Gradle to install the Java client.

  • Maven

  • Gradle

To install the Java client with Maven:

  1. Install Java 11+ and Maven 3.9+.

  2. Create a pom.xml file in the root of your project.

    pom.xml
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                                 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
    
      <groupId>com.example</groupId>
      <artifactId>test-java-client</artifactId>
      <version>1.0-SNAPSHOT</version>
    
      <!-- The Java client -->
      <dependencies>
        <dependency>
          <groupId>com.datastax.astra</groupId>
          <artifactId>astra-db-java</artifactId>
          <version>1.0.0</version>
        </dependency>
      </dependencies>
    
      <build>
        <plugins>
          <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>3.0.0</version>
            <configuration>
              <executable>java</executable>
              <mainClass>com.example.Quickstart</mainClass>
            </configuration>
            <executions>
              <execution>
                <goals>
                  <goal>java</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
              <source>11</source>
              <target>11</target>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </project>

To install the Java client with Gradle:

  1. Install Java 11+ and Gradle.

  2. Create a build.gradle file in the root of your project.

    build.gradle
    plugins {
        id 'java'
        id 'application'
    }
    
    repositories {
        mavenCentral()
    }
    
    dependencies {
        implementation 'com.datastax.astra:astra-db-java:1.0.0'
    }
    
    application {
        mainClassName = 'com.example.Quickstart'
    }

Initialize the client

Paste the following code into a new file on your computer. If you created the environment variables, you don’t need to include the variables in the code.

  • Python

  • TypeScript

  • Java

QuickStartDSE69.py
import os

from astrapy import DataAPIClient
from astrapy.constants import Environment
from astrapy.authentication import UsernamePasswordTokenProvider
from astrapy.constants import VectorMetric
from astrapy.ids import UUID
from astrapy.exceptions import InsertManyException
from astrapy.info import CollectionVectorServiceOptions

# Database settings
DB_USERNAME = "cassandra"
DB_PASSWORD = "cassandra"
DB_API_ENDPOINT = "http://localhost:8181"
DB_NAMESPACE = "my_namespace"
DB_COLLECTION = "vector_test"

# Database settings if you exported them as environment variables
# DB_USERNAME = os.environ.get("DB_USERNAME")
# DB_PASSWORD = os.environ.get("DB_PASSWORD")
# DB_API_ENDPOINT = os.environ.get("DB_API_ENDPOINT")

# Embedding provider settings
EMBEDDING_PROVIDER = "openai"
EMBEDDING_MODEL_NAME = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1024
EMBEDDING_API_KEY = os.environ.get("OPENAI_API_KEY")  # Export OPENAI_API_KEY first

# Build a token
tp = UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD)

# Initialize the client and get a "Database" object
client = DataAPIClient(environment=Environment.DSE)
database = client.get_database(DB_API_ENDPOINT, token=tp)
database.get_database_admin().create_namespace(DB_NAMESPACE, update_db_namespace=True)

To avoid a namespace collision, don’t name the file astrapy.py.

QuickStartDSE69.ts
import { DataAPIClient, UsernamePasswordTokenProvider, VectorDoc, UUID } from '@datastax/astra-db-ts';

// Database settings
const DB_USERNAME = "cassandra";
const DB_PASSWORD = "cassandra";
const DB_API_ENDPOINT = "http://localhost:8181";
const DB_ENVIRONMENT = "dse";
const DB_NAMESPACE = "cycling";

// Database settings if you exported them as environment variables
// const DB_USERNAME = process.env.DB_USERNAME;
// const DB_PASSWORD = process.env.DB_PASSWORD;
// const DB_API_ENDPOINT = process.env.DB_API_ENDPOINT;

// OpenAI settings
const OPEN_AI_PROVIDER = "openai";
const OPENAI_API_KEY = process.env.OPENAI_API_KEY
const MODEL_NAME = "text-embedding-3-small";

// Build a token in the required format
const tp = new UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD);

// Initialize the client and get a "Db" object
const client = new DataAPIClient({ environment: DB_ENVIRONMENT });
const db = client.db(DB_API_ENDPOINT, { token: tp });
const dbAdmin = db.admin({ environment: DB_ENVIRONMENT });
src/main/java/com/example/QuickStartDSE69.java
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.admin.DataAPIDatabaseAdmin;
import com.datastax.astra.client.model.CollectionOptions;
import com.datastax.astra.client.model.CommandOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.NamespaceOptions;
import com.datastax.astra.client.model.SimilarityMetric;
import com.datastax.astra.internal.auth.UsernamePasswordTokenProvider;

import java.util.Optional;

import static com.datastax.astra.client.DataAPIClients.DEFAULT_ENDPOINT_LOCAL;
import static com.datastax.astra.client.DataAPIOptions.DataAPIDestination.HCD;
import static com.datastax.astra.client.DataAPIOptions.builder;
import static com.datastax.astra.client.model.Filters.eq;

public class QuickStartDSE69 {

    public static void main(String[] args) {

        // Database Settings
        String cassandraUserName     = "cassandra";
        String cassandraPassword     = "cassandra";
        String dataApiUrl            = DEFAULT_ENDPOINT_LOCAL;  // http://localhost:8181
        String databaseEnvironment   = "DSE"; // DSE, HCD, or ASTRA
        String keyspaceName          = "ks1";
        String collectionName        = "lyrics";

        // Database settings if you export them as environment variables
        // String cassandraUserName            = System.getenv("DB_USERNAME");
        // String cassandraPassword            = System.getenv("DB_PASSWORD");
        // String dataApiUrl                   = System.getenv("DB_API_ENDPOINT");
        
        // OpenAI Embeddings
        String openAiProvider        = "openai";
        String openAiKey             = System.getenv("OPENAI_API_KEY"); // Need to export OPENAI_API_KEY
        String openAiModel           = "text-embedding-3-small";
        int openAiEmbeddingDimension = 1536;

        // Build a token in the form of Cassandra:base64(username):base64(password)
        String token = new UsernamePasswordTokenProvider(cassandraUserName, cassandraPassword).getTokenAsString();
        System.out.println("1/7 - Creating Token: " + token);

        // Initialize the client
        DataAPIClient client = new DataAPIClient(token, builder().withDestination(databaseEnvironment).build());
        System.out.println("2/7 - Connected to Data API");

    }
}
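The Java comment above describes the token as Cassandra:base64(username):base64(password). As an illustration of that format only, here is a standard-library sketch that builds the same string by hand; the function name build_token is hypothetical, and in real code you would use the client's UsernamePasswordTokenProvider instead.

```python
import base64


def build_token(username: str, password: str) -> str:
    """Build a token of the form Cassandra:base64(username):base64(password)."""

    def b64(value: str) -> str:
        # Encode the text as UTF-8 bytes, then as ASCII base64 text.
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return f"Cassandra:{b64(username)}:{b64(password)}"


def decode_token(token: str) -> tuple:
    """Reverse build_token, showing that the token is encoded, not encrypted."""
    prefix, user_b64, password_b64 = token.split(":")
    if prefix != "Cassandra":
        raise ValueError("Token does not start with 'Cassandra'")
    return (
        base64.b64decode(user_b64).decode("utf-8"),
        base64.b64decode(password_b64).decode("utf-8"),
    )
```

Because base64 is reversible, anyone who sees the token can recover the username and password. Treat the token like the credentials themselves, and consider removing any line that prints it once you move past initial exploration.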

Create a namespace

The clients all support creating a namespace.

  • Python

  • TypeScript

  • Java

QuickStartDSE69.py
database.get_database_admin().create_namespace(DB_NAMESPACE)

To avoid a namespace collision, don’t name the file astrapy.py.

QuickStartDSE69.ts
(async () => {
  await dbAdmin.createNamespace(DB_NAMESPACE);
  console.log(await dbAdmin.listNamespaces());
})();
src/main/java/com/example/QuickStartDSE69.java
        // Create a default keyspace
        ((DataAPIDatabaseAdmin) client
                .getDatabase(dataApiUrl)
                .getDatabaseAdmin()).createNamespace(keyspaceName, NamespaceOptions.simpleStrategy(1));
        System.out.println("3/7 - Keyspace '" + keyspaceName + "' created");

        Database db = client.getDatabase(dataApiUrl, keyspaceName);
        System.out.println("4/7 - Connected to Database");

Create a collection

Create a collection in your namespace. Choose dimensions that match your vector data and pick an appropriate similarity metric: cosine (default), dot_product, or euclidean. The embeddings will be generated using the vectorize method, so the collection needs the parameters for using an embedding service.
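To make the three metric names concrete, the sketch below computes each one for a pair of vectors using only the Python standard library. It shows the underlying math only; the database may rescale these values into its own similarity score, so do not expect the numbers to match what a query returns.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between the vectors; ignores their lengths."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    if norm == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / norm


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cosine compares direction only, so it is a reasonable default when vector lengths carry no meaning. Dot product and cosine give the same ranking when all vectors are normalized to unit length.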

  • Python

  • TypeScript

  • Java

QuickStartDSE69.py
import os

from astrapy import DataAPIClient
from astrapy.constants import Environment
from astrapy.authentication import UsernamePasswordTokenProvider
from astrapy.constants import VectorMetric
from astrapy.ids import UUID
from astrapy.exceptions import InsertManyException
from astrapy.info import CollectionVectorServiceOptions

# Database settings
DB_USERNAME = "cassandra"
DB_PASSWORD = "cassandra"
DB_API_ENDPOINT = "http://localhost:8181"
DB_NAMESPACE = "my_namespace"
DB_COLLECTION = "vector_test"

# Database settings if you exported them as environment variables
# DB_USERNAME = os.environ.get("DB_USERNAME")
# DB_PASSWORD = os.environ.get("DB_PASSWORD")
# DB_API_ENDPOINT = os.environ.get("DB_API_ENDPOINT")

# Embedding provider settings
EMBEDDING_PROVIDER = "openai"
EMBEDDING_MODEL_NAME = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1024
EMBEDDING_API_KEY = os.environ.get("OPENAI_API_KEY")  # Export OPENAI_API_KEY first

# Build a token
tp = UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD)

# Initialize the client and get a "Database" object
client = DataAPIClient(environment=Environment.DSE)
database = client.get_database(DB_API_ENDPOINT, token=tp)
database.get_database_admin().create_namespace(DB_NAMESPACE, update_db_namespace=True)

# Create a collection. The default similarity metric is cosine. If you're not
# sure what dimension to set, use whatever dimension vector your embeddings
# model produces.
collection = database.create_collection(
    DB_COLLECTION,
    dimension=EMBEDDING_DIMENSIONS,
    metric=VectorMetric.COSINE,
    service={
        "provider": EMBEDDING_PROVIDER,
        "modelName": EMBEDDING_MODEL_NAME,
    },
    embedding_api_key=EMBEDDING_API_KEY,
    namespace=DB_NAMESPACE,
    check_exists=False,
)
print(f"* Collection: {collection.full_name}\n")
QuickStartDSE69.ts
import { DataAPIClient, UsernamePasswordTokenProvider, VectorDoc, UUID } from '@datastax/astra-db-ts';

// Database settings
const DB_USERNAME = "cassandra";
const DB_PASSWORD = "cassandra";
const DB_API_ENDPOINT = "http://localhost:8181";
const DB_ENVIRONMENT = "dse";
const DB_NAMESPACE = "cycling";

// Database settings if you exported them as environment variables
// const DB_USERNAME = process.env.DB_USERNAME;
// const DB_PASSWORD = process.env.DB_PASSWORD;
// const DB_API_ENDPOINT = process.env.DB_API_ENDPOINT;

// OpenAI settings
const OPEN_AI_PROVIDER = "openai";
const OPENAI_API_KEY = process.env.OPENAI_API_KEY
const MODEL_NAME = "text-embedding-3-small";

// Build a token in the required format
const tp = new UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD);

// Initialize the client and get a "Db" object
const client = new DataAPIClient({ environment: DB_ENVIRONMENT });
const db = client.db(DB_API_ENDPOINT, { token: tp });
const dbAdmin = db.admin({ environment: DB_ENVIRONMENT });

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  text: string,
}

(async function () {
  // Create a typed, vector-enabled collection. The default metric is cosine.
  // If you're not sure what dimension to set, use whatever dimension vector
  // your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    namespace: DB_NAMESPACE,
    vector: {
      service: {
        provider: OPEN_AI_PROVIDER,
        modelName: MODEL_NAME
      },
      dimension: 5,
      metric: 'cosine',
    },
    embeddingApiKey: OPENAI_API_KEY,
    checkExists: false
  });
  console.log(`* Created collection ${collection.namespace}.${collection.collectionName}`);

})();
src/main/java/com/example/QuickStartDSE69.java
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.admin.DataAPIDatabaseAdmin;
import com.datastax.astra.client.model.CollectionOptions;
import com.datastax.astra.client.model.CommandOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.NamespaceOptions;
import com.datastax.astra.client.model.SimilarityMetric;
import com.datastax.astra.internal.auth.UsernamePasswordTokenProvider;

import java.util.Optional;

import static com.datastax.astra.client.DataAPIClients.DEFAULT_ENDPOINT_LOCAL;
import static com.datastax.astra.client.DataAPIOptions.DataAPIDestination.HCD;
import static com.datastax.astra.client.DataAPIOptions.builder;
import static com.datastax.astra.client.model.Filters.eq;

public class QuickStartDSE69 {

    public static void main(String[] args) {

        // Database Settings
        String cassandraUserName     = "cassandra";
        String cassandraPassword     = "cassandra";
        String dataApiUrl            = DEFAULT_ENDPOINT_LOCAL;  // http://localhost:8181
        String databaseEnvironment   = "DSE"; // DSE, HCD, or ASTRA
        String keyspaceName          = "ks1";
        String collectionName        = "lyrics";

        // Database settings if you export them as environment variables
        // String cassandraUserName            = System.getenv("DB_USERNAME");
        // String cassandraPassword            = System.getenv("DB_PASSWORD");
        // String dataApiUrl                   = System.getenv("DB_API_ENDPOINT");
        
        // OpenAI Embeddings
        String openAiProvider        = "openai";
        String openAiKey             = System.getenv("OPENAI_API_KEY"); // Need to export OPENAI_API_KEY
        String openAiModel           = "text-embedding-3-small";
        int openAiEmbeddingDimension = 1536;

        // Build a token in the form of Cassandra:base64(username):base64(password)
        String token = new UsernamePasswordTokenProvider(cassandraUserName, cassandraPassword).getTokenAsString();
        System.out.println("1/7 - Creating Token: " + token);

        // Initialize the client
        DataAPIClient client = new DataAPIClient(token, builder().withDestination(databaseEnvironment).build());
        System.out.println("2/7 - Connected to Data API");

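        // Create a default keyspace
        ((DataAPIDatabaseAdmin) client
                .getDatabase(dataApiUrl)
                .getDatabaseAdmin()).createNamespace(keyspaceName, NamespaceOptions.simpleStrategy(1));
        System.out.println("3/7 - Keyspace '" + keyspaceName + "' created");

        Database db = client.getDatabase(dataApiUrl, keyspaceName);
        System.out.println("4/7 - Connected to Database");

        // Create a collection
        Collection<Document> collectionLyrics = db.createCollection(collectionName, CollectionOptions.builder()
                .vectorDimension(5)
                .vectorSimilarity(SimilarityMetric.COSINE)
                .build());
        System.out.println("5/7 - Collection created");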

    }
}

Load vector embeddings

Insert a few documents into the collection. Two methods are available for inserting data: $vectorize and $vector. The $vectorize method generates embeddings using a specified embedding service. The $vector method is used when you already have embeddings.
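Before inserting, it can help to check on the client side which of the two fields a document carries and whether a supplied vector has the dimension the collection expects. The sketch below is a hypothetical sanity check written with the standard library only; the function name embedding_mode is an assumption and the check is not performed by the client libraries in this form.

```python
from typing import Any, Mapping, Optional


def embedding_mode(doc: Mapping[str, Any], dimension: int) -> Optional[str]:
    """Return "$vectorize", "$vector", or None, raising on malformed values."""
    if "$vectorize" in doc:
        # $vectorize carries the text that the embedding service turns into a vector.
        if not isinstance(doc["$vectorize"], str):
            raise TypeError("$vectorize must be a string of text to embed")
        return "$vectorize"
    if "$vector" in doc:
        # $vector carries a precomputed embedding as a list of numbers.
        vector = doc["$vector"]
        if not isinstance(vector, (list, tuple)) or not all(
            isinstance(x, (int, float)) for x in vector
        ):
            raise TypeError("$vector must be a list of numbers")
        if len(vector) != dimension:
            raise ValueError(
                f"$vector has {len(vector)} values, but the collection expects {dimension}"
            )
        return "$vector"
    return None
```

Running a check like this over a batch before calling insert_many surfaces a dimension mismatch or a string placed in $vector before any request is sent.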

Use the $vectorize method

  • Python

  • TypeScript

  • Java

QuickStartDSE69.py
import os

from astrapy import DataAPIClient
from astrapy.constants import Environment
from astrapy.authentication import UsernamePasswordTokenProvider
from astrapy.constants import VectorMetric
from astrapy.ids import UUID
from astrapy.exceptions import InsertManyException
from astrapy.info import CollectionVectorServiceOptions

# Database settings
DB_USERNAME = "cassandra"
DB_PASSWORD = "cassandra"
DB_API_ENDPOINT = "http://localhost:8181"
DB_NAMESPACE = "my_namespace"
DB_COLLECTION = "vector_test"

# Database settings if you exported them as environment variables
# DB_USERNAME = os.environ.get("DB_USERNAME")
# DB_PASSWORD = os.environ.get("DB_PASSWORD")
# DB_API_ENDPOINT = os.environ.get("DB_API_ENDPOINT")

# Embedding provider settings
EMBEDDING_PROVIDER = "openai"
EMBEDDING_MODEL_NAME = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1024
EMBEDDING_API_KEY = os.environ.get("OPENAI_API_KEY")  # Export OPENAI_API_KEY first

# Build a token
tp = UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD)

# Initialize the client and get a "Database" object
client = DataAPIClient(environment=Environment.DSE)
database = client.get_database(DB_API_ENDPOINT, token=tp)
database.get_database_admin().create_namespace(DB_NAMESPACE, update_db_namespace=True)

# Create a collection. The default similarity metric is cosine. If you're not
# sure what dimension to set, use whatever dimension vector your embeddings
# model produces.
collection = database.create_collection(
    DB_COLLECTION,
    dimension=EMBEDDING_DIMENSIONS,
    metric=VectorMetric.COSINE,
    service={
        "provider": EMBEDDING_PROVIDER,
        "modelName": EMBEDDING_MODEL_NAME,
    },
    embedding_api_key=EMBEDDING_API_KEY,
    namespace=DB_NAMESPACE,
    check_exists=False,
)
print(f"* Collection: {collection.full_name}\n")

# Insert documents into the collection.
# (UUIDs here are version 7.)
documents = [
    {
        "_id": UUID("018e65c9-df45-7913-89f8-175f28bd7f74"),
        "text": "Chat bot integrated sneakers that talk to you",
         "$vectorize": "Wild! How can they do that?"
    },
    {
        "_id": UUID("018e65c9-e1b7-7048-a593-db452be1e4c2"),
        "text": "An AI quilt to help you sleep forever",
         "$vectorize": "Sleep like a baby soft and cuddly"
    },
    {
        "_id": UUID("018e65c9-e33d-749b-9386-e848739582f0"),
        "text": "A deep learning display that controls your mood",
         "$vectorize": "I do not want my mood controlled!"
    },
]
try:
    insertion_result = collection.insert_many(documents)
    print(f"* Inserted {len(insertion_result.inserted_ids)} items.\n")
except InsertManyException:
    print("* Documents found on DB already. Let's move on.\n")
QuickStartDSE69.ts
import { DataAPIClient, UsernamePasswordTokenProvider, VectorDoc, UUID } from '@datastax/astra-db-ts';

// Database settings
const DB_USERNAME = "cassandra";
const DB_PASSWORD = "cassandra";
const DB_API_ENDPOINT = "http://localhost:8181";
const DB_ENVIRONMENT = "dse";
const DB_NAMESPACE = "cycling";

// Database settings if you exported them as environment variables
// const DB_USERNAME = process.env.DB_USERNAME;
// const DB_PASSWORD = process.env.DB_PASSWORD;
// const DB_API_ENDPOINT = process.env.DB_API_ENDPOINT;

// OpenAI settings
const OPEN_AI_PROVIDER = "openai";
const OPENAI_API_KEY = process.env.OPENAI_API_KEY
const MODEL_NAME = "text-embedding-3-small";

// Build a token in the required format
const tp = new UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD);

// Initialize the client and get a "Db" object
const client = new DataAPIClient({ environment: DB_ENVIRONMENT });
const db = client.db(DB_API_ENDPOINT, { token: tp });
const dbAdmin = db.admin({ environment: DB_ENVIRONMENT });

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  text: string,
}

(async function () {
  // Create a typed, vector-enabled collection. The default metric is cosine.
  // If you're not sure what dimension to set, use whatever dimension vector
  // your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    namespace: DB_NAMESPACE,
    vector: {
      service: {
        provider: OPEN_AI_PROVIDER,
        modelName: MODEL_NAME
      },
      dimension: 5,
      metric: 'cosine',
    },
    embeddingApiKey: OPENAI_API_KEY,
    checkExists: false
  });
  console.log(`* Created collection ${collection.namespace}.${collection.collectionName}`);

  // Insert documents into the collection (using UUIDv7s)
  const documents = [
    {
      _id: new UUID('018e65c9-df45-7913-89f8-175f28bd7f74'),
      text: 'ChatGPT integrated sneakers that talk to you',
      $vectorize: 'Wild! How can they do that?',
    },
    {
      _id: new UUID('018e65c9-e1b7-7048-a593-db452be1e4c2'),
      text: 'An AI quilt to help you sleep forever',
      $vectorize: 'Sleep like a baby soft and cuddly',
    },
    {
      _id: new UUID('018e65c9-e33d-749b-9386-e848739582f0'),
      text: 'A deep learning display that controls your mood',
      $vectorize: 'I do not want my mood controlled!',
    },
  ];

  try {
    const inserted = await collection.insertMany(documents);
    console.log(`* Inserted ${inserted.insertedCount} items.`);
  } catch (e) {
    console.log('* Documents found on DB already. Let\'s move on!');
  }

})();
src/main/java/com/example/QuickStartDSE69.java
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.admin.DataAPIDatabaseAdmin;
import com.datastax.astra.client.model.CollectionOptions;
import com.datastax.astra.client.model.CommandOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.NamespaceOptions;
import com.datastax.astra.client.model.SimilarityMetric;
import com.datastax.astra.internal.auth.UsernamePasswordTokenProvider;

import java.util.Optional;

import static com.datastax.astra.client.DataAPIClients.DEFAULT_ENDPOINT_LOCAL;
import static com.datastax.astra.client.DataAPIOptions.DataAPIDestination.HCD;
import static com.datastax.astra.client.DataAPIOptions.builder;
import static com.datastax.astra.client.model.Filters.eq;

public class QuickStartDSE69 {

    public static void main(String[] args) {

        // Database Settings
        String cassandraUserName     = "cassandra";
        String cassandraPassword     = "cassandra";
        String dataApiUrl            = DEFAULT_ENDPOINT_LOCAL;  // http://localhost:8181
        String databaseEnvironment   = "DSE"; // DSE, HCD, or ASTRA
        String keyspaceName          = "ks1";
        String collectionName        = "lyrics";

        // Database settings if you export them as environment variables
        // String cassandraUserName            = System.getenv("DB_USERNAME");
        // String cassandraPassword            = System.getenv("DB_PASSWORD");
        // String dataApiUrl                   = System.getenv("DB_API_ENDPOINT");
        
        // OpenAI Embeddings
        String openAiProvider        = "openai";
        String openAiKey             = System.getenv("OPENAI_API_KEY"); // Need to export OPENAI_API_KEY
        String openAiModel           = "text-embedding-3-small";
        int openAiEmbeddingDimension = 1536;

        // Build a token in the form of Cassandra:base64(username):base64(password)
        String token = new UsernamePasswordTokenProvider(cassandraUserName, cassandraPassword).getTokenAsString();
        System.out.println("1/7 - Creating Token: " + token);

        // Initialize the client
        DataAPIClient client = new DataAPIClient(token, builder().withDestination(databaseEnvironment).build());
        System.out.println("2/7 - Connected to Data API");

        // Create a default keyspace
        ((DataAPIDatabaseAdmin) client
                .getDatabase(dataApiUrl)
                .getDatabaseAdmin()).createNamespace(keyspaceName, NamespaceOptions.simpleStrategy(1));
        System.out.println("3/7 - Keyspace '" + keyspaceName + "' created");

        Database db = client.getDatabase(dataApiUrl, keyspaceName);
        System.out.println("4/7 - Connected to Database");

        // Create a collection
        Collection<Document> collectionLyrics = db.createCollection(collectionName, CollectionOptions.builder()
                .vectorDimension(5)
                .vectorSimilarity(SimilarityMetric.COSINE)
                .build());
        System.out.println("5/7 - Collection created");

        // Insert some documents
        collectionLyrics.insertMany(
                new Document(1).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("A lovestruck Romeo sings the streets a serenade"),
                new Document(2).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("Says something like, You and me babe, how about it?"),
                new Document(4).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("Juliet says,Hey, it's Romeo, you nearly gimme a heart attack"),
                new Document(5).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("He's underneath the window"),
                new Document(6).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("She's singing, Hey la, my boyfriend's back"),
                new Document(7).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("You shouldn't come around here singing up at people like that"),
                new Document(8).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("Anyway, what you gonna do about it?"));
        System.out.println("6/7 - Collection populated");

    }
}

Use the $vector method instead of $vectorize

The $vector method can be used if you already have embeddings.
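To see what a similarity search does with such embeddings, the sketch below ranks three five-dimensional sample vectors against a query vector by cosine similarity, entirely in memory. The query vector and the function name find_nearest are assumptions made for illustration. The database uses a vector index and its own scoring rather than this brute-force loop.

```python
import math
from typing import List, Sequence, Tuple

# Sample documents with precomputed five-dimensional embeddings.
DOCUMENTS = [
    {"text": "Chat bot integrated sneakers that talk to you",
     "$vector": [0.25, 0.25, 0.25, 0.25, 0.45]},
    {"text": "An AI quilt to help you sleep forever",
     "$vector": [0.10, 0.15, 0.25, 0.25, 0.15]},
    {"text": "A deep learning display that controls your mood",
     "$vector": [0.55, 0.25, 0.25, 0.28, 0.25]},
]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def find_nearest(query: Sequence[float], documents: list, limit: int = 2) -> List[Tuple[float, str]]:
    """Score every document against the query and return the best `limit` matches."""
    scored = [(cosine_similarity(query, doc["$vector"]), doc["text"]) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Hypothetical query embedding with the same dimension as the documents.
QUERY = [0.15, 0.1, 0.1, 0.35, 0.55]
for score, text in find_nearest(QUERY, DOCUMENTS):
    print(f"{score:.4f}  {text}")
```

The query vector must have the same dimension as the stored vectors, which is why the collection's dimension setting has to match the embeddings you insert.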

  • Python

  • TypeScript

  • Java

QuickStartDSE69.py
import os

from astrapy import DataAPIClient
from astrapy.constants import Environment
from astrapy.authentication import UsernamePasswordTokenProvider
from astrapy.constants import VectorMetric
from astrapy.ids import UUID
from astrapy.exceptions import InsertManyException
from astrapy.info import CollectionVectorServiceOptions

# Database settings
DB_USERNAME = "cassandra"
DB_PASSWORD = "cassandra"
DB_API_ENDPOINT = "http://localhost:8181"
DB_NAMESPACE = "my_namespace"
DB_COLLECTION = "vector_test"

# Database settings if you exported them as environment variables
# DB_USERNAME = os.environ.get("DB_USERNAME")
# DB_PASSWORD = os.environ.get("DB_PASSWORD")
# DB_API_ENDPOINT = os.environ.get("DB_API_ENDPOINT")

# Embedding provider settings
EMBEDDING_PROVIDER = "openai"
EMBEDDING_MODEL_NAME = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 5  # Must match the length of the $vector arrays below
EMBEDDING_API_KEY = os.environ.get("OPENAI_API_KEY")  # Export OPENAI_API_KEY first

# Build a token
tp = UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD)

# Initialize the client and get a "Database" object
client = DataAPIClient(environment=Environment.DSE)
database = client.get_database(DB_API_ENDPOINT, token=tp)
database.get_database_admin().create_namespace(DB_NAMESPACE, update_db_namespace=True)

# Create a collection. The default similarity metric is cosine. If you're not
# sure what dimension to set, use whatever dimension vector your embeddings
# model produces.
collection = database.create_collection(
    DB_COLLECTION,
    dimension=EMBEDDING_DIMENSIONS,
    metric=VectorMetric.COSINE,
    service={
        "provider": EMBEDDING_PROVIDER,
        "modelName": EMBEDDING_MODEL_NAME,
    },
    embedding_api_key=EMBEDDING_API_KEY,
    namespace=DB_NAMESPACE,
    check_exists=False,
)
print(f"* Collection: {collection.full_name}\n")

# Insert documents into the collection.
# (UUIDs here are version 7.)
documents = [
    {
        "_id": UUID("018e65c9-df45-7913-89f8-175f28bd7f74"),
        "text": "Chat bot integrated sneakers that talk to you",
         "$vector": [0.25, 0.25, 0.25, 0.25, 0.45],
    },
    {
        "_id": UUID("018e65c9-e1b7-7048-a593-db452be1e4c2"),
        "text": "An AI quilt to help you sleep forever",
         "$vector": [0.10, 0.15, 0.25, 0.25, 0.15],
    },
    {
        "_id": UUID("018e65c9-e33d-749b-9386-e848739582f0"),
        "text": "A deep learning display that controls your mood",
         "$vector": [0.55, 0.25, 0.25, 0.28, 0.25],
    },
]
try:
    insertion_result = collection.insert_many(documents)
    print(f"* Inserted {len(insertion_result.inserted_ids)} items.\n")
except InsertManyException:
    print("* Documents found on DB already. Let's move on.\n")
QuickStartDSE69.ts
import { DataAPIClient, UsernamePasswordTokenProvider, VectorDoc, UUID } from '@datastax/astra-db-ts';

// Database settings
const DB_USERNAME = "cassandra";
const DB_PASSWORD = "cassandra";
const DB_API_ENDPOINT = "http://localhost:8181";
const DB_ENVIRONMENT = "dse";
const DB_NAMESPACE = "cycling";

// Database settings if you exported them as environment variables
// const DB_USERNAME = process.env.DB_USERNAME;
// const DB_PASSWORD = process.env.DB_PASSWORD;
// const DB_API_ENDPOINT = process.env.DB_API_ENDPOINT;

// OpenAI settings
const OPEN_AI_PROVIDER = "openai";
const OPENAI_API_KEY = process.env.OPENAI_API_KEY
const MODEL_NAME = "text-embedding-3-small";

// Build a token in the required format
const tp = new UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD);

// Initialize the client and get a "Db" object
const client = new DataAPIClient({ environment: DB_ENVIRONMENT });
const db = client.db(DB_API_ENDPOINT, { token: tp });
const dbAdmin = db.admin({ environment: DB_ENVIRONMENT });

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  text: string,
}

(async function () {
  // Create a typed, vector-enabled collection. The default metric is cosine.
  // If you're not sure what dimension to set, use whatever dimension vector
  // your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    namespace: DB_NAMESPACE,
    vector: {
      service: {
        provider: OPEN_AI_PROVIDER,
        modelName: MODEL_NAME
      },
      dimension: 5,
      metric: 'cosine',
    },
    embeddingApiKey: OPENAI_API_KEY,
    checkExists: false
  });
  console.log(`* Created collection ${collection.namespace}.${collection.collectionName}`);

  // Insert documents into the collection (using UUIDv7s)
  const documents = [
    {
      _id: new UUID('018e65c9-df45-7913-89f8-175f28bd7f74'),
      text: 'ChatGPT integrated sneakers that talk to you',
      $vector: [0.25, 0.25, 0.25, 0.25, 0.45],
    },
    {
      _id: new UUID('018e65c9-e1b7-7048-a593-db452be1e4c2'),
      text: 'An AI quilt to help you sleep forever',
      $vector: [0.10, 0.15, 0.25, 0.25, 0.15],
    },
    {
      _id: new UUID('018e65c9-e33d-749b-9386-e848739582f0'),
      text: 'A deep learning display that controls your mood',
      $vector: [0.55, 0.25, 0.25, 0.28, 0.25],
    },
  ];

  try {
    const inserted = await collection.insertMany(documents);
    console.log(`* Inserted ${inserted.insertedCount} items.`);
  } catch (e) {
    console.log('* Documents found on DB already. Let\'s move on!');
  }

})();
src/main/java/com/example/QuickStartDSE69.java
package com.example;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.admin.DataAPIDatabaseAdmin;
import com.datastax.astra.client.model.CollectionOptions;
import com.datastax.astra.client.model.CommandOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.NamespaceOptions;
import com.datastax.astra.client.model.SimilarityMetric;
import com.datastax.astra.internal.auth.UsernamePasswordTokenProvider;

import java.util.Optional;

import static com.datastax.astra.client.DataAPIClients.DEFAULT_ENDPOINT_LOCAL;
import static com.datastax.astra.client.DataAPIOptions.DataAPIDestination.DSE;
import static com.datastax.astra.client.DataAPIOptions.builder;
import static com.datastax.astra.client.model.Filters.eq;

public class QuickStartDSE69 {

    public static void main(String[] args) {

        // Database settings
        String cassandraUserName     = "cassandra";
        String cassandraPassword     = "cassandra";
        String dataApiUrl            = DEFAULT_ENDPOINT_LOCAL;  // http://localhost:8181
        String keyspaceName          = "ks1";
        String collectionName        = "lyrics";

        // Database settings if you export them as environment variables
        // String cassandraUserName            = System.getenv("DB_USERNAME");
        // String cassandraPassword            = System.getenv("DB_PASSWORD");
        // String dataApiUrl                   = System.getenv("DB_API_ENDPOINT");

        // OpenAI Embeddings
        String openAiProvider        = "openai";
        String openAiKey             = System.getenv("OPENAI_API_KEY"); // Need to export OPENAI_API_KEY
        String openAiModel           = "text-embedding-3-small";
        int openAiEmbeddingDimension = 1536;

        // Build a token in the form of Cassandra:base64(username):base64(password)
        String token = new UsernamePasswordTokenProvider(cassandraUserName, cassandraPassword).getTokenAsString();
        System.out.println("1/7 - Creating Token: " + token);

        // Initialize the client. The destination can be DSE, HCD, or ASTRA.
        DataAPIClient client = new DataAPIClient(token, builder().withDestination(DSE).build());
        System.out.println("2/7 - Connected to Data API");

        // Create the keyspace
        ((DataAPIDatabaseAdmin) client
                .getDatabase(dataApiUrl)
                .getDatabaseAdmin()).createNamespace(keyspaceName, NamespaceOptions.simpleStrategy(1));
        System.out.println("3/7 - Keyspace '" + keyspaceName + "' created");

        // Connect to the database in that keyspace
        Database db = client.getDatabase(dataApiUrl, keyspaceName);
        System.out.println("4/7 - Connected to Database");

        // Create a collection whose dimension matches the 5-element sample vectors
        Collection<Document> collection = db.createCollection(collectionName, CollectionOptions.builder()
                .vectorDimension(5)
                .vectorSimilarity(SimilarityMetric.COSINE)
                .build());
        System.out.println("5/7 - Collection created");

        // Insert some documents
        collection.insertMany(
                new Document("1")
                        .append("text", "ChatGPT integrated sneakers that talk to you")
                        .vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
                new Document("2")
                        .append("text", "An AI quilt to help you sleep forever")
                        .vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
                new Document("3")
                        .append("text", "A deep learning display that controls your mood")
                        .vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f}));
        System.out.println("6/7 - Collection populated");

    }
}
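
The comment in the Java example describes the token format as Cassandra:base64(username):base64(password). As a rough illustration of that format only, the sketch below builds such a string with Python's standard library. The build_token helper is hypothetical and is not part of any client; in practice, UsernamePasswordTokenProvider builds the token for you.

```python
import base64


def build_token(username: str, password: str) -> str:
    """Hypothetical helper: join the base64-encoded credentials with colons."""

    def b64(value: str) -> str:
        # Encode the text as UTF-8 bytes, then as base64 ASCII text.
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return f"Cassandra:{b64(username)}:{b64(password)}"


if __name__ == "__main__":
    # Default superuser credentials, for initial exploration only.
    print(build_token("cassandra", "cassandra"))
```

Because base64 is an encoding rather than encryption, anyone who sees the token can recover the password, which is one more reason to replace the default superuser credentials early.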

Use Vectorize with other embedding providers

The Data API's $vectorize feature works with embedding providers other than OpenAI. For the complete list of supported embeddings providers, see Supported embeddings providers.

The Data API client initialization is different for DataStax Enterprise (DSE), but once you have created the database admin object, you can create a collection with any supported embedding model provider.

You will need to provide an API key for the provider.

Create Mistral embeddings

This example uses the Data API client to create a collection, compute embeddings, and find results. It is the same as the quickstart, but uses the Mistral embeddings provider. You will need a Mistral API key to run this example.

  1. Set the EMBEDDING_API_KEY environment variable to your provider’s API key or add it to a .env file:

    export EMBEDDING_API_KEY=<your-mistral-api-key>
  2. Replace the settings in Embeddings provider settings with the values for your provider. The Mistral values have been filled in for the example. If you’re unsure of the values for your embedding provider, use the find_embedding_providers method in the example to get a list of available providers and their models.

    • EMBEDDING_PROVIDER: mistral - Name of the embedding provider.

    • EMBEDDING_MODEL_NAME: mistral-embed - Specific model name from the embedding provider.

    • EMBEDDING_DIMENSIONS: 1024 - Number of dimensions for vector embeddings.

    • EMBEDDING_API_KEY: Value retrieved from environment variable EMBEDDING_API_KEY - API key for embedding service authentication.

  3. Use the same database credentials found in Identify your credentials to connect to your database.

  4. Run the example to create the collection, insert documents, and retrieve the most similar document to a query vector.

    Quickstart-with-mistral.py
    import os
    
    from astrapy import DataAPIClient
    from astrapy.constants import Environment
    from astrapy.authentication import UsernamePasswordTokenProvider
    from astrapy.constants import VectorMetric
    from astrapy.ids import UUID
    from astrapy.exceptions import InsertManyException
    from astrapy.info import CollectionVectorServiceOptions
    
    # Database settings
    DB_USERNAME = "cassandra"
    DB_PASSWORD = "cassandra"
    DB_API_ENDPOINT = "http://localhost:8181"
    DB_NAMESPACE = "my_namespace"
    DB_COLLECTION = "vector_test"
    
    # Embeddings provider settings
    EMBEDDING_PROVIDER = "mistral"
    EMBEDDING_MODEL_NAME = "mistral-embed"
    EMBEDDING_DIMENSIONS = 1024
    EMBEDDING_API_KEY = os.environ.get("EMBEDDING_API_KEY")
    
    # Build a token
    tp = UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD)
    
    # Initialize the client and get a "Database" object
    client = DataAPIClient(environment=Environment.DSE)
    database = client.get_database(DB_API_ENDPOINT, token=tp)
    database.get_database_admin().create_namespace(DB_NAMESPACE, update_db_namespace=True)
    
    '''
    FIND EMBEDDING PROVIDERS
    '''
    
    ep=database.get_database_admin().find_embedding_providers()
    print(ep)
    
    print(ep.embedding_providers['mistral'])
    print(ep.embedding_providers['mistral'].models[0])
    print(ep.embedding_providers['mistral'].models[0].vector_dimension)
    
    '''
    CREATE COLLECTION
    '''
    
    collection = database.create_collection(
        DB_COLLECTION,
        dimension=EMBEDDING_DIMENSIONS,
        metric=VectorMetric.COSINE,
        service={
            "provider": EMBEDDING_PROVIDER,
            "modelName": EMBEDDING_MODEL_NAME,
        },
        embedding_api_key=EMBEDDING_API_KEY,
        namespace=DB_NAMESPACE,
        check_exists=False,
    )
    print(f"* Collection: {collection.full_name}\n")
    
    '''
    INSERT DOCUMENTS WITH UUID
    '''
    
    documents = [
        {
            "_id": UUID("018e65c9-df45-7913-89f8-175f28bd7f74"),
            "$vectorize": "Chat bot integrated sneakers that talk to you",
        },
        {
            "_id": UUID("018e65c9-e1b7-7048-a593-db452be1e4c2"),
            "$vectorize": "An AI quilt to help you sleep forever",
        },
        {
            "_id": UUID("018e65c9-e33d-749b-9386-e848739582f0"),
            "$vectorize": "A deep learning display that controls your mood",
        },
    ]
    try:
        insertion_result = collection.insert_many(documents)
        print(f"* Inserted {len(insertion_result.inserted_ids)} items.\n")
    except InsertManyException:
        print("* Documents found on DB already. Let's move on.\n")
    
    '''
    FIND ONE
    '''
    
    for doc in collection.find(
        sort={"$vectorize": "I am looking for a sleeping aid."},
        limit=1,
        projection={"$vectorize": True},
    ):
        print(doc)
Result
FindEmbeddingProvidersResult(embedding_providers=azureOpenAI, bedrock, huggingface, huggingfaceDedicated, jinaAI, mistral, nvidia, openai, upstageAI, voyageAI)
EmbeddingProvider(display_name='Mistral AI', models=[EmbeddingProviderModel(name='mistral-embed')])
EmbeddingProviderModel(name='mistral-embed')
1024
* Collection: my_namespace.vector_test

* Documents found on DB already. Let's move on.

{'_id': UUID('018e65c9-e1b7-7048-a593-db452be1e4c2'), '$vectorize': 'An AI quilt to help you sleep forever'}

See the complete list of supported embedding model providers at Supported embeddings providers. Different providers' calls may differ slightly, but you can use the same Data API client to interact with them from DSE.
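
One way to keep per-provider differences in a single place is a small settings table. The sketch below is only an illustration: PROVIDER_SETTINGS and collection_kwargs are hypothetical names, the Mistral values are the ones used in the example above, and the OpenAI dimension is assumed to be the default output size of text-embedding-3-small. The helper fails early when the API key variable is not set.

```python
import os

# Assumed example values; check the supported providers list for your setup.
PROVIDER_SETTINGS = {
    "mistral": {"model": "mistral-embed", "dimension": 1024},
    "openai": {"model": "text-embedding-3-small", "dimension": 1536},
}


def collection_kwargs(provider: str, api_key_env: str = "EMBEDDING_API_KEY") -> dict:
    """Hypothetical helper: build the provider-specific create_collection arguments."""
    settings = PROVIDER_SETTINGS[provider]
    api_key = os.environ.get(api_key_env)
    if not api_key:
        # Fail fast instead of sending an empty key to the embedding service.
        raise RuntimeError(f"Set the {api_key_env} environment variable first.")
    return {
        "dimension": settings["dimension"],
        "service": {"provider": provider, "modelName": settings["model"]},
        "embedding_api_key": api_key,
    }
```

With this in place, the create_collection call from the example could be written as database.create_collection(DB_COLLECTION, metric=VectorMetric.COSINE, namespace=DB_NAMESPACE, **collection_kwargs("mistral")), so that switching providers changes only the argument.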

Find documents that are close to a specific vector embedding. (The code also shows the optional step of dropping the collection at the end.)

  • Python

  • TypeScript

  • Java

QuickStartDSE69.py
import os

from astrapy import DataAPIClient
from astrapy.constants import Environment
from astrapy.authentication import UsernamePasswordTokenProvider
from astrapy.constants import VectorMetric
from astrapy.ids import UUID
from astrapy.exceptions import InsertManyException
from astrapy.info import CollectionVectorServiceOptions

# Database settings
DB_USERNAME = "cassandra"
DB_PASSWORD = "cassandra"
DB_API_ENDPOINT = "http://localhost:8181"
DB_NAMESPACE = "my_namespace"
DB_COLLECTION = "vector_test"

# Database settings if you exported them as environment variables
# DB_USERNAME = os.environ.get("DB_USERNAME")
# DB_PASSWORD = os.environ.get("DB_PASSWORD")
# DB_API_ENDPOINT = os.environ.get("DB_API_ENDPOINT")

# Embedding provider settings
# The sample vectors below have 5 dimensions, so the collection uses the same dimension.
EMBEDDING_PROVIDER = "openai"
EMBEDDING_MODEL_NAME = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 5
EMBEDDING_API_KEY = os.environ.get("OPENAI_API_KEY")

# Build a token
tp = UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD)

# Initialize the client and get a "Database" object
client = DataAPIClient(environment=Environment.DSE)
database = client.get_database(DB_API_ENDPOINT, token=tp)
database.get_database_admin().create_namespace(DB_NAMESPACE, update_db_namespace=True)

# Create a collection. The default similarity metric is cosine. If you're not
# sure what dimension to set, use whatever dimension vector your embeddings
# model produces.
collection = database.create_collection(
    DB_COLLECTION,
    dimension=EMBEDDING_DIMENSIONS,
    metric=VectorMetric.COSINE,
    service={
        "provider": EMBEDDING_PROVIDER,
        "modelName": EMBEDDING_MODEL_NAME,
    },
    embedding_api_key=EMBEDDING_API_KEY,
    namespace=DB_NAMESPACE,
    check_exists=False,
)
print(f"* Collection: {collection.full_name}\n")

# Insert documents into the collection.
# (UUIDs here are version 7.)
documents = [
    {
        "_id": UUID("018e65c9-df45-7913-89f8-175f28bd7f74"),
        "text": "Chat bot integrated sneakers that talk to you",
        "$vector": [0.25, 0.25, 0.25, 0.25, 0.45],
    },
    {
        "_id": UUID("018e65c9-e1b7-7048-a593-db452be1e4c2"),
        "text": "An AI quilt to help you sleep forever",
        "$vector": [0.10, 0.15, 0.25, 0.25, 0.15],
    },
    {
        "_id": UUID("018e65c9-e33d-749b-9386-e848739582f0"),
        "text": "A deep learning display that controls your mood",
        "$vector": [0.55, 0.25, 0.25, 0.28, 0.25],
    },
]
try:
    insertion_result = collection.insert_many(documents)
    print(f"* Inserted {len(insertion_result.inserted_ids)} items.\n")
except InsertManyException:
    print("* Documents found on DB already. Let's move on.\n")

# Perform a similarity search
query = [0.15, 0.1, 0.1, 0.35, 0.55]
results = collection.find(
    vector=query,
    limit=10,
)
print("Vector search results:")
for document in results:
    print("    ", document)

# Cleanup (if desired)
drop_result = collection.drop()
print(f"\nCleanup: {drop_result}\n")
QuickStartDSE69.ts
import { DataAPIClient, UsernamePasswordTokenProvider, VectorDoc, UUID } from '@datastax/astra-db-ts';

// Database settings
const DB_USERNAME = "cassandra";
const DB_PASSWORD = "cassandra";
const DB_API_ENDPOINT = "http://localhost:8181";
const DB_ENVIRONMENT = "dse";
const DB_NAMESPACE = "cycling";

// Database settings if you exported them as environment variables
// const DB_USERNAME = process.env.DB_USERNAME;
// const DB_PASSWORD = process.env.DB_PASSWORD;
// const DB_API_ENDPOINT = process.env.DB_API_ENDPOINT;

// OpenAI settings
const OPEN_AI_PROVIDER = "openai";
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const MODEL_NAME = "text-embedding-3-small";

// Build a token in the required format
const tp = new UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD);

// Initialize the client and get a "Db" object
const client = new DataAPIClient({ environment: DB_ENVIRONMENT });
const db = client.db(DB_API_ENDPOINT, { token: tp });
const dbAdmin = db.admin({ environment: DB_ENVIRONMENT });

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  text: string,
}

(async function () {
  // Create a typed, vector-enabled collection. The default metric is cosine.
  // If you're not sure what dimension to set, use whatever dimension vector
  // your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    namespace: DB_NAMESPACE,
    vector: {
      service: {
        provider: OPEN_AI_PROVIDER,
        modelName: MODEL_NAME
      },
      dimension: 5,
      metric: 'cosine',
    },
    embeddingApiKey: OPENAI_API_KEY,
    checkExists: false
  });
  console.log(`* Created collection ${collection.namespace}.${collection.collectionName}`);

  // Insert documents into the collection (using UUIDv7s)
  const documents = [
    {
      _id: new UUID('018e65c9-df45-7913-89f8-175f28bd7f74'),
      text: 'ChatGPT integrated sneakers that talk to you',
      $vector: [0.25, 0.25, 0.25, 0.25, 0.45],
    },
    {
      _id: new UUID('018e65c9-e1b7-7048-a593-db452be1e4c2'),
      text: 'An AI quilt to help you sleep forever',
      $vector: [0.10, 0.15, 0.25, 0.25, 0.15],
    },
    {
      _id: new UUID('018e65c9-e33d-749b-9386-e848739582f0'),
      text: 'A deep learning display that controls your mood',
      $vector: [0.55, 0.25, 0.25, 0.28, 0.25],
    },
  ];

  try {
    const inserted = await collection.insertMany(documents);
    console.log(`* Inserted ${inserted.insertedCount} items.`);
  } catch (e) {
    console.log('* Documents found on DB already. Let\'s move on!');
  }

  // Perform a similarity search
  const cursor = await collection.find({}, {
    vector: [0.15, 0.1, 0.1, 0.35, 0.55],
    limit: 10,
    includeSimilarity: true,
  });

  console.log('* Search results:');
  for await (const doc of cursor) {
    console.log('  ', doc.text, doc.$similarity);
  }

  // Cleanup (if desired). The collection lives in DB_NAMESPACE, so name it explicitly.
  await db.dropCollection('vector_test', { namespace: DB_NAMESPACE });
  console.log('* Collection dropped.');

  // Close the client
  await client.close();

})();
src/main/java/com/example/QuickStartDSE69.java
package com.example;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.admin.DataAPIDatabaseAdmin;
import com.datastax.astra.client.model.CollectionOptions;
import com.datastax.astra.client.model.CommandOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.NamespaceOptions;
import com.datastax.astra.client.model.SimilarityMetric;
import com.datastax.astra.internal.auth.UsernamePasswordTokenProvider;

import java.util.Optional;

import static com.datastax.astra.client.DataAPIClients.DEFAULT_ENDPOINT_LOCAL;
import static com.datastax.astra.client.DataAPIOptions.DataAPIDestination.DSE;
import static com.datastax.astra.client.DataAPIOptions.builder;
import static com.datastax.astra.client.model.Filters.eq;

public class QuickStartDSE69 {

    public static void main(String[] args) {

        // Database settings
        String cassandraUserName     = "cassandra";
        String cassandraPassword     = "cassandra";
        String dataApiUrl            = DEFAULT_ENDPOINT_LOCAL;  // http://localhost:8181
        String keyspaceName          = "ks1";
        String collectionName        = "lyrics";

        // Database settings if you export them as environment variables
        // String cassandraUserName            = System.getenv("DB_USERNAME");
        // String cassandraPassword            = System.getenv("DB_PASSWORD");
        // String dataApiUrl                   = System.getenv("DB_API_ENDPOINT");

        // OpenAI Embeddings
        String openAiProvider        = "openai";
        String openAiKey             = System.getenv("OPENAI_API_KEY"); // Need to export OPENAI_API_KEY
        String openAiModel           = "text-embedding-3-small";
        int openAiEmbeddingDimension = 1536;

        // Build a token in the form of Cassandra:base64(username):base64(password)
        String token = new UsernamePasswordTokenProvider(cassandraUserName, cassandraPassword).getTokenAsString();
        System.out.println("1/7 - Creating Token: " + token);

        // Initialize the client. The destination can be DSE, HCD, or ASTRA.
        DataAPIClient client = new DataAPIClient(token, builder().withDestination(DSE).build());
        System.out.println("2/7 - Connected to Data API");

        // Create the keyspace
        ((DataAPIDatabaseAdmin) client
                .getDatabase(dataApiUrl)
                .getDatabaseAdmin()).createNamespace(keyspaceName, NamespaceOptions.simpleStrategy(1));
        System.out.println("3/7 - Keyspace '" + keyspaceName + "' created");

        // Connect to the database in that keyspace
        Database db = client.getDatabase(dataApiUrl, keyspaceName);
        System.out.println("4/7 - Connected to Database");

        // Create a collection whose dimension matches the 5-element sample vectors
        Collection<Document> collection = db.createCollection(collectionName, CollectionOptions.builder()
                .vectorDimension(5)
                .vectorSimilarity(SimilarityMetric.COSINE)
                .build());
        System.out.println("5/7 - Collection created");

        // Insert some documents
        collection.insertMany(
                new Document("1")
                        .append("text", "ChatGPT integrated sneakers that talk to you")
                        .vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
                new Document("2")
                        .append("text", "An AI quilt to help you sleep forever")
                        .vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
                new Document("3")
                        .append("text", "A deep learning display that controls your mood")
                        .vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f}));
        System.out.println("6/7 - Collection populated");

        // Perform a similarity search
        FindIterable<Document> resultsSet = collection.find(
                new float[]{0.15f, 0.1f, 0.1f, 0.35f, 0.55f},
                10
        );
        resultsSet.forEach(System.out::println);
        System.out.println("7/7 - Similarity search complete");

        // Cleanup (if desired)
        collection.drop();
        System.out.println("Deleted the collection");

    }
}

You get a sorted list of the documents you inserted. The database sorts documents by their similarity to the query vector, most similar documents first. The calculation uses cosine similarity by default.
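
To make the ordering concrete, the sketch below recomputes cosine similarity locally for the sample vectors and query from the Python example and sorts the documents the same way, most similar first. It is a plain-Python illustration of the metric, not the database's implementation; the server may use an approximate index, and the similarity score it reports may be scaled differently. The names cosine_similarity and rank are introduced here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return document texts sorted by similarity to the query, best first."""
    return sorted(
        documents,
        key=lambda text: cosine_similarity(query, documents[text]),
        reverse=True,
    )


# Sample data copied from the Python quickstart.
documents = {
    "Chat bot integrated sneakers that talk to you": [0.25, 0.25, 0.25, 0.25, 0.45],
    "An AI quilt to help you sleep forever": [0.10, 0.15, 0.25, 0.25, 0.15],
    "A deep learning display that controls your mood": [0.55, 0.25, 0.25, 0.28, 0.25],
}
query = [0.15, 0.1, 0.1, 0.35, 0.55]

if __name__ == "__main__":
    for text in rank(query, documents):
        print(f"{cosine_similarity(query, documents[text]):.4f}  {text}")
```

Cosine similarity compares direction rather than length, so scaling a vector by a positive constant does not change its rank.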

Run the code

Run the code you defined above.

  • Python

  • TypeScript

  • Java

python QuickStartDSE69.py
npm
npx tsx QuickStartDSE69.ts
Yarn
yarn dlx tsx QuickStartDSE69.ts
Maven
mvn clean compile
export OPENAI_API_KEY=<your-api-key>
mvn exec:java -Dexec.mainClass="com.example.QuickStartDSE69"
Gradle
gradle build
gradle run


© 2024 DataStax | Privacy policy | Terms of use
