Data API quickstart
Learn how to create a DSE keyspace, connect to your keyspace, load a set of vector embeddings using vectorize, and perform a similarity search to find vectors that are close to the one in your query.
Install DSE
Install DSE if you haven’t already. You’ll also need to install the Data API. For exploration, use the Docker installation with Data API.
Install a terminal to run your client
The clients can be tested by running them in a terminal. You’ll want Xterm, Terminal, or another terminal emulator.
Identify your credentials
- Identify the credentials, or token, for your database. For initial exploration, you can use the default superuser credentials set in the database: cassandra is the username and cassandra is the password. The client passes these values to a token provider, which generates the token used to authenticate the client.
- Before going much further, create a new user with a secure password. Once created, use this new user to generate the authentication token.
- You need the API endpoint that your clients will connect to. The API endpoint is the URL of the database you installed. The port number is 8181 by default. For example, if you installed the database using a Docker container, the API endpoint is http://localhost:8181. This value will be your DB_API_ENDPOINT.
- You may either assign your username, password, and API endpoint to environment variables in your terminal, or modify the client code to include them directly, as shown in the examples below. You will also want to set OPENAI_API_KEY, the API key that you received when you signed up for the OpenAI API. This key authenticates your requests to the OpenAI API, and the clients use it to vectorize the text that you provide.
  - Linux or macOS
    export DB_API_ENDPOINT=DB_API_ENDPOINT # Your database API endpoint
    export OPENAI_API_KEY=API_KEY # Your OpenAI API key
  - Windows
    rem Your database API endpoint
    set DB_API_ENDPOINT=DB_API_ENDPOINT
    rem Your OpenAI API key
    set OPENAI_API_KEY=API_KEY
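The token that the client's token provider builds from your username and password has a simple shape. A comment in the Java example later in this guide describes it as Cassandra:base64(username):base64(password). The following Python sketch reproduces that format using only the standard library. The build_token name is hypothetical and exists only for illustration; in the client code, UsernamePasswordTokenProvider does this for you.

```python
import base64


def build_token(username: str, password: str) -> str:
    """Return a token shaped like Cassandra:base64(username):base64(password).

    Hypothetical helper for illustration only. The format follows the comment
    in this guide's Java example, not a separately verified specification.
    """
    encoded_user = base64.b64encode(username.encode("utf-8")).decode("ascii")
    encoded_password = base64.b64encode(password.encode("utf-8")).decode("ascii")
    return f"Cassandra:{encoded_user}:{encoded_password}"


if __name__ == "__main__":
    # Default superuser credentials, suitable for initial exploration only.
    print(build_token("cassandra", "cassandra"))
```

Because base64 is an encoding and not encryption, treat the resulting token as carefully as the password itself.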
Install the client
Install the library for the language and package manager you’re using.
- Python
To install the Python client with pip:
- Verify that pip is version 23.0 or later:
  pip --version
- Upgrade pip if needed:
  python -m pip install --upgrade pip
- Install the astrapy package. You must have Python 3.9 to 3.14.
  pip install astrapy
- TypeScript
To install the TypeScript client:
- Verify that Node is version 14 or later:
  node --version
- Use npm or Yarn to install the TypeScript client.
  - npm:
    npm install @datastax/astra-db-ts
  - Yarn 2.0 or later:
    yarn add @datastax/astra-db-ts
- Java
Use Maven or Gradle to install the Java client.
- Maven:
To install the Java client with Maven, install Java 11 or later and Maven 3.9 or later. Then, create a pom.xml file in the root of your project:
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>test-java-client</artifactId>
  <version>1.0-SNAPSHOT</version>

  <!-- The Java client -->
  <dependencies>
    <dependency>
      <groupId>com.datastax.astra</groupId>
      <artifactId>astra-db-java</artifactId>
      <version>1.0.0</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>3.0.0</version>
        <configuration>
          <executable>java</executable>
          <!-- Must match the package and class name of the quickstart class -->
          <mainClass>com.example.QuickStartDSE69</mainClass>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>java</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>11</source>
          <target>11</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
- Gradle:
To install the Java client with Gradle, install Java 11 or later and Gradle. Then, create a build.gradle file in the root of your project:
build.gradle
plugins {
    id 'java'
    id 'application'
}

repositories {
    mavenCentral()
}

dependencies {
    implementation 'com.datastax.astra:astra-db-java:1.0.0'
}

application {
    // Must match the package and class name of the quickstart class
    mainClass = 'com.example.QuickStartDSE69'
}
Initialize the client
Paste the following code into a new file on your computer. If you created the environment variables, you don’t need to include the variables in the code.
- Python
To avoid a namespace collision, don’t name the file astrapy.py.
QuickStartDSE69.py
import os

from astrapy import DataAPIClient
from astrapy.constants import Environment
from astrapy.authentication import UsernamePasswordTokenProvider
from astrapy.constants import VectorMetric
from astrapy.ids import UUID
from astrapy.exceptions import InsertManyException
from astrapy.info import CollectionVectorServiceOptions

# Database settings
DB_USERNAME = "cassandra"
DB_PASSWORD = "cassandra"
DB_API_ENDPOINT = "http://localhost:8181"
DB_KEYSPACE = "my_keyspace"
DB_COLLECTION = "vector_test"

# Database settings if you exported them as environment variables
# DB_USERNAME = os.environ.get("DB_USERNAME")
# DB_PASSWORD = os.environ.get("DB_PASSWORD")
# DB_API_ENDPOINT = os.environ.get("DB_API_ENDPOINT")

# Embedding provider settings
EMBEDDING_PROVIDER = "openai"
EMBEDDING_MODEL_NAME = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1024
# Reads the OPENAI_API_KEY variable exported earlier in this guide
EMBEDDING_API_KEY = os.environ.get("OPENAI_API_KEY")

# Build a token
tp = UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD)

# Initialize the client and get a "Database" object
client = DataAPIClient(environment=Environment.DSE)
database = client.get_database(DB_API_ENDPOINT, token=tp)
database.get_database_admin().create_keyspace(DB_KEYSPACE, update_db_keyspace=True)
- TypeScript
import { DataAPIClient, UsernamePasswordTokenProvider, VectorDoc, UUID } from '@datastax/astra-db-ts';

// Database settings
const DB_USERNAME = "cassandra";
const DB_PASSWORD = "cassandra";
const DB_API_ENDPOINT = "http://localhost:8181";
const DB_ENVIRONMENT = "dse";
const DB_KEYSPACE = "cycling";

// Database settings if you exported them as environment variables
// const DB_USERNAME = process.env.DB_USERNAME;
// const DB_PASSWORD = process.env.DB_PASSWORD;
// const DB_API_ENDPOINT = process.env.DB_API_ENDPOINT;

// OpenAI settings
const OPEN_AI_PROVIDER = "openai";
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const MODEL_NAME = "text-embedding-3-small";

// Build a token in the required format
const tp = new UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD);

// Initialize the client and get a "Db" object
const client = new DataAPIClient({ environment: DB_ENVIRONMENT });
const db = client.db(DB_API_ENDPOINT, { token: tp });
const dbAdmin = db.admin({ environment: DB_ENVIRONMENT });
- Java
package com.example;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIOptions.DataAPIDestination;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.admin.DataAPIDatabaseAdmin;
import com.datastax.astra.client.model.CollectionOptions;
import com.datastax.astra.client.model.CommandOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.KeyspaceOptions;
import com.datastax.astra.client.model.SimilarityMetric;
import com.datastax.astra.internal.auth.UsernamePasswordTokenProvider;

import java.util.Optional;

import static com.datastax.astra.client.DataAPIClients.DEFAULT_ENDPOINT_LOCAL;
import static com.datastax.astra.client.DataAPIOptions.builder;
import static com.datastax.astra.client.model.Filters.eq;

public class QuickStartDSE69 {

    public static void main(String[] args) {

        // Database Settings
        String cassandraUserName = "cassandra";
        String cassandraPassword = "cassandra";
        String dataApiUrl = DEFAULT_ENDPOINT_LOCAL; // http://localhost:8181
        String databaseEnvironment = "DSE"; // DSE, HCD, or ASTRA
        String keyspaceName = "ks1";
        String collectionName = "lyrics";

        // Database settings if you export them as environment variables
        // String cassandraUserName = System.getenv("DB_USERNAME");
        // String cassandraPassword = System.getenv("DB_PASSWORD");
        // String dataApiUrl = System.getenv("DB_API_ENDPOINT");

        // OpenAI Embeddings
        String openAiProvider = "openai";
        String openAiKey = System.getenv("OPENAI_API_KEY"); // Need to export OPENAI_API_KEY
        String openAiModel = "text-embedding-3-small";
        int openAiEmbeddingDimension = 1536;

        // Build a token in the form of Cassandra:base64(username):base64(password)
        String token = new UsernamePasswordTokenProvider(cassandraUserName, cassandraPassword).getTokenAsString();
        System.out.println("1/7 - Creating Token: " + token);

        // Initialize the client (the environment name is converted to the destination enum)
        DataAPIClient client = new DataAPIClient(token,
                builder().withDestination(DataAPIDestination.valueOf(databaseEnvironment)).build());
        System.out.println("2/7 - Connected to Data API");
    }
}
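If you exported the environment variables, one way to avoid editing the code at all is to read each setting from the environment and fall back to the quickstart defaults. The following Python sketch assumes the variable names used in this guide (DB_USERNAME, DB_PASSWORD, DB_API_ENDPOINT, and OPENAI_API_KEY). The load_settings helper is hypothetical and is not part of any client library.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read connection settings, falling back to the quickstart defaults.

    Hypothetical helper for illustration; pass a mapping to override os.environ.
    """
    env = os.environ if env is None else env
    return {
        "username": env.get("DB_USERNAME", "cassandra"),
        "password": env.get("DB_PASSWORD", "cassandra"),
        "api_endpoint": env.get("DB_API_ENDPOINT", "http://localhost:8181"),
        # No default here: the embedding provider key must come from the environment.
        "openai_api_key": env.get("OPENAI_API_KEY"),
    }


if __name__ == "__main__":
    settings = load_settings()
    if settings["openai_api_key"] is None:
        print("OPENAI_API_KEY is not set; embedding requests are expected to fail.")
    print(f"Connecting to {settings['api_endpoint']} as {settings['username']}")
```

Keeping the defaults in one place means the same script runs unchanged against a local Docker container and against a database whose endpoint and user you exported.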
Create a keyspace
- Python
QuickStartDSE69.py
database.get_database_admin().create_keyspace(DB_KEYSPACE)
- TypeScript
QuickStartDSE69.ts
(async () => {
  await dbAdmin.createKeyspace(DB_KEYSPACE);
  console.log(await dbAdmin.listKeyspaces());
})();
- Java
src/main/java/com/example/QuickStartDSE69.java
// Create a default keyspace
((DataAPIDatabaseAdmin) client
        .getDatabase(dataApiUrl)
        .getDatabaseAdmin()).createKeyspace(keyspaceName, KeyspaceOptions.simpleStrategy(1));
System.out.println("3/7 - Keyspace '" + keyspaceName + "' created");

Database db = client.getDatabase(dataApiUrl, keyspaceName);
System.out.println("4/7 - Connected to Database");
Create a collection
Create a collection in your keyspace.
Choose dimensions that match your vector data and pick an appropriate similarity metric: cosine (default), dot_product, or euclidean.
The embeddings will be generated using the vectorize method, so the collection needs the parameters for using an embedding service.
- Python
QuickStartDSE69.py
import os

from astrapy import DataAPIClient
from astrapy.constants import Environment
from astrapy.authentication import UsernamePasswordTokenProvider
from astrapy.constants import VectorMetric
from astrapy.ids import UUID
from astrapy.exceptions import InsertManyException
from astrapy.info import CollectionVectorServiceOptions

# Database settings
DB_USERNAME = "cassandra"
DB_PASSWORD = "cassandra"
DB_API_ENDPOINT = "http://localhost:8181"
DB_KEYSPACE = "my_keyspace"
DB_COLLECTION = "vector_test"

# Database settings if you exported them as environment variables
# DB_USERNAME = os.environ.get("DB_USERNAME")
# DB_PASSWORD = os.environ.get("DB_PASSWORD")
# DB_API_ENDPOINT = os.environ.get("DB_API_ENDPOINT")

# Embedding provider settings
EMBEDDING_PROVIDER = "openai"
EMBEDDING_MODEL_NAME = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1024
# Reads the OPENAI_API_KEY variable exported earlier in this guide
EMBEDDING_API_KEY = os.environ.get("OPENAI_API_KEY")

# Build a token
tp = UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD)

# Initialize the client and get a "Database" object
client = DataAPIClient(environment=Environment.DSE)
database = client.get_database(DB_API_ENDPOINT, token=tp)
database.get_database_admin().create_keyspace(DB_KEYSPACE, update_db_keyspace=True)

# Create a collection. The default similarity metric is cosine. If you're not
# sure what dimension to set, use whatever dimension vector your embeddings
# model produces.
collection = database.create_collection(
    DB_COLLECTION,
    dimension=EMBEDDING_DIMENSIONS,
    metric=VectorMetric.COSINE,
    service={
        "provider": EMBEDDING_PROVIDER,
        "modelName": EMBEDDING_MODEL_NAME,
    },
    embedding_api_key=EMBEDDING_API_KEY,
    keyspace=DB_KEYSPACE,
    check_exists=False,
)
print(f"* Collection: {collection.full_name}\n")
- TypeScript
QuickStartDSE69.ts
import { DataAPIClient, UsernamePasswordTokenProvider, VectorDoc, UUID } from '@datastax/astra-db-ts';

// Database settings
const DB_USERNAME = "cassandra";
const DB_PASSWORD = "cassandra";
const DB_API_ENDPOINT = "http://localhost:8181";
const DB_ENVIRONMENT = "dse";
const DB_KEYSPACE = "cycling";

// Database settings if you exported them as environment variables
// const DB_USERNAME = process.env.DB_USERNAME;
// const DB_PASSWORD = process.env.DB_PASSWORD;
// const DB_API_ENDPOINT = process.env.DB_API_ENDPOINT;

// OpenAI settings
const OPEN_AI_PROVIDER = "openai";
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const MODEL_NAME = "text-embedding-3-small";

// Build a token in the required format
const tp = new UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD);

// Initialize the client and get a "Db" object
const client = new DataAPIClient({ environment: DB_ENVIRONMENT });
const db = client.db(DB_API_ENDPOINT, { token: tp });
const dbAdmin = db.admin({ environment: DB_ENVIRONMENT });

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  text: string,
}

(async function () {
  // Create a typed, vector-enabled collection. The default metric is cosine.
  // If you're not sure what dimension to set, use whatever dimension vector
  // your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    keyspace: DB_KEYSPACE,
    vector: {
      service: {
        provider: OPEN_AI_PROVIDER,
        modelName: MODEL_NAME
      },
      dimension: 5,
      metric: 'cosine',
    },
    embeddingApiKey: OPENAI_API_KEY,
    checkExists: false
  });
  console.log(`* Created collection ${collection.keyspace}.${collection.collectionName}`);
})();
- Java
src/main/java/com/example/QuickStartDSE69.java
package com.example;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIOptions.DataAPIDestination;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.admin.DataAPIDatabaseAdmin;
import com.datastax.astra.client.model.CollectionOptions;
import com.datastax.astra.client.model.CommandOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.KeyspaceOptions;
import com.datastax.astra.client.model.SimilarityMetric;
import com.datastax.astra.internal.auth.UsernamePasswordTokenProvider;

import java.util.Optional;

import static com.datastax.astra.client.DataAPIClients.DEFAULT_ENDPOINT_LOCAL;
import static com.datastax.astra.client.DataAPIOptions.builder;
import static com.datastax.astra.client.model.Filters.eq;

public class QuickStartDSE69 {

    public static void main(String[] args) {

        // Database Settings
        String cassandraUserName = "cassandra";
        String cassandraPassword = "cassandra";
        String dataApiUrl = DEFAULT_ENDPOINT_LOCAL; // http://localhost:8181
        String databaseEnvironment = "DSE"; // DSE, HCD, or ASTRA
        String keyspaceName = "ks1";
        String collectionName = "lyrics";

        // Database settings if you export them as environment variables
        // String cassandraUserName = System.getenv("DB_USERNAME");
        // String cassandraPassword = System.getenv("DB_PASSWORD");
        // String dataApiUrl = System.getenv("DB_API_ENDPOINT");

        // OpenAI Embeddings
        String openAiProvider = "openai";
        String openAiKey = System.getenv("OPENAI_API_KEY"); // Need to export OPENAI_API_KEY
        String openAiModel = "text-embedding-3-small";
        int openAiEmbeddingDimension = 1536;

        // Build a token in the form of Cassandra:base64(username):base64(password)
        String token = new UsernamePasswordTokenProvider(cassandraUserName, cassandraPassword).getTokenAsString();
        System.out.println("1/7 - Creating Token: " + token);

        // Initialize the client (the environment name is converted to the destination enum)
        DataAPIClient client = new DataAPIClient(token,
                builder().withDestination(DataAPIDestination.valueOf(databaseEnvironment)).build());
        System.out.println("2/7 - Connected to Data API");

        // Create a default keyspace
        ((DataAPIDatabaseAdmin) client
                .getDatabase(dataApiUrl)
                .getDatabaseAdmin()).createKeyspace(keyspaceName, KeyspaceOptions.simpleStrategy(1));
        System.out.println("3/7 - Keyspace '" + keyspaceName + "' created");

        Database db = client.getDatabase(dataApiUrl, keyspaceName);
        System.out.println("4/7 - Connected to Database");

        // Create a collection
        Collection<Document> collectionLyrics = db.createCollection(collectionName, CollectionOptions.builder()
                .vectorDimension(5)
                .vectorSimilarity(SimilarityMetric.COSINE)
                .build());
        System.out.println("5/7 - Collection created");
    }
}
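To make the choice between cosine, dot_product, and euclidean more concrete, the following Python sketch computes the three underlying quantities for a pair of vectors. It uses only the standard library, the function names are illustrative, and it shows the plain mathematical definitions. How the database converts these values into similarity scores may differ.

```python
import math
from typing import Sequence


def _check_same_dimension(a: Sequence[float], b: Sequence[float]) -> None:
    # Vectors can only be compared when they have the same number of entries.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; sensitive to vector length (magnitude)."""
    _check_same_dimension(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the normalized vectors; depends only on direction."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means closer."""
    _check_same_dimension(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    a, b = [1.0, 0.0], [2.0, 0.0]  # same direction, different magnitude
    print("cosine:", cosine_similarity(a, b))
    print("dot product:", dot_product(a, b))
    print("euclidean distance:", euclidean_distance(a, b))
```

The two sample vectors point in the same direction but differ in length, so cosine treats them as identical while the other two quantities do not. That difference is the main thing to weigh when your embeddings are not normalized.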
Load vector embeddings
Insert a few documents into the collection.
Two methods are available for inserting vector data:
- The $vectorize method generates embeddings using a specified embedding service.
- The $vector method is used when you already have embeddings.
These methods are mutually exclusive, but you can use both in the same database.
For example, you can use $vectorize as your primary embedding generation method, and then use $vector to insert some bespoke embeddings as needed.
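As a small illustration of the rule above, the following Python sketch classifies a document by the embedding method it uses and rejects one that sets both fields. It assumes that "mutually exclusive" applies to each individual document. The embedding_mode helper is hypothetical and is not part of the client libraries.

```python
from typing import Any, Mapping


def embedding_mode(document: Mapping[str, Any]) -> str:
    """Return "$vector", "$vectorize", or "none" for a document.

    Hypothetical pre-insert check. Raises ValueError when a document sets both
    fields, on the assumption that the two methods cannot be combined in one
    document.
    """
    has_vector = "$vector" in document
    has_vectorize = "$vectorize" in document
    if has_vector and has_vectorize:
        raise ValueError("Use either $vector or $vectorize in a document, not both.")
    if has_vector:
        return "$vector"
    if has_vectorize:
        return "$vectorize"
    return "none"


if __name__ == "__main__":
    docs = [
        {"text": "An AI quilt to help you sleep forever",
         "$vectorize": "Sleep like a baby soft and cuddly"},
        {"text": "A deep learning display that controls your mood",
         "$vector": [0.1, 0.05, 0.08, 0.3, 0.6]},
    ]
    for doc in docs:
        print(embedding_mode(doc), "-", doc["text"])
```

A check like this is cheap to run before a bulk insert, where one malformed document could otherwise fail the whole batch.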
Use the $vectorize method
The following examples use OpenAI as the embedding provider.
The $vectorize Data API method supports embedding providers other than OpenAI.
To list the available providers, see your client’s reference for the methods that return embedding providers.
The Data API client initialization is different for DataStax Enterprise (DSE), but once you have created the database admin object, you can create a collection with any supported embedding model provider.
You need an API key for the provider.
- Python
QuickStartDSE69.py
import os

from astrapy import DataAPIClient
from astrapy.constants import Environment
from astrapy.authentication import UsernamePasswordTokenProvider
from astrapy.constants import VectorMetric
from astrapy.ids import UUID
from astrapy.exceptions import InsertManyException
from astrapy.info import CollectionVectorServiceOptions

# Database settings
DB_USERNAME = "cassandra"
DB_PASSWORD = "cassandra"
DB_API_ENDPOINT = "http://localhost:8181"
DB_KEYSPACE = "my_keyspace"
DB_COLLECTION = "vector_test"

# Database settings if you exported them as environment variables
# DB_USERNAME = os.environ.get("DB_USERNAME")
# DB_PASSWORD = os.environ.get("DB_PASSWORD")
# DB_API_ENDPOINT = os.environ.get("DB_API_ENDPOINT")

# Embedding provider settings
EMBEDDING_PROVIDER = "openai"
EMBEDDING_MODEL_NAME = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1024
# Reads the OPENAI_API_KEY variable exported earlier in this guide
EMBEDDING_API_KEY = os.environ.get("OPENAI_API_KEY")

# Build a token
tp = UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD)

# Initialize the client and get a "Database" object
client = DataAPIClient(environment=Environment.DSE)
database = client.get_database(DB_API_ENDPOINT, token=tp)
database.get_database_admin().create_keyspace(DB_KEYSPACE, update_db_keyspace=True)

# Create a collection. The default similarity metric is cosine. If you're not
# sure what dimension to set, use whatever dimension vector your embeddings
# model produces.
collection = database.create_collection(
    DB_COLLECTION,
    dimension=EMBEDDING_DIMENSIONS,
    metric=VectorMetric.COSINE,
    service={
        "provider": EMBEDDING_PROVIDER,
        "modelName": EMBEDDING_MODEL_NAME,
    },
    embedding_api_key=EMBEDDING_API_KEY,
    keyspace=DB_KEYSPACE,
    check_exists=False,
)
print(f"* Collection: {collection.full_name}\n")

# Insert documents into the collection.
# (UUIDs here are version 7.)
documents = [
    {
        "_id": UUID("018e65c9-df45-7913-89f8-175f28bd7f74"),
        "text": "Chat bot integrated sneakers that talk to you",
        "$vectorize": "Wild! How can they do that?",
    },
    {
        "_id": UUID("018e65c9-e1b7-7048-a593-db452be1e4c2"),
        "text": "An AI quilt to help you sleep forever",
        "$vectorize": "Sleep like a baby soft and cuddly",
    },
    {
        "_id": UUID("018e65c9-e33d-749b-9386-e848739582f0"),
        "text": "A deep learning display that controls your mood",
        "$vectorize": "I do not want my mood controlled!",
    },
]
try:
    insertion_result = collection.insert_many(documents)
    print(f"* Inserted {len(insertion_result.inserted_ids)} items.\n")
except InsertManyException:
    print("* Documents found on DB already. Let's move on.\n")
- TypeScript
QuickStartDSE69.ts
import { DataAPIClient, UsernamePasswordTokenProvider, VectorDoc, UUID } from '@datastax/astra-db-ts';

// Database settings
const DB_USERNAME = "cassandra";
const DB_PASSWORD = "cassandra";
const DB_API_ENDPOINT = "http://localhost:8181";
const DB_ENVIRONMENT = "dse";
const DB_KEYSPACE = "cycling";

// Database settings if you exported them as environment variables
// const DB_USERNAME = process.env.DB_USERNAME;
// const DB_PASSWORD = process.env.DB_PASSWORD;
// const DB_API_ENDPOINT = process.env.DB_API_ENDPOINT;

// OpenAI settings
const OPEN_AI_PROVIDER = "openai";
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const MODEL_NAME = "text-embedding-3-small";

// Build a token in the required format
const tp = new UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD);

// Initialize the client and get a "Db" object
const client = new DataAPIClient({ environment: DB_ENVIRONMENT });
const db = client.db(DB_API_ENDPOINT, { token: tp });
const dbAdmin = db.admin({ environment: DB_ENVIRONMENT });

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  text: string,
}

(async function () {
  // Create a typed, vector-enabled collection. The default metric is cosine.
  // If you're not sure what dimension to set, use whatever dimension vector
  // your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    keyspace: DB_KEYSPACE,
    vector: {
      service: {
        provider: OPEN_AI_PROVIDER,
        modelName: MODEL_NAME
      },
      dimension: 5,
      metric: 'cosine',
    },
    embeddingApiKey: OPENAI_API_KEY,
    checkExists: false
  });
  console.log(`* Created collection ${collection.keyspace}.${collection.collectionName}`);

  // Insert documents into the collection (using UUIDv7s)
  const documents = [
    {
      _id: new UUID('018e65c9-df45-7913-89f8-175f28bd7f74'),
      text: 'ChatGPT integrated sneakers that talk to you',
      $vectorize: 'Wild! How can they do that?',
    },
    {
      _id: new UUID('018e65c9-e1b7-7048-a593-db452be1e4c2'),
      text: 'An AI quilt to help you sleep forever',
      $vectorize: 'Sleep like a baby soft and cuddly',
    },
    {
      _id: new UUID('018e65c9-e33d-749b-9386-e848739582f0'),
      text: 'A deep learning display that controls your mood',
      $vectorize: 'I do not want my mood controlled!',
    },
  ];

  try {
    const inserted = await collection.insertMany(documents);
    console.log(`* Inserted ${inserted.insertedCount} items.`);
  } catch (e) {
    console.log('* Documents found on DB already. Let\'s move on!');
  }
})();
- Java
src/main/java/com/example/QuickStartDSE69.java
package com.example;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIOptions.DataAPIDestination;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.admin.DataAPIDatabaseAdmin;
import com.datastax.astra.client.model.CollectionOptions;
import com.datastax.astra.client.model.CommandOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.KeyspaceOptions;
import com.datastax.astra.client.model.SimilarityMetric;
import com.datastax.astra.internal.auth.UsernamePasswordTokenProvider;

import java.util.Optional;

import static com.datastax.astra.client.DataAPIClients.DEFAULT_ENDPOINT_LOCAL;
import static com.datastax.astra.client.DataAPIOptions.builder;
import static com.datastax.astra.client.model.Filters.eq;

public class QuickStartDSE69 {

    public static void main(String[] args) {

        // Database Settings
        String cassandraUserName = "cassandra";
        String cassandraPassword = "cassandra";
        String dataApiUrl = DEFAULT_ENDPOINT_LOCAL; // http://localhost:8181
        String databaseEnvironment = "DSE"; // DSE, HCD, or ASTRA
        String keyspaceName = "ks1";
        String collectionName = "lyrics";

        // Database settings if you export them as environment variables
        // String cassandraUserName = System.getenv("DB_USERNAME");
        // String cassandraPassword = System.getenv("DB_PASSWORD");
        // String dataApiUrl = System.getenv("DB_API_ENDPOINT");

        // OpenAI Embeddings
        String openAiProvider = "openai";
        String openAiKey = System.getenv("OPENAI_API_KEY"); // Need to export OPENAI_API_KEY
        String openAiModel = "text-embedding-3-small";
        int openAiEmbeddingDimension = 1536;

        // Build a token in the form of Cassandra:base64(username):base64(password)
        String token = new UsernamePasswordTokenProvider(cassandraUserName, cassandraPassword).getTokenAsString();
        System.out.println("1/7 - Creating Token: " + token);

        // Initialize the client (the environment name is converted to the destination enum)
        DataAPIClient client = new DataAPIClient(token,
                builder().withDestination(DataAPIDestination.valueOf(databaseEnvironment)).build());
        System.out.println("2/7 - Connected to Data API");

        // Create a default keyspace
        ((DataAPIDatabaseAdmin) client
                .getDatabase(dataApiUrl)
                .getDatabaseAdmin()).createKeyspace(keyspaceName, KeyspaceOptions.simpleStrategy(1));
        System.out.println("3/7 - Keyspace '" + keyspaceName + "' created");

        Database db = client.getDatabase(dataApiUrl, keyspaceName);
        System.out.println("4/7 - Connected to Database");

        // Create a collection
        Collection<Document> collectionLyrics = db.createCollection(collectionName, CollectionOptions.builder()
                .vectorDimension(5)
                .vectorSimilarity(SimilarityMetric.COSINE)
                .build());
        System.out.println("5/7 - Collection created");

        // Insert some documents
        collectionLyrics.insertMany(
                new Document(1).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("A lovestruck Romeo sings the streets a serenade"),
                new Document(2).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("Says something like, You and me babe, how about it?"),
                new Document(4).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("Juliet says,Hey, it's Romeo, you nearly gimme a heart attack"),
                new Document(5).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("He's underneath the window"),
                new Document(6).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("She's singing, Hey la, my boyfriend's back"),
                new Document(7).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("You shouldn't come around here singing up at people like that"),
                new Document(8).append("band", "Dire Straits").append("song", "Romeo And Juliet").vectorize("Anyway, what you gonna do about it?"));
        System.out.println("6/7 - Collection populated");
    }
}
Use the $vector method
The $vector method can be used if you generate embeddings before loading data.
- Python
QuickStartDSE69.py
import os

from astrapy import DataAPIClient
from astrapy.constants import Environment
from astrapy.authentication import UsernamePasswordTokenProvider
from astrapy.constants import VectorMetric
from astrapy.ids import UUID
from astrapy.exceptions import InsertManyException
from astrapy.info import CollectionVectorServiceOptions

# Database settings
DB_USERNAME = "cassandra"
DB_PASSWORD = "cassandra"
DB_API_ENDPOINT = "http://localhost:8181"
DB_KEYSPACE = "my_keyspace"
DB_COLLECTION = "vector_test"

# Database settings if you exported them as environment variables
# DB_USERNAME = os.environ.get("DB_USERNAME")
# DB_PASSWORD = os.environ.get("DB_PASSWORD")
# DB_API_ENDPOINT = os.environ.get("DB_API_ENDPOINT")

# Embedding provider settings
EMBEDDING_PROVIDER = "openai"
EMBEDDING_MODEL_NAME = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1024
# Reads the OPENAI_API_KEY variable exported earlier in this guide
EMBEDDING_API_KEY = os.environ.get("OPENAI_API_KEY")

# Build a token
tp = UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD)

# Initialize the client and get a "Database" object
client = DataAPIClient(environment=Environment.DSE)
database = client.get_database(DB_API_ENDPOINT, token=tp)
database.get_database_admin().create_keyspace(DB_KEYSPACE, update_db_keyspace=True)

# Create a collection. The default similarity metric is cosine. If you're not
# sure what dimension to set, use whatever dimension vector your embeddings
# model produces.
collection = database.create_collection(
    DB_COLLECTION,
    dimension=EMBEDDING_DIMENSIONS,
    metric=VectorMetric.COSINE,
    service={
        "provider": EMBEDDING_PROVIDER,
        "modelName": EMBEDDING_MODEL_NAME,
    },
    embedding_api_key=EMBEDDING_API_KEY,
    keyspace=DB_KEYSPACE,
    check_exists=False,
)
print(f"* Collection: {collection.full_name}\n")

# Insert documents into the collection.
# (UUIDs here are version 7.)
documents = [
    {
        "_id": UUID("018e65c9-df45-7913-89f8-175f28bd7f74"),
        "$vectorize": "Chat bot integrated sneakers that talk to you",
    },
    {
        "_id": UUID("018e65c9-e1b7-7048-a593-db452be1e4c2"),
        "$vectorize": "An AI quilt to help you sleep forever",
    },
    {
        "_id": UUID("018e65c9-e33d-749b-9386-e848739582f0"),
        "$vectorize": "A deep learning display that controls your mood",
    },
]
try:
    insertion_result = collection.insert_many(documents)
    print(f"* Inserted {len(insertion_result.inserted_ids)} items.\n")
except InsertManyException:
    print("* Documents found on DB already. Let's move on.\n")
- TypeScript
QuickStartDSE69.ts
import { DataAPIClient, UsernamePasswordTokenProvider, VectorDoc, UUID } from '@datastax/astra-db-ts';

// Database settings
const DB_USERNAME = "cassandra";
const DB_PASSWORD = "cassandra";
const DB_API_ENDPOINT = "http://localhost:8181";
const DB_ENVIRONMENT = "dse";
const DB_KEYSPACE = "cycling";

// Database settings if you exported them as environment variables
// const DB_USERNAME = process.env.DB_USERNAME;
// const DB_PASSWORD = process.env.DB_PASSWORD;
// const DB_API_ENDPOINT = process.env.DB_API_ENDPOINT;

// OpenAI settings
const OPEN_AI_PROVIDER = "openai";
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const MODEL_NAME = "text-embedding-3-small";

// Build a token in the required format
const tp = new UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD);

// Initialize the client and get a "Db" object
const client = new DataAPIClient({ environment: DB_ENVIRONMENT });
const db = client.db(DB_API_ENDPOINT, { token: tp });
const dbAdmin = db.admin({ environment: DB_ENVIRONMENT });

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  text: string,
}

(async function () {
  // Create a typed, vector-enabled collection. The default metric is cosine.
  // If you're not sure what dimension to set, use whatever dimension vector
  // your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    keyspace: DB_KEYSPACE,
    vector: {
      service: {
        provider: OPEN_AI_PROVIDER,
        modelName: MODEL_NAME
      },
      dimension: 5,
      metric: 'cosine',
    },
    embeddingApiKey: OPENAI_API_KEY,
    checkExists: false
  });
  console.log(`* Created collection ${collection.keyspace}.${collection.collectionName}`);

  // Insert documents into the collection (using UUIDv7s).
  // Each $vector is an array of numbers whose length matches the collection dimension.
  const documents = [
    {
      _id: new UUID('018e65c9-df45-7913-89f8-175f28bd7f74'),
      text: 'ChatGPT integrated sneakers that talk to you',
      $vector: [0.25, 0.25, 0.25, 0.25, 0.45],
    },
    {
      _id: new UUID('018e65c9-e1b7-7048-a593-db452be1e4c2'),
      text: 'An AI quilt to help you sleep forever',
      $vector: [0.10, 0.15, 0.25, 0.25, 0.15],
    },
    {
      _id: new UUID('018e65c9-e33d-749b-9386-e848739582f0'),
      text: 'A deep learning display that controls your mood',
      $vector: [0.1, 0.05, 0.08, 0.3, 0.6],
    },
  ];

  try {
    const inserted = await collection.insertMany(documents);
    console.log(`* Inserted ${inserted.insertedCount} items.`);
  } catch (e) {
    console.log('* Documents found on DB already. Let\'s move on!');
  }
})();
- Java
src/main/java/com/example/QuickStartDSE69.java
package com.example;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIOptions.DataAPIDestination;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.admin.DataAPIDatabaseAdmin;
import com.datastax.astra.client.model.CollectionOptions;
import com.datastax.astra.client.model.CommandOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.KeyspaceOptions;
import com.datastax.astra.client.model.SimilarityMetric;
import com.datastax.astra.internal.auth.UsernamePasswordTokenProvider;

import java.util.Optional;

import static com.datastax.astra.client.DataAPIClients.DEFAULT_ENDPOINT_LOCAL;
import static com.datastax.astra.client.DataAPIOptions.builder;
import static com.datastax.astra.client.model.Filters.eq;

public class QuickStartDSE69 {

    public static void main(String[] args) {

        // Database Settings
        String cassandraUserName = "cassandra";
        String cassandraPassword = "cassandra";
        String dataApiUrl = DEFAULT_ENDPOINT_LOCAL; // http://localhost:8181
        String databaseEnvironment = "DSE"; // DSE, HCD, or ASTRA
        String keyspaceName = "ks1";
        String collectionName = "lyrics";

        // Database settings if you export them as environment variables
        // String cassandraUserName = System.getenv("DB_USERNAME");
        // String cassandraPassword = System.getenv("DB_PASSWORD");
        // String dataApiUrl = System.getenv("DB_API_ENDPOINT");

        // OpenAI Embeddings
        String openAiProvider = "openai";
        String openAiKey = System.getenv("OPENAI_API_KEY"); // Need to export OPENAI_API_KEY
        String openAiModel = "text-embedding-3-small";
        int openAiEmbeddingDimension = 1536;

        // Build a token in the form of Cassandra:base64(username):base64(password)
        String token = new UsernamePasswordTokenProvider(cassandraUserName, cassandraPassword).getTokenAsString();
        System.out.println("1/7 - Creating Token: " + token);

        // Initialize the client (the environment name is converted to the destination enum)
        DataAPIClient client = new DataAPIClient(token,
                builder().withDestination(DataAPIDestination.valueOf(databaseEnvironment)).build());
        System.out.println("2/7 - Connected to Data API");

        // Create a default keyspace
        ((DataAPIDatabaseAdmin) client
                .getDatabase(dataApiUrl)
                .getDatabaseAdmin()).createKeyspace(keyspaceName, KeyspaceOptions.simpleStrategy(1));
        System.out.println("3/7 - Keyspace '" + keyspaceName + "' created");

        Database db = client.getDatabase(dataApiUrl, keyspaceName);
        System.out.println("4/7 - Connected to Database");

        // Create a collection
        Collection<Document> collectionLyrics = db.createCollection(collectionName, CollectionOptions.builder()
                .vectorDimension(5)
                .vectorSimilarity(SimilarityMetric.COSINE)
                .build());
        System.out.println("5/7 - Collection created");

        // Insert some documents
        collectionLyrics.insertMany(
                new Document("1")
                        .append("text", "ChatGPT integrated sneakers that talk to you")
                        .vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
                new Document("2")
                        .append("text", "An AI quilt to help you sleep forever")
                        .vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
                new Document("3")
                        .append("text", "A deep learning display that controls your mood")
                        .vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f}));
        System.out.println("6/7 - Collection populated");
    }
}
Perform a similarity search
Find documents that are close to a specific vector embedding.
The output is a sorted list of the documents you inserted.
The database sorts documents by their similarity to the query vector, most similar documents first.
The calculation uses cosine similarity by default.
This code drops the collection at the end of the script. If you want to keep the collection, delete or comment out this part of the code.
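To make the ranking described above concrete, here is a minimal, database-free sketch in plain Python. It computes cosine similarity between a query vector and the three sample vectors from the Java example, then sorts most similar first. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical and are not part of any client library. The database may report a differently scaled similarity score, so only the ordering is illustrated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return (score, text) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(query, vector), text) for text, vector in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Sample texts and vectors copied from the Java example on this page.
documents = [
    ("ChatGPT integrated sneakers that talk to you", [0.1, 0.15, 0.3, 0.12, 0.05]),
    ("An AI quilt to help you sleep forever", [0.45, 0.09, 0.01, 0.2, 0.11]),
    ("A deep learning display that controls your mood", [0.1, 0.05, 0.08, 0.3, 0.6]),
]
query = [0.15, 0.1, 0.1, 0.35, 0.55]

ranked = rank_by_similarity(query, documents)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

With these vectors, the "deep learning display" document ranks first because its vector points in nearly the same direction as the query.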
- Python
-
QuickStartDSE69.py
import os

from astrapy import DataAPIClient
from astrapy.authentication import UsernamePasswordTokenProvider
from astrapy.constants import Environment, VectorMetric
from astrapy.exceptions import InsertManyException
from astrapy.ids import UUID

# Database settings
DB_USERNAME = "cassandra"
DB_PASSWORD = "cassandra"
DB_API_ENDPOINT = "http://localhost:8181"
DB_KEYSPACE = "my_keyspace"
DB_COLLECTION = "vector_test"

# Database settings if you exported them as environment variables
# DB_USERNAME = os.environ.get("DB_USERNAME")
# DB_PASSWORD = os.environ.get("DB_PASSWORD")
# DB_API_ENDPOINT = os.environ.get("DB_API_ENDPOINT")

# Embedding provider settings
EMBEDDING_PROVIDER = "openai"
EMBEDDING_MODEL_NAME = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1024
EMBEDDING_API_KEY = os.environ.get("OPENAI_API_KEY")  # Export OPENAI_API_KEY first

# Build a token
tp = UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD)

# Initialize the client and get a "Database" object
client = DataAPIClient(environment=Environment.DSE)
database = client.get_database(DB_API_ENDPOINT, token=tp)
database.get_database_admin().create_keyspace(DB_KEYSPACE, update_db_keyspace=True)

# Create a collection. The default similarity metric is cosine. If you're not
# sure what dimension to set, use whatever dimension vector your embeddings
# model produces.
collection = database.create_collection(
    DB_COLLECTION,
    dimension=EMBEDDING_DIMENSIONS,
    metric=VectorMetric.COSINE,
    service={
        "provider": EMBEDDING_PROVIDER,
        "modelName": EMBEDDING_MODEL_NAME,
    },
    embedding_api_key=EMBEDDING_API_KEY,
    keyspace=DB_KEYSPACE,
    check_exists=False,
)
print(f"* Collection: {collection.full_name}\n")

# Insert documents into the collection.
# (UUIDs here are version 7.)
documents = [
    {
        "_id": UUID("018e65c9-df45-7913-89f8-175f28bd7f74"),
        "$vectorize": "Chat bot integrated sneakers that talk to you",
    },
    {
        "_id": UUID("018e65c9-e1b7-7048-a593-db452be1e4c2"),
        "$vectorize": "An AI quilt to help you sleep forever",
    },
    {
        "_id": UUID("018e65c9-e33d-749b-9386-e848739582f0"),
        "$vectorize": "A deep learning display that controls your mood",
    },
]
try:
    insertion_result = collection.insert_many(documents)
    print(f"* Inserted {len(insertion_result.inserted_ids)} items.\n")
except InsertManyException:
    print("* Documents found on DB already. Let's move on.\n")

# Perform a similarity search. The documents were embedded with $vectorize,
# so the query is text as well and is embedded by the same model.
# (The query string below is only an example.)
query = "I'd like some talking shoes"
results = collection.find(
    sort={"$vectorize": query},
    limit=10,
)
print("Vector search results:")
for document in results:
    print("    ", document)

# Cleanup (if desired)
collection.drop()
print("* Collection dropped.")
- TypeScript
-
QuickStartDSE69.ts
import { DataAPIClient, UsernamePasswordTokenProvider, VectorDoc, UUID } from '@datastax/astra-db-ts';

// Database settings
const DB_USERNAME = "cassandra";
const DB_PASSWORD = "cassandra";
const DB_API_ENDPOINT = "http://localhost:8181";
const DB_ENVIRONMENT = "dse";
const DB_KEYSPACE = "cycling";

// Database settings if you exported them as environment variables
// const DB_USERNAME = process.env.DB_USERNAME;
// const DB_PASSWORD = process.env.DB_PASSWORD;
// const DB_API_ENDPOINT = process.env.DB_API_ENDPOINT;

// OpenAI settings
const OPEN_AI_PROVIDER = "openai";
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const MODEL_NAME = "text-embedding-3-small";

// Build a token in the required format
const tp = new UsernamePasswordTokenProvider(DB_USERNAME, DB_PASSWORD);

// Initialize the client and get a "Db" object
const client = new DataAPIClient({ environment: DB_ENVIRONMENT });
const db = client.db(DB_API_ENDPOINT, { token: tp });
const dbAdmin = db.admin({ environment: DB_ENVIRONMENT });

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  text: string,
}

(async function () {
  // Create the keyspace used below
  await dbAdmin.createKeyspace(DB_KEYSPACE);

  // Create a typed, vector-enabled collection. The default metric is cosine.
  // If you're not sure what dimension to set, use whatever dimension vector
  // your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    keyspace: DB_KEYSPACE,
    vector: {
      service: {
        provider: OPEN_AI_PROVIDER,
        modelName: MODEL_NAME
      },
      dimension: 5,
      metric: 'cosine',
    },
    embeddingApiKey: OPENAI_API_KEY,
    checkExists: false
  });
  console.log(`* Created collection ${collection.keyspace}.${collection.collectionName}`);

  // Insert documents into the collection (using UUIDv7s)
  const documents = [
    {
      _id: new UUID('018e65c9-df45-7913-89f8-175f28bd7f74'),
      text: 'ChatGPT integrated sneakers that talk to you',
      $vector: [0.25, 0.25, 0.25, 0.25, 0.45],
    },
    {
      _id: new UUID('018e65c9-e1b7-7048-a593-db452be1e4c2'),
      text: 'An AI quilt to help you sleep forever',
      $vector: [0.10, 0.15, 0.25, 0.25, 0.15],
    },
    {
      _id: new UUID('018e65c9-e33d-749b-9386-e848739582f0'),
      text: 'A deep learning display that controls your mood',
      // $vector must be an array of numbers; this one matches the Java example
      $vector: [0.1, 0.05, 0.08, 0.3, 0.6],
    },
  ];

  try {
    const inserted = await collection.insertMany(documents);
    console.log(`* Inserted ${inserted.insertedCount} items.`);
  } catch (e) {
    console.log('* Documents found on DB already. Let\'s move on!');
  }

  // Perform a similarity search
  const cursor = await collection.find({}, {
    vector: [0.15, 0.1, 0.1, 0.35, 0.55],
    limit: 10,
    includeSimilarity: true,
  });

  console.log('* Search results:');
  for await (const doc of cursor) {
    console.log('  ', doc.text, doc.$similarity);
  }

  // Cleanup (if desired)
  await db.dropCollection('vector_test');
  console.log('* Collection dropped.');

  // Close the client
  await client.close();
})();
- Java
-
src/main/java/com/example/QuickStartDSE69.java
package com.example;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIOptions.DataAPIDestination;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.admin.DataAPIDatabaseAdmin;
import com.datastax.astra.client.model.CollectionOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.KeyspaceOptions;
import com.datastax.astra.client.model.SimilarityMetric;
import com.datastax.astra.internal.auth.UsernamePasswordTokenProvider;

import static com.datastax.astra.client.DataAPIClients.DEFAULT_ENDPOINT_LOCAL;
import static com.datastax.astra.client.DataAPIOptions.builder;

public class QuickStartDSE69 {

    public static void main(String[] args) {

        // Database settings
        String cassandraUserName = "cassandra";
        String cassandraPassword = "cassandra";
        String dataApiUrl = DEFAULT_ENDPOINT_LOCAL; // http://localhost:8181
        DataAPIDestination databaseEnvironment = DataAPIDestination.DSE; // DSE, HCD, or ASTRA
        String keyspaceName = "ks1";
        String collectionName = "lyrics";

        // Database settings if you export them as environment variables
        // String cassandraUserName = System.getenv("DB_USERNAME");
        // String cassandraPassword = System.getenv("DB_PASSWORD");
        // String dataApiUrl = System.getenv("DB_API_ENDPOINT");

        // OpenAI embeddings (not used here: this example inserts precomputed vectors)
        String openAiProvider = "openai";
        String openAiKey = System.getenv("OPENAI_API_KEY"); // Need to export OPENAI_API_KEY
        String openAiModel = "text-embedding-3-small";
        int openAiEmbeddingDimension = 1536;

        // Build a token in the form of Cassandra:base64(username):base64(password)
        String token = new UsernamePasswordTokenProvider(cassandraUserName, cassandraPassword).getTokenAsString();
        System.out.println("1/7 - Creating Token: " + token);

        // Initialize the client
        DataAPIClient client = new DataAPIClient(token, builder().withDestination(databaseEnvironment).build());
        System.out.println("2/7 - Connected to Data API");

        // Create the keyspace (this step assumes createKeyspace and
        // KeyspaceOptions.simpleStrategy from the classes imported above)
        ((DataAPIDatabaseAdmin) client.getDatabase(dataApiUrl).getDatabaseAdmin())
                .createKeyspace(keyspaceName, KeyspaceOptions.simpleStrategy(1));
        System.out.println("3/7 - Keyspace '" + keyspaceName + "' created");

        // Connect to the database in that keyspace
        Database db = client.getDatabase(dataApiUrl, keyspaceName);
        System.out.println("4/7 - Connected to Database");

        // Create a collection
        Collection<Document> collection = db.createCollection(collectionName, CollectionOptions.builder()
                .vectorDimension(5)
                .vectorSimilarity(SimilarityMetric.COSINE)
                .build());
        System.out.println("5/7 - Collection created");

        // Insert some documents
        collection.insertMany(
                new Document("1")
                        .append("text", "ChatGPT integrated sneakers that talk to you")
                        .vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
                new Document("2")
                        .append("text", "An AI quilt to help you sleep forever")
                        .vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
                new Document("3")
                        .append("text", "A deep learning display that controls your mood")
                        .vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f}));
        System.out.println("6/7 - Collection populated");

        // Perform a similarity search: up to 10 documents, most similar first
        FindIterable<Document> resultsSet = collection.find(
                new float[]{0.15f, 0.1f, 0.1f, 0.35f, 0.55f},
                10
        );
        resultsSet.forEach(System.out::println);
        System.out.println("7/7 - Similarity search complete");

        // Cleanup (if desired)
        collection.drop();
        System.out.println("Deleted the collection");
    }
}
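The comment in the Java example describes the token format as `Cassandra:base64(username):base64(password)`. As a rough illustration of that format only, the following Python sketch builds such a string with the standard library. The function name `build_token` is hypothetical. In real code, use the `UsernamePasswordTokenProvider` class from your client library instead of assembling the token by hand.

```python
import base64


def build_token(username: str, password: str) -> str:
    """Assemble a token in the form Cassandra:base64(username):base64(password)."""

    def b64(value: str) -> str:
        # Standard Base64 of the UTF-8 bytes, returned as text.
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return f"Cassandra:{b64(username)}:{b64(password)}"


# Default superuser credentials from this quickstart.
token = build_token("cassandra", "cassandra")
print(token)
```

Base64 is an encoding, not encryption, so a token like this exposes the credentials to anyone who can read it. Avoid printing or logging it outside of local exploration.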
Run the code
Run the script:
- Python
-
python QuickStartDSE69.py
- TypeScript
-
npm
npx tsx QuickStartDSE69.ts
Yarn
yarn dlx tsx QuickStartDSE69.ts
- Java
-
Maven
mvn clean compile
export OPENAI_API_KEY=<your-api-key>
mvn exec:java -Dexec.mainClass="com.example.QuickStartDSE69"
Gradle
gradle build
gradle run
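Because the scripts read their settings from exported environment variables, a missing `OPENAI_API_KEY` tends to surface only as a failed request. The sketch below shows one way to check settings up front in Python. The `load_settings` helper and the `DEFAULTS` mapping are hypothetical, and the fallback values are the local defaults used on this page.

```python
import os

# Local defaults taken from this quickstart; override them by exporting variables.
DEFAULTS = {
    "DB_USERNAME": "cassandra",
    "DB_PASSWORD": "cassandra",
    "DB_API_ENDPOINT": "http://localhost:8181",
}


def load_settings(env=None):
    """Read connection settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}

    # The embedding key has no safe default, so fail early with a clear message.
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running the script")
    settings["OPENAI_API_KEY"] = api_key
    return settings


# Demonstration with an explicit mapping instead of the real environment.
demo = load_settings({"OPENAI_API_KEY": "example-key"})
print(demo["DB_API_ENDPOINT"])
```

Calling `load_settings()` with no argument reads the real environment, so the same function works unchanged in the quickstart script.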