Quickstart
Learn how to create a Serverless (Vector) database, connect to your database, load a few documents with vector embeddings, and perform a similarity search to find documents that are close to the query vector.
Create a Serverless (Vector) database
-
Create an Astra account or sign in to an existing Astra account.
-
In the Astra Portal, select Databases in the main navigation.
-
Click Create Database.
-
In the Create Database dialog, select the Serverless (Vector) deployment type.
-
In Configuration, enter a meaningful, human-readable Database name that meets the following requirements:
-
Starts and ends with a letter or number.
-
Contains only letters, numbers, and the following special characters:
& + - _ ( ) < > . , @
After you create a database, you can’t change its name.
-
Select your preferred Provider and Region.
You can select from a limited number of regions if you’re on the Free plan. Regions with a lock icon require that you upgrade to a Pay As You Go plan.
-
Click Create Database.
New databases start in Pending status, and then move to Initializing. Your database is ready to use when it reaches Active status.
-
Make sure the database is in Active status, and then, in the Database Details section, click Generate Token.
-
In the Application Token dialog, click Copy, and then store the token securely. The token format is AstraCS: followed by a unique token string.
Application tokens created from Database Details have the Database Administrator role for the associated database.
-
In Database Details, copy your database’s API endpoint. The endpoint format is https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com.
-
In your terminal, assign your token and API endpoint to environment variables.
-
Linux or macOS
export ASTRA_DB_API_ENDPOINT=API_ENDPOINT # Your database API endpoint
export ASTRA_DB_APPLICATION_TOKEN=TOKEN # Your database application token
-
Windows
rem Replace API_ENDPOINT with your database API endpoint and TOKEN with your database application token.
rem Keep comments on their own lines: in cmd, text after the value on a set line becomes part of the value.
set ASTRA_DB_API_ENDPOINT=API_ENDPOINT
set ASTRA_DB_APPLICATION_TOKEN=TOKEN
-
Google Colab
import os
os.environ["ASTRA_DB_API_ENDPOINT"] = "API_ENDPOINT" # Your database API endpoint
os.environ["ASTRA_DB_APPLICATION_TOKEN"] = "TOKEN" # Your database application token
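The naming rules and the token and endpoint formats described in this section can be checked locally before you run any client code. The following is a minimal sketch, not an official validator: the function names are hypothetical, "letters" is assumed to mean ASCII letters, and the patterns only test the general shapes given above (the Astra Portal may enforce additional limits, such as length).

```python
import os
import re

# Hypothetical helpers. The patterns mirror only the rules stated in this section.
# Assumption: "letters" means ASCII letters.
_NAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9&+\-_()<>.,@]*[A-Za-z0-9])?")
_ENDPOINT_RE = re.compile(r"https://[A-Za-z0-9-]+\.apps\.astra\.datastax\.com/?")


def is_valid_database_name(name: str) -> bool:
    """Starts and ends with a letter or number; only the allowed characters between."""
    return _NAME_RE.fullmatch(name) is not None


def looks_like_token(token: str) -> bool:
    """The token is 'AstraCS:' followed by a non-empty token string."""
    prefix = "AstraCS:"
    return token.startswith(prefix) and len(token) > len(prefix)


def looks_like_endpoint(endpoint: str) -> bool:
    """The endpoint has the shape https://<id>-<region>.apps.astra.datastax.com."""
    return _ENDPOINT_RE.fullmatch(endpoint) is not None


def check_environment(env=os.environ) -> list:
    """Return a list of human-readable problems; an empty list means both values look plausible."""
    problems = []
    if not looks_like_token(env.get("ASTRA_DB_APPLICATION_TOKEN", "")):
        problems.append("ASTRA_DB_APPLICATION_TOKEN is missing or does not start with 'AstraCS:'")
    if not looks_like_endpoint(env.get("ASTRA_DB_API_ENDPOINT", "")):
        problems.append("ASTRA_DB_API_ENDPOINT is missing or does not match the expected format")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Warning:", problem)
```

A passing check only means the values have the right shape. It does not confirm that the token is valid or that the database is in Active status.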
Install a client
You can interact with Astra DB Serverless in the Astra Portal or programmatically. This tutorial uses the DataStax Python, TypeScript, and Java clients. For more information, see Connection methods comparison and Get started with the Data API.
Use a package manager to install the client library for your preferred language.
-
Python
-
TypeScript
-
Java
Install the Python client with pip:
-
Verify that pip is version 23.0 or higher:
pip --version
-
If needed, upgrade pip:
python -m pip install --upgrade pip
-
Install the astrapy package. You must have Python 3.8 or higher.
pip install astrapy
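After installation, you can confirm from Python itself that the interpreter meets the 3.8 minimum and that the astrapy package is visible to it. This is a small sketch using only the standard library; the helper names are hypothetical.

```python
import sys
from importlib import metadata
from typing import Optional, Sequence


def python_is_supported(version_info: Sequence[int] = sys.version_info,
                        minimum: tuple = (3, 8)) -> bool:
    """Compare the (major, minor) part of a version against the minimum."""
    return tuple(version_info[:2]) >= minimum


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("astrapy version:", installed_version("astrapy") or "not installed")
```

If the second line reports that astrapy is not installed, the pip you ran may belong to a different Python environment than the interpreter running this check.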
Install the TypeScript client:
-
Verify that Node is version 18 or higher.
node --version
-
Install the TypeScript client with your preferred package manager:
-
npm
-
Yarn
-
pnpm
npm install @datastax/astra-db-ts
yarn add @datastax/astra-db-ts
pnpm add @datastax/astra-db-ts
-
Install the Java client with Maven or Gradle.
-
Maven
-
Gradle
-
Install Java 11+ and Maven 3.9+.
-
Create a pom.xml file in the root of your project, and then replace VERSION with the latest version of the Java client.
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>test-java-client</artifactId>
  <version>1.0-SNAPSHOT</version>
  <!-- The Java client -->
  <dependencies>
    <dependency>
      <groupId>com.datastax.astra</groupId>
      <artifactId>astra-db-java</artifactId>
      <version>VERSION</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>3.0.0</version>
        <configuration>
          <executable>java</executable>
          <mainClass>com.example.Quickstart</mainClass>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>java</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>11</source>
          <target>11</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
-
Install Java 11+ and Gradle.
-
Create a build.gradle file in the root of your project:
build.gradle
plugins {
    id 'java'
    id 'application'
}

repositories {
    mavenCentral()
}

dependencies {
    implementation 'com.datastax.astra:astra-db-java:1.+'
}

application {
    mainClassName = 'com.example.Quickstart'
}
Create a script
Copy the following quickstart script to a Python, TypeScript, or Java file.
-
Python
-
TypeScript
-
Java
To avoid a namespace collision, don’t name your Python client script files astrapy.py.
import os

from astrapy import DataAPIClient
from astrapy.constants import VectorMetric

# Initialize the client and get a "Database" object
client = DataAPIClient(os.environ["ASTRA_DB_APPLICATION_TOKEN"])
database = client.get_database(os.environ["ASTRA_DB_API_ENDPOINT"])
print(f"* Database: {database.info().name}\n")

# Create a collection. The default similarity metric is cosine.
# Choose dimensions that match your vector data.
# If you're not sure, use the vector dimension that your embeddings model produces.
collection = database.create_collection(
    "vector_test",
    dimension=5,
    metric=VectorMetric.COSINE,  # Or just 'cosine'.
    check_exists=False,  # Optional.
)
print(f"* Collection: {collection.full_name}\n")

# Insert documents with embeddings into the collection.
documents = [
    {
        "text": "Chat bot integrated sneakers that talk to you",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
        "text": "An AI quilt to help you sleep forever",
        "$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
        "text": "A deep learning display that controls your mood",
        "$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
    },
]
insertion_result = collection.insert_many(documents)
print(f"* Inserted {len(insertion_result.inserted_ids)} items.\n")

# Perform a similarity search.
query_vector = [0.15, 0.1, 0.1, 0.35, 0.55]
results = collection.find(
    sort={"$vector": query_vector},
    limit=10,
    include_similarity=True,
)
print("Vector search results:")
for document in results:
    print(" ", document)
import { DataAPIClient, VectorDoc, UUID } from '@datastax/astra-db-ts';

const { ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT } = process.env;

// Initialize the client and get a "Db" object
const client = new DataAPIClient(ASTRA_DB_APPLICATION_TOKEN);
const db = client.db(ASTRA_DB_API_ENDPOINT);
console.log(`* Connected to DB ${db.id}`);

// Schema for the collection (VectorDoc adds the $vector field)
interface Idea extends VectorDoc {
  idea: string,
}

(async function () {
  // Create a collection. The default similarity metric is cosine.
  // Choose dimensions that match your vector data.
  // If you're not sure, use the vector dimension that your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    vector: {
      dimension: 5,
      metric: 'cosine',
    },
    checkExists: false, // Optional
  });
  console.log(`* Created collection ${collection.namespace}.${collection.collectionName}`);

  // Insert documents with embeddings into the collection.
  const documents = [
    {
      idea: 'Chat bot integrated sneakers that talk to you',
      $vector: [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
      idea: 'An AI quilt to help you sleep forever',
      $vector: [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
      idea: 'A deep learning display that controls your mood',
      $vector: [0.1, 0.05, 0.08, 0.3, 0.6],
    },
  ];
  const inserted = await collection.insertMany(documents);
  console.log(`* Inserted ${inserted.insertedCount} items.`);

  // Perform a similarity search
  const cursor = await collection.find({}, {
    sort: { $vector: [0.15, 0.1, 0.1, 0.35, 0.55] },
    limit: 10,
    includeSimilarity: true,
  });
  console.log('* Search results:');
  for await (const doc of cursor) {
    console.log(' ', doc.idea, doc.$similarity);
  }
})();
// The build files set the main class to com.example.Quickstart,
// so the class must be declared in the com.example package.
package com.example;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.SimilarityMetric;

public class Quickstart {

  public static void main(String[] args) {
    // Load the token and API endpoint from environment variables
    String astraToken = System.getenv("ASTRA_DB_APPLICATION_TOKEN");
    String astraApiEndpoint = System.getenv("ASTRA_DB_API_ENDPOINT");

    // Initialize the client
    DataAPIClient client = new DataAPIClient(astraToken);
    System.out.println("Connected to AstraDB");
    Database db = client.getDatabase(astraApiEndpoint);
    System.out.println("Connected to Database.");

    // Create a collection. The default similarity metric is cosine.
    // Choose dimensions that match your vector data.
    // If you're not sure, use the vector dimension that your embeddings model produces.
    Collection<Document> collection = db
        .createCollection("vector_test", 5, SimilarityMetric.COSINE);
    System.out.println("Created a collection");

    // Insert documents with embeddings into the collection
    collection.insertMany(
        new Document("1")
            .append("text", "Chat bot integrated sneakers that talk to you")
            .vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
        new Document("2")
            .append("text", "An AI quilt to help you sleep forever")
            .vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
        new Document("3")
            .append("text", "A deep learning display that controls your mood")
            .vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f}));
    System.out.println("Inserted documents into the collection");

    // Perform a similarity search
    FindIterable<Document> resultsSet = collection.find(
        new float[]{0.15f, 0.1f, 0.1f, 0.35f, 0.55f},
        10
    );
    resultsSet.forEach(System.out::println);
  }
}
Explore the script
The quickstart script does the following:
-
Initializes the client and connects to your database. For more information, see Get started with the Data API.
-
Creates a collection named vector_test that uses the default similarity metric cosine and a dimensionality of 5. For more information about collection settings, see Manage collections and tables.
-
Loads documents (structured vector data) into the collection with pre-generated vector embeddings. With Astra DB, you can bring your own embeddings or use vectorize to automatically generate embeddings. For more information about loading data and supported data types, see Load your data.
-
Performs a similarity search to find documents that are close to the specified query vector. This search returns a list of documents sorted by their similarity to the query vector with the most similar documents first. The calculation uses the similarity metric specified in the collection settings. For more information, see Perform a vector search.
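To see what the similarity search computes, the ranking can be reproduced locally in plain Python with the sample vectors from the quickstart script. This is an illustrative sketch of raw cosine similarity, not the database's implementation: the function names are hypothetical, and the similarity values the service reports may be scaled differently, so compare the ordering rather than the exact numbers.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Raw cosine similarity; both vectors must have the collection's dimension."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       docs: List[Dict]) -> List[Tuple[str, float]]:
    """Return (text, score) pairs, most similar first."""
    scored = [(doc["text"], cosine_similarity(query, doc["$vector"])) for doc in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Same sample data as the quickstart script.
documents = [
    {"text": "Chat bot integrated sneakers that talk to you",
     "$vector": [0.1, 0.15, 0.3, 0.12, 0.05]},
    {"text": "An AI quilt to help you sleep forever",
     "$vector": [0.45, 0.09, 0.01, 0.2, 0.11]},
    {"text": "A deep learning display that controls your mood",
     "$vector": [0.1, 0.05, 0.08, 0.3, 0.6]},
]
query_vector = [0.15, 0.1, 0.1, 0.35, 0.55]

if __name__ == "__main__":
    for text, score in rank_by_similarity(query_vector, documents):
        print(f"{score:.3f}  {text}")
```

With this data, the deep learning display document ranks first (raw cosine of about 0.989), followed by the AI quilt (about 0.593) and the sneakers (about 0.507). The length check also shows why the collection dimension must match your vectors: a vector of any other length cannot be compared.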
Run the script
Run the quickstart script.
-
Python
-
TypeScript
-
Java
python quickstart.py
-
npm
-
Yarn
-
pnpm
npx tsx quickstart.ts
yarn dlx tsx quickstart.ts
pnpm dlx tsx quickstart.ts
-
Maven
-
Gradle
mvn clean compile
mvn exec:java -Dexec.mainClass="com.example.Quickstart"
gradle build
gradle run
Next steps
-
In the Astra Portal, use the Data Explorer to view your new collection and documents.
-
Modify and rerun the script to load more documents.
The Python and TypeScript quickstart scripts don't set document IDs, so IDs are assigned automatically; if you rerun either script, you load the same documents again with different IDs. The Java script sets explicit IDs ("1", "2", and "3"), so a rerun attempts to insert documents whose IDs already exist. To avoid returning trivial search results with literal exact matches, modify the script to load different documents, delete the collection, or delete documents from the collection.
-
Modify the vector search portion of the script. For more information, see Perform a vector search and Find documents using filtering options.
For example, if you want the script to return specific information about the queried documents, you can add the projection option. This is useful if you use vectorize and you want the output to include the $vectorize text or the generated embedding.
-
Explore the Data API reference to learn more about interacting with your Serverless (Vector) database programmatically.
-
Learn how to use Astra DB as the vector store for your AI applications through integration guides, code examples, and tutorials.
-
If you want to clean up the resources you created in this tutorial, see Terminate a database.
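The note about rerunning the script follows from document IDs being generated at insert time. One way to make reruns repeatable is to derive each document's _id from its content, so that the same document always maps to the same ID. The sketch below is an illustration under assumptions, not part of the official quickstart: with_stable_ids is a hypothetical helper, and it assumes that _id is the document identifier field and that the database rejects, rather than duplicates, an insert whose _id already exists. Confirm both in the Data API reference before relying on this.

```python
import hashlib
from typing import Dict, List


def with_stable_ids(documents: List[Dict], key: str = "text") -> List[Dict]:
    """Return copies of the documents with a deterministic "_id" derived from doc[key].

    Hypothetical helper: the ID is the SHA-256 hex digest of the chosen field,
    so identical content always produces the same ID across runs.
    """
    result = []
    for doc in documents:
        digest = hashlib.sha256(doc[key].encode("utf-8")).hexdigest()
        result.append({**doc, "_id": digest})  # Copy; the input is not modified.
    return result


if __name__ == "__main__":
    sample = [
        {"text": "Chat bot integrated sneakers that talk to you",
         "$vector": [0.1, 0.15, 0.3, 0.12, 0.05]},
        {"text": "An AI quilt to help you sleep forever",
         "$vector": [0.45, 0.09, 0.01, 0.2, 0.11]},
    ]
    for doc in with_stable_ids(sample):
        print(doc["_id"][:12], doc["text"])
```

You would pass the returned list to the insert call in place of the original documents. Hashing the text field is only one choice; any field, or combination of fields, that uniquely identifies a document would work.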