Quickstart
The DataStax Astra DB Serverless (Vector) documentation site is currently in Public Preview and is provided on an “AS IS” basis, without warranty or indemnity of any kind. For more, see the DataStax Preview Terms.
Objective
Learn how to create a new database, connect to your database, load a set of vector embeddings, and perform a similarity search to find vectors that are close to the one in your query.
Create a vector database
-
Create an Astra account or sign in to an existing Astra account.
-
In the Astra Portal, select Databases in the main navigation.
-
Click Create Database.
-
In the Create Database dialog, select the Serverless (Vector) deployment type.
-
In the Configuration section, enter a name for the new database in the Database name field.
Database names can’t be changed later, so choose a meaningful name. Database names must start and end with an alphanumeric character, and may contain only the following special characters: & + - _ ( ) < > . , @
-
Select your preferred Provider and Region.
You can select from a limited number of regions if you’re on the Free plan. Regions with a lock icon require that you upgrade to a Pay As You Go plan.
Not all regions may be available. If you don’t see your preferred region listed, please submit a support ticket or send us a message using our live chat in the bottom right of the Astra Portal.
-
Click Create Database.
You are redirected to your new database’s Overview screen. Your database starts in Pending status before transitioning to Initializing. You’ll receive a notification once your database is initialized.
-
Generate an application token.
Once your database is initialized, go to your database’s Overview screen. Under Database Details > Application Tokens, click Generate Token. In the Application Token dialog, click the clipboard icon to copy the token (e.g. AstraCS:WSnyFUhRxsrg…). Make sure to store the token in a secure location before closing the dialog. Your token is automatically assigned the Database Administrator role.
-
Copy your database’s API endpoint, located under Database Details > API Endpoint (e.g. https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com).
-
Assign your token and API endpoint to environment variables in your terminal.
-
Linux or macOS
export ASTRA_DB_API_ENDPOINT="<Astra DB API endpoint>"
export ASTRA_DB_APPLICATION_TOKEN="<AstraCS:...>"
-
Windows
set ASTRA_DB_API_ENDPOINT=<Astra DB API endpoint>
set ASTRA_DB_APPLICATION_TOKEN=<AstraCS:...>
-
Google Colab
import os
os.environ["ASTRA_DB_API_ENDPOINT"] = "<Astra DB API endpoint>"
os.environ["ASTRA_DB_APPLICATION_TOKEN"] = "<AstraCS:...>"
In each case, replace <Astra DB API endpoint> with your API endpoint and <AstraCS:...> with your application token.
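The naming rule and the credential formats described above can be sketched as a small preflight check. This is a minimal illustration, not part of any Astra client: the helper names `is_valid_database_name` and `find_credential_problems` are hypothetical, and the patterns rest on assumptions (ASCII alphanumerics only, no spaces in names, tokens starting with `AstraCS:`, endpoints shaped like the example above). The service may accept or reject values this sketch does not anticipate.

```python
import re
from typing import List, Mapping

# Assumption: "alphanumeric" means ASCII letters and digits, and spaces are not allowed.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9&+\-_()<>.,@]*[A-Za-z0-9])?")

# Assumption: endpoints follow the example shape
# https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com
_ENDPOINT_PATTERN = re.compile(r"https://[^/\s]+\.apps\.astra\.datastax\.com/?")


def is_valid_database_name(name: str) -> bool:
    """Return True if the name starts and ends with an alphanumeric character
    and otherwise uses only the listed special characters."""
    return _NAME_PATTERN.fullmatch(name) is not None


def find_credential_problems(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the
    two variables look plausible (it does not prove they are valid)."""
    problems = []
    token = env.get("ASTRA_DB_APPLICATION_TOKEN", "")
    endpoint = env.get("ASTRA_DB_API_ENDPOINT", "")
    if not token.startswith("AstraCS:"):
        problems.append(
            "ASTRA_DB_APPLICATION_TOKEN is missing or does not start with 'AstraCS:'"
        )
    if _ENDPOINT_PATTERN.fullmatch(endpoint) is None:
        problems.append(
            "ASTRA_DB_API_ENDPOINT is missing or does not look like an Astra API endpoint"
        )
    return problems


if __name__ == "__main__":
    import os

    # Report anything suspicious in the current environment.
    for problem in find_credential_problems(os.environ):
        print(problem)
```

Running the check before connecting catches the most common mistake, which is assigning the token and the endpoint to each other's variables.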
Set up the client
Install the library for the language and package manager you’re using.
-
Python
-
TypeScript
-
Java
-
Verify that pip is version 23.0 or higher.
pip --version
-
Upgrade pip if needed.
python -m pip install --upgrade pip
-
Install the AstraPy package. You must have Python 3.7 or higher.
pip install astrapy
-
Verify that Node is version 14 or higher.
node --version
-
Install the tsx package globally.
npm install -g tsx
-
Install the astra-db-ts package.
npm install @datastax/astra-db-ts@latest
Requires Java 11+ and Maven 3.9+ (or Gradle).
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example</groupId>
<artifactId>test-java-client</artifactId>
<version>1.0-SNAPSHOT</version>
<!-- The Java client -->
<dependencies>
<dependency>
<groupId>com.datastax.astra</groupId>
<artifactId>astra-db-client</artifactId>
<version>1.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>3.0.0</version>
<configuration>
<executable>java</executable>
<mainClass>com.example.Quickstart</mainClass>
</configuration>
<executions>
<execution>
<goals>
<goal>java</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>11</source>
<target>11</target>
</configuration>
</plugin>
</plugins>
</build>
</project>
plugins {
id 'java'
id 'application'
}
repositories {
mavenCentral()
}
dependencies {
implementation 'com.datastax.astra:astra-db-client:1.0'
}
application {
mainClassName = 'com.example.Quickstart'
}
Paste the following code into a new file on your computer.
-
Python
-
TypeScript
-
Java
import os
from astrapy.db import AstraDB
# Initialization
db = AstraDB(
token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
)
# Create collection
col = db.create_collection("vector_test", dimension=5, metric="cosine")
Don’t name the file |
import { AstraDB } from "@datastax/astra-db-ts";
async function main() {
const { ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT } = process.env;
const db = new AstraDB(ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT);
// Create a collection
await db.createCollection(
"vector_test",
{
"vector": {
"dimension": 5,
"metric": "cosine"
}
}
);
const col = await db.collection("vector_test");
// ...
package com.example;
import com.dtsx.astra.sdk.AstraDB;
import io.stargate.sdk.json.CollectionClient;
import io.stargate.sdk.json.domain.JsonDocument;
import io.stargate.sdk.json.domain.JsonResult;
import io.stargate.sdk.json.domain.SimilarityMetric;
import io.stargate.sdk.json.domain.CollectionDefinition;
import java.util.List;
public class Quickstart {
public static void main(String[] args) {
// Loading Arguments
String astraToken = System.getenv("ASTRA_DB_APPLICATION_TOKEN");
String astraApiEndpoint = System.getenv("ASTRA_DB_API_ENDPOINT");
// Initialization
AstraDB db = new AstraDB(astraToken, astraApiEndpoint);
// Create a collection
CollectionDefinition colDefinition = CollectionDefinition.builder()
.name("vector_test")
.vector(5, SimilarityMetric.cosine)
.build();
db.createCollection(colDefinition);
CollectionClient col = db.collection("vector_test");
// ...
Load vector embeddings
Insert a few documents with embeddings into the vector database.
-
Python
-
TypeScript
-
Java
documents = [
{
"_id": "1",
"text": "ChatGPT integrated sneakers that talk to you",
"$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
},
{
"_id": "2",
"text": "An AI quilt to help you sleep forever",
"$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
},
{
"_id": "3",
"text": "A deep learning display that controls your mood",
"$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
},
]
res = col.insert_many(documents)
// ...
const documents = [
{
"_id": "1",
"text": "ChatGPT integrated sneakers that talk to you",
"$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
},
{
"_id": "2",
"text": "An AI quilt to help you sleep forever",
"$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
},
{
"_id": "3",
"text": "A deep learning display that controls your mood",
"$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
}
];
const results = await col.insertMany(documents);
// ...
// ...
col.insertMany(List.of(
new JsonDocument()
.id("1")
.put("text", "ChatGPT integrated sneakers that talk to you")
.vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
new JsonDocument()
.id("2")
.put("text", "An AI quilt to help you sleep forever")
.vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
new JsonDocument()
.id("3")
.put("text", "A deep learning display that controls your mood")
.vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f})
));
// ...
Use the Data Explorer to inspect your loaded data.
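Because the collection above is created with a dimension of 5, every embedding you insert needs exactly five numeric values. The following sketch checks a batch of documents locally before insertion. The helper `find_vector_problems` is hypothetical and is not part of the AstraPy client; it only inspects the `$vector` field used in the examples above.

```python
from numbers import Real
from typing import Any, Dict, List


def find_vector_problems(documents: List[Dict[str, Any]], dimension: int) -> List[str]:
    """Return one message per document whose $vector is missing,
    has the wrong length, or contains non-numeric values."""
    problems = []
    for index, doc in enumerate(documents):
        label = doc.get("_id", f"index {index}")
        vector = doc.get("$vector")
        if vector is None:
            problems.append(f"document {label}: missing $vector")
        elif len(vector) != dimension:
            problems.append(
                f"document {label}: expected {dimension} values, got {len(vector)}"
            )
        elif not all(isinstance(x, Real) and not isinstance(x, bool) for x in vector):
            problems.append(f"document {label}: $vector contains non-numeric values")
    return problems


if __name__ == "__main__":
    sample = [
        {"_id": "1", "$vector": [0.1, 0.15, 0.3, 0.12, 0.05]},
        {"_id": "2", "$vector": [0.45, 0.09, 0.01, 0.2]},  # one value short
    ]
    for problem in find_vector_problems(sample, dimension=5):
        print(problem)
```

A check like this reports every malformed document in one pass, rather than stopping at the first one the server rejects.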
Perform a similarity search
Find documents that are close to a specific vector embedding.
-
Python
-
TypeScript
-
Java
query = [0.15, 0.1, 0.1, 0.35, 0.55]
results = col.vector_find(query, limit=2, fields=["text", "$vector"])
for document in results:
print(document)
// ...
interface Document {
_id: string;
text: string;
$vector: number[];
}
const options = {
sort: {
"$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
},
limit: 5
};
const document_list = await col.find({}, options).toArray();
document_list.forEach((doc: Document) => console.log(doc));
}
main().catch(console.error);
// ...
List<JsonResult> resultsSet = col.similaritySearch(
new float[]{0.15f, 0.1f, 0.1f, 0.35f, 0.55f},
10
);
resultsSet.stream().forEach(System.out::println);
}
}
Run the code
Run the code you defined above.
-
Python
-
TypeScript
-
Java
python quickstart.py
tsx quickstart.ts
mvn clean compile
mvn exec:java -Dexec.mainClass="com.example.Quickstart"
gradle build
gradle run
The output lists the inserted documents closest to the query vector, up to the limit set in the search, with the most similar documents first. The calculation uses cosine similarity by default, which is also the metric set when the collection was created.
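To see why the results come back in that order, the ranking can be reproduced locally with plain cosine similarity. This sketch uses only the Python standard library and the sample vectors from this page; it does not call the database, and the score the database reports alongside each document may be scaled differently from the raw cosine value computed here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# The three embeddings inserted earlier, keyed by document _id.
vectors = {
    "1": [0.1, 0.15, 0.3, 0.12, 0.05],
    "2": [0.45, 0.09, 0.01, 0.2, 0.11],
    "3": [0.1, 0.05, 0.08, 0.3, 0.6],
}
query = [0.15, 0.1, 0.1, 0.35, 0.55]

# Sort document ids from most similar to least similar.
ranked = sorted(
    vectors,
    key=lambda doc_id: cosine_similarity(query, vectors[doc_id]),
    reverse=True,
)
print(ranked)  # prints ['3', '2', '1']
```

Document 3 ranks first because its vector points in almost the same direction as the query (cosine close to 1), while documents 2 and 1 are noticeably further away.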
Next steps
Learn more with one of the following tutorials:
-
Build a chatbot with LangChain
Use content scraped from a website to answer questions.
-
Learn how to load your own data. Use the Astra Portal or any of our clients.