Astra DB Serverless quickstart for collections

Level: Beginner

Time: 15 min

If your data is fully structured and you want to use a fixed schema, see the quickstart for tables instead.

This quickstart requires a Serverless (vector) database. For Serverless (non-vector) databases, see Get started with the Data API.

This quickstart demonstrates how to create a collection, insert data to the collection, generate vector embeddings, and perform a vector search to find similar data.

The Next steps section discusses how to insert other types of data, use a different embedding model, insert data with pre-generated vector embeddings, or skip embedding generation.

To learn more about vector databases and vector search, see What are vector databases? and What is Vector Search.

Create a database and store your credentials

Sign in or create an Astra account.
Click Create database.
For this quickstart, select the following:
- Type: Serverless (vector)
- Provider: Amazon Web Services
- Region: us-east-2
If applicable to your organization, you can select or create a PCU group for the database.
Click Create database.

Wait for your database to initialize and reach Active status. This can take several minutes.
Under Database Details, copy your database’s API endpoint.
Under Database Details, click Generate Token, then copy the token.

For this quickstart, store the endpoint and token in environment variables:

Linux or macOS

export API_ENDPOINT=API_ENDPOINT
export APPLICATION_TOKEN=APPLICATION_TOKEN

Windows

set API_ENDPOINT=API_ENDPOINT
set APPLICATION_TOKEN=APPLICATION_TOKEN
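
Before running the later examples, it can help to confirm that both variables are visible to your program. The following is a minimal sketch that uses only the Python standard library. The `read_credentials` helper is a hypothetical name, not part of any Data API client, and the check that the endpoint is an `https` URL is an assumption about the value copied from Database Details.

```python
import os
from urllib.parse import urlparse


def read_credentials(env=None):
    """Return (endpoint, token), or raise RuntimeError if either looks unusable."""
    env = os.environ if env is None else env
    endpoint = env.get("API_ENDPOINT", "").strip()
    token = env.get("APPLICATION_TOKEN", "").strip()

    if not endpoint or not token:
        raise RuntimeError(
            "Environment variables API_ENDPOINT and APPLICATION_TOKEN must be defined"
        )

    # Catch the common mistake of leaving the placeholder text in place.
    if endpoint == "API_ENDPOINT" or token == "APPLICATION_TOKEN":
        raise RuntimeError("Replace the placeholder values with your real credentials")

    # Assumption: the endpoint copied from Database Details is an https URL.
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        raise RuntimeError(f"API_ENDPOINT does not look like an https URL: {endpoint!r}")

    return endpoint, token
```

Calling `read_credentials()` with no arguments reads from the process environment. The function returns the values without printing them, so the token does not end up in terminal history or logs.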

Install a client

Install one of the Data API clients to facilitate interactions with the Data API.

Python
TypeScript
Go
Java
C#

Update to Python version 3.10 or later if needed.
Update to pip version 23.0 or later if needed.
Install the latest version of the astrapy package.
```
pip install "astrapy>=2.0,<3.0"
```

Update to Node version 18 or later if needed.
Update to TypeScript version 5 or later if needed. This is unnecessary if you are using JavaScript instead of TypeScript.
Install the latest version of the @datastax/astra-db-ts package.

For example:
```
npm install @datastax/astra-db-ts
```

The Go client is in preview. For more information, see astra-db-go.

Update to Go version 1.24 or later if needed.
Install the latest version of the astra-db-go package.

For example:
```
go get github.com/datastax/astra-db-go/v2
```

Maven
Gradle

Update to Java version 17 or later if needed. DataStax recommends Java 21.
Update to Apache Maven™ version 3.9 or later if needed.

Add a dependency to the latest version of the astra-db-java package.

pom.xml

<dependencies>
  <dependency>
    <groupId>com.datastax.astra</groupId>
    <artifactId>astra-db-java</artifactId>
    <version>VERSION</version>
  </dependency>
</dependencies>

Update to Java version 17 or later if needed. DataStax recommends Java 21.
Update to Gradle version 11 or later if needed.
Add a dependency to the latest version of the astra-db-java package.
build.gradle(.kts)
```
dependencies {
    implementation 'com.datastax.astra:astra-db-java:VERSION'
}
```

Update to one of the following:
- .NET version 8 or later
- .NET Framework 4.6.2 or later
- .NET Standard 2.1 or later
Install the latest version of the astra-db-csharp package.
```
dotnet add package DataStax.AstraDB.DataApi
```

Connect to your database

The following function will connect to your database.

Copy the file into your project. You don’t need to execute the function now; the subsequent code examples will import and use this function.

Python
TypeScript
Go
Java
C#

quickstart_connect.py

import os

from astrapy import DataAPIClient, Database


def connect_to_database() -> Database:
    """
    Connects to a DataStax Astra database.
    This function retrieves the database endpoint and application token from the
    environment variables `API_ENDPOINT` and `APPLICATION_TOKEN`.

    Returns:
        Database: An instance of the connected database.

    Raises:
        RuntimeError: If the environment variables `API_ENDPOINT` or
        `APPLICATION_TOKEN` are not defined.
    """
    endpoint = os.environ.get("API_ENDPOINT")  (1)
    token = os.environ.get("APPLICATION_TOKEN")

    if not token or not endpoint:
        raise RuntimeError(
            "Environment variables API_ENDPOINT and APPLICATION_TOKEN must be defined"
        )

    # Create an instance of the `DataAPIClient` class
    client = DataAPIClient()

    # Get the database specified by your endpoint and provide the token
    database = client.get_database(endpoint, token=token)

    print(f"Connected to database {database.info().name}")

    return database

1	Store your database’s endpoint and application token in environment variables named `API_ENDPOINT` and `APPLICATION_TOKEN`, as instructed in Create a database and store your credentials.

quickstart-connect.ts

import { DataAPIClient, Db } from "@datastax/astra-db-ts";

/**
 * Connects to a DataStax Astra database.
 * This function retrieves the database endpoint and application token from the
 * environment variables `API_ENDPOINT` and `APPLICATION_TOKEN`.
 *
 * @returns An instance of the connected database.
 * @throws Will throw an error if the environment variables
 * `API_ENDPOINT` or `APPLICATION_TOKEN` are not defined.
 */
export function connectToDatabase(): Db {
  const { API_ENDPOINT: endpoint, APPLICATION_TOKEN: token } = process.env; (1)

  if (!token || !endpoint) {
    throw new Error(
      "Environment variables API_ENDPOINT and APPLICATION_TOKEN must be defined.",
    );
  }

  // Create an instance of the `DataAPIClient` class
  const client = new DataAPIClient();

  // Get the database specified by your endpoint and provide the token
  const database = client.db(endpoint, { token });

  console.log(`Connected to database ${database.id}`);

  return database;
}

1	Store your database’s endpoint and application token in environment variables named `API_ENDPOINT` and `APPLICATION_TOKEN`, as instructed in Create a database and store your credentials.

The Go client is in preview. For more information, see astra-db-go.

shared/quickstart-connect.go

package shared

import (
	"fmt"
	"log"
	"os"

	"github.com/datastax/astra-db-go/v2/astra"
	"github.com/datastax/astra-db-go/v2/astra/options"
)

// ConnectToDatabase connects to a DataStax Astra database.
// This function retrieves the database endpoint and application token
// from the environment variables `API_ENDPOINT` and `APPLICATION_TOKEN`.
//
// Returns an instance of the connected database.
// Exits with an error if the environment variables
// `API_ENDPOINT` or `APPLICATION_TOKEN` are not defined.
func ConnectToDatabase() *astra.Db {
	endpoint := os.Getenv("API_ENDPOINT") (1)
	token := os.Getenv("APPLICATION_TOKEN")

	if token == "" || endpoint == "" {
		log.Fatal(
			"Environment variables API_ENDPOINT and APPLICATION_TOKEN must be defined.",
		)
	}

	// Create an instance of `DataAPIClient`
	client := astra.NewClient()

	// Get the database specified by your endpoint and provide the token
	database := client.Database(endpoint, options.API().SetToken(token))

	fmt.Printf("Connected to database %s\n", database.Endpoint())

	return database
}

1	Store your database’s endpoint and application token in environment variables named `API_ENDPOINT` and `APPLICATION_TOKEN`, as instructed in Create a database and store your credentials.

src/main/java/com/quickstart/QuickstartConnect.java

package com.quickstart;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.databases.Database;

public class QuickstartConnect {

  /**
   * Connects to a DataStax Astra database. This function retrieves the database endpoint and
   * application token from the environment variables `API_ENDPOINT` and `APPLICATION_TOKEN`.
   *
   * @return an instance of the connected database
   * @throws IllegalStateException if the environment variables `API_ENDPOINT` or
   *     `APPLICATION_TOKEN` are not defined
   */
  public static Database connectToDatabase() {
    String endpoint = System.getenv("API_ENDPOINT"); (1)
    String token = System.getenv("APPLICATION_TOKEN");

    if (endpoint == null || token == null) {
      throw new IllegalStateException(
          "Environment variables API_ENDPOINT and APPLICATION_TOKEN must be defined");
    }

    // Create an instance of `DataAPIClient` with your token.
    DataAPIClient client = new DataAPIClient(token);

    // Get the database specified by your endpoint.
    Database database = client.getDatabase(endpoint);

    System.out.println("Connected to database.");

    return database;
  }
}

1	Store your database’s endpoint and application token in environment variables named `API_ENDPOINT` and `APPLICATION_TOKEN`, as instructed in Create a database and store your credentials.

QuickstartConnect.cs

using DataStax.AstraDB.DataApi;
using DataStax.AstraDB.DataApi.Core;

namespace Quickstart
{
  public class QuickstartConnect
  {
    public static Database ConnectToDatabase()
    {
      string? endpoint = Environment.GetEnvironmentVariable(
        "API_ENDPOINT"
      ); (1)
      string? token = Environment.GetEnvironmentVariable(
        "APPLICATION_TOKEN"
      );

      if (string.IsNullOrEmpty(endpoint) || string.IsNullOrEmpty(token))
      {
        throw new InvalidOperationException(
          "Environment variables API_ENDPOINT and APPLICATION_TOKEN must be defined"
        );
      }

      // Create an instance of the `DataAPIClient` class
      var client = new DataAPIClient();

      // Get the database specified by your endpoint and provide the token
      var database = client.GetDatabase(endpoint, token);

      Console.WriteLine("Connected to database.");

      return database;
    }
  }
}

1	Store your database’s endpoint and application token in environment variables named `API_ENDPOINT` and `APPLICATION_TOKEN`, as instructed in Create a database and store your credentials.

Create a collection

The following code will create an empty collection in your database.

Copy the code into your project.
If needed, update the import path to the "connect to database" function from the previous section.
Execute the code.

For information about executing code, refer to the documentation for your programming language.

Once the code completes, you should see a printed message confirming the collection creation.

Python
TypeScript
Go
Java
C#

quickstart_create_collection.py

from astrapy.constants import VectorMetric
from astrapy.info import (
    CollectionDefinition,
    CollectionVectorOptions,
    VectorServiceOptions,
)
from quickstart_connect import connect_to_database  (1)


def main() -> None:
    database = connect_to_database()

    collection = database.create_collection(
        "quickstart_collection",  (2)
        definition=CollectionDefinition(
            vector=CollectionVectorOptions(
                metric=VectorMetric.COSINE,
                service=VectorServiceOptions(
                    provider="nvidia",  (3)
                    model_name="nvidia/nv-embedqa-e5-v5",
                ),
            )
        ),
    )

    print(f"Created collection {collection.full_name}")


if __name__ == "__main__":
    main()

1	This is the `connect_to_database` function from the previous section. Update the import path if necessary. To use the function, ensure you stored your database’s endpoint and application token in environment variables as instructed in Create a database and store your credentials.
2	This code creates a collection named `quickstart_collection`. If you want to use a different name, change the name before running the code.
3	This collection will use the Astra-hosted NVIDIA embedding model to generate vector embeddings. This is currently only supported in certain regions. Ensure that your database is in the Amazon Web Services us-east-2 region, as instructed in Create a database and store your credentials.

This example creates an untyped collection, but you can define a client-side type for your collection to help statically catch errors. For examples, see Create a collection and Typing Collections and Tables.

quickstart-create-collection.ts

import { connectToDatabase } from "./quickstart-connect"; (1)

(async function () {
  const database = connectToDatabase();

  const collection = await database.createCollection(
    "quickstart_collection", (2)
    {
      vector: {
        service: {
          provider: "nvidia", (3)
          modelName: "nvidia/nv-embedqa-e5-v5",
        },
      },
    },
  );

  console.log(`Created collection ${collection.keyspace}.${collection.name}`);
})();

1	This is the `connectToDatabase` function from the previous section. Update the import path if necessary. To use the function, ensure you stored your database’s endpoint and application token in environment variables as instructed in Create a database and store your credentials.
2	This code creates a collection named `quickstart_collection`. If you want to use a different name, change the name before running the code.
3	This collection will use the Astra-hosted NVIDIA embedding model to generate vector embeddings. This is currently only supported in certain regions. Ensure that your database is in the Amazon Web Services us-east-2 region, as instructed in Create a database and store your credentials.

The Go client is in preview. For more information, see astra-db-go.

quickstart-create-collection/main.go

package main

import (
	"context"
	"fmt"
	"log"

	"quickstart/shared"

	"github.com/datastax/astra-db-go/v2/astra/options"
)

func main() {
	ctx := context.Background()
	database := shared.ConnectToDatabase() (1)

	collection, err := database.CreateCollection(
		ctx,
		"quickstart_collection", (2)
		options.CreateCollection().UpdateVector(
			options.Vector().UpdateService(
				options.VectorService(). (3)
								SetProvider("nvidia").
								SetModelName("nvidia/nv-embedqa-e5-v5"),
			),
		),
	)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Created collection %s\n", collection.Name())
}

1	This is the `ConnectToDatabase` function from the previous section. Update the import path if necessary. To use the function, ensure you stored your database’s endpoint and application token in environment variables as instructed in Create a database and store your credentials.
2	This code creates a collection named `quickstart_collection`. If you want to use a different name, change the name before running the code.
3	This collection will use the Astra-hosted NVIDIA embedding model to generate vector embeddings. This is currently only supported in certain regions. Ensure that your database is in the Amazon Web Services us-east-2 region, as instructed in Create a database and store your credentials.

src/main/java/com/quickstart/QuickstartCreateCollectionDemo.java

package com.quickstart;

import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.definition.CollectionDefinition;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.vector.SimilarityMetric;
import com.datastax.astra.client.databases.Database;

public class QuickstartCreateCollectionDemo {

  public static void main(String[] args) {
    Database database = QuickstartConnect.connectToDatabase(); (1)

    Collection<Document> collection =
        database.createCollection(
            "quickstart_collection", (2)
            new CollectionDefinition()
                .vectorSimilarity(SimilarityMetric.COSINE)
                .vectorize("nvidia", "nvidia/nv-embedqa-e5-v5")); (3)

    System.out.println("Created collection " + collection.getCollectionName());
  }
}

1	This is the `connectToDatabase` function from the previous section. To use the function, ensure you stored your database’s endpoint and application token in environment variables as instructed in Create a database and store your credentials.
2	This code creates a collection named `quickstart_collection`. If you want to use a different name, change the name before running the code.
3	This collection will use the Astra-hosted NVIDIA embedding model to generate vector embeddings. This is currently only supported in certain regions. Ensure that your database is in the Amazon Web Services us-east-2 region, as instructed in Create a database and store your credentials.

This example creates an untyped collection, but you can define a client-side type for your collection to help statically catch errors. For examples, see Create a collection and Custom typing for collections.

QuickstartCreateCollection.cs

using DataStax.AstraDB.DataApi.Collections;
using DataStax.AstraDB.DataApi.Core;

namespace Quickstart
{
  public class QuickstartCreateCollection
  {
    public static async Task Main()
    {
      var database = QuickstartConnect.ConnectToDatabase(); (1)

      await database.CreateCollectionAsync<Document>(
        "quickstart_collection", (2)
        new CollectionDefinition
        {
          Vector = new VectorOptions
          {
            Metric = SimilarityMetric.Cosine,
            Service = new VectorServiceOptions
            {
              Provider = "nvidia", (3)
              ModelName = "nvidia/nv-embedqa-e5-v5",
            },
          },
        }
      );

      Console.WriteLine("Created collection.");
    }
  }
}

1	This is the `ConnectToDatabase` function from the previous section. To use the function, ensure you stored your database’s endpoint and application token in environment variables as instructed in Create a database and store your credentials.
2	This code creates a collection named `quickstart_collection`. If you want to use a different name, change the name before running the code.
3	This collection will use the Astra-hosted NVIDIA embedding model to generate vector embeddings. This is currently only supported in certain regions. Ensure that your database is in the Amazon Web Services us-east-2 region, as instructed in Create a database and store your credentials.

Insert data to your collection

The following code will insert data from a JSON file to your collection.

Copy the code into your project.
Download the quickstart_dataset.json sample dataset (76 kB). This dataset is a JSON array describing library books.
Replace PATH_TO_DATA_FILE in the code with the path to the dataset.
If needed, update the import path to the "connect to database" function from the previous section.
Execute the code.

For information about executing code, refer to the documentation for your programming language.

Once the code completes, you should see a printed message confirming the insertion of 100 documents.
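
If the reported count differs from what you expect, a quick local check of the downloaded file can narrow down the cause. The following is a minimal sketch that uses only the Python standard library. `summarize_dataset` is a hypothetical helper, and the two required field names are taken from the fields that the insert examples read when they build `$vectorize`.

```python
import json
from pathlib import Path


def summarize_dataset(path):
    """Load a JSON array of objects and report its size and any incomplete items."""
    data = json.loads(Path(path).read_text(encoding="utf8"))
    if not isinstance(data, list):
        raise ValueError("Expected the file to contain a JSON array")

    # Fields that the quickstart insert code reads when building $vectorize.
    required = ("summary", "genres")

    # Collect the positions of items that are not objects or lack a required field.
    incomplete = [
        index
        for index, item in enumerate(data)
        if not isinstance(item, dict) or any(key not in item for key in required)
    ]
    return {"count": len(data), "incomplete": incomplete}
```

For example, `summarize_dataset("PATH_TO_DATA_FILE")` returns a dictionary with the number of items in the file and the positions of any items that lack a required field. For the sample dataset, the count should match the number of documents in the confirmation message.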

Python
TypeScript
Go
Java
C#

quickstart_insert_to_collection.py

import json

from astrapy.data_types import DataAPIDate
from quickstart_connect import connect_to_database  (1)


def main() -> None:
    database = connect_to_database()

    collection = database.get_collection("quickstart_collection")  (2)

    data_file_path = "PATH_TO_DATA_FILE"  (3)

    # Read the JSON file and parse it into a JSON array
    with open(data_file_path, "r", encoding="utf8") as file:
        json_data = json.load(file)

    # Assemble the documents to insert:
    # - Convert the date string into a DataAPIDate
    # - Add a $vectorize field (4)
    documents = [
        {
            **data,
            "due_date": (
                DataAPIDate.from_string(data["due_date"])
                if data.get("due_date")
                else None
            ),
            "$vectorize": (
                f"summary: {data['summary']} | genres: {', '.join(data['genres'])}"
            ),
        }
        for data in json_data
    ]

    # Insert the data
    inserted = collection.insert_many(documents)

    print(f"Inserted {len(inserted.inserted_ids)} documents.")


if __name__ == "__main__":
    main()

1	This is the `connect_to_database` function from the previous section. Update the import path if necessary. To use the function, ensure you stored your database’s endpoint and application token in environment variables as instructed in Create a database and store your credentials.
2	If you changed the collection name in the previous section, change it here as well.
3	Replace `PATH_TO_DATA_FILE` with the path to the JSON data file.
4	When you insert data to a collection that can automatically generate embeddings, you can specify a `$vectorize` value for the data. The `$vectorize` value will be used to generate vector embeddings. `$vectorize` can be any string and should include the parts of the data that you want to be considered when you search for similar data with a vector search.
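
To see what the `$vectorize` value looks like for a single document, the string-building step can be run on its own, without a database connection. This is a small sketch: the record below is hypothetical and only mimics the `summary` and `genres` fields that the insert code reads, and `build_vectorize_text` is an illustrative helper name.

```python
def build_vectorize_text(record):
    """Combine the fields that should influence similarity into one string."""
    summary = record.get("summary", "")
    genres = ", ".join(record.get("genres", []))
    return f"summary: {summary} | genres: {genres}"


# Hypothetical record, shaped like the fields that the insert code reads.
sample = {
    "title": "Example Title",
    "summary": "A crew is stranded on the ice.",
    "genres": ["Adventure", "Survival"],
}

# Copy the record and add the $vectorize field, as the insert code does.
document = {**sample, "$vectorize": build_vectorize_text(sample)}

print(document["$vectorize"])
# summary: A crew is stranded on the ice. | genres: Adventure, Survival
```

Fields left out of the string, such as `title` here, are still stored with the document but do not contribute to the text that is embedded.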

quickstart-insert-to-collection.ts

import { connectToDatabase } from "./quickstart-connect"; (1)
import fs from "fs";

(async function () {
  const database = connectToDatabase();

  const collection = database.collection("quickstart_collection"); (2)

  const dataFilePath = "PATH_TO_DATA_FILE"; (3)

  // Read the JSON file and parse it into a JSON array
  const rawData = fs.readFileSync(dataFilePath, "utf8");
  const jsonData = JSON.parse(rawData);

  // Assemble the documents to insert:
  // - Convert the date string into a Date
  // - Add a $vectorize field (4)
  const documents = jsonData.map((data: any) => ({
    ...data,
    due_date: data.due_date ? new Date(data.due_date) : null,
    $vectorize: `summary: ${data["summary"]} | genres: ${data["genres"].join(", ")}`,
  }));

  // Insert the data
  const inserted = await collection.insertMany(documents);

  console.log(`Inserted ${inserted.insertedCount} documents.`);
})();

1	This is the `connectToDatabase` function from the previous section. Update the import path if necessary. To use the function, ensure you stored your database’s endpoint and application token in environment variables as instructed in Create a database and store your credentials.
2	If you changed the collection name in the previous section, change it here as well.
3	Replace `PATH_TO_DATA_FILE` with the path to the JSON data file.
4	When you insert data to a collection that can automatically generate embeddings, you can specify a `$vectorize` value for the data. The `$vectorize` value will be used to generate vector embeddings. `$vectorize` can be any string and should include the parts of the data that you want to be considered when you search for similar data with a vector search.

The Go client is in preview. For more information, see astra-db-go.

quickstart-insert-to-collection/main.go

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"strings"
	"time"

	"quickstart/shared"

	"github.com/datastax/astra-db-go/v2/astra"
)

func main() {
	ctx := context.Background()
	database := shared.ConnectToDatabase() (1)

	collection := database.Collection("quickstart_collection") (2)

	dataFilePath := "PATH_TO_DATA_FILE" (3)

	// Read the JSON file and parse it into a JSON array
	rawData, err := os.ReadFile(dataFilePath)
	if err != nil {
		log.Fatal(err)
	}

	var jsonData []map[string]any
	if err := json.Unmarshal(rawData, &jsonData); err != nil {
		log.Fatal(err)
	}

	// Assemble the documents to insert:
	// - Convert the date string into a time.Time
	// - Add a $vectorize field
	documents := make([]astra.Document, len(jsonData))
	for i, data := range jsonData {
		// Convert due_date string to time.Time if present
		if dueDateStr, ok := data["due_date"].(string); ok &&
			dueDateStr != "" {
			dueDate, err := time.Parse("2006-01-02", dueDateStr)
			if err != nil {
				log.Fatalf("invalid due_date %q: %v", dueDateStr, err)
			}
			data["due_date"] = dueDate
		}

		// Build $vectorize field from summary and genres (4)
		summary, _ := data["summary"].(string)

		genres := []string{}
		if g, ok := data["genres"].([]any); ok {
			for _, genre := range g {
				if genreStr, ok := genre.(string); ok {
					genres = append(genres, genreStr)
				}
			}
		}

		data["$vectorize"] = fmt.Sprintf(
			"summary: %s | genres: %s",
			summary,
			strings.Join(genres, ", "),
		)

		documents[i] = astra.NewDocument(data)
	}

	// Insert the data
	insertedResult, err := collection.InsertMany(
		ctx,
		documents,
	)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf(
		"Inserted %d documents.\n",
		insertedResult.InsertedCount(),
	)
}

1	This is the `ConnectToDatabase` function from the previous section. Update the import path if necessary. To use the function, ensure you stored your database’s endpoint and application token in environment variables as instructed in Create a database and store your credentials.
2	If you changed the collection name in the previous section, change it here as well.
3	Replace `PATH_TO_DATA_FILE` with the path to the JSON data file.
4	When you insert data to a collection that can automatically generate embeddings, you can specify a `$vectorize` value for the data. The `$vectorize` value will be used to generate vector embeddings. `$vectorize` can be any string and should include the parts of the data that you want to be considered when you search for similar data with a vector search.

src/main/java/com/quickstart/QuickstartInsertToCollectionDemo.java

package com.quickstart;

import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.results.CollectionInsertManyResult;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.databases.Database;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.StreamSupport;

public class QuickstartInsertToCollectionDemo {

  public static void main(String[] args) throws IOException {
    Database database = QuickstartConnect.connectToDatabase(); (1)

    Collection<Document> collection = database.getCollection("quickstart_collection"); (2)

    // Initialize Jackson ObjectMapper
    ObjectMapper objectMapper = new ObjectMapper();

    // Read the JSON file and parse it into a JSON array (ArrayNode). (3)
    String rawData = Files.readString(Paths.get("PATH_TO_DATA_FILE"), StandardCharsets.UTF_8);

    ArrayNode jsonData = (ArrayNode) objectMapper.readTree(rawData);

    // Convert the data to a list of Documents, and
    // add a $vectorize field to each piece of data. (4)
    List<Document> documents = new ArrayList<>();

    for (JsonNode data : jsonData) {
      if (data instanceof ObjectNode obj) {
        String summary = obj.path("summary").asText("");
        String genres =
            String.join(
                ", ",
                StreamSupport.stream(obj.withArray("genres").spliterator(), false)
                    .map(JsonNode::asText)
                    .toList());
        String summary_genres = String.format("summary: %s | genres: %s", summary, genres);
        obj.put("$vectorize", summary_genres);
      }

      documents.add(Document.parse(data.toString()));
    }

    CollectionInsertManyResult result = collection.insertMany(documents);

    System.out.println("Inserted " + result.getInsertedIds().size() + " documents.");
  }
}

1	This is the `connectToDatabase` function from the previous section. To use the function, ensure you stored your database’s endpoint and application token in environment variables as instructed in Create a database and store your credentials.
2	If you changed the collection name in the previous section, change it here as well.
3	Replace `PATH_TO_DATA_FILE` with the path to the JSON data file.
4	When you insert data to a collection that can automatically generate embeddings, you can specify a `$vectorize` value for the data. The `$vectorize` value will be used to generate vector embeddings. `$vectorize` can be any string and should include the parts of the data that you want to be considered when you search for similar data with a vector search.

QuickstartInsertToCollection.cs

using System.Text.Json;
using System.Text.Json.Nodes;
using DataStax.AstraDB.DataApi.Collections;

namespace Quickstart
{
  public class QuickstartInsertToCollection
  {
    public static async Task Main()
    {
      var database = QuickstartConnect.ConnectToDatabase(); (1)

      var collection = database.GetCollection("quickstart_collection"); (2)

      var dataFilePath = "PATH_TO_DATA_FILE"; (3)

      // Read the JSON file and parse it into a JSON array
      string rawData = await File.ReadAllTextAsync(dataFilePath);
      JsonArray jsonArray =
        JsonNode.Parse(rawData)?.AsArray() ?? new JsonArray();

      // Assemble the documents to insert
      var documents = new List<Document>();

      foreach (var node in jsonArray)
      {
        if (node is JsonObject obj)
        {
          var document = new Document();

          foreach (var prop in obj)
          {
            document[prop.Key] = prop.Value switch
            {
              JsonValue val => val.GetValue<object>(),
              JsonArray arr => arr.Deserialize<List<object>>()!,
              JsonObject subObj => subObj.Deserialize<
                Dictionary<string, object>
              >(),
              _ => prop.Value?.ToString(),
            };
          }

          // Add a $vectorize field (4)
          string summary = obj["summary"]?.ToString() ?? "";
          string genresStr = obj["genres"] is JsonArray genres
            ? string.Join(", ", genres)
            : "";
          document["$vectorize"] =
            $"summary: {summary} | genres: {genresStr}";

          documents.Add(document);
        }
      }

      // Insert the data
      var result = await collection.InsertManyAsync(documents);

      Console.WriteLine(
        $"Inserted {result.InsertedIds.Count} documents."
      );
    }
  }
}

1	This is the `ConnectToDatabase` function from the previous section. To use the function, ensure you stored your database’s endpoint and application token in environment variables as instructed in Create a database and store your credentials.
2	If you changed the collection name in the previous section, change it here as well.
3	Replace `PATH_TO_DATA_FILE` with the path to the JSON data file.
4	When you insert data to a collection that can automatically generate embeddings, you can specify a `$vectorize` value for the data. The `$vectorize` value will be used to generate vector embeddings. `$vectorize` can be any string and should include the parts of the data that you want to be considered when you search for similar data with a vector search.

Find data in your collection

After you insert data to your collection, you can search the data. In addition to traditional database filtering, you can perform a vector search to find data that is most similar to a search string.
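
The collection in this quickstart is created with the cosine similarity metric. As a rough illustration of what "most similar" means under that metric, the following sketch ranks a few made-up vectors against a query vector in plain Python. The vectors and names are invented for the example, real embeddings have many more dimensions, and the sketch does not reproduce how the database indexes or scores results.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for embeddings.
query = [1.0, 0.0, 1.0]
candidates = {
    "book_a": [1.0, 0.0, 1.0],  # same direction as the query
    "book_b": [1.0, 1.0, 0.0],  # partially aligned with the query
    "book_c": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Rank candidate names from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)

print(ranked)  # ['book_a', 'book_b', 'book_c']
```

In the quickstart, you never compute these vectors yourself: the search string passed in `$vectorize` is embedded by the same model that embedded the documents.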

The following code performs three searches on the sample data that you loaded in Insert data to your collection.

Python
TypeScript
Go
Java
C#

quickstart_find.py

from quickstart_connect import connect_to_database  (1)


def main() -> None:
    database = connect_to_database()

    collection = database.get_collection("quickstart_collection")  (2)

    # Find documents that match a filter
    print("\nFinding books with rating greater than 4.7...")

    rating_cursor = collection.find({"rating": {"$gt": 4.7}})

    for document in rating_cursor:
        print(f"{document['title']} is rated {document['rating']}")

    # Perform a vector search to find the closest match to a search string
    print("\nUsing vector search to find a single scary novel...")

    single_vector_match = collection.find_one(
        sort={"$vectorize": "A scary novel"}
    )

    print(f"{single_vector_match['title']} is a scary novel")

    # Combine a filter, vector search, and projection to find the 3 books with
    # more than 400 pages that are the closest matches to a search string,
    # and just return the title and author
    print(
        "\nUsing filters and vector search to find 3 books with more than 400 pages that are set in the arctic, returning just the title and author..."
    )

    vector_cursor = collection.find(
        {"number_of_pages": {"$gt": 400}},
        sort={"$vectorize": "A book set in the arctic"},
        limit=3,
        projection={"title": True, "author": True},
    )

    for document in vector_cursor:
        print(document)


if __name__ == "__main__":
    main()

1	This is the `connect_to_database` function from the previous section. Update the import path if necessary.
2	If you changed the collection name in the previous section, change it here as well.

quickstart-find.ts

import { connectToDatabase } from "./quickstart-connect"; (1)

(async function () {
  const database = connectToDatabase();

  const collection = database.collection("quickstart_collection"); (2)

  // Find documents that match a filter
  console.log("\nFinding books with rating greater than 4.7...");

  const ratingCursor = collection.find({ rating: { $gt: 4.7 } }, { limit: 10 });

  for await (const document of ratingCursor) {
    console.log(`${document.title} is rated ${document.rating}`);
  }

  // Perform a vector search to find the closest match to a search string
  console.log("\nUsing vector search to find a single scary novel...");

  const singleVectorMatch = await collection.findOne(
    {},
    { sort: { $vectorize: "A scary novel" } },
  );

  console.log(`${singleVectorMatch?.title} is a scary novel`);

  // Combine a filter, vector search, and projection to find the 3 books with
  // more than 400 pages that are the closest matches to a search string,
  // and just return the title and author
  console.log(
    "\nUsing filters and vector search to find 3 books with more than 400 pages that are set in the arctic, returning just the title and author...",
  );

  const vectorCursor = collection.find(
    { number_of_pages: { $gt: 400 } },
    {
      sort: { $vectorize: "A book set in the arctic" },
      limit: 3,
      projection: { title: true, author: true },
    },
  );

  for await (const document of vectorCursor) {
    console.log(document);
  }
})();

1	This is the `connectToDatabase` function from the previous section. Update the import path if necessary.
2	If you changed the collection name in the previous section, change it here as well.

The Go client is in preview. For more information, see astra-db-go.

quickstart-find/main.go

package main

import (
	"context"
	"fmt"
	"log"

	"quickstart/shared"

	"github.com/datastax/astra-db-go/v2/astra"
	"github.com/datastax/astra-db-go/v2/astra/filter"
	"github.com/datastax/astra-db-go/v2/astra/options"
	"github.com/datastax/astra-db-go/v2/astra/sort"
)

func main() {
	ctx := context.Background()

	database := shared.ConnectToDatabase() (1)

	collection := database.Collection("quickstart_collection") (2)

	// Find documents that match a filter
	fmt.Println("\nFinding books with rating greater than 4.7...")

	ratingCursor := collection.Find(
		filter.Gt("rating", 4.7),
		options.CollectionFind().SetLimit(10),
	)
	defer ratingCursor.Close()

	for ratingCursor.Next(ctx) {
		var document astra.Document
		if err := ratingCursor.Decode(&document); err != nil {
			log.Fatal(err)
		}
		fmt.Printf(
			"%v is rated %v\n",
			document.MustGet("title"),
			document.MustGet("rating"),
		)
	}

	if err := ratingCursor.Err(); err != nil {
		log.Fatal(err)
	}

	// Perform a vector search to find the closest match to a search string
	fmt.Println("\nUsing vector search to find a single scary novel...")

	var singleVectorMatch astra.Document
	err := collection.FindOne(
		ctx,
		nil,
		options.CollectionFindOne().
			SetSort(sort.Vectorize("A scary novel")),
	).Decode(&singleVectorMatch)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%v is a scary novel\n", singleVectorMatch.MustGet("title"))

	// Combine a filter, vector search, and projection to find the 3 books
	// with more than 400 pages that are the closest matches to a search
	// string, and just return the title and author
	fmt.Println(
		"\nUsing filters and vector search to find 3 books with more than 400 pages that are set in the arctic, returning just the title and author...",
	)

	vectorCursor := collection.Find(
		filter.Gt("number_of_pages", 400),
		options.CollectionFind().
			SetSort(sort.Vectorize("A book set in the arctic")).
			SetLimit(3).
			SetProjection(map[string]any{"title": true, "author": true}),
	)
	defer vectorCursor.Close()

	for vectorCursor.Next(ctx) {
		var document astra.Document
		if err := vectorCursor.Decode(&document); err != nil {
			log.Fatal(err)
		}
		fmt.Println(document.ToMap())
	}

	if err := vectorCursor.Err(); err != nil {
		log.Fatal(err)
	}
}

1	This is the `ConnectToDatabase` function from the previous section. Update the import path if necessary.
2	If you changed the collection name in the previous section, change it here as well.

src/main/java/com/quickstart/QuickstartFindDemo.java

package com.quickstart;

import static com.datastax.astra.client.core.query.Projection.include;

import com.datastax.astra.client.collections.Collection;
import com.datastax.astra.client.collections.commands.options.CollectionFindOneOptions;
import com.datastax.astra.client.collections.commands.options.CollectionFindOptions;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.core.query.Filter;
import com.datastax.astra.client.core.query.Filters;
import com.datastax.astra.client.core.query.Sort;
import com.datastax.astra.client.databases.Database;

public class QuickstartFindDemo {

  public static void main(String[] args) {

    Database database = QuickstartConnect.connectToDatabase(); (1)

    Collection<Document> collection =
        database.getCollection(
            "quickstart_collection" (2)
            );

    // Find documents that match a filter
    System.out.println("\nFinding books with rating greater than 4.7...");

    Filter filter = Filters.gt("rating", 4.7);

    CollectionFindOptions options = new CollectionFindOptions().limit(10);

    collection
        .find(filter, options)
        .forEach(
            document -> {
              System.out.println(
                  document.getString("title") + " is rated " + document.get("rating"));
            });

    // Perform a vector search to find the closest match to a search string
    System.out.println("\nUsing vector search to find a single scary novel...");

    CollectionFindOneOptions options2 =
        new CollectionFindOneOptions().sort(Sort.vectorize("A scary novel"));

    collection
        .findOne(options2)
        .ifPresent(
            document -> {
              System.out.println(document.getString("title") + " is a scary novel");
            });

    // Combine a filter, vector search, and projection to find the 3 books with
    // more than 400 pages that are the closest matches to a search string,
    // and just return the title and author
    System.out.println(
        "\nUsing filters and vector search to find 3 books with more than 400 pages that are set in the arctic, returning just the title and author...");

    Filter filter3 = Filters.gt("number_of_pages", 400);

    CollectionFindOptions options3 =
        new CollectionFindOptions()
            .limit(3)
            .sort(Sort.vectorize("A book set in the arctic"))
            .projection(include("title", "author"));

    collection
        .find(filter3, options3)
        .forEach(
            document -> {
              System.out.println(document);
            });
  }
}

1	This is the `connectToDatabase` function from the previous section.
2	If you changed the collection name in the previous section, change it here as well.

QuickstartFind.cs

using DataStax.AstraDB.DataApi.Collections;
using DataStax.AstraDB.DataApi.Core;
using DataStax.AstraDB.DataApi.Core.Query;

namespace Quickstart
{
  public class QuickstartFind
  {
    public static async Task Main()
    {
      var database = QuickstartConnect.ConnectToDatabase(); (1)

      var collection = database.GetCollection("quickstart_collection"); (2)

      // Find documents that match a filter
      Console.WriteLine(
        "\nFinding books with rating greater than 4.7..."
      );

      var filter = Builders<Document>.CollectionFilter.Gt("rating", 4.7);
      var ratingCursor = collection.Find(
        filter,
        new CollectionFindOptions<Document>() { Limit = 10 }
      );
      foreach (var document in ratingCursor)
      {
        Console.WriteLine(
          $"{document["title"]} is rated {document["rating"]}"
        );
      }

      // Perform a vector search to find the closest match to a search string
      Console.WriteLine(
        "\nUsing vector search to find a single scary novel..."
      );

      var singleVectorMatch = await collection.FindOneAsync(
        new CollectionFindOneOptions<Document>()
        {
          Sort = Builders<Document>.CollectionSort.Vectorize(
            "A scary novel"
          ),
        }
      );

      if (singleVectorMatch != null)
      {
        Console.WriteLine(
          $"{singleVectorMatch["title"]} is a scary novel"
        );
      }

      // Combine a filter, vector search, and projection
      // to find the 3 books with more than 400 pages that are
      // the closest matches to a search string,
      // and just return the title and author
      Console.WriteLine(
        "\nUsing filters and vector search to find 3 books with more than 400 pages that are set in the arctic, returning just the title and author..."
      );

      var filter3 = Builders<Document>.CollectionFilter.Gt(
        "number_of_pages",
        400
      );

      var vectorCursor = collection.Find(
        filter3,
        new CollectionFindOptions<Document>()
        {
          Limit = 3,
          Projection = Builders<Document>
            .Projection.Include("title")
            .Include("author"),
          Sort = Builders<Document>.CollectionSort.Vectorize(
            "A book set in the arctic"
          ),
        }
      );
      foreach (var document in vectorCursor)
      {
        Console.WriteLine($"{document["title"]} by {document["author"]}");
      }
    }
  }
}

1	This is the `ConnectToDatabase` function from the previous section.
2	If you changed the collection name in the previous section, change it here as well.
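Conceptually, the third search in this section combines a filter, a similarity ranking, a limit, and a projection. The following Python sketch mimics that order of operations in memory with toy two-dimensional vectors. It is only an illustration: the `search` and `cosine_similarity` functions, the `vector` field, and the sample data are assumptions for this example, cosine similarity is assumed as the metric, and the database performs the real search server-side with generated embeddings and a vector index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(documents, query_vector, min_pages, limit, fields):
    """Hypothetical in-memory stand-in for a filtered vector search."""
    # 1. Filter: keep only documents with more than min_pages pages.
    candidates = [d for d in documents if d["number_of_pages"] > min_pages]
    # 2. Sort: most similar to the query vector first.
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(d["vector"], query_vector),
        reverse=True,
    )
    # 3. Limit, then 4. projection: return only the requested fields.
    return [{field: d[field] for field in fields} for d in ranked[:limit]]


# Toy data: real embeddings have many more dimensions.
books = [
    {"title": "A", "author": "X", "number_of_pages": 500, "vector": [1.0, 0.0]},
    {"title": "B", "author": "Y", "number_of_pages": 450, "vector": [0.6, 0.8]},
    {"title": "C", "author": "Z", "number_of_pages": 200, "vector": [0.0, 1.0]},
]

results = search(books, [0.0, 1.0], min_pages=400, limit=3, fields=["title", "author"])
print(results)
```

In this sketch, book C is the closest match to the query vector but is excluded by the page-count filter, so only B and A are returned, in that order.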

Next steps

For more practice, you can continue building with the collection that you created here. For example, try inserting more data to the collection, or try different searches. The Data API reference provides code examples for various operations.

Insert data from different sources

This quickstart demonstrated how to insert data from a JSON file, but you can insert data from many sources, including CSV and PDF files.

If you can convert your data into JSON, you can use the example from this quickstart.
If your data is in unstructured files, such as PDF files, you can use the Astra Portal or write code to use the Unstructured.io integration to insert your data.
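For example, CSV rows can be converted into JSON-style documents with the Python standard library. This is a minimal sketch: the `csv_to_documents` function, the column names, and the sample rows are assumptions for illustration. Numeric columns are converted explicitly because CSV values are read as strings, and numeric filters such as `$gt` compare against numbers.

```python
import csv
import io
import json

# Sample CSV content; in practice, this would be read from a file.
csv_text = """title,author,rating,number_of_pages
Example One,Author A,4.8,412
Example Two,Author B,3.9,256
"""


def csv_to_documents(text: str) -> list[dict]:
    """Convert CSV text into a list of documents with typed values."""
    documents = []
    for row in csv.DictReader(io.StringIO(text)):
        documents.append(
            {
                "title": row["title"],
                "author": row["author"],
                # Convert numeric columns so that they are stored as numbers.
                "rating": float(row["rating"]),
                "number_of_pages": int(row["number_of_pages"]),
            }
        )
    return documents


documents = csv_to_documents(csv_text)
print(json.dumps(documents, indent=2))
```

The resulting list has the same shape as data loaded from a JSON file, so it can be inserted in the same way as the data in this quickstart.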

This quickstart also demonstrated how to insert data to a collection, which uses a flexible schema. If your data is structured and you want to use a fixed schema, you can use a table instead of a collection. See the quickstart for tables.

Use a different method to generate vector embeddings

This quickstart used the Astra-hosted NVIDIA embedding model to generate vector embeddings. You can also use other embedding models, or you can insert data with pre-generated vector embeddings (or without vector embeddings) and skip embedding.

To use a different embedding model, see Generate and store embeddings in Astra DB Serverless databases.
To insert pre-embedded data, you need to specify the vector dimensions and similarity metric instead of specifying the embedding provider. See the API documentation for collections.
To skip embedding, you can create a collection without any vector options. See the API documentation for collections.
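If you insert pre-embedded data, each document carries its embedding in the reserved `$vector` field instead of a `$vectorize` string. The following sketch prepares such a document and checks the embedding length against the collection's vector dimension before insertion. The `with_vector` helper and the dimension of 3 are assumptions for illustration only; use the dimension of your own embedding model.

```python
# Assumption: a toy dimension. This should match the dimension of the collection.
VECTOR_DIMENSION = 3


def with_vector(document: dict, embedding: list[float]) -> dict:
    """Return a copy of the document with a pre-generated embedding attached.

    Hypothetical helper that catches dimension mismatches before insertion.
    """
    if len(embedding) != VECTOR_DIMENSION:
        raise ValueError(
            f"Expected {VECTOR_DIMENSION} dimensions, got {len(embedding)}"
        )
    return {**document, "$vector": embedding}


document = with_vector({"title": "Example Title"}, [0.1, 0.2, 0.3])
print(document)
```

Documents prepared this way can then be inserted with the same insert calls used earlier in this quickstart.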

Perform more complex searches

This quickstart demonstrated how to find data using filters and vector search. To learn more about the searches you can perform, see Ways to find data in Astra DB Serverless.

Use different database settings

For this quickstart, you need a Serverless (vector) database in the Amazon Web Services us-east-2 region, which is required for the Astra-hosted NVIDIA embedding model integration. For production databases, you might use different database settings. For more information, see Astra DB Serverless database regions and maintenance schedules and Create an Astra DB Serverless database.

More tutorials and examples

Try building a chatbot that uses this collection for retrieval-augmented generation (RAG) with OpenAI.

For more examples, see the integration guides, code examples, and tutorials.
