Documents reference

A document represents a single row or record of data in an Astra DB Serverless database. You use the Collection class to work with documents through the Data API. For instructions to get a Collection object, see the Collections reference.

Prerequisites

Insert a single document

Insert a single document into a collection.

When you create a collection, you decide if the collection can store structured vector data. For vector-enabled collections, you also decide how to provide embeddings. You can either configure the collection to automatically generate embeddings with vectorize or provide embeddings when you load data (also known as bring your own embeddings). You must decide this when you create the collection.

When working with documents in the Astra Portal or Data API, there are two reserved fields for vector data:

  • The $vector parameter is a reserved field that stores vector arrays.

    • If the collection requires that you bring your own embeddings, you can include this parameter when you load data.

    • If the collection uses vectorize, you don’t include $vector when you load data. Instead, Astra DB populates the $vector field with the automatically generated embeddings.

    Regardless of the embedding generation method, when you find, update, replace, or delete documents, you can use $vector to fetch documents by vector search. You can also use projections to include $vector in responses.

  • The $vectorize parameter is a reserved field that generates embeddings automatically based on a given text string.

    • If the collection requires that you bring your own embeddings, you cannot use this parameter.

    • If the collection uses vectorize, you must include this parameter when you load data. The value of $vectorize is the text string from which you want to generate a document’s embedding. Astra DB stores the resulting vector array in $vector.

    When you find, update, replace, or delete documents in a collection that uses vectorize, you can use $vectorize to fetch documents by vector search with vectorize. You can also use projections to include $vectorize in responses.

If you load a document that doesn’t need an embedding, then you can omit $vector and $vectorize.
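The rules above can be checked on the client before a document is sent. The following is a minimal sketch, not part of any client library: check_vector_fields and its uses_vectorize flag are hypothetical names introduced for illustration. It covers only the two firm rules ($vector and $vectorize cannot both appear in one document, and $vectorize requires a collection that uses vectorize) plus basic type checks. The server remains the authority on what it accepts.

```python
from typing import Any, Dict


def check_vector_fields(document: Dict[str, Any], uses_vectorize: bool) -> None:
    """Raise ValueError if the reserved vector fields break the rules above.

    uses_vectorize is a hypothetical flag: True for a collection configured
    with vectorize, False for a bring-your-own-embeddings collection.
    """
    has_vector = "$vector" in document
    has_vectorize = "$vectorize" in document

    if has_vector and has_vectorize:
        raise ValueError("$vector and $vectorize are mutually exclusive")
    if has_vectorize and not uses_vectorize:
        raise ValueError("$vectorize requires a collection that uses vectorize")
    if has_vectorize and not isinstance(document["$vectorize"], str):
        raise ValueError("$vectorize must be a text string")
    if has_vector and not all(
        isinstance(x, (int, float)) and not isinstance(x, bool)
        for x in document["$vector"]
    ):
        raise ValueError("$vector must be an array of numbers")


# These calls pass silently; a document without an embedding is also valid.
check_vector_fields({"name": "Jane Doe", "$vector": [0.08, 0.68, 0.30]}, uses_vectorize=False)
check_vector_fields({"name": "Jane Doe", "$vectorize": "Text to vectorize"}, uses_vectorize=True)
check_vector_fields({"name": "Jane Doe"}, uses_vectorize=True)
```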

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

insert_result = collection.insert_one({"name": "Jane Doe"})

Insert a document with an associated vector:

insert_result = collection.insert_one(
    {
      "name": "Jane Doe",
      "$vector": [.08, .68, .30],
    },
)

Insert a document and generate a vector automatically:

insert_result = collection.insert_one(
    {
      "name": "Jane Doe",
      "$vectorize": "Text to vectorize",
    },
)

Returns:

InsertOneResult - An object representing the response from the database after the insert operation. It includes information about the success of the operation and details of the inserted documents.

Example response
InsertOneResult(inserted_id='92b4c4f4-db44-4440-b4c4-f4db44e440b8', raw_results=...)

Parameters:

| Name | Type | Summary |
| --- | --- | --- |
| document | Dict | The dictionary expressing the document to insert. The _id field of the document can be left out, in which case it will be created automatically. The document may contain the $vector or the $vectorize fields, but not both. |
| max_time_ms | Optional[int] | A timeout, in milliseconds, for the underlying HTTP request. If not passed, the collection-level setting is used instead. |

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

# Insert a document with a specific ID
response1 = collection.insert_one(
    {
        "_id": 101,
        "name": "John Doe",
        "$vector": [.12, .52, .32],
    },
)

# Insert a document without specifying an ID
# so that _id is generated automatically
response2 = collection.insert_one(
    {
        "name": "Jane Doe",
        "$vector": [.08, .68, .30],
    },
)

For more information, see the API reference.

const result = await collection.insertOne({ name: 'Jane Doe' });

Insert a document with an associated vector:

const result = await collection.insertOne({
  name: 'Jane Doe',
  $vector: [.08, .68, .30],
});

Insert a document and generate a vector automatically:

const result = await collection.insertOne({
  name: 'Jane Doe',
  $vectorize: 'Text to vectorize',
});

Parameters:

| Name | Type | Summary |
| --- | --- | --- |
| document | MaybeId<Schema> | The document to insert. If the document does not have an _id field, the server generates one. It may contain a $vector or $vectorize field to enable semantic searching. |
| options? | InsertOneOptions | The options for this operation. |

Options (InsertOneOptions):

| Name | Type | Summary |
| --- | --- | --- |
| maxTimeMS? | number | The maximum time in milliseconds that the client should wait for the operation to complete. |

Returns:

Promise<InsertOneResult<Schema>> - A promise that resolves to the inserted ID.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert a document with a specific ID
  await collection.insertOne({ _id: '1', name: 'John Doe' });

  // Insert a document with an autogenerated ID
  await collection.insertOne({ name: 'Jane Doe' });

  // Insert a document with a vector
  await collection.insertOne({ name: 'Jane Doe', $vector: [.12, .52, .32] });
})();

Operations on documents are performed at the Collection level. For more information, see the API reference.

Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

InsertOneResult insertOne(DOC document);
InsertOneResult insertOne(DOC document, float[] embeddings);

// Equivalent in asynchronous
CompletableFuture<InsertOneResult> insertOneAsync(DOC document);
CompletableFuture<InsertOneResult> insertOneAsync(DOC document, float[] embeddings);

Returns:

InsertOneResult - Wrapper with the ID of the inserted document.

Parameters:

| Name | Type | Summary |
| --- | --- | --- |
| document | DOC | Object representing the document to insert. The _id field of the document can be left out, in which case it will be created automatically. If the collection is associated with an embedding service, it will generate a vector automatically from the $vectorize field. |
| embeddings | float[] | A vector of embeddings (a list of numbers appropriate for the collection) for the document. Passing this parameter is equivalent to providing the vector in the $vector field of the document itself. |

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertOneOptions;
import com.datastax.astra.client.model.InsertOneResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;

public class InsertOne {

    @Data @AllArgsConstructor
    public static class Product {
        @JsonProperty("_id")
        private String id;
        private String name;
    }

    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Insert a document
        Document doc1 = new Document("1").append("name", "joe");
        InsertOneResult res1 = collectionDoc.insertOne(doc1);
        System.out.println(res1.getInsertedId()); // should be "1"

        // Insert a document with embeddings
        Document doc2 = new Document("2").append("name", "joe");
        collectionDoc.insertOne(doc2, new float[] {.1f, .2f});

        // Given an existing collection
        Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION2_NAME", Product.class);

        // Insert a document with custom bean
        collectionProduct.insertOne(new Product("1", "joe"));
        collectionProduct.insertOne(new Product("2", "joe"), new float[] {.1f, .2f});

    }
}

Insert a document with a predefined vector:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertOne": {
    "document": {
      "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
      "key1": "value1",
      "key2": "value2"
    }
  }
}' | jq

Insert a document and generate a vector automatically:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertOne": {
    "document": {
      "$vectorize": "Text to use to generate a vector",
      "key1": "value1",
      "key2": "value2"
    }
  }
}' | jq
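The curl commands above all share one request shape: a POST to the collection URL with a Token header and a JSON body that wraps the document in an insertOne command. As a rough illustration, the sketch below builds the same request with the Python standard library without sending it. The function name build_insert_one_request and the sample endpoint, namespace, and collection values are placeholders invented for this example.

```python
import json
import urllib.request
from typing import Any, Dict


def build_insert_one_request(
    api_endpoint: str,
    namespace: str,
    collection: str,
    token: str,
    document: Dict[str, Any],
) -> urllib.request.Request:
    """Build (but do not send) the POST request for an insertOne command."""
    url = f"{api_endpoint}/api/json/v1/{namespace}/{collection}"
    body = json.dumps({"insertOne": {"document": document}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


request = build_insert_one_request(
    "https://example.invalid",  # placeholder for ASTRA_DB_API_ENDPOINT
    "my_namespace",  # placeholder for ASTRA_DB_NAMESPACE
    "my_collection",  # placeholder for ASTRA_DB_COLLECTION
    "ASTRA_DB_APPLICATION_TOKEN",
    {"$vectorize": "Text to use to generate a vector", "key1": "value1"},
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen and parse the response body as JSON.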

Parameters:

Name Type Summary

insertOne

command

Data API command to insert one document in a collection.

document

object

Contains the details of the record to add.

With the exception of reserved fields (_id, $vector, and $vectorize), document data can be any valid JSON, including strings, integers, booleans, dates, objects, nested objects, and arrays:

    "document": {
      "string_example": "string value",
      "object_example": {
        "a": "one",
        "b": 2,
        "nested_object": {
          "c": false
        }
      },
      "date_example": { "$date": 1690045891 },
      "array_example": [
        {
          "d.e": "hello",
          "f.g": "goodbye"
        },
        "arbitrary string in an array"
      ]
    }

_id

reserved, multi-type

An optional identifier for the document. If omitted, the server automatically generates a document ID. You can include identifiers in other fields as well. For more information, see Work with document IDs and The defaultId option.

$vector

reserved array

An optional reserved property used to store an array of numbers representing a vector embedding. Serverless (Vector) databases have specialized handling for vector data, including optimized query performance for similarity search.

$vector and $vectorize are mutually exclusive.

$vectorize

reserved string

An optional reserved property used to store a string that you want to use to automatically generate an embedding with vectorize.

$vector and $vectorize are mutually exclusive.

Response

A successful response contains the _id of the inserted document:

{
  "status": {
    "insertedIds": [
      "12"
    ]
  }
}

The insertedIds content depends on the ID type and how it was generated, for example:

  • "insertedIds": [{"$objectId": "6672e1cbd7fabb4e5493916f"}]

  • "insertedIds": [{"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"}]

For more information, see Work with document IDs.
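Because insertedIds can hold plain values or typed wrappers, client code that reads the raw JSON response may want to unwrap them. The following sketch shows one way to do that with the standard library only. The function decode_inserted_ids is a hypothetical helper, and returning ObjectId values as tagged tuples is a choice made here because the standard library has no ObjectId class.

```python
import uuid
from typing import Any, List


def decode_inserted_ids(response: dict) -> List[Any]:
    """Return the insertedIds from a response, unwrapping typed IDs.

    {"$uuid": ...} becomes a uuid.UUID; {"$objectId": ...} is returned as an
    ("objectId", hex_string) tuple; any other value is returned unchanged.
    """
    decoded = []
    for raw in response["status"]["insertedIds"]:
        if isinstance(raw, dict) and "$uuid" in raw:
            decoded.append(uuid.UUID(raw["$uuid"]))
        elif isinstance(raw, dict) and "$objectId" in raw:
            decoded.append(("objectId", raw["$objectId"]))
        else:
            decoded.append(raw)
    return decoded


print(decode_inserted_ids({"status": {"insertedIds": ["12"]}}))
print(
    decode_inserted_ids(
        {"status": {"insertedIds": [{"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"}]}}
    )
)
```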

Examples:

Example with $vector
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertOne": {
    "document": {
      "purchase_type": "Online",
      "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
      "customer": {
        "name": "Jim A.",
        "phone": "123-456-1111",
        "age": 51,
        "credit_score": 782,
        "address": {
          "address_line": "1234 Broadway",
          "city": "New York",
          "state": "NY"
        }
      },
      "purchase_date": { "$date": 1690045891 },
      "seller": {
        "name": "Jon B.",
        "location": "Manhattan NYC"
      },
      "items": [
        {
          "car": "BMW 330i Sedan",
          "color": "Silver"
        },
        "Extended warranty - 5 years"
      ],
      "amount": 47601,
      "status": "active",
      "preferred_customer": true
    }
  }
}' | jq

Example with $vectorize
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header 'Content-Type: application/json' \
--header 'x-embedding-api-key;' \
--data '{
  "insertOne": {
    "document": {
      "_id": "1",
      "purchase_type": "Online",
      "$vectorize": "Purchase of a silver BMW sedan in New York.",
      "customer": {
        "name": "Jim A.",
        "phone": "123-456-1111",
        "age": 51,
        "credit_score": 782,
        "address": {
          "address_line": "1234 Broadway",
          "city": "New York",
          "state": "NY"
        }
      },
      "purchase_date": { "$date": 1690045891 },
      "seller": {
        "name": "Jon B.",
        "location": "Manhattan NYC"
      },
      "items": [
        {
          "car": "BMW 330i Sedan",
          "color": "Silver"
        },
        "Extended warranty - 5 years"
      ],
      "amount": 47601,
      "status": "active",
      "preferred_customer": true
    }
  }
}'

Work with dates

  • Python

  • TypeScript

  • Java

  • curl

Date and datetime objects are instances of the Python standard library datetime.date and datetime.datetime classes, respectively. You can use them anywhere in documents.

The following example uses dates in insert, update, and find commands. Read operations from a collection always return the datetime class, regardless of whether the original command used date or datetime.

import datetime

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

# Insert documents containing date and datetime values:
collection.insert_one({"when": datetime.datetime.now()})
collection.insert_one({"date_of_birth": datetime.date(2000, 1, 1)})
collection.insert_one({"registered_at": datetime.date(1999, 11, 14)})

# Update a document, using a date in the filter:
collection.update_one(
    {"registered_at": datetime.date(1999, 11, 14)},
    {"$set": {"message": "happy Sunday!"}},
)

# Update a document, setting "last_reviewed" to the current date:
collection.update_one(
    {"date_of_birth": {"$exists": True}},
    {"$currentDate": {"last_reviewed": True}},
)

# Find documents by inequality on a date value:
print(
    collection.find_one(
        {"date_of_birth": {"$lt": datetime.date(2001, 1, 1)}},
        projection={"_id": False},
    )
)
# will print:
# {'date_of_birth': datetime.datetime(2000, 1, 1, 0, 0), 'last_reviewed': datetime.datetime(...now...)}
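One way to see why reads always come back as datetime is to look at a plausible wire encoding. The sketch below assumes that dates travel as {"$date": N}, where N is the number of milliseconds since the Unix epoch, and that naive values are treated as UTC. Both are assumptions of this example, and to_ejson_date and from_ejson_date are hypothetical helpers, not AstraPy functions. Under that encoding, a date and a datetime at midnight produce the same payload, so the original type cannot be recovered when reading.

```python
import datetime
from typing import Dict, Union

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
ONE_MS = datetime.timedelta(milliseconds=1)


def to_ejson_date(value: Union[datetime.date, datetime.datetime]) -> Dict[str, int]:
    """Encode a date or datetime as {"$date": milliseconds_since_epoch}."""
    if not isinstance(value, datetime.datetime):
        # A plain date becomes midnight of that day.
        value = datetime.datetime(value.year, value.month, value.day)
    if value.tzinfo is None:
        # Assumption for this sketch: naive values are treated as UTC.
        value = value.replace(tzinfo=datetime.timezone.utc)
    return {"$date": (value - EPOCH) // ONE_MS}


def from_ejson_date(obj: Dict[str, int]) -> datetime.datetime:
    """Decode {"$date": ms} into a timezone-aware UTC datetime."""
    return EPOCH + datetime.timedelta(milliseconds=obj["$date"])


as_date = to_ejson_date(datetime.date(2000, 1, 1))
as_datetime = to_ejson_date(datetime.datetime(2000, 1, 1, 0, 0))
print(as_date == as_datetime)
print(from_ejson_date(as_date))
```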

You can use standard JS Date objects anywhere in documents to represent dates and times. Read operations also return Date objects for document fields stored using { $date: number }.

The following example uses dates in insert, update, and find commands:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });

(async function () {
  // Create an untyped collection
  const collection = await db.createCollection('dates_test', { checkExists: false });

  // Insert documents with some dates
  await collection.insertOne({ dateOfBirth: new Date(1394104654000) });
  await collection.insertOne({ dateOfBirth: new Date('1863-05-28') });

  // Update a document with a date and setting lastModified to now
  await collection.updateOne(
    {
      dateOfBirth: new Date('1863-05-28'),
    },
    {
      $set: { message: 'Happy Birthday!' },
      $currentDate: { lastModified: true },
    },
  );

  // Prints a date close to the current time, as set by $currentDate
  const found = await collection.findOne({ dateOfBirth: { $lt: new Date('1900-01-01') } });
  console.log(found?.lastModified);
})();

The Data API uses the ejson standard to represent time-related objects. The Java client introduces custom serializers for three types of objects: java.util.Date, java.util.Calendar, and java.time.Instant. You can use these objects in documents as well as in filter clauses and update clauses.

The following example uses dates in insert, update, and find commands:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.Projections;

import java.time.Instant;
import java.util.Calendar;
import java.util.Date;

import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Updates.set;

public class WorkingWithDates {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        Calendar c = Calendar.getInstance();
        collection.insertOne(new Document().append("registered_at", c));
        collection.insertOne(new Document().append("date_of_birth", new Date()));
        collection.insertOne(new Document().append("just_a_date", Instant.now()));

        collection.updateOne(
                eq("registered_at", c), // filter clause
                set("message", "happy Sunday!")); // update clause

        collection.findOne(
                lt("date_of_birth", new Date(System.currentTimeMillis() - 1000 * 1000)),
                new FindOneOptions().projection(Projections.exclude("_id")));
    }
}

You can use $date to represent dates as Unix timestamps in the JSON payload of a Data API command:

"date_of_birth": { "$date": 1690045891 }

The following example includes a date in an insertOne command:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertOne": {
    "document": {
      "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
      "date_of_birth": { "$date": 1690045891 }
    }
  }
}' | jq

The following example uses the date to find and update a document with the updateOne command:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "updateOne": {
    "filter": {
      "date_of_birth": { "$date": 1690045891 }
    },
    "update": { "$set": { "message": "Happy birthday!" } }
  }
}' | jq

The following example uses the $currentDate update operator to set a property to the current date:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findOneAndUpdate": {
    "filter": { "_id": "doc1" },
    "update": {
      "$currentDate": {
        "createdAt": true
      }
    }
  }
}' | jq

Work with document IDs

Documents in a collection are always identified by an ID that is unique within the collection. There are multiple types of document identifiers, such as string, integer, or datetime; however, the uuid and ObjectId types are recommended. The Data API supports uuid identifiers up to version 8 and ObjectId identifiers as provided by the bson library.

When you create a collection, you can set a default ID type that specifies how the Data API generates an _id for any document that doesn’t have an explicit _id field when you insert it into the collection. However, if you provide an explicit _id value, such as "_id": "12", then the server uses this value instead of generating an ID.

Regardless of the defaultId setting, the Data API honors document identifiers of any type, anywhere in a document, that you explicitly provide at any time:

  • You can include identifiers anywhere in a document, not only in the _id field.

  • You can include different types of identifiers in different parts of the same document.

  • You can define identifiers at any time, such as when inserting or updating a document.

  • You can use any of a document’s identifiers for filter clauses and update/replace operations, just like any other data type.

  • Python

  • TypeScript

  • Java

  • curl

AstraPy recognizes uuid versions 1 and 3 through 8, as provided by the uuid and uuid6 Python libraries. AstraPy also recognizes the ObjectId from the bson package. For convenience, these utilities are exposed in AstraPy directly:

from astrapy.ids import (
    ObjectId,
    uuid1,
    uuid3,
    uuid4,
    uuid5,
    uuid6,
    uuid7,
    uuid8,
    UUID,
)

You can generate new identifiers with statements such as new_id = uuid8() or new_obj_id = ObjectId():

from astrapy import DataAPIClient
from astrapy.ids import ObjectId, uuid8, UUID
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_one({"_id": uuid8(), "tag": "new_id_v_8"})
collection.insert_one(
    {"_id": UUID("018e77bc-648d-8795-a0e2-1cad0fdd53f5"), "tag": "id_v_8"}
)
collection.insert_one({"_id": ObjectId(), "tag": "new_obj_id"})
collection.insert_one(
    {"_id": ObjectId("6601fb0f83ffc5f51ba22b88"), "tag": "obj_id"}
)
collection.find_one_and_update(
    {"_id": ObjectId("6601fb0f83ffc5f51ba22b88")},
    {"$set": {"item_inventory_id": UUID("1eeeaf80-e333-6613-b42f-f739b95106e6")}},
)

All uuid versions are instances of the UUID class, which exposes a version property, if you need to access it.
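As a small illustration of the version property, the following sketch reads the version from two of the identifier strings used above. It uses the standard library uuid.UUID class, which is assumed here to behave like the UUID class exposed by astrapy.ids. The helper name uuid_version is made up for this example.

```python
import uuid
from typing import Optional


def uuid_version(text: str) -> Optional[int]:
    """Return the version number encoded in a UUID string."""
    return uuid.UUID(text).version


print(uuid_version("018e77bc-648d-8795-a0e2-1cad0fdd53f5"))
print(uuid_version("1eeeaf80-e333-6613-b42f-f739b95106e6"))
print(uuid.uuid4().version)
```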

To use and generate identifiers, astra-db-ts provides the UUID and ObjectId classes. These are not the same as those exported from the bson or uuid libraries. Instead, these are custom classes that you must import from the astra-db-ts package:

import { UUID, ObjectId } from '@datastax/astra-db-ts';

To generate new identifiers, you can use UUID.v4(), UUID.v7(), or new ObjectId():

import { DataAPIClient, UUID, ObjectId } from '@datastax/astra-db-ts';

// Schema for the collection
interface Person {
  _id: UUID | ObjectId;
  name: string;
  friendId?: UUID;
}

// Reference the DB instance
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });

(async function () {
  // Create the collection
  const collection = await db.createCollection<Person>('people', { checkExists: false });

  // Insert documents w/ various IDs
  await collection.insertOne({ name: 'John', _id: UUID.v4() });
  await collection.insertOne({ name: 'Jane', _id: new UUID('016b1cac-14ce-660e-8974-026c927b9b91') });

  await collection.insertOne({ name: 'Dan', _id: new ObjectId()});
  await collection.insertOne({ name: 'Tim', _id: new ObjectId('65fd9b52d7fabba03349d013') });

  // Update a document with a UUID in a non-_id field
  await collection.updateOne(
    { name: 'John' },
    { $set: { friendId: new UUID('016b1cac-14ce-660e-8974-026c927b9b91') } },
  );

  // Find a document by a UUID in a non-_id field
  const john = await collection.findOne({ name: 'John' });
  const jane = await collection.findOne({ _id: john!.friendId });

  // Prints 'Jane 016b1cac-14ce-660e-8974-026c927b9b91 6'
  console.log(jane?.name, jane?._id.toString(), (<UUID>jane?._id).version);
})();

All UUID methods return an instance of the same class, which exposes a version property, if you need to access it. UUIDs can also be constructed from a string representation of the IDs, if you want to use custom generation.

The Java client defines dedicated classes to support different implementations of UUID, particularly v6 and v7.

When a unique identifier is retrieved from the server, it is returned as a uuid, and then it is converted to the appropriate UUID class, based on the class definition in the defaultId option.

The ObjectId class is extracted from the BSON package and represents the ObjectId type. The standard Java UUID class (java.util.UUID) produces version 4 UUIDs, for example with UUID.randomUUID().

To generate new identifiers, you can use methods like new UUIDv6(), new UUIDv7(), or new ObjectId():

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.ObjectId;
import com.datastax.astra.client.model.UUIDv6;
import com.datastax.astra.client.model.UUIDv7;

import java.time.Instant;
import java.util.UUID;

import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Updates.set;

public class WorkingWithDocumentIds {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // IDs can be different JSON scalar types
        // ('defaultId' options NOT set for collection)
        new Document().id("abc");
        new Document().id(123);
        new Document().id(Instant.now());

        // Working with UUIDv4
        new Document().id(UUID.randomUUID());

        // Working with UUIDv6
        collection.insertOne(new Document().id(new UUIDv6()).append("tag", "new_id_v_6"));
        UUID uuidv4 = UUID.fromString("018e77bc-648d-8795-a0e2-1cad0fdd53f5");
        collection.insertOne(new Document().id(new UUIDv6(uuidv4)).append("tag", "id_v_8"));

        // Working with UUIDv7
        collection.insertOne(new Document().id(new UUIDv7()).append("tag", "new_id_v_7"));

        // Working with ObjectIds
        collection.insertOne(new Document().id(new ObjectId()).append("tag", "obj_id"));
        collection.insertOne(new Document().id(new ObjectId("6601fb0f83ffc5f51ba22b88")).append("tag", "obj_id"));

        collection.findOneAndUpdate(
                eq(new ObjectId("6601fb0f83ffc5f51ba22b88")),
                set("item_inventory_id", UUID.fromString("1eeeaf80-e333-6613-b42f-f739b95106e6")));
    }
}

When you insert a document, you can omit _id to automatically generate an ID or you can manually specify an _id, such as "_id": "12".

The following example inserts two documents with manually defined _id values. One document uses the objectId type, and the other uses the uuid type.

"insertMany": {
  "documents": [
    {
      "_id": { "$objectId": "6672e1cbd7fabb4e5493916f" },
      "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
      "key": "value",
      "amount": 53990
    },
    {
      "_id": { "$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739" },
      "$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
      "key": "value",
      "amount": 4600
    }
  ]
}

When you add or update a document, you can include additional identifiers in any document property, other than _id, just as you would any other data type.
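To make the payload shape concrete, here is a sketch that builds an insertMany body from Python dictionaries and wraps any top-level uuid.UUID value, in _id or in any other property, as {"$uuid": ...}. The helpers encode_id and build_insert_many_payload are hypothetical, and they handle only top-level fields. ObjectId values are passed in already wrapped as {"$objectId": ...} because the standard library has no ObjectId class.

```python
import json
import uuid
from typing import Any, Dict, Iterable


def encode_id(value: Any) -> Any:
    """Wrap a uuid.UUID as {"$uuid": ...}; leave other values unchanged."""
    if isinstance(value, uuid.UUID):
        return {"$uuid": str(value)}
    return value


def build_insert_many_payload(documents: Iterable[Dict[str, Any]]) -> str:
    """Return the JSON body of an insertMany command for the documents."""
    encoded = [
        {key: encode_id(val) for key, val in doc.items()} for doc in documents
    ]
    return json.dumps({"insertMany": {"documents": encoded}}, indent=2)


payload = build_insert_many_payload(
    [
        {"_id": {"$objectId": "6672e1cbd7fabb4e5493916f"}, "amount": 53990},
        {
            "_id": uuid.UUID("1ef2e42c-1fdb-6ad6-aae4-e84679831739"),
            # An additional identifier in a property other than _id
            "item_inventory_id": uuid.UUID("1eeeaf80-e333-6613-b42f-f739b95106e6"),
            "amount": 4600,
        },
    ]
)
print(payload)
```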

Insert many documents

Insert multiple documents into a collection.

When you create a collection, you decide if the collection can store structured vector data. For vector-enabled collections, you also decide how to provide embeddings. You can either configure the collection to automatically generate embeddings with vectorize or provide embeddings when you load data (also known as bring your own embeddings). You must decide this when you create the collection.

When working with documents in the Astra Portal or Data API, there are two reserved fields for vector data:

  • The $vector parameter is a reserved field that stores vector arrays.

    • If the collection requires that you bring your own embeddings, you can include this parameter when you load data.

    • If the collection uses vectorize, you don’t include $vector when you load data. Instead, Astra DB populates the $vector field with the automatically generated embeddings.

    Regardless of the embedding generation method, when you find, update, replace, or delete documents, you can use $vector to fetch documents by vector search. You can also use projections to include $vector in responses.

  • The $vectorize parameter is a reserved field that generates embeddings automatically based on a given text string.

    • If the collection requires that you bring your own embeddings, you cannot use this parameter.

    • If the collection uses vectorize, you must include this parameter when you load data. The value of $vectorize is the text string from which you want to generate a document’s embedding. Astra DB stores the resulting vector array in $vector.

    When you find, update, replace, or delete documents in a collection that uses vectorize, you can use $vectorize to fetch documents by vector search with vectorize. You can also use projections to include $vectorize in responses.

If you load a document that doesn’t need an embedding, then you can omit $vector and $vectorize.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Insert documents with vector embeddings:

response = collection.insert_many(
    [
        {
            "_id": 101,
            "name": "John Doe",
            "$vector": [.12, .52, .32],
        },
        {
            # ID is generated automatically
            "name": "Jane Doe",
            "$vector": [.08, .68, .30],
        },
    ],
)

Insert multiple documents and generate vectors automatically:

response = collection.insert_many(
    [
        {
            "name": "John Doe",
            "$vectorize": "Text to vectorize for John Doe",
        },
        {
            "name": "Jane Doe",
            "$vectorize": "Text to vectorize for Jane Doe",
        },
    ],
)

Returns:

InsertManyResult - An object representing the response from the database after the insert operation. It includes information about the success of the operation and details of the inserted documents.

Example response
InsertManyResult(inserted_ids=[101, '81077d86-05dc-43ca-877d-8605dce3ca4d'], raw_results=...)

Parameters:

| Name | Type | Summary |
| --- | --- | --- |
| documents | Iterable[Dict[str, Any]] | An iterable of dictionaries, each a document to insert. Documents may specify their _id field or leave it out, in which case it will be added automatically. Each document may contain the $vector or the $vectorize fields, but not both. |
| ordered | bool | If False (default), the insertions can occur in arbitrary order and possibly concurrently. If True, they are processed sequentially. Unless you need ordered inserts, DataStax recommends ordered = False, which typically results in a much higher insert throughput than an equivalent ordered insertion. |
| chunk_size | Optional[int] | How many documents to include in a single API request. The default is 50, and the maximum is 100. |
| concurrency | Optional[int] | Maximum number of concurrent requests to the API at a given time. It cannot be more than one for ordered insertions. |
| max_time_ms | Optional[int] | A timeout, in milliseconds, for the operation. If not passed, the collection-level setting is used instead. If you are inserting many documents, this method requires multiple HTTP requests, so you may need to increase the timeout for the method to complete successfully. |
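The interaction between chunk_size and the number of HTTP requests can be sketched as follows. This is an illustration of the chunking idea only, not AstraPy's implementation. The function chunk_documents is a hypothetical helper, and the default of 50 and maximum of 100 are taken from the parameter description above.

```python
from typing import Any, Dict, Iterator, List, Sequence


def chunk_documents(
    documents: Sequence[Dict[str, Any]], chunk_size: int = 50
) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive chunks, each small enough for one API request."""
    if not 1 <= chunk_size <= 100:
        raise ValueError("chunk_size must be between 1 and 100")
    for start in range(0, len(documents), chunk_size):
        yield list(documents[start:start + chunk_size])


docs = [{"seq": i} for i in range(230)]
chunks = list(chunk_documents(docs, chunk_size=50))
# Each chunk corresponds to one HTTP request, so more documents or a
# smaller chunk_size means more requests within the same overall timeout.
print(len(chunks), [len(chunk) for chunk in chunks])
```

With ordered insertions, the chunks would be sent one after another. With unordered insertions, up to concurrency chunks could be in flight at the same time.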

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_many([{"a": 10}, {"a": 5}, {"b": [True, False, False]}])

collection.insert_many(
    [{"seq": i} for i in range(50)],
    concurrency=5,
)

collection.insert_many(
    [
        {"tag": "a", "$vector": [1, 2]},
        {"tag": "b", "$vector": [3, 4]},
    ]
)

For more information, see the API reference.

Insert multiple documents with vectors:

const result = await collection.insertMany([
  {
    _id: '1',
    name: 'John Doe',
    $vector: [.12, .52, .32],
  },
  {
    name: 'Jane Doe',
    $vector: [.08, .68, .30],
  },
], {
  ordered: true,
});

Insert multiple documents and generate vectors automatically:

const result = await collection.insertMany([
  {
    name: 'John Doe',
    $vectorize: 'Text to vectorize for John Doe',
  },
  {
    name: 'Jane Doe',
    $vectorize: 'Text to vectorize for Jane Doe',
  },
], {
  ordered: true,
});

Parameters:

Name Type Summary

documents

MaybeId<Schema>[]

The documents to insert. If any document does not have an _id field, the server generates one. They may each contain a $vector or $vectorize field to enable semantic searching.

options?

InsertManyOptions

The options for this operation.

Options (InsertManyOptions):

Name Type Summary

ordered?

boolean

Set the ordered option to true to stop the operation after the first error. Otherwise, documents can be processed in parallel and in arbitrary order, which can improve performance significantly.

DataStax recommends ordered: false, which typically results in a much higher insert throughput than an equivalent ordered insertion.

concurrency?

number

You can set the concurrency option to control how many network requests are made in parallel on unordered insertions. Defaults to 8. This is not available for ordered insertions.

chunkSize?

number

Control how many documents are sent with each network request. The default is 50, and the maximum is 100.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete.

Returns:

Promise<InsertManyResult<Schema>> - A promise that resolves to the inserted IDs.

Example:

import { DataAPIClient, InsertManyError } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  try {
    // Insert many documents
    await collection.insertMany([
      { _id: '1', name: 'John Doe' },
      { name: 'Jane Doe' }, // Will autogen ID
    ], { ordered: true });

    // Insert many with vectors
    await collection.insertMany([
      { name: 'John Doe', $vector: [.12, .52, .32] },
      { name: 'Jane Doe', $vector: [.32, .52, .12] },
    ]);
  } catch (e) {
    if (e instanceof InsertManyError) {
      console.log(e.partialResult);
    }
  }
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson. For more information, see the API reference.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
InsertManyResult insertMany(List<? extends DOC> documents);
InsertManyResult insertMany(List<? extends DOC> documents, InsertManyOptions options);

// Asynchronous
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList);
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList, InsertManyOptions options);

Returns:

InsertManyResult - Wrapper with the list of inserted document ids.

Parameters:

Name Type Summary

docList

List<? extends DOC>

A list of documents to insert. Documents may specify their _id field or leave it out, in which case it will be added automatically. If the collection is associated with an embedding service, it will generate vectors automatically from the $vectorize field in each document. You can also set the $vector field directly.

options (optional)

InsertManyOptions

Set the different options for the insert operation. The options are ordered, concurrency, and chunkSize.

The Java insertMany method can take as many documents as you want, as long as they fit in your JVM memory. It splits the documents into chunks of chunkSize and sends them to the server in a distributed way through an ExecutorService.

As a best practice, try to always provide InsertManyOptions, even when using defaults, because it brings visibility to the readers:

InsertManyOptions.Builder
  .chunkSize(20)  // batch size, 100 is max
  .concurrency(8) // concurrent insertions
  .ordered(false) // unordered insertions
  .build();

The default value of chunkSize is 50, and the maximum value is 100. To set the size of the executor use concurrency. DataStax recommends ordered(false) for performance reasons because it can insert chunks in parallel.

If not provided, the default values are chunkSize=50, concurrency=1, and ordered=false.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertManyOptions;
import com.datastax.astra.client.model.InsertManyResult;
import com.datastax.astra.client.model.InsertOneResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;

import java.util.List;

public class InsertMany {

    @Data @AllArgsConstructor
    public static class Product {
        @JsonProperty("_id")
        private String id;
        private String name;
    }

    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Insert multiple documents
        Document doc1 = new Document("1").append("name", "joe");
        Document doc2 = new Document("2").append("name", "joe");
        InsertManyResult res1 = collectionDoc.insertMany(List.of(doc1, doc2));
        System.out.println("Identifiers inserted: " + res1.getInsertedIds());

        // Given an existing collection
        Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION2_NAME", Product.class);

        // Insert multiple documents with explicit options
        InsertManyOptions options = new InsertManyOptions()
                .chunkSize(20)  // number of documents per request
                .concurrency(1) // number of concurrent requests
                .ordered(false) // unordered insertions allow parallel processing
                .timeout(1000); // timeout in milliseconds

        InsertManyResult res2 = collectionProduct.insertMany(
                List.of(new Product("1", "joe"),
                        new Product("2", "joe")),
                options);
    }
}

With insertMany, you provide an array of document objects. The document objects have the same format as insertOne.

The Data API accepts up to 100 documents per insertMany request.

Insert multiple documents with vectors:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertMany": {
    "documents": [
      {
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
        "key1": "value1",
        "key2": "value2"
      },
      {
        "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
        "key1": "value3",
        "key2": "value4"
      },
      {
        "$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
        "key1": "value3",
        "key2": "value4"
      }
    ],
    "options": {
      "ordered": false
    }
  }
}' | jq

Insert multiple documents and generate vectors automatically:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertMany": {
    "documents": [
      {
        "$vectorize": "text to vectorize for first document",
        "key1": "value1",
        "key2": "value2"
      },
      {
        "$vectorize": "text to vectorize for second document",
        "key1": "value3",
        "key2": "value4"
      },
      {
        "$vectorize": "text to vectorize for third document",
        "key1": "value3",
        "key2": "value4"
      }
    ],
    "options": {
      "ordered": false
    }
  }
}' | jq

Parameters:

Name Type Summary

insertMany

command

Data API command to insert multiple documents. You can insert up to 100 documents at a time.

documents

array of objects

Contains the details of the records to add. It is an array of objects where each object represents a document.

With the exception of reserved fields (_id, $vector, and $vectorize), document data can be any valid JSON, including strings, integers, booleans, dates, objects, nested objects, and arrays:

    "documents": [
      {
        "string_example": "string value",
        "object_example": {
          "a": "one",
          "b": 2,
          "nested_object": {
            "c": false
          }
        },
        "date_example": { "$date": 1690045891 },
        "array_example": [
          {
            "d.e": "hello",
            "f.g": "goodbye"
          },
          "arbitrary string in an array"
        ]
      }
    ]

_id

reserved multi-type

An optional identifier for a document. If omitted, the server automatically generates a document ID. You can include identifiers in other fields as well. For more information, see Work with document IDs and The defaultId option.

$vector

reserved array

An optional reserved property used to store an array of numbers representing a vector embedding for a document. Serverless (Vector) databases have specialized handling for vector data, including optimized query performance for similarity search.

$vector and $vectorize are mutually exclusive.

$vectorize

reserved string

An optional reserved property used to store a string that you want to use to automatically generate an embedding for a document.

$vector and $vectorize are mutually exclusive.

options.ordered

boolean

If false, insertions occur in an arbitrary order with possible concurrency. If true, insertions occur sequentially. If you don’t need ordered inserts, DataStax recommends "ordered": false, which typically results in a much higher insert throughput than an equivalent ordered insertion.
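The rules in this table can be checked before a request is sent. The following sketch builds an insertMany request body in Python and enforces two rules stated in this reference: at most 100 documents per request, and no document with both $vector and $vectorize. The function name build_insert_many_body is hypothetical, and the client-side checks are only an illustration; the server performs its own validation.

```python
import json
from typing import Any, Dict, List

MAX_DOCUMENTS_PER_REQUEST = 100  # limit stated in this reference


def build_insert_many_body(documents: List[Dict[str, Any]], ordered: bool = False) -> str:
    """Return the JSON body for an insertMany command (hypothetical helper)."""
    if len(documents) > MAX_DOCUMENTS_PER_REQUEST:
        raise ValueError("insertMany accepts up to 100 documents per request")
    for position, document in enumerate(documents):
        # $vector and $vectorize are mutually exclusive within one document.
        if "$vector" in document and "$vectorize" in document:
            raise ValueError(f"document {position} sets both $vector and $vectorize")
    command = {"documents": documents, "options": {"ordered": ordered}}
    return json.dumps({"insertMany": command})


body = build_insert_many_body([
    {"$vector": [0.1, 0.15, 0.3, 0.12, 0.05], "key1": "value1"},
    {"$vector": [0.25, 0.25, 0.25, 0.25, 0.25], "key1": "value3"},
])
print(body)
```

Because json.dumps produces the body, the result is always well-formed JSON, which avoids hand-editing mistakes such as trailing commas.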

Response

A successful response contains the _id of the inserted documents:

{
  "status": {
    "insertedIds": [
      "4",
      "7",
      "10"
    ]
  }
}

The insertedIds content depends on the ID type and how it was generated, for example:

  • "insertedIds": [{"$objectId": "6672e1cbd7fabb4e5493916f"}]

  • "insertedIds": [{"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"}]

For more information, see Work with document IDs.
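Because insertedIds can hold plain values or wrapped values, code that consumes the response may want to unwrap them. The following is a minimal sketch that handles only the two wrappers shown above; normalize_inserted_ids is a hypothetical helper, and the mixed-type sample response is constructed for illustration rather than taken from a real request.

```python
from typing import Any, Dict, List


def normalize_inserted_ids(response: Dict[str, Any]) -> List[Any]:
    """Unwrap {"$objectId": ...} and {"$uuid": ...} values; leave other IDs unchanged."""
    normalized = []
    for inserted_id in response["status"]["insertedIds"]:
        if isinstance(inserted_id, dict) and len(inserted_id) == 1:
            wrapper, value = next(iter(inserted_id.items()))
            if wrapper in ("$objectId", "$uuid"):
                normalized.append(value)
                continue
        normalized.append(inserted_id)
    return normalized


# Illustrative response that mixes the ID shapes described above.
sample_response = {
    "status": {
        "insertedIds": [
            "4",
            {"$objectId": "6672e1cbd7fabb4e5493916f"},
            {"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"},
        ]
    }
}
print(normalize_inserted_ids(sample_response))
```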

Example

The following insertMany request adds three documents to a collection:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertMany": {
    "documents": [
      {
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
        "customer": {
          "name": "Jack B.",
          "phone": "123-456-2222",
          "age": 34,
          "credit_score": 700,
          "address": {
            "address_line": "888 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": { "$date": 1690391491 },
        "seller": {
          "name": "Tammy S.",
          "location": "Staten Island NYC"
        },
        "items": [
          {
            "car": "Tesla Model 3",
            "color": "White"
          },
          "Extended warranty - 10 years",
          "Service - 5 years"
        ],
        "amount": 53990,
        "status": "active"
      },
      {
        "purchase_type": "Online",
        "$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
        "customer": {
          "name": "Jill D.",
          "phone": "123-456-3333",
          "age": 30,
          "credit_score": 742,
          "address": {
            "address_line": "12345 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": { "$date": 1690564291 },
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": "Extended warranty - 10 years",
        "amount": 4600,
        "status": "active"
      },
      {
        "purchase_type": "In Person",
        "$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
        "customer": {
          "name": "Rachel I.",
          "phone": null,
          "age": 62,
          "credit_score": 786,
          "address": {
            "address_line": "1234 Park Ave",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": { "$date": 1706202691 },
        "seller": {
          "name": "Jon B.",
          "location": "Manhattan NYC"
        },
        "items": [
          {
            "car": "BMW M440i Gran Coupe",
            "color": "Silver"
          },
          "Extended warranty - 5 years",
          "Gap Insurance - 5 years"
        ],
        "amount": 65250,
        "status": "active"
      }
    ],
    "options": {
      "ordered": false
    }
  }
}' | jq

Find a document

Retrieve a single document from a collection using various filter and query options.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Retrieve a single document from a collection by its _id:

document = collection.find_one({"_id": 101})

Retrieve a single document from a collection by any property, as long as the property is covered by the collection’s indexing configuration:

document = collection.find_one({"location": "warehouse_C"})

Retrieve a single document from a collection by an arbitrary filtering clause:

document = collection.find_one({"tag": {"$exists": True}})

Retrieve the document that is most similar to a given vector:

result = collection.find_one({}, sort={"$vector": [.12, .52, .32]})

Retrieve the most similar document by running a vector search with vectorize:

result = collection.find_one({}, sort={"$vectorize": "Text to vectorize"})

Use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

result = collection.find_one({"_id": 101}, projection={"name": True})

Returns:

Union[Dict[str, Any], None] - Either the found document as a dictionary or None if no matching document is found.

Example response
{'_id': 101, 'name': 'John Doe', '$vector': [0.12, 0.52, 0.32]}

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Example values for projection operations.

include_similarity

Optional[bool]

If True, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. Only valid for vector ANN search with $vector or $vectorize.

sort

Optional[Dict[str, Any]]

Use this dictionary parameter to perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Example values for sort operations.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.find_one()
# prints: {'_id': '68d1e515-...', 'seq': 37}
collection.find_one({"seq": 10})
# prints: {'_id': 'd560e217-...', 'seq': 10}
collection.find_one({"seq": 1011})
# (returns None for no matches)
collection.find_one(projection={"seq": False})
# prints: {'_id': '68d1e515-...'}
collection.find_one(
    {},
    sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
# prints: {'_id': '97e85f81-...', 'seq': 69}
collection.find_one(sort={"$vector": [1, 0]}, projection={"*": True})
# prints: {'_id': '...', 'tag': 'D', '$vector': [4.0, 1.0]}

For more information, see the API reference.

Retrieve a single document from a collection by its _id:

const doc = await collection.findOne({ _id: '101' });

Retrieve a single document from a collection by any property, as long as the property is covered by the collection’s indexing configuration:

const doc = await collection.findOne({ location: 'warehouse_C' });

Retrieve a single document from a collection by an arbitrary filtering clause:

const doc = await collection.findOne({ tag: { $exists: true } });

Retrieve the document that is most similar to a given vector:

const doc = await collection.findOne({}, { sort: { $vector: [.12, .52, .32] } });

Retrieve the most similar document by running a vector search with vectorize:

const doc = await collection.findOne({}, { sort: { $vectorize: 'Text to vectorize' } });

Use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

const doc = await collection.findOne({ _id: '101' }, { projection: { name: 1 } });

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to find. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

options?

FindOneOptions

The options for this operation.

Options (FindOneOptions):

Name Type Summary

projection?

Projection

Specifies which fields to include or exclude in the returned documents. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Example values for projection operations.

When specifying a projection, make sure that you handle the return type carefully. Consider type-casting.

includeSimilarity?

boolean

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. This is only valid when performing a vector search with $vector or $vectorize.

sort?

Sort

Perform a vector similarity search or set the order in which documents are returned. For similarity searches, sort can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Example values for sort operations.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete.

Returns:

Promise<FoundDoc<Schema> | null> - A promise that resolves to the found document (including $similarity, if applicable), or null if no matching document is found.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([
    { name: 'John', age: 30, $vector: [1, 1, 1, 1, 1] },
    { name: 'Jane', age: 25, },
    { name: 'Dave', age: 40, },
  ]);

  // Unpredictably prints one of their names
  const unpredictable = await collection.findOne({});
  console.log(unpredictable?.name);

  // Failed find by name (null)
  const failed = await collection.findOne({ name: 'Carrie' });
  console.log(failed);

  // Find by $gt age (Dave)
  const dave = await collection.findOne({ age: { $gt: 30 } });
  console.log(dave?.name);

  // Find by sorting by age (Jane)
  const jane = await collection.findOne({}, { sort: { age: 1 } });
  console.log(jane?.name);

  // Find by vector similarity (John, 1)
  const john = await collection.findOne({}, { sort: { $vector: [1, 1, 1, 1, 1] }, includeSimilarity: true });
  console.log(john?.name, john?.$similarity);
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson. For more information, see the API reference.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
Optional<T> findOne(Filter filter);
Optional<T> findOne(Filter filter, FindOneOptions options);
Optional<T> findById(Object id); // build the filter for you

// Asynchronous
CompletableFuture<Optional<DOC>> findOneAsync(Filter filter);
CompletableFuture<Optional<DOC>> findOneAsync(Filter filter, FindOneOptions options);
CompletableFuture<Optional<DOC>> findByIdAsync(Object id); // builds the filter for you

You can retrieve documents in various ways, for example by _id, by any indexed property, with an arbitrary filter clause, or by vector similarity search.

Additionally, you can use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

In the underlying HTTP request, the findOne command is a JSON object containing the filter, projection, sort, and options parameters, for example:

{
  "findOne": {
    "filter": {
      "$and": [
        { "field2": { "$gt": 10 } },
        { "field3": { "$lt": 20 } },
        { "field4": { "$eq": "value" } }
      ]
    },
    "projection": {
      "_id": 0,
      "field": 1,
      "field2": 1,
      "field3": 1
    },
    "sort": {
      "$vector": [0.25, 0.25, 0.25, 0.25, 0.25]
    },
    "options": {
      "includeSimilarity": true
    }
  }
}

You can define the preceding JSON object in Java as follows:

collection.findOne(
  Filters.and(
   Filters.gt("field2", 10),
   Filters.lt("field3", 20),
   Filters.eq("field4", "value")
  ),
  new FindOneOptions()
   .projection(Projections.include("field", "field2", "field3"))
   .projection(Projections.exclude("_id"))
   .vector(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f})
   .includeSimilarity()
);

// The same query using static imports
collection.findOne(
  and(
   gt("field2", 10),
   lt("field3", 20),
   eq("field4", "value")
  ),
  vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
   .projection(Projections.include("field", "field2", "field3"))
   .projection(Projections.exclude("_id"))
   .includeSimilarity()
);

Parameters:

Name Type Summary

filter

Filter

Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

options (optional)

FindOneOptions

Set the different options for the findOne operation, including the following:

  • sort(): Perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Example values for sort operations.

  • projection(): A list of flags that select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Example values for projection operations.

  • includeSimilarity(): If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. This is only valid for vector ANN search with $vector or $vectorize.

Returns:

Optional<T> - The document matching the filter, or Optional.empty() if no document is found.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindOneOptions;

import java.util.Optional;

import static com.datastax.astra.client.model.Filters.and;
import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Filters.gt;
import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Projections.exclude;
import static com.datastax.astra.client.model.Projections.include;

public class FindOne {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Complete FindOne
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));
        FindOneOptions options = new FindOneOptions()
                .projection(include("field", "field2", "field3"))
                .projection(exclude("_id"))
                .sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
                .includeSimilarity();
        Optional<Document> result = collection.findOne(filter, options);

        // The same query using static imports
        collection.findOne(and(
                gt("field2", 10),
                lt("field3", 20),
                eq("field4", "value")),
               new FindOneOptions().sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
                .projection(include("field", "field2", "field3"))
                .projection(exclude("_id"))
                .includeSimilarity()
        );

        // Find one document with a vectorize search
        collection.findOne(and(
                        gt("field2", 10),
                        lt("field3", 20),
                        eq("field4", "value")),
                new FindOneOptions().sort("Life is too short to be living somebody else's dream.")
                        .projection(include("field", "field2", "field3"))
                        .projection(exclude("_id"))
                        .includeSimilarity()
        );

        collection.insertOne(new Document()
                .append("field", "value")
                .append("field2", 15)
                .append("field3", 15)
                .vectorize("Life is too short to be living somebody else's dream."));

    }
}

Use the findOne command to retrieve a document.

Retrieve a single document from a collection by its _id:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findOne": {
    "filter": { "_id": "018e65c9-df45-7913-89f8-175f28bd7f74" }
  }
}' | jq

Retrieve a single document from a collection by any property, as long as the property is covered by the collection’s indexing configuration:

"findOne": {
  "filter": { "purchase_date": { "$date": 1690045891 } }
}

Retrieve a single document from a collection by an arbitrary filtering clause:

"findOne": {
  "filter": { "preferred_customer": { "$exists": true } }
}

Retrieve the document that is most similar to a given vector:

"findOne": {
  "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] }
}

Retrieve the most similar document by running a vector search with vectorize:

"findOne": {
  "sort": { "$vectorize": "I'd like some talking shoes" }
}

Use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

"findOne": {
  "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
  "projection": { "$vector": 1 }
}
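The request fragments above can also be composed programmatically. The following sketch assembles a findOne body in Python and applies two constraints described in this reference: a sort can use $vector or $vectorize but not both, and includeSimilarity applies only to vector searches. The function build_find_one_body and its parameter names are hypothetical, and raising an error on the client side is a design choice for this illustration, not documented API behavior.

```python
import json
from typing import Any, Dict, List, Optional


def build_find_one_body(
    filter_clause: Optional[Dict[str, Any]] = None,
    vector: Optional[List[float]] = None,
    vectorize: Optional[str] = None,
    projection: Optional[Dict[str, Any]] = None,
    include_similarity: bool = False,
) -> str:
    """Return the JSON body for a findOne command (hypothetical helper)."""
    if vector is not None and vectorize is not None:
        raise ValueError("sort can use $vector or $vectorize, but not both")
    if include_similarity and vector is None and vectorize is None:
        raise ValueError("includeSimilarity is only valid for vector search")
    command: Dict[str, Any] = {}
    if filter_clause:
        command["filter"] = filter_clause
    if vector is not None:
        command["sort"] = {"$vector": vector}
    elif vectorize is not None:
        command["sort"] = {"$vectorize": vectorize}
    if projection:
        command["projection"] = projection
    if include_similarity:
        command["options"] = {"includeSimilarity": True}
    return json.dumps({"findOne": command})


body = build_find_one_body(
    vector=[0.15, 0.1, 0.1, 0.35, 0.55],
    projection={"$vector": 1},
)
print(body)
```

The resulting string can be passed as the --data argument of the curl commands shown in this section.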

Parameters:

Name Type Summary

findOne

command

The Data API command to retrieve a document in a collection based on one or more of filter, sort, projection, and options.

filter

object

An object that defines filter criteria using the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

sort

object

Perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Example values for sort operations.

projection

object

Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Example values for projection operations.

options.includeSimilarity

boolean

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. This is only valid for vector ANN search with $vector or $vectorize.

"options": { "includeSimilarity": true }

Returns:

A successful response includes a data object that contains a document object representing the document matching the given query. The returned document fields depend on the findOne parameters, namely the projection and options.

"data": {
  "document": {
    "_id": "14"
  }
}

Example

This request retrieves a document from a collection by its _id with the default projection and options:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findOne": {
    "filter": { "_id": "14" }
  }
}' | jq

The response contains the document’s _id and all regular fields. The default projection excludes $vector and $vectorize.

{
  "data": {
    "document": {
      "_id": "14",
      "amount": 110400,
      "customer": {
        "address": {
          "address_line": "1414 14th Pl",
          "city": "Brooklyn",
          "state": "NY"
        },
        "age": 44,
        "credit_score": 702,
        "name": "Kris S.",
        "phone": "123-456-1144"
      },
      "items": [
        {
          "car": "Tesla Model X",
          "color": "White"
        }
      ],
      "purchase_date": {
        "$date": 1698513091
      },
      "purchase_type": "In Person",
      "seller": {
        "location": "Brooklyn NYC",
        "name": "Jasmine S."
      }
    }
  }
}

Find documents using filtering options

Whereas findOne fetches a single document that matches a query, find fetches multiple documents that match a query.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Find documents matching a property, as long as the property is covered by the collection’s indexing configuration:

doc_iterator = collection.find({"category": "house_appliance"}, limit=10)

Find documents matching a filter operator:

doc_iterator = collection.find({"tag": {"$exists": True}}, limit=10)

Iterate over the documents most similar to a given vector:

doc_iterator = collection.find(
    {},
    sort={"$vector": [0.55, -0.40, 0.08]},
    limit=5,
)

Iterate over similar documents by running a vector search with vectorize:

doc_iterator = collection.find(
    {},
    sort={"$vectorize": "Text to vectorize"},
    limit=5,
)

Use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

result = collection.find({"category": "house_appliance"}, limit=10, projection={"name": True})

Returns:

Cursor - A cursor for iterating over documents. AstraPy cursors are compatible with for loops, and they provide a few additional features. However, for vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.

collection.find returns a cursor that must be iterated over to fetch matching documents.

If you need to materialize a list of all results, you can use list(). However, be aware that the time and memory required for this operation depend on the number of results.

A cursor, while it is consumed, transitions between initialized, running, and exhausted status. exhausted indicates there are no more documents to read.

Example response
Cursor("some_collection", new, retrieved so far: 0)
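To make the cursor lifecycle described above more concrete, the following is a minimal, self-contained sketch of a lazily paged cursor. It is not AstraPy's implementation: SketchCursor, fake_fetch_page, and the page size of 2 are hypothetical names and values chosen for illustration. It only shows how a cursor can move from initialized to running to exhausted while fetching documents one page at a time.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Page = Tuple[List[Dict[str, Any]], Optional[str]]


class SketchCursor:
    """Hypothetical lazy cursor; illustrates the states only, not AstraPy internals."""

    def __init__(self, fetch_page: Callable[[Optional[str]], Page]) -> None:
        self._fetch_page = fetch_page
        self._buffer: List[Dict[str, Any]] = []
        self._next_state: Optional[str] = None
        self.state = "initialized"
        self.retrieved = 0

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        return self

    def __next__(self) -> Dict[str, Any]:
        if self.state == "exhausted":
            raise StopIteration
        if self.state == "initialized":
            # The first page is requested only when iteration starts.
            self._buffer, self._next_state = self._fetch_page(None)
            self.state = "running"
        while not self._buffer:
            if self._next_state is None:
                self.state = "exhausted"
                raise StopIteration
            self._buffer, self._next_state = self._fetch_page(self._next_state)
        self.retrieved += 1
        return self._buffer.pop(0)


# Fake backend (assumption): three documents served in pages of two.
DOCS = [{"_id": i} for i in range(3)]


def fake_fetch_page(page_state: Optional[str]) -> Page:
    start = int(page_state) if page_state else 0
    end = start + 2
    next_state = str(end) if end < len(DOCS) else None
    return DOCS[start:end], next_state


cursor = SketchCursor(fake_fetch_page)
states = [cursor.state]   # before any document is read
first = next(cursor)
states.append(cursor.state)
rest = list(cursor)       # materializes the remaining documents
states.append(cursor.state)
```

As with list() on a real cursor, materializing the remaining documents costs time and memory in proportion to the number of results.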

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Example values for projection operations.

skip

Optional[int]

Specify a number of documents to bypass (skip) before returning documents. The first n documents matching the query are discarded from the results, and the results begin at the skip+1 document. For example, if skip=5, the first 5 documents are discarded, and the results begin at the 6th document.

You can use this parameter only in conjunction with an explicit sort criterion of the ascending/descending type. It is not valid with vector ANN search (with $vector or $vectorize).

limit

Optional[int]

Limit the total number of documents returned. Once limit is reached, or the cursor is exhausted due to lack of matching documents, nothing more is returned.

include_similarity

Optional[bool]

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. Only valid for vector ANN search with $vector or $vectorize.

include_sort_vector

Optional[bool]

If true, the response includes the sortVector. The default is false. This is only relevant if sort includes either $vector or $vectorize and you want the response to include the sort vector. This can be useful for $vectorize because you don’t know the sort vector in advance.

You can’t use include_sort_vector with find_one(). However, you can use include_sort_vector and limit=1 with find().

sort

Optional[Dict[str, Any]]

Use this dictionary parameter to perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Example values for sort operations.

max_time_ms

Optional[int]

A timeout, in milliseconds, for each underlying HTTP request used to fetch documents as you iterate over the cursor. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient

client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.COLLECTION

# Find all documents in the collection
# Not advisable if a very high number of matches is anticipated
for document in collection.find({}):
    print(document)

# Find all documents in the collection with a specific field value
for document in collection.find({"a": 123}):
    print(document)

# Find all documents in the collection matching a compound filter expression
matches = list(collection.find({
    "$and": [
      {"f1": 1},
      {"f2": 2},
    ]
}))

# Same as the preceding example, but using the implicit AND operator
matches = list(collection.find({
    "f1": 1,
    "f2": 2,
}))

# Use the "less than" operator in the filter expression
matches2 = list(collection.find({
    "$and": [
      {"name": "John"},
      {"price": {"$lt": 100}},
    ]
}))

# Run a $vectorize search, get back the query vector along with the documents
results_ite = collection.find(
    {},
    projection={"*": 1},
    limit=3,
    include_sort_vector=True,
    sort={"$vectorize": "Query text"},
)
query = results_ite.get_sort_vector()
for doc in results_ite:
    print(f"{doc['$vectorize']}: {doc['$vector'][:2]}... VS. {query[:2]}...")
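The skip and limit parameters described in the parameter list above can be pictured with an ordinary Python list. The following sketch is purely local and does not call the Data API; sorted_skip_limit is a hypothetical helper that mimics an ascending or descending sort followed by skip and limit.

```python
from typing import Any, Dict, List, Optional


def sorted_skip_limit(
    documents: List[Dict[str, Any]],
    sort_field: str,
    skip: int = 0,
    limit: Optional[int] = None,
    descending: bool = False,
) -> List[Dict[str, Any]]:
    """Sort locally, discard the first `skip` documents, then cap at `limit`."""
    ordered = sorted(documents, key=lambda d: d[sort_field], reverse=descending)
    remaining = ordered[skip:]
    return remaining if limit is None else remaining[:limit]


docs = [{"seq": n} for n in (37, 35, 10, 36, 27, 12, 40)]

# Ascending order is 10, 12, 27, 35, 36, 37, 40.
# With skip=5 the results begin at the 6th document.
page = sorted_skip_limit(docs, "seq", skip=5, limit=2)
print([d["seq"] for d in page])  # prints: [37, 40]
```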

For more information, see the API reference.

Find documents matching a property, as long as the property is covered by the collection’s indexing configuration:

const cursor = collection.find({ category: 'house_appliance' }, { limit: 10 });

Find documents matching a filter operator:

const cursor = collection.find({ tag: { $exists: true } }, { limit: 10 });

Iterate over the documents most similar to a given vector:

const cursor = collection.find({}, { sort: { $vector: [0.55, -0.40, 0.08] }, limit: 5 });

Iterate over similar documents by running a vector search with vectorize:

const cursor = collection.find({}, { sort: { $vectorize: 'Text to vectorize' }, limit: 5 });

Use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

const cursor = collection.find({ category: 'house_appliance' }, { limit: 10, projection: { name: 1 } });

Returns:

FindCursor<FoundDoc<Schema>> - A cursor you can use to iterate over the matching documents. For vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.

collection.find returns a cursor that must be iterated over to fetch matching documents.

If you need to materialize a list of all results, you can use toArray(). However, be aware that the time and memory required for this operation depend on the number of results.

A cursor, while it is consumed, transitions between initialized, running, and exhausted status. exhausted indicates there are no more documents to read.

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the documents to find. For a list of available operators, see Data API operators.

options?

FindOptions

The options for this operation.

Options (FindOptions):

Name Type Summary

projection?

Projection

Specifies which fields to include or exclude in the returned documents. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Example values for projection operations.

When specifying a projection, make sure that you handle the return type carefully. Consider type-casting.

includeSimilarity?

boolean

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. Only valid for vector ANN search with $vector or $vectorize.

includeSortVector?

boolean

If true, the response includes the sortVector. The default is false. This is only relevant if sort includes either $vector or $vectorize and you want the response to include the sort vector. This can be useful for $vectorize because you don’t know the sort vector in advance.

You can’t use includeSortVector with findOne(). However, you can use includeSortVector and limit: 1 with find().

You can also access this through await cursor.getSortVector().

sort?

Sort

Perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Example values for sort operations.

skip?

number

Specify a number of documents to bypass (skip) before returning documents. The first n documents matching the query are discarded from the results, and the results begin at the skip+1 document. For example, if skip: 5, the first 5 documents are discarded, and the results begin at the 6th document.

You can use this parameter only in conjunction with an explicit sort criterion of the ascending/descending type. It is not valid with vector ANN search (with $vector or $vectorize).

limit?

number

Limit the total number of documents returned in the lifetime of the cursor. Once limit is reached, or the cursor is exhausted due to lack of matching documents, nothing more is returned.

maxTimeMS?

number

The maximum time, in milliseconds, that the client waits for each underlying HTTP request to complete as you iterate over the cursor.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([
    { name: 'John', age: 30, $vector: [1, 1, 1, 1, 1] },
    { name: 'Jane', age: 25, },
    { name: 'Dave', age: 40, },
  ]);

  // Gets all 3 in some order
  const unpredictable = await collection.find({}).toArray();
  console.log(unpredictable);

  // Failed find by name ([])
  const matchless = await collection.find({ name: 'Carrie' }).toArray();
  console.log(matchless);

  // Find by $gt age (John, Dave)
  const gtAgeCursor = collection.find({ age: { $gt: 25 } });
  for await (const doc of gtAgeCursor) {
    console.log(doc.name);
  }

  // Find by sorting by age (Jane, John, Dave)
  const sortedAgeCursor = collection.find({}, { sort: { age: 1 } });
  await sortedAgeCursor.forEach(console.log);

  // Find first by vector similarity (John, 1)
  const john = await collection.find({}, { sort: { $vector: [1, 1, 1, 1, 1] }, includeSimilarity: true }).next();
  console.log(john?.name, john?.$similarity);
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
FindIterable<T> find(Filter filter, FindOptions options);
// Helper to build filter and options above ^
FindIterable<T> find(FindOptions options); // no filter
FindIterable<T> find(Filter filter); // default options
FindIterable<T> find(); // default options + no filters
FindIterable<T> find(float[] vector, int limit); // semantic search
FindIterable<T> find(Filter filter, float[] vector, int limit);

For more information, see Find a document and the API reference.

Returns:

FindIterable<T> - A cursor that fetches up to the first 20 documents, and it can be iterated to fetch additional documents as needed. However, for vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.

The FindIterable is an Iterable that you can use in a for loop to iterate over the returned documents.

The FindIterable fetches chunks of documents, and then fetches more as needed. The FindIterable is a lazy iterator, meaning that it only fetches the next chunk of documents when needed.

You can use the .all() method to exhaust it, but use this with caution.

Parameters:

Name Type Summary

filter

Filter

Criteria list to filter documents. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators.

options (optional)

FindOptions

Set the different options for the find operation, including the following:

  • sort(): Perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Example values for sort operations.

  • projection(): A list of flags that select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Example values for projection operations.

  • includeSimilarity(): If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and the document’s vector. This is only valid for vector ANN search with $vector or $vectorize.

  • includeSortVector(): If true, the response includes the sortVector. The default is false. This is only relevant if sort includes either $vector or $vectorize and you want the response to include the sort vector. This can be useful for $vectorize because you don’t know the sort vector in advance.

    You can’t use includeSortVector with findOne(). However, you can use includeSortVector and limit(1) with find().

  • limit(): Limit the total number of documents returned. Once the limit is reached, or the cursor is exhausted due to lack of matching documents, nothing more is returned.

  • skip(): Specify a number of documents to bypass (skip) before returning documents. The first n documents matching the query are discarded from the results, and the results begin at the skip+1 document. For example, with skip(5), the first 5 documents are discarded, and the results begin at the 6th document.

    You can use this parameter only in conjunction with an explicit sort criterion of the ascending/descending type. It is not valid with vector ANN search (with $vector or $vectorize).

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.FindOptions;
import com.datastax.astra.client.model.Sorts;

import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Projections.exclude;
import static com.datastax.astra.client.model.Projections.include;

public class Find {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Find Options
        FindOptions options = new FindOptions()
                .projection(include("field", "field2", "field3")) // select fields
                .projection(exclude("_id")) // exclude some fields
                .sort(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f}) // similarity vector
                .skip(1) // skip first item
                .limit(10) // stop after 10 items (max records)
                .pageState("pageState") // used for pagination
                .includeSimilarity(); // include similarity

        // Execute a find operation
        FindIterable<Document> result = collection.find(filter, options);

        // Iterate over the result
        for (Document document : result) {
            System.out.println(document);
        }
    }
}

Use the find command to retrieve multiple documents matching a query.

Retrieve documents by any property, as long as the property is covered by the collection’s indexing configuration:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": { "purchase_date": { "$date": 1690045891 } }
  }
}' | jq

Retrieve documents matching a filter operator:

"find": {
  "filter": { "preferred_customer": { "$exists": true } }
}

More filter operator examples

Match values that are equal to the filter value:

"find": {
  "filter": {
    "customer": {
      "$eq": {
        "name": "Jasmine S.",
        "city": "Jersey City"
      }
    }
  }
}

Match values that are not the filter value:

"find": {
  "filter": {
    "$not": {
      "customer.address.state": "NJ"
    }
  }
}

You can use similar negation operators, such as $nin and $ne.

Match any of the specified values in an array:

"find": {
  "filter": {
    "customer.address.city": {
      "$in": [ "Jersey City", "Orange" ]
    }
  }
}

Match all of the specified values in an array:

"find": {
  "filter": {
    "items": {
      "$all": [
        {
          "car": "Sedan",
          "color": "White"
        },
        "Extended warranty"
      ]
    }
  }
}

Compound and/or operators:

"find": {
  "filter": {
    "$and": [
      {
        "$or": [
          { "customer.address.city": "Jersey City" },
          { "customer.address.city": "Orange" }
        ]
      },
      {
        "$or": [
          { "seller.name": "Jim A." },
          { "seller.name": "Tammy S." }
        ]
      }
    ]
  }
}

Compound range operators:

"find": {
  "filter": {
    "$and": [
      { "customer.credit_score": { "$gte": 700 } },
      { "customer.credit_score": { "$lt": 800 } }
    ]
  }
}
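To reason about how these filter expressions combine, it can help to evaluate them locally against a sample document. The following is a small illustrative matcher, not the Data API's implementation: matches and get_path are hypothetical helpers, and only a handful of operators ($and, $or, $not, $exists, $eq, $in, $gte, $lt, and implicit equality) are covered, with simplified semantics.

```python
from typing import Any, Dict

_MISSING = object()


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as "customer.address.city"."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def _match_field(value: Any, cond: Any) -> bool:
    # A condition without operator keys means implicit equality.
    if not (isinstance(cond, dict) and any(k.startswith("$") for k in cond)):
        return value is not _MISSING and value == cond
    for op, arg in cond.items():
        if op == "$exists":
            ok = (value is not _MISSING) == arg
        elif value is _MISSING:
            ok = False
        elif op == "$eq":
            ok = value == arg
        elif op == "$in":
            ok = value in arg
        elif op == "$gte":
            ok = value >= arg
        elif op == "$lt":
            ok = value < arg
        else:
            raise ValueError(f"operator not covered by this sketch: {op}")
        if not ok:
            return False
    return True


def matches(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if doc satisfies flt; top-level keys are ANDed together."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(doc, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(doc, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(doc, cond)
        else:
            ok = _match_field(get_path(doc, key), cond)
        if not ok:
            return False
    return True


doc = {
    "customer": {"address": {"city": "Orange", "state": "NJ"}, "credit_score": 710},
    "seller": {"name": "Tammy S."},
}
compound = {"$and": [
    {"$or": [{"customer.address.city": "Jersey City"}, {"customer.address.city": "Orange"}]},
    {"$or": [{"seller.name": "Jim A."}, {"seller.name": "Tammy S."}]},
]}
score_range = {"$and": [
    {"customer.credit_score": {"$gte": 700}},
    {"customer.credit_score": {"$lt": 800}},
]}
print(matches(doc, compound), matches(doc, score_range))  # prints: True True
```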

Retrieve documents that are most similar to a given vector:

"find": {
  "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
  "options": {
    "limit": 100
  }
}

Retrieve similar documents by running a vector search with vectorize:

"find": {
  "sort": { "$vectorize": "I'd like some talking shoes" },
  "options": {
    "limit": 100
  }
}

Use a projection to specify the fields returned from each document. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

"find": {
  "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
  "projection": { "$vector": 1 },
  "options": {
    "includeSimilarity": true,
    "limit": 100
  }
}

Parameters:

Name Type Summary

find

command

The Data API command to retrieve multiple documents in a collection based on one or more of filter, sort, projection, and options.

filter

object

An object that defines filter criteria using the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators.

sort

object

Perform a vector similarity search or set the order in which documents are returned. For similarity searches, this parameter can use either $vector or $vectorize, but not both in the same request. For more information and examples, see Example values for sort operations.

projection

object

Select a subset of fields to include in the response for each returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Example values for projection operations.

options.includeSimilarity

boolean

If true, the response includes a $similarity key with the numeric similarity score that represents the closeness of the sort vector and each document’s vector. This is only valid for vector ANN search with $vector or $vectorize.

"options": { "includeSimilarity": true }

options.includeSortVector

boolean

If true, the response includes the sortVector. The default is false. This is only relevant if sort includes either $vector or $vectorize and you want the response to include the sort vector. This can be useful for $vectorize because you don’t know the sort vector in advance.

"options": { "includeSortVector": true }

You can’t use includeSortVector with findOne. However, you can use includeSortVector and limit: 1 with find.

options.skip

integer

Specify a number of documents to bypass (skip) before returning documents. The first n documents matching the query are discarded from the results, and the results begin at the skip+1 document. For example, if "skip": 5, the first 5 documents are discarded, and the results begin at the 6th document.

You can use this parameter only in conjunction with an explicit sort criterion of the ascending/descending type. It is not valid with vector ANN search (with $vector or $vectorize).

options.limit

integer

Limit the total number of documents returned. Pagination can occur if more than 20 documents are returned in the current set of matching documents. Once the limit is reached, either in a single response or the last page of a paginated response, nothing more is returned.
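The constraints in the parameter descriptions above can be checked before a request is sent. The following sketch assembles a find command body and rejects the combinations that the descriptions rule out. build_find_command is a hypothetical helper written for this illustration and is not part of any client library. Placing skip and limit inside the options object is an assumption based on the request examples on this page.

```python
import json
from typing import Any, Dict, Optional


def build_find_command(
    filter_: Optional[Dict[str, Any]] = None,
    sort: Optional[Dict[str, Any]] = None,
    projection: Optional[Dict[str, Any]] = None,
    skip: Optional[int] = None,
    limit: Optional[int] = None,
    include_similarity: bool = False,
    include_sort_vector: bool = False,
) -> Dict[str, Any]:
    """Build a {"find": {...}} body, enforcing the documented restrictions."""
    sort = sort or {}
    if "$vector" in sort and "$vectorize" in sort:
        raise ValueError("sort can use $vector or $vectorize, but not both")
    is_vector_search = "$vector" in sort or "$vectorize" in sort
    if include_similarity and not is_vector_search:
        raise ValueError("includeSimilarity is only valid for vector search")
    if skip is not None and (is_vector_search or not sort):
        raise ValueError("skip requires an explicit ascending/descending sort")

    find: Dict[str, Any] = {}
    if filter_:
        find["filter"] = filter_
    if sort:
        find["sort"] = sort
    if projection:
        find["projection"] = projection

    options: Dict[str, Any] = {}
    if skip is not None:
        options["skip"] = skip  # assumed to live under options
    if limit is not None:
        options["limit"] = limit
    if include_similarity:
        options["includeSimilarity"] = True
    if include_sort_vector:
        options["includeSortVector"] = True
    if options:
        find["options"] = options
    return {"find": find}


body = build_find_command(
    sort={"$vector": [0.15, 0.1, 0.1, 0.35, 0.55]},
    projection={"$vector": 1},
    limit=100,
    include_similarity=True,
)
payload = json.dumps(body)  # string suitable for the --data argument of a request
```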

Returns:

A successful response can include a data object and a status object:

  • The data object contains documents, which is an array of objects. Each object represents a document matching the given query. The returned fields in each document object depend on the find parameters, namely the projection and options.

    For vector ANN search (with $vector or $vectorize), the response is a single page of up to 1000 documents, unless you set a lower limit.

    For non-vector searches, pagination occurs if there are more than 20 matching documents, as indicated by the nextPageState key. If there are no more documents, nextPageState is null or omitted. If there are more documents, nextPageState contains an ID.

    {
      "data": {
        "documents": [
          {
            "_id": { "$uuid": "018e65c9-df45-7913-89f8-175f28bd7f74" }
          },
          {
            "_id": { "$uuid": "018e65c9-e33d-749b-9386-e848739582f0" }
          }
        ],
        "nextPageState": null
      }
    }

    In the event of pagination, you must issue a subsequent request with a pageState ID to fetch the next page of documents that matched the filter. As long as there is a subsequent page with matching documents, the response includes a nextPageState ID, which you use as the pageState for the subsequent request. Each paginated request is exactly the same as the original request, except for the addition of the pageState in the options object:

    {
      "find": {
        "filter": { "active_user": true },
        "options": { "pageState": "NEXT_PAGE_STATE_FROM_PRIOR_RESPONSE" }
      }
    }

    Continue issuing requests with the subsequent pageState ID until you have fetched all matching documents.

  • The status object contains the sortVector value if you set includeSortVector to true in the request:

    "status": { "sortVector": [0.4, 0.1, ...] }

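The pagination steps described above can be sketched as a loop that keeps requesting pages until nextPageState is null or omitted. This is a minimal sketch under stated assumptions: fetch_all_documents and the send_command callable are hypothetical. In practice, send_command would POST the JSON body to the collection endpoint; here it is replaced by an in-memory stand-in whose page states are simple offsets, whereas real page state IDs should be treated as opaque.

```python
import copy
from typing import Any, Callable, Dict, List

Command = Dict[str, Any]


def fetch_all_documents(
    send_command: Callable[[Command], Dict[str, Any]],
    filter_: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Follow nextPageState until every matching document has been fetched."""
    documents: List[Dict[str, Any]] = []
    command: Command = {"find": {"filter": filter_}}
    while True:
        response = send_command(copy.deepcopy(command))
        data = response.get("data", {})
        documents.extend(data.get("documents", []))
        next_page_state = data.get("nextPageState")
        if not next_page_state:
            return documents
        # Same request as before, plus the pageState from the prior response.
        command["find"]["options"] = {"pageState": next_page_state}


# In-memory stand-in for the HTTP call (assumption): 45 documents, 20 per page.
ALL_DOCS = [{"_id": str(i), "active_user": True} for i in range(45)]
PAGE_SIZE = 20


def fake_send_command(command: Command) -> Dict[str, Any]:
    options = command["find"].get("options", {})
    start = int(options.get("pageState", "0"))
    end = start + PAGE_SIZE
    return {
        "data": {
            "documents": ALL_DOCS[start:end],
            "nextPageState": str(end) if end < len(ALL_DOCS) else None,
        }
    }


docs = fetch_all_documents(fake_send_command, {"active_user": True})
print(len(docs))  # prints: 45
```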
Examples:

Example of simple property filter

This example uses a simple filter based on two document properties:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": {
      "customer.address.city": "Hoboken",
      "customer.address.state": "NJ"
    }
  }
}' | jq

The response returned one matching document:

{
  "data": {
    "documents": [
      {
        "$vector": [
          0.1,
          0.15,
          0.3,
          0.12,
          0.09
        ],
        "_id": "17",
        "amount": 54900,
        "customer": {
          "address": {
            "address_line": "1234 Main St",
            "city": "Hoboken",
            "state": "NJ"
          },
          "age": 61,
          "credit_score": 694,
          "name": "Yolanda Z.",
          "phone": "123-456-1177"
        },
        "items": [
          {
            "car": "Tesla Model 3",
            "color": "Blue"
          },
          "Extended warranty - 5 years"
        ],
        "purchase_date": {
          "$date": 1702660291
        },
        "purchase_type": "Online",
        "seller": {
          "location": "Jersey City NJ",
          "name": "Jim A."
        },
        "status": "active"
      }
    ],
    "nextPageState": null
  }
}

Example of logical operators in a filter

This example uses the $and and $or logical operators to retrieve documents matching one condition from each $or clause. In this case, the customer.address.city must be either Jersey City or Orange, and the seller.name must be either Jim A. or Tammy S.

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "filter": {
      "$and": [
        {
          "$or": [
            { "customer.address.city": "Jersey City" },
            { "customer.address.city": "Orange" }
          ]
        },
        {
          "$or": [
            { "seller.name": "Jim A." },
            { "seller.name": "Tammy S." }
          ]
        }
      ]
    }
  }
}' | jq

The response returned two matching documents:

{
  "data": {
    "documents": [
      {
        "$vector": [
          0.3,
          0.23,
          0.15,
          0.17,
          0.4
        ],
        "_id": "8",
        "amount": 46900,
        "customer": {
          "address": {
            "address_line": "1234 Main St",
            "city": "Orange",
            "state": "NJ"
          },
          "age": 29,
          "credit_score": 710,
          "name": "Harold S.",
          "phone": "123-456-8888"
        },
        "items": [
          {
            "car": "BMW X3 SUV",
            "color": "Black"
          },
          "Extended warranty - 5 years"
        ],
        "purchase_date": {
          "$date": 1693329091
        },
        "purchase_type": "In Person",
        "seller": {
          "location": "Staten Island NYC",
          "name": "Tammy S."
        },
        "status": "active"
      },
      {
        "$vector": [
          0.25,
          0.045,
          0.38,
          0.31,
          0.67
        ],
        "_id": "5",
        "amount": 94990,
        "customer": {
          "address": {
            "address_line": "32345 Main Ave",
            "city": "Jersey City",
            "state": "NJ"
          },
          "age": 50,
          "credit_score": 800,
          "name": "David C.",
          "phone": "123-456-5555"
        },
        "items": [
          {
            "car": "Tesla Model S",
            "color": "Red"
          },
          "Extended warranty - 5 years"
        ],
        "purchase_date": {
          "$date": 1690996291
        },
        "purchase_type": "Online",
        "seller": {
          "location": "Jersey City NJ",
          "name": "Jim A."
        },
        "status": "active"
      }
    ],
    "nextPageState": null
  }
}

Example values for sort operations

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

Data API commands, such as find, findOne, deleteOne, and updateOne, can use sort clauses to order results by similarity to a given vector, or by the ascending or descending values of a given field.

Additionally, you can use a projection to include specific document properties in the response. A projection is required if you want to return certain reserved fields, like $vector and $vectorize, that are excluded by default.

For more specific information, examples, and parameters for operations that support sorting, see the explanations of the find, update, replace, and delete operations elsewhere on this page.

  • Python

  • TypeScript

  • Java

  • curl

  • You can’t use the $vector and $vectorize sort clauses together.

  • Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API:

    • Vector ANN searches can’t return more than 1000 documents per search operation.

    • When using an ascending or descending sort criterion, the Data API returns at most 20 documents and then stops. The returned documents are the top results across the whole collection based on the requested criterion.

      These provisions can also apply when running subsequent commands on cursors, such as .distinct().

  • When you don’t specify sorting criteria (by vector or otherwise), the cursor can scroll through an arbitrary number of documents because the Data API and the client periodically exchange new chunks of documents.

    If documents are added or removed after starting a find operation, the cursor behavior depends on database internals. There is no guarantee as to whether or not the cursor will pick up such "real-time" changes in the data.

When no particular order is required:

sort={}  # (default when parameter not provided)

When sorting by a certain value in ascending/descending order:

from astrapy.constants import SortDocuments
sort={"field": SortDocuments.ASCENDING}
sort={"field": SortDocuments.DESCENDING}

When sorting first by "field" and then by "subfield" (modern Python versions preserve dictionary insertion order, but for clarity, consider using a collections.OrderedDict in these cases):

sort={
    "field": SortDocuments.ASCENDING,
    "subfield": SortDocuments.ASCENDING,
}
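As a sketch of the OrderedDict suggestion above, the sort specification can be built so that the key order is explicit. The apply_sort helper is hypothetical and only mimics, on an in-memory list, what a multi-field sort means. The literal values 1 and -1 stand in for ascending and descending and are an assumption here; with AstraPy, use the SortDocuments constants instead.

```python
from collections import OrderedDict
from typing import Any, Dict, List, Mapping

ASCENDING, DESCENDING = 1, -1  # stand-ins for the SortDocuments constants (assumption)

# Key order is explicit: sort by "field" first, then by "subfield".
sort_spec = OrderedDict([("field", ASCENDING), ("subfield", DESCENDING)])


def apply_sort(docs: List[Dict[str, Any]], sort: Mapping[str, int]) -> List[Dict[str, Any]]:
    """Sort locally by each key in order; earlier keys take precedence."""
    result = list(docs)
    # Stable sorts applied from the last key to the first give the combined ordering.
    for key, direction in reversed(list(sort.items())):
        result.sort(key=lambda d: d[key], reverse=(direction == DESCENDING))
    return result


docs = [
    {"field": 2, "subfield": 1},
    {"field": 1, "subfield": 1},
    {"field": 1, "subfield": 3},
]
ordered = apply_sort(docs, sort_spec)
print(ordered)
# prints: [{'field': 1, 'subfield': 3}, {'field': 1, 'subfield': 1}, {'field': 2, 'subfield': 1}]
```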

When running a vector similarity (ANN) search based on a query vector, and then sorting by similarity:

sort={"$vector": [0.4, 0.15, -0.5]}

When running a vector similarity (ANN) search by generating a vector from text, and then sorting by similarity:

sort={"$vectorize": "Text to vectorize"}

Sort example

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

filter = {"seq": {"$exists": True}}
for doc in collection.find(filter, projection={"seq": True}, limit=5):
    print(doc["seq"])
...
# will print e.g.:
#   37
#   35
#   10
#   36
#   27
cursor1 = collection.find(
    {},
    limit=4,
    sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
print([doc["_id"] for doc in cursor1])
# prints: ['97e85f81-...', '1581efe4-...', '...', '...']
cursor2 = collection.find({}, limit=3)
print(cursor2.distinct("seq"))
# prints: [37, 35, 10]
collection.insert_many([
    {"tag": "A", "$vector": [4, 5]},
    {"tag": "B", "$vector": [3, 4]},
    {"tag": "C", "$vector": [3, 2]},
    {"tag": "D", "$vector": [4, 1]},
    {"tag": "E", "$vector": [2, 5]},
])
ann_tags = [
    document["tag"]
    for document in collection.find(
        {},
        sort={"$vector": [3, 3]},
        limit=3,
    )
]
ann_tags
# prints: ['A', 'B', 'C']
# (assuming the collection has metric VectorMetric.COSINE)
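
The ordering in the vector search example above can be checked by hand. The following sketch is an exact, brute-force cosine ranking in plain Python, used here as a stand-in for the approximate (ANN) search that the Data API performs. It does not call the Data API, and `cosine_similarity` is a hypothetical helper written for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


documents = [
    {"tag": "A", "$vector": [4, 5]},
    {"tag": "B", "$vector": [3, 4]},
    {"tag": "C", "$vector": [3, 2]},
    {"tag": "D", "$vector": [4, 1]},
    {"tag": "E", "$vector": [2, 5]},
]
query = [3, 3]

# Rank every document by similarity to the query, most similar first.
ranked = sorted(
    documents,
    key=lambda doc: cosine_similarity(doc["$vector"], query),
    reverse=True,
)
top_tags = [doc["tag"] for doc in ranked[:3]]
print(top_tags)  # ['A', 'B', 'C']
```

With a different similarity metric, the ranking can change, which is why the example states the cosine assumption.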
  • You can’t use the $vector and $vectorize sort clauses together.

  • Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API:

    • Vector ANN searches can’t return more than 1000 documents per search operation.

    • When using an ascending or descending sort criterion, the Data API returns a smaller number of documents (20) and then stops. The returned documents are the top results across the whole collection based on the requested criterion.

      These provisions can also apply when running subsequent commands on cursors, such as .distinct().

  • When you don’t specify sorting criteria (by vector or otherwise), the cursor can scroll through an arbitrary number of documents because the Data API and the client periodically exchange new chunks of documents.

    If documents are added or removed after starting a find operation, the cursor behavior depends on database internals. There is no guarantee as to whether or not the cursor will pick up such "real-time" changes in the data.

By default, Sort is only weakly typed. For a more strongly typed alternative that also provides full autocomplete, see StrictSort<Schema>.

When no particular order is required:

{ sort: {} }  // (default when parameter not provided)

When sorting by a certain value in ascending/descending order:

{ sort: { field: +1 } }  // ascending
{ sort: { field: -1 } }  // descending

When sorting first by "field" and then by "subfield" (order matters! ES2015+ guarantees string keys in order of insertion):

{ sort: { field: 1, subfield: 1 } }

Run a vector similarity (ANN) search based on a query vector:

{ sort: { $vector: [0.4, 0.15, -0.5] } }

Generate a vector to perform a vector similarity search. The collection must be associated with an embedding service.

{ sort: { $vectorize: "Text to vectorize" } }

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([
    { name: 'Jane', age: 25, $vector: [1.0, 1.0, 1.0, 1.0, 1.0] },
    { name: 'Dave', age: 40, $vector: [0.4, 0.5, 0.6, 0.7, 0.8] },
    { name: 'Jack', age: 40, $vector: [0.1, 0.9, 0.0, 0.5, 0.7] },
  ]);

  // Sort by age ascending, then by name descending (Jane, Jack, Dave)
  const sorted1 = await collection.find({}, { sort: { age: 1, name: -1 } }).toArray();
  console.log(sorted1.map(d => d.name));

  // Sort by vector distance (Jane, Dave, Jack)
  const sorted2 = await collection.find({}, { sort: { $vector: [1, 1, 1, 1, 1] } }).toArray();
  console.log(sorted2.map(d => d.name));
})();
  • You can’t use the $vector and $vectorize sort clauses together.

  • Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API:

    • Vector ANN searches can’t return more than 1000 documents per search operation.

    • When using an ascending or descending sort criterion, the Data API returns a smaller number of documents (20) and then stops. The returned documents are the top results across the whole collection based on the requested criterion.

      These provisions can also apply when running subsequent commands on cursors, such as .distinct().

  • When you don’t specify sorting criteria (by vector or otherwise), the cursor can scroll through an arbitrary number of documents because the Data API and the client periodically exchange new chunks of documents.

    If documents are added or removed after starting a find operation, the cursor behavior depends on database internals. There is no guarantee as to whether or not the cursor will pick up such "real-time" changes in the data.

The sort() operations are optional. Use them only when needed.

Be aware of the order when chaining multiple sorts:

Sort s1 = Sorts.ascending("field1");
Sort s2 = Sorts.descending("field2");
FindOptions.Builder.sort(s1, s2);

You can use sort to run a vector similarity (ANN) search:

FindOptions.Builder
 .sort(new float[] {0.4f, 0.15f, -0.5f});

For collections that use vectorize, you can run a similarity search based on a vector generated from a text query:

FindOptions.Builder
 .sort("Text to vectorize");

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOptions;
import com.datastax.astra.client.model.Sort;
import com.datastax.astra.client.model.Sorts;

import static com.datastax.astra.client.model.Filters.lt;

public class WorkingWithSorts {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Sort Clause for a vector
        Sorts.vector(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f});

        // Sort Clause for other fields
        Sort s1 = Sorts.ascending("field1");
        Sort s2 = Sorts.descending("field2");

        // Build the sort clause
        new FindOptions().sort(s1, s2);

        // Adding vector
        new FindOptions().sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f}, s1, s2);

    }
}
  • You can’t use the $vector and $vectorize sort clauses together.

  • Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API:

    • Vector ANN searches can return no more than 1000 documents per search operation, regardless of the limit parameter.

    • If sort is ascending, descending, or unspecified, the Data API returns up to 20 documents, and then stops. The returned documents are the top results across the whole collection based on the filter criteria.

  • The search type and upper limit impact the response:

    • Vector search returns a single page of up to 1000 documents, unless you set a lower limit.

    • Searches without $vector or $vectorize return matching documents in batches of 20. Pagination occurs if there are more than 20 matching documents. For information about handling pagination, see Find documents using filtering options.

  • If documents are added or removed after starting a find operation, paging behavior depends on database internals. There is no guarantee as to whether or not pagination will pick up such "real-time" changes in the data.

When you run a find command, you can append nested JSON objects that define the search criteria (sort or filter), projection, and other options.

This example finds documents by performing a vector similarity search:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
    "projection": { "$vector": 1 },
    "options": {
      "includeSimilarity": true,
      "includeSortVector": false,
      "limit": 100
    }
  }
}' | jq

This request does the following:

  • sort compares the given vector, [0.15, 0.1, 0.1, 0.35, 0.55], against the vectors for documents in the collection, and then returns results ranked by similarity. The $vector key is a reserved property name for storing vector data.

  • projection requests that the response return the $vector for each document.

  • options.includeSimilarity requests that the response include the $similarity key with the numeric similarity score, which represents the closeness of the sort vector and the document’s vector.

  • options.includeSortVector is set to false to exclude the sortVector from the response. This option applies only when sort includes either $vector or $vectorize. Setting it to true is particularly useful with $vectorize because you don’t know the sort vector in advance.

  • options.limit specifies the maximum number of documents to return. This example limits the entire list of matching documents to 100 documents or fewer.

    Vector search returns a single page of up to 1000 documents, unless you set a lower limit. Other searches (without $vector or $vectorize) return matching documents in batches of 20. Pagination occurs if there are more than 20 matching documents. For information about handling pagination, see Find documents using filtering options.

The projection and options settings can make the response more focused and potentially reduce the amount of data transferred.

Response
{
  "data": {
    "documents": [
      {
        "$similarity": 1,
        "$vector": [
          0.15,
          0.1,
          0.1,
          0.35,
          0.55
        ],
        "_id": "3"
      },
      {
        "$similarity": 0.9953563,
        "$vector": [
          0.15,
          0.17,
          0.15,
          0.43,
          0.55
        ],
        "_id": "18"
      },
      {
        "$similarity": 0.9732053,
        "$vector": [
          0.21,
          0.22,
          0.33,
          0.44,
          0.53
        ],
        "_id": "21"
      }
    ],
    "nextPageState": null
  }
}
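
The nextPageState field in the response above is null because the result fits in a single page. For searches that span several pages, a client typically loops until nextPageState is null. The following is a minimal sketch of such a loop, not the official client logic. It assumes that the page state is sent back in options.pageState, and it takes a hypothetical `post_command` callable in place of a real HTTP request; `fake_post` is an in-memory stand-in for the Data API used only to make the sketch runnable.

```python
def find_all(post_command, find_body):
    """Collect documents from every page of a `find` command.

    `post_command` is a hypothetical callable that sends one JSON payload to
    the collection endpoint and returns the parsed JSON response.
    """
    documents = []
    page_state = None
    while True:
        body = dict(find_body)
        options = dict(body.get("options", {}))
        if page_state is not None:
            # Assumption: the previous nextPageState is passed as options.pageState.
            options["pageState"] = page_state
        if options:
            body["options"] = options
        data = post_command({"find": body})["data"]
        documents.extend(data["documents"])
        page_state = data.get("nextPageState")
        if page_state is None:
            return documents


def fake_post(payload):
    """In-memory stand-in for the Data API: serves 45 documents in pages of 20."""
    all_docs = [{"_id": str(i)} for i in range(45)]
    start = int(payload["find"].get("options", {}).get("pageState", 0))
    end = start + 20
    next_state = str(end) if end < len(all_docs) else None
    return {"data": {"documents": all_docs[start:end], "nextPageState": next_state}}


results = find_all(fake_post, {"filter": {}})
print(len(results))  # 45
```

In a real application, `post_command` would wrap an authenticated POST request to the collection endpoint, as in the curl example.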

Example values for projection operations

Certain document operations, such as findOne, findMany, findOneAndUpdate, findOneAndReplace, and findOneAndDelete, support a projection option that specifies which part of a document to return. Typically, the projection specifies which fields to include or exclude.

If no projection, or an empty projection, is specified, the Data API applies a default projection. This default projection includes, at minimum, the document identifier (_id) and all regular fields, which are fields not prefixed by a dollar sign ($).

If you specify a projection, all special fields, such as _id, $vector, and $vectorize, have specific inclusion and exclusion defaults that you can override individually. However, for regular fields, the projection must either include or exclude those fields. The projection can’t include a mix of included and excluded regular fields.

If a projection includes fields that don’t exist in a returned document, then those fields are ignored for that document.

To optimize the response size and improve performance, DataStax recommends that you always provide an explicit projection, tailored to the needs of the application, when reading documents.

If an application relies on the presence of $vector, or other special fields, in the returned documents, make sure the projection explicitly includes that field.

A quick, but possibly suboptimal, way to ensure the presence of special fields is to use the wildcard projection { "*": true }.

Projection syntax

A projection is expressed as a mapping of field names to boolean values.

Use a true mapping to include only the specified fields. For example, the following true mapping returns the document ID, field1, and field2:

{ "_id": true, "field1": true, "field2": true }

Alternatively, use a false mapping to exclude the specified fields. All other non-excluded fields are returned.

{ "field1": false, "field2": false }

The values in a projection map can be objects, booleans, decimals, or integers, but the Data API ultimately evaluates all of these as booleans.

For example, the following projection evaluates to true (include) for all four fields:

{ "field1": true, "field2": 1, "field3": 90.0, "field4": { "keep": "yes!" } }

Whereas this projection evaluates to false (exclude) for all four fields:

{ "field1": false, "field2": 0, "field3": 0.0, "field4": {} }

Passing null-like types (such as {}, null or 0) for the whole projection mapping is equivalent to omitting projection.
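
As a rough illustration of the boolean evaluation described above, the following sketch normalizes the two example projections with Python's built-in truthiness rules. Python's `bool()` agrees with the documented examples; whether it matches the Data API for every possible value (for example, `$slice` objects) is an assumption, and `normalize_projection` is a hypothetical helper, not part of any client.

```python
def normalize_projection(projection):
    """Map every projection value to the boolean it is assumed to evaluate to."""
    return {field: bool(value) for field, value in projection.items()}


included = normalize_projection(
    {"field1": True, "field2": 1, "field3": 90.0, "field4": {"keep": "yes!"}}
)
excluded = normalize_projection(
    {"field1": False, "field2": 0, "field3": 0.0, "field4": {}}
)

print(included)  # every value is True
print(excluded)  # every value is False
```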

Projecting regular and special fields

For regular fields, a projection can’t mix include and exclude projections. It can contain only true or only false values for regular fields. For example, {"field1": true, "field2": false} is an invalid projection that results in an API error.

However, the special fields _id, $vector, and $vectorize have individual default inclusion and exclusion rules, regardless of the projection mapping. Unlike regular fields, you can set the projection values for special fields independently of regular fields:

  • The _id field is included by default. You can opt to exclude it in a true mapping, such as { "_id": false, "field1": true }.

  • The $vector and $vectorize fields are excluded by default. You can opt to include these in a false mapping, such as { "field1": false, "$vector": true }.

  • The $similarity key isn’t a document field, and you can’t use this key in a projection. The $similarity value is the result of a vector ANN search operation with $vector or $vectorize. Use the includeSimilarity parameter to control the presence of $similarity in the response.

Therefore, the following are all valid projections for regular and special fields:

{ "_id": true, "field1": true, "field2": true }
{ "_id": false, "field1": true, "field2": true }
{ "_id": false, "field1": false, "field2": false }
{ "_id": true, "field1": false, "field2": false }
{ "_id": true, "field1": true, "field2": true, "$vector": true }
{ "_id": true, "field1": true, "field2": true, "$vector": false }
{ "_id": false, "field1": true, "field2": true, "$vector": true }
{ "_id": false, "field1": true, "field2": true, "$vector": false }
{ "_id": false, "field1": false, "field2": false, "$vector": true }
{ "_id": false, "field1": false, "field2": false, "$vector": false }
{ "_id": true, "field1": false, "field2": false, "$vector": true }
{ "_id": true, "field1": false, "field2": false, "$vector": false }

The wildcard projection "*" represents the whole of the document. If you use this projection, it must be the only key in the projection.

If set to true ({ "*": true }), all fields are returned.

If set to false ({ "*": false }), no fields are returned, and each document is empty ({}).
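
The rules in this section can be summarized as a small client-side check. The following sketch is an assumption-laden approximation of those rules, not the Data API's actual validation logic. `validate_projection` and `SPECIAL_FIELDS` are hypothetical names introduced for this illustration.

```python
SPECIAL_FIELDS = {"_id", "$vector", "$vectorize"}


def validate_projection(projection):
    """Return None if the projection follows the rules above, else an error message."""
    if "$similarity" in projection:
        return "$similarity isn't a document field and can't be projected"
    if "*" in projection:
        if len(projection) > 1:
            return "the wildcard '*' must be the only key"
        return None
    # Special fields are exempt from the include/exclude consistency rule.
    regular_flags = {
        bool(value)
        for field, value in projection.items()
        if field not in SPECIAL_FIELDS
    }
    if len(regular_flags) > 1:
        return "regular fields can't mix inclusion and exclusion"
    return None


print(validate_projection({"_id": False, "field1": True, "$vector": True}))  # None
print(validate_projection({"field1": True, "field2": False}))  # error message
```

A check like this can catch an invalid projection before the request is sent, but the API response remains the authority on what is accepted.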

Projecting arrays and nested objects

For array fields, you can use a $slice to specify which elements of the array to return. Use one of the following formats:

// Return the first two elements
{ "arr": { "$slice": 2 } }

// Return the last two elements
{ "arr": { "$slice": -2 } }

// Skip 4 elements (from 0th index), return the next 2
{ "arr": { "$slice": [4, 2] } }

// Skip backward 4 elements (from the end), return next 2 elements (forward)
{ "arr": { "$slice": [-4, 2] } }
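
The four `$slice` formats above can be emulated on a plain Python list. This is a local sketch of the described semantics, assuming that out-of-range values are clamped to the array bounds. `apply_slice` is a hypothetical helper and does not call the Data API.

```python
def apply_slice(array, spec):
    """Emulate {"$slice": spec} on a list, for the four formats shown above."""
    if isinstance(spec, int):
        # Positive n: the first n elements. Negative n: the last n elements.
        return array[:spec] if spec >= 0 else array[spec:]
    skip, count = spec
    # A negative skip counts backward from the end of the array.
    start = skip if skip >= 0 else max(len(array) + skip, 0)
    return array[start:start + count]


letters = list("abcdefghij")

print(apply_slice(letters, 2))        # ['a', 'b']
print(apply_slice(letters, -2))       # ['i', 'j']
print(apply_slice(letters, [4, 2]))   # ['e', 'f']
print(apply_slice(letters, [-4, 2]))  # ['g', 'h']
```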

If a projection refers to a nested field, the keys in the subdocument are included or excluded as requested. If you exclude all keys of an existing subdocument, then the document is returned with the subdocument present and an empty nested object.

Examples of nested document projections

Given the following document:

{
  "_id": "z",
  "a": {
    "a1": 10,
    "a2": 20
  }
}

The results of various projections are as follows:

Projection                            Result

{ "a": true }                         { "_id": "z", "a": { "a1": 10, "a2": 20 } }

{ "a.a1": false }                     { "_id": "z", "a": { "a2": 20 } }

{ "a.a1": true }                      { "_id": "z", "a": { "a1": 10 } }

{ "a.a1": false, "a.a2": false }      { "_id": "z", "a": {} }

{ "*": false }                        {}
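
The nested projection results listed above can be reproduced with a simplified local model. The following sketch handles only regular fields addressed by dotted paths, the `_id` default, and the wildcard. It ignores `$vector`, `$vectorize`, and `$slice`, and it assumes the projection doesn't mix inclusion and exclusion. `project` and its helpers are hypothetical names, and the Data API's real implementation may differ.

```python
import copy


def _copy_path(source, target, keys):
    """Copy the value at the dotted path `keys` from source into target."""
    key = keys[0]
    if not isinstance(source, dict) or key not in source:
        return  # Paths that don't exist in the document are ignored.
    if len(keys) == 1:
        target[key] = copy.deepcopy(source[key])
    elif isinstance(source[key], dict):
        _copy_path(source[key], target.setdefault(key, {}), keys[1:])


def _remove_path(target, keys):
    """Delete the value at the dotted path `keys` from target, if present."""
    key = keys[0]
    if not isinstance(target, dict) or key not in target:
        return
    if len(keys) == 1:
        del target[key]
    else:
        _remove_path(target[key], keys[1:])


def project(document, projection):
    """Apply a simplified include/exclude projection to a single document."""
    if "*" in projection:
        return copy.deepcopy(document) if projection["*"] else {}
    regular = {k: bool(v) for k, v in projection.items() if k != "_id"}
    result = {}
    if any(regular.values()):
        # Inclusion mode: start empty and copy the requested paths.
        for path in regular:
            _copy_path(document, result, path.split("."))
    else:
        # Exclusion mode: start from a full copy and remove the listed paths.
        result = {k: copy.deepcopy(v) for k, v in document.items() if k != "_id"}
        for path in regular:
            _remove_path(result, path.split("."))
    if projection.get("_id", True) and "_id" in document:
        result["_id"] = document["_id"]  # _id is included unless explicitly excluded
    return result


doc = {"_id": "z", "a": {"a1": 10, "a2": 20}}
print(project(doc, {"a.a1": False}))  # {'a': {'a2': 20}, '_id': 'z'}
```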

Referencing overlapping paths or subpaths in a projection can create conflicting clauses and return an API error. For example, this projection is invalid:

// Invalid:
{ "a.a1": true, "a": true }

Projection examples by language

  • Python

  • TypeScript

  • Java

  • curl

For the Python client, the projection can be any of the following:

  • A dictionary (Dict[str, Any]) to include specific fields in the response, like {field_name: True}.

  • A dictionary (Dict[str, Any]) to exclude specific fields from the response, like {field_name: False}.

  • A list or other iterable over key names that are implied to be included in the projection.

For information about default projections and handling for special fields, see the preceding explanation of projection operations.

The following two projections are equivalent:

document = collection.find_one(
   {"_id": 101},
   projection={"name": True, "city": True},
)

document = collection.find_one(
   {"_id": 101},
   projection=["name", "city"],
)

The TypeScript client takes in an untyped Plain Old JavaScript Object (POJO) for the projection parameter. The client also offers a StrictProjection<Schema> type that provides full autocomplete and type checking for your document schema.

When specifying a projection, make sure that you handle the return type carefully. Consider type-casting.

import { StrictProjection } from '@datastax/astra-db-ts';

const doc = await collection.findOne({}, {
  projection: {
    'name': true,
    'address.city': true,
  },
});

interface MySchema {
  name: string,
  address: {
    city: string,
    state: string,
  },
}

const typedDoc = await collection.findOne({}, {
  projection: {
    'name': 1,
    'address.city': 1,
    // @ts-expect-error - `'address.car'` does not exist in type `StrictProjection<MySchema>`
    'address.car': 0,
    // @ts-expect-error - Type `{ $slice: number }` is not assignable to type `boolean | 0 | 1 | undefined`
    'address.state': { $slice: 3 }
  } satisfies StrictProjection<MySchema>,
});

For information about default projections and handling for special fields, see the preceding explanation of projection operations.

To support the projection mechanism, the Java client's Options classes provide a projection method in their builder helpers. This method takes one or more Projection objects, each of which has a field name and a boolean flag indicating inclusion or exclusion. For information about default projections and handling for special fields, see the preceding explanation of projection operations.

Projection p1 = new Projection("field1", true);
Projection p2 = new Projection("field2", true);
FindOptions options1 = FindOptions.Builder.projection(p1, p2);

To simplify this syntax, you can use the Projections syntactic sugar:

FindOptions options2 = FindOptions.Builder
  .projection(Projections.include("field1", "field2"));

FindOptions options3 = FindOptions.Builder
  .projection(Projections.exclude("field1", "field2"));

The Projections class also provides a method to support $slice for array fields:

// {"arr": {"$slice": 2}}
Projection sliceOnlyStart = Projections.slice("arr", 2, null);

// {"arr": {"$slice": [-4, 2]}}
Projection sliceOnlyRange = Projections.slice("arr", -4, 2);

// And you can use them freely in the different builders
FindOptions options4 = FindOptions.Builder
  .projection(sliceOnlyStart);

In a curl request, include projection as a find parameter:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "find": {
    "sort": { "$vector": [0.15, 0.1, 0.1, 0.35, 0.55] },
    "projection": { "$vector": true, "name": true, "city": true },
    "options": {
      "includeSimilarity": true,
      "includeSortVector": false,
      "limit": 100
    }
  }
}' | jq

For information about default projections and handling for special fields, see the preceding explanation of projection operations.

Find and update a document

Find one document that matches a filter condition, apply changes to it, and then return the document itself.

This is effectively an expansion of the findOne command with additional support for update operators and related options.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Find a document matching a filter condition, and then edit a property in that document:

collection.find_one_and_update(
    {"Marco": {"$exists": True}},
    {"$set": {"title": "Mr."}},
)

Locate and update a document, returning the document itself, and create a new one if no match is found:

collection.find_one_and_update(
    {"Marco": {"$exists": True}},
    {"$set": {"title": "Mr."}},
    upsert=True,
)

Locate and update the document most similar to a query vector from either $vector or $vectorize:

collection.find_one_and_update(
    {},
    {"$set": {"best_match": True}},
    sort={"$vector": [0.1, 0.2, 0.3]},
)

Returns:

Dict[str, Any] - The document that was found, either before or after the update (or a projection thereof, as requested). If no matches are found, None is returned.

Example response
{'_id': 999, 'Marco': 'Polo'}

Parameters:

Name Type Summary

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

update

Dict[str, Any]

The update prescription to apply to the document, expressed as a dictionary as per Data API syntax. For example: {"$set": {"field": "value"}}, {"$inc": {"counter": 10}} and {"$unset": {"field": ""}}. For a list of available operators, see Data API operators.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

See Find a document and Example values for projection operations.

sort

Optional[Dict[str, Any]]

See Find a document and Example values for sort operations.

upsert

bool = False

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts a new document by applying the update to an empty document. If false and there are no matches, then the operation silently does nothing.

return_document

str

A flag controlling what document is returned. If set to ReturnDocument.BEFORE or the string "before", then the original document is returned. If set to ReturnDocument.AFTER or the string "after", then the updated document is returned. The default is "before".

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_one({"Marco": "Polo"})

collection.find_one_and_update(
    {"Marco": {"$exists": True}},
    {"$set": {"title": "Mr."}},
)
# prints: {'_id': 'a80106f2-...', 'Marco': 'Polo'}
collection.find_one_and_update(
    {"title": "Mr."},
    {"$inc": {"rank": 3}},
    projection={"title": True, "rank": True},
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# prints: {'_id': 'a80106f2-...', 'title': 'Mr.', 'rank': 3}
collection.find_one_and_update(
    {"name": "Johnny"},
    {"$set": {"rank": 0}},
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# (returns None for no matches)
collection.find_one_and_update(
    {"name": "Johnny"},
    {"$set": {"rank": 0}},
    upsert=True,
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# prints: {'_id': 'cb4ef2ab-...', 'name': 'Johnny', 'rank': 0}
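
To make the interplay of upsert and return_document easier to follow, the following sketch emulates the behavior on an in-memory list. It is not the astrapy implementation. It supports only equality and `$exists` filters and the `$set`, `$inc`, and `$unset` operators, and it ignores projection and sort. Copying the filter's equality fields into an upserted document is an assumption inferred from the example output above. The function defined here is a hypothetical local helper that merely shares the method's name.

```python
import copy
import uuid


def matches(doc, flt):
    """Check a document against equality and $exists conditions only."""
    for field, cond in flt.items():
        if isinstance(cond, dict) and "$exists" in cond:
            if (field in doc) != cond["$exists"]:
                return False
        elif doc.get(field) != cond:
            return False
    return True


def apply_update(doc, update):
    """Apply $set, $inc, and $unset to a document in place."""
    for field, value in update.get("$set", {}).items():
        doc[field] = value
    for field, amount in update.get("$inc", {}).items():
        doc[field] = doc.get(field, 0) + amount
    for field in update.get("$unset", {}):
        doc.pop(field, None)


def find_one_and_update(docs, flt, update, upsert=False, return_document="before"):
    for doc in docs:
        if matches(doc, flt):
            before = copy.deepcopy(doc)
            apply_update(doc, update)
            return before if return_document == "before" else copy.deepcopy(doc)
    if not upsert:
        return None  # No match and no upsert: nothing changes.
    # Assumption: the new document starts from the filter's equality fields.
    new_doc = {"_id": str(uuid.uuid4())}
    new_doc.update({k: v for k, v in flt.items() if not isinstance(v, dict)})
    apply_update(new_doc, update)
    docs.append(new_doc)
    return copy.deepcopy(new_doc) if return_document == "after" else None


store = [{"_id": "a80106f2", "Marco": "Polo"}]
before = find_one_and_update(store, {"Marco": {"$exists": True}}, {"$set": {"title": "Mr."}})
print(before)  # the original document, without "title"
missing = find_one_and_update(store, {"name": "Johnny"}, {"$set": {"rank": 0}})
print(missing)  # None
upserted = find_one_and_update(
    store, {"name": "Johnny"}, {"$set": {"rank": 0}}, upsert=True, return_document="after"
)
print(upserted["name"], upserted["rank"])  # Johnny 0
```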

For more information, see the API reference.

Find a document matching a filter condition, and then edit a property in that document:

const docBefore = await collection.findOneAndUpdate(
  { $and: [{ name: 'Jesse' }, { gender: 'M' }] },
  { $set: { title: 'Mr.' } },
);

Locate and update a document, returning the updated document, and create a new one if no match is found:

const docAfter = await collection.findOneAndUpdate(
  { $and: [{ name: 'Jesse' }, { gender: 'M' }] },
  { $set: { title: 'Mr.' } },
  { upsert: true, returnDocument: 'after' },
);

Locate and update the document most similar to a query vector from either $vector or $vectorize:

const docBefore = await collection.findOneAndUpdate(
  {},
  { $set: { bestMatch: true } },
  { sort: { $vector: [0.1, 0.2, 0.3] } },
);

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to update. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

update

UpdateFilter<Schema>

The update to apply to the selected document. For a list of available operators, see Data API operators.

options

FindOneAndUpdateOptions

The options for this operation.

Name Type Summary

returnDocument

'before' | 'after'

Specifies whether to return the original ('before') or updated ('after') document.

upsert?

boolean

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts a new document by applying the update to an empty document. If false and there are no matches, then the operation silently does nothing.

projection?

Projection

See Find a document and Example values for projection operations.

sort?

Sort

See Find a document and Example values for sort operations.

maxTimeMS?

number

The maximum time, in milliseconds, that the client should wait for each underlying HTTP request to complete.

includeResultMetadata?

boolean

When true, returns ok: 1, in addition to the document, if the command executed successfully.

Returns:

Promise<WithId<Schema> | null> - The document before or after the update, depending on the value of returnDocument, or null if no matches are found.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert a document
  await collection.insertOne({ 'Marco': 'Polo' });

  // Prints 'Mr.'
  const updated1 = await collection.findOneAndUpdate(
    { 'Marco': 'Polo' },
    { $set: { title: 'Mr.' } },
    { returnDocument: 'after' },
  );
  console.log(updated1?.title);

  // Prints { _id: ..., title: 'Mr.', rank: 3 }
  const updated2 = await collection.findOneAndUpdate(
    { title: 'Mr.' },
    { $inc: { rank: 3 } },
    { projection: { title: 1, rank: 1 }, returnDocument: 'after' },
  );
  console.log(updated2);

  // Prints null
  const updated3 = await collection.findOneAndUpdate(
    { name: 'Johnny' },
    { $set: { rank: 0 } },
    { returnDocument: 'after' },
  );
  console.log(updated3);

  // Prints { _id: ..., name: 'Johnny', rank: 0 }
  const updated4 = await collection.findOneAndUpdate(
    { name: 'Johnny' },
    { $set: { rank: 0 } },
    { upsert: true, returnDocument: 'after' },
  );
  console.log(updated4);
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson. For more information, see the API reference.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
Optional<T> findOneAndUpdate(Filter filter, Update update);

// Asynchronous
CompletableFuture<Optional<T>> findOneAndUpdateAsync(Filter filter, Update update);

Returns:

Optional<T> - Return the working document matching the filter or Optional.empty() if no document is found.

Parameters:

Name Type Summary

filter

Filter

Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators. For examples and options, including projection and sort, see Find documents using filtering options.

update

Update

The update prescription to apply to the document. For a list of available operators, see Data API operators.

To build the different parts of the requests, a set of helper classes is provided. These are suffixed by an s, such as Filters for Filter and Updates for Update.

Update update = Updates
 .set("field1", "value1")
 .inc("field2", 1d)
 .unset("field3");

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.Update;
import com.datastax.astra.client.model.Updates;

import java.util.Optional;

import static com.datastax.astra.client.model.Filters.lt;

public class FindOneAndUpdate {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Building the update
        Update update = Updates.set("field1", "value1")
                .inc("field2", 1d)
                .unset("field3");

        Optional<Document> doc = collection.findOneAndUpdate(filter, update);

    }
}

Find a document matching a filter condition, and then edit a property in that document.

This example uses the $currentDate update operator to set a property to the current date:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findOneAndUpdate": {
    "filter": { "_id": "doc1" },
    "update": {
      "$currentDate": {
        "createdAt": true
      }
    }
  }
}' | jq
More update operator examples

Unset a property:

"findOneAndUpdate": {
  "filter": {
    "_id": "12"
  },
  "update": { "$unset": { "amount": "" } },
  "options": { "returnDocument": "after" }
}

Increment a value:

"findOneAndUpdate": {
  "filter": {
    "_id": "12"
  },
  "update": { "$inc": { "counter": 1 } },
  "options": { "returnDocument": "after" }
}

Add an element to a specific position in an array:

"findOneAndUpdate": {
  "filter": {
    "_id": "12"
  },
  "update": { "$push": { "tags": { "$each": [ "new1", "new2" ], "$position": 0 } } },
  "options": { "returnDocument": "after" }
}

Rename a field:

"findOneAndUpdate": {
  "filter": {
    "_id": "12"
  },
  "update": { "$rename": { "old_field": "new_field", "other_old_field": "other_new_field" } },
  "options": { "returnDocument": "after" }
}

Locate and update a document, returning the updated document, and create a new one if no match is found:

"findOneAndUpdate": {
  "filter": {
    "_id": "14"
  },
  "update": { "$set": { "min_col": 2, "max_col": 99 } },
  "options": { "returnDocument": "after", "upsert": true }
}

If an upsert occurs, use the $setOnInsert operator to set additional document properties only for the new document:

"findOneAndUpdate": {
  "filter": {
    "_id": "27"
  },
  "update": {
    "$currentDate": {
      "field": true
    },
    "$setOnInsert": {
      "customer.name": "James B."
    }
  },
  "options": {
    "returnDocument": "after",
    "upsert": true
  }
}

Locate and update the document most similar to a query vector from either $vector or $vectorize:

"findOneAndUpdate": {
  "sort": {
    "$vector": [0.1, 0.2, 0.3]
  },
  "update": {
    "$set": {
      "status": "active"
    }
  },
  "options": {
    "returnDocument": "after"
  }
}
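
As a rough local illustration of the `$push` (with `$each` and `$position`) and `$rename` operators shown in the examples above, the following sketch applies them to a plain Python dictionary. It assumes that `$position` inserts the `$each` elements, in order, starting at the given index, and that a missing `$position` appends to the end. `apply_push` and `apply_rename` are hypothetical helpers, and the sample document is invented for illustration.

```python
def apply_push(document, push_spec):
    """Emulate $push, optionally with $each and $position, in place."""
    for field, value in push_spec.items():
        array = document.setdefault(field, [])
        if isinstance(value, dict) and "$each" in value:
            items = list(value["$each"])
            position = value.get("$position", len(array))
        else:
            items = [value]
            position = len(array)
        # Slice assignment inserts all items at `position`, keeping their order.
        array[position:position] = items


def apply_rename(document, rename_spec):
    """Emulate $rename for top-level fields, in place."""
    for old_name, new_name in rename_spec.items():
        if old_name in document:
            document[new_name] = document.pop(old_name)


doc = {"_id": "12", "tags": ["old1"], "old_field": 1, "other_old_field": 2}
apply_push(doc, {"tags": {"$each": ["new1", "new2"], "$position": 0}})
apply_rename(doc, {"old_field": "new_field", "other_old_field": "other_new_field"})

print(doc["tags"])  # ['new1', 'new2', 'old1']
print(sorted(doc))  # ['_id', 'new_field', 'other_new_field', 'tags']
```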

Parameters:

Name Type Summary

findOneAndUpdate

command

Data API command to find one document based on a query and then run an update operation on the document’s properties.

sort, filter

object

Search criteria to find the document to update. For a list of available operators, see Data API operators. For examples and parameters, see Find a document and Example values for sort operations.

update

object

The update prescription to apply to the document using Data API operators. For example: {"$set": {"field": "value"}}, {"$inc": {"counter": 10}} and {"$unset": {"field": ""}}.

projection

object

See Find a document and Example values for projection operations.

options.upsert

boolean

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts a new document by applying the update to an empty document. If false and there are no matches, then the operation silently does nothing.

options.returnDocument

string

A flag controlling what document is returned. If set to "before", then the original document is returned. If set to "after", then the updated document is returned. The default is "before".

Returns:

A successful response contains a data object and a status object:

  • The data object contains a single document object representing either the original or modified document, based on the returnDocument parameter.

    "data": {
      "document": {
        "_id": "5",
        "purchase_type": "Online",
        "$vector": [0.25, 0.045, 0.38, 0.31, 0.67],
        "customer": "David C.",
        "amount": 94990
      }
    }
  • The status object contains the matchedCount and modifiedCount fields, which indicate the number of documents that matched the filter and the number of documents that were modified, respectively. If the update operation didn’t change any parameters in the matching document, then the modifiedCount is 0.

    "status": {
      "matchedCount": 1,
      "modifiedCount": 0
    }
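
The relationship between returnDocument and the two counters can be modeled locally. The following Python sketch is not the Data API implementation: find_one_and_update_local is a hypothetical helper that supports only equality filters and top-level $set fields, and representing "no match" as a null document is an assumption of the sketch.

```python
import copy


def find_one_and_update_local(docs, filter_, set_fields, return_document="before"):
    """Hypothetical local model: equality filter and top-level $set only."""
    for doc in docs:
        if all(doc.get(key) == value for key, value in filter_.items()):
            original = copy.deepcopy(doc)
            doc.update(set_fields)
            # modifiedCount stays 0 when the update changes nothing.
            modified = 0 if doc == original else 1
            returned = original if return_document == "before" else copy.deepcopy(doc)
            return {
                "data": {"document": returned},
                "status": {"matchedCount": 1, "modifiedCount": modified},
            }
    # Assumption: model "no match" as a null document with zero counts.
    return {"data": {"document": None}, "status": {"matchedCount": 0, "modifiedCount": 0}}


docs = [{"_id": "5", "customer": "David C.", "amount": 94990}]
first = find_one_and_update_local(docs, {"_id": "5"}, {"amount": 100}, "after")
second = find_one_and_update_local(docs, {"_id": "5"}, {"amount": 100}, "before")
print(first["status"])   # {'matchedCount': 1, 'modifiedCount': 1}
print(second["status"])  # {'matchedCount': 1, 'modifiedCount': 0}
```

The second call matches the same document but sets a value that is already present, so the document is matched without being modified.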

Update a document

updateOne is similar to findOneAndUpdate, except that the response includes only the result of the operation. The response doesn’t include a document object, and the request doesn’t support response-related parameters, such as projection or returnDocument.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Find a document matching a filter condition, and then edit a property in that document:

update_result = collection.update_one(
    {"_id": 456},
    {"$set": {"name": "John Smith"}},
)

Locate and update a document or insert a new one if no match is found:

update_result = collection.update_one(
    {"_id": 456},
    {"$set": {"name": "John Smith"}},
    upsert=True,
)

Locate and update the document most similar to a query vector from either $vector or $vectorize:

update_result = collection.update_one(
    {},
    {"$set": {"best_match": True}},
    sort={"$vector": [0.1, 0.2, 0.3]},
)

Returns:

UpdateResult - An object representing the response from the database after the update operation. It includes information about the operation.

Example response
UpdateResult(update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1}, raw_results=...)

Parameters:

Name Type Summary

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find a document.

update

Dict[str, Any]

The update prescription to apply to the document, expressed as a dictionary as per Data API syntax. For example: {"$set": {"field": "value"}}, {"$inc": {"counter": 10}} and {"$unset": {"field": ""}}. For examples and a list of available operators, see Find and update a document and Data API operators.

sort

Optional[Dict[str, Any]]

See Find a document and Example values for sort operations.

upsert

bool = False

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts a new document by applying the update to an empty document. If false and there are no matches, then the operation silently does nothing.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_one({"Marco": "Polo"})

collection.update_one({"Marco": {"$exists": True}}, {"$inc": {"rank": 3}})
# prints: UpdateResult(update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1}, raw_results=...)
collection.update_one({"Mirko": {"$exists": True}}, {"$inc": {"rank": 3}})
# prints: UpdateResult(update_info={'n': 0, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0}, raw_results=...)
collection.update_one(
    {"Mirko": {"$exists": True}},
    {"$inc": {"rank": 3}},
    upsert=True,
)
# prints: UpdateResult(update_info={'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '2a45ff60-...'}, raw_results=...)
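
The upsert case can be read as "apply the update to an empty document". The sketch below models that idea locally for three operators. apply_update is a hypothetical helper that handles top-level fields only, and treating a missing field as 0 for $inc is an assumption of the sketch, not a statement about the Data API internals.

```python
def apply_update(document, update):
    """Hypothetical local model of $set, $inc, and $unset on top-level fields."""
    result = dict(document)
    for field, value in update.get("$set", {}).items():
        result[field] = value
    for field, amount in update.get("$inc", {}).items():
        # Assumption: a missing field counts as 0 before incrementing.
        result[field] = result.get(field, 0) + amount
    for field in update.get("$unset", {}):
        result.pop(field, None)
    return result


# With upsert and no match, the update is applied to an empty document.
upserted = apply_update({}, {"$inc": {"rank": 3}})
print(upserted)  # {'rank': 3}
```

In the real operation, the new document also receives an _id, which is the upserted value shown in the response above.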

For more information, see the API reference.

Find a document matching a filter condition, and then edit a property in that document:

const result = await collection.updateOne(
  { $and: [{ name: 'Jesse' }, { gender: 'M' }] },
  { $set: { title: 'Mr.' } },
);

Locate and update a document or insert a new one if no match is found:

const result = await collection.updateOne(
  { $and: [{ name: 'Jesse' }, { gender: 'M' }] },
  { $set: { title: 'Mr.' } },
  { upsert: true },
);

Locate and update the document most similar to a query vector from either $vector or $vectorize:

const result = await collection.updateOne(
  {},
  { $set: { bestMatch: true } },
  { sort: { $vector: [0.1, 0.2, 0.3] } },
);

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to update. For a list of available operators, see Data API operators. For additional examples, see Find a document.

update

UpdateFilter<Schema>

The update to apply to the selected document. For examples and a list of available operators, see Find and update a document and Data API operators.

options?

UpdateOneOptions

The options for this operation.

Options (UpdateOneOptions):

Name Type Summary

upsert?

boolean

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts a new document by applying the update to an empty document. If false and there are no matches, then the operation silently does nothing.

sort?

Sort

See Find a document and Example values for sort operations.

maxTimeMS?

number

The maximum time, in milliseconds, that the client waits for each underlying HTTP request to complete.

Returns:

Promise<UpdateOneResult<Schema>> - The result of the update operation.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert a document
  await collection.insertOne({ 'Marco': 'Polo' });

  // Prints 1
  const updated1 = await collection.updateOne(
    { 'Marco': 'Polo' },
    { $set: { title: 'Mr.' } },
  );
  console.log(updated1?.modifiedCount);

  // Prints 0 0
  const updated2 = await collection.updateOne(
    { name: 'Johnny' },
    { $set: { rank: 0 } },
  );
  console.log(updated2.matchedCount, updated2?.upsertedCount);

  // Prints 0 1
  const updated3 = await collection.updateOne(
    { name: 'Johnny' },
    { $set: { rank: 0 } },
    { upsert: true },
  );
  console.log(updated3.matchedCount, updated3?.upsertedCount);
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson. For more information, see the API reference.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
UpdateResult updateOne(Filter filter, Update update);

// Asynchronous
CompletableFuture<UpdateResult> updateOneAsync(Filter filter, Update update);

Returns:

UpdateResult - Result of the operation with the number of documents matched (matchedCount) and updated (modifiedCount).

Parameters:

Name Type Summary

filter

Filter

Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression.

update

Update

The update prescription to apply to the selected document. For examples and a list of available operators, see Find and update a document and Data API operators.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.Update;
import com.datastax.astra.client.model.UpdateResult;
import com.datastax.astra.client.model.Updates;

import java.util.Optional;

import static com.datastax.astra.client.model.Filters.lt;

public class UpdateOne {
    // Given an existing collection
    Collection<Document> collection = new DataAPIClient("TOKEN")
            .getDatabase("API_ENDPOINT")
            .getCollection("COLLECTION_NAME");

    // Building a filter
    Filter filter = Filters.and(
            Filters.gt("field2", 10),
            lt("field3", 20),
            Filters.eq("field4", "value"));

    // Building the update
    Update update = Updates.set("field1", "value1")
            .inc("field2", 1d)
            .unset("field3");

    UpdateResult result = collection.updateOne(filter, update);
}

Find a document matching a filter condition, and then edit a property in that document:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "updateOne": {
    "filter": {
      "_id": "14"
    },
    "update": { "$set": { "name": "Xiala" } }
  }
}' | jq

Locate and update a document or insert a new one if no match is found:

"updateOne": {
  "filter": {
    "_id": "16"
  },
  "update": { "$set": { "name": "Serapio" } },
  "options": { "upsert": true }
}

If an upsert occurs, use the $setOnInsert operator to assign additional properties to the new document:

"updateOne": {
  "filter": {
    "_id": "16"
  },
  "update": {
    "$currentDate": {
      "field": true
    },
    "$setOnInsert": {
      "customer.name": "James B."
    }
  },
  "options": {
    "upsert": true
  }
}

Locate and update the document most similar to a query vector from either $vector or $vectorize:

"updateOne": {
  "sort": {
    "$vector": [0.1, 0.2, 0.3]
  },
  "update": {
    "$set": {
      "status": "active"
    }
  }
}

Parameters:

Name Type Summary

updateOne

command

The Data API command to update a single document matching a query.

sort, filter

object

Used to select the document to be updated. For a list of available operators, see Data API operators. For examples and parameters, see Find a document and Example values for sort operations.

update

object

The update prescription to apply to the document. For example: {"$set": {"field": "value"}}, {"$inc": {"counter": 10}} and {"$unset": {"field": ""}}. For examples and a list of available operators, see Find and update a document and Data API operators.

options.upsert

boolean

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts a new document by applying the update to an empty document. If false and there are no matches, then the operation silently does nothing.

Returns:

The updateOne command returns only the outcome of the operation, including the number of documents that matched the filter (matchedCount) and the number of documents that were modified (modifiedCount):

{
  "status": {
    "matchedCount": 1,
    "modifiedCount": 1
  }
}
Example

The following example uses the $set update operator with dot notation (zodiac.animal) to set the value of a nested property. In this example, zodiac is a property of the main document that holds a nested document, and animal is a property within zodiac. The operation updates the nested animal field to lion.

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "updateOne": {
    "filter": {
      "_id": "18"
    },
    "update": { "$set": { "zodiac.animal": "lion" } }
  }
}' | jq
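
The effect of a dotted path in $set can be pictured with a small local model. The following Python sketch is illustrative only: set_by_path is a hypothetical helper, the sample document values are made up, and creating missing intermediate objects is an assumption of the sketch.

```python
def set_by_path(document, dotted_path, value):
    """Hypothetical helper: set a nested value addressed by dot notation."""
    keys = dotted_path.split(".")
    current = document
    for key in keys[:-1]:
        # Assumption: create intermediate objects when they are missing.
        current = current.setdefault(key, {})
    current[keys[-1]] = value
    return document


# Made-up sample document for illustration.
doc = {"_id": "18", "zodiac": {"animal": "crab", "element": "water"}}
set_by_path(doc, "zodiac.animal", "lion")
print(doc["zodiac"])  # {'animal': 'lion', 'element': 'water'}
```

Only the addressed property changes. Sibling properties inside zodiac are left as they were.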

Update multiple documents

Use updateMany to find and update multiple documents at once.

This command is a combination of find and updateOne. However, updateMany doesn’t support sort operations.

Like updateOne, the updateMany response includes only the result of the operation. The response doesn’t include a document object, and the request doesn’t support response-related parameters, such as projection or returnDocument.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Find documents matching a filter condition, and then edit a property in those documents:

results = collection.update_many(
    {"name": {"$exists": False}},
    {"$set": {"name": "unknown"}},
)

Locate and update multiple documents or insert a new one if no match is found:

results = collection.update_many(
    {"name": {"$exists": False}},
    {"$set": {"name": "unknown"}},
    upsert=True,
)

For more examples, see Update a document.

Returns:

UpdateResult - An object representing the response from the database after the update operation. It includes information about the operation.

Example response
UpdateResult(update_info={'n': 2, 'updatedExisting': True, 'ok': 1.0, 'nModified': 2}, raw_results=...)

Parameters:

Name Type Summary

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

update

Dict[str, Any]

The update prescription to apply to the documents, expressed as a dictionary as per Data API syntax. For example: {"$set": {"field": "value"}}, {"$inc": {"counter": 10}} and {"$unset": {"field": ""}}. For examples and a list of available operators, see Find and update a document and Data API operators.

upsert

bool

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts one new document by applying the update to an empty document. If false and there are no matches, then the operation silently does nothing.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the operation. This method uses the collection-level timeout by default. You may need to increase the timeout duration when updating a large number of documents because the update requires multiple sequential HTTP requests.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_many([{"c": "red"}, {"c": "green"}, {"c": "blue"}])

collection.update_many({"c": {"$ne": "green"}}, {"$set": {"nongreen": True}})
# prints: UpdateResult(update_info={'n': 2, 'updatedExisting': True, 'ok': 1.0, 'nModified': 2}, raw_results=...)
collection.update_many({"c": "orange"}, {"$set": {"is_also_fruit": True}})
# prints: UpdateResult(update_info={'n': 0, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0}, raw_results=...)
collection.update_many(
    {"c": "orange"},
    {"$set": {"is_also_fruit": True}},
    upsert=True,
)
# prints: UpdateResult(update_info={'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '46643050-...'}, raw_results=...)

For more information, see the API reference.

Find documents matching a filter condition, and then edit a property in those documents:

const result = await collection.updateMany(
  { name: { $exists: false } },
  { $set: { title: 'unknown' } },
);

Locate and update multiple documents in a collection or insert a new one if no matches are found:

const result = await collection.updateMany(
  { name: { $exists: false } },
  { $set: { title: 'unknown' } },
  { upsert: true },
);

For more examples, see Update a document.

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the documents to update. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

update

UpdateFilter<Schema>

The update to apply to the selected documents. For examples and a list of available operators, see Find and update a document and Data API operators.

options?

UpdateManyOptions

The options for this operation.

Options (UpdateManyOptions):

Name Type Summary

upsert?

boolean

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts one new document by applying the update to an empty document. If false and there are no matches, then the operation silently does nothing.

maxTimeMS?

number

The maximum time, in milliseconds, that the client waits for each underlying HTTP request to complete.

Returns:

Promise<UpdateManyResult<Schema>> - The result of the update operation.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([{ c: 'red' }, { c: 'green' }, { c: 'blue' }]);

  // { modifiedCount: 2, matchedCount: 2, upsertedCount: 0 }
  await collection.updateMany({ c: { $ne: 'green' } }, { $set: { nongreen: true } });

  // { modifiedCount: 0, matchedCount: 0, upsertedCount: 0 }
  await collection.updateMany({ c: 'orange' }, { $set: { is_also_fruit: true } });

  // { modifiedCount: 0, matchedCount: 0, upsertedCount: 1, upsertedId: '...' }
  await collection.updateMany({ c: 'orange' }, { $set: { is_also_fruit: true } }, { upsert: true });
})();

Operations on documents are performed at the Collection level. For more information, see the API reference.

Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture.

// Synchronous
UpdateResult updateMany(Filter filter, Update update);
UpdateResult updateMany(Filter filter, Update update, UpdateManyOptions options);

// Asynchronous
CompletableFuture<UpdateResult> updateManyAsync(Filter filter, Update update);
CompletableFuture<UpdateResult> updateManyAsync(Filter filter, Update update, UpdateManyOptions options);

Returns:

UpdateResult - Result of the operation with the number of documents matched (matchedCount) and updated (modifiedCount).

Parameters:

Name Type Summary

filter

Filter

Filters to select documents. This object can contain any valid Data API filter expression. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

update

Update

The update prescription to apply to the documents. For examples and a list of available operators, see Find and update a document and Data API operators.

options

UpdateManyOptions

Contains the options for updateMany(), including the upsert flag that controls the behavior if there are no matches. If true and there are no matches, then the operation inserts one new document by applying the update to an empty document. If false and there are no matches, then the operation silently does nothing.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.Update;
import com.datastax.astra.client.model.UpdateManyOptions;
import com.datastax.astra.client.model.UpdateResult;
import com.datastax.astra.client.model.Updates;

import static com.datastax.astra.client.model.Filters.lt;

public class UpdateMany {

    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        Update update = Updates.set("field1", "value1")
                .inc("field2", 1d)
                .unset("field3");

        UpdateManyOptions options =
                new UpdateManyOptions().upsert(true);

        UpdateResult result = collection.updateMany(filter, update, options);
    }
}

Find documents matching a filter condition, and then edit a property in those documents:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "updateMany": {
    "filter": { "status": "active" },
    "update": { "$set": { "status": "inactive" } }
  }
}' | jq

For more examples, see Update a document.

Parameters:

Name Type Summary

updateMany

command

The Data API command to update multiple documents in a collection in a database.

filter

object

Defines the criteria for selecting documents to update. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

update

object

The update prescription to apply to the documents. For example: {"$set": {"field": "value"}}, {"$inc": {"counter": 10}} and {"$unset": {"field": ""}}. For additional examples and a list of available operators, see Find and update a document and Data API operators.

options.upsert

boolean

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts one new document by applying the update to an empty document. If false and there are no matches, then the operation silently does nothing.

Returns:

The updateMany command returns the outcome of the operation, including the number of documents that matched the filter (matchedCount) and the number of documents that were modified (modifiedCount).

Pagination occurs if there are more than 20 matching documents. In this case, the matchedCount and modifiedCount values are capped at 20, the moreData flag is set to true, and the response includes a nextPageState ID:

{
  "status": {
    "matchedCount": 20,
    "modifiedCount": 20,
    "moreData": true,
    "nextPageState": "NEXT_PAGE_STATE_ID"
  }
}

In the event of pagination, you must issue a subsequent request with a pageState ID to update the next page of documents that matched the filter. As long as there is a subsequent page with matching documents to update, the response includes a nextPageState ID, which you use as the pageState for the subsequent request.

Each paginated request is exactly the same as the original request, except for the addition of the pageState in the options object:

{
  "updateMany": {
    "filter": { "active_user": true },
    "update": { "$set": { "new_data": "new_data_value" } },
    "options": { "pageState": "NEXT_PAGE_STATE_ID" }
  }
}

Continue issuing requests with the subsequent pageState ID until all matching documents have been updated.
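
The pagination procedure described above can be sketched as a loop. In the following Python sketch, send_command is a hypothetical callable that posts a JSON payload to the collection endpoint and returns the parsed response. fake_send is a stand-in transport used only to make the example runnable, and summing the per-page counts is a choice made by the sketch.

```python
def update_many_all_pages(send_command, filter_, update):
    """Repeat updateMany until the response carries no nextPageState."""
    totals = {"matchedCount": 0, "modifiedCount": 0}
    page_state = None
    while True:
        command = {"filter": filter_, "update": update}
        if page_state is not None:
            # Same request as before, plus the pageState option.
            command["options"] = {"pageState": page_state}
        status = send_command({"updateMany": command})["status"]
        totals["matchedCount"] += status.get("matchedCount", 0)
        totals["modifiedCount"] += status.get("modifiedCount", 0)
        page_state = status.get("nextPageState")
        if not page_state:
            return totals


def fake_send(payload):
    """Stand-in transport for illustration: serves two pages of results."""
    state = payload["updateMany"].get("options", {}).get("pageState")
    if state is None:
        return {"status": {"matchedCount": 20, "modifiedCount": 20,
                           "moreData": True, "nextPageState": "page-2"}}
    return {"status": {"matchedCount": 5, "modifiedCount": 5}}


totals = update_many_all_pages(
    fake_send,
    {"active_user": True},
    {"$set": {"new_data": "new_data_value"}},
)
print(totals)  # {'matchedCount': 25, 'modifiedCount': 25}
```

Passing the transport in as a parameter keeps the loop independent of any particular HTTP library.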

Find distinct values across documents

Get a list of the distinct values of a certain key in a collection.

distinct is a client-side operation, which effectively browses all required documents using the logic of the find method, and then collects the unique values found for key. There can be performance, latency, and billing implications if there are many matching documents.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

collection.distinct("category")

Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.

collection.distinct(
    "food.allergies",
    filter={"registered_for_dinner": True},
)

Returns:

List[Any] - A list of the distinct values encountered. Documents that lack the requested key are ignored.

Example response
['home_appliance', None, 'sports_equipment', {'cat_id': 54, 'cat_name': 'gardening_gear'}]

Parameters:

Name Type Summary

key

str

The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels. Example of acceptable key values: "field", "field.subfield", "field.3", and "field.3.subfield". If lists are encountered and no numeric index is specified, all items in the list are visited.

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the operation. This method uses the collection-level timeout by default.

For details on the behavior of distinct in conjunction with real-time changes in the collection contents, see the discussion in Example values for sort operations.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_many(
    [
        {"name": "Marco", "food": ["apple", "orange"], "city": "Helsinki"},
        {"name": "Emma", "food": {"likes_fruit": True, "allergies": []}},
    ]
)

collection.distinct("name")
# prints: ['Marco', 'Emma']
collection.distinct("city")
# prints: ['Helsinki']
collection.distinct("food")
# prints: ['apple', 'orange', {'likes_fruit': True, 'allergies': []}]
collection.distinct("food.1")
# prints: ['orange']
collection.distinct("food.allergies")
# prints: []
collection.distinct("food.likes_fruit")
# prints: [True]
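
The results above follow from the key rules described in the parameters table: dotted segments descend into nested documents, numeric segments index into lists, and lists without a numeric index have all of their items visited. The following sketch models those rules locally. values_at and distinct_local are hypothetical helpers, not part of astrapy, and the client's actual traversal may differ in details.

```python
def values_at(node, parts):
    """Yield the values reached from node by following the key segments."""
    if not parts:
        if isinstance(node, list):
            yield from node  # list at the end of the path: visit every item
        else:
            yield node
        return
    head, rest = parts[0], parts[1:]
    if isinstance(node, dict):
        if head in node:
            yield from values_at(node[head], rest)
    elif isinstance(node, list):
        if head.isdigit():
            index = int(head)
            if index < len(node):
                yield from values_at(node[index], rest)
        else:
            # No numeric index given: apply the same segments to each item.
            for item in node:
                yield from values_at(item, parts)


def distinct_local(documents, key):
    """Collect unique values in first-seen order (values may be unhashable)."""
    seen = []
    for doc in documents:
        for value in values_at(doc, key.split(".")):
            if value not in seen:
                seen.append(value)
    return seen


docs = [
    {"name": "Marco", "food": ["apple", "orange"], "city": "Helsinki"},
    {"name": "Emma", "food": {"likes_fruit": True, "allergies": []}},
]
print(distinct_local(docs, "food.1"))          # ['orange']
print(distinct_local(docs, "food.allergies"))  # []
```

The sketch uses a list rather than a set to track values because nested documents are dictionaries, which are not hashable in Python.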

For more information, see the API reference.

const unique = await collection.distinct('category');

Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.

const unique = await collection.distinct(
  'food.allergies',
  { registeredForDinner: true },
);

Parameters:

Name Type Summary

key

string

The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels. Example of acceptable key values: 'field', 'field.subfield', 'field.3', and 'field.3.subfield'. If lists are encountered and no numeric index is specified, all items in the list are visited.

filter?

Filter<Schema>

A filter to select the documents to use. If not provided, all documents will be used.

Returns:

Promise<Flatten<(SomeDoc & ToDotNotation<FoundDoc<Schema>>)[Key]>[]> - A promise which resolves to the unique distinct values.

The return type is mostly accurate, but with complex keys, you might need to manually cast the return value to the expected type.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertOne({ name: 'Marco', food: ['apple', 'orange'], city: 'Helsinki' });
  await collection.insertOne({ name: 'Emma', food: { likes_fruit: true, allergies: [] } });

  // ['Marco', 'Emma']
  await collection.distinct('name')

  // ['Helsinki']
  await collection.distinct('city')

  // ['apple', 'orange', { likes_fruit: true, allergies: [] }]
  await collection.distinct('food')

  // ['orange']
  await collection.distinct('food.1')

  // []
  await collection.distinct('food.allergies')

  // [true]
  await collection.distinct('food.likes_fruit')
})();

Gets the distinct values of the specified field name.

// Synchronous
DistinctIterable<T,F> distinct(String fieldName, Filter filter, Class<F> resultClass);
DistinctIterable<T,F> distinct(String fieldName, Class<F> resultClass);

// Asynchronous
CompletableFuture<DistinctIterable<T,F>> distinctAsync(String fieldName, Filter filter, Class<F> resultClass);
CompletableFuture<DistinctIterable<T,F>> distinctAsync(String fieldName, Class<F> resultClass);

Returns:

DistinctIterable<T,F> - An iterable over the distinct values of the specified field name.

Parameters:

Name Type Summary

fieldName

String

The name of the field whose distinct values are returned.

filter

Filter

Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression.

resultClass

Class

The type of the values in the specified field.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DistinctIterable;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.FindOptions;

import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Projections.exclude;
import static com.datastax.astra.client.model.Projections.include;

public class Distinct {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Execute a distinct operation, without and with a filter
        DistinctIterable<Document, String> result = collection
                .distinct("field", String.class);
        DistinctIterable<Document, String> result2 = collection
                .distinct("field", filter, String.class);

        // Iterate over the result
        for (String fieldValue : result) {
            System.out.println(fieldValue);
        }
    }
}

This operation has no literal equivalent in HTTP. Instead, you can use Find documents using filtering options, and then use jq or another utility to extract _id or other desired values from the response.

Count documents in a collection

Get the count of documents in a collection. Count all documents or apply filtering to count a subset of documents.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Count all documents in a collection up to the specified limit:

collection.count_documents({}, upper_bound=500)

Get the count of the documents in a collection matching a filter condition up to the specified limit:

collection.count_documents({"seq":{"$gt": 15}}, upper_bound=50)

Returns:

int - The exact count of the documents counted as requested, unless it exceeds the caller-provided or API-set upper bound. In case of overflow, an exception is raised.

Example response
320

This operation is suited to use cases where the number of documents to count is moderate. Exact counting of an arbitrary number of documents is a slow, expensive operation that is not supported by the Data API. If the count total exceeds the server-side threshold, an exception is raised. If you need to count large numbers of documents, consider using estimatedDocumentCount instead.

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. If not provided, all documents are counted. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

upper_bound

int

A required ceiling on the result of the count operation. If the actual number of documents exceeds this value, an exception is raised. An exception is also raised if the actual number of documents exceeds the maximum count that the Data API can reach, regardless of upper_bound.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_many([{"seq": i} for i in range(20)])

collection.count_documents({}, upper_bound=100)
# prints: 20
collection.count_documents({"seq":{"$gt": 15}}, upper_bound=100)
# prints: 4
collection.count_documents({}, upper_bound=10)
# Raises: astrapy.exceptions.TooManyDocumentsToCountException
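The overflow behavior can be wrapped so that callers get a result instead of an exception. The following sketch is one possible approach, not part of the client: bounded_count, InMemoryCollection, and TooManyDocuments are hypothetical names, and the in-memory class only imitates the count_documents call so that the example runs without a database. With the real client, you would pass a Collection object and astrapy.exceptions.TooManyDocumentsToCountException instead.

```python
class TooManyDocuments(Exception):
    """Stand-in for astrapy.exceptions.TooManyDocumentsToCountException."""


class InMemoryCollection:
    """Hypothetical stand-in that imitates only count_documents."""

    def __init__(self, documents):
        self._documents = documents

    def count_documents(self, filter, *, upper_bound):
        # Only the empty filter is modeled in this sketch.
        total = len(self._documents)
        if total > upper_bound:
            raise TooManyDocuments(f"more than {upper_bound} documents")
        return total


def bounded_count(collection, filter, upper_bound, overflow_error):
    """Return (count, is_exact). On overflow, report the bound as a floor."""
    try:
        return collection.count_documents(filter, upper_bound=upper_bound), True
    except overflow_error:
        return upper_bound, False


collection = InMemoryCollection([{"seq": i} for i in range(20)])
print(bounded_count(collection, {}, 100, TooManyDocuments))
print(bounded_count(collection, {}, 10, TooManyDocuments))
```

When is_exact is False, the first element only says that more than upper_bound documents exist, which is often enough for a "10+ results" style display.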

For more information, see the API reference.

Count all documents in a collection up to the specified limit:

const numDocs = await collection.countDocuments({}, 500);

Get the count of the documents in a collection matching a filter condition up to the specified limit:

const numDocs = await collection.countDocuments({ seq: { $gt: 15 } }, 50);

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the documents to count. If not provided, all documents are counted. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

upperBound

number

A required ceiling on the result of the count operation. If the actual number of documents exceeds this value, an exception is raised. An exception is also raised if the actual number of documents exceeds the maximum count that the Data API can reach, regardless of upperBound.

options?

WithTimeout

The options (the timeout) for this operation.

Returns:

Promise<number> - A promise that resolves to the exact count of the documents counted as requested, unless it exceeds the caller-provided or API-set upper bound, in which case an exception is raised.

This operation is suited to use cases where the number of documents to count is moderate. Exact counting of an arbitrary number of documents is a slow, expensive operation that is not supported by the Data API. If the count total exceeds the server-side threshold, an exception is raised. If you need to count large numbers of documents, consider using estimatedDocumentCount instead.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany(Array.from({ length: 20 }, (_, i) => ({ seq: i })));

  // Resolves to 20
  await collection.countDocuments({}, 100);

  // Resolves to 4
  await collection.countDocuments({ seq: { $gt: 15 } }, 100);

  // Throws TooManyDocumentsToCountError
  await collection.countDocuments({}, 10);
})();

Count all documents or get the count of the documents in a collection matching a condition:

// Synchronous
int countDocuments(int upperBound)
throws TooManyDocumentsToCountException;

int countDocuments(Filter filter, int upperBound)
throws TooManyDocumentsToCountException;

Parameters:

Name Type Summary

filter (optional)

Filter

A filter to select documents to count. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. If not provided, all documents are counted. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

upperBound

int

A required ceiling on the result of the count operation. If the actual number of documents exceeds this value, an exception is raised. An exception is also raised if the actual number of documents exceeds the maximum count that the Data API can reach, regardless of upperBound.

Returns:

int - The exact count of the documents counted as requested, unless it exceeds the caller-provided or API-set upper bound. In case of overflow, an exception is raised.

The checked exception TooManyDocumentsToCountException is raised when the actual number of documents exceeds the upper bound set by the caller or the API. This exception indicates that there are more matching documents beyond the count threshold. Consider modifying your conditions to count fewer documents at once. If you need to count large numbers of documents, consider using estimatedDocumentCount instead.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.exception.TooManyDocumentsToCountException;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;

import static com.datastax.astra.client.model.Filters.lt;

public class CountDocuments {
    public static void main(String[] args)  {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        try {
            // Count with no filter
            collection.countDocuments(500);

            // Count with a filter
            collection.countDocuments(filter, 500);

        } catch(TooManyDocumentsToCountException tmde) {
            // Explicit error if the count is above the upper limit or above the 1000 limit
        }
    }
}

Use the Data API countDocuments command to obtain the exact count of documents in a collection:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{ "countDocuments": {} }' | jq

You can provide an optional filter condition to count only documents matching the filter:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "countDocuments": {
    "filter": {
      "year": { "$gt": 2000 }
    }
  }
}' | jq

Parameters:

Name Type Summary

countDocuments

command

A command to return an exact count of documents in a collection.

filter

object

An optional filter to select the documents to count. If not provided, all documents are counted. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

Returns:

A successful response returns count. This is the exact count of the documents counted as requested, unless it exceeds the API-set upper bound, in which case the overflow is reported in the response by the moreData flag.

Response within upper bound
{
  "status": {
    "count": 105
  }
}
Response exceeding upper bound
{
  "status": {
    "moreData": true,
    "count": 1000
  }
}

This operation is suited to use cases where the number of documents to count is moderate. Exact counting of an arbitrary number of documents is a slow, expensive operation that is not supported by the Data API. If the count total exceeds the server-side threshold, the response includes "moreData": true to indicate that there are more matching documents beyond the count threshold.

If you need to count large numbers of documents, consider using estimatedDocumentCount instead.
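A client that calls the HTTP command directly has to check the moreData flag itself. The following Python sketch reads the two response shapes shown above; read_count is a hypothetical helper name, and the response bodies are copied from the examples in this section.

```python
import json


def read_count(response):
    """Return (count, is_exact) from a parsed countDocuments response."""
    status = response["status"]
    # moreData appears only when the count hit the server-side ceiling.
    return status["count"], not status.get("moreData", False)


within = json.loads('{"status": {"count": 105}}')
overflow = json.loads('{"status": {"moreData": true, "count": 1000}}')

for body in (within, overflow):
    count, is_exact = read_count(body)
    print(f"{count} documents" if is_exact else f"at least {count} documents")
```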

Estimate document count in a collection

Get an approximate document count for an entire collection. Filtering isn’t supported. For the clients, you can set standard options, such as a timeout in milliseconds. There are no other options available.

In the estimatedDocumentCount command’s response, the document count is based on current system statistics at the time the request is received by the database server. Due to potential in-progress updates (document additions and deletions), the actual number of documents in the collection can be lower or higher in the database.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Get an approximate document count for a collection:

collection.estimated_document_count()

Returns:

int - A server-side estimate of the total number of documents in the collection.

Example response
37500

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.estimated_document_count()

For more information, see the API reference.

Get an approximate document count for a collection:

const estNumDocs = await collection.estimatedDocumentCount();

Returns:

Promise<number> - A promise that resolves to a server-side estimate of the total number of documents in the collection.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  console.log(await collection.estimatedDocumentCount());
})();

For more information, see the API reference.

Get an approximate document count for a collection:

long estimatedDocumentCount();
long estimatedDocumentCount(EstimatedCountDocumentsOptions options);

Returns:

long - A server-side estimate of the total number of documents in the collection. This estimate is built from the SSTable files.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.EstimatedCountDocumentsOptions;
import com.datastax.astra.internal.command.LoggingCommandObserver;

public class EstimateCountDocuments {

    public static void main(String[] args)  {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Estimate the count for the whole collection (filters aren't supported)
        long estimatedCount = collection.estimatedDocumentCount();

        // Estimate with options (adding a logger)
        EstimatedCountDocumentsOptions options = new EstimatedCountDocumentsOptions()
                    .registerObserver("logger", new LoggingCommandObserver(DataAPIClient.class));
        long estimatedCount2 = collection.estimatedDocumentCount(options);
    }
}

Use the estimatedDocumentCount command to get an approximate document count for a collection:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{ "estimatedDocumentCount": {} }' | jq

Returns:

A successful request returns count, which is an estimate of the total number of documents in the collection:

{ "status": { "count": 37500 } }

Find and replace a document

Find one document that matches a filter condition, replace it with a new document, and then return the document itself. This command is similar to Find and update a document.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Find a document matching a filter condition, and then replace the matching document with the given replacement:

collection.find_one_and_replace(
    {"_id": "rule1"}, # filter
    {"text": "some animals are more equal!"}, # replacement
)

Locate and replace a document, returning the document itself, and create a new one if no match is found:

collection.find_one_and_replace(
    {"_id": "rule1"},
    {"text": "some animals are more equal!"},
    upsert=True,
)

Locate and replace the document most similar to a query vector from either $vector or $vectorize. In this example, the filter object is empty, and only the sort object is used to locate the document to replace. Including the empty filter object ensures that the replacement object is read correctly.

collection.find_one_and_replace(
    {}, # empty filter
    {"name": "Zoo", "desc": "the new best match"}, # replacement
    sort={"$vector": [0.1, 0.2, 0.3]}, # sort object, to locate the document to replace
)

Returns:

Dict[str, Any] - Either the original or the replaced document. The exact fields returned depend on the projection parameter. If you request the original document, and there are no matches, then None is returned.

Example response
{'_id': 'rule1', 'text': 'all animals are equal'}

Parameters:

Name Type Summary

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

replacement

Dict[str, Any]

The new document to write into the collection. Define all fields that the replacement document must include, except for the _id.

Find and replace is intended to replace an existing document and retain the original document’s _id. An error occurs if the provided replacement has a different _id. In most cases, it is best to omit the _id field from the replacement.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

See Find a document and Example values for projection operations.

sort

Optional[Dict[str, Any]]

See Find a document and Example values for sort operations.

upsert

bool = False

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts the replacement as a new document. If false and there are no matches, then the operation silently does nothing.

return_document

str

A flag controlling what document is returned. If set to ReturnDocument.BEFORE or the string "before", then the original document is returned. If set to ReturnDocument.AFTER or the string "after", then the replacement document is returned. The default is "before".

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

import astrapy
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_one({"_id": "rule1", "text": "all animals are equal"})

collection.find_one_and_replace(
    {"_id": "rule1"},
    {"text": "some animals are more equal!"},
)
# prints: {'_id': 'rule1', 'text': 'all animals are equal'}
collection.find_one_and_replace(
    {"text": "some animals are more equal!"},
    {"text": "and the pigs are the rulers"},
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# prints: {'_id': 'rule1', 'text': 'and the pigs are the rulers'}
collection.find_one_and_replace(
    {"_id": "rule2"},
    {"text": "F=ma^2"},
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# (returns None for no matches)
collection.find_one_and_replace(
    {"_id": "rule2"},
    {"text": "F=ma"},
    upsert=True,
    return_document=astrapy.constants.ReturnDocument.AFTER,
    projection={"_id": False},
)
# prints: {'text': 'F=ma'}
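To make the interplay of upsert and return_document easier to follow, here is a simplified in-memory model of the behavior described above. It is only a sketch for reasoning about results, not how the client or the Data API is implemented: it supports equality filters only, ignores sort and projection, and assumes that an upserted document takes its _id from the filter when the filter provides one.

```python
import uuid


def find_one_and_replace(docs, filter, replacement, *, upsert=False,
                         return_document="before"):
    """Simplified model: equality filters only, no sort or projection."""
    for index, doc in enumerate(docs):
        if all(key in doc and doc[key] == value for key, value in filter.items()):
            # The replacement keeps the _id of the document it replaces.
            if replacement.get("_id", doc["_id"]) != doc["_id"]:
                raise ValueError("replacement must not change _id")
            new_doc = {**replacement, "_id": doc["_id"]}
            docs[index] = new_doc
            return doc if return_document == "before" else new_doc
    if upsert:
        # Assumption: reuse the filter's _id if present, else generate one.
        new_id = replacement.get("_id", filter.get("_id", str(uuid.uuid4())))
        new_doc = {**replacement, "_id": new_id}
        docs.append(new_doc)
        # There is no original document to return for an upsert.
        return None if return_document == "before" else new_doc
    return None


docs = [{"_id": "rule1", "text": "all animals are equal"}]

before = find_one_and_replace(
    docs, {"_id": "rule1"}, {"text": "some animals are more equal!"})
missing = find_one_and_replace(
    docs, {"_id": "rule2"}, {"text": "F=ma^2"}, return_document="after")
upserted = find_one_and_replace(
    docs, {"_id": "rule2"}, {"text": "F=ma"}, upsert=True,
    return_document="after")
print(before, missing, upserted, sep="\n")
```

The three calls mirror the first, third, and fourth calls of the example: the first returns the original document, the second returns None because nothing matches and upsert is off, and the third inserts and returns the new document.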

For more information, see the API reference.

Find a document matching a filter condition, and then replace the matching document with the given replacement:

const docBefore = await collection.findOneAndReplace(
  { _id: 123 }, // filter
  { text: 'some animals are more equal!' }, // replacement
);

Locate and replace a document, returning the document itself, and creating a new one if no match is found:

const docBefore = await collection.findOneAndReplace(
  { _id: 123 },
  { text: 'some animals are more equal!' },
  { upsert: true  },
);

Locate and replace the document most similar to a query vector from either $vector or $vectorize. In this example, the filter object is empty, and only the sort object is used to locate the document to replace. Including the empty filter object ensures that the replacement object is read correctly.

const docBefore = await collection.findOneAndReplace(
  {}, // empty filter
  { name: 'Zoe', desc: 'The new best match' }, // replacement
  { sort: { $vector: [0.1, 0.2, 0.3] } }, // sort object, to locate the document to replace
);

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to replace. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

replacement

NoId<Schema>

The new document to write into the collection. Define all fields that the replacement document must include, except for the _id.

Find and replace is intended to replace an existing document and retain the original document’s _id. An error occurs if the provided replacement has a different _id. In most cases, it is best to omit the _id field from the replacement.

options

FindOneAndReplaceOptions

The options for this operation.

Name Type Summary

returnDocument

'before' | 'after'

Specifies whether to return the original ('before') or replacement ('after') document.

upsert?

boolean

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts the replacement as a new document. If false and there are no matches, then the operation silently does nothing.

projection?

Projection

See Find a document and Example values for projection operations.

sort?

Sort

See Find a document and Example values for sort operations.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete each underlying HTTP request.

includeResultMetadata?

boolean

When true, returns ok: 1, in addition to the document, if the command executed successfully.

Returns:

Promise<WithId<Schema> | null> - The document before or after the replacement, depending on the value of returnDocument, or null if no matches are found.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert a document
  await collection.insertOne({ _id: 'rule1', text: 'all animals are equal' });

  // { _id: 'rule1', text: 'all animals are equal' }
  await collection.findOneAndReplace(
    { _id: 'rule1' },
    { text: 'some animals are more equal!' },
    { returnDocument: 'before' }
  );

  // { _id: 'rule1', text: 'and the pigs are the rulers' }
  await collection.findOneAndReplace(
    { text: 'some animals are more equal!' },
    { text: 'and the pigs are the rulers' },
    { returnDocument: 'after' }
  );

  // null
  await collection.findOneAndReplace(
    { _id: 'rule2' },
    { text: 'F=ma^2' },
    { returnDocument: 'after' }
  );

  // { text: 'F=ma' }
  await collection.findOneAndReplace(
    { _id: 'rule2' },
    { text: 'F=ma' },
    { upsert: true, returnDocument: 'after', projection: { _id: false } }
  );
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson. For more information, see the API reference.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
Optional<T> findOneAndReplace(Filter filter, T replacement);
Optional<T> findOneAndReplace(Filter filter, T replacement, FindOneAndReplaceOptions options);

// Asynchronous
CompletableFuture<Optional<T>> findOneAndReplaceAsync(Filter filter, T replacement);
CompletableFuture<Optional<T>> findOneAndReplaceAsync(Filter filter, T replacement, FindOneAndReplaceOptions options);

Returns:

Optional<T> - The document that matches the filter, if any. Depending on whether returnDocument is set to before or after, this is the document as it was before or after the replacement.

Parameters:

Name Type Summary

filter (optional)

Filter

Filter criteria to find the document to replace. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators. For examples and options, including projection and sort, see Find documents using filtering options.

replacement

T

The new document to write into the collection. Define all fields that the replacement document must include, except for the _id.

Find and replace is intended to replace an existing document and retain the original document’s _id. An error occurs if the provided replacement has a different _id. In most cases, it is best to omit the _id field from the replacement.

options (optional)

FindOneAndReplaceOptions

Options for the find and replace operation, such as projection, sort, upsert, and whether to return the document before or after the replacement.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindOneAndReplaceOptions;
import com.datastax.astra.client.model.Projections;
import com.datastax.astra.client.model.Sorts;

import java.util.Optional;

import static com.datastax.astra.client.model.Filters.lt;

public class FindOneAndReplace {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        FindOneAndReplaceOptions options = new FindOneAndReplaceOptions()
                .projection(Projections.include("field1"))
                .sort(Sorts.ascending("field1"))
                .upsert(true)
                .returnDocumentAfter();

        Document docForReplacement = new Document()
                .append("field1", "value1")
                .append("field2", 20)
                .append("field3", 30)
                .append("field4", "value4");

        // Returns the document as it is after the replacement, because returnDocumentAfter() is set
        Optional<Document> docAfterReplace = collection
                .findOneAndReplace(filter, docForReplacement, options);
    }
}

Example with sort and projection:

FindOneAndReplaceOptions options = FindOneAndReplaceOptions.Builder
        .projection(Projections.include("field1"))
        .sort(Sorts.ascending("field1"))
        .upsert(true)
        .returnDocumentAfter();

findOneAndReplace doesn’t support dot notation in the replacement object.

Find a document matching a filter condition, and then replace that document with a new one:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findOneAndReplace": {
    "filter": { "_id": "14" },
    "replacement": { "customer": { "name": "Ann Jones" }, "account": { "status": "inactive" } }
  }
}' | jq

Locate and replace a document or insert a new one if no match is found:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findOneAndReplace": {
    "filter": { "_id": "16" },
    "replacement": { "customer": { "name": "Ann Jones" }, "account": { "status": "inactive" } },
    "options": { "upsert": true }
  }
}' | jq

Locate and replace the document most similar to a query vector from either $vector or $vectorize:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findOneAndReplace": {
    "sort": { "$vector": [0.1, 0.2, 0.3] },
    "replacement": { "customer": { "name": "Ann Jones" }, "account": { "status": "inactive" } },
    "projection": { "$vector": 1 },
    "options": { "returnDocument": "after" }
  }
}' | jq

Parameters:

Name Type Summary

findOneAndReplace

command

The Data API command to find and replace one document in a collection based on filter, sort, replacement, projection, and options.

sort, filter

object

Search criteria to find the document to replace. For a list of available operators, see Data API operators. For examples and parameters, see Find a document and Example values for sort operations.

replacement

object

The new document to write into the collection. Define all fields that the replacement document must include, except for the _id.

Find and replace is intended to replace an existing document and retain the original document’s _id. An error occurs if the provided replacement has a different _id. In most cases, it is best to omit the _id field from the replacement.

projection

object

Select a subset of fields to include in the response for the returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Example values for projection operations.

options.upsert

boolean

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts the replacement as a new document. If false and there are no matches, then the operation silently does nothing.

options.returnDocument

string

A flag controlling what document is returned. If set to "before", then the original document is returned. If set to "after", then the replacement document is returned. The default is "before".

Returns:

A successful response returns an object representing the original or replacement document, based on the returnDocument and projection options.
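When the request body is generated by a script rather than typed by hand, a JSON serializer helps keep the quoting and nesting valid. The following Python sketch assembles a findOneAndReplace body with the same layout as the curl examples in this section; build_find_one_and_replace is a hypothetical helper name, and sending the request is left out.

```python
import json


def build_find_one_and_replace(replacement, *, filter=None, sort=None,
                               projection=None, upsert=None,
                               return_document=None):
    """Assemble a findOneAndReplace command body, skipping unset parts."""
    command = {"replacement": replacement}
    if filter is not None:
        command["filter"] = filter
    if sort is not None:
        command["sort"] = sort
    if projection is not None:
        command["projection"] = projection
    # upsert and returnDocument are nested under "options".
    options = {}
    if upsert is not None:
        options["upsert"] = upsert
    if return_document is not None:
        options["returnDocument"] = return_document
    if options:
        command["options"] = options
    return {"findOneAndReplace": command}


payload = build_find_one_and_replace(
    {"customer": {"name": "Ann Jones"}, "account": {"status": "inactive"}},
    filter={"_id": "16"},
    upsert=True,
)
print(json.dumps(payload, indent=2))
```

The printed string can be passed to curl with --data, or sent with any HTTP client together with the Token and Content-Type headers shown in the examples.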

Replace a document

Find one document that matches a filter condition, and then replace it with a new document.

replaceOne is similar to findOneAndReplace, except that the response includes only the result of the operation. The response doesn’t include a document object, and the request doesn’t support response-related parameters, such as projection or returnDocument.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Find a document matching a filter condition, and then replace the matching document with the given replacement:

replace_result = collection.replace_one(
    {"Marco": {"$exists": True}}, # filter
    {"Buda": "Pest"}, # replacement
)

Locate and replace a document or create a new one if no match is found:

replace_result = collection.replace_one(
    {"Marco": {"$exists": True}},
    {"Buda": "Pest"},
    upsert=True,
)

Locate and replace the document most similar to a query vector from either $vector or $vectorize. In this example, the filter object is empty, and only the sort object is used to locate the document to replace. Including the empty filter object ensures that the replacement object is read correctly.

collection.replace_one(
    {}, # empty filter
    {"name": "Zoo", "desc": "the new best match"}, # replacement
    sort={"$vector": [0.1, 0.2, 0.3]}, # sort object, to locate the document to replace
)

Returns:

UpdateResult - An object representing the response from the database after the replace operation. It includes information about the operation.

Example response
UpdateResult(update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1}, raw_results=...)

Parameters:

Name Type Summary

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

replacement

Dict[str, Any]

The new document to write into the collection. Define all fields that the replacement document must include, except for the _id.

Find and replace is intended to replace an existing document and retain the original document’s _id. An error occurs if the provided replacement has a different _id. In most cases, it is best to omit the _id field from the replacement.

sort

Optional[Dict[str, Any]]

See Find a document and Example values for sort operations.

upsert

bool = False

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts the replacement as a new document. If false and there are no matches, then the operation silently does nothing.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_one({"Marco": "Polo"})
collection.replace_one({"Marco": {"$exists": True}}, {"Buda": "Pest"})
# prints: UpdateResult(update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1}, raw_results=...)
collection.find_one({"Buda": "Pest"})
# prints: {'_id': '8424905a-...', 'Buda': 'Pest'}
collection.replace_one({"Mirco": {"$exists": True}}, {"Oh": "yeah?"})
# prints: UpdateResult(update_info={'n': 0, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0}, raw_results=...)
collection.replace_one({"Mirco": {"$exists": True}}, {"Oh": "yeah?"}, upsert=True)
# prints: UpdateResult(update_info={'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '931b47d6-...'}, raw_results=...)
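The three outcomes in the example above (match and replace, no match, no match with upsert) can be modeled locally. The following is a simplified in-memory sketch, not the library's implementation: `replace_one_model` is a hypothetical helper, the collection is a plain list, the filter is a Python callable, and the returned dictionary only imitates the `update_info` fields shown in the example.

```python
import uuid
from typing import Any, Callable, Dict, List


def replace_one_model(
    docs: List[Dict[str, Any]],
    predicate: Callable[[Dict[str, Any]], bool],
    replacement: Dict[str, Any],
    upsert: bool = False,
) -> Dict[str, Any]:
    """Replace the first document accepted by `predicate`, keeping its _id."""
    for i, doc in enumerate(docs):
        if predicate(doc):
            if "_id" in replacement and replacement["_id"] != doc["_id"]:
                # Mirrors the rule that the replacement cannot change the _id.
                raise ValueError("replacement has a different _id")
            docs[i] = {**replacement, "_id": doc["_id"]}
            return {"n": 1, "updatedExisting": True, "ok": 1.0, "nModified": 1}
    if upsert:
        # No match: insert the replacement as a new document.
        new_id = replacement.get("_id", str(uuid.uuid4()))
        docs.append({**replacement, "_id": new_id})
        return {"n": 1, "updatedExisting": False, "ok": 1.0,
                "nModified": 0, "upserted": new_id}
    # No match and no upsert: nothing happens.
    return {"n": 0, "updatedExisting": False, "ok": 1.0, "nModified": 0}


docs = [{"_id": "a1", "Marco": "Polo"}]
print(replace_one_model(docs, lambda d: "Marco" in d, {"Buda": "Pest"}))
print(replace_one_model(docs, lambda d: "Mirco" in d, {"Oh": "yeah?"}))
print(replace_one_model(docs, lambda d: "Mirco" in d, {"Oh": "yeah?"}, upsert=True))
```

Note that the replaced document keeps its original `_id` while every other field comes from the replacement.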

For more information, see the API reference.

Find a document matching a filter condition, and then replace the matching document with the given replacement:

const result = await collection.replaceOne(
  { 'Marco': 'Polo' }, // filter
  { 'Buda': 'Pest' }, // replacement
);

Locate and replace a document or create a new one if no match is found:

const result = await collection.replaceOne(
  { 'Marco': 'Polo' },
  { 'Buda': 'Pest' },
  { upsert: true },
);

Locate and replace the document most similar to a query vector from either $vector or $vectorize. In this example, the filter object is empty, and only the sort object is used to locate the document to replace. Including the empty filter object ensures that the replacement object is read correctly.

const result = await collection.replaceOne(
  {}, // empty filter
  { name: "Zoe", desc: "The new best match" }, // replacement
  { sort: { $vector: [0.1, 0.2, 0.3] } }, // sort object, to locate the document to replace
);

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to replace. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

replacement

NoId<Schema>

The new document to write into the collection. Define all fields that the replacement document must include, except for the _id.

Find and replace is intended to replace an existing document and retain the original document’s _id. An error occurs if the provided replacement has a different _id. In most cases, it is best to omit the _id field from the replacement.

options?

ReplaceOneOptions

The options for this operation.

Options (ReplaceOneOptions):

Name Type Summary

upsert?

boolean

This parameter controls the behavior if there are no matches. If true and there are no matches, then the operation inserts the replacement as a new document. If false and there are no matches, then the operation silently does nothing.

sort?

Sort

See Find a document and Example values for sort operations.

maxTimeMS?

number

The maximum time, in milliseconds, that the client waits for each underlying HTTP request to complete.

Returns:

Promise<ReplaceOneResult<Schema>> - The result of the replacement operation.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert a document
  await collection.insertOne({ 'Marco': 'Polo' });

  // { modifiedCount: 1, matchedCount: 1, upsertedCount: 0 }
  await collection.replaceOne(
    { 'Marco': { '$exists': true } },
    { 'Buda': 'Pest' }
  );

  // { _id: '3756ce75-aaf1-430d-96ce-75aaf1730dd3', Buda: 'Pest' }
  await collection.findOne({ 'Buda': 'Pest' });

  // { modifiedCount: 0, matchedCount: 0, upsertedCount: 0 }
  await collection.replaceOne(
    { 'Mirco': { '$exists': true } },
    { 'Oh': 'yeah?' }
  );

  // { modifiedCount: 0, matchedCount: 0, upsertedId: '...', upsertedCount: 1 }
  await collection.replaceOne(
    { 'Mirco': { '$exists': true } },
    { 'Oh': 'yeah?' },
    { upsert: true }
  );
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson. For more information, see the API reference.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
UpdateResult replaceOne(Filter filter, T replacement);
UpdateResult replaceOne(Filter filter, T replacement, ReplaceOneOptions options);

// Asynchronous
CompletableFuture<UpdateResult> replaceOneAsync(Filter filter, T replacement);
CompletableFuture<UpdateResult> replaceOneAsync(Filter filter, T replacement, ReplaceOneOptions options);

Returns:

UpdateResult - A wrapper object with the result of the operation. The object contains the number of documents matched (matchedCount) and updated (modifiedCount).

Parameters:

Name Type Summary

filter (optional)

Filter

Filter criteria to find the document to replace. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators. For examples and options, including projection and sort, see Find documents using filtering options.

replacement

T

The new document to write into the collection. Define all fields that the replacement document must include, except for the _id.

Find and replace is intended to replace an existing document and retain the original document’s _id. An error occurs if the provided replacement has a different _id. In most cases, it is best to omit the _id field from the replacement.

options (optional)

ReplaceOneOptions

Set the different options for the replaceOne() operation, including the following:

  • sort(): See Find a document and Example values for sort operations.

  • upsert(): Controls the behavior if there are no matches. If true and there are no matches, then the operation inserts the replacement as a new document. If false and there are no matches, then the operation silently does nothing.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindOneAndReplaceOptions;
import com.datastax.astra.client.model.Projections;
import com.datastax.astra.client.model.Sorts;

import java.util.Optional;

import static com.datastax.astra.client.model.Filters.lt;

public class FindOneAndReplace {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        FindOneAndReplaceOptions options = new FindOneAndReplaceOptions()
                .projection(Projections.include("field1"))
                .sort(Sorts.ascending("field1"))
                .upsert(true)
                .returnDocumentAfter();

        Document docForReplacement = new Document()
                .append("field1", "value1")
                .append("field2", 20)
                .append("field3", 30)
                .append("field4", "value4");

        // With returnDocumentAfter(), this returns the document as it is after the replacement
        Optional<Document> docAfterReplace = collection
                .findOneAndReplace(filter, docForReplacement, options);
    }
}

This operation has no literal equivalent in HTTP. Instead, you can use Find and replace a document with "projection": {"*": false}, which excludes all document fields from the response.

Find and delete a document

Find one document that matches a filter condition, delete it, and then return the deleted document.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Find a document matching a filter condition, and then delete it:

deleted_document = collection.find_one_and_delete({"status": "stale_entry"})

Locate and delete the document most similar to a query vector from either $vector or $vectorize:

deleted_document = collection.find_one_and_delete(
    {},
    sort={"$vector": [0.1, 0.2, 0.3]},
)

Returns:

Dict[str, Any] - The deleted document or, if no matches are found, None. The exact fields returned depend on the projection parameter.

Example response
{'_id': 199, 'status': 'stale_entry', 'request_id': 'A4431'}

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

See Find a document and Example values for projection operations.

sort

Optional[Dict[str, Any]]

See Find a document and Example values for sort operations.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_many(
    [
        {"species": "swan", "class": "Aves"},
        {"species": "frog", "class": "Amphibia"},
    ],
)
collection.find_one_and_delete(
    {"species": {"$ne": "frog"}},
    projection={"species": True},
)
# prints: {'_id': '5997fb48-...', 'species': 'swan'}
collection.find_one_and_delete({"species": {"$ne": "frog"}})
# (returns None for no matches)

For more information, see the API reference.

Find a document matching a filter condition, and then delete it:

const deletedDoc = await collection.findOneAndDelete({ status: 'stale_entry' });

Locate and delete the document most similar to a query vector from either $vector or $vectorize:

const deletedDoc = await collection.findOneAndDelete(
  {},
  { sort: { $vector: [0.1, 0.2, 0.3] } },
);

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to delete. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

options?

FindOneAndDeleteOptions

The options for this operation.

Name Type Summary

projection?

Projection

See Find a document and Example values for projection operations.

sort?

Sort

See Find a document and Example values for sort operations.

maxTimeMS?

number

The maximum time, in milliseconds, that the client waits for each underlying HTTP request to complete.

includeResultMetadata?

boolean

When true, returns ok: 1, in addition to the document, if the command executed successfully.

Returns:

Promise<WithId<Schema> | null> - The deleted document, or, if no matches are found, null. The exact fields returned depend on the projection parameter.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([
    { species: 'swan', class: 'Aves' },
    { species: 'frog', class: 'Amphibia' },
  ]);

  // { _id: '...', species: 'swan' }
  await collection.findOneAndDelete(
    { species: { $ne: 'frog' } },
    { projection: { species: 1 } },
  );

  // null
  await collection.findOneAndDelete(
    { species: { $ne: 'frog' } },
  );
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson. For more information, see the API reference.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
Optional<T> findOneAndDelete(Filter filter);
Optional<T> findOneAndDelete(Filter filter, FindOneAndDeleteOptions options);

// Asynchronous
CompletableFuture<Optional<T>> findOneAndDeleteAsync(Filter filter);
CompletableFuture<Optional<T>> findOneAndDeleteAsync(Filter filter, FindOneAndDeleteOptions options);

Returns:

Optional<T> - The deleted document, or an empty Optional if no matches are found.

Parameters:

Name Type Summary

filter (optional)

Filter

Filter criteria to find the document to delete. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators. For examples and options, including projection and sort, see Find documents using filtering options.

options (optional)

FindOneAndDeleteOptions

Set the different options for the find and delete operation, such as sort and projection.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;

import java.util.Optional;

import static com.datastax.astra.client.model.Filters.lt;

public class FindOneAndDelete {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Returns the deleted document, or an empty Optional if nothing matched
        Optional<Document> deletedDoc = collection.findOneAndDelete(filter);
    }
}

Find a document matching a filter condition, and then delete it:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "findOneAndDelete": {
    "filter": {
      "customer.name": "Fred Smith",
      "_id": "13"
    }
  }
}' | jq

Locate and delete the document most similar to a query vector from either $vector or $vectorize:

"findOneAndDelete": {
  "sort": { "$vector": [0.1, 0.2, 0.3] },
  "projection": { "$vector": 1 }
}

Parameters:

Name Type Summary

findOneAndDelete

command

The Data API command to find and delete the first document in a collection that matches the given filter/sort criteria. If there is no match, then no action is taken.

sort, filter

object

Search criteria to find the document to delete. For a list of available operators, see Data API operators. For examples and parameters, see Find a document and Example values for sort operations.

projection

object

Select a subset of fields to include in the response for the returned document. If empty or unset, the default projection is used. The default projection doesn’t always include all document fields. For more information and examples, see Example values for projection operations.

Response:

A successful response includes data and status objects.

  • The data object can contain the deleted document, based on the projection parameter, if a matching document was found and deleted.

  • The status object contains the number of deleted documents. For findOneAndDelete, this is either 1 (one document deleted) or 0 (no matches).

{
  "status": {
    "deletedCount": 1
  }
}
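As a rough illustration of reading the two objects described above, the following sketch extracts the deleted document and the deleted count from a parsed JSON response. The `parse_find_one_and_delete` helper is hypothetical, and the assumption that the document sits under a `document` key inside `data` is not stated in this section, so verify it against an actual response.

```python
from typing import Any, Dict, Optional, Tuple


def parse_find_one_and_delete(
    response: Dict[str, Any],
) -> Tuple[Optional[Dict[str, Any]], int]:
    """Return (deleted_document_or_None, deleted_count) from a parsed response."""
    data = response.get("data") or {}
    status = response.get("status") or {}
    # Assumption: the deleted document is nested as data.document.
    document = data.get("document")
    deleted_count = status.get("deletedCount", 0)
    return document, deleted_count


# Hand-written sample responses, used here only to exercise the helper.
found = {
    "data": {"document": {"_id": "13", "customer": {"name": "Fred Smith"}}},
    "status": {"deletedCount": 1},
}
not_found = {"data": {"document": None}, "status": {"deletedCount": 0}}

print(parse_find_one_and_delete(found))
print(parse_find_one_and_delete(not_found))
```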

Delete a document

Locate and delete one document in a collection.

deleteOne is similar to findOneAndDelete, except that the response includes only the result of the operation. The response doesn’t include a document object, and the request doesn’t support response-related parameters, such as projection or returnDocument.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Find a document matching a filter condition, and then delete it:

# Find by ID
result = collection.delete_one({"_id": "1"})

# Find by a document property
result = collection.delete_one({"location": "warehouse_C"})

# Find with a filter operator
result = collection.delete_one({"tag": {"$exists": True}})

Locate and delete the document most similar to a query vector from either $vector or $vectorize:

# Find by vector search with $vector
result = collection.delete_one({}, sort={"$vector": [.12, .52, .32]})

# Find by vector search with $vectorize
result = collection.delete_one({}, sort={"$vectorize": "Text to vectorize"})

Returns:

DeleteResult - An object representing the response from the database after the delete operation. It includes information about the success of the operation.

Example response
DeleteResult(deleted_count=1, raw_results=...)

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

sort

Optional[Dict[str, Any]]

See Find a document and Example values for sort operations.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_many([{"seq": 1}, {"seq": 0}, {"seq": 2}])

collection.delete_one({"seq": 1})
# prints: DeleteResult(deleted_count=1, raw_results=...)
collection.distinct("seq")
# prints: [0, 2]
collection.delete_one(
    {"seq": {"$exists": True}},
    sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
# prints: DeleteResult(deleted_count=1, raw_results=...)
collection.distinct("seq")
# prints: [0]
collection.delete_one({"seq": 2})
# prints: DeleteResult(deleted_count=0, raw_results=...)

For more information, see the API reference.

Find a document matching a filter condition, and then delete it:

// Find by ID
const result = await collection.deleteOne({ _id: '1' });

// Find by a document property
const result = await collection.deleteOne({ location: 'warehouse_C' });

// Find with a filter operator
const result = await collection.deleteOne({ tag: { $exists: true } });

Locate and delete the document most similar to a query vector from either $vector or $vectorize:

// Find by vector search with $vector
const result = await collection.deleteOne({}, { sort: { $vector: [.12, .52, .32] } });

// Find by vector search with $vectorize
const result = await collection.deleteOne({}, { sort: { $vectorize: 'Text to vectorize' } });

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to delete. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

options?

DeleteOneOptions

The options for this operation.

Options (DeleteOneOptions):

Name Type Summary

sort?

Sort

See Find a document and Example values for sort operations.

maxTimeMS?

number

The maximum time, in milliseconds, that the client waits for each underlying HTTP request to complete.

Returns:

Promise<DeleteOneResult> - The result of the deletion operation.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([{ seq: 1 }, { seq: 0 }, { seq: 2 }]);

  // { deletedCount: 1 }
  await collection.deleteOne({ seq: 1 });

  // [0, 2]
  await collection.distinct('seq');

  // { deletedCount: 1 }
  await collection.deleteOne({ seq: { $exists: true } }, { sort: { seq: -1 } });

  // [0]
  await collection.distinct('seq');

  // { deletedCount: 0 }
  await collection.deleteOne({ seq: 2 });
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson. For more information, see the API reference.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
DeleteResult deleteOne(Filter filter);
DeleteResult deleteOne(Filter filter, DeleteOneOptions options);

// Asynchronous
CompletableFuture<DeleteResult> deleteOneAsync(Filter filter);
CompletableFuture<DeleteResult> deleteOneAsync(Filter filter, DeleteOneOptions options);

Returns:

DeleteResult - Wrapper that contains the deleted count.

Parameters:

Name Type Summary

filter (optional)

Filter

Filter criteria to find the document to delete. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators. For examples and options, including projection and sort, see Find documents using filtering options.

options (optional)

DeleteOneOptions

Set the different options for the deleteOne() operation, including sort.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DeleteOneOptions;
import com.datastax.astra.client.model.DeleteResult;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.Sorts;

import static com.datastax.astra.client.model.Filters.lt;

public class DeleteOne {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Sample Filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Delete one options
        DeleteOneOptions options = new DeleteOneOptions()
                .sort(Sorts.ascending("field2"));
        DeleteResult result = collection.deleteOne(filter, options);
        System.out.println("Deleted Count:" + result.getDeletedCount());
    }
}

Find a document matching a filter condition, and then delete it:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "deleteOne": {
    "filter": {
      "tags": "first"
    }
  }
}' | jq

Locate and delete the document most similar to a query vector from either $vector or $vectorize:

"deleteOne": {
  "sort": { "$vector": [0.1, 0.2, 0.3] }
}
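For readers who script these calls rather than use curl, the two deleteOne requests above could be assembled with the Python standard library roughly as follows. This is a sketch under assumptions: `build_data_api_request` is a hypothetical helper, and the endpoint, namespace, collection, and token values are placeholders. The request is only constructed, not sent.

```python
import json
import urllib.request
from typing import Any, Dict


def build_data_api_request(
    endpoint: str, namespace: str, collection: str, token: str,
    command: Dict[str, Any],
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one Data API command."""
    url = f"{endpoint}/api/json/v1/{namespace}/{collection}"
    return urllib.request.Request(
        url,
        data=json.dumps(command).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder connection values; replace them with your own.
args = ("https://example.invalid", "my_namespace", "my_collection", "TOKEN")

# Delete the first document that matches a filter.
by_filter = build_data_api_request(*args, {"deleteOne": {"filter": {"tags": "first"}}})

# Delete the document most similar to a query vector.
by_vector = build_data_api_request(
    *args, {"deleteOne": {"sort": {"$vector": [0.1, 0.2, 0.3]}}}
)

print(by_filter.full_url)
print(by_vector.data.decode("utf-8"))
# To send a request: urllib.request.urlopen(by_filter)
```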

Parameters:

Name Type Summary

deleteOne

command

The Data API command to find and delete the first document in a collection that matches the given filter/sort criteria. If there is no match, then no action is taken.

sort, filter

object

Search criteria to find the document to delete. For a list of available operators, see Data API operators. For examples and parameters, see Find a document and Example values for sort operations.

Response:

A successful response returns the number of deleted documents. For deleteOne, this is either 1 (one document deleted) or 0 (no matches).

{
  "status": {
    "deletedCount": 1
  }
}

Delete documents

Delete all documents in a collection that match a given filter condition. If you supply an empty filter, then the operation deletes every document in the collection.

This operation doesn’t support sort conditions.

Sort and filter operations can use only indexed fields.

If you apply selective indexing when you create a collection, you can’t reference non-indexed fields in sort or filter queries.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

Find documents in a collection that match a given filter, and then delete them:

delete_result = collection.delete_many({"status": "processed"})

An empty filter deletes all documents and completely empties the collection:

delete_result = collection.delete_many({})

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. For example: {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

An empty filter deletes all documents, completely emptying the collection.

max_time_ms

Optional[int]

The timeout, in milliseconds, for the entire delete operation. This method uses the collection-level timeout by default.

Returns:

DeleteResult - An object representing the response from the database after the delete operation. It includes information about the success of the operation. A response of deleted_count=-1 indicates that every document in the collection was deleted.

Example response
DeleteResult(deleted_count=2, raw_results=...)

The time required for the delete operation depends on the number of documents that match the filter.

To delete a large number of documents, this operation issues multiple sequential HTTP requests until all matching documents are deleted. You might need to increase the timeout parameter to allow enough time for all underlying HTTP requests.
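A minimal sketch of raising the timeout for a large delete and interpreting the result might look like the following. The `purge` helper and the 60-second default are illustrative choices, not recommendations. The demo at the end uses a stand-in object instead of a real collection so that it runs offline.

```python
from types import SimpleNamespace
from typing import Any, Dict


def purge(collection: Any, flt: Dict[str, Any], timeout_ms: int = 60_000) -> str:
    """Delete all documents matching `flt`, allowing `timeout_ms` overall."""
    result = collection.delete_many(flt, max_time_ms=timeout_ms)
    if result.deleted_count == -1:
        # -1 means every document in the collection was deleted.
        return "collection emptied"
    return f"deleted {result.deleted_count} document(s)"


# Stand-in for a real Collection, for demonstration only.
fake_collection = SimpleNamespace(
    delete_many=lambda flt, max_time_ms=None: SimpleNamespace(
        deleted_count=-1 if flt == {} else 2
    )
)

print(purge(fake_collection, {"status": "processed"}, timeout_ms=120_000))
print(purge(fake_collection, {}))
```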

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_many([{"seq": 1}, {"seq": 0}, {"seq": 2}])

collection.delete_many({"seq": {"$lte": 1}})
# prints: DeleteResult(raw_results=..., deleted_count=2)
collection.distinct("seq")
# prints: [2]
collection.delete_many({"seq": {"$lte": 1}})
# prints: DeleteResult(raw_results=..., deleted_count=0)

# An empty filter deletes all documents and completely empties the collection:
collection.delete_many({})
# prints: DeleteResult(raw_results=..., deleted_count=-1)

For more information, see the API reference.

Find documents in a collection that match a given filter, and then delete them:

const result = await collection.deleteMany({ status: 'processed' });

An empty filter deletes all documents and completely empties the collection:

const result = await collection.deleteMany({});

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the documents to delete. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

An empty filter deletes all documents, completely emptying the collection.

options?

WithTimeout

The timeout, in milliseconds, for the entire delete operation.

Returns:

Promise<DeleteManyResult> - The result of the deletion operation. A deleted count of -1 indicates that every document in the collection was deleted.

The time required for the delete operation depends on the number of documents that match the filter.

To delete a large number of documents, this operation issues multiple sequential HTTP requests until all matching documents are deleted. You might need to increase the timeout parameter to allow enough time for all underlying HTTP requests.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([{ seq: 1 }, { seq: 0 }, { seq: 2 }]);

  // { deletedCount: 2 }
  await collection.deleteMany({ seq: { $lte: 1 } });

  // [2]
  await collection.distinct('seq');

  // { deletedCount: 0 }
  await collection.deleteMany({ seq: { $lte: 1 } });

  // { deletedCount: -1 }
  await collection.deleteMany({});
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson. For more information, see the API reference.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
DeleteResult deleteMany(Filter filter);

// Asynchronous
CompletableFuture<DeleteResult> deleteManyAsync(Filter filter);

An empty filter deletes all documents and completely empties the collection:

DeleteResult deleteMany();

Parameters:

Name Type Summary

filter (optional)

Filter

Filter criteria to find the documents to delete. The filter is a JSON object that can contain any valid Data API filter expression. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options.

An empty filter deletes all documents, completely emptying the collection.

Returns:

DeleteResult - Wrapper that contains the deleted count.

The time required for the delete operation depends on the number of documents that match the filter. To delete a large number of documents, this operation iterates over batches of documents until all matching documents are deleted.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DeleteResult;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;

import static com.datastax.astra.client.model.Filters.lt;

public class DeleteMany {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Sample Filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));
        DeleteResult result = collection.deleteMany(filter);
        System.out.println("Deleted Count:" + result.getDeletedCount());

    }
}

Find documents in a collection that match a filter condition, and then delete them:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "deleteMany": {
    "filter": {
      "status": "inactive"
    }
  }
}' | jq

An empty filter or deleteMany object deletes all documents and completely empties the collection:

# Empty filter object
"deleteMany": { "filter": {} }

# Empty deleteMany object
"deleteMany": {}

Parameters:

Name | Type | Summary
deleteMany | command | The Data API command to delete all matching documents from a collection based on the provided filter criteria.
filter | object | A filter to select the documents to delete. For a list of available operators, see Data API operators. For additional examples, see Find documents using filtering options. An empty filter deletes all documents, completely emptying the collection.

Response:

A successful response returns the result of the delete operation. This operation deletes up to 20 documents at a time. If the deletedCount is 20, there might be more matching documents to delete.

{
  "status": {
    "deletedCount": 20
  }
}

To delete another batch of documents, reissue the same request. Continue issuing the deleteMany request until the deletedCount is less than 20.

Example of batch deletion

For this example, assume that you send the following deleteMany command and the server finds 30 matching documents:

{
    "deleteMany": {
        "filter": { "a": true }
    }
}

The server deletes the first 20 documents and then returns the following response:

{
  "status": {
    "moreData": true,
    "deletedCount": 20
  }
}

The server doesn’t tell you explicitly how many matches were found. However, the moreData flag and the deletedCount of 20 indicate that there could be more matching documents to delete. To delete the next batch of documents, reissue the same deleteMany command:

{
    "deleteMany": {
        "filter": { "a": true }
    }
}

This time, the server returns the following:

{
  "status": {
    "deletedCount": 10
  }
}

A deletedCount of less than 20 indicates that all matching documents were deleted.

To confirm, you can reissue the deleteMany request and get a deletedCount of 0:

{
  "status": {
    "deletedCount": 0
  }
}

A deletedCount of -1 indicates that every document in the collection was deleted. This occurs if you pass an empty filter or an empty deleteMany object. In this case, you don’t need to delete documents in batches because the server automatically iterates over batches of documents until all documents are deleted.

{
  "status": {
    "deletedCount": -1
  }
}
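The batch procedure described above can be sketched as a small loop. This is a minimal illustration, not an official client: send_command is a hypothetical callable that would POST a JSON payload to the collection endpoint and return the parsed response, and make_fake_server is a stand-in that only simulates the responses shown in this section so that the loop can run without a database.

```python
from typing import Any, Callable, Dict, List, Tuple

# Per the text above, deleteMany removes up to 20 documents per request.
BATCH_LIMIT = 20


def delete_all_matching(
    send_command: Callable[[Dict[str, Any]], Dict[str, Any]],
    filter_doc: Dict[str, Any],
) -> int:
    """Reissue deleteMany until a response reports fewer than BATCH_LIMIT deletions.

    `send_command` is a hypothetical transport function (an assumption of this
    sketch). Returns the total number of deleted documents, or -1 if the
    server reports that it emptied the whole collection.
    """
    total = 0
    while True:
        response = send_command({"deleteMany": {"filter": filter_doc}})
        deleted = response["status"]["deletedCount"]
        if deleted == -1:
            # Empty filter: the server already deleted everything.
            return -1
        total += deleted
        if deleted < BATCH_LIMIT:
            # Fewer than a full batch: nothing is left to delete.
            return total


def make_fake_server(
    matching: int,
) -> Tuple[Callable[[Dict[str, Any]], Dict[str, Any]], List[Dict[str, Any]]]:
    """Simulate a server that holds `matching` documents for the filter."""
    remaining = [matching]
    calls: List[Dict[str, Any]] = []

    def send(payload: Dict[str, Any]) -> Dict[str, Any]:
        calls.append(payload)
        batch = min(BATCH_LIMIT, remaining[0])
        remaining[0] -= batch
        status: Dict[str, Any] = {"deletedCount": batch}
        if remaining[0] > 0:
            status["moreData"] = True
        return {"status": status}

    return send, calls


send, calls = make_fake_server(30)
total = delete_all_matching(send, {"a": True})
print(total, len(calls))  # 30 2
```

With 30 simulated matches, the loop issues two requests, mirroring the example above. When the number of matches is an exact multiple of 20, the loop issues one extra request that returns a deletedCount of 0.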

Execute multiple write operations

Execute a (reusable) list of write operations on a collection with a single command.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the API reference.

bw_results = collection.bulk_write(
    [
        InsertMany([{"a": 1}, {"a": 2}]),
        ReplaceOne(
            {"z": 9},
            replacement={"z": 9, "replaced": True},
            upsert=True,
        ),
    ],
)

Returns:

BulkWriteResult - A single object summarizing the whole list of requested operations. The keys in the map attributes of the result (when present) are the integer indices of the corresponding operation in the requests iterable.

Example response
BulkWriteResult(deleted_count=0, inserted_count=3, matched_count=0, modified_count=0, upserted_count=1, upserted_ids={1: '2addd676-...'}, bulk_api_results=...)

Parameters:

Name | Type | Summary
requests | Iterable[BaseOperation] | An iterable over concrete subclasses of BaseOperation, such as InsertMany or ReplaceOne. Each such object represents an operation ready to be executed on a collection, and is instantiated by passing the same parameters that you would pass to the corresponding collection method.
ordered | bool | Whether to launch the requests one after the other or in arbitrary order, possibly in a concurrent fashion. DataStax suggests False (default) when possible for faster performance.
concurrency | Optional[int] | Maximum number of concurrent operations executing at a given time. It cannot be more than one for ordered bulk writes.
max_time_ms | Optional[int] | A timeout, in milliseconds, for the whole bulk write. This method uses the collection-level timeout by default. You may need to increase the timeout duration depending on the number of operations. If the method call times out, there’s no guarantee about how much of the bulk write was completed.

Example:

from astrapy import DataAPIClient
from astrapy.operations import (
    InsertOne,
    InsertMany,
    UpdateOne,
    UpdateMany,
    ReplaceOne,
    DeleteOne,
    DeleteMany,
)
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

op1 = InsertMany([{"a": 1}, {"a": 2}])
op2 = ReplaceOne({"z": 9}, replacement={"z": 9, "replaced": True}, upsert=True)
print(collection.bulk_write([op1, op2]))
# prints: BulkWriteResult(deleted_count=0, inserted_count=3, matched_count=0, modified_count=0, upserted_count=1, upserted_ids={1: '2addd676-...'}, bulk_api_results=...)
print(collection.count_documents({}, upper_bound=100))
# prints: 3
print(collection.distinct("replaced"))
# prints: [True]
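The ordered and concurrency parameters, and the index-keyed result maps, can be pictured with a purely conceptual model. The sketch below does not use astrapy and does not reproduce its implementation: run_bulk, make_op, the fallback of 4 workers, and the ValueError are all assumptions made for illustration. It only shows the general idea of running operations one after the other versus concurrently while keying each result by the operation's position in the input list.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional


def run_bulk(
    operations: List[Callable[[], Any]],
    ordered: bool = False,
    concurrency: Optional[int] = None,
) -> Dict[int, Any]:
    """Run each operation and return its result keyed by its list index."""
    if ordered:
        # Modeled on the rule above: ordered bulk writes cannot run more
        # than one operation at a time. The exception type is an assumption.
        if concurrency not in (None, 1):
            raise ValueError("ordered bulk writes cannot use concurrency > 1")
        # Strictly one after the other, in list order.
        return {index: op() for index, op in enumerate(operations)}

    # Unordered: operations may run concurrently and finish in any order,
    # but each result is still stored under its original index.
    workers = concurrency or 4  # hypothetical fallback for this sketch
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {index: pool.submit(op) for index, op in enumerate(operations)}
        return {index: future.result() for index, future in futures.items()}


def make_op(n: int) -> Callable[[], int]:
    """Build a stand-in operation that returns a value derived from n."""
    def op() -> int:
        return n * 10
    return op


ops = [make_op(n) for n in range(3)]
print(run_bulk(ops, ordered=True))                    # {0: 0, 1: 10, 2: 20}
print(run_bulk(ops, ordered=False, concurrency=2))    # {0: 0, 1: 10, 2: 20}
```

In both modes the result map has the same keys, which is why a caller can match an entry such as upserted_ids={1: ...} back to the second operation in the list regardless of execution order.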

For more information, see the API reference.

const results = await collection.bulkWrite([
  { insertOne: { document: { a: '1' } } },
  { insertOne: { document: { a: '2' } } },
  { replaceOne: { filter: { z: '9' }, replacement: { z: '9', replaced: true }, upsert: true } },
]);

Parameters:

Name | Type | Summary
operations | AnyBulkWriteOperation<Schema>[] | The operations to perform.
options? | BulkWriteOptions | The options for this operation.

Options (BulkWriteOptions):

Name | Type | Summary
ordered? | boolean | Set to true to stop the operation after the first error. Otherwise, operations can be parallelized and processed in arbitrary order, which can substantially improve performance.
concurrency? | number | Controls how many network requests are made in parallel for unordered operations. Defaults to 8. Not available for ordered operations.
maxTimeMS? | number | The maximum time in milliseconds that the client should wait for the operation to complete.

Returns:

Promise<BulkWriteResult<Schema>> - A promise that resolves to a summary of the performed operations.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { namespace: 'NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert two documents and upsert a replacement in a single call
  await collection.bulkWrite([
    { insertOne: { document: { a: 1 } } },
    { insertOne: { document: { a: 2 } } },
    { replaceOne: { filter: { z: 9 }, replacement: { z: 9, replaced: true }, upsert: true } },
  ]);

  // 3
  await collection.countDocuments({}, 100);

  // [true]
  await collection.distinct('replaced');
})();

// Synchronous
BulkWriteResult bulkWrite(List<Command> commands);
BulkWriteResult bulkWrite(List<Command> commands, BulkWriteOptions options);

// Asynchronous
CompletableFuture<BulkWriteResult> bulkWriteAsync(List<Command> commands);
CompletableFuture<BulkWriteResult> bulkWriteAsync(List<Command> commands, BulkWriteOptions options);

Returns:

BulkWriteResult - Wrapper with the list of responses for each command.

Parameters:

Name | Type | Summary
commands | List<Command> | The list of generic Command objects to execute.
options (optional) | BulkWriteOptions | Options for these commands, such as ordered or concurrency.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.BulkWriteOptions;
import com.datastax.astra.client.model.BulkWriteResult;
import com.datastax.astra.client.model.Command;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.internal.api.ApiResponse;

import java.util.List;

public class BulkWrite {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Set a couple of Commands
        Command cmd1 = Command.create("insertOne").withDocument(new Document().id(1).append("name", "hello"));
        Command cmd2 = Command.create("insertOne").withDocument(new Document().id(2).append("name", "hello"));

        // Set the options for the bulk write
        BulkWriteOptions options1 = BulkWriteOptions.Builder.ordered(false).concurrency(1);

        // Execute the queries
        BulkWriteResult result = collection.bulkWrite(List.of(cmd1, cmd2), options1);

        // Retrieve the LIST of responses
        for(ApiResponse res : result.getResponses()) {
            System.out.println(res.getData());
        }
    }

}

This operation has no literal equivalent in HTTP. Instead, you can execute multiple, sequential write operations.
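As a rough illustration of issuing sequential write operations over HTTP, the sketch below sends one Data API command per request, in list order. The helper names post_command and run_sequentially are hypothetical. The URL shape and the Token header are copied from the curl examples on this page, and the choice of insertMany and findOneAndReplace to mirror the bulk write examples above is an assumption of this sketch. The demonstration uses a recording stub in place of a real network call.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List

# Commands chosen to mirror the bulk write examples above (an assumption).
COMMANDS: List[Dict[str, Any]] = [
    {"insertMany": {"documents": [{"a": 1}, {"a": 2}]}},
    {
        "findOneAndReplace": {
            "filter": {"z": 9},
            "replacement": {"z": 9, "replaced": True},
            "options": {"upsert": True},
        }
    },
]


def post_command(url: str, token: str, command: Dict[str, Any]) -> Dict[str, Any]:
    """POST one command as JSON and return the parsed JSON response.

    `url` is expected to look like
    ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_NAMESPACE/ASTRA_DB_COLLECTION.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(command).encode("utf-8"),
        headers={"Token": token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_sequentially(
    commands: List[Dict[str, Any]],
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Send each command in order and collect the responses."""
    return [send(command) for command in commands]


# Demonstration with a stub that records payloads instead of calling a server.
sent: List[str] = []


def recording_stub(command: Dict[str, Any]) -> Dict[str, Any]:
    sent.append(json.dumps(command))
    return {"status": {}}


responses = run_sequentially(COMMANDS, recording_stub)
print(len(responses), "commands sent")
```

To run this against a real database, you could pass a function such as lambda command: post_command(url, token, command) as the send argument, with your own endpoint and application token.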


© 2024 DataStax | Privacy policy | Terms of use
