Documents reference

Documents represent a single row or record of data in a namespace. You use the Collection class to work with documents. If you haven’t done so already, consult the Collections reference topic for details on how to get a Collection object.

Working with dates

  • Python

  • TypeScript

  • Java

  • cURL

Date and datetime objects, which are instances of the Python standard library datetime.datetime and datetime.date classes, can be used anywhere in documents.

import datetime

collection.insert_one({"when": datetime.datetime.now()})
collection.insert_one({"date_of_birth": datetime.date(2000, 1, 1)})

collection.update_one(
    {"registered_at": datetime.date(1999, 11, 14)},
    {"$set": {"message": "happy Sunday!"}},
)

print(
    collection.find_one(
        {"date_of_birth": {"$lt": datetime.date(2001, 1, 1)}},
        projection={"_id": False},
    )
)
# will print:
#    {'date_of_birth': datetime.datetime(2000, 1, 1, 0, 0)}

As the example shows, read operations on a collection always return datetime.datetime objects, regardless of whether a date or a datetime was provided at insertion.
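To make the read-back rule concrete, here is a minimal standard-library sketch. The helper as_read_back is hypothetical: it only mimics the documented behavior (a date comes back as a datetime at midnight) and is not part of AstraPy. A practical consequence is that values read from a collection should be compared against datetime.datetime objects rather than datetime.date objects.

```python
import datetime


def as_read_back(value: datetime.date) -> datetime.datetime:
    """Hypothetical helper: mimic how a stored date or datetime is returned on read."""
    # datetime.datetime is a subclass of datetime.date, so check it first.
    if isinstance(value, datetime.datetime):
        return value
    # A plain date has no time of day, so it comes back as midnight.
    return datetime.datetime.combine(value, datetime.time())


stored = as_read_back(datetime.date(2000, 1, 1))
print(repr(stored))  # datetime.datetime(2000, 1, 1, 0, 0)

# A datetime never compares equal to a plain date, and ordering
# comparisons (<, >) between the two raise TypeError.
print(stored == datetime.date(2000, 1, 1))      # False
print(stored == datetime.datetime(2000, 1, 1))  # True
```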

Native JS Date objects can be used anywhere in documents to represent dates and times.

Document fields stored using the { $date: number } format are also returned as Date objects when read.

(async function () {
  // Create an untyped collection
  const collection = await db.createCollection('dates_test', { checkExists: false });

  // Insert documents with some dates
  await collection.insertOne({ dateOfBirth: new Date(1394104654000) });
  await collection.insertOne({ dateOfBirth: new Date('1863-05-28') });

  // Update the document with this date, setting lastModified to the current time
  await collection.updateOne(
    {
      dateOfBirth: new Date('1863-05-28'),
    },
    {
      $set: { message: 'Happy Birthday!' },
      $currentDate: { lastModified: true },
    },
  );

  // Prints the lastModified value: a Date close to the current time
  const found = await collection.findOne({ dateOfBirth: { $lt: new Date('1900-01-01') } });
  console.log(found?.lastModified);
})();
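The read-side conversion of { $date: number } fields can be pictured as a recursive walk over the returned document. The language-neutral sketch below is written in Python; decode_dates is a hypothetical helper, not astra-db-ts code, and it assumes the number is milliseconds since the Unix epoch, as with JavaScript's new Date(number).

```python
import datetime
from typing import Any

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def decode_dates(value: Any) -> Any:
    """Hypothetical helper: replace every {"$date": <ms>} object with a datetime."""
    if isinstance(value, dict):
        ms = value.get("$date")
        # Only a single-key {"$date": number} object is treated as a date;
        # bool is excluded because it is a subclass of int in Python.
        if len(value) == 1 and isinstance(ms, (int, float)) and not isinstance(ms, bool):
            return EPOCH + datetime.timedelta(milliseconds=ms)
        return {key: decode_dates(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode_dates(item) for item in value]
    return value


raw = {"dateOfBirth": {"$date": 1394104654000}, "history": [{"$date": 0}], "message": "hi"}
decoded = decode_dates(raw)
print(decoded["history"][0])  # 1970-01-01 00:00:00+00:00
```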

The Data API uses the Extended JSON (EJSON) standard to represent time-related objects. The Java client provides custom serializers for three types: java.util.Date, java.util.Calendar, and java.time.Instant.

These objects can be used directly in filter clauses, update clauses, and documents.

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.Projections;

import java.time.Instant;
import java.util.Calendar;
import java.util.Date;

import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Updates.set;

public class WorkingWithDates {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        Calendar c = Calendar.getInstance();
        collection.insertOne(new Document().append("registered_at", c));
        collection.insertOne(new Document().append("date_of_birth", new Date()));
        collection.insertOne(new Document().append("just_a_date", Instant.now()));

        collection.updateOne(
                eq("registered_at", c), // filter clause
                set("message", "happy Sunday!")); // update clause

        collection.findOne(
                lt("date_of_birth", new Date(System.currentTimeMillis() - 1000 * 1000)),
                new FindOneOptions().projection(Projections.exclude("_id")));
    }
}

In the JSON payload of the following Data API insertOne command, $date is used to specify a car’s purchase date:

"purchase_date": {"$date": 1690045891}

Example:

curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "insertOne": {
    "document": {
      "_id": "1",
      "purchase_type": "Online",
      "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
      "customer": {
        "name": "Jim A.",
        "phone": "123-456-1111",
        "age": 51,
        "credit_score": 782,
        "address": {
          "address_line": "1234 Broadway",
          "city": "New York",
          "state": "NY"
        }
      },
      "purchase_date": {"$date": 1690045891},
      "seller": {
        "name": "Jon B.",
        "location": "Manhattan NYC"
      },
      "items": [
        {
          "car" : "BMW 330i Sedan",
          "color": "Silver"
        },
        "Extended warranty - 5 years"
      ],
      "amount": 47601,
      "status" : "active",
      "preferred_customer" : true
    }
  }
}' | json_pp
Response
{
    "status": {
        "insertedIds": [
            "1"
        ]
    }
}
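To build a $date value from a calendar date, convert it to a count since the Unix epoch. The sketch below assumes, as in Extended JSON and as with JavaScript Date values, that the number is in milliseconds; date_field is a hypothetical helper. Under that assumption, the example's 1690045891 would decode to a moment in January 1970, whereas the same digits read as seconds correspond to 22 July 2023, so it is worth checking which unit your data uses.

```python
import datetime
import json

UTC = datetime.timezone.utc
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=UTC)
ONE_MS = datetime.timedelta(milliseconds=1)


def date_field(moment: datetime.datetime) -> dict:
    """Hypothetical helper: encode an aware datetime as {"$date": <epoch milliseconds>}."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguity")
    # timedelta // timedelta gives an exact integer count of milliseconds.
    return {"$date": (moment - EPOCH) // ONE_MS}


purchase = datetime.datetime(2023, 7, 22, 17, 11, 31, tzinfo=UTC)
print(json.dumps({"purchase_date": date_field(purchase)}))
# {"purchase_date": {"$date": 1690045891000}}

# The example's number, read as milliseconds, lands in January 1970.
print((EPOCH + 1690045891 * ONE_MS).date())  # 1970-01-20
```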

Working with document IDs

Documents in a collection are always identified by an ID that is unique within the collection. The ID can be any of several types, such as a string, integer, or datetime. However, the uuid and ObjectId types are recommended.

The Data API supports uuid identifiers up to version 8, as well as ObjectId identifiers as provided by the bson library. These identifiers can appear anywhere in the document, not only in its _id field, and different types of identifier can appear in different parts of the same document. They can also be used in filter clauses and update/replace directives just like any other data type.
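As background on the ObjectId type: per the BSON specification, a 12-byte ObjectId begins with a 4-byte big-endian Unix timestamp in seconds, followed by 5 random bytes and a 3-byte counter. The sketch below extracts that timestamp from a hex string using only the standard library; objectid_timestamp is a hypothetical helper (the bson package's ObjectId.generation_time exposes the same information).

```python
import datetime


def objectid_timestamp(hex_id: str) -> datetime.datetime:
    """Hypothetical helper: return the creation time embedded in an ObjectId."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hexadecimal characters)")
    # The first 4 bytes are seconds since the Unix epoch, big-endian.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


print(objectid_timestamp("6601fb0f83ffc5f51ba22b88"))  # 2024-03-25 22:30:39+00:00
```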

One of the optional settings of a collection is the default ID type (defaultId), which specifies what kind of identifiers the server supplies for documents without an explicit _id field. (For details, see the create_collection method and Data API createCollection command in the Collections reference.) Regardless of the defaultId setting, you can always provide identifiers of any type explicitly, for example when inserting documents, and the API honors them.

  • Python

  • TypeScript

  • Java

  • cURL

from astrapy.ids import (
    ObjectId,
    uuid1,
    uuid3,
    uuid4,
    uuid5,
    uuid6,
    uuid7,
    uuid8,
    UUID,
)

AstraPy recognizes uuid versions 1 through 8 (except version 2), as provided by the uuid and uuid6 Python libraries, as well as the ObjectId class from the bson package. For convenience, AstraPy also exposes these utilities directly, as shown in the import above.

You can then generate new identifiers with statements such as new_id = uuid8() or new_obj_id = ObjectId(). Keep in mind that all uuid versions are instances of the same class (UUID), which exposes a version property, should you need to access it.

Here is a short example:

collection.insert_one({"_id": uuid8(), "tag": "new_id_v_8"})
collection.insert_one(
    {"_id": UUID("018e77bc-648d-8795-a0e2-1cad0fdd53f5"), "tag": "id_v_8"}
)
collection.insert_one({"_id": ObjectId(), "tag": "new_obj_id"})
collection.insert_one(
    {"_id": ObjectId("6601fb0f83ffc5f51ba22b88"), "tag": "obj_id"}
)
collection.find_one_and_update(
    {"_id": ObjectId("6601fb0f83ffc5f51ba22b88")},
    {"$set": {"item_inventory_id": UUID("1eeeaf80-e333-6613-b42f-f739b95106e6")}},
)
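The version property mentioned above is also available on the standard library's uuid.UUID class, which the paragraph on AstraPy's imports indicates is the class AstraPy re-exports. The sketch below parses the literal identifiers used in the examples on this page and reads their versions. It uses only the standard uuid module, which can generate versions 1, 3, 4, and 5; generators for versions 6 to 8 come from the uuid6 package, or from the standard library starting with Python 3.14.

```python
import uuid

# Literal identifiers used in the examples on this page.
literals = [
    "018e77bc-648d-8795-a0e2-1cad0fdd53f5",
    "1eeeaf80-e333-6613-b42f-f739b95106e6",
    "016b1cac-14ce-660e-8974-026c927b9b91",
]

for text in literals:
    parsed = uuid.UUID(text)
    # For RFC-variant UUIDs, the version is the first hex digit of the third group.
    print(text, "-> version", parsed.version)  # versions 8, 6 and 6

fresh = uuid.uuid4()
print("generated", fresh, "-> version", fresh.version)  # version 4
```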
import { UUID, ObjectId } from '@datastax/astra-db-ts';

astra-db-ts provides the UUID and ObjectId classes for using and generating identifiers. Note that these are not the classes exported by the bson or uuid libraries; they are custom classes that must be imported from the astra-db-ts package.

You can generate new identifiers using UUID.v4(), UUID.v7(), or new ObjectId(). The UUID methods all return an instance of the same class, which exposes a version property should you need it. Both classes can also be constructed from the string representation of an ID if you want to generate IDs yourself.

Here is a short example of the concepts:

import { DataAPIClient, UUID, ObjectId } from '@datastax/astra-db-ts';

// Schema for the collection
interface Person {
  _id: UUID | ObjectId;
  name: string;
  friendId?: UUID;
}

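(async function () {
  // Given an existing Db instance `db`, create a collection typed with the Person schema
  const collection = await db.createCollection<Person>('ids_test', { checkExists: false });

  // Insert documents w/ various IDs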
  await collection.insertOne({ name: 'John', _id: UUID.v4() });
  await collection.insertOne({ name: 'Jane', _id: new UUID('016b1cac-14ce-660e-8974-026c927b9b91') });

  await collection.insertOne({ name: 'Dan', _id: new ObjectId()});
  await collection.insertOne({ name: 'Tim', _id: new ObjectId('65fd9b52d7fabba03349d013') });

  // Update a document with a UUID in a non-_id field
  await collection.updateOne(
    { name: 'John' },
    { $set: { friendId: new UUID('016b1cac-14ce-660e-8974-026c927b9b91') } },
  );

  // Find a document by a UUID in a non-_id field
  const john = await collection.findOne({ name: 'John' });
  const jane = await collection.findOne({ _id: john!.friendId });

  // Prints 'Jane 016b1cac-14ce-660e-8974-026c927b9b91 6'
  console.log(jane?.name, jane?._id.toString(), (<UUID>jane?._id).version);
})();
  • To support UUID versions that java.util.UUID does not generate (v6 and v7 in particular), the client defines dedicated classes: UUIDv6 and UUIDv7.

  • When a unique identifier is retrieved from the server, it is returned as a uuid and converted to the appropriate UUID class based on the collection's defaultId option.

  • The ObjectId class is taken from the BSON package and represents the ObjectId type.

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.ObjectId;
import com.datastax.astra.client.model.UUIDv6;
import com.datastax.astra.client.model.UUIDv7;

import java.time.Instant;
import java.util.UUID;

import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Updates.set;

public class WorkingWithDocumentIds {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // IDs can be different JSON scalar types
        // (when the 'defaultId' option is NOT set for the collection)
        new Document().id("abc");
        new Document().id(123);
        new Document().id(Instant.now());

        // Working with UUIDv4
        new Document().id(UUID.randomUUID());

        // Working with UUIDv6
        collection.insertOne(new Document().id(new UUIDv6()).append("tag", "new_id_v_6"));
        // Build a UUIDv6 from an existing java.util.UUID
        UUID existingUuid = UUID.fromString("018e77bc-648d-8795-a0e2-1cad0fdd53f5");
        collection.insertOne(new Document().id(new UUIDv6(existingUuid)).append("tag", "id_v_8"));

        // Working with UUIDv7
        collection.insertOne(new Document().id(new UUIDv7()).append("tag", "new_id_v_7"));

        // Working with ObjectIds
        collection.insertOne(new Document().id(new ObjectId()).append("tag", "obj_id"));
        collection.insertOne(new Document().id(new ObjectId("6601fb0f83ffc5f51ba22b88")).append("tag", "obj_id"));

        collection.findOneAndUpdate(
                eq(new ObjectId("6601fb0f83ffc5f51ba22b88")), // single-argument eq filters on _id
                set("item_inventory_id", UUID.fromString("1eeeaf80-e333-6613-b42f-f739b95106e6")));
    }
}

Java's native java.util.UUID.randomUUID() generates version 4 UUIDs.
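Because java.util.UUID.randomUUID() only produces version 4 identifiers, the client defines dedicated UUIDv6 and UUIDv7 classes. To show what distinguishes version 7, here is a minimal Python sketch of the RFC 9562 layout: a 48-bit Unix timestamp in milliseconds, the 4-bit version, the 2-bit variant, and random bits elsewhere. uuid7_sketch is hypothetical and is not how the Java client implements UUIDv7.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Hypothetical helper: build a UUID following the RFC 9562 version 7 layout."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    # Bits 127..80: 48-bit millisecond timestamp; bits 79..0 start out random.
    value = (unix_ms & ((1 << 48) - 1)) << 80
    value |= int.from_bytes(os.urandom(10), "big")
    # Bits 79..76: version number 7.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Bits 63..62: variant 0b10 (the RFC variant).
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier.version)  # 7
print(earlier < later)  # True: the timestamp prefix makes v7 IDs sort by creation time
```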

The same underlying ID functionality as noted for the clients applies when using _id types with Data API commands. For full details about the defaultId option on the createCollection command, and its accepted type settings, see The defaultId option.

Example:

{
    "createCollection": {
        "name": "vector_collection2",
        "options": {
            "defaultId": {
                "type": "objectId"
            },
            "vector": {
                "dimension": 1024,
                "metric": "cosine"
            }
        }
    }
}
Response
{
    "status": {
        "ok": 1
    }
}
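The rule that a default ID is supplied only for documents without an explicit _id can be sketched as follows. fill_missing_ids is hypothetical and only models that rule on the client side; with the Data API, the server generates the IDs, using the type chosen in defaultId. Here uuid.uuid4 merely stands in for whatever generator that setting selects.

```python
import uuid
from typing import Any, Callable, Dict, Iterable, List


def fill_missing_ids(
    documents: Iterable[Dict[str, Any]],
    new_id: Callable[[], Any] = uuid.uuid4,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: add an _id only where a document has none."""
    result = []
    for document in documents:
        copy = dict(document)  # leave the caller's dictionaries untouched
        if "_id" not in copy:
            copy["_id"] = new_id()
        result.append(copy)
    return result


docs = fill_missing_ids([{"_id": 101, "name": "John Doe"}, {"name": "Jane Doe"}])
print(docs[0]["_id"])                 # 101: an explicit ID is always kept
print(type(docs[1]["_id"]).__name__)  # UUID
```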

Insert a single document

Insert a single document into a collection.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

insert_result = collection.insert_one({"name": "Jane Doe"})

Insert a document with an associated vector.

insert_result = collection.insert_one(
    {
      "name": "Jane Doe",
      "$vector": [.08, .68, .30],
    },
)

Insert a document and generate a vector automatically.

insert_result = collection.insert_one(
    {
      "name": "Jane Doe",
      "$vectorize": "Text to vectorize",
    },
)

Returns:

InsertOneResult - An object representing the response from the database after the insert operation. It includes information about the success of the operation and the ID of the inserted document.

Example response
InsertOneResult(raw_results=[{'status': {'insertedIds': ['92b4c4f4-db44-4440-b4c4-f4db44e440b8']}}], inserted_id='92b4c4f4-db44-4440-b4c4-f4db44e440b8')

Parameters:

Name Type Summary

document

Dict

The dictionary expressing the document to insert. The _id field of the document can be left out, in which case it will be created automatically.

vector

Optional[Iterable[float]]

A vector (a list of numbers appropriate for the collection) for the document. Passing this parameter is equivalent to providing the vector in the "$vector" field of the document itself; however, the two are mutually exclusive.

vectorize

Optional[str]

A string to be vectorized. This only works for collections associated with an embedding service.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. If not passed, the collection-level setting is used instead.
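The rule that the vector parameter is equivalent to, but mutually exclusive with, a "$vector" field can be expressed as a small merge step. This is a hypothetical illustration of the documented behavior, not AstraPy's internal code; the exception type and message are assumptions.

```python
from typing import Any, Dict, Iterable, Optional


def merge_vector(
    document: Dict[str, Any], vector: Optional[Iterable[float]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: fold a separately passed vector into "$vector"."""
    merged = dict(document)
    if vector is None:
        return merged
    if "$vector" in merged:
        raise ValueError('pass the vector either as a parameter or in "$vector", not both')
    merged["$vector"] = list(vector)
    return merged


print(merge_vector({"name": "Jane Doe"}, vector=[0.08, 0.68, 0.30]))
# {'name': 'Jane Doe', '$vector': [0.08, 0.68, 0.3]}
```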

Example:

# Insert a document with a specific ID
response1 = collection.insert_one(
    {
        "_id": 101,
        "name": "John Doe",
        "$vector": [.12, .52, .32],
    },
)

# Insert a document without specifying an ID
# so that _id is generated automatically
response2 = collection.insert_one(
    {
        "name": "Jane Doe",
        "$vector": [.08, .68, .30],
    },
)

View this topic in more detail on the API Reference.

const result = await collection.insertOne({ name: 'Jane Doe' });

Insert a document with an associated vector.

const result = await collection.insertOne(
  {
    name: 'Jane Doe',
    $vector: [.08, .68, .30],
  },
);

Insert a document and generate a vector automatically.

const result = await collection.insertOne(
  {
    name: 'Jane Doe',
    $vectorize: 'Text to vectorize',
  },
);

Parameters:

Name Type Summary

document

MaybeId<Schema>

The document to insert. If the document does not have an _id field, the server generates one.

options?

InsertOneOptions

The options for this operation.

Options (InsertOneOptions):

Name Type Summary

vector?

number[]

The vector for the document.

Equivalent to providing the vector in the $vector field of the document itself; however, the two are mutually exclusive.

vectorize?

string

A string to be vectorized. This only works for collections associated with an embedding service.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete.

Returns:

Promise<InsertOneResult<Schema>> - A promise that resolves to an InsertOneResult containing the inserted ID.

Example:

(async function () {
  // Insert a document with a specific ID
  await collection.insertOne({ _id: '1', name: 'John Doe' });

  // Insert a document with an autogenerated ID
  await collection.insertOne({ name: 'Jane Doe' });

  // Insert a document with a vector
  await collection.insertOne({ name: 'Jane Doe', $vector: [.12, .52, .32] });
})();
  • Operations on documents are performed at the Collection level. For details on each signature, see the Collection Javadoc.

  • Collection is a generic class whose default type is Document. You can specify your own type instead, in which case objects are serialized with Jackson.

  • Most methods come in synchronous and asynchronous flavors. The asynchronous version has the suffix Async and returns a CompletableFuture.

InsertOneResult insertOne(DOC document);
InsertOneResult insertOne(DOC document, float[] embeddings);

// Equivalent in asynchronous
CompletableFuture<InsertOneResult> insertOneAsync(DOC document);
CompletableFuture<InsertOneResult> insertOneAsync(DOC document, float[] embeddings);

Returns:

InsertOneResult - Wrapper with the inserted document Id.

Parameters:

Name Type Summary

document

DOC

Object representing the document to insert. The _id field of the document can be left out, in which case it will be created automatically. If the collection is associated with an embedding service, it will generate a vector automatically from the $vectorize field.

embeddings

float[]

A vector of embeddings (a list of numbers appropriate for the collection) for the document. Passing this parameter is equivalent to providing the vector in the $vector field of the document itself.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertOneOptions;
import com.datastax.astra.client.model.InsertOneResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;

public class InsertOne {

    @Data @AllArgsConstructor
    public static class Product {
        @JsonProperty("_id")
        private String id;
        private String name;
    }

    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Insert a document
        Document doc1 = new Document("1").append("name", "joe");
        InsertOneResult res1 = collectionDoc.insertOne(doc1);
        System.out.println(res1.getInsertedId()); // should be "1"

        // Insert a document with embeddings
        Document doc2 = new Document("2").append("name", "joe");
        collectionDoc.insertOne(doc2, new float[] {.1f, .2f});

        // Given an existing collection
        Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION2_NAME", Product.class);

        // Insert a document with custom bean
        collectionProduct.insertOne(new Product("1", "joe"));
        collectionProduct.insertOne(new Product("2", "joe"), new float[] {.1f, .2f});

    }
}
curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "insertOne": {
    "document": {
      "_id": "1",
      "purchase_type": "Online",
      "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
      "customer": {
        "name": "Jim A.",
        "phone": "123-456-1111",
        "age": 51,
        "credit_score": 782,
        "address": {
          "address_line": "1234 Broadway",
          "city": "New York",
          "state": "NY"
        }
      },
      "purchase_date": {"$date": 1690045891},
      "seller": {
        "name": "Jon B.",
        "location": "Manhattan NYC"
      },
      "items": [
        {
          "car" : "BMW 330i Sedan",
          "color": "Silver"
        },
        "Extended warranty - 5 years"
      ],
      "amount": 47601,
      "status" : "active",
      "preferred_customer" : true
    }
  }
}' | json_pp

Properties:

Name Type Summary

insertOne

command

Data API designation that a single document is inserted.

document

JSON object

Contains the details of the record added.

_id

uuid4

A unique identifier for the document. Other _id types are possible based on how the collection was created, including a defaultId option on a createCollection command or an equivalent method in a client. See the defaultId option.

purchase_type

string

Specifies how the purchase was made.

$vector

array

A reserved property used to store vector data. The value is an array of numbers, or it can be generated. These numbers can be used for purposes such as similarity search, clustering, or other mathematical operations on vectors. Because this is a reserved property, the vector-enabled DataStax Enterprise (DSE) database handles data stored in this format specially, optimizing query performance for vector similarity search.

customer

JSON object

Information about the customer who made the purchase.

customer.name

string

The customer’s name.

customer.phone

string

The customer’s contact phone number.

customer.age

number

The customer’s age. Subsequent examples can use the customer.age property to demonstrate range and logical query operators such as $gt, $lt, $gte, $lte, and $not.

customer.credit_score

number

The customer’s credit score at the time of the car’s purchase. Subsequent examples can use customer.credit_score to demonstrate range and logical query operators.

customer.address

JSON object

Contains further details about the customer’s address.

customer.address.address_line

string

The customer’s street or location address.

customer.address.city

string

The customer’s city.

customer.address.state

string

The state where the customer resides.

string

The state where the customer resides.

purchase_date

date

The date on which the purchase was made, using the $date operator and an Epoch value.

seller

JSON object

Information about the seller from whom the purchase was made.

seller.name

string

The seller’s name.

seller.location

string

The seller’s location.

items

array

An array detailing the items included in this purchase.

items.car

string

Information about the make and model of the car.

items.color

string

Information about the car’s color.

Extended warranty - 5 years

string

Additional detail that’s part of the items array. Indicates the customer has an "Extended warranty - 5 years" as part of the purchase.

amount

number

The total cost of the purchase.

status

string

Current status of the purchase.

preferred_customer

boolean

Whether the buyer is a preferred customer.

Response
{
    "status": {
        "insertedIds": [
            "1"
        ]
    }
}
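The $vector property is what a collection's similarity metric operates on; the createCollection example earlier on this page uses "metric": "cosine". For intuition, the sketch below computes the textbook cosine similarity of two vectors of the same dimension. It is illustrative only and does not reproduce how the Data API computes or scales the similarity scores it reports.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook cosine similarity: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Vectors pointing the same way score (approximately) 1.0, regardless of length.
print(cosine_similarity([0.25, 0.25, 0.25, 0.25, 0.25], [1, 1, 1, 1, 1]))
```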

Insert many documents

Insert multiple documents into a collection.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

response = collection.insert_many(
    [
        {
            "_id": 101,
            "name": "John Doe",
            "$vector": [.12, .52, .32],
        },
        {
            # ID is generated automatically
            "name": "Jane Doe",
            "$vector": [.08, .68, .30],
        },
    ],
)

Insert multiple documents and generate vectors automatically.

response = collection.insert_many(
    [
        {
            "name": "John Doe",
            "$vectorize": "Text to vectorize for John Doe",
        },
        {
            "name": "Jane Doe",
            "$vectorize": "Text to vectorize for Jane Doe",
        },
    ],
)

Returns:

InsertManyResult - An object representing the response from the database after the insert operation. It includes information about the success of the operation and details of the inserted documents.

Example response
InsertManyResult(raw_results=[{'status': {'insertedIds': [101, '81077d86-05dc-43ca-877d-8605dce3ca4d']}}], inserted_ids=[101, '81077d86-05dc-43ca-877d-8605dce3ca4d'])

Parameters:

Name Type Summary

documents

Iterable[Dict[str, Any]]

An iterable of dictionaries, each a document to insert. Documents may specify their _id field or leave it out, in which case it will be added automatically.

vectors

Optional[Iterable[Optional[Iterable[float]]]]

An optional list of vectors (as many vectors as the provided documents) to associate with the documents when inserting. Each vector is added to the corresponding document before insertion into the database. The list can mix None and vectors, in which case some documents will not have a vector unless one is already specified in their "$vector" field. Passing vectors this way is equivalent to setting the "$vector" field of the documents; however, the two are mutually exclusive.

vectorize

Optional[Iterable[Optional[str]]]

An optional list of strings to be vectorized. This only works for collections associated with an embedding service.

ordered

bool

If False (default), the insertions can occur in arbitrary order and possibly concurrently. If True, they are processed sequentially. If you don’t need ordered inserts, DataStax recommends setting this parameter to False for faster performance.

chunk_size

Optional[int]

How many documents to include in a single API request. The default and maximum value is 20.

concurrency

Optional[int]

Maximum number of concurrent requests to the API at a given time. It cannot be more than one for ordered insertions.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the operation. If not passed, the collection-level setting is used instead. Inserting many documents requires multiple HTTP requests, so you may need to increase the timeout for the method to complete successfully.

Unless you have a specific reason not to, prefer ordered=False, which results in much higher insert throughput than an equivalent ordered insertion.
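Together, chunk_size, concurrency, and ordered determine how many HTTP requests are sent and how they are scheduled. The sketch below models that with a thread pool; insert_many_sketch and the send_chunk callback (a stand-in for one insertMany request) are hypothetical, and AstraPy's actual implementation may differ, for example in how it reports errors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence

Document = Dict[str, Any]


def insert_many_sketch(
    documents: Sequence[Document],
    send_chunk: Callable[[List[Document]], List[Any]],
    chunk_size: int = 20,
    concurrency: int = 1,
    ordered: bool = False,
) -> List[Any]:
    """Hypothetical model of a chunked insert: one send_chunk call per request."""
    if not 1 <= chunk_size <= 20:
        raise ValueError("chunk_size must be between 1 and 20")
    if ordered and concurrency > 1:
        raise ValueError("ordered insertions cannot use concurrency > 1")
    chunks = [list(documents[i:i + chunk_size]) for i in range(0, len(documents), chunk_size)]
    if ordered:
        # One request at a time, in document order.
        results = [send_chunk(chunk) for chunk in chunks]
    else:
        # Up to `concurrency` requests in flight; map() still returns results in input order.
        with ThreadPoolExecutor(max_workers=max(1, concurrency)) as pool:
            results = list(pool.map(send_chunk, chunks))
    return [inserted_id for ids in results for inserted_id in ids]


request_sizes: List[int] = []


def fake_send(chunk: List[Document]) -> List[Any]:
    request_sizes.append(len(chunk))  # list.append is thread-safe in CPython
    return [doc["seq"] for doc in chunk]


ids = insert_many_sketch([{"seq": i} for i in range(50)], fake_send, concurrency=5)
print(len(ids), sorted(request_sizes))  # 50 [10, 20, 20]
```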

Example:

collection.insert_many([{"a": 10}, {"a": 5}, {"b": [True, False, False]}])

collection.insert_many(
    [{"seq": i} for i in range(50)],
    concurrency=5,
)

collection.insert_many(
    [
        {"tag": "a", "$vector": [1, 2]},
        {"tag": "b", "$vector": [3, 4]},
    ]
)

View this topic in more detail on the API Reference.

const result = await collection.insertMany([
  {
    _id: '1',
    name: 'John Doe',
    $vector: [.12, .52, .32],
  },
  {
    name: 'Jane Doe',
    $vector: [.08, .68, .30],
  },
], {
  ordered: true,
});

Insert multiple documents and generate vectors automatically.

const result = await collection.insertMany([
  {
    name: 'John Doe',
    $vectorize: 'Text to vectorize for John Doe',
  },
  {
    name: 'Jane Doe',
    $vectorize: 'Text to vectorize for Jane Doe',
  },
], {
  ordered: true,
});

Parameters:

Name Type Summary

documents

MaybeId<Schema>[]

The documents to insert. If any document does not have an _id field, the server generates one.

options?

InsertManyOptions

The options for this operation.

Options (InsertManyOptions):

Name Type Summary

ordered?

boolean

Set ordered to true to process documents sequentially and stop the operation after the first error. Otherwise, documents may be processed in parallel and in arbitrary order, which can improve performance considerably.

concurrency?

number

You can set the concurrency option to control how many network requests are made in parallel on unordered insertions. Defaults to 8.

Not available for ordered insertions.

chunkSize?

number

Controls how many documents are sent in each network request. The default and maximum value is 20.

vectors?

(number[] | null | undefined)[]

An array of vectors to associate with each document. If a vector is null or undefined, the document will not have a vector. Must equal the number of documents if provided.

Equivalent to providing the vector in the $vector field of the documents themselves; however, the two are mutually exclusive.

vectorize?

string[]

An array of strings to be vectorized. This only works for collections associated with an embedding service.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete.

Unless you have a specific reason not to, leave ordered set to false, as this results in much higher insert throughput than an equivalent ordered insertion.

Returns:

Promise<InsertManyResult<Schema>> - A promise that resolves to the inserted IDs.

Example:

(async function () {
  try {
    // Insert many documents
    await collection.insertMany([
      { _id: '1', name: 'John Doe' },
      { name: 'Jane Doe' }, // Will autogen ID
    ], { ordered: true });

    // Insert many with vectors
    await collection.insertMany([
      { name: 'John Doe', $vector: [.12, .52, .32] },
      { name: 'Jane Doe', $vector: [.32, .52, .12] },
    ]);
  } catch (e) {
    if (e instanceof InsertManyError) {
      console.log(e.partialResult);
    }
  }
})();
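The difference between ordered and unordered insertions shows up when a request fails: with ordered set to true, processing stops at the first error, while an unordered insertion attempts every chunk and reports whatever succeeded, which is roughly what partialResult on InsertManyError exposes in the example. The Python sketch below models only that control flow, sequentially and at chunk granularity; insert_chunks, ChunkError, and flaky_send are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple


class ChunkError(Exception):
    """Hypothetical error raised when one request (chunk) fails."""


def insert_chunks(
    chunks: Sequence[List[Any]],
    send: Callable[[List[Any]], List[Any]],
    ordered: bool,
) -> Tuple[List[Any], List[Exception]]:
    """Send chunks one by one; return (inserted IDs, errors)."""
    inserted: List[Any] = []
    errors: List[Exception] = []
    for chunk in chunks:
        try:
            inserted.extend(send(chunk))
        except ChunkError as exc:
            errors.append(exc)
            if ordered:
                break  # ordered: stop after the first failure
    return inserted, errors


def flaky_send(chunk: List[Any]) -> List[Any]:
    # Simulated failure for any chunk containing the ID "bad".
    if "bad" in chunk:
        raise ChunkError(f"chunk {chunk} rejected")
    return chunk


chunks = [["1", "2"], ["bad", "3"], ["4"]]
print(insert_chunks(chunks, flaky_send, ordered=True)[0])   # ['1', '2']
print(insert_chunks(chunks, flaky_send, ordered=False)[0])  # ['1', '2', '4']
```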
  • Operations on documents are performed at the Collection level. For details on each signature, see the Collection Javadoc.

  • Collection is a generic class whose default type is Document. You can specify your own type instead, in which case objects are serialized with Jackson.

  • Most methods come in synchronous and asynchronous flavors. The asynchronous version has the suffix Async and returns a CompletableFuture.

// Synchronous
InsertManyResult insertMany(List<? extends DOC> documents);
InsertManyResult insertMany(List<? extends DOC> documents, InsertManyOptions options);

// Asynchronous
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList);
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList, InsertManyOptions options);

Returns:

InsertManyResult - Wrapper with the list of inserted document ids.

Parameters:

Name Type Summary

docList

List<? extends DOC>

A list of documents to insert. Documents may specify their _id field or leave it out, in which case it will be added automatically. If the collection is associated with an embedding service, it will generate vectors automatically from the $vectorize field in each document. You can also set the $vector field directly.

options (optional)

InsertManyOptions

Options for the insert operation: ordered, concurrency, and chunkSize.

The Java insertMany operation accepts as many documents as fit in your JVM memory. It splits the documents into chunks of chunkSize documents and sends them to the server through an ExecutorService. The default and maximum value of chunkSize is 20. Use concurrency to set the size of the executor.

InsertManyOptions.Builder
  .chunkSize(20)  // batch size, 20 is max
  .concurrency(8) // concurrent insertions
  .ordered(false) // unordered insertions
  .build();

If not provided, the default values are chunkSize=20, concurrency=1, and ordered=false.

  • Use ordered=false for better performance; chunks can then be inserted in parallel (when concurrency is greater than 1).

  • Consider always providing InsertManyOptions, even when using the defaults, so that readers can see which settings apply.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertManyOptions;
import com.datastax.astra.client.model.InsertManyResult;
import com.datastax.astra.client.model.InsertOneResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;

import java.util.List;

public class InsertMany {

    @Data @AllArgsConstructor
    public static class Product {
        @JsonProperty("_id")
        private String id;
        private String name;
    }

    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Insert two documents
        Document doc1 = new Document("1").append("name", "joe");
        Document doc2 = new Document("2").append("name", "joe");
        InsertManyResult res1 = collectionDoc.insertMany(List.of(doc1, doc2));
        System.out.println("Identifiers inserted: " + res1.getInsertedIds());

        // Given an existing collection
        Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION2_NAME", Product.class);

        // Insert typed documents (Product) with explicit options
        InsertManyOptions options = new InsertManyOptions()
                .chunkSize(20)  // documents per request, 20 is the maximum
                .concurrency(1) // number of chunks sent in parallel
                .ordered(false) // unordered, which allows parallel processing
                .timeout(1000); // timeout in milliseconds

        InsertManyResult res2 = collectionProduct.insertMany(
                List.of(new Product("1", "joe"),
                        new Product("2", "joe")),
                options);
        System.out.println("Identifiers inserted: " + res2.getInsertedIds());
    }
}

The API accepts up to 20 documents per request.
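If you build insertMany requests yourself rather than through a client, you can enforce the 20-document limit before anything is sent. The following Python sketch only constructs the request with the standard library's `urllib.request`; the helper name `build_insert_many_request` and the endpoint, keyspace, collection, and token values are placeholders assumed for illustration, and the URL simply mirrors the endpoint/keyspace/collection path used in the cURL example.

```python
import json
import urllib.request
from typing import Any, Dict, List

MAX_DOCUMENTS_PER_REQUEST = 20  # documented insertMany limit


def build_insert_many_request(
    endpoint: str,
    keyspace: str,
    collection: str,
    token: str,
    documents: List[Dict[str, Any]],
    ordered: bool = False,
) -> urllib.request.Request:
    """Build, but do not send, an insertMany POST request."""
    if len(documents) > MAX_DOCUMENTS_PER_REQUEST:
        raise ValueError(
            f"insertMany accepts at most {MAX_DOCUMENTS_PER_REQUEST} documents per request"
        )
    body = {"insertMany": {"documents": documents, "options": {"ordered": ordered}}}
    return urllib.request.Request(
        url=f"{endpoint}/{keyspace}/{collection}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# Hypothetical placeholder values; nothing is sent over the network here.
request = build_insert_many_request(
    "https://example.invalid/api", "my_keyspace", "my_collection", "MY_TOKEN",
    [{"_id": "2", "purchase_type": "Online", "amount": 53990}],
)
print(request.get_method(), request.full_url)
# POST https://example.invalid/api/my_keyspace/my_collection
```

Passing the request to `urllib.request.urlopen` would send it. For more than 20 documents, split the list into chunks first, as the Java client does.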

The following Data API insertMany command adds 20 documents to a collection.

curl -s --location \
--request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "insertMany": {
    "documents": [
      {
        "_id": "2",
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
        "customer": {
          "name": "Jack B.",
          "phone": "123-456-2222",
        "age": 34,
        "credit_score": 700,
          "address": {
            "address_line": "888 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1690391491},
        "seller": {
          "name": "Tammy S.",
          "location": "Staten Island NYC"
        },
        "items": [
            {
          "car" : "Tesla Model 3",
          "color": "White"
            },
            "Extended warranty - 10 years",
            "Service - 5 years"
        ],
        "amount": 53990,
      "status" : "active"
      },
      {
        "_id": "3",
        "purchase_type": "Online",
        "$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
        "customer": {
          "name": "Jill D.",
          "phone": "123-456-3333",
        "age": 30,
        "credit_score": 742,
          "address": {
            "address_line": "12345 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1690564291},
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": "Extended warranty - 10 years",
        "amount": 4600,
      "status" : "active"
      },
      {
        "_id": "4",
        "purchase_type": "In Person",
        "$vector": [0.25, 0.25, 0.25, 0.25, 0.26],
        "customer": {
          "name": "Lester M.",
          "phone": "123-456-4444",
        "age": 40,
        "credit_score": 802,
          "address": {
            "address_line": "12346 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1690909891},
        "seller": {
          "name": "Jon B.",
          "location": "Manhattan NYC"
        },
        "items": [
            {
          "car" : "BMW 330i Sedan",
          "color": "Red"
            },
            "Extended warranty - 5 years",
            "Service - 5 years"
        ],
        "amount": 48510,
      "status" : "active"
      },
      {
        "_id": "5",
        "purchase_type": "Online",
        "$vector": [0.25, 0.045, 0.38, 0.31, 0.67],
        "customer": {
          "name": "David C.",
          "phone": "123-456-5555",
        "age": 50,
        "credit_score": 800,
          "address": {
            "address_line": "32345 Main Ave",
            "city": "Jersey City",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1690996291},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": [
          {
          "car" : "Tesla Model S",
          "color": "Red"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 94990,
      "status" : "active"
      },
      {
        "_id": "6",
        "purchase_type": "In Person",
        "$vector": [0.11, 0.02, 0.78, 0.10, 0.27],
        "customer": {
          "name": "Chris E.",
          "phone": "123-456-6666",
        "age": 43,
        "credit_score": 764,
          "address": {
            "address_line": "32346 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1691860291},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": [
          {
          "car" : "Tesla Model X",
          "color": "Blue"
            }
        ],
        "amount": 109990,
      "status" : "active"
      },
      {
        "_id": "7",
        "purchase_type": "Online",
        "$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
        "customer": {
          "name": "Jeff G.",
          "phone": "123-456-7777",
        "age": 66,
        "credit_score": 802,
          "address": {
            "address_line": "22999 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1692119491},
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": [{
          "car" : "BMW M440i Gran Coupe",
          "color": "Black"
            },
            "Extended warranty - 5 years"],
        "amount": 61050,
      "status" : "active"
      },
      {
        "_id": "8",
        "purchase_type": "In Person",
        "$vector": [0.3, 0.23, 0.15, 0.17, 0.4],
        "customer": {
          "name": "Harold S.",
          "phone": "123-456-8888",
        "age": 29,
        "credit_score": 710,
          "address": {
            "address_line": "1234 Main St",
            "city": "Orange",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1693329091},
        "seller": {
          "name": "Tammy S.",
          "location": "Staten Island NYC"
        },
        "items": [{
          "car" : "BMW X3 SUV",
          "color": "Black"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 46900,
      "status" : "active"
      },
      {
        "_id": "9",
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.06],
        "customer": {
          "name": "Richard Z.",
          "phone": "123-456-9999",
        "age": 22,
        "credit_score": 690,
          "address": {
            "address_line": "22345 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1693588291},
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": [{
          "car" : "Tesla Model 3",
          "color": "White"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 53990,
      "status" : "active"
      },
      {
        "_id": "10",
        "purchase_type": "In Person",
        "$vector": [0.25, 0.045, 0.38, 0.31, 0.68],
        "customer": {
          "name": "Eric B.",
          "phone": null,
        "age": 54,
        "credit_score": 780,
          "address": {
            "address_line": "9999 River Rd",
            "city": "Fair Haven",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1694797891},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": [{
          "car" : "Tesla Model S",
          "color": "Black"
            }
        ],
        "amount": 93800,
      "status" : "active"
      },
      {
        "_id": "11",
        "purchase_type": "Online",
        "$vector": [0.44, 0.11, 0.33, 0.22, 0.88],
        "customer": {
          "name": "Ann J.",
          "phone": "123-456-1112",
        "age": 47,
        "credit_score": 660,
          "address": {
            "address_line": "99 Elm St",
            "city": "Fair Lawn",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1695921091},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": [{
          "car" : "Tesla Model Y",
          "color": "White"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 57500,
      "status" : "active"
      },
      {
        "_id": "12",
        "purchase_type": "In Person",
        "$vector": [0.33, 0.44, 0.55, 0.77, 0.66],
        "customer": {
          "name": "John T.",
          "phone": "123-456-1123",
        "age": 55,
        "credit_score": 786,
          "address": {
            "address_line": "23 Main Blvd",
            "city": "Staten Island",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1696180291},
        "seller": {
          "name": "Tammy S.",
          "location": "Staten Island NYC"
        },
        "items": [{
          "car" : "BMW 540i xDrive Sedan",
          "color": "Black"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 64900,
      "status" : "active"
      },
      {
        "_id": "13",
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.07],
        "customer": {
          "name": "Aaron W.",
          "phone": "123-456-1133",
        "age": 60,
        "credit_score": 702,
          "address": {
            "address_line": "1234 4th Ave",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1697389891},
        "seller": {
          "name": "Jon B.",
          "location": "Manhattan NYC"
        },
        "items": [{
          "car" : "Tesla Model 3",
          "color": "White"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 55000,
      "status" : "active"
      },
      {
        "_id": "14",
        "purchase_type": "In Person",
        "$vector": [0.11, 0.02, 0.78, 0.21, 0.27],
        "customer": {
          "name": "Kris S.",
          "phone": "123-456-1144",
        "age": 44,
        "credit_score": 702,
          "address": {
            "address_line": "1414 14th Pl",
            "city": "Brooklyn",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1698513091},
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": [{
          "car" : "Tesla Model X",
          "color": "White"
            }
        ],
        "amount": 110400,
      "status" : "active"
      },
      {
        "_id": "15",
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.08],
        "customer": {
          "name": "Maddy O.",
          "phone": "123-456-1155",
        "age": 41,
        "credit_score": 782,
          "address": {
            "address_line": "1234 Maple Ave",
            "city": "West New York",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1701191491},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": {
          "car" : "Tesla Model 3",
          "color": "White"
            },
        "amount": 52990,
      "status" : "active"
      },
      {
        "_id": "16",
        "purchase_type": "In Person",
        "$vector": [0.44, 0.11, 0.33, 0.22, 0.88],
        "customer": {
          "name": "Tim C.",
          "phone": "123-456-1166",
        "age": 38,
        "credit_score": 700,
          "address": {
            "address_line": "1234 Main St",
            "city": "Staten Island",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1701450691},
        "seller": {
          "name": "Tammy S.",
          "location": "Staten Island NYC"
        },
        "items": [{
          "car" : "Tesla Model Y",
          "color": "White"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 58990,
      "status" : "active"
      },
      {
        "_id": "17",
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.09],
        "customer": {
          "name": "Yolanda Z.",
          "phone": "123-456-1177",
        "age": 61,
        "credit_score": 694,
          "address": {
            "address_line": "1234 Main St",
            "city": "Hoboken",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1702660291},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": [{
          "car" : "Tesla Model 3",
          "color": "Blue"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 54900,
      "status" : "active"
      },
      {
        "_id": "18",
        "purchase_type": "Online",
        "$vector": [0.15, 0.17, 0.15, 0.43, 0.55],
        "customer": {
          "name": "Thomas D.",
          "phone": "123-456-1188",
        "age": 45,
        "credit_score": 724,
          "address": {
            "address_line": "98980 20th St",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1703092291},
        "seller": {
          "name": "Jon B.",
          "location": "Manhattan NYC"
        },
        "items": [{
          "car" : "BMW 750e xDrive Sedan",
          "color": "Black"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 106900,
      "status" : "active"
      },
      {
        "_id": "19",
        "purchase_type": "Online",
        "$vector": [0.25, 0.25, 0.25, 0.25, 0.27],
        "customer": {
          "name": "Vivian W.",
          "phone": "123-456-1199",
        "age": 20,
        "credit_score": 698,
          "address": {
            "address_line": "5678 Elm St",
            "city": "Hartford",
            "state": "CT"
          }
        },
        "purchase_date": {"$date": 1704215491},
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": [{
          "car" : "BMW 330i Sedan",
          "color": "White"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 46980,
      "status" : "active"
      },
      {
        "_id": "20",
        "purchase_type": "In Person",
        "$vector": [0.44, 0.11, 0.33, 0.22, 0.88],
        "customer": {
          "name": "Leslie E.",
          "phone": null,
        "age": 44,
        "credit_score": 782,
          "address": {
            "address_line": "1234 Main St",
            "city": "Newark",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1705338691},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": [{
          "car" : "Tesla Model Y",
          "color": "Black"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 59800,
      "status" : "active"
      },
      {
        "_id": "21",
        "purchase_type": "In Person",
        "$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
        "customer": {
          "name": "Rachel I.",
          "phone": null,
        "age": 62,
        "credit_score": 786,
          "address": {
            "address_line": "1234 Park Ave",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1706202691},
        "seller": {
          "name": "Jon B.",
          "location": "Manhattan NYC"
        },
        "items": [{
          "car" : "BMW M440i Gran Coupe",
          "color": "Silver"
            },
            "Extended warranty - 5 years",
            "Gap Insurance - 5 years"
        ],
        "amount": 65250,
      "status" : "active"
      }
    ],
    "options": {
        "ordered": false
    }
  }
}' | json_pp
Response
{
   "status" : {
      "insertedIds" : [
         "4",
         "7",
         "10",
         "13",
         "16",
         "19",
         "21",
         "18",
         "6",
         "12",
         "15",
         "9",
         "3",
         "11",
         "2",
         "17",
         "14",
         "8",
         "20",
         "5"
      ]
   }
}
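Because the request sets "ordered": false, the insertedIds in the response above do not follow the order in which the documents were submitted. To confirm that every document was inserted, compare the IDs as sets rather than as lists. A minimal Python sketch, where `missing_ids` is a hypothetical helper:

```python
from typing import Iterable, Set


def missing_ids(submitted: Iterable[str], inserted: Iterable[str]) -> Set[str]:
    """Return submitted _id values that are absent from the response, ignoring order."""
    return set(submitted) - set(inserted)


# _id values "2" through "21" from the request above
submitted = [str(n) for n in range(2, 22)]
# insertedIds copied from the response above
inserted = ["4", "7", "10", "13", "16", "19", "21", "18", "6", "12",
            "15", "9", "3", "11", "2", "17", "14", "8", "20", "5"]

print(missing_ids(submitted, inserted))  # set()
print(submitted == inserted)             # False: same IDs, different order
```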

Properties:

Name Type Summary

insertMany

command

The Data API command that inserts multiple documents. You can insert up to 20 documents per request.

documents

array

The documents to insert. Each element is a JSON object that contains the details of one record.

_id

uuid4

A unique identifier for the document. Other _id types are possible based on how the collection was created, including a defaultId option on a createCollection command or an equivalent method in a client. See the defaultId option.

purchase_type

string

Specifies how the purchase was made.

$vector

array

A reserved property that stores vector data: an array of numbers that you supply directly or that an embedding service generates from $vectorize. The array length must match the collection's vector dimension; the sample vectors here have five components. Vectors support operations such as similarity search and clustering. Because $vector is a reserved property, the vector-enabled DataStax Enterprise (DSE) database handles it specially and optimizes vector similarity queries.

customer

JSON object

Information about the customer who made the purchase.

customer.name

string

The customer’s name.

customer.phone

string

The customer’s contact phone number.

customer.age

number

The customer’s age. Subsequent examples can use the customer.age property to demonstrate range and logical query operators such as $gt, $lt, $gte, $lte, and $not.

customer.credit_score

number

The customer’s credit score at the time of the car’s purchase. Subsequent examples can use customer.credit_score to demonstrate range and logical query operators.

customer.address

JSON object

Contains further details about the customer’s address.

customer.address.address_line

string

The customer’s street or location address.

customer.address.city

string

The customer’s city.

customer.address.state

string

The state where the customer resides.

purchase_date

date

The date on which the purchase was made, expressed with the $date operator as a Unix epoch timestamp in milliseconds.

seller

JSON object

Information about the seller from whom the purchase was made.

seller.name

string

The seller’s name.

seller.location

string

The seller’s location.

items

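array

An array detailing the items included in this purchase. Elements can be JSON objects, such as a car with its color, or strings, such as a warranty. In some sample documents, items holds a single string or JSON object instead of an array.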

items.car

string

Information about the make and model of the car.

items.color

string

Information about the car’s color.

Extended warranty - 5 years

string

Additional detail that’s part of the items array. Indicates the customer has an "Extended warranty - 5 years" as part of the purchase.

amount

number

The total cost of the purchase.

status

string

Current status of the purchase.

preferred_customer

boolean

Whether the buyer is a preferred customer.
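The purchase_date property uses the $date operator, which the Data API reads as milliseconds since the Unix epoch. When you write raw JSON, as in the cURL example, you must produce that number yourself; the Python client converts datetime values for you (see Working with dates). A minimal Python sketch of the conversion in both directions, where `to_date_field` and `from_date_field` are hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_date_field(moment: datetime) -> Dict[str, int]:
    """Encode a timezone-aware datetime as a {"$date": <epoch milliseconds>} value."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid ambiguity")
    # Integer division of timedeltas avoids floating-point rounding.
    return {"$date": (moment - EPOCH) // timedelta(milliseconds=1)}


def from_date_field(value: Dict[str, int]) -> datetime:
    """Decode a {"$date": <epoch milliseconds>} value into a UTC datetime."""
    return EPOCH + timedelta(milliseconds=value["$date"])


purchase = datetime(2023, 7, 26, 17, 11, 31, tzinfo=timezone.utc)
print(to_date_field(purchase))                 # {'$date': 1690391491000}
print(from_date_field({"$date": 1690391491}))  # 1970-01-20 13:33:11.491000+00:00
```

The second line shows why the unit matters: the sample purchase_date values, such as 1690391491, appear to be expressed in seconds, and read as milliseconds they fall in January 1970. Multiply such values by 1000 if you intend mid-2023 dates.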

Find a document

Retrieve a single document from a collection using various options.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

Retrieve a single document from a collection by its _id.

document = collection.find_one({"_id": 101})

Retrieve a single document from a collection by any attribute, as long as it is covered by the collection’s indexing configuration.

As noted in The Indexing option in the Collections reference topic, any field used in a filter or sort operation must be indexed. If you chose not to index some or all fields when you created the collection, you cannot reference those fields in a filter or sort.

document = collection.find_one({"location": "warehouse_C"})

Retrieve a single document from a collection by an arbitrary filtering clause.

document = collection.find_one({"tag": {"$exists": True}})
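To make the filter syntax more concrete, the following Python sketch evaluates a small subset of it ($and, $eq, $lt, $gt, $exists, and dotted paths such as customer.age) against plain dictionaries in memory. It is an illustration only: `matches`, `check`, and `get_path` are hypothetical helpers, condition dictionaries are assumed to contain operators only, and the Data API evaluates filters on the server with semantics (for example, for comparisons across types and for arrays) that this sketch does not reproduce.

```python
from typing import Any, Dict

Document = Dict[str, Any]
_MISSING = object()  # sentinel for "field not present"


def get_path(doc: Document, path: str) -> Any:
    """Follow a dotted path such as "customer.age"; return _MISSING if any step is absent."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def check(value: Any, op: str, operand: Any) -> bool:
    """Evaluate one operator against a field value."""
    if op == "$exists":
        return (value is not _MISSING) == operand
    if value is _MISSING:
        return False
    try:
        if op == "$eq":
            return value == operand
        if op == "$lt":
            return value < operand
        if op == "$gt":
            return value > operand
    except TypeError:
        return False  # this sketch treats incomparable types as non-matching
    raise ValueError(f"operator not covered by this sketch: {op}")


def matches(doc: Document, flt: Dict[str, Any]) -> bool:
    """Return True if doc satisfies every clause of flt."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in condition):
                return False
            continue
        value = get_path(doc, key)
        # {"name": "John"} is shorthand for {"name": {"$eq": "John"}}
        ops = condition if isinstance(condition, dict) else {"$eq": condition}
        if not all(check(value, op, operand) for op, operand in ops.items()):
            return False
    return True


docs = [
    {"_id": 1, "location": "warehouse_C", "tag": "A"},
    {"_id": 2, "location": "warehouse_B", "price": 50},
]
print([d["_id"] for d in docs if matches(d, {"tag": {"$exists": True}})])  # [1]
print([d["_id"] for d in docs if matches(d, {"price": {"$lt": 100}})])     # [2]
```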

Retrieve the most similar document to a given vector.

result = collection.find_one({}, vector=[.12, .52, .32])

Generate a vector and retrieve the most similar document.

result = collection.find_one({}, vectorize="Text to vectorize")

Retrieve only specific fields from a document.

result = collection.find_one({"_id": 101}, projection={"name": True})

Returns:

Union[Dict[str, Any], None] - Either the found document as a dictionary or None if no matching document is found.

Example response
{'_id': 101, 'name': 'John Doe', '$vector': [0.12, 0.52, 0.32]}

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

Used to select a subset of fields in the documents being returned. The projection can be: an iterable over the included field names; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to exclude specific fields from the response. Special document fields (e.g. _id, $vector) are controlled individually. The default projection does not necessarily include all fields of the document. See the projection examples for more on this parameter.

vector

Optional[Iterable[float]]

A vector, that is, a list of floats with the collection's dimensionality, to use for a vector search. This performs an Approximate Nearest Neighbor (ANN) search and returns the most similar document that matches the filter. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

vectorize

Optional[str]

A string to vectorize before performing a vector search. This only works for collections associated with an embedding service. This parameter cannot be used together with vector.

include_similarity

Optional[bool]

A boolean to request the numeric value of the similarity to be returned as an added "$similarity" key in the returned document. Can only be used for vector ANN search, i.e. when either vector is supplied or the sort parameter has the shape {"$vector": …}.

sort

Optional[Dict[str, Any]]

With this dictionary parameter one can control the order the documents are returned. See the discussion about sorting for details.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.
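The inclusion and exclusion forms of projection can be illustrated with a small local function. This Python sketch is not how the Data API evaluates projections: `apply_projection` is a hypothetical helper that handles top-level fields only, keeps _id unless it is explicitly set to False (an assumption made for this example), and treats special fields such as $vector like any other key, whereas the Data API controls them individually.

```python
from typing import Any, Dict, Iterable, Union

Projection = Union[Iterable[str], Dict[str, bool]]


def apply_projection(doc: Dict[str, Any], projection: Projection) -> Dict[str, Any]:
    """Apply an inclusion or exclusion projection to the top-level fields of doc."""
    if not isinstance(projection, dict):
        # An iterable of field names means "include these fields".
        projection = {name: True for name in projection}
    include = {k for k, v in projection.items() if v and k != "_id"}
    exclude = {k for k, v in projection.items() if not v and k != "_id"}
    if include and exclude:
        raise ValueError("this sketch does not support mixing included and excluded fields")
    # _id is kept unless it is explicitly excluded.
    result: Dict[str, Any] = {}
    if projection.get("_id", True) and "_id" in doc:
        result["_id"] = doc["_id"]
    for key, value in doc.items():
        if key == "_id":
            continue
        if (include and key in include) or (not include and key not in exclude):
            result[key] = value
    return result


doc = {"_id": 101, "name": "John Doe", "age": 30}
print(apply_projection(doc, {"name": True}))                # {'_id': 101, 'name': 'John Doe'}
print(apply_projection(doc, ["name"]))                      # {'_id': 101, 'name': 'John Doe'}
print(apply_projection(doc, {"age": False}))                # {'_id': 101, 'name': 'John Doe'}
print(apply_projection(doc, {"_id": False, "name": True}))  # {'name': 'John Doe'}
```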

Example:

import astrapy  # for astrapy.constants.SortDocuments

collection.find_one()
# prints: {'_id': '68d1e515-...', 'seq': 37}
collection.find_one({"seq": 10})
# prints: {'_id': 'd560e217-...', 'seq': 10}
collection.find_one({"seq": 1011})
# (returns None for no matches)
collection.find_one(projection={"seq": False})
# prints: {'_id': '68d1e515-...'}
collection.find_one(
    {},
    sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
# prints: {'_id': '97e85f81-...', 'seq': 69}
collection.find_one(vector=[1, 0], projection={"*": True})
# prints: {'_id': '...', 'tag': 'D', '$vector': [4.0, 1.0]}
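To see what "most similar" means in the vector examples above, here is a brute-force Python sketch that ranks documents by cosine similarity. It is only an illustration: it assumes the cosine metric (a collection can be configured with another metric), performs an exact search where the Data API performs an Approximate Nearest Neighbor (ANN) search, and reports a raw cosine score whose scale may differ from the $similarity value the Data API returns. The documents and the `cosine` and `most_similar` helpers are hypothetical.

```python
import math
from typing import Any, Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors with the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(docs: List[Dict[str, Any]], query: Sequence[float]) -> Optional[Dict[str, Any]]:
    """Exact (brute-force) nearest neighbor; documents without $vector are skipped."""
    candidates = [d for d in docs if "$vector" in d]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: cosine(d["$vector"], query))
    # Attach the raw cosine score; the real $similarity scale may differ.
    return {**best, "$similarity": cosine(best["$vector"], query)}


docs = [
    {"_id": "a", "tag": "D", "$vector": [4.0, 1.0]},
    {"_id": "b", "tag": "E", "$vector": [0.0, 1.0]},
    {"_id": "c", "tag": "F"},  # no vector, so never a candidate
]
print(most_similar(docs, [1, 0])["tag"])  # D
```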

View this topic in more detail on the API Reference.

Retrieve a single document from a collection by its _id.

const doc = await collection.findOne({ _id: '101' });

Retrieve a single document from a collection by any attribute, as long as it is covered by the collection’s indexing configuration.

As noted in The Indexing option in the Collections reference topic, any field used in a filter or sort operation must be indexed. If you chose not to index some or all fields when you created the collection, you cannot reference those fields in a filter or sort.

const doc = await collection.findOne({ location: 'warehouse_C' });

Retrieve a single document from a collection by an arbitrary filtering clause.

const doc = await collection.findOne({ tag: { $exists: true } });

Retrieve the most similar document to a given vector.

const doc = await collection.findOne({}, { vector: [.12, .52, .32] });

Generate a vector and retrieve the most similar document.

const doc = await collection.findOne({}, { vectorize: 'Text to vectorize' });

Retrieve only specific fields from a document.

const doc = await collection.findOne({ _id: '101' }, { projection: { name: 1 } });

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to find.

options?

FindOneOptions

The options for this operation.

Options (FindOneOptions):

Name Type Summary

projection?

Projection

Specifies which fields should be included/excluded in the returned documents. Defaults to including all fields.

When specifying a projection, it’s the user’s responsibility to handle the return type carefully. Consider type-casting.

includeSimilarity?

boolean

Requests the numeric value of the similarity to be returned as an added $similarity key in the returned document.

Can only be used when performing a vector search.

sort?

Sort

Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk.

vector?

number[]

An optional vector to use to perform a vector search on the collection to find the closest matching document.

Equivalent to setting the $vector field in the sort object itself. The two are interchangeable but mutually exclusive.

If you really need to use both, you can set the $vector field in the sort object directly.

vectorize?

string

A string to vectorize before performing a vector search. This only works for collections associated with an embedding service. This parameter cannot be used together with vector.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete.

Returns:

Promise<FoundDoc<Schema> | null> - A promise that resolves to the found document (including $similarity, if requested), or null if no matching document is found.

Example:

(async function () {
  // Insert some documents
  await collection.insertMany([
    { name: 'John', age: 30, $vector: [1, 1, 1, 1, 1] },
    { name: 'Jane', age: 25, },
    { name: 'Dave', age: 40, },
  ]);

  // Unpredictably prints one of their names
  const unpredictable = await collection.findOne({});
  console.log(unpredictable?.name);

  // Failed find by name (null)
  const failed = await collection.findOne({ name: 'Carrie' });
  console.log(failed);

  // Find by $gt age (Dave)
  const dave = await collection.findOne({ age: { $gt: 30 } });
  console.log(dave?.name);

  // Find by sorting by age (Jane)
  const jane = await collection.findOne({}, { sort: { age: 1 } });
  console.log(jane?.name);

  // Find by vector similarity (John, 1)
  const john = await collection.findOne({}, { vector: [1, 1, 1, 1, 1], includeSimilarity: true });
  console.log(john?.name, john?.$similarity);
})();

  • Operations on documents are performed at the Collection level. For details on each signature, see the Collection Javadoc.

  • Collection is a generic class whose default type is Document. You can specify your own type, and the client serializes your objects with Jackson.

  • Most methods have synchronous and asynchronous variants. The asynchronous version has the suffix Async and returns a CompletableFuture.

// Synchronous
Optional<T> findOne(Filter filter);
Optional<T> findOne(Filter filter, FindOneOptions options);
Optional<T> findById(Object id); // builds the _id filter for you

// Asynchronous
CompletableFuture<Optional<T>> findOneAsync(Filter filter);
CompletableFuture<Optional<T>> findOneAsync(Filter filter, FindOneOptions options);
CompletableFuture<Optional<T>> findByIdAsync(Object id);

Returns:

Optional<T> - The document matching the filter, or Optional.empty() if no document matches.

Parameters:

Name Type Summary

filter

Filter

Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression.

options (optional)

FindOneOptions

Sets the options for the findOne operation: a sort clause, a projection to retrieve parts of the documents, and a flag to include the similarity score in a vector search.

Things you must know about Data API requests:

  • A filter is a JSON expression that accepts the operators listed on the Data API command page.

  • A projection is a list of flags that indicate whether to retrieve each field.

  • The sort clause is used either for similarity search or to order results.

  • Use options to specify whether to include the similarity score in the result.

{
  "findOne": {
    "filter": {
     "$and": [
        {"field2": {"$gt": 10}},
        {"field3": {"$lt": 20}},
        {"field4": {"$eq": "value"}}
     ]
    },
    "projection": {
      "_id": 0,
      "field": 1,
      "field2": 1,
      "field3": 1
    },
    "sort": {
      "$vector": [ 0.25, 0.25, 0.25,0.25, 0.25]
    },
    "options": {
      "includeSimilarity": true
    }
  }
}

The following Java snippet runs the same query:

collection.findOne(
  Filters.and(
   Filters.gt("field2", 10),
   Filters.lt("field3", 20),
   Filters.eq("field4", "value")
  ),
  new FindOneOptions()
   .projection(Projections.include("field", "field2", "field3"))
   .projection(Projections.exclude("_id"))
   .sort(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f})
   .includeSimilarity()
);

// Same query, written with static imports
collection.findOne(
  and(
   gt("field2", 10),
   lt("field3", 20),
   eq("field4", "value")
  ),
  new FindOneOptions()
   .sort(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f})
   .projection(Projections.include("field", "field2", "field3"))
   .projection(Projections.exclude("_id"))
   .includeSimilarity()
);

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindOneOptions;

import java.util.Optional;

import static com.datastax.astra.client.model.Filters.and;
import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Filters.gt;
import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Projections.exclude;
import static com.datastax.astra.client.model.Projections.include;

public class FindOne {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // findOne with a filter, a projection, a vector sort, and the similarity score
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));
        FindOneOptions options = new FindOneOptions()
                .projection(include("field", "field2", "field3"))
                .projection(exclude("_id"))
                .sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
                .includeSimilarity();
        Optional<Document> result = collection.findOne(filter, options);
        result.ifPresent(System.out::println);

        // with the import Static Magic
        collection.findOne(and(
                gt("field2", 10),
                lt("field3", 20),
                eq("field4", "value")),
                new FindOneOptions().sort(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f})
                .projection(include("field", "field2", "field3"))
                .projection(exclude("_id"))
                .includeSimilarity()
        );

        // find one with a vectorize
        collection.findOne(and(
                        gt("field2", 10),
                        lt("field3", 20),
                        eq("field4", "value")),
                new FindOneOptions().sort("Life is too short to be living somebody else's dream.")
                        .projection(include("field", "field2", "field3"))
                        .projection(exclude("_id"))
                        .includeSimilarity()
        );

        collection.insertOne(new Document()
                .append("field", "value")
                .append("field2", 15)
                .append("field3", 15)
                .vectorize("Life is too short to be living somebody else's dream."));

    }
}

This Data API findOne command retrieves a single document by filtering on a specific _id value.

curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "findOne": {
    "filter": {"_id" : "14"}
  }
}' | json_pp

Result:

{
   "data" : {
      "document" : {
         "$vector" : [
            0.11,
            0.02,
            0.78,
            0.21,
            0.27
         ],
         "_id" : "14",
         "amount" : 110400,
         "customer" : {
            "address" : {
               "address_line" : "1414 14th Pl",
               "city" : "Brooklyn",
               "state" : "NY"
            },
            "age" : 44,
            "credit_score" : 702,
            "name" : "Kris S.",
            "phone" : "123-456-1144"
         },
         "items" : [
            {
               "car" : "Tesla Model X",
               "color" : "White"
            }
         ],
         "purchase_date" : {
            "$date" : 1698513091
         },
         "purchase_type" : "In Person",
         "seller" : {
            "location" : "Brooklyn NYC",
            "name" : "Jasmine S."
         },
         "status" : "active"
      }
   }
}
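The cURL call above is a plain HTTP POST with a JSON body. As a rough sketch, the same request can be assembled with Python's standard library. The endpoint, keyspace, collection, and token values are placeholders, the URL path simply mirrors the one used in the cURL commands on this page, and `build_find_one_request` is a hypothetical helper, not part of any Data API client. The request is only built here, not sent.

```python
import json
import urllib.request


def build_find_one_request(endpoint, keyspace, collection, token, filter_):
    """Build (but do not send) a findOne request like the cURL example above.

    Hypothetical helper: the URL layout copies the cURL command on this page.
    """
    url = f"{endpoint}/{keyspace}/{collection}"
    body = {"findOne": {"filter": filter_}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# Placeholder values: substitute your own endpoint, keyspace, collection and token.
request = build_find_one_request(
    "https://DB_API_ENDPOINT.example",
    "DB_KEYSPACE",
    "DB_COLLECTION",
    "DB_APPLICATION_TOKEN",
    {"_id": "14"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(request)
```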

Find documents using filtering options

Iterate over documents in a collection matching a given filter.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

doc_iterator = collection.find({"category": "house_appliance"}, limit=10)

Iterate over the documents most similar to a given query vector.

doc_iterator = collection.find({}, vector=[0.55, -0.40, 0.08], limit=5)

Generate a vector and iterate over the documents most similar to it.

doc_iterator = collection.find({}, vectorize="Text to vectorize", limit=5)

Returns:

Cursor - A cursor for iterating over documents. An AstraPy cursor can be used in a for loop, and provides a few additional features.

Example response
Cursor("vector_collection", new, retrieved: 0)

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

Used to select a subset of fields in the documents being returned. The projection can be: an iterable over the included field names; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to exclude specific fields from the response. Special document fields (e.g. _id, $vector) are controlled individually. The default projection does not necessarily include all fields of the document. See the projection examples for more on this parameter.

skip

Optional[int]

The number of documents to discard from the start of the results: the first skip matches are dropped, and the returned documents start from the (skip+1)-th one. This parameter can only be used together with an explicit ascending/descending sort criterion; it cannot be used without sorting, nor with vector-based ANN search.

limit

Optional[int]

The maximum number of documents to return. Once limit is reached, or the cursor runs out of matching documents, nothing more is returned.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to perform vector search; that is, Approximate Nearest Neighbors (ANN) search. When running similarity search on a collection, no other sorting criteria can be specified. Moreover, there is an upper bound to the number of documents that can be returned. For details, see the Data API Limits.

vectorize

Optional[str]

A string to vectorize before performing a vector search. This only works for collections associated with an embedding service. This parameter cannot be used together with vector.

include_similarity

Optional[bool]

A boolean that requests the numeric value of the similarity to be returned as an added "$similarity" key in each returned document. It can only be used for vector ANN search, that is, when either vector is supplied or the sort parameter has the shape {"$vector": …}.

sort

Optional[Dict[str, Any]]

With this dictionary parameter one can control the order the documents are returned. See the discussion about sorting, including the note on upper bounds on the number of visited documents, for details.

max_time_ms

Optional[int]

A timeout, in milliseconds, for each underlying HTTP request used to fetch documents as you iterate over the cursor. This method uses the collection-level timeout by default.

Example:

# Find all documents in the collection
list(collection.find({}))

# Find all documents in the collection with a specific field value
list(collection.find({
  "a": 123,
}))

# Find all documents in the collection that match a compound filter expression
list(collection.find({
  "$and": [
    {"f1": 1},
    {"f2": 2},
  ]
}))

# Same as the preceding example, but using the implicit AND operator
list(collection.find({
  "f1": 1,
  "f2": 2,
}))

# Use the "less than" operator in the filter expression
list(collection.find({
  "$and": [
    {"name": "John"},
    {"price": {"$lt": 100}},
  ]
}))

View this topic in more detail on the API Reference.

const cursor = collection.find({ category: 'house_appliance' }, { limit: 10 });

Iterate over the documents most similar to a given query vector.

const cursor = collection.find({}, { vector: [0.55, -0.40, 0.08], limit: 5 });

Generate a vector and iterate over the documents most similar to it.

const cursor = collection.find({}, { vectorize: 'Text to vectorize', limit: 5 });

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the documents to find.

options?

FindOptions

The options for this operation.

Options (FindOptions):

Name Type Summary

projection?

Projection

Specifies which fields should be included/excluded in the returned documents. Defaults to including all fields.

When specifying a projection, it’s the user’s responsibility to handle the return type carefully. Consider type-casting.

includeSimilarity?

boolean

Requests the numeric value of the similarity to be returned as an added $similarity key in each returned document.

Can only be used when performing a vector search.

sort?

Sort

Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk.

vector?

number[]

An optional vector to use to perform a vector search on the collection to find the closest matching document.

Equivalent to setting the $vector field in the sort object itself. The two are interchangeable, but mutually exclusive.

If you really need to use both, you can set the $vector field in the sort object directly.

vectorize?

string

A string to vectorize before performing a vector search. This only works for collections associated with an embedding service. This parameter cannot be used together with vector.

skip?

number

The number of documents to skip before returning the first document.

limit?

number

The maximum number of documents to return in the lifetime of the cursor.

maxTimeMS?

number

The maximum time, in milliseconds, that the client should wait for each of the underlying HTTP requests to complete.

Returns:

FindCursor<FoundDoc<Schema>> - A cursor for iterating over the matching documents.

Example:

(async function () {
  // Insert some documents
  await collection.insertMany([
    { name: 'John', age: 30, $vector: [1, 1, 1, 1, 1] },
    { name: 'Jane', age: 25, },
    { name: 'Dave', age: 40, },
  ]);

  // Gets all 3 in some order
  const unpredictable = await collection.find({}).toArray();
  console.log(unpredictable);

  // Failed find by name ([])
  const matchless = await collection.find({ name: 'Carrie' }).toArray();
  console.log(matchless);

  // Find by $gt age (John, Dave)
  const gtAgeCursor = collection.find({ age: { $gt: 25 } });
  for await (const doc of gtAgeCursor) {
    console.log(doc.name);
  }

  // Find by sorting by age (Jane, John, Dave)
  const sortedAgeCursor = collection.find({}, { sort: { age: 1 } });
  await sortedAgeCursor.forEach(console.log);

  // Find first by vector similarity (John, 1)
  const john = await collection.find({}, { vector: [1, 1, 1, 1, 1], includeSimilarity: true }).next();
  console.log(john?.name, john?.$similarity);
})();
  • Operations on documents are performed at the Collection level. For details on each signature, see the Collection Javadoc.

  • Collection is a generic class whose default type is Document, but you can specify your own type; objects are then serialized with Jackson.

  • Most methods come in synchronous and asynchronous flavors; the asynchronous version has the Async suffix and returns a CompletableFuture.

// Synchronous
FindIterable<T> find(Filter filter, FindOptions options);
// Convenience overloads of the signature above
FindIterable<T> find(FindOptions options); // no filter
FindIterable<T> find(Filter filter); // default options
FindIterable<T> find(); // default options + no filters
FindIterable<T> find(float[] vector, int limit); // semantic search
FindIterable<T> find(Filter filter, float[] vector, int limit);

Returns:

FindIterable<T> - A cursor that fetches up to the first 20 documents immediately and fetches the rest as needed. As its name states, it is an Iterable.

Parameters:

Name Type Summary

filter

Filter

Criteria used to filter the documents. The filter is a JSON object that can contain any valid Data API filter expression.

options (optional)

FindOptions

Sets the options for the find operation: a sort clause, a projection to retrieve parts of the documents, and a flag to include the similarity score in the case of a vector search.

The FindIterable is an Iterable and can be used in a for loop to iterate over the documents.

The FindIterable is a lazy iterator: it fetches documents in chunks of 20 and fetches the next chunk only when it is needed.

It provides the method .all() to exhaust it, which should be used with caution on large result sets.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.FindOptions;
import com.datastax.astra.client.model.Sorts;

import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Projections.exclude;
import static com.datastax.astra.client.model.Projections.include;

public class Find {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Find Options
        FindOptions options = new FindOptions()
                .projection(include("field", "field2", "field3")) // select fields
                .projection(exclude("_id")) // exclude some fields
                .sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f}) // similarity vector
                .skip(1) // skip first item
                .limit(10) // stop after 10 items (max records)
                .pageState("pageState") // used for pagination
                .includeSimilarity(); // include similarity

        // Execute a find operation
        FindIterable<Document> result = collection.find(filter, options);

        // Iterate over the result
        for (Document document : result) {
            System.out.println(document);
        }
    }
}

This cURL section contains two examples of Data API find filters.

The first example uses a filter specifying two properties, customer.address.city and customer.address.state, to look for car sales by customers in Hoboken, NJ.

curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "find": {
    "filter": {
      "customer.address.city": "Hoboken",
      "customer.address.state": "NJ"
    }
  }
}' | json_pp

Result:

{
   "data" : {
      "documents" : [
         {
            "$vector" : [
               0.1,
               0.15,
               0.3,
               0.12,
               0.09
            ],
            "_id" : "17",
            "amount" : 54900,
            "customer" : {
               "address" : {
                  "address_line" : "1234 Main St",
                  "city" : "Hoboken",
                  "state" : "NJ"
               },
               "age" : 61,
               "credit_score" : 694,
               "name" : "Yolanda Z.",
               "phone" : "123-456-1177"
            },
            "items" : [
               {
                  "car" : "Tesla Model 3",
                  "color" : "Blue"
               },
               "Extended warranty - 5 years"
            ],
            "purchase_date" : {
               "$date" : 1702660291
            },
            "purchase_type" : "Online",
            "seller" : {
               "location" : "Jersey City NJ",
               "name" : "Jim A."
            },
            "status" : "active"
         }
      ],
      "nextPageState" : null
   }
}

Parameters:

Name Type Summary

find

command

Selects and returns documents from a collection based on specified criteria.

filter

object

Contains the criteria that the find command uses to fetch documents from the database.

customer.address.city and customer.address.state

string

The values queried in this example to find customers from Hoboken, NJ.
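Each find response carries a nextPageState value; null, as in the result above, means there are no more pages. A client can keep requesting pages, passing the previous nextPageState back, until it receives null. The sketch below assumes the page state is sent as "pageState" inside the command's "options" object (the Java example's pageState option points the same way), and it uses a fake in-memory `send` function instead of a real HTTP call. `iterate_find` and `make_fake_send` are hypothetical names.

```python
import copy


def iterate_find(send, command):
    """Lazily yield every document of a find command, one page at a time.

    `send` is any callable that takes a command dict and returns the parsed
    JSON response (hypothetical; in practice, a wrapper around the HTTP POST).
    """
    command = copy.deepcopy(command)  # leave the caller's command untouched
    while True:
        response = send(command)
        yield from response["data"]["documents"]
        page_state = response["data"].get("nextPageState")
        if page_state is None:
            return  # a null nextPageState means there are no more pages
        # Assumption: the next page is requested through options.pageState.
        command["find"].setdefault("options", {})["pageState"] = page_state


def make_fake_send(documents, page_size=20):
    """In-memory stand-in for the Data API; page states are stringified offsets."""
    def fake_send(command):
        state = command["find"].get("options", {}).get("pageState")
        start = int(state) if state is not None else 0
        end = start + page_size
        return {
            "data": {
                "documents": documents[start:end],
                "nextPageState": str(end) if end < len(documents) else None,
            }
        }
    return fake_send


send = make_fake_send([{"_id": str(i)} for i in range(45)])
ids = [doc["_id"] for doc in iterate_find(send, {"find": {"filter": {}}})]
print(len(ids), ids[:3])
```

Because the generator only calls `send` when the current page is used up, the next request is made on demand, the same chunked, lazy behavior described for the Java FindIterable.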

This next Data API find example uses the $and and $or logical operators in a filter. The goal is to find documents where the customer’s city is "Jersey City" or "Orange" AND the seller’s name is "Jim A." or "Tammy S.". For a document to be returned, both these primary conditions (customer’s city and seller’s name) must be met.

curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
    "find": {
        "filter": {
            "$and": [
                {
                    "$or": [
                        {
                            "customer.address.city": "Jersey City"
                        },
                        {
                            "customer.address.city": "Orange"
                        }
                    ]
                },
                {
                    "$or": [
                        {
                            "seller.name": "Jim A."
                        },
                        {
                            "seller.name": "Tammy S."
                        }
                    ]
                }
            ]
        }
    }
}' | json_pp

Result:

{
   "data" : {
      "documents" : [
         {
            "$vector" : [
               0.3,
               0.23,
               0.15,
               0.17,
               0.4
            ],
            "_id" : "8",
            "amount" : 46900,
            "customer" : {
               "address" : {
                  "address_line" : "1234 Main St",
                  "city" : "Orange",
                  "state" : "NJ"
               },
               "age" : 29,
               "credit_score" : 710,
               "name" : "Harold S.",
               "phone" : "123-456-8888"
            },
            "items" : [
               {
                  "car" : "BMW X3 SUV",
                  "color" : "Black"
               },
               "Extended warranty - 5 years"
            ],
            "purchase_date" : {
               "$date" : 1693329091
            },
            "purchase_type" : "In Person",
            "seller" : {
               "location" : "Staten Island NYC",
               "name" : "Tammy S."
            },
            "status" : "active"
         },
         {
            "$vector" : [
               0.25,
               0.045,
               0.38,
               0.31,
               0.67
            ],
            "_id" : "5",
            "amount" : 94990,
            "customer" : {
               "address" : {
                  "address_line" : "32345 Main Ave",
                  "city" : "Jersey City",
                  "state" : "NJ"
               },
               "age" : 50,
               "credit_score" : 800,
               "name" : "David C.",
               "phone" : "123-456-5555"
            },
            "items" : [
               {
                  "car" : "Tesla Model S",
                  "color" : "Red"
               },
               "Extended warranty - 5 years"
            ],
            "purchase_date" : {
               "$date" : 1690996291
            },
            "purchase_type" : "Online",
            "seller" : {
               "location" : "Jersey City NJ",
               "name" : "Jim A."
            },
            "status" : "active"
         }
      ],
      "nextPageState" : null
   }
}

Parameters:

Name Type Summary

find

command

Selects and returns documents from a collection based on specified criteria.

filter

object

Contains the criteria that the find command uses to fetch documents from the database.

$and

logical operator

Requires all nested conditions to be met for a record to be returned.

$or

logical operator

A logical operator that requires at least one of the nested conditions to be met. In this example, the first $or nested condition checks whether the customer.address.city property is equal to "Jersey City" or to "Orange". The second $or nested condition checks whether the seller.name property is equal to "Jim A." or to "Tammy S.".
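To make the semantics of these operators concrete, here is a small pure-Python evaluator for a subset of the filter syntax: equality on dotted paths, $lt, $gt, $and, and $or, with top-level keys combined by an implicit AND. It is a hypothetical illustration for reasoning about filters locally, not how the Data API evaluates them, and it rejects every other operator. The sample data are trimmed-down versions of documents shown in the responses on this page.

```python
import operator
from typing import Any

_MISSING = object()  # marks a path that does not exist in the document
_COMPARISONS = {"$lt": operator.lt, "$gt": operator.gt}


def get_path(doc: dict, path: str) -> Any:
    """Resolve a dotted path such as "customer.address.city"."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return _MISSING
        value = value[key]
    return value


def matches(doc: dict, filter_: dict) -> bool:
    """Hypothetical local evaluator; top-level keys are combined with an implicit AND."""
    for key, condition in filter_.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):  # operator form, e.g. {"$lt": 100}
            value = get_path(doc, key)
            for op, operand in condition.items():
                if op not in _COMPARISONS:
                    raise ValueError(f"{op} is not supported by this sketch")
                if value is _MISSING or not _COMPARISONS[op](value, operand):
                    return False
        elif get_path(doc, key) != condition:  # plain equality
            return False
    return True


def sale(_id, amount, city, state, seller):
    """Build a trimmed-down sales document with the fields the filters use."""
    return {
        "_id": _id,
        "amount": amount,
        "customer": {"address": {"city": city, "state": state}},
        "seller": {"name": seller},
    }


sales = [
    sale("17", 54900, "Hoboken", "NJ", "Jim A."),
    sale("8", 46900, "Orange", "NJ", "Tammy S."),
    sale("5", 94990, "Jersey City", "NJ", "Jim A."),
    sale("14", 110400, "Brooklyn", "NY", "Jasmine S."),
]

hoboken = {"customer.address.city": "Hoboken", "customer.address.state": "NJ"}
city_and_seller = {
    "$and": [
        {"$or": [{"customer.address.city": "Jersey City"},
                 {"customer.address.city": "Orange"}]},
        {"$or": [{"seller.name": "Jim A."}, {"seller.name": "Tammy S."}]},
    ]
}
print([d["_id"] for d in sales if matches(d, hoboken)])
print([d["_id"] for d in sales if matches(d, city_and_seller)])
```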

Example values for sort operations

  • Python

  • TypeScript

  • Java

  • cURL

When no particular order is required:

sort={}  # (default when parameter not provided)

When sorting by a certain value in ascending/descending order:

from astrapy.constants import SortDocuments
sort={"field": SortDocuments.ASCENDING}
sort={"field": SortDocuments.DESCENDING}

When sorting first by "field" and then by "subfield" (while modern Python versions preserve the order of dictionaries, it is suggested for clarity to employ a collections.OrderedDict in these cases):

sort={
    "field": SortDocuments.ASCENDING,
    "subfield": SortDocuments.ASCENDING,
}
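As a local illustration of what a multi-key sort dictionary means (order by the first key, break ties with the next), here is a pure-Python sketch. It assumes the numeric directions 1 (ascending) and -1 (descending), the same convention as the { field: 1 } / { field: -1 } sort examples on this page, and `sort_documents` is a hypothetical helper. It applies the keys from last to first and relies on Python's sort being stable; it does not handle documents that lack a sort field.

```python
ASCENDING, DESCENDING = 1, -1  # assumed numeric sort directions


def sort_documents(documents, sort):
    """Order documents as a multi-key sort dictionary describes.

    Sorting from the last key to the first works because Python's sort is
    stable: ties on an earlier key keep the order produced by later keys.
    """
    result = list(documents)
    for field, direction in reversed(list(sort.items())):
        result.sort(key=lambda doc: doc[field], reverse=(direction == DESCENDING))
    return result


# The same three people as in the TypeScript sort example on this page.
people = [
    {"name": "Jane", "age": 25},
    {"name": "Dave", "age": 40},
    {"name": "Jack", "age": 40},
]
ordered = sort_documents(people, {"age": ASCENDING, "name": DESCENDING})
print([p["name"] for p in ordered])
```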

When running a vector similarity (ANN) search:

sort={"$vector": [0.4, 0.15, -0.5]}

Generate a vector to perform a vector similarity search. The collection must be associated with an embedding service.

sort={"$vectorize": "Text to vectorize"}

Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API. More specifically:

  • Vector ANN searches cannot return more than a certain number of documents; currently, 1000 per search operation.

  • When using a sort criterion of the ascending/descending type, the Data API returns a smaller number of documents, currently set to 20, and stops there. The returned documents are the top results across the whole collection according to the requested criterion.

Keep in mind these provisions even when subsequently running a command such as .distinct() on a cursor.

When not specifying sorting criteria at all (by vector or otherwise), the cursor can scroll through an arbitrary number of documents as the Data API and the client periodically exchange new chunks of documents.
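The caps described above can be condensed into a small helper, purely as a reading aid. `max_documents_returned` is a hypothetical function, not a client method, and the constants are the "currently" values quoted in this section, which may change.

```python
VECTOR_SEARCH_CAP = 1000  # "currently, 1000 per search operation"
SORTED_FIND_CAP = 20      # "currently set to 20" for ascending/descending sorts


def max_documents_returned(sort=None, limit=None):
    """Upper bound on the documents a find can return, per the rules above.

    Returns None when the cursor can scroll through an arbitrary number
    of documents (no sort criterion and no limit).
    """
    sort = sort or {}
    if "$vector" in sort or "$vectorize" in sort:
        cap = VECTOR_SEARCH_CAP
    elif sort:
        cap = SORTED_FIND_CAP
    else:
        cap = None
    if limit is None:
        return cap
    return limit if cap is None else min(limit, cap)


print(max_documents_returned({"field": 1}, limit=50))
print(max_documents_returned({"$vector": [0.4, 0.15, -0.5]}, limit=5000))
```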

The behavior of the cursor, in the case that documents have been added or removed after the find was started, depends on database internals. It is not guaranteed, nor excluded, that such "real-time" changes in the data would be picked up by the cursor.

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

filter = {"seq": {"$exists": True}}
for doc in collection.find(filter, projection={"seq": True}, limit=5):
    print(doc["seq"])
...
# will print e.g.:
#   37
#   35
#   10
#   36
#   27
cursor1 = collection.find(
    {},
    limit=4,
    sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
[doc["_id"] for doc in cursor1]
# prints: ['97e85f81-...', '1581efe4-...', '...', '...']
cursor2 = collection.find({}, limit=3)
cursor2.distinct("seq")
# prints: [37, 35, 10]
collection.insert_many([
    {"tag": "A", "$vector": [4, 5]},
    {"tag": "B", "$vector": [3, 4]},
    {"tag": "C", "$vector": [3, 2]},
    {"tag": "D", "$vector": [4, 1]},
    {"tag": "E", "$vector": [2, 5]},
])
ann_tags = [
    document["tag"]
    for document in collection.find(
        {},
        limit=3,
        vector=[3, 3],
    )
]
ann_tags
# prints: ['A', 'B', 'C']
# (assuming the collection has metric VectorMetric.COSINE)
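To see why the query vector [3, 3] returns A, B and C, the sketch below ranks the same five vectors by exact cosine similarity with a brute-force scan, under the same cosine-metric assumption as the example. It is only an illustration: the Data API answers vector searches with an approximate (ANN) index rather than a full scan, and `top_k_by_cosine` is a hypothetical helper.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_by_cosine(documents, query, k):
    """Exact nearest neighbours by cosine similarity (a full scan, not an ANN index)."""
    ranked = sorted(documents, key=lambda d: cosine(d["$vector"], query), reverse=True)
    return ranked[:k]


documents = [
    {"tag": "A", "$vector": [4, 5]},
    {"tag": "B", "$vector": [3, 4]},
    {"tag": "C", "$vector": [3, 2]},
    {"tag": "D", "$vector": [4, 1]},
    {"tag": "E", "$vector": [2, 5]},
]
print([d["tag"] for d in top_k_by_cosine(documents, [3, 3], 3)])
```

Cosine similarity only compares directions, so the ranking reflects how close each vector's angle is to the 45-degree diagonal of [3, 3], not how long the vectors are.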

Sort is very weakly typed by default; see StrictSort<Schema> for a more strongly typed alternative that also provides full autocomplete.

When no particular order is required:

{ sort: {} }  // (default when parameter not provided)

When sorting by a certain value in ascending/descending order:

{ sort: { field: +1 } }  // ascending
{ sort: { field: -1 } }  // descending

When sorting first by "field" and then by "subfield" (order matters! ES2015+ guarantees string keys in order of insertion):

{ sort: { field: 1, subfield: 1 } }

When running a vector similarity (ANN) search:

{ sort: { $vector: [0.4, 0.15, -0.5] } }

Generate a vector to perform a vector similarity search. The collection must be associated with an embedding service.

{ sort: { $vectorize: "Text to vectorize" } }

Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API. More specifically:

  • Vector ANN searches cannot return more than a certain number of documents; currently, 1000 per search operation.

  • When using a sort criterion of the ascending/descending type, the Data API returns a smaller number of documents, currently set to 20, and stops there. The returned documents are the top results across the whole collection according to the requested criterion.

Keep in mind these provisions even when subsequently running a command such as .distinct(), which uses a cursor underneath.

When not specifying sorting criteria at all (by vector or otherwise), the cursor can scroll through an arbitrary number of documents as the Data API and the client periodically exchange new chunks of documents.

The behavior of the cursor — in the case that documents have been added/removed after the find was started — depends on database internals. It is not guaranteed, nor excluded, that such "real-time" changes in the data would be picked up by the cursor.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([
    { name: 'Jane', age: 25, $vector: [1.0, 1.0, 1.0, 1.0, 1.0] },
    { name: 'Dave', age: 40, $vector: [0.4, 0.5, 0.6, 0.7, 0.8] },
    { name: 'Jack', age: 40, $vector: [0.1, 0.9, 0.0, 0.5, 0.7] },
  ]);

  // Sort by age ascending, then by name descending (Jane, Jack, Dave)
  const sorted1 = await collection.find({}, { sort: { age: 1, name: -1 } }).toArray();
  console.log(sorted1.map(d => d.name));

  // Sort by vector distance (Jane, Dave, Jack)
  const sorted2 = await collection.find({}, { vector: [1, 1, 1, 1, 1] }).toArray();
  console.log(sorted2.map(d => d.name));
})();
  • Sorting is optional: use the sort() options only if you need them.

  • When chaining multiple sorts, keep them in the intended order: documents are sorted by the first criterion, then by the next.

Sort s1 = Sorts.ascending("field1");
Sort s2 = Sorts.descending("field2");
FindOptions.Builder.sort(s1, s2);
  • When running a vector similarity (ANN) search:

FindOptions.Builder
 .sort(new float[] {0.4f, 0.15f, -0.5f});
  • Generate a vector to perform a vector similarity search. The collection must be associated with an embedding service.

FindOptions.Builder
 .sort("Text to vectorize");

Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API. More specifically:

  • Vector ANN searches cannot return more than a certain number of documents; currently, 1000 per search operation.

  • When using a sort criterion of the ascending/descending type, the Data API returns a smaller number of documents, currently set to 20, and stops there. The returned documents are the top results across the whole collection according to the requested criterion.

Keep in mind these provisions even when subsequently running a command such as .distinct() on a cursor.

When not specifying sorting criteria at all (by vector or otherwise), the cursor can scroll through an arbitrary number of documents as the Data API and the client periodically exchange new chunks of documents.

The behavior of the cursor, in the case that documents have been added or removed after the find was started, depends on database internals. It is not guaranteed, nor excluded, that such "real-time" changes in the data would be picked up by the cursor.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOptions;
import com.datastax.astra.client.model.Sort;
import com.datastax.astra.client.model.Sorts;

import static com.datastax.astra.client.model.Filters.lt;

public class WorkingWithSorts {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Sort Clause for a vector
        Sorts.vector(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f});

        // Sort Clause for other fields
        Sort s1 = Sorts.ascending("field1");
        Sort s2 = Sorts.descending("field2");

        // Build the sort clause
        new FindOptions().sort(s1, s2);

        // Adding vector
        new FindOptions().sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f}, s1, s2);

    }
}

This Data API command finds the documents most similar to the specified vector, ordered by a similarity metric, and uses a projection clause to return only specific properties of those documents. The $similarity score (such as 0.99444735) shows how close each result is to the queried vector.

  • A value of 0 indicates that the vectors are diametrically opposed.

  • A value of 0.5 suggests the vectors are orthogonal (perpendicular) and have no similarity.

  • A value of 1 indicates that the vectors are identical in direction.
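With the cosine metric, the $similarity values in the example response on this page are consistent with rescaling the cosine of the angle between two vectors from [-1, 1] to [0, 1], that is, (1 + cos) / 2. The sketch below is an assumption inferred from those numbers, not a documented formula, and `similarity_score` is a hypothetical name. Up to rounding, it reproduces the 0.9953563 reported for document "18" and the three reference points listed above.

```python
import math


def similarity_score(u, v):
    """Cosine similarity rescaled from [-1, 1] to [0, 1] (assumed normalization)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return (1 + dot / norms) / 2


query = [0.15, 0.1, 0.1, 0.35, 0.55]  # the vector used in the cURL request
print(similarity_score(query, [0.15, 0.17, 0.15, 0.43, 0.55]))  # document "18"
print(similarity_score(query, [0.25, 0.25, 0.25, 0.25, 0.25]))  # document "1"
print(similarity_score([1, 0], [-1, 0]))  # diametrically opposed
print(similarity_score([1, 0], [0, 1]))   # orthogonal
```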

In this example response, only the $vector and $similarity properties are returned for each document, making the output more focused and potentially reducing the amount of data transferred.

curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "find": {
    "sort" : {"$vector" : [0.15, 0.1, 0.1, 0.35, 0.55]},
    "projection" : {"$vector" : 1},
    "options" : {
        "includeSimilarity" : true,
        "limit" : 100
    }
  }
}' | json_pp

Response:

{
   "data" : {
      "documents" : [
         {
            "$similarity" : 1,
            "$vector" : [
               0.15,
               0.1,
               0.1,
               0.35,
               0.55
            ],
            "_id" : "3"
         },
         {
            "$similarity" : 0.9953563,
            "$vector" : [
               0.15,
               0.17,
               0.15,
               0.43,
               0.55
            ],
            "_id" : "18"
         },
         {
            "$similarity" : 0.9732053,
            "$vector" : [
               0.21,
               0.22,
               0.33,
               0.44,
               0.53
            ],
            "_id" : "21"
         },
         {
            "$similarity" : 0.9732053,
            "$vector" : [
               0.21,
               0.22,
               0.33,
               0.44,
               0.53
            ],
            "_id" : "7"
         },
         {
            "$similarity" : 0.96955204,
            "$vector" : [
               0.25,
               0.045,
               0.38,
               0.31,
               0.68
            ],
            "_id" : "10"
         },
         {
            "$similarity" : 0.9691053,
            "$vector" : [
               0.25,
               0.045,
               0.38,
               0.31,
               0.67
            ],
            "_id" : "5"
         },
         {
            "$similarity" : 0.9600924,
            "$vector" : [
               0.44,
               0.11,
               0.33,
               0.22,
               0.88
            ],
            "_id" : "11"
         },
         {
            "$similarity" : 0.9600924,
            "$vector" : [
               0.44,
               0.11,
               0.33,
               0.22,
               0.88
            ],
            "_id" : "20"
         },
         {
            "$similarity" : 0.9600924,
            "$vector" : [
               0.44,
               0.11,
               0.33,
               0.22,
               0.88
            ],
            "_id" : "16"
         },
         {
            "$similarity" : 0.9468591,
            "$vector" : [
               0.33,
               0.44,
               0.55,
               0.77,
               0.66
            ],
            "_id" : "12"
         },
         {
            "$similarity" : 0.94535017,
            "$vector" : [
               0.3,
               0.23,
               0.15,
               0.17,
               0.4
            ],
            "_id" : "8"
         },
         {
            "$similarity" : 0.9163125,
            "$vector" : [
               0.25,
               0.25,
               0.25,
               0.25,
               0.27
            ],
            "_id" : "19"
         },
         {
            "$similarity" : 0.91263497,
            "$vector" : [
               0.25,
               0.25,
               0.25,
               0.25,
               0.26
            ],
            "_id" : "4"
         },
         {
            "$similarity" : 0.9087937,
            "$vector" : [
               0.25,
               0.25,
               0.25,
               0.25,
               0.25
            ],
            "_id" : "1"
         },
         {
            "$similarity" : 0.7909429,
            "$vector" : [
               0.1,
               0.15,
               0.3,
               0.12,
               0.09
            ],
            "_id" : "17"
         },
         {
            "$similarity" : 0.7820388,
            "$vector" : [
               0.1,
               0.15,
               0.3,
               0.12,
               0.08
            ],
            "_id" : "15"
         },
         {
            "$similarity" : 0.77284586,
            "$vector" : [
               0.1,
               0.15,
               0.3,
               0.12,
               0.07
            ],
            "_id" : "13"
         },
         {
            "$similarity" : 0.7711377,
            "$vector" : [
               0.11,
               0.02,
               0.78,
               0.21,
               0.27
            ],
            "_id" : "14"
         },
         {
            "$similarity" : 0.76337516,
            "$vector" : [
               0.1,
               0.15,
               0.3,
               0.12,
               0.06
            ],
            "_id" : "9"
         },
         {
            "$similarity" : 0.75363994,
            "$vector" : [
               0.1,
               0.15,
               0.3,
               0.12,
               0.05
            ],
            "_id" : "2"
         },
         {
            "$similarity" : 0.74406904,
            "$vector" : [
               0.11,
               0.02,
               0.78,
               0.1,
               0.27
            ],
            "_id" : "6"
         }
      ],
      "nextPageState" : null
   }
}

Parameters:

Name Type Summary

find

command

A "find" or search command is to be executed. It contains nested JSON objects that define the search criteria, projection, and other options.

sort

clause

Specifies the vector against which other vectors in the vector-enabled DataStax Enterprise (DSE) database are to be compared. The $vector key is a reserved property name for storing vector data. The vector in this example is set to [0.15, 0.1, 0.1, 0.35, 0.55]. Documents in the database are sorted based on their similarity to this vector.

projection

clause

Specify which properties should be included in the returned documents.

includeSimilarity

boolean

Setting this boolean to true means that the response includes a $similarity score, representing the similarity metric between the sorted vector and the vectors in the database. The returned scores (such as 0.99444735) are useful for understanding how close each result is to the queried vector.

  • A value of 0 indicates that the vectors are diametrically opposed.

  • A value of 0.5 suggests the vectors are orthogonal (or perpendicular) and have no match.

  • A value of 1 indicates that the vectors are identical in direction.
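The three reference points above are consistent with a cosine similarity rescaled from [-1, 1] to [0, 1], that is, (1 + cos θ) / 2. The following Python sketch assumes the collection uses the cosine metric. Applied to the query vector [0.15, 0.1, 0.1, 0.35, 0.55] and the stored vector of document "5" from the response above, it gives a score that rounds to the reported $similarity of 0.9691. The function name similarity_score is hypothetical and is not part of any client.

```python
import math


def similarity_score(query, stored):
    """Cosine similarity rescaled to [0, 1]: (1 + cos) / 2.

    Assumes both vectors have the same dimensionality and non-zero length.
    """
    if len(query) != len(stored):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(q * s for q, s in zip(query, stored))
    norm_q = math.sqrt(sum(q * q for q in query))
    norm_s = math.sqrt(sum(s * s for s in stored))
    cosine = dot / (norm_q * norm_s)
    return (1 + cosine) / 2


query = [0.15, 0.1, 0.1, 0.35, 0.55]
doc_5 = [0.25, 0.045, 0.38, 0.31, 0.67]

print(round(similarity_score(query, doc_5), 4))  # 0.9691
print(similarity_score([1.0, 0.0], [1.0, 0.0]))   # identical direction: 1.0
print(similarity_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 0.5
print(similarity_score([1.0, 0.0], [-1.0, 0.0]))  # diametrically opposed: 0.0
```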

limit

number

Specifies the maximum number of documents to be returned. It’s set to 100, meaning the search returns up to the top 100 most similar documents. Pagination can occur if more than 20 documents are returned in the current set of matching documents.

Example values for projection operations

Certain document operations, such as finding one or multiple documents, find-and-update, find-and-replace, and find-and-delete, accept a projection option that controls which parts of the returned documents are included. A projection generally takes one of two forms: it specifies either the fields to include or the fields to exclude.

If no projection, or an empty projection, is specified, a default projection is applied by the Data API. This default projection includes at least the identifier (_id) of the document and all its "regular" fields, which are those not starting with a dollar sign. However, future versions of the Data API might exclude other fields (such as $vector) from the documents by default.

When a projection is provided, each "special" field, such as _id, $vector, and $vectorize, has its own inclusion or exclusion default, which can be overridden individually. By contrast, for regular fields the projection must either list included fields or list excluded ones; it cannot mix the two.

To reduce response size, always provide an explicit projection tailored to the needs of your application when reading documents.

If an application relies on the presence of $vector (or other special fields) in the returned document(s), the projection must explicitly define inclusion of that field.

A quick, if possibly suboptimal, way to ensure the presence of fields is to use the {"*": true} star-projection described below.

A projection is expressed as a mapping of field names to boolean values. To return the document ID, field1, and field2:

{"_id": true, "field1": true, "field2": true}

Specific fields can be excluded, keeping any other field found in the document:

{"field1": false, "field2": false}

Fields specified in the projection but not encountered in the document are simply ignored for that document.

The projection cannot mix include and exclude clauses for regular fields. In other words, it must either have all true or all false values. If a projection has false values, all non-mentioned fields found in the document are included; conversely, if it has true values, all non-mentioned fields in the document are excluded.

Special fields (_id, $vector, and $vectorize) behave differently: each has its own default, and its presence can be controlled independently within any projection. For example, the _id field is included by default and can be excluded even in an include projection ({"_id": false, "field1": true}); conversely, the $vector field is excluded by default and can be included even in an exclude projection ({"field1": false, "$vector": true}).

So, the following are all valid projections:

{"_id": true, "field1": true, "field2": true}
{"_id": false, "field1": true, "field2": true}
{"_id": false, "field1": false, "field2": false}
{"_id": true, "field1": false, "field2": false}
{"_id": true, "field1": true, "field2": true, "$vector": true}
{"_id": true, "field1": true, "field2": true, "$vector": false}
{"_id": false, "field1": true, "field2": true, "$vector": true}
{"_id": false, "field1": true, "field2": true, "$vector": false}
{"_id": false, "field1": false, "field2": false, "$vector": true}
{"_id": false, "field1": false, "field2": false, "$vector": false}
{"_id": true, "field1": false, "field2": false, "$vector": true}
{"_id": true, "field1": false, "field2": false, "$vector": false}

However, the following projection is invalid and will result in an API error:

// Invalid:
{"field1": true, "field2": false}
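To make the include/exclude rule concrete, here is a local Python sketch that classifies a projection before it is sent. It assumes the special fields are exactly _id, $vector, and $vectorize, as listed above, and reads values by truthiness. The helper name projection_mode is hypothetical; the Data API performs its own validation server-side.

```python
SPECIAL_FIELDS = {"_id", "$vector", "$vectorize"}


def projection_mode(projection):
    """Classify a projection as "include", "exclude" or "special-only".

    Values are read by truthiness. Raises ValueError when regular fields mix
    true and false values, or when "*" is combined with other keys.
    """
    if "*" in projection and len(projection) > 1:
        raise ValueError('"*" must be the only key in the projection')
    # Special fields never decide the mode; only regular fields do.
    regular = [bool(v) for k, v in projection.items() if k not in SPECIAL_FIELDS]
    if not regular:
        return "special-only"
    if all(regular):
        return "include"
    if not any(regular):
        return "exclude"
    raise ValueError("cannot mix inclusion and exclusion of regular fields")


print(projection_mode({"_id": False, "field1": True, "field2": True}))  # include
print(projection_mode({"_id": True, "field1": False, "$vector": True}))  # exclude
try:
    projection_mode({"field1": True, "field2": False})
except ValueError as err:
    print(err)  # cannot mix inclusion and exclusion of regular fields
```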

The special projection path "*" ("star-projection"), which must be the only key in the projection, represents the whole of the document. With the following projection all of the document is returned:

{"*": true}

Conversely, with the following projection, every document is returned as {}:

{"*": false}

The values in a projection map can be objects, booleans, or numbers (decimal or integer), but the API treats them as booleans. The following two examples include and exclude the four fields, respectively:

{"field1": true, "field2": 1, "field3": 90.0, "field4": {"keep": "yes!"}}
{"field1": false, "field2": 0, "field3": 0.0, "field4": {}}
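Python's own truthiness rules give the same answers as the two examples above: a non-empty object and a non-zero number count as true, while an empty object and zero count as false. This minimal sketch assumes the API's coercion follows that pattern for the listed value types; as_flag is a hypothetical name.

```python
def as_flag(value):
    """Coerce a projection value (object, boolean or number) to a boolean."""
    if isinstance(value, (bool, int, float, dict)):
        return bool(value)
    raise TypeError(f"unsupported projection value: {value!r}")


include_all = {"field1": True, "field2": 1, "field3": 90.0, "field4": {"keep": "yes!"}}
exclude_all = {"field1": False, "field2": 0, "field3": 0.0, "field4": {}}

print({k: as_flag(v) for k, v in include_all.items()})
# {'field1': True, 'field2': True, 'field3': True, 'field4': True}
print({k: as_flag(v) for k, v in exclude_all.items()})
# {'field1': False, 'field2': False, 'field3': False, 'field4': False}
```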

Passing a null-like value (such as {}, null, or 0) as the whole projection has the same effect as not passing one at all.

The projection cannot include the special $similarity key, which is not part of the document: it is computed during vector ANN queries and controlled through the includeSimilarity parameter in the search payload.

For array fields, a $slice operator can be provided to specify which elements of the array to return. It takes one of the following formats:

// Return the first two elements
{"arr": {"$slice": 2}}

// Return the last two elements
{"arr": {"$slice": -2}}

// Skip 4 elements (from 0th index), return the next 2
{"arr": {"$slice": [4, 2]}}

// Skip backward 4 elements (from the end), return next 2 elements (forward)
{"arr": {"$slice": [-4, 2]}}
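The four $slice forms map onto ordinary Python list slicing. This sketch assumes MongoDB-style handling of edge cases. For example, a negative skip larger than the array length starts at index 0, and the limit must be positive. apply_slice is a hypothetical helper, not a client method.

```python
def apply_slice(arr, spec):
    """Return the elements of arr selected by a $slice value.

    spec is either an int n (first n if n >= 0, last |n| if n < 0)
    or a [skip, limit] pair, where a negative skip counts from the end.
    """
    if isinstance(spec, int):
        return arr[:spec] if spec >= 0 else arr[spec:]
    skip, limit = spec
    if limit <= 0:
        raise ValueError("limit must be positive")
    # A negative skip counts back from the end, clamped at the first element.
    start = skip if skip >= 0 else max(len(arr) + skip, 0)
    return arr[start:start + limit]


arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(apply_slice(arr, 2))        # [0, 1]
print(apply_slice(arr, -2))       # [8, 9]
print(apply_slice(arr, [4, 2]))   # [4, 5]
print(apply_slice(arr, [-4, 2]))  # [6, 7]
```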

The projection can also refer to nested fields: in that case, keys in a subdocument will be included/excluded as requested. If all keys of an existing subdocument are excluded, the document will be returned with the subdocument still present, but consisting of an empty object:

Given the following document:

{
  "_id": "z",
  "a": {
    "a1": 10,
    "a2": 20
  }
}

Here the result of different projections can be seen:

Projection                        Result
{"a": true}                       {"_id": "z", "a": {"a1": 10, "a2": 20}}
{"a.a1": false}                   {"_id": "z", "a": {"a2": 20}}
{"a.a1": true}                    {"_id": "z", "a": {"a1": 10}}
{"a.a1": false, "a.a2": false}    {"_id": "z", "a": {}}
{"*": false}                      {}
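The nested-field results in the table can be reproduced with a small local model. This Python sketch is a simplification. It handles dotted paths, the star projection, and the _id default. It does not model $slice or the defaults of $vector and $vectorize. apply_projection and its helpers are hypothetical names; the Data API applies projections server-side.

```python
import copy


def _copy_path(src, dst, parts):
    """Copy the value at the dotted path `parts` from src into dst."""
    head, rest = parts[0], parts[1:]
    if not isinstance(src, dict) or head not in src:
        return  # paths missing from the document are ignored
    if not rest:
        dst[head] = copy.deepcopy(src[head])
    elif isinstance(src[head], dict):
        _copy_path(src[head], dst.setdefault(head, {}), rest)


def _drop_path(doc, parts):
    """Remove the value at the dotted path `parts` from doc, in place."""
    head, rest = parts[0], parts[1:]
    if not isinstance(doc, dict) or head not in doc:
        return
    if not rest:
        del doc[head]
    else:
        _drop_path(doc[head], rest)


def apply_projection(doc, projection):
    """Apply an all-include or all-exclude projection to a document dict.

    Simplified model: _id is included unless set to a falsy value; $slice,
    $vector and $vectorize defaults are not modeled.
    """
    if "*" in projection:
        return copy.deepcopy(doc) if projection["*"] else {}
    keep_id = bool(projection.get("_id", True))
    rules = {path: bool(v) for path, v in projection.items() if path != "_id"}
    if any(rules.values()):
        # Include mode: start empty and copy each listed path.
        result = {}
        for path in rules:
            _copy_path(doc, result, path.split("."))
    else:
        # Exclude mode: start from a full copy and remove each listed path.
        result = copy.deepcopy(doc)
        for path in rules:
            _drop_path(result, path.split("."))
    result.pop("_id", None)
    if keep_id and "_id" in doc:
        result = {"_id": doc["_id"], **result}
    return result


document = {"_id": "z", "a": {"a1": 10, "a2": 20}}
for proj in (
    {"a": True},
    {"a.a1": False},
    {"a.a1": True},
    {"a.a1": False, "a.a2": False},
    {"*": False},
):
    print(proj, "->", apply_projection(document, proj))
```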

Referencing overlapping paths (a path together with one of its subpaths) in the same projection could produce conflicting clauses, so such projections are rejected. For instance, the following yields an API error:

// Invalid:
{"a.a1": true, "a": true}
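A client-side check for this kind of conflict only needs to find a path that is a dotted prefix of another path. The sketch below is a local illustration under that assumption; find_overlaps is a hypothetical helper. Sorting the paths guarantees that a parent path such as "a" comes before its subpaths such as "a.a1".

```python
def find_overlaps(projection):
    """Return (parent, subpath) pairs where one projection path contains another."""
    paths = sorted(projection)
    overlaps = []
    for i, outer in enumerate(paths):
        for inner in paths[i + 1:]:
            # "a" overlaps "a.a1", but not the unrelated sibling "ab".
            if inner.startswith(outer + "."):
                overlaps.append((outer, inner))
    return overlaps


print(find_overlaps({"a.a1": True, "a": True}))     # [('a', 'a.a1')]
print(find_overlaps({"a.a1": True, "a.a2": True}))  # []
```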
  • Python

  • TypeScript

  • Java

For the Python client, the projection argument can be not only a Dict[str, Any] following the general rules above, but also a list, or any other iterable, of field names. In this case, all the listed fields are implied to be included in the projection. So, the two following statements are equivalent:

document = collection.find_one(
   {"_id": 101},
   projection={"name": True, "city": True},
)

document = collection.find_one(
   {"_id": 101},
   projection=["name", "city"],
)
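The list form is shorthand for a dictionary that maps each listed name to True. This local sketch shows that equivalence. normalize_projection is a hypothetical helper, and astrapy performs its own normalization internally. The sketch rejects a bare string by design, because iterating over a string would yield single characters rather than field names.

```python
from collections.abc import Mapping


def normalize_projection(projection):
    """Turn an iterable of field names into the equivalent inclusion dict.

    Mappings are returned as a plain dict; None stays None.
    """
    if projection is None:
        return None
    if isinstance(projection, Mapping):
        return dict(projection)
    if isinstance(projection, str):
        raise TypeError("pass an iterable of field names, not a single string")
    return dict.fromkeys(projection, True)


print(normalize_projection(["name", "city"]))
# {'name': True, 'city': True}
```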

The TypeScript client accepts an untyped Plain Old JavaScript Object (POJO) for the projection parameter.

However, it offers a StrictProjection<Schema> type that provides full autocomplete and type checking for your document schema.

import { StrictProjection } from '@datastax/astra-db-ts';

const doc = await collection.findOne({}, {
  projection: {
    'name': true,
    'address.city': true,
  },
});

interface MySchema {
  name: string,
  address: {
    city: string,
    state: string,
  },
}

const typedDoc = await collection.findOne({}, {
  projection: {
    'name': 1,
    'address.city': 1,
    // @ts-expect-error - 'address.car' does not exist in type StrictProjection<MySchema>
    'address.car': 0,
    // @ts-expect-error - Type { $slice: number } is not assignable to type boolean | 0 | 1 | undefined
    'address.state': { $slice: 3 }
  } satisfies StrictProjection<MySchema>,
});

To support the projection mechanism, the different Options classes provide a projection method in their builders. This method takes one or more Projection objects, each providing a field name and a boolean flag to choose between inclusion and exclusion.

Projection p1 = new Projection("field1", true);
Projection p2 = new Projection("field2", true);
FindOptions options1 = FindOptions.Builder.projection(p1, p2);

This syntax can be simplified by using the Projections helper class:

FindOptions options2 = FindOptions.Builder
  .projection(Projections.include("field1", "field2"));

FindOptions options3 = FindOptions.Builder
  .projection(Projections.exclude("field1", "field2"));

For $slice on array fields, the Projections helper class provides a method as well:

// {"arr": {"$slice": 2}}
Projection sliceOnlyStart = Projections.slice("arr", 2, null);

// {"arr": {"$slice": [-4, 2]}}
Projection sliceOnlyRange = Projections.slice("arr", -4, 2);

// And you can use them freely in the different builders
FindOptions options4 = FindOptions.Builder
  .projection(sliceOnlyStart);

Find and update a document

Locate a document matching a filter condition and apply changes to it, returning the document itself.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

collection.find_one_and_update(
    {"Marco": {"$exists": True}},
    {"$set": {"title": "Mr."}},
)

Locate and update a document, returning the document itself, creating a new one if nothing is found.

collection.find_one_and_update(
    {"Marco": {"$exists": True}},
    {"$set": {"title": "Mr."}},
    upsert=True,
)

Returns:

Dict[str, Any] - The document that was found, either before or after the update (or a projection thereof, as requested). If no matches are found, None is returned.

Example response
{'_id': 999, 'Marco': 'Polo'}

Parameters:

Name Type Summary

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

update

Dict[str, Any]

The update prescription to apply to the document, expressed as a dictionary as per Data API syntax. Examples are: {"$set": {"field": "value"}}, {"$inc": {"counter": 10}} and {"$unset": {"field": ""}}. See Data API operators for the full syntax.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

Used to select a subset of fields in the document being returned. The projection can be: an iterable over the included field names; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to exclude specific fields from the response. Special document fields (e.g. _id, $vector) are controlled individually. The default projection does not necessarily include all fields of the document. See the projection examples for more on this parameter.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to use as the sorting criterion in a vector search, that is, an Approximate Nearest Neighbors (ANN) search. In this way, the matched document (if any) is the one most similar to the provided vector. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

vectorize

Optional[str]

A string to be vectorized and used as the sorting criterion in a vector search. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

sort

Optional[Dict[str, Any]]

With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the updated one. See the sort examples for more on sorting.

upsert

bool = False

This parameter controls the behavior in absence of matches. If True, a new document (resulting from applying the update to an empty document) is inserted if no matches are found on the collection. If False, the operation silently does nothing in case of no matches.

return_document

str

A flag controlling what document is returned: if set to ReturnDocument.BEFORE, or the string "before", the document found on database is returned; if set to ReturnDocument.AFTER, or the string "after", the new document is returned. The default is "before".

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

collection.insert_one({"Marco": "Polo"})

collection.find_one_and_update(
    {"Marco": {"$exists": True}},
    {"$set": {"title": "Mr."}},
)
# prints: {'_id': 'a80106f2-...', 'Marco': 'Polo'}
collection.find_one_and_update(
    {"title": "Mr."},
    {"$inc": {"rank": 3}},
    projection={"title": True, "rank": True},
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# prints: {'_id': 'a80106f2-...', 'title': 'Mr.', 'rank': 3}
collection.find_one_and_update(
    {"name": "Johnny"},
    {"$set": {"rank": 0}},
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# (returns None for no matches)
collection.find_one_and_update(
    {"name": "Johnny"},
    {"$set": {"rank": 0}},
    upsert=True,
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# prints: {'_id': 'cb4ef2ab-...', 'name': 'Johnny', 'rank': 0}
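The interplay of return_document and upsert can be summarized with a toy in-memory model. This Python sketch is illustrative only. It supports top-level equality and $exists filters plus the $set and $inc operators, and it generates _id values with uuid4. It assumes that equality conditions in the filter seed an upserted document, which is consistent with the 'name': 'Johnny' result above. It also assumes that an upsert with "before" returns nothing. None of these names are astrapy APIs.

```python
import copy
import uuid

BEFORE, AFTER = "before", "after"


def _matches(doc, flt):
    """Match top-level equality and {"$exists": bool} conditions only."""
    for field, cond in flt.items():
        if isinstance(cond, dict) and "$exists" in cond:
            if (field in doc) != bool(cond["$exists"]):
                return False
        elif field not in doc or doc[field] != cond:
            return False
    return True


def _apply_update(doc, update):
    """Apply $set and $inc in place; other operators are not modeled."""
    for field, value in update.get("$set", {}).items():
        doc[field] = value
    for field, amount in update.get("$inc", {}).items():
        doc[field] = doc.get(field, 0) + amount


def find_one_and_update(docs, flt, update, upsert=False, return_document=BEFORE):
    """Toy find-one-and-update over a list of dicts (first match wins)."""
    for doc in docs:
        if _matches(doc, flt):
            before = copy.deepcopy(doc)
            _apply_update(doc, update)
            return before if return_document == BEFORE else copy.deepcopy(doc)
    if not upsert:
        return None
    # Assumption: plain equality conditions in the filter seed the new document.
    new_doc = {"_id": str(uuid.uuid4())}
    new_doc.update(
        {k: v for k, v in flt.items() if not k.startswith("$") and not isinstance(v, dict)}
    )
    _apply_update(new_doc, update)
    docs.append(new_doc)
    # Assumption: with "before", an upsert has no prior document to return.
    return None if return_document == BEFORE else copy.deepcopy(new_doc)


docs = [{"_id": "a80106f2", "Marco": "Polo"}]
print(find_one_and_update(docs, {"Marco": {"$exists": True}}, {"$set": {"title": "Mr."}}))
# {'_id': 'a80106f2', 'Marco': 'Polo'}
print(find_one_and_update(docs, {"title": "Mr."}, {"$inc": {"rank": 3}}, return_document=AFTER))
# {'_id': 'a80106f2', 'Marco': 'Polo', 'title': 'Mr.', 'rank': 3}
print(find_one_and_update(docs, {"name": "Johnny"}, {"$set": {"rank": 0}}, return_document=AFTER))
# None
upserted = find_one_and_update(
    docs, {"name": "Johnny"}, {"$set": {"rank": 0}}, upsert=True, return_document=AFTER
)
print({k: v for k, v in upserted.items() if k != "_id"})
# {'name': 'Johnny', 'rank': 0}
```

Unlike the astrapy example above, this sketch applies no projection. As a result, the second call returns the whole document.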

View this topic in more detail on the API Reference.

const docBefore = await collection.findOneAndUpdate(
  { $and: [{ name: 'Jesse' }, { gender: 'M' }] },
  { $set: { title: 'Mr.' } },
  { returnDocument: 'before' },
);

Locate and update a document, returning the document itself, creating a new one if nothing is found.

const docBefore = await collection.findOneAndUpdate(
  { $and: [{ name: 'Jesse' }, { gender: 'M' }] },
  { $set: { title: 'Mr.' } },
  { upsert: true, returnDocument: 'before' },
);

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to update.

update

UpdateFilter<Schema>

The update to apply to the selected document.

options

FindOneAndUpdateOptions

The options for this operation.

Name Type Summary

returnDocument

'before' | 'after'

Specifies whether to return the original or updated document.

upsert?

boolean

If true, creates a new document if no document matches the filter.

projection?

Projection

Specifies which fields should be included/excluded in the returned documents. Defaults to including all fields.

When specifying a projection, it is your responsibility to handle the return type carefully; consider type-casting.

sort?

Sort

Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk.

vector?

number[]

An optional vector to use to perform a vector search on the collection to find the closest matching document.

Equivalent to setting the $vector field in the sort object itself. The two are interchangeable, but mutually exclusive.

If you really need to use both, you can set the $vector field in the sort object directly.

vectorize?

string

A string to be vectorized and used as the sorting criterion in a vector search.

Equivalent to setting the $vectorize field in the sort object itself. The two are interchangeable, but mutually exclusive.

If you really need to use both, you can set the $vectorize field in the sort object directly.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete for each single one of the underlying HTTP requests.

includeResultMetadata?

boolean

When true, the response includes, alongside the document, an ok field with a value of 1 if the command executed successfully.

Returns:

Promise<WithId<Schema> | null> - The document before/after the update, depending on the type of returnDocument, or null if no matches are found.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert a document
  await collection.insertOne({ 'Marco': 'Polo' });

  // Prints 'Mr.'
  const updated1 = await collection.findOneAndUpdate(
    { 'Marco': 'Polo' },
    { $set: { title: 'Mr.' } },
    { returnDocument: 'after' },
  );
  console.log(updated1?.title);

  // Prints { _id: ..., title: 'Mr.', rank: 3 }
  const updated2 = await collection.findOneAndUpdate(
    { title: 'Mr.' },
    { $inc: { rank: 3 } },
    { projection: { title: 1, rank: 1 }, returnDocument: 'after' },
  );
  console.log(updated2);

  // Prints null
  const updated3 = await collection.findOneAndUpdate(
    { name: 'Johnny' },
    { $set: { rank: 0 } },
    { returnDocument: 'after' },
  );
  console.log(updated3);

  // Prints { _id: ..., name: 'Johnny', rank: 0 }
  const updated4 = await collection.findOneAndUpdate(
    { name: 'Johnny' },
    { $set: { rank: 0 } },
    { upsert: true, returnDocument: 'after' },
  );
  console.log(updated4);
})();
  • Operations on documents are performed at the Collection level. For details on each signature, see the Collection JavaDoc.

  • Collection is a generic class whose default type is Document, but you can specify your own type; objects of that type are serialized by Jackson.

  • Most methods come in synchronous and asynchronous flavors; the asynchronous version has the Async suffix and returns a CompletableFuture.

// Synchronous
Optional<T> findOneAndUpdate(Filter filter, Update update);

// Asynchronous
CompletableFuture<Optional<T>> findOneAndUpdateAsync(Filter filter, Update update);

Returns:

[Optional<T>] - Returns the document matching the filter, or Optional.empty() if no document is found.

Parameters:

Name Type Summary

filter

Filter

Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression.

update

Update

The update operations to apply to the matching document, such as setting, incrementing, or unsetting fields. Updates are typically built with the Updates helper class.

What you need to know:

To build the different parts of a request, the client provides helper classes named after the type they build with an s suffix, such as Filters for Filter.

Update is no different: you can use the Updates helper class.

Update update = Updates
 .set("field1", "value1")
 .inc("field2", 1d)
 .unset("field3");

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.Update;
import com.datastax.astra.client.model.Updates;

import java.util.Optional;

import static com.datastax.astra.client.model.Filters.lt;

public class FindOneAndUpdate {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Building the update
        Update update = Updates.set("field1", "value1")
                .inc("field2", 1d)
                .unset("field3");

        Optional<Document> doc = collection.findOneAndUpdate(filter, update);

    }
}

The following Data API findOneAndUpdate command uses a sort clause on $vector and the $set operator to set the status of one matching document (the one whose $vector is most similar to the provided vector) to active.

curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
    "findOneAndUpdate": {
        "sort": {
            "$vector": [
                0.25,
                0.045,
                0.38,
                0.31,
                0.67
            ]
        },
        "update": {
            "$set": {
                "status": "active"
            }
        },
        "options": {
            "returnDocument": "after"
        }
    }
}' | json_pp
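If you work from Python without a client library, you can assemble the same request with the standard library. This sketch mirrors the cURL headers and body above. The endpoint, keyspace, collection, and token values in the usage lines are placeholders. Sending the request, shown commented out, requires a reachable Data API endpoint and a valid token. build_find_one_and_update_request is a hypothetical helper.

```python
import json
import urllib.request


def build_find_one_and_update_request(endpoint, keyspace, collection, token, vector, status):
    """Build (but do not send) the POST request issued by the cURL example."""
    body = {
        "findOneAndUpdate": {
            "sort": {"$vector": vector},
            "update": {"$set": {"status": status}},
            "options": {"returnDocument": "after"},
        }
    }
    return urllib.request.Request(
        f"{endpoint}/{keyspace}/{collection}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


request = build_find_one_and_update_request(
    "https://example.invalid",  # placeholder for ${DB_API_ENDPOINT}
    "my_keyspace",
    "my_collection",
    "my-token",
    [0.25, 0.045, 0.38, 0.31, 0.67],
    "active",
)
print(request.get_method(), request.full_url)
# POST https://example.invalid/my_keyspace/my_collection

# Sending requires a reachable endpoint and a valid token:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```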

Response:

In this case, notice that the response returns a modifiedCount of 0 because the matching document’s status was already active.

{
    "data": {
        "document": {
            "_id": "5",
            "purchase_type": "Online",
            "$vector": [
                0.25,
                0.045,
                0.38,
                0.31,
                0.67
            ],
            "customer": {
                "name": "David C.",
                "phone": "123-456-5555",
                "age": 50,
                "credit_score": 800,
                "address": {
                    "address_line": "32345 Main Ave",
                    "city": "Jersey City",
                    "state": "NJ"
                }
            },
            "purchase_date": {
                "$date": 1690996291
            },
            "seller": {
                "name": "Jim A.",
                "location": "Jersey City NJ"
            },
            "items": [
                {
                    "car": "Tesla Model S",
                    "color": "Red"
                },
                "Extended warranty - 5 years"
            ],
            "amount": 94990,
            "status": "active"
        }
    },
    "status": {
        "matchedCount": 1,
        "modifiedCount": 0
    }
}
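The difference between matchedCount and modifiedCount comes down to whether the $set actually changes a stored value. The following minimal local sketch shows that check. It assumes a field counts as modified only when its new value differs from the stored one. set_fields is a hypothetical helper.

```python
def set_fields(doc, fields):
    """Apply a $set in place and report whether any stored value changed."""
    modified = False
    for field, value in fields.items():
        if field not in doc or doc[field] != value:
            doc[field] = value
            modified = True
    return modified


doc = {"_id": "5", "status": "active", "amount": 94990}
status = {"matchedCount": 1, "modifiedCount": int(set_fields(doc, {"status": "active"}))}
print(status)  # {'matchedCount': 1, 'modifiedCount': 0}
status = {"matchedCount": 1, "modifiedCount": int(set_fields(doc, {"status": "inactive"}))}
print(status)  # {'matchedCount': 1, 'modifiedCount': 1}
```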

Parameters:

Name Type Summary

findOneAndUpdate

command

Finds one document that matches the given criteria and applies the update to it.

sort

clause

Contains an object specifying the sort criteria for selecting the document.

$vector

array

Indicates a vector-based sort operation, where the documents are sorted by their similarity to the provided vector. In this example, [0.25, 0.045, 0.38, 0.31, 0.67].

$vectorize

string

A string to be vectorized and used as the sorting criterion in a vector search.

update

clause

Contains the changes to be applied to the selected document.

$set

Update operator

Used to set the value of a field. Here, it is used to set the status property of the document to active.

options

clause

Provides additional settings for the findOneAndUpdate command.

returnDocument

string

In this example, the "returnDocument": "after" option specifies that the modified document should be returned in the response after the update is applied, so the client sees the updated state of the document immediately. Here, though, the response reports a modifiedCount of 0 because the matching document's status was already active.

Update a document

Update a single document on the collection as requested.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

update_result = collection.update_one(
    {"_id": 456},
    {"$set": {"name": "John Smith"}},
)

Update a single document on the collection, inserting a new one if no match is found.

update_result = collection.update_one(
    {"_id": 456},
    {"$set": {"name": "John Smith"}},
    upsert=True,
)

Returns:

UpdateResult - An object representing the response from the database after the update operation. It includes information about the operation.

Example response
UpdateResult(raw_results=[{'data': {'document': {'_id': '1', 'name': 'John Doe'}}, 'status': {'matchedCount': 1, 'modifiedCount': 1}}], update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1})

Parameters:

Name Type Summary

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

update

Dict[str, Any]

The update prescription to apply to the document, expressed as a dictionary as per Data API syntax. Examples are: {"$set": {"field": "value"}}, {"$inc": {"counter": 10}} and {"$unset": {"field": ""}}. See Data API operators for the full syntax.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to use as the sorting criterion in a vector search, that is, an Approximate Nearest Neighbors (ANN) search. In this way, the matched document (if any) is the one most similar to the provided vector. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

vectorize

Optional[str]

A string to be vectorized and used as the sorting criterion in a vector search. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

sort

Optional[Dict[str, Any]]

With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the updated one. See the sort examples for more on sorting.

upsert

bool = False

This parameter controls the behavior in absence of matches. If True, a new document (resulting from applying the update to an empty document) is inserted if no matches are found on the collection. If False, the operation silently does nothing in case of no matches.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

collection.insert_one({"Marco": "Polo"})

collection.update_one({"Marco": {"$exists": True}}, {"$inc": {"rank": 3}})
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1})
collection.update_one({"Mirko": {"$exists": True}}, {"$inc": {"rank": 3}})
# prints: UpdateResult(raw_results=..., update_info={'n': 0, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0})
collection.update_one(
    {"Mirko": {"$exists": True}},
    {"$inc": {"rank": 3}},
    upsert=True,
)
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '2a45ff60-...'})
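The three update_info dictionaries above follow a simple pattern relative to the Data API status counts. The sketch below reproduces that pattern. It is inferred from the example outputs only, not taken from astrapy's source. It assumes the status block carries an upsertedId key on upsert. The examples do not distinguish whether updatedExisting tracks matchedCount or modifiedCount; this sketch uses modifiedCount. update_info_from_status is a hypothetical name.

```python
def update_info_from_status(status):
    """Derive an update_info-style dict from a Data API status block."""
    matched = status.get("matchedCount", 0)
    modified = status.get("modifiedCount", 0)
    upserted_id = status.get("upsertedId")  # assumed key name for upserts
    info = {
        # An upsert counts as one affected document even though nothing matched.
        "n": matched + (1 if upserted_id is not None else 0),
        "updatedExisting": modified > 0,
        "ok": 1.0,
        "nModified": modified,
    }
    if upserted_id is not None:
        info["upserted"] = upserted_id
    return info


print(update_info_from_status({"matchedCount": 1, "modifiedCount": 1}))
# {'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1}
print(update_info_from_status({"matchedCount": 0, "modifiedCount": 0, "upsertedId": "2a45ff60"}))
# {'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '2a45ff60'}
```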

View this topic in more detail on the API Reference.

const result = await collection.updateOne(
  { $and: [{ name: 'Jesse' }, { gender: 'M' }] },
  { $set: { title: 'Mr.' } },
);

Update a single document on the collection, inserting a new one if no match is found.

const result = await collection.updateOne(
  { $and: [{ name: 'Jesse' }, { gender: 'M' }] },
  { $set: { title: 'Mr.' } },
  { upsert: true },
);

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to update.

update

UpdateFilter<Schema>

The update to apply to the selected document.

options?

UpdateOneOptions

The options for this operation.

Options (UpdateOneOptions):

Name Type Summary

upsert?

boolean

If true, creates a new document if no document matches the filter.

sort?

Sort

Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk.

vector?

number[]

An optional vector to use to perform a vector search on the collection to find the closest matching document.

Equivalent to setting the $vector field in the sort object itself. The two are interchangeable, but mutually exclusive.

If you really need to use both, you can set the $vector field in the sort object directly.

vectorize?

string

A string to be vectorized and used as the sorting criterion in a vector search.

Equivalent to setting the $vectorize field in the sort object itself. The two are interchangeable, but mutually exclusive.

If you really need to use both, you can set the $vectorize field in the sort object directly.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete for each single one of the underlying HTTP requests.

Returns:

Promise<UpdateOneResult<Schema>> - The result of the update operation.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert a document
  await collection.insertOne({ 'Marco': 'Polo' });

  // Prints 1
  const updated1 = await collection.updateOne(
    { 'Marco': 'Polo' },
    { $set: { title: 'Mr.' } },
  );
  console.log(updated1.modifiedCount);

  // Prints 0 0
  const updated2 = await collection.updateOne(
    { name: 'Johnny' },
    { $set: { rank: 0 } },
  );
  console.log(updated2.matchedCount, updated2.upsertedCount);

  // Prints 0 1
  const updated3 = await collection.updateOne(
    { name: 'Johnny' },
    { $set: { rank: 0 } },
    { upsert: true },
  );
  console.log(updated3.matchedCount, updated3.upsertedCount);
})();

  • Operations on documents are performed at the Collection level. For details on each signature, see the Collection Javadoc.

  • Collection is a generic class whose default type is Document. You can specify your own type, in which case your objects are serialized with Jackson.

  • Most methods come in synchronous and asynchronous flavors. The asynchronous version is suffixed with Async and returns a CompletableFuture.

// Synchronous
UpdateResult updateOne(Filter filter, Update update);

// Asynchronous
CompletableFuture<UpdateResult> updateOneAsync(Filter filter, Update update);

Returns:

UpdateResult - Result of the operation, with the number of documents matched (matchedCount) and updated (modifiedCount).

Parameters:

Name Type Summary

filter

Filter

Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression.

update

Update

The update to apply to the matching document, built with the Updates helper, for example Updates.set("field1", "value1").inc("field2", 1d). Several update operators, such as set, inc, and unset, can be chained.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.Update;
import com.datastax.astra.client.model.UpdateResult;
import com.datastax.astra.client.model.Updates;

public class UpdateOne {

    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                Filters.lt("field3", 20),
                Filters.eq("field4", "value"));

        // Building the update: set field1, increment field2, remove field3
        Update update = Updates.set("field1", "value1")
                .inc("field2", 1d)
                .unset("field3");

        // Apply the update to a single document matching the filter
        UpdateResult result = collection.updateOne(filter, update);
    }
}

The following Data API updateOne command uses the $set update operator to set a nested property, addressed with the dot notation customer.name, to a new value.

curl -s --location \
--request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "updateOne": {
    "filter": {
      "_id": "upsert-id"
    },
    "update" : {"$set" : { "customer.name" : "CUSTOMER 22"}}
  }
}' | json_pp

Response:

{
   "status" : {
      "matchedCount" : 1,
      "modifiedCount" : 1
   }
}
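To show what the cURL call does at the HTTP level, the sketch below builds the same request with only the Python standard library: a POST to the endpoint/keyspace/collection path with the Token, Content-Type, and Accept headers and the JSON command as the body. The helper names build_command_request and send_command are hypothetical, the URL layout simply mirrors the cURL examples (it assumes DB_DB_API_ENDPOINT already contains any base path those examples rely on), and the fallback values are placeholders for your own environment variables. In practice, the Python client (astrapy) builds these requests for you.

```python
import json
import os
import urllib.request
from typing import Any, Dict


def build_command_request(
    endpoint: str, keyspace: str, collection: str, token: str, command: Dict[str, Any]
) -> urllib.request.Request:
    """Build the same POST request as the cURL example, without sending it."""
    url = f"{endpoint.rstrip('/')}/{keyspace}/{collection}"
    return urllib.request.Request(
        url,
        data=json.dumps(command).encode("utf-8"),
        headers={
            "Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def send_command(request: urllib.request.Request) -> Dict[str, Any]:
    """POST the request and decode the JSON response body (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Same payload as the cURL updateOne example
command = {
    "updateOne": {
        "filter": {"_id": "upsert-id"},
        "update": {"$set": {"customer.name": "CUSTOMER 22"}},
    }
}
request = build_command_request(
    os.environ.get("DB_DB_API_ENDPOINT", "https://example.invalid"),
    os.environ.get("DB_KEYSPACE", "default_keyspace"),
    os.environ.get("DB_COLLECTION", "vector_collection"),
    os.environ.get("DB_APPLICATION_TOKEN", "TOKEN"),
    command,
)
print(request.get_method(), request.full_url)
# send_command(request) would perform the HTTP call; it is not executed here.
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.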

Parameters:

Name Type Summary

updateOne

command

Updates a single document that matches the given criteria within a database collection.

filter

clause

Used to select the document to be updated.

_id

key

This key within the filter object targets a unique identifier property in the database’s documents. The accompanying value upsert-id is the specific ID the API looks for when determining which document to update.

update

object

Specifies what updates are applied to the document that meets the filter criteria. It’s an object that contains database update operators and the modifications they perform.

$set

Update operator

Sets the value of a property in a document. In this example, customer is a nested document (a property of the main document) and name is a property within customer. The dot-notation key customer.name targets that nested field, and the operation sets it to CUSTOMER 22.
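The effect of a dot-notation key in $set can be illustrated on a plain Python dictionary. The sketch below is a hypothetical client-side model, not how the Data API implements updates: apply_set is an illustrative helper, and the starting document (including its tier field) is invented for the example. It descends along customer.name and replaces only the addressed field. It assumes every intermediate value on the path is an object and does not handle array indexes.

```python
from typing import Any, Dict


def apply_set(document: Dict[str, Any], assignments: Dict[str, Any]) -> Dict[str, Any]:
    """Apply $set-style assignments with dot-notation keys to a nested dict, in place."""
    for dotted_key, value in assignments.items():
        *parents, leaf = dotted_key.split(".")
        target = document
        for part in parents:
            # Descend into (or create) each intermediate object on the path.
            target = target.setdefault(part, {})
        target[leaf] = value
    return document


doc = {"_id": "upsert-id", "customer": {"name": "CUSTOMER 21", "tier": "gold"}}
apply_set(doc, {"customer.name": "CUSTOMER 22"})
print(doc)
# {'_id': 'upsert-id', 'customer': {'name': 'CUSTOMER 22', 'tier': 'gold'}}
```

The sibling property tier survives, which is the practical difference between setting customer.name and setting customer to a whole new object.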

Update multiple documents

Update multiple documents in a collection.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

results = collection.update_many(
    {"name": {"$exists": False}},
    {"$set": {"name": "unknown"}},
)

Update multiple documents in a collection, inserting a new one if no matches are found.

results = collection.update_many(
    {"name": {"$exists": False}},
    {"$set": {"name": "unknown"}},
    upsert=True,
)

Returns:

UpdateResult - An object representing the response from the database after the update operation. It includes information about the operation.

Example response
UpdateResult(raw_results=[{'status': {'matchedCount': 2, 'modifiedCount': 2}}], update_info={'n': 2, 'updatedExisting': True, 'ok': 1.0, 'nModified': 2})

Parameters:

Name Type Summary

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

update

Dict[str, Any]

The update prescription to apply to the matching documents, expressed as a dictionary in Data API syntax. Examples are {"$set": {"field": "value"}}, {"$inc": {"counter": 10}}, and {"$unset": {"field": ""}}. See Data API operators for the full syntax.

upsert

bool

This parameter controls the behavior in absence of matches. If True, a single new document (resulting from applying update to an empty document) is inserted if no matches are found on the collection. If False, the operation silently does nothing in case of no matches.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the operation. This method uses the collection-level timeout by default. You may need to increase the timeout duration when updating a large number of documents, as the update will require multiple HTTP requests in sequence.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

collection.insert_many([{"c": "red"}, {"c": "green"}, {"c": "blue"}])

collection.update_many({"c": {"$ne": "green"}}, {"$set": {"nongreen": True}})
# prints: UpdateResult(raw_results=..., update_info={'n': 2, 'updatedExisting': True, 'ok': 1.0, 'nModified': 2})
collection.update_many({"c": "orange"}, {"$set": {"is_also_fruit": True}})
# prints: UpdateResult(raw_results=..., update_info={'n': 0, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0})
collection.update_many(
    {"c": "orange"},
    {"$set": {"is_also_fruit": True}},
    upsert=True,
)
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '46643050-...'})

View this topic in more detail on the API Reference.

const result = await collection.updateMany(
  { name: { $exists: false } },
  { $set: { title: 'unknown' } },
);

Update multiple documents in a collection, inserting a new one if no matches are found.

const result = await collection.updateMany(
  { name: { $exists: false } },
  { $set: { title: 'unknown' } },
  { upsert: true },
);

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the documents to update.

update

UpdateFilter<Schema>

The update to apply to the selected documents.

options?

UpdateManyOptions

The options for this operation.

Options (UpdateManyOptions):

Name Type Summary

upsert?

boolean

If true, creates a new document if no document matches the filter.

maxTimeMS?

number

The maximum time, in milliseconds, that the client waits for each of the underlying HTTP requests to complete.

Returns:

Promise<UpdateManyResult<Schema>> - The result of the update operation.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([{ c: 'red' }, { c: 'green' }, { c: 'blue' }]);

  // { modifiedCount: 2, matchedCount: 2, upsertedCount: 0 }
  await collection.updateMany({ c: { $ne: 'green' } }, { $set: { nongreen: true } });

  // { modifiedCount: 0, matchedCount: 0, upsertedCount: 0 }
  await collection.updateMany({ c: 'orange' }, { $set: { is_also_fruit: true } });

  // { modifiedCount: 0, matchedCount: 0, upsertedCount: 1, upsertedId: '...' }
  await collection.updateMany({ c: 'orange' }, { $set: { is_also_fruit: true } }, { upsert: true });
})();

  • Operations on documents are performed at the Collection level. For details on each signature, see the Collection Javadoc.

  • Collection is a generic class whose default type is Document. You can specify your own type, in which case your objects are serialized with Jackson.

  • Most methods come in synchronous and asynchronous flavors. The asynchronous version is suffixed with Async and returns a CompletableFuture.

// Synchronous
UpdateResult updateMany(Filter filter, Update update);
UpdateResult updateMany(Filter filter, Update update, UpdateManyOptions options);

// Asynchronous
CompletableFuture<UpdateResult> updateManyAsync(Filter filter, Update update);
CompletableFuture<UpdateResult> updateManyAsync(Filter filter, Update update, UpdateManyOptions options);

Returns:

UpdateResult - Result of the operation, with the number of documents matched (matchedCount) and updated (modifiedCount).

Parameters:

Name Type Summary

filter

Filter

Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression.

update

Update

The update to apply to every matching document, built with the Updates helper, for example Updates.set("field1", "value1").inc("field2", 1d). Several update operators, such as set, inc, and unset, can be chained.

options

UpdateManyOptions

Options for the updateMany operation. Use it to set the upsert flag, for example new UpdateManyOptions().upsert(true).

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.Update;
import com.datastax.astra.client.model.UpdateManyOptions;
import com.datastax.astra.client.model.UpdateResult;
import com.datastax.astra.client.model.Updates;

import static com.datastax.astra.client.model.Filters.lt;

public class UpdateMany {

    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        Update update = Updates.set("field1", "value1")
                .inc("field2", 1d)
                .unset("field3");

        UpdateManyOptions options =
                new UpdateManyOptions().upsert(true);

        UpdateResult result = collection.updateMany(filter, update, options);
    }
}

Use the Data API updateMany command to update multiple documents in a collection.

In this example, the JSON payload uses the $set update operator to change a status to "inactive" for those documents that have an "active" status.

The updateMany command supports pagination for cases where more documents match the filter than a single request updates. For details, see the pagination note after the cURL example.

The JSON structure is sent in an HTTP POST request to a server within an authenticated vector-enabled DataStax Enterprise (DSE) database. In this example, the environment variables set the keyspace name to default_keyspace and the collection name to vector_collection.

Example:

curl -s --location \
--request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "updateMany": {
    "filter": {"status" : "active" },
    "update" : {"$set" : { "status" : "inactive"}}
  }
}' | json_pp

Result:

{
   "status" : {
      "matchedCount" : 20,
      "modifiedCount" : 20,
      "moreData" : true
   }
}

Parameters:

Name Type Summary

updateMany

command

Updates multiple documents in the database’s collection.

filter

object

Defines the criteria for selecting the documents that the command applies to. In this example, the filter selects documents whose status property currently has the value active: status is the property evaluated in each document, and active is the value it must match for the document to be selected.

update

object

Specifies the modifications to be applied to all documents that match the criteria set by the filter.

$set

operator

An update operator indicating that the operation should overwrite the value of a property (or properties) in the selected documents.

status

String

The property to update. In this example, status is set to inactive in every selected document, changing it from active to inactive.

In the updateMany response, check whether a nextPageState ID was returned. The updateMany command supports pagination: it updates one page of matching documents at a time. If a subsequent page contains more matching documents to update, the response includes a nextPageState ID. To update the next page of documents that match the filter, submit the updateMany command again and pass that ID as the pageState option:

{
    "updateMany": {
        "filter": {
            "active_user": true
        },
        "update": {
            "$set": {
                "new_data": "new_data_value"
            }
        },
        "options": {
            "pageState": "<id-value-from-prior-response>"
        }
    }
}

Repeat this sequence of updateMany commands until every page of documents matching the filter has been updated.
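The pagination sequence is a simple loop. The following sketch assumes a hypothetical send_command callable that POSTs one JSON command and returns the parsed response, and it assumes the continuation token is reported as status.nextPageState. It resends the same updateMany with options.pageState until no token comes back, adding up the counts. A small in-memory fake (make_fake_server, also hypothetical) stands in for the server so the loop can run without a database; its page size of 20 is arbitrary.

```python
from typing import Any, Callable, Dict, List, Optional

Command = Dict[str, Any]
Response = Dict[str, Any]


def update_all_pages(
    send_command: Callable[[Command], Response],
    filter_: Dict[str, Any],
    update: Dict[str, Any],
) -> Dict[str, int]:
    """Repeat updateMany until no nextPageState is returned, summing the counts."""
    totals = {"matchedCount": 0, "modifiedCount": 0}
    page_state: Optional[str] = None
    while True:
        body: Dict[str, Any] = {"filter": filter_, "update": update}
        if page_state is not None:
            # Resume from where the previous request stopped.
            body["options"] = {"pageState": page_state}
        status = send_command({"updateMany": body})["status"]
        totals["matchedCount"] += status.get("matchedCount", 0)
        totals["modifiedCount"] += status.get("modifiedCount", 0)
        # Assumption: the continuation token is returned as status.nextPageState.
        page_state = status.get("nextPageState")
        if page_state is None:
            return totals


def make_fake_server(docs: List[Dict[str, Any]], page_size: int) -> Callable[[Command], Response]:
    """In-memory stand-in: one top-level equality filter, top-level $set only."""

    def send(command: Command) -> Response:
        body = command["updateMany"]
        start = int(body.get("options", {}).get("pageState", "0"))
        window = docs[start:start + page_size]
        field, wanted = next(iter(body["filter"].items()))
        matched = [d for d in window if d.get(field) == wanted]
        for d in matched:
            d.update(body["update"]["$set"])
        # Simplification: every matched document counts as modified.
        status: Dict[str, Any] = {"matchedCount": len(matched), "modifiedCount": len(matched)}
        if start + page_size < len(docs):
            status["moreData"] = True
            status["nextPageState"] = str(start + page_size)
        return {"status": status}

    return send


docs = [{"_id": i, "status": "active"} for i in range(45)]
totals = update_all_pages(
    make_fake_server(docs, page_size=20),
    {"status": "active"},
    {"$set": {"status": "inactive"}},
)
print(totals)  # {'matchedCount': 45, 'modifiedCount': 45}
```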

Find distinct values across documents

Get a list of the distinct values of a certain key in a collection.

  • Python

  • TypeScript

  • Java

View this topic in more detail on the API Reference.

collection.distinct("category")

Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.

collection.distinct(
    "food.allergies",
    filter={"registered_for_dinner": True},
)

Returns:

List[Any] - A list of the distinct values encountered. Documents that lack the requested key are ignored.

Example response
['home_appliance', None, 'sports_equipment', {'cat_id': 54, 'cat_name': 'gardening_gear'}]

Parameters:

Name Type Summary

key

str

The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels. Example of acceptable key values: "field", "field.subfield", "field.3", and "field.3.subfield". If lists are encountered and no numeric index is specified, all items in the list are visited.

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the operation. This method uses the collection-level timeout by default.

Keep in mind that distinct is a client-side operation, which effectively browses all required documents using the logic of the find method and collects the unique values found for key. As such, there may be performance, latency, and ultimately billing implications if the number of matching documents is large.
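Because distinct runs on the client, its key-path rules can be modeled in a few lines of Python. The sketch below is a hypothetical helper, not astrapy's implementation. It walks each document along a dot-notation key, treats a numeric path segment as a list index, visits every list item when a list is reached without a numeric index, skips documents that lack the key, and deduplicates values (including unhashable dictionaries) in first-seen order. Run against the documents from the following example, it reproduces the outputs shown there.

```python
from typing import Any, Dict, Iterable, List


def _walk(value: Any, path: List[str]) -> Iterable[Any]:
    """Yield every value reached by following `path` from `value`."""
    if isinstance(value, list):
        if path and path[0].isdigit():
            # Numeric segment: descend into that single list item, if present.
            index = int(path[0])
            if index < len(value):
                yield from _walk(value[index], path[1:])
            return
        # No numeric index: visit every item in the list.
        for item in value:
            yield from _walk(item, path)
        return
    if not path:
        yield value
        return
    if isinstance(value, dict) and path[0] in value:
        yield from _walk(value[path[0]], path[1:])
    # Otherwise the key is missing, so nothing is yielded.


def distinct(documents: Iterable[Dict[str, Any]], key: str) -> List[Any]:
    """Collect unique values of a dot-notation key, in first-seen order."""
    found: List[Any] = []
    for doc in documents:
        for value in _walk(doc, key.split(".")):
            if value not in found:  # equality check also works for dicts and lists
                found.append(value)
    return found


documents = [
    {"name": "Marco", "food": ["apple", "orange"], "city": "Helsinki"},
    {"name": "Emma", "food": {"likes_fruit": True, "allergies": []}},
]
print(distinct(documents, "food"))
# ['apple', 'orange', {'likes_fruit': True, 'allergies': []}]
print(distinct(documents, "food.1"))
# ['orange']
```

Deduplication here relies on Python equality, so, for example, True and 1 would be merged; a real client may apply stricter rules.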

For details on the behavior of "distinct" in conjunction with real-time changes in the collection contents, see the discussion in the Sort examples values section.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

collection.insert_many(
    [
        {"name": "Marco", "food": ["apple", "orange"], "city": "Helsinki"},
        {"name": "Emma", "food": {"likes_fruit": True, "allergies": []}},
    ]
)

collection.distinct("name")
# prints: ['Marco', 'Emma']
collection.distinct("city")
# prints: ['Helsinki']
collection.distinct("food")
# prints: ['apple', 'orange', {'likes_fruit': True, 'allergies': []}]
collection.distinct("food.1")
# prints: ['orange']
collection.distinct("food.allergies")
# prints: []
collection.distinct("food.likes_fruit")
# prints: [True]

View this topic in more detail on the API Reference.

const unique = await collection.distinct('category');

Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.

const unique = await collection.distinct(
  'food.allergies',
  { registeredForDinner: true },
);

Parameters:

Name Type Summary

key

string

The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels. Example of acceptable key values: 'field', 'field.subfield', 'field.3', and 'field.3.subfield'. If lists are encountered and no numeric index is specified, all items in the list are visited.

filter?

Filter<Schema>

A filter to select the documents to use. If not provided, all documents will be used.

Returns:

Promise<Flatten<(SomeDoc & ToDotNotation<FoundDoc<Schema>>)[Key]>[]> - A promise which resolves to the unique distinct values.

The return type is mostly accurate, but with complex keys, it may be required to manually cast the return type to the expected type.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertOne({ name: 'Marco', food: ['apple', 'orange'], city: 'Helsinki' });
  await collection.insertOne({ name: 'Emma', food: { likes_fruit: true, allergies: [] } });

  // ['Marco', 'Emma']
  await collection.distinct('name');

  // ['Helsinki']
  await collection.distinct('city');

  // ['apple', 'orange', { likes_fruit: true, allergies: [] }]
  await collection.distinct('food');

  // ['orange']
  await collection.distinct('food.1');

  // []
  await collection.distinct('food.allergies');

  // [true]
  await collection.distinct('food.likes_fruit');
})();

Gets the distinct values of the specified field name.

// Synchronous
DistinctIterable<T,F> distinct(String fieldName, Filter filter, Class<F> resultClass);
DistinctIterable<T,F> distinct(String fieldName, Class<F> resultClass);

// Asynchronous
CompletableFuture<DistinctIterable<T,F>> distinctAsync(String fieldName, Filter filter, Class<F> resultClass);
CompletableFuture<DistinctIterable<T,F>> distinctAsync(String fieldName, Class<F> resultClass);

Returns:

DistinctIterable<T,F> - An iterable over the distinct values of the specified field.

Parameters:

Name Type Summary

fieldName

String

The name of the field whose distinct values are collected.

filter

Filter

Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression.

resultClass

Class

The Java type of the field values, for example String.class.

Keep in mind that distinct is a client-side operation, which effectively browses all required documents using the logic of the find method and collects the unique values found for the field. As such, there may be performance, latency, and ultimately billing implications if the number of matching documents is large.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DistinctIterable;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;

import static com.datastax.astra.client.model.Filters.lt;

public class Distinct {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Execute distinct operations, without and with a filter
        DistinctIterable<Document, String> result = collection
                .distinct("field", String.class);
        DistinctIterable<Document, String> result2 = collection
                .distinct("field", filter, String.class);

        // Iterate over the result
        for (String fieldValue : result) {
            System.out.println(fieldValue);
        }
    }
}

Count documents in a collection

Get the count of documents in a collection. Count all documents or apply filtering to count a subset of documents.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

collection.count_documents({}, upper_bound=500)

Get the count of the documents in a collection matching a condition.

collection.count_documents({"seq":{"$gt": 15}}, upper_bound=50)

Returns:

int - The exact count of the documents counted as requested, unless it exceeds the caller-provided or API-set upper bound. In case of overflow, an exception is raised.

Example response
320

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

upper_bound

int

A required ceiling on the result of the count operation. If the actual number of documents exceeds this value, an exception is raised. An exception is also raised if the actual number of documents exceeds the maximum count that the Data API can reach, regardless of upper_bound.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

collection.insert_many([{"seq": i} for i in range(20)])

collection.count_documents({}, upper_bound=100)
# prints: 20
collection.count_documents({"seq":{"$gt": 15}}, upper_bound=100)
# prints: 4
collection.count_documents({}, upper_bound=10)
# Raises: astrapy.exceptions.TooManyDocumentsToCountException

View this topic in more detail on the API Reference.

const numDocs = await collection.countDocuments({}, 500);

Get the count of the documents in a collection matching a filter.

const numDocs = await collection.countDocuments({ seq: { $gt: 15 } }, 50);

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the documents to count. If not provided, all documents will be counted.

upperBound

number

A required ceiling on the result of the count operation. If the actual number of documents exceeds this value, an exception is raised. An exception is also raised if the actual number of documents exceeds the maximum count that the Data API can reach, regardless of upperBound.

options?

WithTimeout

The options (the timeout) for this operation.

Returns:

Promise<number> - A promise that resolves to the exact count of the documents counted as requested, unless it exceeds the caller-provided or API-set upper bound, in which case an exception is raised.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany(Array.from({ length: 20 }, (_, i) => ({ seq: i })));

  // Prints 20
  console.log(await collection.countDocuments({}, 100));

  // Prints 4
  console.log(await collection.countDocuments({ seq: { $gt: 15 } }, 100));

  // Throws TooManyDocumentsToCountError
  await collection.countDocuments({}, 10);
})();

// Synchronous
int countDocuments(int upperBound)
    throws TooManyDocumentsToCountException;

int countDocuments(Filter filter, int upperBound)
    throws TooManyDocumentsToCountException;

Get the count of the documents in a collection matching a condition.

Returns:

int - The exact count of the documents counted as requested, unless it exceeds the caller-provided or API-set upper bound. In case of overflow, an exception is raised.

Parameters:

Name Type Summary

filter (optional)

Filter

The filter that selects the documents to count, built with the Filters helper (for example Filters.gt("field2", 10)). Any valid Data API filter expression can be used. If omitted, all documents are counted.

upperBound

int

A required ceiling on the result of the count operation. If the actual number of documents exceeds this value, an exception is thrown. An exception is also thrown if the actual number of documents exceeds the maximum count that the Data API can reach, regardless of upperBound.

The checked exception TooManyDocumentsToCountException is thrown when the actual number of documents exceeds the upper bound set by the caller or by the API. It indicates that more matching documents exist beyond the count threshold.

Consider modifying your conditions to count fewer documents at once. If you need to count large numbers of documents, consider using the Data API estimatedDocumentCount command.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.exception.TooManyDocumentsToCountException;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;

import static com.datastax.astra.client.model.Filters.lt;

public class CountDocuments {
    public static void main(String[] args)  {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

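        try {
            // Count all documents, up to 500
            int total = collection.countDocuments(500);
            System.out.println(total);

            // Count only the documents matching the filter, up to 500
            int matching = collection.countDocuments(filter, 500);
            System.out.println(matching);
        } catch (TooManyDocumentsToCountException tmde) {
            // Thrown if the count is above the upper bound or above the 1000 limit
            System.err.println("Too many documents to count: " + tmde.getMessage());
        }
    }
}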

Use the Data API countDocuments command to obtain the exact count of documents in a collection:

curl -s --location \
--request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "countDocuments": {}
}' | json_pp

You can provide an optional filter condition:

curl -s --location \
--request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "countDocuments": {
    "filter": {
      "year": {"$gt": 2000}
    }
  }
}' | json_pp

Returns:

count - The exact count of the documents counted as requested, unless it exceeds the API-set upper bound, in which case the overflow is reported in the response by the moreData flag.

Example response
{
    "status": {
        "count": 105
    }
}

Properties:

Name Type Summary

countDocuments

command

Returns an exact count of documents in a collection. By default, all documents are counted.

filter

JSON object

Optional filtering clause for countDocuments. If included, countDocuments counts the subset of documents matching the filter.

This operation is suited to use cases where the number of documents to count is moderate. Exact counting of an arbitrary number of documents is a slow, expensive operation that is not supported by the Data API. If the count total exceeds the server-side threshold, the response includes "moreData": true to indicate that there are more matching documents beyond the count threshold.

{
    "status": {
        "moreData": true,
        "count": 1000
    }
}

If you need to count large numbers of documents, consider using the Data API estimatedDocumentCount command.
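The documented upper-bound behavior can be expressed as a small decision function over the status object of a raw countDocuments response, whose count and moreData fields appear in the responses above. This is a hypothetical sketch (check_count and TooManyToCount are illustrative names, not part of astrapy, astra-db-ts, or the Java client). It returns the exact count only when the server did not flag moreData and the count is within the caller's bound, and raises otherwise, mirroring the exceptions described for the language clients.

```python
from typing import Any, Dict


class TooManyToCount(Exception):
    """Hypothetical stand-in for a 'too many documents to count' error."""


def check_count(status: Dict[str, Any], upper_bound: int) -> int:
    """Return the exact count from a countDocuments status, or raise on overflow."""
    count = status["count"]
    if status.get("moreData", False):
        # The server stopped counting at its own limit; the true total is larger.
        raise TooManyToCount(f"more than {count} documents (server limit reached)")
    if count > upper_bound:
        # The count is exact but exceeds the ceiling requested by the caller.
        raise TooManyToCount(f"{count} documents exceed upper_bound={upper_bound}")
    return count


print(check_count({"count": 105}, upper_bound=500))  # 105
```

Checking moreData first matters: when it is true, count is only the server's stopping point, so comparing it with upper_bound would be misleading.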

Estimate document count in a collection

Get an approximate document count for an entire collection. Filtering isn’t supported.

In the estimatedDocumentCount command's response, the document count is based on current system statistics at the time the database server receives the request. Because of in-progress updates (document additions and deletions), the actual number of documents in the collection can be higher or lower than the estimate.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

collection.estimated_document_count()

Returns:

int - A server-side estimate of the total number of documents in the collection.

Example response
37500

Parameters:

Name Type Summary

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

collection.estimated_document_count()
# prints: 37500

View this topic in more detail on the API Reference.

const estNumDocs = await collection.estimatedDocumentCount();

Parameters:

Name Type Summary

options?

WithTimeout

The options (the timeout) for this operation.

Returns:

Promise<number> - A promise that resolves to a server-side estimate of the total number of documents in the collection.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  console.log(await collection.estimatedDocumentCount());
})();

View this topic in more detail on the API Reference.

long estimatedDocumentCount();
long estimatedDocumentCount(EstimatedCountDocumentsOptions options);

Parameters:

Name Type Summary

options?

EstimatedCountDocumentsOptions

Set different options for the estimatedDocumentCount operation, such as timeout and httpSettings.

Returns:

long - A server-side estimate of the total number of documents in the collection. This estimate is built from the SSTable files.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.EstimatedCountDocumentsOptions;
import com.datastax.astra.internal.command.LoggingCommandObserver;

public class EstimateCountDocuments {

    public static void main(String[] args)  {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Estimate with default options (estimatedDocumentCount takes no filter)
        long estimatedCount = collection.estimatedDocumentCount();

        // Estimate with options (adding a logger)
        EstimatedCountDocumentsOptions options = new EstimatedCountDocumentsOptions()
                    .registerObserver("logger", new LoggingCommandObserver(DataAPIClient.class));
        long estimateCount2 = collection.estimatedDocumentCount(options);
    }
}

Use the Data API estimatedDocumentCount command to return the approximate number of documents in the collection.

curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "estimatedDocumentCount": {}
}' | json_pp

Returns:

count - An estimate of the total number of documents in the collection.

Example response
{
    "status": {
        "count": 37500
    }
}

Properties:

Name Type Summary

estimatedDocumentCount

command

Returns an estimated count of documents within the context of the specified collection.

The estimatedDocumentCount object is empty ({}) because there are no filters or options for this command.
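The cURL call above can also be reproduced from Python with only the standard library. This sketch builds, but does not send, the same POST request and parses a response shaped like the example response. build_estimated_count_request and parse_estimated_count are hypothetical helpers, and the URL layout simply mirrors the cURL example (endpoint, keyspace, and collection joined by slashes); check how your endpoint variable is defined before relying on it.

```python
import json
import urllib.request


def build_estimated_count_request(
    api_endpoint: str, keyspace: str, collection: str, token: str
) -> urllib.request.Request:
    """Build, but do not send, the POST request from the cURL example (hypothetical helper)."""
    # URL layout mirrors the cURL example: endpoint/keyspace/collection
    url = f"{api_endpoint}/{keyspace}/{collection}"
    # The command object is empty: estimatedDocumentCount accepts no filter or options
    body = json.dumps({"estimatedDocumentCount": {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def parse_estimated_count(response_body: bytes) -> int:
    """Extract status.count from a body such as {"status": {"count": 37500}}."""
    return int(json.loads(response_body)["status"]["count"])


if __name__ == "__main__":
    req = build_estimated_count_request(
        "https://example.invalid", "DB_KEYSPACE", "DB_COLLECTION", "DB_APPLICATION_TOKEN"
    )
    print(req.get_method(), req.full_url)
    print(parse_estimated_count(b'{"status": {"count": 37500}}'))  # 37500
    # Sending it for real would be: urllib.request.urlopen(req)
```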

Find and replace a document

Locate a document matching a filter condition and replace it with a new document, returning the document itself.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

collection.find_one_and_replace(
    {"_id": "rule1"},
    {"text": "some animals are more equal!"},
)

Locate and replace a document, returning the document itself, additionally creating it if nothing is found.

collection.find_one_and_replace(
    {"_id": "rule1"},
    {"text": "some animals are more equal!"},
    upsert=True,
)

Returns:

Dict[str, Any] - The document that was found, either before or after the replacement (or a projection thereof, as requested). If no matches are found, None is returned.

Example response
{'_id': 'rule1', 'text': 'all animals are equal'}

Parameters:

Name Type Summary

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

replacement

Dict[str, Any]

The new document to write into the collection.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

Used to select a subset of fields in the document being returned. The projection can be: an iterable over the included field names; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to exclude specific fields from the response. Special document fields (e.g. _id, $vector) are controlled individually. The default projection does not necessarily include all fields of the document. See the projection examples for more on this parameter.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to use as the sorting criterion for a vector search, that is, an Approximate Nearest Neighbors (ANN) search. The matched document (if any) is then the one most similar to the provided vector. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

vectorize

Optional[str]

A string to be vectorized and used as the sorting criterion in a vector search. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

sort

Optional[Dict[str, Any]]

With this dictionary parameter, you can control the sorting order of the documents matching the filter, which determines which document comes first and is therefore the one replaced. See the sort examples for more on sorting.

upsert

bool = False

This parameter controls the behavior in absence of matches. If True, replacement is inserted as a new document if no matches are found on the collection. If False, the operation silently does nothing in case of no matches.

return_document

str

A flag controlling what document is returned: if set to ReturnDocument.BEFORE, or the string "before", the document found on database is returned; if set to ReturnDocument.AFTER, or the string "after", the new document is returned. The default is "before".

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

import astrapy
from astrapy import DataAPIClient

client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

collection.insert_one({"_id": "rule1", "text": "all animals are equal"})

collection.find_one_and_replace(
    {"_id": "rule1"},
    {"text": "some animals are more equal!"},
)
# prints: {'_id': 'rule1', 'text': 'all animals are equal'}
collection.find_one_and_replace(
    {"text": "some animals are more equal!"},
    {"text": "and the pigs are the rulers"},
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# prints: {'_id': 'rule1', 'text': 'and the pigs are the rulers'}
collection.find_one_and_replace(
    {"_id": "rule2"},
    {"text": "F=ma^2"},
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# (returns None for no matches)
collection.find_one_and_replace(
    {"_id": "rule2"},
    {"text": "F=ma"},
    upsert=True,
    return_document=astrapy.constants.ReturnDocument.AFTER,
    projection={"_id": False},
)
# prints: {'text': 'F=ma'}
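To illustrate how return_document and upsert interact, the following Python sketch models find_one_and_replace against a plain in-memory list. InMemoryCollection is a hypothetical toy, not astrapy's Collection: it supports only equality filters, ignores sort, projection, and vector options, and assumes that an upsert reuses the _id from the filter when there is one. It also assumes that an upsert with return_document "before" returns None, since no document existed beforehand. Its return values mirror the outputs printed in the example above.

```python
import uuid
from typing import Any, Dict, List, Optional


class InMemoryCollection:
    """Hypothetical toy collection supporting equality filters only."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []

    def insert_one(self, doc: Dict[str, Any]) -> None:
        # Generate an _id unless the document already provides one
        self.docs.append({"_id": str(uuid.uuid4()), **doc})

    @staticmethod
    def _matches(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
        return all(k in doc and doc[k] == v for k, v in flt.items())

    def find_one_and_replace(
        self,
        flt: Dict[str, Any],
        replacement: Dict[str, Any],
        upsert: bool = False,
        return_document: str = "before",
    ) -> Optional[Dict[str, Any]]:
        body = {k: v for k, v in replacement.items() if k != "_id"}
        for i, doc in enumerate(self.docs):
            if self._matches(doc, flt):
                # The replacement keeps the original _id; all other old fields are dropped
                new_doc = {"_id": doc["_id"], **body}
                self.docs[i] = new_doc
                return dict(doc) if return_document == "before" else dict(new_doc)
        if not upsert:
            return None  # no match and no upsert: nothing is written
        # Upsert (assumption): take _id from the filter if present, else generate one
        new_doc = {"_id": flt.get("_id", str(uuid.uuid4())), **body}
        self.docs.append(new_doc)
        # No document existed before the upsert, so "before" has nothing to return
        return dict(new_doc) if return_document == "after" else None


if __name__ == "__main__":
    coll = InMemoryCollection()
    coll.insert_one({"_id": "rule1", "text": "all animals are equal"})
    print(coll.find_one_and_replace({"_id": "rule1"}, {"text": "some animals are more equal!"}))
    # {'_id': 'rule1', 'text': 'all animals are equal'}
    print(coll.find_one_and_replace({"_id": "rule2"}, {"text": "F=ma"}, upsert=True, return_document="after"))
    # {'_id': 'rule2', 'text': 'F=ma'}
```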

View this topic in more detail on the API Reference.

const docBefore = await collection.findOneAndReplace(
  { _id: 123 },
  { text: 'some animals are more equal!' },
  { returnDocument: 'before' },
);

Locate and replace a document, returning the document itself, additionally creating it if nothing is found.

const docBefore = await collection.findOneAndReplace(
  { _id: 123 },
  { text: 'some animals are more equal!' },
  { returnDocument: 'before', upsert: true },
);

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to replace.

replacement

NoId<Schema>

The replacement document, which contains no _id field.

options

FindOneAndReplaceOptions

The options for this operation.

Name Type Summary

returnDocument

'before' | 'after'

Specifies whether to return the original or replaced document.

upsert?

boolean

If true, creates a new document if no document matches the filter.

projection?

Projection

Specifies which fields should be included/excluded in the returned documents. Defaults to including all fields.

When specifying a projection, it’s the user’s responsibility to handle the return type carefully. Consider type-casting.

sort?

Sort

Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk.

vector?

number[]

An optional vector to use to perform a vector search on the collection to find the closest matching document.

Equivalent to setting the $vector field in the sort object itself; the two are interchangeable but mutually exclusive.

If you really need to use both, you can set the $vector field in the sort object directly.

vectorize?

string

A string to be vectorized and used as the sorting criterion in a vector search.

Equivalent to setting the $vectorize field in the sort object itself; the two are interchangeable but mutually exclusive.

If you really need to use both, you can set the $vectorize field in the sort object directly.

maxTimeMS?

number

The maximum time, in milliseconds, that the client waits for each underlying HTTP request to complete.

includeResultMetadata?

boolean

When true, the result also includes, alongside the document, an ok field with a value of 1 if the command executed successfully.

Returns:

Promise<WithId<Schema> | null> - The document before or after the replacement, depending on the value of returnDocument, or null if no matches are found.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some document
  await collection.insertOne({ _id: "rule1", text: "all animals are equal" });

  // { _id: 'rule1', text: 'all animals are equal' }
  await collection.findOneAndReplace(
    { _id: "rule1" },
    { text: "some animals are more equal!" },
    { returnDocument: 'before' }
  );

  // { _id: 'rule1', text: 'and the pigs are the rulers' }
  await collection.findOneAndReplace(
    { text: "some animals are more equal!" },
    { text: "and the pigs are the rulers" },
    { returnDocument: 'after' }
  );

  // null
  await collection.findOneAndReplace(
    { _id: "rule2" },
    { text: "F=ma^2" },
    { returnDocument: 'after' }
  );

  // { text: 'F=ma' }
  await collection.findOneAndReplace(
    { _id: "rule2" },
    { text: "F=ma" },
    { upsert: true, returnDocument: 'after', projection: { _id: false } }
  );
})();

// Synchronous
Optional<T> findOneAndReplace(Filter filter, T replacement);
Optional<T> findOneAndReplace(Filter filter, T replacement, FindOneAndReplaceOptions options);

// Asynchronous
CompletableFuture<Optional<T>> findOneAndReplaceAsync(Filter filter, T replacement);
CompletableFuture<Optional<T>> findOneAndReplaceAsync(Filter filter, T replacement, FindOneAndReplaceOptions options);

Returns:

Optional<T> - The document that matches the filter, wrapped in an Optional. Depending on whether returnDocument is set to before or after, it contains the document as it was before or after the replacement.

Parameters:

Name Type Summary

filter (optional)

Filter

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

replacement

T

The document that replaces the matching document, if one exists. If the upsert flag is set to true and no document is found, this document is inserted.

options(optional)

FindOneAndReplaceOptions

Options for the findOneAndReplace operation, such as a Sort clause (on the vector or any other field), a Projection clause, the upsert flag, and the returnDocument flag.

Sample definition of FindOneAndReplaceOptions:

 FindOneAndReplaceOptions options = FindOneAndReplaceOptions.Builder
  .projection(Projections.include("field1"))
  .sort(Sorts.ascending("field1"))
  .upsert(true)
  .returnDocumentAfter();

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindOneAndReplaceOptions;
import com.datastax.astra.client.model.Projections;
import com.datastax.astra.client.model.Sorts;

import java.util.Optional;

import static com.datastax.astra.client.model.Filters.lt;

public class FindOneAndReplace {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        FindOneAndReplaceOptions options = new FindOneAndReplaceOptions()
                .projection(Projections.include("field1"))
                .sort(Sorts.ascending("field1"))
                .upsert(true)
                .returnDocumentAfter();

        Document docForReplacement = new Document()
                .append("field1", "value1")
                .append("field2", 20)
                .append("field3", 30)
                .append("field4", "value4");

        // returnDocumentAfter() is set, so this returns the document as it is after the replacement
        Optional<Document> docAfterReplace = collection
                .findOneAndReplace(filter, docForReplacement, options);
    }
}

Use the Data API findOneAndReplace command to find an existing document that matches the filter criteria and replace it with a new document.

curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "findOneAndReplace": {
    "filter": {
      "_id": "14"
    },
    "replacement": { "customer.name": "Ann Jones", "status": "inactive" }
  }
}' | json_pp

Parameters:

Name Type Summary

findOneAndReplace

command

Finds a single document that matches a specified filter and replaces it with the provided replacement document. This operation is atomic within a single document.

filter

clause

Specifies the criteria for selecting the document to replace. In this example, it’s a document with an _id value of 14.

replacement

clause

Specifies the new content of the document that will replace the existing document found using the filter criteria. The replacement content provided is a document with two fields:

  • customer.name: Set to "Ann Jones" in this example.

  • status: Set to "inactive" in this example.

Replace a document

Replace a document in the collection with a new one.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

replace_result = collection.replace_one(
    {"Marco": {"$exists": True}},
    {"Buda": "Pest"},
)

Replace a document in the collection with a new one, creating a new one if no match is found.

replace_result = collection.replace_one(
    {"Marco": {"$exists": True}},
    {"Buda": "Pest"},
    upsert=True,
)

Returns:

UpdateResult - An object representing the response from the database after the replace operation. It includes information about the operation.

Example response
UpdateResult(raw_results=[{'data': {'document': {'_id': '1', 'Marco': 'Polo'}}, 'status': {'matchedCount': 1, 'modifiedCount': 1}}], update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1})

Parameters:

Name Type Summary

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

replacement

Dict[str, Any]

The new document to write into the collection.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to use as the sorting criterion for a vector search, that is, an Approximate Nearest Neighbors (ANN) search. The matched document (if any) is then the one most similar to the provided vector. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

vectorize

Optional[str]

A string to be vectorized and used as the sorting criterion in a vector search. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

sort

Optional[Dict[str, Any]]

With this dictionary parameter, you can control the sorting order of the documents matching the filter, which determines which document comes first and is therefore the one replaced. See the sort examples for more on sorting.

upsert

bool = False

This parameter controls the behavior in absence of matches. If True, replacement is inserted as a new document if no matches are found on the collection. If False, the operation silently does nothing in case of no matches.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

collection.insert_one({"Marco": "Polo"})
collection.replace_one({"Marco": {"$exists": True}}, {"Buda": "Pest"})
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1})
collection.find_one({"Buda": "Pest"})
# prints: {'_id': '8424905a-...', 'Buda': 'Pest'}
collection.replace_one({"Mirco": {"$exists": True}}, {"Oh": "yeah?"})
# prints: UpdateResult(raw_results=..., update_info={'n': 0, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0})
collection.replace_one({"Mirco": {"$exists": True}}, {"Oh": "yeah?"}, upsert=True)
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '931b47d6-...'})
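The update_info dictionaries printed above follow a MongoDB-style layout. This Python sketch maps them to matched, modified, and upserted values. summarize_update_info is a hypothetical helper, and its reading of the fields (updatedExisting signals that an existing document matched, nModified counts modifications, upserted holds the new _id) is an assumption inferred from the printed results; it agrees with the matchedCount, modifiedCount, and upsertedId values shown in the TypeScript example for the same three calls.

```python
from typing import Any, Dict, Optional


def summarize_update_info(update_info: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Hypothetical helper: read matched/modified/upserted from update_info.

    Field meanings are assumptions inferred from the printed results above.
    """
    # n counts affected documents; it only means "matched" when an existing one was updated
    matched = update_info.get("n", 0) if update_info.get("updatedExisting") else 0
    return {
        "matched": matched,
        "modified": update_info.get("nModified", 0),
        "upserted_id": update_info.get("upserted"),  # present only when an upsert inserted
    }


if __name__ == "__main__":
    print(summarize_update_info({"n": 0, "updatedExisting": False, "ok": 1.0, "nModified": 0}))
    # {'matched': 0, 'modified': 0, 'upserted_id': None}
```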

View this topic in more detail on the API Reference.

const result = await collection.replaceOne(
  { 'Marco': 'Polo' },
  { 'Buda': 'Pest' },
);

Replace a document in the collection with a new one, creating a new one if no match is found.

const result = await collection.replaceOne(
  { 'Marco': 'Polo' },
  { 'Buda': 'Pest' },
  { upsert: true },
);

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to replace.

replacement

NoId<Schema>

The replacement document, which contains no _id field.

options?

ReplaceOneOptions

The options for this operation.

Options (ReplaceOneOptions):

Name Type Summary

upsert?

boolean

If true, creates a new document if no document matches the filter.

sort?

Sort

Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk.

vector?

number[]

An optional vector to use to perform a vector search on the collection to find the closest matching document.

Equivalent to setting the $vector field in the sort object itself; the two are interchangeable but mutually exclusive.

If you really need to use both, you can set the $vector field in the sort object directly.

vectorize?

string

A string to be vectorized and used as the sorting criterion in a vector search.

Equivalent to setting the $vectorize field in the sort object itself; the two are interchangeable but mutually exclusive.

If you really need to use both, you can set the $vectorize field in the sort object directly.

maxTimeMS?

number

The maximum time, in milliseconds, that the client waits for each underlying HTTP request to complete.

Returns:

Promise<ReplaceOneResult<Schema>> - The result of the replacement operation.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some document
  await collection.insertOne({ 'Marco': 'Polo' });

  // { modifiedCount: 1, matchedCount: 1, upsertedCount: 0 }
  await collection.replaceOne(
    { 'Marco': { '$exists': true } },
    { 'Buda': 'Pest' }
  );

  // { _id: '3756ce75-aaf1-430d-96ce-75aaf1730dd3', Buda: 'Pest' }
  await collection.findOne({ 'Buda': 'Pest' });

  // { modifiedCount: 0, matchedCount: 0, upsertedCount: 0 }
  await collection.replaceOne(
    { 'Mirco': { '$exists': true } },
    { 'Oh': 'yeah?' }
  );

  // { modifiedCount: 0, matchedCount: 0, upsertedId: '...', upsertedCount: 1 }
  await collection.replaceOne(
    { 'Mirco': { '$exists': true } },
    { 'Oh': 'yeah?' },
    { upsert: true }
  );
})();

// Synchronous
UpdateResult replaceOne(Filter filter, T replacement);
UpdateResult replaceOne(Filter filter, T replacement, ReplaceOneOptions options);

// Asynchronous
CompletableFuture<UpdateResult> replaceOneAsync(Filter filter, T replacement);
CompletableFuture<UpdateResult> replaceOneAsync(Filter filter, T replacement, ReplaceOneOptions options);

Returns:

UpdateResult - A wrapper object with the result of the operation. It contains the number of documents matched (matchedCount) and modified (modifiedCount).

Parameters:

Name Type Summary

filter (optional)

Filter

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

replacement

T

The document that replaces the matching document, if one exists. If the upsert flag is set to true and no document is found, this document is inserted.

options(optional)

ReplaceOneOptions

Options for the replaceOne() operation, notably the upsert flag.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.ReplaceOneOptions;
import com.datastax.astra.client.model.UpdateResult;

import static com.datastax.astra.client.model.Filters.lt;

public class ReplaceOne {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Insert the replacement as a new document if nothing matches the filter
        ReplaceOneOptions options = new ReplaceOneOptions().upsert(true);

        Document docForReplacement = new Document()
                .append("field1", "value1")
                .append("field2", 20)
                .append("field3", 30)
                .append("field4", "value4");

        // Returns an UpdateResult with the matched and modified counts
        UpdateResult result = collection.replaceOne(filter, docForReplacement, options);
    }
}

Find and delete a document

Locate a document matching a filter condition and delete it, returning the document itself.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

collection.find_one_and_delete({"status": "stale_entry"})

Returns:

Dict[str, Any] - The document that was just deleted (or a projection thereof, as requested). If no matches are found, None is returned.

Example response
{'_id': 199, 'status': 'stale_entry', 'request_id': 'A4431'}

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

Used to select a subset of fields in the documents being returned. The projection can be: an iterable over the included field names; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to exclude specific fields from the response. Special document fields (e.g. _id, $vector) are controlled individually. The default projection does not necessarily include all fields of the document. See the projection examples for more on this parameter.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to perform a vector search, that is, an Approximate Nearest Neighbors (ANN) search that extracts the most similar document in the collection matching the filter. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

vectorize

Optional[str]

A string to be vectorized and used as the sorting criterion in a vector search. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

sort

Optional[Dict[str, Any]]

With this dictionary parameter, you can control the sorting order of the documents matching the filter, which determines which document comes first and is therefore the one deleted. See the sort examples for more on sorting.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

collection.insert_many(
    [
        {"species": "swan", "class": "Aves"},
        {"species": "frog", "class": "Amphibia"},
    ],
)
collection.find_one_and_delete(
    {"species": {"$ne": "frog"}},
    projection={"species": True},
)
# prints: {'_id': '5997fb48-...', 'species': 'swan'}
collection.find_one_and_delete({"species": {"$ne": "frog"}})
# (returns None for no matches)
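The projection rules described for find_one_and_delete (field inclusion or exclusion, with special fields handled individually) can be sketched in Python as a pure function over a document. apply_projection is a hypothetical helper that runs client-side on an already-fetched dictionary, whereas the real projection is applied by the server. It assumes that _id is returned unless explicitly excluded and that $vector and $vectorize are returned only when explicitly requested, consistent with the note that the default projection does not necessarily include all fields. Unlike the server, it does not reject projections that mix inclusion and exclusion.

```python
from typing import Any, Dict, Iterable, Union

Projection = Union[Iterable[str], Dict[str, bool]]
# Assumption: these special fields are omitted unless explicitly requested
SPECIAL_FIELDS = {"$vector", "$vectorize"}


def apply_projection(doc: Dict[str, Any], projection: Projection) -> Dict[str, Any]:
    """Hypothetical client-side model of a Data API projection."""
    # Normalize the iterable-of-names form into the {field_name: True} form
    spec: Dict[str, bool] = (
        dict(projection) if isinstance(projection, dict) else {name: True for name in projection}
    )
    regular = {k: v for k, v in spec.items() if k != "_id" and k not in SPECIAL_FIELDS}
    # Any True among regular fields makes this an inclusion projection
    inclusive = any(regular.values())
    result: Dict[str, Any] = {}
    for key, value in doc.items():
        if key == "_id":
            keep = spec.get("_id", True)  # kept unless explicitly excluded
        elif key in SPECIAL_FIELDS:
            keep = spec.get(key, False)  # returned only when explicitly included
        else:
            # Inclusion: keep listed fields; exclusion: keep everything not set to False
            keep = spec.get(key, not inclusive)
        if keep:
            result[key] = value
    return result


if __name__ == "__main__":
    doc = {"_id": "doc1", "species": "swan", "class": "Aves"}
    print(apply_projection(doc, {"species": True}))
    # {'_id': 'doc1', 'species': 'swan'}
```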

View this topic in more detail on the API Reference.

const deletedDoc = await collection.findOneAndDelete({ status: 'stale_entry' });

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to delete.

options?

FindOneAndDeleteOptions

The options for this operation.

Name Type Summary

projection?

Projection

Specifies which fields should be included/excluded in the returned documents. Defaults to including all fields.

When specifying a projection, it’s the user’s responsibility to handle the return type carefully. Consider type-casting.

sort?

Sort

Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk.

vector?

number[]

An optional vector to use to perform a vector search on the collection to find the closest matching document.

Equivalent to setting the $vector field in the sort object itself; the two are interchangeable but mutually exclusive.

If you really need to use both, you can set the $vector field in the sort object directly.

vectorize?

string

A string to be vectorized and used as the sorting criterion in a vector search.

Equivalent to setting the $vectorize field in the sort object itself; the two are interchangeable but mutually exclusive.

If you really need to use both, you can set the $vectorize field in the sort object directly.

maxTimeMS?

number

The maximum time, in milliseconds, that the client waits for each underlying HTTP request to complete.

includeResultMetadata?

boolean

When true, the result also includes, alongside the document, an ok field with a value of 1 if the command executed successfully.

Returns:

Promise<WithId<Schema> | null> - The document that was deleted, or null if no matches are found.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some document
  await collection.insertMany([
    { species: 'swan', class: 'Aves' },
    { species: 'frog', class: 'Amphibia' },
  ]);

  // { _id: '...', species: 'swan' }
  await collection.findOneAndDelete(
    { species: { $ne: 'frog' } },
    { projection: { species: 1 } },
  );

  // null
  await collection.findOneAndDelete(
    { species: { $ne: 'frog' } },
  );
})();

// Synchronous
Optional<T> findOneAndDelete(Filter filter);
Optional<T> findOneAndDelete(Filter filter, FindOneAndDeleteOptions options);

// Asynchronous
CompletableFuture<Optional<T>> findOneAndDeleteAsync(Filter filter);
CompletableFuture<Optional<T>> findOneAndDeleteAsync(Filter filter, FindOneAndDeleteOptions options);

Returns:

Optional<T> - The deleted document, wrapped in an Optional that is empty if no document matches the filter.

Parameters:

Name Type Summary

filter (optional)

Filter

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

options(optional)

FindOneAndDeleteOptions

Options for the findOneAndDelete operation, such as a Sort clause (on the vector or any other field) or a Projection clause.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;

import java.util.Optional;

import static com.datastax.astra.client.model.Filters.lt;

public class FindOneAndDelete {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Building a filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Returns the document as it was before deletion, or an empty Optional if nothing matched
        Optional<Document> docBeforeDelete = collection.findOneAndDelete(filter);
    }
}

Use the Data API findOneAndDelete command to find and delete a single document.

curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
    "findOneAndDelete": {
        "filter": {
            "customer.name": "Fred Smith",
            "_id": "13"
        }
    }
}' | json_pp

Response:

{
   "status" : {
      "deletedCount" : 1
   }
}

Parameters:

Name Type Summary

findOneAndDelete

command

Deletes the first document that matches the given criteria. If no matching document is found, no action is taken.

filter

clause

Identifies the document to delete. In this example, the filter combines the customer.name and _id values.
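For readers working without cURL, the following minimal Python sketch builds the same findOneAndDelete request with the standard library's urllib.request. The helper name build_find_one_and_delete_request is hypothetical, the URL layout simply mirrors the endpoint/keyspace/collection path used in the cURL example, and the sketch only constructs the request; it does not send it.

```python
import json
import urllib.request


def build_find_one_and_delete_request(
    api_endpoint: str, keyspace: str, collection: str, token: str, flt: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL call."""
    body = {"findOneAndDelete": {"filter": flt}}
    return urllib.request.Request(
        url=f"{api_endpoint}/{keyspace}/{collection}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
```

Passing the resulting request to urllib.request.urlopen would send it, and the JSON response body could then be decoded with json.loads.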

Delete a document

Locate and delete a single document from a collection.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

response = collection.delete_one({ "_id": "1" })

Locate and delete a single document from a collection by any attribute (as long as it is covered by the collection’s indexing configuration).

result = collection.delete_one({"location": "warehouse_C"})

Locate and delete a single document from a collection by an arbitrary filtering clause.

result = collection.delete_one({"tag": {"$exists": True}})

Delete the most similar document to a given vector.

result = collection.delete_one({}, vector=[.12, .52, .32])

Generate a vector from a string and delete the most similar document.

result = collection.delete_one({}, vectorize="Text to vectorize")

Returns:

DeleteResult - An object representing the response from the database after the delete operation. It includes the number of deleted documents (deleted_count) and the raw API responses (raw_results).

Example response
DeleteResult(raw_results=[{'status': {'deletedCount': 1}}], deleted_count=1)

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

vector

Optional[Iterable[float]]

A vector (a list of floats whose dimensionality matches the collection's vector configuration) to use as the sorting criterion in a vector search, that is, an Approximate Nearest Neighbor (ANN) search. The matched document, if any, is the one most similar to the provided vector. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

vectorize

Optional[str]

A string to be vectorized and used as the sorting criterion in a vector search. This parameter cannot be used together with sort. See the sort examples for more on this parameter.

sort

Optional[Dict[str, Any]]

Controls the sort order of the documents matching the filter, which determines which document comes first and is therefore deleted. See the sort examples for more on sorting.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default.
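The vector parameter selects the document most similar to the query vector. The following Python sketch shows that selection rule on in-memory documents, assuming cosine similarity and an exact brute-force scan. The Data API instead performs an approximate (ANN) search with the similarity metric configured for the collection, so treat this only as an illustration. The helper names are hypothetical, and the sketch ignores the filter (equivalent to passing {}).

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def delete_most_similar(docs: List[Dict], query: List[float]) -> Optional[Dict]:
    """Remove and return the document whose $vector is most similar to `query`."""
    # Only documents that carry a vector can take part in a vector search
    candidates = [d for d in docs if "$vector" in d]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: cosine_similarity(d["$vector"], query))
    docs.remove(best)
    return best
```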

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

collection.insert_many([{"seq": 1}, {"seq": 0}, {"seq": 2}])

collection.delete_one({"seq": 1})
# prints: DeleteResult(raw_results=..., deleted_count=1)
collection.distinct("seq")
# prints: [0, 2]
collection.delete_one(
    {"seq": {"$exists": True}},
    sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
# prints: DeleteResult(raw_results=..., deleted_count=1)
collection.distinct("seq")
# prints: [0]
collection.delete_one({"seq": 2})
# prints: DeleteResult(raw_results=..., deleted_count=0)
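The sort parameter decides which of the matching documents is deleted. This hypothetical Python sketch reproduces the logic of the example above on a plain list: among documents that have the field, it deletes the one that comes first in the requested order and returns a deleted count of 0 or 1, mirroring DeleteResult.deleted_count. It is an illustration only, not astrapy's implementation.

```python
from typing import Any, Dict, List


def delete_one_sorted(docs: List[Dict[str, Any]], field: str, descending: bool = False) -> int:
    """Delete the first document, in sort order on `field`, among those that have `field`.

    Returns the deleted count (0 or 1).
    """
    # Similar in spirit to the filter {field: {"$exists": True}}
    candidates = [d for d in docs if field in d]
    if not candidates:
        return 0
    # Descending order puts the largest value first; ascending puts the smallest first
    if descending:
        target = max(candidates, key=lambda d: d[field])
    else:
        target = min(candidates, key=lambda d: d[field])
    docs.remove(target)
    return 1
```

Starting from the state after delete_one({"seq": 1}), a descending delete removes {"seq": 2}, which is why the final delete_one({"seq": 2}) in the example finds nothing to delete.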

View this topic in more detail on the API Reference.

const result = await collection.deleteOne({ _id: '1' });

Locate and delete a single document from a collection by any attribute (as long as it is covered by the collection’s indexing configuration).

const result = await collection.deleteOne({ location: 'warehouse_C' });

Locate and delete a single document from a collection by an arbitrary filtering clause.

const result = await collection.deleteOne({ tag: { $exists: true } });

Delete the most similar document to a given vector.

const result = await collection.deleteOne({}, { vector: [.12, .52, .32] });

Generate a vector from a string and delete the most similar document.

const result = await collection.deleteOne({}, { vectorize: 'Text to vectorize' });

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the document to delete.

options?

DeleteOneOptions

The options for this operation.

Options (DeleteOneOptions):

Name Type Summary

sort?

Sort

Specifies the sort order of the matching documents, which determines the document that is deleted (the first one in that order). Defaults to the order in which the documents are stored on disk.

vector?

number[]

An optional vector to use to perform a vector search on the collection to find the closest matching document.

Equivalent to setting the $vector field in the sort object itself. The two are interchangeable, but mutually exclusive.

If you really need to use both, you can set the $vector field in the sort object directly.

vectorize?

string

A string to be vectorized and used as the sorting criterion in a vector search.

Equivalent to setting the $vectorize field in the sort object itself. The two are interchangeable, but mutually exclusive.

If you really need to use both, you can set the $vectorize field in the sort object directly.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for each of the underlying HTTP requests to complete.
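The options above describe vector and vectorize as shorthands for setting $vector or $vectorize in the sort. As a hedged illustration, this Python sketch builds the JSON body of a deleteOne command (the shape follows the cURL deleteOne example in this reference) and folds either shorthand into the sort object, rejecting combinations that the options describe as mutually exclusive. The function build_delete_one is hypothetical; the client libraries perform this translation for you.

```python
from typing import Any, Dict, List, Optional


def build_delete_one(
    flt: Dict[str, Any],
    sort: Optional[Dict[str, Any]] = None,
    vector: Optional[List[float]] = None,
    vectorize: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a deleteOne command body, folding vector/vectorize into the sort."""
    if vector is not None and vectorize is not None:
        raise ValueError("vector and vectorize are mutually exclusive")
    if sort is not None and (vector is not None or vectorize is not None):
        raise ValueError("pass either sort or vector/vectorize, not both")
    if vector is not None:
        sort = {"$vector": vector}
    elif vectorize is not None:
        sort = {"$vectorize": vectorize}
    command: Dict[str, Any] = {"filter": flt}
    if sort is not None:
        command["sort"] = sort
    return {"deleteOne": command}
```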

Returns:

Promise<DeleteOneResult> - The result of the deletion operation.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([{ seq: 1 }, { seq: 0 }, { seq: 2 }]);

  // { deletedCount: 1 }
  await collection.deleteOne({ seq: 1 });

  // [0, 2]
  await collection.distinct('seq');

  // { deletedCount: 1 }
  await collection.deleteOne({ seq: { $exists: true } }, { sort: { seq: -1 } });

  // [0]
  await collection.distinct('seq');

  // { deletedCount: 0 }
  await collection.deleteOne({ seq: 2 });
})();

// Synchronous
DeleteResult deleteOne(Filter filter);
DeleteResult deleteOne(Filter filter, DeleteOneOptions options);

// Asynchronous
CompletableFuture<DeleteResult> deleteOneAsync(Filter filter);
CompletableFuture<DeleteResult> deleteOneAsync(Filter filter, DeleteOneOptions options);

Returns:

DeleteResult - Wrapper that contains the deleted count.

Parameters:

Name Type Summary

filter (optional)

Filter

A predicate built with the Filters helper class according to the Data API filter syntax. Expressed as JSON, examples are {}, {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators.

options (optional)

DeleteOneOptions

Options for the delete operation, such as a sort clause (on a vector or any other field).

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DeleteOneOptions;
import com.datastax.astra.client.model.DeleteResult;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.Sorts;

import static com.datastax.astra.client.model.Filters.lt;

public class DeleteOne {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Sample Filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));

        // Delete one options
        DeleteOneOptions options = new DeleteOneOptions()
                .sort(Sorts.ascending("field2"));
        DeleteResult result = collection.deleteOne(filter, options);
        System.out.println("Deleted Count:" + result.getDeletedCount());
    }
}

The Data API deleteOne command deletes a single document. In this example, it deletes a document whose tags value is "first".

curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "deleteOne": {
    "filter": {
      "tags": "first"
    }
  }
}' | json_pp

Response:

{
   "status" : {
      "deletedCount" : 1
   }
}

Properties:

Name Type Summary

deleteOne

command

Deletes one document that matches the provided filter criteria.

filter

clause

Provides the conditions that the database uses to identify the document to delete.

tags

string

A filtering key that targets a specific field in the collection’s documents.

first

string

A value associated with the tags key. In this example, a document’s tags value must contain the string "first" to meet the deletion criteria.

Delete documents

Delete multiple documents from a collection.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

delete_result = collection.delete_many({"status": "processed"})

Returns:

DeleteResult - An object representing the response from the database after the delete operation. It includes the number of deleted documents (deleted_count) and the raw API responses (raw_results).

Example response
DeleteResult(raw_results=[{'status': {'deletedCount': 2}}], deleted_count=2)

Parameters:

Name Type Summary

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {"name": "John"}, {"price": {"$lt": 100}}, {"$and": [{"name": "John"}, {"price": {"$lt": 100}}]}. See Data API operators for the full list of operators. The delete_many method does not accept an empty filter; to erase all contents of a collection, use delete_all.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the operation. This method uses the collection-level timeout by default. You may need to increase the timeout duration when deleting a large number of documents, as the operation will require multiple HTTP requests in sequence.


Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

collection.insert_many([{"seq": 1}, {"seq": 0}, {"seq": 2}])

collection.delete_many({"seq": {"$lte": 1}})
# prints: DeleteResult(raw_results=..., deleted_count=2)
collection.distinct("seq")
# prints: [2]
collection.delete_many({"seq": {"$lte": 1}})
# prints: DeleteResult(raw_results=..., deleted_count=0)

View this topic in more detail on the API Reference.

const result = await collection.deleteMany({ status: 'processed' });

Parameters:

Name Type Summary

filter

Filter<Schema>

A filter to select the documents to delete.

options?

WithTimeout

The options (the timeout) for this operation.

This method does not admit an empty filter clause; use the deleteAll method to delete all documents in the collection.

Returns:

Promise<DeleteManyResult> - The result of the deletion operation.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some documents
  await collection.insertMany([{ seq: 1 }, { seq: 0 }, { seq: 2 }]);

  // { deletedCount: 2 }
  await collection.deleteMany({ seq: { $lte: 1 } });

  // [2]
  await collection.distinct('seq');

  // { deletedCount: 0 }
  await collection.deleteMany({ seq: { $lte: 1 } });
})();

// Synchronous
DeleteResult deleteMany(Filter filter);

// Asynchronous
CompletableFuture<DeleteResult> deleteManyAsync(Filter filter);

Returns:

DeleteResult - Wrapper that contains the deleted count.

As with a few other methods, each underlying delete request removes at most 20 documents. deleteMany() can therefore take some time, because the client keeps issuing requests until it receives confirmation that no more documents match the filter.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DeleteResult;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;

import static com.datastax.astra.client.model.Filters.lt;

public class DeleteMany {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Sample Filter
        Filter filter = Filters.and(
                Filters.gt("field2", 10),
                lt("field3", 20),
                Filters.eq("field4", "value"));
        DeleteResult result = collection.deleteMany(filter);
        System.out.println("Deleted Count:" + result.getDeletedCount());

    }
}
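The batching behavior described for deleteMany() (at most 20 documents per request, repeated until no matches remain) can be sketched in Python as a simple loop. The fake endpoint below is a hypothetical stand-in for one deleteMany request against an in-memory list. The real client relies on the API confirming that no more matching documents exist; this sketch approximates that signal by stopping when a batch deletes fewer than 20 documents.

```python
from typing import Callable, Dict, List

BATCH_LIMIT = 20  # per-request deletion limit described above


def make_fake_endpoint(docs: List[Dict], status: str) -> Callable[[], int]:
    """Hypothetical stand-in for one deleteMany request: deletes at most BATCH_LIMIT matches."""
    def delete_batch() -> int:
        matching = [d for d in docs if d.get("status") == status][:BATCH_LIMIT]
        for d in matching:
            docs.remove(d)
        return len(matching)
    return delete_batch


def delete_many(delete_batch: Callable[[], int]) -> int:
    """Repeat the request until a batch comes back smaller than the limit."""
    total = 0
    while True:
        deleted = delete_batch()
        total += deleted
        # A short (or empty) batch is taken to mean no matching documents remain
        if deleted < BATCH_LIMIT:
            return total
```

With 45 matching documents, the loop issues three requests (20, 20, then 5 deletions) and reports a total of 45.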

The following request deletes documents whose status is "inactive".

curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "deleteMany": {
    "filter": {
      "status": "inactive"
    }
  }
}' | json_pp

Response:

{
   "status" : {
      "deletedCount" : 20
   }
}

Properties:

Name Type Summary

deleteMany

command

Deletes the documents that match the provided filter criteria. A single request deletes at most 20 documents, so deleting a large number of documents may require repeating the request.

filter

clause

Provides the conditions that the database uses to identify one or more document(s) meant for deletion.

status

string

A filtering key that selects which documents are deleted.

inactive

string

The value associated with the status key. In this example, a document’s status must be the string "inactive" to meet the deletion criteria.

Execute multiple write operations

Execute a (reusable) list of write operations on a collection with a single command.

  • Python

  • TypeScript

  • Java

View this topic in more detail on the API Reference.

bw_results = collection.bulk_write(
    [
        InsertMany([{"a": 1}, {"a": 2}]),
        ReplaceOne(
            {"z": 9},
            replacement={"z": 9, "replaced": True},
            upsert=True,
        ),
    ],
)

Returns:

BulkWriteResult - A single object summarizing the whole list of requested operations. The keys in the map attributes of the result (when present) are the integer indices of the corresponding operation in the requests iterable.

Example response
BulkWriteResult(bulk_api_results={0: ..., 1: ...}, deleted_count=0, inserted_count=3, matched_count=0, modified_count=0, upserted_count=1, upserted_ids={1: '2addd676-...'})
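The result keys its map attributes by the index of each operation in the requests iterable, which is why the example response shows upserted_ids={1: ...}: the upsert came from the second operation. A minimal Python sketch of that indexing, using hypothetical per-operation result dictionaries rather than astrapy's own result classes:

```python
from typing import Any, Dict, List


def index_upserts(op_results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Collect upserted ids keyed by the position of the operation that produced them."""
    upserted_ids = {
        index: res["upserted_id"]
        for index, res in enumerate(op_results)
        if "upserted_id" in res
    }
    return {"upserted_count": len(upserted_ids), "upserted_ids": upserted_ids}
```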

Parameters:

Name Type Summary

requests

Iterable[BaseOperation]

An iterable over concrete subclasses of BaseOperation, such as InsertMany or ReplaceOne. Each such object represents an operation ready to be executed on a collection, and is instantiated by passing the same parameters that you would pass to the corresponding collection method.

ordered

bool

If True, the operations are executed one after the other, in the given order; if False (the default), they may be executed in arbitrary order, possibly concurrently. DataStax suggests False when possible for faster performance.

concurrency

Optional[int]

Maximum number of concurrent operations executing at a given time. It cannot be more than one for ordered bulk writes.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the whole bulk write. This method uses the collection-level timeout by default. You may need to increase the timeout duration depending on the number of operations. If the method call times out, there’s no guarantee about how much of the bulk write was completed.
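The interplay of ordered and concurrency can be illustrated with a small Python sketch built on concurrent.futures. It is not astrapy's implementation: operations are plain callables, ordered mode runs them one at a time and stops at the first exception, and unordered mode runs them on a thread pool. The default of 8 workers is borrowed from the TypeScript client's documented concurrency default and is an assumption here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional

DEFAULT_CONCURRENCY = 8  # assumption: borrowed from the TypeScript client's documented default


def run_bulk(
    operations: List[Callable[[], Any]],
    ordered: bool = False,
    concurrency: Optional[int] = None,
) -> List[Any]:
    """Run write operations one by one (ordered) or on a thread pool (unordered)."""
    if ordered:
        # Ordered writes run sequentially, so concurrency above one is rejected
        if concurrency is not None and concurrency > 1:
            raise ValueError("ordered bulk writes cannot use concurrency > 1")
        results = []
        for op in operations:
            # An exception propagates immediately and skips the remaining operations
            results.append(op())
        return results
    with ThreadPoolExecutor(max_workers=concurrency or DEFAULT_CONCURRENCY) as pool:
        # Operations may run in any order; map() still returns results in input order.
        # A failure is raised when its result is collected, after others may have run.
        return list(pool.map(lambda op: op(), operations))
```

Because ThreadPoolExecutor.map yields results in input order, the returned list still lines up with operation indices even when execution order varies.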

Example:

from astrapy import DataAPIClient
from astrapy.operations import (
    InsertOne,
    InsertMany,
    UpdateOne,
    UpdateMany,
    ReplaceOne,
    DeleteOne,
    DeleteMany,
)
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

op1 = InsertMany([{"a": 1}, {"a": 2}])
op2 = ReplaceOne({"z": 9}, replacement={"z": 9, "replaced": True}, upsert=True)
collection.bulk_write([op1, op2])
# prints: BulkWriteResult(bulk_api_results={0: ..., 1: ...}, deleted_count=0, inserted_count=3, matched_count=0, modified_count=0, upserted_count=1, upserted_ids={1: '2addd676-...'})
collection.count_documents({}, upper_bound=100)
# prints: 3
collection.distinct("replaced")
# prints: [True]

View this topic in more detail on the API Reference.

const results = await collection.bulkWrite([
  { insertOne: { document: { a: 1 } } },
  { insertOne: { document: { a: 2 } } },
  { replaceOne: { filter: { z: 9 }, replacement: { z: 9, replaced: true }, upsert: true } },
]);

Parameters:

Name Type Summary

operations

AnyBulkWriteOperation<Schema>[]

The operations to perform.

options?

BulkWriteOptions

The options for this operation.

Options (BulkWriteOptions):

Name Type Summary

ordered?

boolean

Set this option to true to execute the operations in order and stop after the first error. Otherwise, operations may be parallelized and processed in arbitrary order, which can improve performance considerably.

concurrency?

number

Controls how many network requests are made in parallel for unordered operations. Defaults to 8.

Not available for ordered operations.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete.

Returns:

Promise<BulkWriteResult<Schema>> - A promise that resolves to a summary of the performed operations.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { namespace: 'DB_NAMESPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert two documents and upsert a third
  await collection.bulkWrite([
    { insertOne: { document: { a: 1 } } },
    { insertOne: { document: { a: 2 } } },
    { replaceOne: { filter: { z: 9 }, replacement: { z: 9, replaced: true }, upsert: true } },
  ]);

  // 3
  await collection.countDocuments({}, 100);

  // [true]
  await collection.distinct('replaced');
})();

// Synchronous
BulkWriteResult bulkWrite(List<Command> commands);
BulkWriteResult bulkWrite(List<Command> commands, BulkWriteOptions options);

// Asynchronous
CompletableFuture<BulkWriteResult> bulkWriteAsync(List<Command> commands);
CompletableFuture<BulkWriteResult> bulkWriteAsync(List<Command> commands, BulkWriteOptions options);

Returns:

BulkWriteResult - Wrapper with the list of responses for each command.

Parameters:

Name Type Summary

commands

List<Command>

The list of generic Command objects to execute.

options (optional)

BulkWriteOptions

Options for executing the commands, such as ordered or concurrency.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.BulkWriteOptions;
import com.datastax.astra.client.model.BulkWriteResult;
import com.datastax.astra.client.model.Command;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.internal.api.ApiResponse;

import java.util.List;

public class BulkWrite {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Set a couple of Commands
        Command cmd1 = Command.create("insertOne").withDocument(new Document().id(1).append("name", "hello"));
        Command cmd2 = Command.create("insertOne").withDocument(new Document().id(2).append("name", "hello"));

        // Set the options for the bulk write
        BulkWriteOptions options1 = BulkWriteOptions.Builder.ordered(false).concurrency(1);

        // Execute the queries
        BulkWriteResult result = collection.bulkWrite(List.of(cmd1, cmd2), options1);

        // Retrieve the LIST of responses
        for(ApiResponse res : result.getResponses()) {
            System.out.println(res.getData());
        }
    }

}

Delete all documents from a collection

Delete all documents in a collection.

  • Python

  • TypeScript

  • Java

  • cURL

View this topic in more detail on the API Reference.

result = collection.delete_all()

Returns:

Dict - A dictionary in the form {"ok": 1} if the method succeeds.

Example response
{'ok': 1}

Parameters:

Name Type Summary

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. If not passed, the collection-level setting is used instead.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection

# The collection already contains some documents
collection.distinct("seq")
# prints: [2, 1, 0]
collection.count_documents({}, upper_bound=100)
# prints: 4
collection.delete_all()
# prints: {'ok': 1}
collection.count_documents({}, upper_bound=100)
# prints: 0

View this topic in more detail on the API Reference.

// Synchronous
DeleteResult deleteAll();

// Asynchronous
CompletableFuture<DeleteResult> deleteAllAsync();

Returns:

DeleteResult - Wrapper that contains the deleted count.

As with a few other methods, each underlying delete request removes at most 20 documents. deleteAll() is equivalent to executing deleteMany() without a filter, and it can take some time because the client keeps iterating until it receives confirmation that no more documents are available.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DeleteResult;
import com.datastax.astra.client.model.Document;

public class DeleteAll {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Show the deleted count
        DeleteResult result = collection.deleteAll();
        System.out.println("Deleted Count:" + result.getDeletedCount());
    }
}

The following JSON payload is designed to delete all documents in a collection.

When called with an empty {} filter or an empty body, the Data API deleteMany command deletes all documents in the target collection. In this form, the command bypasses the guardrail that limits each request to deleting at most 20 documents. Keep in mind that the application token is associated with the privileged Database Administrator role. When all documents in a collection are removed, the response contains "deletedCount": -1, which means that all documents were deleted.

curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "deleteMany": {
  }
}' | json_pp

Response:

{
   "status" : {
      "deletedCount" : -1
   }
}
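Because deletedCount is -1 when a collection is emptied, code that reports on deleteMany responses should treat that value specially rather than as a count. A minimal Python sketch (the function name is hypothetical) that reads the response shape shown above:

```python
import json


def describe_delete_response(raw: str) -> str:
    """Turn a deleteMany JSON response into a short description."""
    count = json.loads(raw)["status"]["deletedCount"]
    if count == -1:
        # -1 is returned when all documents in the collection were removed
        return "all documents deleted"
    return f"{count} document(s) deleted"
```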

Properties:

Name Type Summary

deleteMany

command

Deletes the documents that match the provided filter criteria. With an empty filter or an empty body, deletes all documents in the collection.


© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com