Documents reference
Documents represent a single row or record of data in a keyspace.
You use the Collection class to work with documents.
If you haven’t done so already, consult the Collections reference topic for details on how to get a Collection object.
HCD APIs use the term keyspace to refer to both namespaces and keyspaces.
Working with dates
Examples follow for Python, TypeScript, Java, and cURL, in that order.
Date and datetime objects, which are instances of the Python standard library datetime.datetime and datetime.date classes, can be used anywhere in documents.
import datetime
# Assumes collection is an existing astrapy Collection object
collection.insert_one({"when": datetime.datetime.now()})
collection.insert_one({"date_of_birth": datetime.date(2000, 1, 1)})
collection.update_one(
    {"registered_at": datetime.date(1999, 11, 14)},
    {"$set": {"message": "happy Sunday!"}},
)
print(
    collection.find_one(
        {"date_of_birth": {"$lt": datetime.date(2001, 1, 1)}},
        projection={"_id": False},
    )
)
# will print:
# {'date_of_birth': datetime.datetime(2000, 1, 1, 0, 0)}
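As shown in the example, read operations from a collection always return the datetime.datetime class, regardless of whether a date or a datetime was provided when the document was inserted.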
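The reason a datetime.date comes back as a datetime.datetime becomes clearer from the wire format. The Data API represents dates with the EJSON {"$date": ...} wrapper, which is also used in the cURL example in this section, and that wrapper stores a single instant. The sketch below assumes the instant is expressed in milliseconds since the Unix epoch (UTC) and treats naive values as UTC. to_ejson_date and from_ejson_date are hypothetical helpers for illustration, not AstraPy functions.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)

def to_ejson_date(value):
    """Hypothetical helper: encode a date or datetime as an EJSON $date wrapper.

    Assumes the wire value is milliseconds since the Unix epoch (UTC) and
    treats naive datetimes as UTC.
    """
    # Check datetime first: datetime.datetime is a subclass of datetime.date.
    if isinstance(value, datetime.datetime):
        dt = value
    elif isinstance(value, datetime.date):
        # A plain date has no time of day, so widen it to midnight.
        dt = datetime.datetime(value.year, value.month, value.day)
    else:
        raise TypeError(f"not a date or datetime: {value!r}")
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    millis = (dt - EPOCH) // datetime.timedelta(milliseconds=1)
    return {"$date": millis}

def from_ejson_date(wrapper):
    """Hypothetical helper: decode an EJSON $date wrapper into a UTC datetime."""
    return EPOCH + datetime.timedelta(milliseconds=wrapper["$date"])

encoded = to_ejson_date(datetime.date(2000, 1, 1))
print(encoded)                   # {'$date': 946684800000}
print(from_ejson_date(encoded))  # 2000-01-01 00:00:00+00:00
```

Because only the instant survives the round trip, the decoder has no way to know the value began as a date, so it can only return a datetime.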
Native JS Date objects can be used anywhere in documents to represent dates and times. Document fields stored using the { $date: number } format are also returned as Date objects when read.
(async function () {
  // Assumes db is an existing Db instance
  // Create an untyped collection
  const collection = await db.createCollection('dates_test', { checkExists: false });
  // Insert documents with some dates
  await collection.insertOne({ dateOfBirth: new Date(1394104654000) });
  await collection.insertOne({ dateOfBirth: new Date('1863-05-28') });
  // Update a document matched by date, and set lastModified to the current time
  await collection.updateOne(
    {
      dateOfBirth: new Date('1863-05-28'),
    },
    {
      $set: { message: 'Happy Birthday!' },
      $currentDate: { lastModified: true },
    },
  );
  // Prints the lastModified value, which is close to the current time
  const found = await collection.findOne({ dateOfBirth: { $lt: new Date('1900-01-01') } });
  console.log(found?.lastModified);
})();
The Data API uses the EJSON standard to represent time-related objects. The Java client provides custom serializers for three types of objects: java.util.Date, java.util.Calendar, and java.time.Instant.
These objects can be used naturally in filter clauses, in update clauses, and in documents.
package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOneOptions;
import com.datastax.astra.client.model.Projections;
import java.time.Instant;
import java.util.Calendar;
import java.util.Date;

import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Updates.set;

public class WorkingWithDates {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Insert documents using each of the three supported date types
        Calendar c = Calendar.getInstance();
        collection.insertOne(new Document().append("registered_at", c));
        collection.insertOne(new Document().append("date_of_birth", new Date()));
        collection.insertOne(new Document().append("just_a_date", Instant.now()));

        // Dates in a filter clause
        collection.updateOne(
                eq("registered_at", c),           // filter clause
                set("message", "happy Sunday!")); // update clause

        // Dates in a range filter: born more than 1,000 seconds ago
        collection.findOne(
                lt("date_of_birth", new Date(System.currentTimeMillis() - 1000 * 1000)),
                new FindOneOptions().projection(Projections.exclude("_id")));
    }
}
In the JSON payload of the following Data API insertOne command, $date is used to specify a car’s purchase date:
"purchase_date": {"$date": 1690045891}
Example:
curl -s --location \
    --request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
    --header "Token: ${DB_APPLICATION_TOKEN}" \
    --header "Content-Type: application/json" \
    --header "Accept: application/json" \
    --data '{
  "insertOne": {
    "document": {
      "_id": "1",
      "purchase_type": "Online",
      "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
      "customer": {
        "name": "Jim A.",
        "phone": "123-456-1111",
        "age": 51,
        "credit_score": 782,
        "address": {
          "address_line": "1234 Broadway",
          "city": "New York",
          "state": "NY"
        }
      },
      "purchase_date": {"$date": 1690045891},
      "seller": {
        "name": "Jon B.",
        "location": "Manhattan NYC"
      },
      "items": [
        {
          "car": "BMW 330i Sedan",
          "color": "Silver"
        },
        "Extended warranty - 5 years"
      ],
      "amount": 47601,
      "status": "active",
      "preferred_customer": true
    }
  }
}' | json_pp
Response
{
  "status": {
    "insertedIds": [
      "1"
    ]
  }
}
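If you script the cURL call above from Python, the following sketch builds the same kind of insertOne request with only the standard library, using the same Token, Content-Type, and Accept headers. The URL and token are placeholders, the $date value assumes milliseconds since the Unix epoch, and build_insert_one_request is a hypothetical helper. The request is constructed but not sent.

```python
import datetime
import json
import urllib.request

def build_insert_one_request(endpoint_url, token, document):
    """Build (but do not send) a POST request carrying an insertOne command."""
    body = json.dumps({"insertOne": {"document": document}}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )

purchase = datetime.datetime(2023, 7, 22, tzinfo=datetime.timezone.utc)
document = {
    "_id": "1",
    "purchase_type": "Online",
    # Assumption: $date holds milliseconds since the Unix epoch.
    "purchase_date": {"$date": round(purchase.timestamp() * 1000)},
}

# Placeholder URL and token: substitute your endpoint, keyspace, collection, and token.
request = build_insert_one_request(
    "https://db.example.com/my_keyspace/my_collection", "TOKEN", document
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```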
Working with document IDs
Documents in a collection are always identified by an ID that is unique within the collection.
The ID can be any of several types, such as a string, integer, or datetime. However, the uuid and ObjectId types are recommended instead.
The Data API supports uuid identifiers up to version 8, as well as ObjectId identifiers as provided by the bson library.
These identifiers can appear anywhere in the document, not only in its _id field. Different types of identifier can appear in different parts of the same document, and identifiers can be used in filter clauses and update/replace directives just like any other data type.
One of the optional settings of a collection is the default ID type, which specifies what kind of identifier the server supplies for documents without an explicit _id field. (For details, see the create_collection method and Data API createCollection command in the Collections reference.) Regardless of the defaultId setting, identifiers of any type can be provided explicitly for documents at any time, for example when inserting documents, and the API honors them.
Examples follow for Python, TypeScript, Java, and cURL, in that order.
from astrapy.ids import (
    ObjectId,
    uuid1,
    uuid3,
    uuid4,
    uuid5,
    uuid6,
    uuid7,
    uuid8,
    UUID,
)
AstraPy recognizes uuid versions 1 through 8 (except version 2) as provided by the uuid and uuid6 Python libraries, as well as the ObjectId from the bson package. For convenience, these same utilities are also exposed directly in AstraPy, as shown in the import example above.
You can then generate new identifiers with statements such as new_id = uuid8() or new_obj_id = ObjectId().
Keep in mind that all uuid versions are instances of the same class (UUID), which exposes a version property, should you need to access it.
Here is a short example:
collection.insert_one({"_id": uuid8(), "tag": "new_id_v_8"})
collection.insert_one(
    {"_id": UUID("018e77bc-648d-8795-a0e2-1cad0fdd53f5"), "tag": "id_v_8"}
)
collection.insert_one({"_id": ObjectId(), "tag": "new_obj_id"})
collection.insert_one(
    {"_id": ObjectId("6601fb0f83ffc5f51ba22b88"), "tag": "obj_id"}
)
collection.find_one_and_update(
    {"_id": ObjectId("6601fb0f83ffc5f51ba22b88")},
    {"$set": {"item_inventory_id": UUID("1eeeaf80-e333-6613-b42f-f739b95106e6")}},
)
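Assuming, as the notes above indicate, that AstraPy's UUID is the standard library's uuid.UUID, you can inspect the identifiers from this example with the standard library alone. The sketch below prints the version of each UUID literal and decodes the creation time stored in the first four bytes of an ObjectId, as the BSON ObjectId layout specifies. objectid_timestamp is a hypothetical helper, not part of AstraPy.

```python
import datetime
from uuid import UUID

# UUID literals taken from the example above.
for text in (
    "018e77bc-648d-8795-a0e2-1cad0fdd53f5",
    "1eeeaf80-e333-6613-b42f-f739b95106e6",
):
    print(text, "-> version", UUID(text).version)

def objectid_timestamp(hex_id):
    """Hypothetical helper: seconds since the Unix epoch stored in an ObjectId.

    An ObjectId is 12 bytes (24 hex characters); its first 4 bytes hold a
    big-endian Unix timestamp in seconds.
    """
    if len(hex_id) != 24:
        raise ValueError("an ObjectId hex string has 24 characters")
    return int(hex_id[:8], 16)

seconds = objectid_timestamp("6601fb0f83ffc5f51ba22b88")
print(datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc))
```

The loop reports version 8 for the first literal and version 6 for the second, which matches the "id_v_8" tag used in the example.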
import { UUID, ObjectId } from '@datastax/astra-db-ts';
astra-db-ts provides the UUID and ObjectId classes for using and generating new identifiers. Note that these are not the same as those exported from the bson or uuid libraries. They are custom classes that must be imported from the astra-db-ts package.
You can generate new identifiers using UUID.v4(), UUID.v7(), or new ObjectId(). The UUID methods all return an instance of the same class, which exposes a version property, should you need to access it. Both classes can also be constructed from a string representation of an ID if custom generation is desired.
Here is a short example of the concepts:
import { DataAPIClient, UUID, ObjectId } from '@datastax/astra-db-ts';
// Schema for the collection
interface Person {
  _id: UUID | ObjectId;
  name: string;
  friendId?: UUID;
}
(async function () {
  // Assumes collection is an existing Collection<Person>
  // Insert documents w/ various IDs
  await collection.insertOne({ name: 'John', _id: UUID.v4() });
  await collection.insertOne({ name: 'Jane', _id: new UUID('016b1cac-14ce-660e-8974-026c927b9b91') });
  await collection.insertOne({ name: 'Dan', _id: new ObjectId() });
  await collection.insertOne({ name: 'Tim', _id: new ObjectId('65fd9b52d7fabba03349d013') });
  // Update a document with a UUID in a non-_id field
  await collection.updateOne(
    { name: 'John' },
    { $set: { friendId: new UUID('016b1cac-14ce-660e-8974-026c927b9b91') } },
  );
  // Find a document by a UUID in a non-_id field
  const john = await collection.findOne({ name: 'John' });
  const jane = await collection.findOne({ _id: john!.friendId });
  // Prints 'Jane 016b1cac-14ce-660e-8974-026c927b9b91 6'
  console.log(jane?.name, jane?._id.toString(), (<UUID>jane?._id).version);
})();
- To cope with different implementations of UUID (v6 and v7 especially), dedicated classes have been defined.
- When a unique identifier is retrieved from the server, it is returned as a uuid and converted to the appropriate UUID class, based on the class definition in the defaultId option.
- The ObjectId class is extracted from the BSON package and is used to represent the ObjectId type.
package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.ObjectId;
import com.datastax.astra.client.model.UUIDv6;
import com.datastax.astra.client.model.UUIDv7;
import java.time.Instant;
import java.util.UUID;

import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Updates.set;

public class WorkingWithDocumentIds {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // IDs can be different JSON scalars
        // (when the 'defaultId' option is NOT set for the collection)
        new Document().id("abc");
        new Document().id(123);
        new Document().id(Instant.now());

        // Working with UUIDv4
        new Document().id(UUID.randomUUID());

        // Working with UUIDv6
        collection.insertOne(new Document().id(new UUIDv6()).append("tag", "new_id_v_6"));
        UUID existingUuid = UUID.fromString("018e77bc-648d-8795-a0e2-1cad0fdd53f5");
        collection.insertOne(new Document().id(new UUIDv6(existingUuid)).append("tag", "id_v_8"));

        // Working with UUIDv7
        collection.insertOne(new Document().id(new UUIDv7()).append("tag", "new_id_v_7"));

        // Working with ObjectIds
        collection.insertOne(new Document().id(new ObjectId()).append("tag", "new_obj_id"));
        collection.insertOne(new Document().id(new ObjectId("6601fb0f83ffc5f51ba22b88")).append("tag", "obj_id"));

        // Find a document by its ObjectId _id and set a UUID field
        collection.findOneAndUpdate(
                eq(new ObjectId("6601fb0f83ffc5f51ba22b88")),
                set("item_inventory_id", UUID.fromString("1eeeaf80-e333-6613-b42f-f739b95106e6")));
    }
}
The same underlying ID functionality as noted for the clients applies when using _id types with Data API commands. For full details about the defaultId option on the createCollection command, and its accepted type settings, see The defaultId option.
Example:
{
  "createCollection": {
    "name": "vector_collection2",
    "options": {
      "defaultId": {
        "type": "objectId"
      },
      "vector": {
        "dimension": 1024,
        "metric": "cosine"
      }
    }
  }
}
Response
{
  "status": {
    "ok": 1
  }
}
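If you generate createCollection bodies like the one above programmatically, a small helper can validate the options before the command is sent. In the sketch below, build_create_collection is a hypothetical name. The sets of accepted defaultId types and similarity metrics are assumptions made for illustration; check The defaultId option and the Collections reference for the authoritative lists.

```python
import json

# Assumed accepted values; confirm against the Collections reference.
DEFAULT_ID_TYPES = {"uuid", "uuidv6", "uuidv7", "objectId"}
METRICS = {"cosine", "dot_product", "euclidean"}

def build_create_collection(name, default_id_type=None, dimension=None, metric="cosine"):
    """Build a createCollection command body, rejecting unknown option values."""
    options = {}
    if default_id_type is not None:
        if default_id_type not in DEFAULT_ID_TYPES:
            raise ValueError(f"unsupported defaultId type: {default_id_type}")
        options["defaultId"] = {"type": default_id_type}
    if dimension is not None:
        if metric not in METRICS:
            raise ValueError(f"unsupported metric: {metric}")
        options["vector"] = {"dimension": dimension, "metric": metric}
    command = {"name": name}
    if options:
        command["options"] = options
    return {"createCollection": command}

# Reproduces the payload shown above.
print(json.dumps(build_create_collection("vector_collection2", "objectId", 1024), indent=2))
```

Validating on the client side only catches typos early. The server still has the final say on which values it accepts.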
Insert a single document
Insert a single document into a collection.
Examples follow for Python, TypeScript, Java, and cURL, in that order.
View this topic in more detail on the API Reference.
insert_result = collection.insert_one({"name": "Jane Doe"})
Insert a document with an associated vector.
insert_result = collection.insert_one(
    {
        "name": "Jane Doe",
        "$vector": [.08, .68, .30],
    },
)
Insert a document and generate a vector automatically.
insert_result = collection.insert_one(
    {
        "name": "Jane Doe",
        "$vectorize": "Text to vectorize",
    },
)
Returns:
InsertOneResult - An object representing the response from the database after the insert operation. It includes information about the success of the operation and details of the inserted document.
Example response
InsertOneResult(raw_results=[{'status': {'insertedIds': ['92b4c4f4-db44-4440-b4c4-f4db44e440b8']}}], inserted_id='92b4c4f4-db44-4440-b4c4-f4db44e440b8')
Parameters:
Name | Type | Summary |
---|---|---|
document | | The dictionary expressing the document to insert. The … |
vector | | A vector (a list of numbers appropriate for the collection) for the document. Passing this parameter is equivalent to providing the vector in the "$vector" field of the document itself; however, the two are mutually exclusive. |
vectorize | | A string to be vectorized. This only works for collections associated with an embedding service. |
max_time_ms | | A timeout, in milliseconds, for the underlying HTTP request. If not passed, the collection-level setting is used instead. |
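The table says the vector parameter is equivalent to the "$vector" field but mutually exclusive with it. The hypothetical helper below sketches that rule by folding the parameters into a copy of the document before insertion. Treating vectorize as likewise exclusive with "$vectorize", and with vector, is an assumption for illustration and is not stated in the table.

```python
def prepare_document(document, vector=None, vectorize=None):
    """Return a copy of document with vector/vectorize folded into reserved fields.

    Hypothetical helper: it raises ValueError on conflicting inputs and never
    mutates the caller's dictionary.
    """
    if vector is not None and vectorize is not None:
        raise ValueError("pass either vector or vectorize, not both")  # assumed rule
    if vector is not None and "$vector" in document:
        raise ValueError("the vector parameter and the '$vector' field are mutually exclusive")
    if vectorize is not None and "$vectorize" in document:
        raise ValueError("the vectorize parameter and the '$vectorize' field are mutually exclusive")
    prepared = dict(document)
    if vector is not None:
        prepared["$vector"] = list(vector)
    if vectorize is not None:
        prepared["$vectorize"] = vectorize
    return prepared

print(prepare_document({"name": "Jane Doe"}, vector=[.08, .68, .30]))
```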
Example:
# Insert a document with a specific ID
response1 = collection.insert_one(
    {
        "_id": 101,
        "name": "John Doe",
        "$vector": [.12, .52, .32],
    },
)
# Insert a document without specifying an ID
# so that _id is generated automatically
response2 = collection.insert_one(
    {
        "name": "Jane Doe",
        "$vector": [.08, .68, .30],
    },
)
View this topic in more detail on the API Reference.
const result = await collection.insertOne({ name: 'Jane Doe' });
Insert a document with an associated vector.
const result = await collection.insertOne(
  {
    name: 'Jane Doe',
    $vector: [.08, .68, .30],
  },
);
Insert a document and generate a vector automatically.
const result = await collection.insertOne(
  {
    name: 'Jane Doe',
    $vectorize: 'Text to vectorize',
  },
);
Parameters:
Name | Type | Summary |
---|---|---|
document | | The document to insert. If the document does not have an … |
options? | InsertOneOptions | The options for this operation. |
Options (InsertOneOptions):
Name | Type | Summary |
---|---|---|
… | | The vector for the document. Equivalent to providing the vector in the … |
… | | A string to be vectorized. This only works for collections associated with an embedding service. |
… | | The maximum time in milliseconds that the client should wait for the operation to complete. |
Returns:
Promise<InsertOneResult<Schema>> - A promise that resolves to the inserted ID.
Example:
(async function () {
  // Insert a document with a specific ID
  await collection.insertOne({ _id: '1', name: 'John Doe' });
  // Insert a document with an autogenerated ID
  await collection.insertOne({ name: 'Jane Doe' });
  // Insert a document with a vector
  await collection.insertOne({ name: 'Jane Doe', $vector: [.12, .52, .32] });
})();
- Operations on documents are performed at the Collection level. For details on each signature, see the Collection Javadoc.
- Collection is a generic class whose default type is Document, but you can specify your own type, in which case objects are serialized with Jackson.
- Most methods come in synchronous and asynchronous flavors. The asynchronous version is suffixed with Async and returns a CompletableFuture.
InsertOneResult insertOne(DOC document);
InsertOneResult insertOne(DOC document, float[] embeddings);
// Equivalent in asynchronous
CompletableFuture<InsertOneResult> insertOneAsync(DOC document);
CompletableFuture<InsertOneResult> insertOneAsync(DOC document, float[] embeddings);
Returns:
InsertOneResult - Wrapper with the inserted document ID.
Parameters:
Name | Type | Summary |
---|---|---|
document | DOC | Object representing the document to insert. The … |
embeddings | float[] | A vector of embeddings (a list of numbers appropriate for the collection) for the document. Passing this parameter is equivalent to providing the vector in the … |
Example:
package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertOneOptions;
import com.datastax.astra.client.model.InsertOneResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;

public class InsertOne {

    @Data @AllArgsConstructor
    public static class Product {
        @JsonProperty("_id")
        private String id;
        private String name;
    }

    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Insert a document
        Document doc1 = new Document("1").append("name", "joe");
        InsertOneResult res1 = collectionDoc.insertOne(doc1);
        System.out.println(res1.getInsertedId()); // should be "1"

        // Insert a document with embeddings
        Document doc2 = new Document("2").append("name", "joe");
        collectionDoc.insertOne(doc2, new float[] {.1f, .2f});

        // Given another existing collection, mapped to the Product bean
        Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION2_NAME", Product.class);

        // Insert documents from the custom bean, with and without embeddings
        collectionProduct.insertOne(new Product("1", "joe"));
        collectionProduct.insertOne(new Product("2", "joe"), new float[] {.1f, .2f});
    }
}
curl -s --location \
    --request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
    --header "Token: ${DB_APPLICATION_TOKEN}" \
    --header "Content-Type: application/json" \
    --header "Accept: application/json" \
    --data '{
  "insertOne": {
    "document": {
      "_id": "1",
      "purchase_type": "Online",
      "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
      "customer": {
        "name": "Jim A.",
        "phone": "123-456-1111",
        "age": 51,
        "credit_score": 782,
        "address": {
          "address_line": "1234 Broadway",
          "city": "New York",
          "state": "NY"
        }
      },
      "purchase_date": {"$date": 1690045891},
      "seller": {
        "name": "Jon B.",
        "location": "Manhattan NYC"
      },
      "items": [
        {
          "car": "BMW 330i Sedan",
          "color": "Silver"
        },
        "Extended warranty - 5 years"
      ],
      "amount": 47601,
      "status": "active",
      "preferred_customer": true
    }
  }
}' | json_pp
Properties:
Name | Type | Summary |
---|---|---|
insertOne | command | Data API designation that a single document is inserted. |
document | JSON object | Contains the details of the record added. |
_id | uuid4 | A unique identifier for the document. Other … |
purchase_type | string | Specifies how the purchase was made. |
$vector | array | A reserved property used to store vector data. The value is an array of numbers, or it can be generated. These numbers can be used for purposes such as similarity searches, clustering, or other mathematical operations that can be applied to vectors. Because this is a reserved property, the vector-enabled Hyper-Converged Database (HCD) has specialized handling for data stored in this format, which optimizes query performance for vector similarity. |
customer | JSON object | Information about the customer who made the purchase. |
customer.name | string | The customer’s name. |
customer.phone | string | The customer’s contact phone number. |
customer.age | number | The customer’s age. Subsequent examples can use the … |
customer.credit_score | number | The customer’s credit score at the time of the car’s purchase. Subsequent examples can use … |
customer.address | JSON object | Contains further details about the customer’s address. |
customer.address.address_line | string | The customer’s street or location address. |
customer.address.city | string | The customer’s city. |
customer.address.state | string | The state where the customer resides. |
purchase_date | date | The date on which the purchase was made, using the … |
seller | JSON object | Information about the seller from whom the purchase was made. |
seller.name | string | The seller’s name. |
seller.location | string | The seller’s location. |
items | array | An array detailing the items included in this purchase. |
items.car | string | Information about the make and model of the car. |
items.color | string | Information about the car’s color. |
Extended warranty - 5 years | string | Additional detail that’s part of the items array. Indicates the customer has an "Extended warranty - 5 years" as part of the purchase. |
amount | number | The total cost of the purchase. |
status | string | Current status of the purchase. |
preferred_customer | boolean | Whether the buyer is a preferred customer. |
Response
{
  "status": {
    "insertedIds": [
      "1"
    ]
  }
}
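Dotted names in the properties table, such as customer.name, are paths into nested objects. The sketch below lists every leaf path of a document in the same style. It descends into arrays without adding an index, so a car inside items appears as items.car, as in the table. flatten_paths is a hypothetical helper for inspecting documents and makes no claim about how the Data API resolves paths in filters.

```python
def flatten_paths(value, prefix=""):
    """Yield (dotted_path, leaf_value) pairs for a JSON-like document.

    Object keys extend the path. Array elements keep the array's path (no
    index), so {"items": [{"car": "BMW"}]} yields ("items.car", "BMW").
    """
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten_paths(child, path)
    elif isinstance(value, list):
        for element in value:
            yield from flatten_paths(element, prefix)
    else:
        yield prefix, value

purchase = {
    "customer": {"name": "Jim A.", "address": {"city": "New York", "state": "NY"}},
    "items": [{"car": "BMW 330i Sedan", "color": "Silver"}, "Extended warranty - 5 years"],
    "amount": 47601,
}
for path, leaf in flatten_paths(purchase):
    print(path, "=", leaf)
```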
Insert many documents
Insert multiple documents into a collection.
Examples follow for Python, TypeScript, Java, and cURL, in that order.
View this topic in more detail on the API Reference.
response = collection.insert_many(
    [
        {
            "_id": 101,
            "name": "John Doe",
            "$vector": [.12, .52, .32],
        },
        {
            # ID is generated automatically
            "name": "Jane Doe",
            "$vector": [.08, .68, .30],
        },
    ],
)
Insert multiple documents and generate vectors automatically.
response = collection.insert_many(
    [
        {
            "name": "John Doe",
            "$vectorize": "Text to vectorize for John Doe",
        },
        {
            "name": "Jane Doe",
            "$vectorize": "Text to vectorize for Jane Doe",
        },
    ],
)
Returns:
InsertManyResult - An object representing the response from the database after the insert operation. It includes information about the success of the operation and details of the inserted documents.
Example response
InsertManyResult(raw_results=[{'status': {'insertedIds': [101, '81077d86-05dc-43ca-877d-8605dce3ca4d']}}], inserted_ids=[101, '81077d86-05dc-43ca-877d-8605dce3ca4d'])
Parameters:
Name | Type | Summary |
---|---|---|
documents | | An iterable of dictionaries, each a document to insert. Documents may specify their … |
vectors | | An optional list of vectors (as many vectors as the provided documents) to associate with the documents when inserting. Each vector is added to the corresponding document before insertion into the database. The list can be a mixture of None and vectors, in which case some documents will not have a vector unless one is already specified in their "$vector" field. Passing vectors this way is equivalent to using the "$vector" field of the documents; however, the two are mutually exclusive. |
vectorize | | An optional list of strings to be vectorized. This only works for collections associated with an embedding service. |
ordered | | If False (default), the insertions can occur in arbitrary order and possibly concurrently. If True, they are processed sequentially. If you don’t need ordered inserts, DataStax recommends setting this parameter to False for faster performance. |
chunk_size | | How many documents to include in a single API request. The default and maximum value is 20. |
concurrency | | Maximum number of concurrent requests to the API at a given time. It cannot be more than one for ordered insertions. |
max_time_ms | | A timeout, in milliseconds, for the operation. If not passed, the collection-level setting is used instead. Inserting many documents requires multiple HTTP requests, so you may need to increase the timeout for the method to complete successfully. |
Unless there are specific reasons not to, prefer ordered=False, which allows faster insertions than an equivalent ordered insertion.
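The interplay of chunk_size, concurrency, and ordered can be modeled in a few lines. This hypothetical sketch follows the client-side behavior described in the table; it is not AstraPy's actual code. Documents are split into chunks of at most 20, ordered chunks are sent one after another, and unordered chunks go through a thread pool. send_chunk stands in for one HTTP request, and error handling such as stopping at the first failure is omitted.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CHUNK_SIZE = 20  # per the table: the default and maximum chunk size

def chunked(documents, chunk_size):
    """Split documents into consecutive lists of at most chunk_size items."""
    if not 1 <= chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(f"chunk_size must be between 1 and {MAX_CHUNK_SIZE}")
    return [documents[i:i + chunk_size] for i in range(0, len(documents), chunk_size)]

def insert_many_sketch(documents, send_chunk, ordered=False, chunk_size=20, concurrency=1):
    """Send chunks through send_chunk and return all inserted IDs in input order.

    send_chunk(chunk) stands in for one API request and must return the list
    of IDs inserted for that chunk.
    """
    if ordered and concurrency > 1:
        raise ValueError("ordered insertions cannot use concurrency > 1")
    chunks = chunked(list(documents), chunk_size)
    if ordered or concurrency == 1:
        results = [send_chunk(chunk) for chunk in chunks]  # strictly sequential
    else:
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            # map() yields results in chunk order even though the calls overlap.
            results = list(pool.map(send_chunk, chunks))
    return [doc_id for chunk_ids in results for doc_id in chunk_ids]

def fake_send(chunk):
    """Pretend API call: echo each document's "seq" value back as its ID."""
    return [doc["seq"] for doc in chunk]

ids = insert_many_sketch([{"seq": i} for i in range(50)], fake_send, concurrency=5)
print(len(ids), ids[:3])  # 50 [0, 1, 2]
```

With 50 documents and the maximum chunk size, the sketch issues three requests of 20, 20, and 10 documents, mirroring the concurrency=5 call in the example that follows.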
Example:
collection.insert_many([{"a": 10}, {"a": 5}, {"b": [True, False, False]}])
collection.insert_many(
    [{"seq": i} for i in range(50)],
    concurrency=5,
)
collection.insert_many(
    [
        {"tag": "a", "$vector": [1, 2]},
        {"tag": "b", "$vector": [3, 4]},
    ]
)
View this topic in more detail on the API Reference.
const result = await collection.insertMany([
  {
    _id: '1',
    name: 'John Doe',
    $vector: [.12, .52, .32],
  },
  {
    name: 'Jane Doe',
    $vector: [.08, .68, .30],
  },
], {
  ordered: true,
});
Insert multiple documents and generate vectors automatically.
const result = await collection.insertMany([
  {
    name: 'John Doe',
    $vectorize: 'Text to vectorize for John Doe',
  },
  {
    name: 'Jane Doe',
    $vectorize: 'Text to vectorize for Jane Doe',
  },
], {
  ordered: true,
});
Parameters:
Name | Type | Summary |
---|---|---|
documents | | The documents to insert. If any document does not have an … |
options? | InsertManyOptions | The options for this operation. |
Options (InsertManyOptions):
Name | Type | Summary |
---|---|---|
… | | You may set the … |
… | | You can set the … Not available for ordered insertions. |
… | | Control how many documents are sent in each network request. The default and maximum value is 20. |
… | | An array of vectors to associate with each document. If a vector is … Equivalent to providing the vector in the … |
… | | An array of strings to be vectorized. This only works for collections associated with an embedding service. |
… | | The maximum time in milliseconds that the client should wait for the operation to complete. |
Unless there are specific reasons not to, it is recommended to leave ordered set to false, because unordered insertions can be processed concurrently and complete faster.
Returns:
Promise<InsertManyResult<Schema>> - A promise that resolves to the inserted IDs.
Example:
import { InsertManyError } from '@datastax/astra-db-ts';
(async function () {
  try {
    // Insert many documents
    await collection.insertMany([
      { _id: '1', name: 'John Doe' },
      { name: 'Jane Doe' }, // Will autogen ID
    ], { ordered: true });
    // Insert many with vectors
    await collection.insertMany([
      { name: 'John Doe', $vector: [.12, .52, .32] },
      { name: 'Jane Doe', $vector: [.32, .52, .12] },
    ]);
  } catch (e) {
    // Log the partial result of a failed insertMany
    if (e instanceof InsertManyError) {
      console.log(e.partialResult);
    }
  }
})();
// Synchronous
InsertManyResult insertMany(List<? extends DOC> documents);
InsertManyResult insertMany(List<? extends DOC> documents, InsertManyOptions options);
// Asynchronous
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList);
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList, InsertManyOptions options);
Returns:
InsertManyResult - Wrapper with the list of inserted document IDs.
Parameters:
Name | Type | Summary |
---|---|---|
documents | List<? extends DOC> | A list of documents to insert. Documents may specify their … |
options | InsertManyOptions | Set the different options for the insert operation. The options are … The java operation … If not provided the default values are … |
Example:
package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertManyOptions;
import com.datastax.astra.client.model.InsertManyResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import java.util.List;

public class InsertMany {

    @Data @AllArgsConstructor
    public static class Product {
        @JsonProperty("_id")
        private String id;
        private String name;
    }

    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Insert two documents
        Document doc1 = new Document("1").append("name", "joe");
        Document doc2 = new Document("2").append("name", "joe");
        InsertManyResult res1 = collectionDoc.insertMany(List.of(doc1, doc2));
        System.out.println("Identifiers inserted: " + res1.getInsertedIds());

        // Given another existing collection, mapped to the Product bean
        Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION2_NAME", Product.class);

        // Insert documents with explicit options
        InsertManyOptions options = new InsertManyOptions()
                .chunkSize(20)  // documents per request
                .concurrency(1) // number of parallel requests
                .ordered(false) // unordered inserts allow parallel processing
                .timeout(1000); // timeout in milliseconds
        InsertManyResult res2 = collectionProduct.insertMany(
                List.of(new Product("1", "joe"),
                        new Product("2", "joe")),
                options);
    }
}
The API accepts up to 20 documents per request.
The following Data API insertMany command adds 20 documents to a collection.
curl -s --location \
--request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"insertMany": {
"documents": [
{
"_id": "2",
"purchase_type": "Online",
"$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
"customer": {
"name": "Jack B.",
"phone": "123-456-2222",
"age": 34,
"credit_score": 700,
"address": {
"address_line": "888 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": {"$date": 1690391491},
"seller": {
"name": "Tammy S.",
"location": "Staten Island NYC"
},
"items": [
{
"car" : "Tesla Model 3",
"color": "White"
},
"Extended warranty - 10 years",
"Service - 5 years"
],
"amount": 53990,
"status" : "active"
},
{
"_id": "3",
"purchase_type": "Online",
"$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
"customer": {
"name": "Jill D.",
"phone": "123-456-3333",
"age": 30,
"credit_score": 742,
"address": {
"address_line": "12345 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": {"$date": 1690564291},
"seller": {
"name": "Jasmine S.",
"location": "Brooklyn NYC"
},
"items": "Extended warranty - 10 years",
"amount": 4600,
"status" : "active"
},
{
"_id": "4",
"purchase_type": "In Person",
"$vector": [0.25, 0.25, 0.25, 0.25, 0.26],
"customer": {
"name": "Lester M.",
"phone": "123-456-4444",
"age": 40,
"credit_score": 802,
"address": {
"address_line": "12346 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": {"$date": 1690909891},
"seller": {
"name": "Jon B.",
"location": "Manhattan NYC"
},
"items": [
{
"car" : "BMW 330i Sedan",
"color": "Red"
},
"Extended warranty - 5 years",
"Service - 5 years"
],
"amount": 48510,
"status" : "active"
},
{
"_id": "5",
"purchase_type": "Online",
"$vector": [0.25, 0.045, 0.38, 0.31, 0.67],
"customer": {
"name": "David C.",
"phone": "123-456-5555",
"age": 50,
"credit_score": 800,
"address": {
"address_line": "32345 Main Ave",
"city": "Jersey City",
"state": "NJ"
}
},
"purchase_date": {"$date": 1690996291},
"seller": {
"name": "Jim A.",
"location": "Jersey City NJ"
},
"items": [
{
"car" : "Tesla Model S",
"color": "Red"
},
"Extended warranty - 5 years"
],
"amount": 94990,
"status" : "active"
},
{
"_id": "6",
"purchase_type": "In Person",
"$vector": [0.11, 0.02, 0.78, 0.10, 0.27],
"customer": {
"name": "Chris E.",
"phone": "123-456-6666",
"age": 43,
"credit_score": 764,
"address": {
"address_line": "32346 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": {"$date": 1691860291},
"seller": {
"name": "Jim A.",
"location": "Jersey City NJ"
},
"items": [
{
"car" : "Tesla Model X",
"color": "Blue"
}
],
"amount": 109990,
"status" : "active"
},
{
"_id": "7",
"purchase_type": "Online",
"$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
"customer": {
"name": "Jeff G.",
"phone": "123-456-7777",
"age": 66,
"credit_score": 802,
"address": {
"address_line": "22999 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": {"$date": 1692119491},
"seller": {
"name": "Jasmine S.",
"location": "Brooklyn NYC"
},
"items": [{
"car" : "BMW M440i Gran Coupe",
"color": "Black"
},
"Extended warranty - 5 years"],
"amount": 61050,
"status" : "active"
},
{
"_id": "8",
"purchase_type": "In Person",
"$vector": [0.3, 0.23, 0.15, 0.17, 0.4],
"customer": {
"name": "Harold S.",
"phone": "123-456-8888",
"age": 29,
"credit_score": 710,
"address": {
"address_line": "1234 Main St",
"city": "Orange",
"state": "NJ"
}
},
"purchase_date": {"$date": 1693329091},
"seller": {
"name": "Tammy S.",
"location": "Staten Island NYC"
},
"items": [{
"car" : "BMW X3 SUV",
"color": "Black"
},
"Extended warranty - 5 years"
],
"amount": 46900,
"status" : "active"
},
{
"_id": "9",
"purchase_type": "Online",
"$vector": [0.1, 0.15, 0.3, 0.12, 0.06],
"customer": {
"name": "Richard Z.",
"phone": "123-456-9999",
"age": 22,
"credit_score": 690,
"address": {
"address_line": "22345 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": {"$date": 1693588291},
"seller": {
"name": "Jasmine S.",
"location": "Brooklyn NYC"
},
"items": [{
"car" : "Tesla Model 3",
"color": "White"
},
"Extended warranty - 5 years"
],
"amount": 53990,
"status" : "active"
},
{
"_id": "10",
"purchase_type": "In Person",
"$vector": [0.25, 0.045, 0.38, 0.31, 0.68],
"customer": {
"name": "Eric B.",
"phone": null,
"age": 54,
"credit_score": 780,
"address": {
"address_line": "9999 River Rd",
"city": "Fair Haven",
"state": "NJ"
}
},
"purchase_date": {"$date": 1694797891},
"seller": {
"name": "Jim A.",
"location": "Jersey City NJ"
},
"items": [{
"car" : "Tesla Model S",
"color": "Black"
}
],
"amount": 93800,
"status" : "active"
},
{
"_id": "11",
"purchase_type": "Online",
"$vector": [0.44, 0.11, 0.33, 0.22, 0.88],
"customer": {
"name": "Ann J.",
"phone": "123-456-1112",
"age": 47,
"credit_score": 660,
"address": {
"address_line": "99 Elm St",
"city": "Fair Lawn",
"state": "NJ"
}
},
"purchase_date": {"$date": 1695921091},
"seller": {
"name": "Jim A.",
"location": "Jersey City NJ"
},
"items": [{
"car" : "Tesla Model Y",
"color": "White"
},
"Extended warranty - 5 years"
],
"amount": 57500,
"status" : "active"
},
{
"_id": "12",
"purchase_type": "In Person",
"$vector": [0.33, 0.44, 0.55, 0.77, 0.66],
"customer": {
"name": "John T.",
"phone": "123-456-1123",
"age": 55,
"credit_score": 786,
"address": {
"address_line": "23 Main Blvd",
"city": "Staten Island",
"state": "NY"
}
},
"purchase_date": {"$date": 1696180291},
"seller": {
"name": "Tammy S.",
"location": "Staten Island NYC"
},
"items": [{
"car" : "BMW 540i xDrive Sedan",
"color": "Black"
},
"Extended warranty - 5 years"
],
"amount": 64900,
"status" : "active"
},
{
"_id": "13",
"purchase_type": "Online",
"$vector": [0.1, 0.15, 0.3, 0.12, 0.07],
"customer": {
"name": "Aaron W.",
"phone": "123-456-1133",
"age": 60,
"credit_score": 702,
"address": {
"address_line": "1234 4th Ave",
"city": "New York",
"state": "NY"
}
},
"purchase_date": {"$date": 1697389891},
"seller": {
"name": "Jon B.",
"location": "Manhattan NYC"
},
"items": [{
"car" : "Tesla Model 3",
"color": "White"
},
"Extended warranty - 5 years"
],
"amount": 55000,
"status" : "active"
},
{
"_id": "14",
"purchase_type": "In Person",
"$vector": [0.11, 0.02, 0.78, 0.21, 0.27],
"customer": {
"name": "Kris S.",
"phone": "123-456-1144",
"age": 44,
"credit_score": 702,
"address": {
"address_line": "1414 14th Pl",
"city": "Brooklyn",
"state": "NY"
}
},
"purchase_date": {"$date": 1698513091},
"seller": {
"name": "Jasmine S.",
"location": "Brooklyn NYC"
},
"items": [{
"car" : "Tesla Model X",
"color": "White"
}
],
"amount": 110400,
"status" : "active"
},
{
"_id": "15",
"purchase_type": "Online",
"$vector": [0.1, 0.15, 0.3, 0.12, 0.08],
"customer": {
"name": "Maddy O.",
"phone": "123-456-1155",
"age": 41,
"credit_score": 782,
"address": {
"address_line": "1234 Maple Ave",
"city": "West New York",
"state": "NJ"
}
},
"purchase_date": {"$date": 1701191491},
"seller": {
"name": "Jim A.",
"location": "Jersey City NJ"
},
"items": {
"car" : "Tesla Model 3",
"color": "White"
},
"amount": 52990,
"status" : "active"
},
{
"_id": "16",
"purchase_type": "In Person",
"$vector": [0.44, 0.11, 0.33, 0.22, 0.88],
"customer": {
"name": "Tim C.",
"phone": "123-456-1166",
"age": 38,
"credit_score": 700,
"address": {
"address_line": "1234 Main St",
"city": "Staten Island",
"state": "NY"
}
},
"purchase_date": {"$date": 1701450691},
"seller": {
"name": "Tammy S.",
"location": "Staten Island NYC"
},
"items": [{
"car" : "Tesla Model Y",
"color": "White"
},
"Extended warranty - 5 years"
],
"amount": 58990,
"status" : "active"
},
{
"_id": "17",
"purchase_type": "Online",
"$vector": [0.1, 0.15, 0.3, 0.12, 0.09],
"customer": {
"name": "Yolanda Z.",
"phone": "123-456-1177",
"age": 61,
"credit_score": 694,
"address": {
"address_line": "1234 Main St",
"city": "Hoboken",
"state": "NJ"
}
},
"purchase_date": {"$date": 1702660291},
"seller": {
"name": "Jim A.",
"location": "Jersey City NJ"
},
"items": [{
"car" : "Tesla Model 3",
"color": "Blue"
},
"Extended warranty - 5 years"
],
"amount": 54900,
"status" : "active"
},
{
"_id": "18",
"purchase_type": "Online",
"$vector": [0.15, 0.17, 0.15, 0.43, 0.55],
"customer": {
"name": "Thomas D.",
"phone": "123-456-1188",
"age": 45,
"credit_score": 724,
"address": {
"address_line": "98980 20th St",
"city": "New York",
"state": "NY"
}
},
"purchase_date": {"$date": 1703092291},
"seller": {
"name": "Jon B.",
"location": "Manhattan NYC"
},
"items": [{
"car" : "BMW 750e xDrive Sedan",
"color": "Black"
},
"Extended warranty - 5 years"
],
"amount": 106900,
"status" : "active"
},
{
"_id": "19",
"purchase_type": "Online",
"$vector": [0.25, 0.25, 0.25, 0.25, 0.27],
"customer": {
"name": "Vivian W.",
"phone": "123-456-1199",
"age": 20,
"credit_score": 698,
"address": {
"address_line": "5678 Elm St",
"city": "Hartford",
"state": "CT"
}
},
"purchase_date": {"$date": 1704215491},
"seller": {
"name": "Jasmine S.",
"location": "Brooklyn NYC"
},
"items": [{
"car" : "BMW 330i Sedan",
"color": "White"
},
"Extended warranty - 5 years"
],
"amount": 46980,
"status" : "active"
},
{
"_id": "20",
"purchase_type": "In Person",
"$vector": [0.44, 0.11, 0.33, 0.22, 0.88],
"customer": {
"name": "Leslie E.",
"phone": null,
"age": 44,
"credit_score": 782,
"address": {
"address_line": "1234 Main St",
"city": "Newark",
"state": "NJ"
}
},
"purchase_date": {"$date": 1705338691},
"seller": {
"name": "Jim A.",
"location": "Jersey City NJ"
},
"items": [{
"car" : "Tesla Model Y",
"color": "Black"
},
"Extended warranty - 5 years"
],
"amount": 59800,
"status" : "active"
},
{
"_id": "21",
"purchase_type": "In Person",
"$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
"customer": {
"name": "Rachel I.",
"phone": null,
"age": 62,
"credit_score": 786,
"address": {
"address_line": "1234 Park Ave",
"city": "New York",
"state": "NY"
}
},
"purchase_date": {"$date": 1706202691},
"seller": {
"name": "Jon B.",
"location": "Manhattan NYC"
},
"items": [{
"car" : "BMW M440i Gran Coupe",
"color": "Silver"
},
"Extended warranty - 5 years",
"Gap Insurance - 5 years"
],
"amount": 65250,
"status" : "active"
}
],
"options": {
"ordered": false
}
}
}' | json_pp
Response
{
"status" : {
"insertedIds" : [
"4",
"7",
"10",
"13",
"16",
"19",
"21",
"18",
"6",
"12",
"15",
"9",
"3",
"11",
"2",
"17",
"14",
"8",
"20",
"5"
]
}
}
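The properties table that follows states that insertMany accepts up to 20 documents per request. This request also sets "ordered": false, so the insertedIds are not guaranteed to follow the input order, as the response above shows. If you build raw request bodies yourself, a minimal Python sketch like the following can split a larger list into batches within that limit. The helper build_insert_many_payloads is hypothetical and is not part of any client.

```python
from typing import Any, Dict, Iterator, List

# Per-request limit stated in this reference for insertMany.
MAX_DOCS_PER_INSERT = 20


def build_insert_many_payloads(
    documents: List[Dict[str, Any]],
    batch_size: int = MAX_DOCS_PER_INSERT,
    ordered: bool = False,
) -> Iterator[Dict[str, Any]]:
    """Yield one insertMany request body per batch of documents (hypothetical helper)."""
    if not 1 <= batch_size <= MAX_DOCS_PER_INSERT:
        raise ValueError(f"batch_size must be between 1 and {MAX_DOCS_PER_INSERT}")
    for start in range(0, len(documents), batch_size):
        yield {
            "insertMany": {
                "documents": documents[start:start + batch_size],
                "options": {"ordered": ordered},
            }
        }


docs = [{"_id": str(i), "amount": 1000 * i} for i in range(1, 46)]  # 45 documents
payloads = list(build_insert_many_payloads(docs))
print([len(p["insertMany"]["documents"]) for p in payloads])  # [20, 20, 5]
```

Each payload can then be sent as the --data body of a request like the cURL command above.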
Properties:
Name | Type | Summary |
---|---|---|
insertMany | command | Data API designation that many documents are being inserted. You can insert up to 20 documents at a time. |
document | JSON object | Contains the details of the record added. |
_id | uuid4 | A unique identifier for the document. Other |
purchase_type | string | Specifies how the purchase was made. |
$vector | array | A reserved property used to store vector data. The value is an array of numbers, or it can be generated by an embedding service. These numbers can be used for similarity searches, clustering, or other mathematical operations on vectors. Because this is a reserved property, the vector-enabled Hyper-Converged Database (HCD) handles data stored in this format specially, optimizing query performance for vector similarity search. |
customer | JSON object | Information about the customer who made the purchase. |
customer.name | string | The customer’s name. |
customer.phone | string | The customer’s contact phone number. |
customer.age | number | The customer’s age. Subsequent examples can use the |
customer.credit_score | number | The customer’s credit score at the time of the car’s purchase. Subsequent examples can use |
customer.address | JSON object | Contains further details about the customer’s address. |
customer.address.address_line | string | The customer’s street or location address. |
customer.address.city | string | The customer’s city. |
customer.address.state | string | The state where the customer resides. |
purchase_date | date | The date on which the purchase was made, using the |
seller | JSON object | Information about the seller from whom the purchase was made. |
seller.name | string | The seller’s name. |
seller.location | string | The seller’s location. |
items | array | An array detailing the items included in this purchase. |
items.car | string | The make and model of the car. |
items.color | string | The car’s color. |
Extended warranty - 5 years | string | An additional entry in the items array, indicating that the purchase includes a five-year extended warranty. |
amount | number | The total cost of the purchase. |
status | string | Current status of the purchase. |
preferred_customer | boolean | Whether the buyer is a preferred customer. |
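The purchase_date values use the {"$date": ...} wrapper, which the Data API reads as milliseconds since the Unix epoch. The Python and TypeScript clients convert these values to native date types for you, but a raw cURL response does not. The sketch below converts those wrappers in plain Python. The helper decode_dates is hypothetical. Read as milliseconds, a sample value such as 1691860291 falls in January 1970, so the example multiplies it by 1000, assuming the sample was written in seconds.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_dates(value: Any) -> Any:
    """Recursively replace {"$date": <epoch milliseconds>} wrappers with aware datetimes."""
    if isinstance(value, dict):
        if set(value) == {"$date"} and isinstance(value["$date"], int):
            return EPOCH + timedelta(milliseconds=value["$date"])
        return {key: decode_dates(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode_dates(item) for item in value]
    return value


# Sample value 1691860291 scaled to milliseconds (assumption: it was given in seconds).
doc = {"_id": "example", "purchase_date": {"$date": 1691860291 * 1000}, "items": [{"car": "Tesla Model X"}]}
decoded = decode_dates(doc)
print(decoded["purchase_date"].isoformat())  # 2023-08-12T17:11:31+00:00
```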
Find a document
Retrieve a single document from a collection using various options.
Examples are given below for Python, TypeScript, Java, and cURL, in that order.
Python
View this topic in more detail on the API Reference.
Retrieve a single document from a collection by its _id.
document = collection.find_one({"_id": 101})
Retrieve a single document from a collection by any attribute, as long as it is covered by the collection’s indexing configuration.
As noted in the Indexing option section of the Collections reference topic, any field used in a filter or sort operation must be indexed. If you chose not to index some or all fields when you created the collection, you cannot reference those fields in a filter or sort query.
document = collection.find_one({"location": "warehouse_C"})
Retrieve a single document from a collection by an arbitrary filtering clause.
document = collection.find_one({"tag": {"$exists": True}})
Retrieve the most similar document to a given vector.
result = collection.find_one({}, vector=[.12, .52, .32])
Generate a vector and retrieve the most similar document.
result = collection.find_one({}, vectorize="Text to vectorize")
Retrieve only specific fields from a document.
result = collection.find_one({"_id": 101}, projection={"name": True})
Returns:
Union[Dict[str, Any], None] - Either the found document as a dictionary, or None if no matching document is found.
Example response
{'_id': 101, 'name': 'John Doe', '$vector': [0.12, 0.52, 0.32]}
Parameters:
Name | Type | Summary |
---|---|---|
filter | | A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
projection | | Used to select a subset of fields in the documents being returned. The projection can be: an iterable over the included field names; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to exclude specific fields from the response. Special document fields (e.g. |
vector | | A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to perform vector search. That is, Approximate Nearest Neighbors (ANN) search, extracting the most similar document in the collection matching the filter. This parameter cannot be used together with |
vectorize | | A string to vectorize before performing a vector search. This only works for collections associated with an embedding service. This parameter cannot be used together with |
include_similarity | | A boolean to request the numeric value of the similarity to be returned as an added "$similarity" key in the returned document. Can only be used for vector ANN search, i.e. when either |
sort | | With this dictionary parameter one can control the order the documents are returned. See the discussion about sorting for details. |
max_time_ms | | A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default. |
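The projection parameter accepts an iterable of field names as shorthand for the inclusion dictionary. The helper below spells out that equivalence by normalizing either form into a plain dictionary. It is hypothetical and is not part of AstraPy.

```python
from collections.abc import Iterable, Mapping
from typing import Dict, Union

Projection = Union[Iterable, Mapping]


def normalize_projection(projection: Projection) -> Dict[str, bool]:
    """Return the dictionary form of a projection (hypothetical helper)."""
    if isinstance(projection, Mapping):
        # Already a {field_name: True/False} mapping: copy it unchanged.
        return dict(projection)
    if isinstance(projection, str):
        # A bare string iterates character by character, which is almost never intended.
        raise TypeError("pass a list of field names, not a single string")
    # An iterable of names means "include exactly these fields".
    return {field: True for field in projection}


print(normalize_projection(["name", "amount"]))  # {'name': True, 'amount': True}
print(normalize_projection({"$vector": False}))  # {'$vector': False}
```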
Example:
import astrapy.constants
collection.find_one()
# prints: {'_id': '68d1e515-...', 'seq': 37}
collection.find_one({"seq": 10})
# prints: {'_id': 'd560e217-...', 'seq': 10}
collection.find_one({"seq": 1011})
# (returns None for no matches)
collection.find_one(projection={"seq": False})
# prints: {'_id': '68d1e515-...'}
collection.find_one(
{},
sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
# prints: {'_id': '97e85f81-...', 'seq': 69}
collection.find_one(vector=[1, 0], projection={"*": True})
# prints: {'_id': '...', 'tag': 'D', '$vector': [4.0, 1.0]}
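When include_similarity is requested, each returned document carries a $similarity score. The sketch below computes such a score for two vectors. It assumes the collection uses the cosine metric and that the score is cosine similarity rescaled to the range 0 to 1 as (1 + cos) / 2. The helper cosine_score is hypothetical, and other metrics score differently.

```python
import math
from typing import Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity mapped from [-1, 1] to [0, 1] (assumed scoring convention)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return (1 + dot / norm) / 2


print(round(cosine_score([1, 1, 1, 1, 1], [1, 1, 1, 1, 1]), 6))  # 1.0
print(round(cosine_score([1, 0], [0, 1]), 6))  # 0.5
```

Under this assumption, identical directions score 1, orthogonal vectors score 0.5, and opposite vectors score 0.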
TypeScript
View this topic in more detail on the API Reference.
Retrieve a single document from a collection by its _id.
const doc = await collection.findOne({ _id: '101' });
Retrieve a single document from a collection by any attribute, as long as it is covered by the collection’s indexing configuration.
As noted in the Indexing option section of the Collections reference topic, any field used in a filter or sort operation must be indexed. If you chose not to index some or all fields when you created the collection, you cannot reference those fields in a filter or sort query.
const doc = await collection.findOne({ location: 'warehouse_C' });
Retrieve a single document from a collection by an arbitrary filtering clause.
const doc = await collection.findOne({ tag: { $exists: true } });
Retrieve the most similar document to a given vector.
const doc = await collection.findOne({}, { vector: [.12, .52, .32] });
Generate a vector and retrieve the most similar document.
const doc = await collection.findOne({}, { vectorize: 'Text to vectorize' });
Retrieve only specific fields from a document.
const doc = await collection.findOne({ _id: '101' }, { projection: { name: 1 } });
Parameters:
Name | Type | Summary |
---|---|---|
filter | | A filter to select the document to find. |
options? | | The options for this operation. |
Options (FindOneOptions):
Name | Type | Summary |
---|---|---|
projection | | Specifies which fields should be included/excluded in the returned documents. Defaults to including all fields. When specifying a projection, it’s the user’s responsibility to handle the return type carefully. Consider type-casting. |
includeSimilarity | | Requests the numeric value of the similarity to be returned as an added $similarity field in the returned document. Can only be used when performing a vector search. |
sort | | Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk. |
vector | | An optional vector to use to perform a vector search on the collection to find the closest matching document. Equivalent to setting the If you really need to use both, you can set the |
vectorize | | A string to vectorize before performing a vector search. This only works for collections associated with an embedding service. This parameter cannot be used together with |
maxTimeMS | | The maximum time in milliseconds that the client should wait for the operation to complete. |
Returns:
Promise<FoundDoc<Schema> | null> - A promise that resolves to the found document (including $similarity, if applicable), or null if no matching document is found.
Example:
(async function () {
// Insert some documents
await collection.insertMany([
{ name: 'John', age: 30, $vector: [1, 1, 1, 1, 1] },
{ name: 'Jane', age: 25, },
{ name: 'Dave', age: 40, },
]);
// Unpredictably prints one of their names
const unpredictable = await collection.findOne({});
console.log(unpredictable?.name);
// Failed find by name (null)
const failed = await collection.findOne({ name: 'Carrie' });
console.log(failed);
// Find by $gt age (Dave)
const dave = await collection.findOne({ age: { $gt: 30 } });
console.log(dave?.name);
// Find by sorting by age (Jane)
const jane = await collection.findOne({}, { sort: { age: 1 } });
console.log(jane?.name);
// Find by vector similarity (John, 1)
const john = await collection.findOne({}, { vector: [1, 1, 1, 1, 1], includeSimilarity: true });
console.log(john?.name, john?.$similarity);
})();
Java
- Operations on documents are performed at the Collection level. For details on each signature, see the Collection JavaDoc.
- Collection is a generic class whose default type is Document, but you can specify your own type, and the object is then serialized by Jackson.
- Most methods come in synchronous and asynchronous flavors. The asynchronous version is suffixed by Async and returns a CompletableFuture.
// Synchronous
Optional<T> findOne(Filter filter);
Optional<T> findOne(Filter filter, FindOneOptions options);
Optional<T> findById(Object id); // builds the _id filter for you
// Asynchronous
CompletableFuture<Optional<T>> findOneAsync(Filter filter);
CompletableFuture<Optional<T>> findOneAsync(Filter filter, FindOneOptions options);
CompletableFuture<Optional<T>> findByIdAsync(Object id);
Returns:
Optional<T> - The document matching the filter, or Optional.empty() if no document is found.
Parameters:
Name | Type | Summary |
---|---|---|
filter | Filter | Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression. |
options | FindOneOptions | Set the different options for the findOne operation. |
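Things you must know about Data API requests: each client call is sent as a Data API JSON request. For example, the following findOne request combines a filter, a projection, a vector sort, and the includeSimilarity option: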
{
"findOne": {
"filter": {
"$and": [
{"field2": {"$gt": 10}},
{"field3": {"$lt": 20}},
{"field4": {"$eq": "value"}}
]
},
"projection": {
"_id": 0,
"field": 1,
"field2": 1,
"field3": 1
},
"sort": {
"$vector": [ 0.25, 0.25, 0.25,0.25, 0.25]
},
"options": {
"includeSimilarity": true
}
}
}
To execute this exact query with the Java client, use the following snippet:
collection.findOne(
Filters.and(
Filters.gt("field2", 10),
Filters.lt("field3", 20),
Filters.eq("field4", "value")
),
new FindOneOptions()
.projection(Projections.include("field", "field2", "field3"))
.projection(Projections.exclude("_id"))
.vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
.includeSimilarity()
);
// The same query, written with static imports
collection.findOne(
and(
gt("field2", 10),
lt("field3", 20),
eq("field4", "value")
),
vector(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
.projection(Projections.include("field", "field2", "field3"))
.projection(Projections.exclude("_id"))
.includeSimilarity()
);
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.DataAPIOptions;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindOneOptions;
import java.util.Optional;
import static com.datastax.astra.client.model.Filters.and;
import static com.datastax.astra.client.model.Filters.eq;
import static com.datastax.astra.client.model.Filters.gt;
import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Projections.exclude;
import static com.datastax.astra.client.model.Projections.include;
public class FindOne {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Complete FindOne
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
FindOneOptions options = new FindOneOptions()
.projection(include("field", "field2", "field3"))
.projection(exclude("_id"))
.sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
.includeSimilarity();
Optional<Document> result = collection.findOne(filter, options);
// The same query, written with static imports
collection.findOne(and(
gt("field2", 10),
lt("field3", 20),
eq("field4", "value")),
new FindOneOptions().sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f})
.projection(include("field", "field2", "field3"))
.projection(exclude("_id"))
.includeSimilarity()
);
// Find one document, sorting by a vector generated from text (requires an embedding service)
collection.findOne(and(
gt("field2", 10),
lt("field3", 20),
eq("field4", "value")),
new FindOneOptions().sort("Life is too short to be living somebody else's dream.")
.projection(include("field", "field2", "field3"))
.projection(exclude("_id"))
.includeSimilarity()
);
// Insert a document whose vector is generated from text (requires an embedding service)
collection.insertOne(new Document()
.append("field", "value")
.append("field2", 15)
.append("field3", 15)
.vectorize("Life is too short to be living somebody else's dream."));
}
}
cURL
This Data API findOne command retrieves a document based on a filter using a specific _id value.
curl -s --location \
--request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"findOne": {
"filter": {"_id" : "14"}
}
}' | json_pp
Result:
{
"data" : {
"document" : {
"$vector" : [
0.11,
0.02,
0.78,
0.21,
0.27
],
"_id" : "14",
"amount" : 110400,
"customer" : {
"address" : {
"address_line" : "1414 14th Pl",
"city" : "Brooklyn",
"state" : "NY"
},
"age" : 44,
"credit_score" : 702,
"name" : "Kris S.",
"phone" : "123-456-1144"
},
"items" : [
{
"car" : "Tesla Model X",
"color" : "White"
}
],
"purchase_date" : {
"$date" : 1698513091
},
"purchase_type" : "In Person",
"seller" : {
"location" : "Brooklyn NYC",
"name" : "Jasmine S."
},
"status" : "active"
}
}
}
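The same findOne call can also be sent from Python's standard library. This sketch only builds the request. It reads the endpoint, keyspace, collection, and token from the same environment variables as the cURL example and falls back to placeholder values. example.invalid and the upper-case names are placeholders, not real settings.

```python
import json
import os
import urllib.request


def build_find_one_request(filter_: dict) -> urllib.request.Request:
    """Build, but do not send, a Data API findOne POST request."""
    endpoint = os.environ.get("DB_DB_API_ENDPOINT", "https://example.invalid")  # placeholder default
    keyspace = os.environ.get("DB_KEYSPACE", "KEYSPACE")  # placeholder default
    collection = os.environ.get("DB_COLLECTION", "COLLECTION")  # placeholder default
    token = os.environ.get("DB_APPLICATION_TOKEN", "TOKEN")  # placeholder default
    body = {"findOne": {"filter": filter_}}
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/{keyspace}/{collection}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


request = build_find_one_request({"_id": "14"})
print(request.get_method(), request.full_url)

# To actually send it (requires a reachable Data API endpoint):
# with urllib.request.urlopen(request) as response:
#     document = json.load(response)["data"]["document"]
```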
Find documents using filtering options
Iterate over documents in a collection matching a given filter.
Examples are given below for Python, TypeScript, Java, and cURL, in that order.
Python
View this topic in more detail on the API Reference.
doc_iterator = collection.find({"category": "house_appliance"}, limit=10)
Iterate over the documents most similar to a given query vector.
doc_iterator = collection.find({}, vector=[0.55, -0.40, 0.08], limit=5)
Generate a vector and iterate over the documents most similar to it.
doc_iterator = collection.find({}, vectorize="Text to vectorize", limit=5)
Returns:
Cursor - A cursor for iterating over documents. An AstraPy cursor can be used in a for loop, and provides a few additional features.
Example response
Cursor("vector_collection", new, retrieved: 0)
Parameters:
Name | Type | Summary |
---|---|---|
filter | | A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
projection | | Used to select a subset of fields in the documents being returned. The projection can be: an iterable over the included field names; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to exclude specific fields from the response. Special document fields (e.g. |
skip | | With this integer parameter, what would be the first |
limit | | This (integer) parameter sets a limit over how many documents are returned. Once |
vector | | A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to perform vector search; that is, Approximate Nearest Neighbors (ANN) search. When running similarity search on a collection, no other sorting criteria can be specified. Moreover, there is an upper bound to the number of documents that can be returned. For details, see the Data API Limits. |
vectorize | | A string to vectorize before performing a vector search. This only works for collections associated with an embedding service. This parameter cannot be used together with |
include_similarity | | A boolean to request the numeric value of the similarity to be returned as an added "$similarity" key in each returned document. Can only be used for vector ANN search, i.e. when either |
sort | | With this dictionary parameter one can control the order the documents are returned. See the discussion about sorting, including the note on upper bounds on the number of visited documents, for details. |
max_time_ms | | A timeout, in milliseconds, for each underlying HTTP request used to fetch documents as you iterate over the cursor. This method uses the collection-level timeout by default. |
Example:
# Find all documents in the collection
list(collection.find({}))
# Find all documents in the collection with a specific field value
list(collection.find({
"a": 123,
}))
# Find all documents in the collection that match a compound filter expression
list(collection.find({
"$and": [
{"f1": 1},
{"f2": 2},
]
}))
# Same as the preceding example, but using the implicit AND operator
list(collection.find({
"f1": 1,
"f2": 2,
}))
# Use the "less than" operator in the filter expression
list(collection.find({
"$and": [
{"name": "John"},
{"price": {"$lt": 100}},
]
}))
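The explicit $and form and the implicit AND form above are interchangeable. The hypothetical helper below makes the correspondence concrete by rewriting a filter with several top-level keys into the explicit form.

```python
from typing import Any, Dict


def to_explicit_and(filter_: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite {"f1": 1, "f2": 2} as {"$and": [{"f1": 1}, {"f2": 2}]} (hypothetical helper).

    Filters with zero or one top-level key are returned unchanged,
    because there is nothing to combine.
    """
    if len(filter_) <= 1:
        return dict(filter_)
    return {"$and": [{key: value} for key, value in filter_.items()]}


print(to_explicit_and({"f1": 1, "f2": 2}))
# {'$and': [{'f1': 1}, {'f2': 2}]}
```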
TypeScript
View this topic in more detail on the API Reference.
const cursor = collection.find({ category: 'house_appliance' }, { limit: 10 });
Iterate over the documents most similar to a given query vector.
const cursor = collection.find({}, { vector: [0.55, -0.40, 0.08], limit: 5 });
Generate a vector and iterate over the documents most similar to it.
const cursor = collection.find({}, { vectorize: 'Text to vectorize', limit: 5 });
Parameters:
Name | Type | Summary |
---|---|---|
filter | | A filter to select the document to find. |
options? | | The options for this operation. |
Options (FindOptions):
Name | Type | Summary |
---|---|---|
projection | | Specifies which fields should be included/excluded in the returned documents. Defaults to including all fields. When specifying a projection, it’s the user’s responsibility to handle the return type carefully. Consider type-casting. |
includeSimilarity | | Requests the numeric value of the similarity to be returned as an added $similarity field in each returned document. Can only be used when performing a vector search. |
sort | | Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk. |
vector | | An optional vector to use to perform a vector search on the collection to find the closest matching document. Equivalent to setting the If you really need to use both, you can set the |
vectorize | | A string to vectorize before performing a vector search. This only works for collections associated with an embedding service. This parameter cannot be used together with |
skip | | The number of documents to skip before returning the first document. |
limit | | The maximum number of documents to return in the lifetime of the cursor. |
maxTimeMS | | The maximum time in milliseconds that the client should wait for the operation to complete for each single one of the underlying HTTP requests. |
Returns:
FindCursor<FoundDoc<Schema>> - A cursor for iterating over the matching documents.
Example:
(async function () {
// Insert some documents
await collection.insertMany([
{ name: 'John', age: 30, $vector: [1, 1, 1, 1, 1] },
{ name: 'Jane', age: 25, },
{ name: 'Dave', age: 40, },
]);
// Gets all 3 in some order
const unpredictable = await collection.find({}).toArray();
console.log(unpredictable);
// Failed find by name ([])
const matchless = await collection.find({ name: 'Carrie' }).toArray();
console.log(matchless);
// Find by $gt age (John, Dave)
const gtAgeCursor = collection.find({ age: { $gt: 25 } });
for await (const doc of gtAgeCursor) {
console.log(doc.name);
}
// Find by sorting by age (Jane, John, Dave)
const sortedAgeCursor = collection.find({}, { sort: { age: 1 } });
await sortedAgeCursor.forEach(console.log);
// Find first by vector similarity (John, 1)
const john = await collection.find({}, { vector: [1, 1, 1, 1, 1], includeSimilarity: true }).next();
console.log(john?.name, john?.$similarity);
})();
Java
- Operations on documents are performed at the Collection level. For details on each signature, see the Collection JavaDoc.
- Collection is a generic class whose default type is Document, but you can specify your own type, and the object is then serialized by Jackson.
- Most methods come in synchronous and asynchronous flavors. The asynchronous version is suffixed by Async and returns a CompletableFuture.
// Synchronous
FindIterable<T> find(Filter filter, FindOptions options);
// Convenience overloads of the method above, with default filter and/or options
FindIterable<T> find(FindOptions options); // no filter
FindIterable<T> find(Filter filter); // default options
FindIterable<T> find(); // default options + no filters
FindIterable<T> find(float[] vector, int limit); // semantic search
FindIterable<T> find(Filter filter, float[] vector, int limit);
Returns:
FindIterable<T> - A cursor that fetches up to the first 20 documents and retrieves the rest as needed. As its name states, it is an Iterable.
Parameters:
Name | Type | Summary |
---|---|---|
filter | Filter | Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression. |
options | FindOptions | Set the different options for the find operation. |
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.FindOptions;
import com.datastax.astra.client.model.Sorts;
import static com.datastax.astra.client.model.Filters.lt;
import static com.datastax.astra.client.model.Projections.exclude;
import static com.datastax.astra.client.model.Projections.include;
public class Find {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Building a filter
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
// Find Options
FindOptions options = new FindOptions()
.projection(include("field", "field2", "field3")) // select fields
.projection(exclude("_id")) // exclude some fields
.sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f}) // similarity vector
.skip(1) // skip first item
.limit(10) // stop after 10 items (max records)
.pageState("pageState") // placeholder: pass the page state returned with a previous page of results
.includeSimilarity(); // include similarity
// Execute a find operation
FindIterable<Document> result = collection.find(filter, options);
// Iterate over the result
for (Document document : result) {
System.out.println(document);
}
}
}
cURL
There are two examples with Data API find filters in this cURL section.
The first example uses a filter specifying two properties, customer.address.city and customer.address.state, to look for car sales by customers in Hoboken, NJ.
curl -s --location \
--request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"find": {
"filter": {
"customer.address.city": "Hoboken",
"customer.address.state": "NJ"
}
}
}' | json_pp
Result:
{
"data" : {
"documents" : [
{
"$vector" : [
0.1,
0.15,
0.3,
0.12,
0.09
],
"_id" : "17",
"amount" : 54900,
"customer" : {
"address" : {
"address_line" : "1234 Main St",
"city" : "Hoboken",
"state" : "NJ"
},
"age" : 61,
"credit_score" : 694,
"name" : "Yolanda Z.",
"phone" : "123-456-1177"
},
"items" : [
{
"car" : "Tesla Model 3",
"color" : "Blue"
},
"Extended warranty - 5 years"
],
"purchase_date" : {
"$date" : 1702660291
},
"purchase_type" : "Online",
"seller" : {
"location" : "Jersey City NJ",
"name" : "Jim A."
},
"status" : "active"
}
],
"nextPageState" : null
}
}
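Here nextPageState is null because every match fit in one response. When it is not null, the next page can be requested by sending that value back as options.pageState in an otherwise identical find command. A sketch of that loop follows. post_find is a hypothetical function that sends one command and returns the parsed JSON. In this example it is replaced by an in-memory fake so the code runs offline.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

PostFind = Callable[[Dict[str, Any]], Dict[str, Any]]


def iterate_find(post_find: PostFind, filter_: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every matching document, following nextPageState until it is null."""
    page_state: Optional[str] = None
    while True:
        command: Dict[str, Any] = {"find": {"filter": filter_}}
        if page_state is not None:
            command["find"]["options"] = {"pageState": page_state}
        data = post_find(command)["data"]
        yield from data["documents"]
        page_state = data.get("nextPageState")
        if page_state is None:
            return


# In-memory stand-in for the HTTP call: two pages of results.
PAGES = {
    None: {"data": {"documents": [{"_id": "8"}, {"_id": "5"}], "nextPageState": "p2"}},
    "p2": {"data": {"documents": [{"_id": "17"}], "nextPageState": None}},
}
sent: List[Dict[str, Any]] = []


def fake_post_find(command: Dict[str, Any]) -> Dict[str, Any]:
    sent.append(command)
    state = command["find"].get("options", {}).get("pageState")
    return PAGES[state]


ids = [doc["_id"] for doc in iterate_find(fake_post_find, {"status": "active"})]
print(ids)  # ['8', '5', '17']
```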
Parameters:
Name | Type | Summary |
---|---|---|
find | command | Selects and returns documents from a collection based on the specified criteria. |
filter | object | Contains the criteria that the |
customer.address.city, customer.address.state | string | Query values in this example that find customers from Hoboken, NJ. |
This next Data API find example uses the $and and $or logical operators in a filter. The goal is to find documents where the customer’s city is "Jersey City" or "Orange" AND the seller’s name is "Jim A." or "Tammy S.". For a document to be returned, both these primary conditions (customer’s city and seller’s name) must be met.
curl -s --location \
--request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"find": {
"filter": {
"$and": [
{
"$or": [
{
"customer.address.city": "Jersey City"
},
{
"customer.address.city": "Orange"
}
]
},
{
"$or": [
{
"seller.name": "Jim A."
},
{
"seller.name": "Tammy S."
}
]
}
]
}
}
}' | json_pp
Result:
{
"data" : {
"documents" : [
{
"$vector" : [
0.3,
0.23,
0.15,
0.17,
0.4
],
"_id" : "8",
"amount" : 46900,
"customer" : {
"address" : {
"address_line" : "1234 Main St",
"city" : "Orange",
"state" : "NJ"
},
"age" : 29,
"credit_score" : 710,
"name" : "Harold S.",
"phone" : "123-456-8888"
},
"items" : [
{
"car" : "BMW X3 SUV",
"color" : "Black"
},
"Extended warranty - 5 years"
],
"purchase_date" : {
"$date" : 1693329091
},
"purchase_type" : "In Person",
"seller" : {
"location" : "Staten Island NYC",
"name" : "Tammy S."
},
"status" : "active"
},
{
"$vector" : [
0.25,
0.045,
0.38,
0.31,
0.67
],
"_id" : "5",
"amount" : 94990,
"customer" : {
"address" : {
"address_line" : "32345 Main Ave",
"city" : "Jersey City",
"state" : "NJ"
},
"age" : 50,
"credit_score" : 800,
"name" : "David C.",
"phone" : "123-456-5555"
},
"items" : [
{
"car" : "Tesla Model S",
"color" : "Red"
},
"Extended warranty - 5 years"
],
"purchase_date" : {
"$date" : 1690996291
},
"purchase_type" : "Online",
"seller" : {
"location" : "Jersey City NJ",
"name" : "Jim A."
},
"status" : "active"
}
],
"nextPageState" : null
}
}
Parameters:
Name | Type | Summary |
---|---|---|
find | command | Selects and returns documents from a collection based on the specified criteria. |
filter | object | Contains the criteria that the |
$and | logical operator | Ensures that all nested conditions are met for a record to be returned. |
$or | logical operator | A logical operator where any one of the nested conditions must be met. In this example, the first |
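You can check offline how the two `$or` groups combine under `$and` with a small pure-Python sketch. The helpers `get_path` and `matches` are hypothetical, not part of astrapy or the Data API. They support only `$and`, `$or`, and implicit equality on dotted paths. The documents are trimmed copies of the two results above plus one made-up record that fails the city condition.

```python
from typing import Any


def get_path(doc: dict, path: str) -> Any:
    """Resolve a dotted path such as "customer.address.city"; None if absent."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(doc: dict, flt: dict) -> bool:
    """Evaluate a tiny subset of filter syntax: $and, $or and implicit equality."""
    for key, value in flt.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in value):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in value):
                return False
        elif get_path(doc, key) != value:
            return False
    return True


flt = {
    "$and": [
        {"$or": [{"customer.address.city": "Jersey City"},
                 {"customer.address.city": "Orange"}]},
        {"$or": [{"seller.name": "Jim A."}, {"seller.name": "Tammy S."}]},
    ]
}

docs = [
    {"_id": "8", "customer": {"address": {"city": "Orange"}}, "seller": {"name": "Tammy S."}},
    {"_id": "5", "customer": {"address": {"city": "Jersey City"}}, "seller": {"name": "Jim A."}},
    # Hypothetical record: matching seller, but the city fails the first $or.
    {"_id": "x", "customer": {"address": {"city": "Hoboken"}}, "seller": {"name": "Jim A."}},
]

print([d["_id"] for d in docs if matches(d, flt)])  # ['8', '5']
```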
Example values for sort operations
-
Python
-
TypeScript
-
Java
-
cURL
When no particular order is required:
sort={} # (default when parameter not provided)
When sorting by a certain value in ascending/descending order:
from astrapy.constants import SortDocuments
sort={"field": SortDocuments.ASCENDING}
sort={"field": SortDocuments.DESCENDING}
When sorting first by "field" and then by "subfield" (while modern Python versions preserve the order of dictionaries, it is suggested for clarity to employ a collections.OrderedDict in these cases):
sort={
"field": SortDocuments.ASCENDING,
"subfield": SortDocuments.ASCENDING,
}
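The first key takes priority, so listing the same two keys in a different order can produce a different result. The hypothetical `apply_sort` helper below illustrates this by emulating a compound sort on in-memory dictionaries. It uses plain 1 for ascending and -1 for descending, relies on Python's stable sort, and makes no claim about how the Data API sorts internally. The sample data is made up.

```python
def apply_sort(docs: list[dict], sort: dict[str, int]) -> list[dict]:
    """Order docs by several fields; earlier fields in `sort` take priority.

    1 means ascending and -1 descending. Python's sort is stable, so sorting
    by the least significant field first and the most significant field last
    yields the combined ordering.
    """
    result = list(docs)  # leave the caller's list untouched
    for field, direction in reversed(list(sort.items())):
        result.sort(key=lambda d, f=field: d[f], reverse=(direction == -1))
    return result


people = [
    {"name": "Jane", "age": 25},
    {"name": "Dave", "age": 40},
    {"name": "Jack", "age": 40},
]
ordered = apply_sort(people, {"age": 1, "name": -1})
print([p["name"] for p in ordered])  # ['Jane', 'Jack', 'Dave']
```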
When running a vector similarity (ANN) search:
sort={"$vector": [0.4, 0.15, -0.5]}
When running a vector similarity search on a vector that the Data API generates from text (the collection must be associated with an embedding service):
sort={"$vectorize": "Text to vectorize"}
Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API. More specifically:
Keep in mind these provisions even when subsequently running a command such as |
When not specifying sorting criteria at all (by vector or otherwise), the cursor can scroll through an arbitrary number of documents as the Data API and the client periodically exchange new chunks of documents. The behavior of the cursor — in the case that documents have been added/removed after the |
Example:
from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
filter = {"seq": {"$exists": True}}
for doc in collection.find(filter, projection={"seq": True}, limit=5):
    print(doc["seq"])
# will print e.g.:
# 37
# 35
# 10
# 36
# 27
cursor1 = collection.find(
{},
limit=4,
sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
[doc["_id"] for doc in cursor1]
# prints: ['97e85f81-...', '1581efe4-...', '...', '...']
cursor2 = collection.find({}, limit=3)
cursor2.distinct("seq")
# prints: [37, 35, 10]
collection.insert_many([
{"tag": "A", "$vector": [4, 5]},
{"tag": "B", "$vector": [3, 4]},
{"tag": "C", "$vector": [3, 2]},
{"tag": "D", "$vector": [4, 1]},
{"tag": "E", "$vector": [2, 5]},
])
ann_tags = [
document["tag"]
for document in collection.find(
{},
limit=3,
vector=[3, 3],
)
]
ann_tags
# prints: ['A', 'B', 'C']
# (assuming the collection has metric VectorMetric.COSINE)
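You can reproduce the `['A', 'B', 'C']` result by ranking the five vectors by cosine similarity to `[3, 3]`. This sketch assumes the cosine metric mentioned in the comment above and uses an exact, exhaustive scan. The Data API itself runs an approximate nearest-neighbor search, so `cosine` and `top_k_by_cosine` are illustrative helpers only.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k_by_cosine(docs: list[dict], query: list[float], k: int) -> list[dict]:
    """Exhaustive stand-in for an ANN search: the k most similar documents."""
    ranked = sorted(docs, key=lambda d: cosine(d["$vector"], query), reverse=True)
    return ranked[:k]


docs = [
    {"tag": "A", "$vector": [4, 5]},
    {"tag": "B", "$vector": [3, 4]},
    {"tag": "C", "$vector": [3, 2]},
    {"tag": "D", "$vector": [4, 1]},
    {"tag": "E", "$vector": [2, 5]},
]

for doc in docs:
    print(doc["tag"], round(cosine(doc["$vector"], [3, 3]), 4))
print([d["tag"] for d in top_k_by_cosine(docs, [3, 3], k=3)])  # ['A', 'B', 'C']
```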
When no particular order is required:
{ sort: {} } // (default when parameter not provided)
When sorting by a certain value in ascending/descending order:
{ sort: { field: +1 } } // ascending
{ sort: { field: -1 } } // descending
When sorting first by "field" and then by "subfield" (order matters! ES2015+ guarantees string keys in order of insertion):
{ sort: { field: 1, subfield: 1 } }
When running a vector similarity (ANN) search:
{ sort: { $vector: [0.4, 0.15, -0.5] } }
When running a vector similarity search on a vector that the Data API generates from text (the collection must be associated with an embedding service):
{ sort: { $vectorize: "Text to vectorize" } }
Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API. More specifically:
Keep in mind these provisions even when subsequently running a command such as |
When not specifying sorting criteria at all (by vector or otherwise), the cursor can scroll through an arbitrary number of documents as the Data API and the client periodically exchange new chunks of documents. The behavior of the cursor — in the case that documents have been added/removed after the |
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some documents
await collection.insertMany([
{ name: 'Jane', age: 25, $vector: [1.0, 1.0, 1.0, 1.0, 1.0] },
{ name: 'Dave', age: 40, $vector: [0.4, 0.5, 0.6, 0.7, 0.8] },
{ name: 'Jack', age: 40, $vector: [0.1, 0.9, 0.0, 0.5, 0.7] },
]);
// Sort by age ascending, then by name descending (Jane, Jack, Dave)
const sorted1 = await collection.find({}, { sort: { age: 1, name: -1 } }).toArray();
console.log(sorted1.map(d => d.name));
// Sort by vector distance (Jane, Dave, Jack)
const sorted2 = await collection.find({}, { vector: [1, 1, 1, 1, 1] }).toArray();
console.log(sorted2.map(d => d.name));
})();
- The sort() operation is optional in the different options; use it only if you need it.
- The order matters when chaining multiple sorts, so keep the sort clauses in the intended order.
Sort s1 = Sorts.ascending("field1");
Sort s2 = Sorts.descending("field2");
FindOptions.Builder.sort(s1, s2);
- When running a vector similarity (ANN) search:
FindOptions.Builder
.sort(new float[] {0.4f, 0.15f, -0.5f});
- When running a vector similarity search on a vector that the Data API generates from text (the collection must be associated with an embedding service):
FindOptions.Builder
.sort("Text to vectorize");
Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API. More specifically:
Keep in mind these provisions even when subsequently running a command such as |
When not specifying sorting criteria at all (by vector or otherwise), the cursor can scroll through an arbitrary number of documents as the Data API and the client periodically exchange new chunks of documents. The behavior of the cursor — in the case that documents have been added/removed after the |
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindOptions;
import com.datastax.astra.client.model.Sort;
import com.datastax.astra.client.model.Sorts;
import static com.datastax.astra.client.model.Filters.lt;
public class WorkingWithSorts {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Sort Clause for a vector
Sorts.vector(new float[] {0.25f, 0.25f, 0.25f, 0.25f, 0.25f});
// Sort Clause for other fields
Sort s1 = Sorts.ascending("field1");
Sort s2 = Sorts.descending("field2");
// Build the sort clause
new FindOptions().sort(s1, s2);
// Adding vector
new FindOptions().sort(new float[] {0.25f, 0.25f, 0.25f,0.25f, 0.25f}, s1, s2);
}
}
This Data API command finds and sorts documents according to their similarity to the specified vector, based on a similarity metric, and uses a projection clause to return only specific properties of those documents. The $similarity score (such as 0.9953563) shows how close each result is to the queried vector.
In this example response, only the _id, $vector, and $similarity properties are returned for each document. This keeps the output focused and can reduce the amount of data transferred.
curl -s --location \
--request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"find": {
"sort" : {"$vector" : [0.15, 0.1, 0.1, 0.35, 0.55]},
"projection" : {"$vector" : 1},
"options" : {
"includeSimilarity" : true,
"limit" : 100
}
}
}' | json_pp
Response:
{
"data" : {
"documents" : [
{
"$similarity" : 1,
"$vector" : [
0.15,
0.1,
0.1,
0.35,
0.55
],
"_id" : "3"
},
{
"$similarity" : 0.9953563,
"$vector" : [
0.15,
0.17,
0.15,
0.43,
0.55
],
"_id" : "18"
},
{
"$similarity" : 0.9732053,
"$vector" : [
0.21,
0.22,
0.33,
0.44,
0.53
],
"_id" : "21"
},
{
"$similarity" : 0.9732053,
"$vector" : [
0.21,
0.22,
0.33,
0.44,
0.53
],
"_id" : "7"
},
{
"$similarity" : 0.96955204,
"$vector" : [
0.25,
0.045,
0.38,
0.31,
0.68
],
"_id" : "10"
},
{
"$similarity" : 0.9691053,
"$vector" : [
0.25,
0.045,
0.38,
0.31,
0.67
],
"_id" : "5"
},
{
"$similarity" : 0.9600924,
"$vector" : [
0.44,
0.11,
0.33,
0.22,
0.88
],
"_id" : "11"
},
{
"$similarity" : 0.9600924,
"$vector" : [
0.44,
0.11,
0.33,
0.22,
0.88
],
"_id" : "20"
},
{
"$similarity" : 0.9600924,
"$vector" : [
0.44,
0.11,
0.33,
0.22,
0.88
],
"_id" : "16"
},
{
"$similarity" : 0.9468591,
"$vector" : [
0.33,
0.44,
0.55,
0.77,
0.66
],
"_id" : "12"
},
{
"$similarity" : 0.94535017,
"$vector" : [
0.3,
0.23,
0.15,
0.17,
0.4
],
"_id" : "8"
},
{
"$similarity" : 0.9163125,
"$vector" : [
0.25,
0.25,
0.25,
0.25,
0.27
],
"_id" : "19"
},
{
"$similarity" : 0.91263497,
"$vector" : [
0.25,
0.25,
0.25,
0.25,
0.26
],
"_id" : "4"
},
{
"$similarity" : 0.9087937,
"$vector" : [
0.25,
0.25,
0.25,
0.25,
0.25
],
"_id" : "1"
},
{
"$similarity" : 0.7909429,
"$vector" : [
0.1,
0.15,
0.3,
0.12,
0.09
],
"_id" : "17"
},
{
"$similarity" : 0.7820388,
"$vector" : [
0.1,
0.15,
0.3,
0.12,
0.08
],
"_id" : "15"
},
{
"$similarity" : 0.77284586,
"$vector" : [
0.1,
0.15,
0.3,
0.12,
0.07
],
"_id" : "13"
},
{
"$similarity" : 0.7711377,
"$vector" : [
0.11,
0.02,
0.78,
0.21,
0.27
],
"_id" : "14"
},
{
"$similarity" : 0.76337516,
"$vector" : [
0.1,
0.15,
0.3,
0.12,
0.06
],
"_id" : "9"
},
{
"$similarity" : 0.75363994,
"$vector" : [
0.1,
0.15,
0.3,
0.12,
0.05
],
"_id" : "2"
},
{
"$similarity" : 0.74406904,
"$vector" : [
0.11,
0.02,
0.78,
0.1,
0.27
],
"_id" : "6"
}
],
"nextPageState" : null
}
}
Parameters:
Name | Type | Summary |
---|---|---|
find | command | Executes a find (search) command. It contains nested JSON objects that define the search criteria, projection, and other options. |
sort | clause | Specifies the vector against which the other vectors in the vector-enabled Hyper-Converged Database (HCD) are compared. The |
projection | clause | Specifies which properties should be included in the returned documents. |
includeSimilarity | boolean | Setting this boolean to |
limit | number | Specifies the maximum number of documents to be returned. It’s set to |
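The `$similarity` values in the response are consistent with a cosine metric rescaled from [-1, 1] to [0, 1] as (1 + cos) / 2. That formula is an assumption inferred from this example, not a statement of the Data API specification. The sketch below applies it to three of the stored vectors and, with `limit` set to 2, reproduces the scores shown for `_id` 3 and 18. `similarity_score` is a hypothetical helper.

```python
import math


def similarity_score(a: list[float], b: list[float]) -> float:
    """Assumed cosine score: (1 + cos) / 2, so 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    cos = dot / (math.hypot(*a) * math.hypot(*b))
    return (1 + cos) / 2


query = [0.15, 0.1, 0.1, 0.35, 0.55]
stored = {
    "3": [0.15, 0.1, 0.1, 0.35, 0.55],
    "18": [0.15, 0.17, 0.15, 0.43, 0.55],
    "8": [0.3, 0.23, 0.15, 0.17, 0.4],
}

limit = 2
ranked = sorted(stored, key=lambda _id: similarity_score(stored[_id], query), reverse=True)
for _id in ranked[:limit]:
    print(_id, round(similarity_score(stored[_id], query), 7))
# 3 1.0
# 18 0.9953563
```

If your collection uses a different metric, expect different absolute values. The ranking logic stays the same.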
Example values for projection operations
Certain document operations, such as finding one or multiple documents, find-and-update, find-and-replace, and find-and-delete, allow the use of a projection option to control which part of the document(s) is returned. The projection can generally take one of two forms: it specifies either which fields to include or which fields to exclude.
If no projection, or an empty projection, is specified, the Data API applies a default projection. This default projection includes at least the identifier (_id) of the document and all its "regular" fields, which are those not starting with a dollar sign. However, future versions of the Data API might exclude other fields (such as $vector) from the documents by default.
When a projection is provided, specific, individually overridable inclusion and exclusion defaults apply to "special" fields, such as _id, $vector, and $vectorize.
By contrast, for regular fields the projection must list either the included fields or the excluded ones; it cannot mix the two types of specification.
To optimize the response size, it is recommended to always provide, when reading, an explicit projection tailored to the needs of the application. If an application relies on the presence of A quick, if possibly suboptimal, way to ensure the presence of fields is
to use the |
A projection is expressed as a mapping of field names to boolean values.
To return the document ID, field1, and field2:
{"_id": true, "field1": true, "field2": true}
Specific fields can be excluded, keeping any other field found in the document:
{"field1": false, "field2": false}
Fields specified in the projection but not encountered in the document are simply ignored for that document.
The projection cannot mix include and exclude clauses for regular fields. In other words, it must either have all true or all false values. If a projection has false values, all non-mentioned fields found in the document are included; conversely, if it has true values, all non-mentioned fields in the document are excluded.
Special fields (_id, $vector, and $vectorize) behave differently: they have their own defaults, and their presence can be controlled in any way within the projection.
For example, the _id field is included by default and can be excluded even in an include-clause projection ({"_id": false, "field1": true}); conversely, the $vector field is excluded by default and can be included even in an exclude-clause projection ({"field1": false, "$vector": true}).
So, the following are all valid projections:
{"_id": true, "field1": true, "field2": true}
{"_id": false, "field1": true, "field2": true}
{"_id": false, "field1": false, "field2": false}
{"_id": true, "field1": false, "field2": false}
{"_id": true, "field1": true, "field2": true, "$vector": true}
{"_id": true, "field1": true, "field2": true, "$vector": false}
{"_id": false, "field1": true, "field2": true, "$vector": true}
{"_id": false, "field1": true, "field2": true, "$vector": false}
{"_id": false, "field1": false, "field2": false, "$vector": true}
{"_id": false, "field1": false, "field2": false, "$vector": false}
{"_id": true, "field1": false, "field2": false, "$vector": true}
{"_id": true, "field1": false, "field2": false, "$vector": false}
However, the following projection is invalid and will result in an API error:
// Invalid:
{"field1": true, "field2": false}
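You can express the include/exclude rule as a short check. The `validate_projection` function below is a hypothetical client-side sketch. Neither the Python client nor the Data API exposes it. It treats `_id`, `$vector`, and `$vectorize` as the special fields named above, coerces values with `bool()`, and also enforces the star-projection rule stated on this page.

```python
SPECIAL_FIELDS = {"_id", "$vector", "$vectorize"}


def validate_projection(projection: dict) -> None:
    """Raise ValueError for projections that the rules above would reject.

    Special fields are skipped by the mixing check because they may be
    included or excluded regardless of what the regular fields do.
    """
    if "*" in projection and len(projection) > 1:
        raise ValueError('"*" must be the only key in the projection')
    modes = {bool(v) for k, v in projection.items() if k not in SPECIAL_FIELDS}
    if len(modes) > 1:
        raise ValueError(f"cannot mix inclusion and exclusion: {projection}")


validate_projection({"_id": False, "field1": True, "field2": True, "$vector": True})
validate_projection({"_id": True, "field1": False, "field2": False, "$vector": True})
try:
    validate_projection({"field1": True, "field2": False})
except ValueError as err:
    print(err)
```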
The special projection path "*" ("star-projection"), which must be the only key in the projection, represents the whole document. With the following projection, the entire document is returned:
{"*": true}
Conversely, with the following projection, every document is returned as {}:
{"*": false}
The values in a projection map can be objects, booleans, or numbers (decimal or integer), but the API treats them as booleans. The following two examples include and exclude the four fields, respectively:
{"field1": true, "field2": 1, "field3": 90.0, "field4": {"keep": "yes!"}}
{"field1": false, "field2": 0, "field3": 0.0, "field4": {}}
Passing a null-like value (such as {}, null, or 0) for the whole projection has the same effect as not passing it at all.
The projection cannot include the special $similarity key. This key is not part of the document; it is computed during vector ANN queries and controlled through the includeSimilarity parameter in the search payload.
For array fields, a $slice can be provided to specify which elements of the array to return. It can take one of the following forms:
// Return the first two elements
{"arr": {"$slice": 2}}
// Return the last two elements
{"arr": {"$slice": -2}}
// Skip 4 elements (from 0th index), return the next 2
{"arr": {"$slice": [4, 2]}}
// Skip backward 4 elements (from the end), return next 2 elements (forward)
{"arr": {"$slice": [-4, 2]}}
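Read as Python list slicing, the four forms map onto the sketch below. `apply_slice` is a hypothetical helper. How the Data API treats out-of-range values, such as a negative skip larger than the array, is an assumption here; this sketch clamps to the start of the array.

```python
from typing import Sequence, Union


def apply_slice(arr: list, spec: Union[int, Sequence[int]]) -> list:
    """Emulate the $slice forms: n, -n, [skip, limit], [-skip_from_end, limit]."""
    if isinstance(spec, int):
        # Positive: first n elements; negative: last n elements.
        return arr[:spec] if spec >= 0 else arr[spec:]
    start, count = spec
    if start < 0:
        # Count back from the end, then read forward (clamped at index 0).
        start = max(len(arr) + start, 0)
    return arr[start:start + count]


arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(apply_slice(arr, 2))        # [0, 1]
print(apply_slice(arr, -2))       # [8, 9]
print(apply_slice(arr, [4, 2]))   # [4, 5]
print(apply_slice(arr, [-4, 2]))  # [6, 7]
```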
The projection can also refer to nested fields: in that case, keys in a subdocument will be included/excluded as requested. If all keys of an existing subdocument are excluded, the document will be returned with the subdocument still present, but consisting of an empty object:
Given the following document:
{
"_id": "z",
"a": {
"a1": 10,
"a2": 20
}
}
Here the result of different projections can be seen:
Projection | Result |
---|---|
|
|
|
|
|
|
|
|
|
|
Referencing overlapping paths (a path and one of its subpaths) in the projection can produce conflicting clauses, so such projections are rejected. For instance, the following yields an API error:
// Invalid:
{"a.a1": true, "a": true}
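You can emulate the nested-path rules on the sample document `z`, including the empty-subdocument case and the rejection of overlapping paths. `project` is a hypothetical helper written for illustration. It assumes an already valid projection and covers only dotted include/exclude paths plus the `_id` default. It does not handle `$vector`, `"*"`, or `$slice`.

```python
import copy
from typing import Any


def _lookup(node: Any, keys: list[str]) -> tuple[bool, Any]:
    """Follow keys through nested dicts; report whether the path exists."""
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return False, None
        node = node[key]
    return True, node


def project(doc: dict, projection: dict) -> dict:
    """Apply an all-include or all-exclude projection with dotted paths."""
    paths = [p for p in projection if p != "_id"]
    for a in paths:
        for b in paths:
            if b.startswith(a + "."):
                raise ValueError(f"overlapping paths: {a!r} and {b!r}")
    keep_id = bool(projection.get("_id", True))
    if any(bool(projection[p]) for p in paths):  # inclusion mode
        result: dict = {"_id": doc["_id"]} if keep_id and "_id" in doc else {}
        for path in paths:
            keys = path.split(".")
            found, value = _lookup(doc, keys)
            if found:  # paths missing from the document are ignored
                target = result
                for key in keys[:-1]:
                    target = target.setdefault(key, {})
                target[keys[-1]] = copy.deepcopy(value)
        return result
    result = copy.deepcopy(doc)  # exclusion mode
    if not keep_id:
        result.pop("_id", None)
    for path in paths:
        keys = path.split(".")
        found, parent = _lookup(result, keys[:-1])
        if found and isinstance(parent, dict):
            parent.pop(keys[-1], None)
    return result


doc = {"_id": "z", "a": {"a1": 10, "a2": 20}}
print(project(doc, {"a.a1": True}))                  # {'_id': 'z', 'a': {'a1': 10}}
print(project(doc, {"a.a1": False}))                 # {'_id': 'z', 'a': {'a2': 20}}
print(project(doc, {"a.a1": False, "a.a2": False}))  # {'_id': 'z', 'a': {}}
```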
-
Python
-
TypeScript
-
Java
For the Python client, the projection argument can be not only a Dict[str, Any], in compliance with the general provisions above, but also a list (or other iterable) of field names. In this case, all the listed fields are implied to be included in the projection. So, the following two statements are equivalent:
document = collection.find_one(
{"_id": 101},
projection={"name": True, "city": True},
)
document = collection.find_one(
{"_id": 101},
projection=["name", "city"],
)
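The equivalence amounts to turning each listed name into an inclusion entry. The `normalize_projection` function below is a hypothetical sketch of that idea, not astrapy's internal code. Rejecting a bare string is a choice made here to avoid accidentally iterating over its characters.

```python
from collections.abc import Iterable, Mapping
from typing import Any, Optional, Union


def normalize_projection(
    projection: Union[Mapping[str, Any], Iterable[str], None],
) -> Optional[dict]:
    """Turn an iterable of field names into the equivalent inclusion mapping."""
    if projection is None:
        return None
    if isinstance(projection, Mapping):
        return dict(projection)
    if isinstance(projection, str):
        # A bare string would otherwise be iterated one character at a time.
        raise TypeError("pass a list of field names, not a single string")
    return {field: True for field in projection}


print(normalize_projection(["name", "city"]))  # {'name': True, 'city': True}
```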
The TypeScript client simply takes an untyped Plain Old JavaScript Object (POJO) for the projection parameter.
However, it also offers a StrictProjection<Schema> type that provides full autocomplete and type checking for your document schema.
import { StrictProjection } from '@datastax/astra-db-ts';
// Untyped projection: a plain object, no compile-time checks
const doc1 = await collection.findOne({}, {
  projection: {
    'name': true,
    'address.city': true,
  },
});
interface MySchema {
  name: string,
  address: {
    city: string,
    state: string,
  },
}
// Typed projection: checked against MySchema at compile time
const doc2 = await collection.findOne({}, {
  projection: {
    'name': 1,
    'address.city': 1,
    // @ts-expect-error - 'address.car' does not exist in type StrictProjection<MySchema>
    'address.car': 0,
    // @ts-expect-error - Type { $slice: number } is not assignable to type boolean | 0 | 1 | undefined
    'address.state': { $slice: 3 },
  } satisfies StrictProjection<MySchema>,
});
To support the projection mechanism, the different Options classes provide a projection method in their helpers. This method takes an array of Projection objects, each providing a field name and a boolean flag that chooses between inclusion and exclusion.
Projection p1 = new Projection("field1", true);
Projection p2 = new Projection("field2", true);
FindOptions options1 = FindOptions.Builder.projection(p1, p2);
You can simplify this syntax with the Projections helper class:
FindOptions options2 = FindOptions.Builder
.projection(Projections.include("field1", "field2"));
FindOptions options3 = FindOptions.Builder
.projection(Projections.exclude("field1", "field2"));
For $slice support on array fields, the Projections class provides a method as well:
// {"arr": {"$slice": 2}}
Projection sliceOnlyStart = Projections.slice("arr", 2, null);
// {"arr": {"$slice": [-4, 2]}}
Projection sliceOnlyRange = Projections.slice("arr", -4, 2);
// You can then use them freely in the different builders
FindOptions options4 = FindOptions.Builder
.projection(sliceOnlyStart);
Find and update a document
Locate a document matching a filter condition and apply changes to it, returning the document itself.
-
Python
-
TypeScript
-
Java
-
cURL
View this topic in more detail on the API Reference.
collection.find_one_and_update(
{"Marco": {"$exists": True}},
{"$set": {"title": "Mr."}},
)
Locate and update a document, returning the document itself, creating a new one if nothing is found.
collection.find_one_and_update(
{"Marco": {"$exists": True}},
{"$set": {"title": "Mr."}},
upsert=True,
)
Returns:
Dict[str, Any] - The document that was found, either before or after the update (or a projection thereof, as requested). If no matches are found, None is returned.
Example response
{'_id': 999, 'Marco': 'Polo'}
Parameters:
Name | Type | Summary |
---|---|---|
filter |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
update |
|
The update prescription to apply to the document, expressed as a dictionary as per Data API syntax. Examples are: |
projection |
|
Used to select a subset of fields in the document being returned. The projection can be: an iterable over the included field names; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to exclude specific fields from the response. Special document fields (e.g. |
vector |
|
A suitable vector (a list of float numbers of the appropriate dimensionality) to use vector search, that is, Approximate Nearest Neighbors (ANN) search, as the sorting criterion. This way, the matched document (if any) is the one most similar to the provided vector. This parameter cannot be used together with |
vectorize |
|
A string to be vectorized and used as the sorting criterion in a vector search.
This parameter cannot be used together with |
sort |
|
With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the updated one. See the |
upsert |
|
This parameter controls the behavior in absence of matches. If True, a new document (resulting from applying the |
return_document |
|
A flag controlling what document is returned: if set to |
max_time_ms |
|
A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default. |
Example:
from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
collection.insert_one({"Marco": "Polo"})
collection.find_one_and_update(
{"Marco": {"$exists": True}},
{"$set": {"title": "Mr."}},
)
# prints: {'_id': 'a80106f2-...', 'Marco': 'Polo'}
collection.find_one_and_update(
{"title": "Mr."},
{"$inc": {"rank": 3}},
projection={"title": True, "rank": True},
return_document=astrapy.constants.ReturnDocument.AFTER,
)
# prints: {'_id': 'a80106f2-...', 'title': 'Mr.', 'rank': 3}
collection.find_one_and_update(
{"name": "Johnny"},
{"$set": {"rank": 0}},
return_document=astrapy.constants.ReturnDocument.AFTER,
)
# (returns None for no matches)
collection.find_one_and_update(
{"name": "Johnny"},
{"$set": {"rank": 0}},
upsert=True,
return_document=astrapy.constants.ReturnDocument.AFTER,
)
# prints: {'_id': 'cb4ef2ab-...', 'name': 'Johnny', 'rank': 0}
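You can mimic how `return_document` and `upsert` interact using an in-memory list. In this hypothetical sketch, `find_one_and_update` is a plain function, not astrapy's method. It supports only equality filters and a dictionary of fields to set. Returning `None` for an upsert with `"before"` is an assumption that follows from there being no pre-update document.

```python
import copy
import uuid
from typing import Optional

BEFORE, AFTER = "before", "after"


def find_one_and_update(
    docs: list[dict],
    flt: dict,
    set_fields: dict,
    *,
    upsert: bool = False,
    return_document: str = BEFORE,
) -> Optional[dict]:
    """In-memory sketch: equality filter plus $set-style field assignment."""
    for doc in docs:
        if all(doc.get(k) == v for k, v in flt.items()):
            before = copy.deepcopy(doc)
            doc.update(set_fields)
            return before if return_document == BEFORE else copy.deepcopy(doc)
    if not upsert:
        return None
    # Upsert: build the new document from the filter's equality fields plus
    # the update; there is no "before" version of it to return.
    new_doc = {"_id": str(uuid.uuid4()), **flt, **set_fields}
    docs.append(new_doc)
    return None if return_document == BEFORE else copy.deepcopy(new_doc)


store = [{"_id": 999, "Marco": "Polo"}]
print(find_one_and_update(store, {"Marco": "Polo"}, {"title": "Mr."}))
# {'_id': 999, 'Marco': 'Polo'}
print(find_one_and_update(store, {"name": "Johnny"}, {"rank": 0}, return_document=AFTER))
# None
```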
View this topic in more detail on the API Reference.
const docBefore = await collection.findOneAndUpdate(
{ $and: [{ name: 'Jesse' }, { gender: 'M' }] },
{ $set: { title: 'Mr.' } },
{ returnDocument: 'before' },
);
Locate and update a document, returning the document itself, creating a new one if nothing is found.
const docBefore = await collection.findOneAndUpdate(
{ $and: [{ name: 'Jesse' }, { gender: 'M' }] },
{ $set: { title: 'Mr.' } },
{ upsert: true, returnDocument: 'before' },
);
Parameters:
Name | Type | Summary |
---|---|---|
filter |
A filter to select the document to update. |
|
update |
The update to apply to the selected document. |
|
options |
The options for this operation. |
Options (FindOneAndUpdateOptions):
Name | Type | Summary |
---|---|---|
|
Specifies whether to return the original or updated document. |
|
|
If true, creates a new document if no document matches the filter. |
|
Specifies which fields should be included/excluded in the returned documents. Defaults to including all fields. When specifying a projection, it’s the user’s responsibility to handle the return type carefully. Consider type-casting. Can only be used when performing a vector search. |
||
Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk. |
||
|
An optional vector to use to perform a vector search on the collection to find the closest matching document. Equivalent to setting the If you really need to use both, you can set the |
|
|
A string to be vectorized and used as the sorting criterion in a vector search. Equivalent to setting the If you really need to use both, you can set the |
|
|
The maximum time in milliseconds that the client should wait for the operation to complete for each single one of the underlying HTTP requests. |
|
|
When true, returns an ok field with a value of 1 alongside the document if the command executed successfully. |
Returns:
Promise<WithId<Schema> | null> - The document before or after the update, depending on the value of returnDocument, or null if no matches are found.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert a document
await collection.insertOne({ 'Marco': 'Polo' });
// Prints 'Mr.'
const updated1 = await collection.findOneAndUpdate(
{ 'Marco': 'Polo' },
{ $set: { title: 'Mr.' } },
{ returnDocument: 'after' },
);
console.log(updated1?.title);
// Prints { _id: ..., title: 'Mr.', rank: 3 }
const updated2 = await collection.findOneAndUpdate(
{ title: 'Mr.' },
{ $inc: { rank: 3 } },
{ projection: { title: 1, rank: 1 }, returnDocument: 'after' },
);
console.log(updated2);
// Prints null
const updated3 = await collection.findOneAndUpdate(
{ name: 'Johnny' },
{ $set: { rank: 0 } },
{ returnDocument: 'after' },
);
console.log(updated3);
// Prints { _id: ..., name: 'Johnny', rank: 0 }
const updated4 = await collection.findOneAndUpdate(
{ name: 'Johnny' },
{ $set: { rank: 0 } },
{ upsert: true, returnDocument: 'after' },
);
console.log(updated4);
})();
- Operations on documents are performed at the Collection level. For details on each signature, see the Collection Javadoc.
- Collection is a generic class whose default type is Document, but you can specify your own type; the objects are then serialized by Jackson.
- Most methods come in synchronous and asynchronous flavors. The asynchronous version is suffixed by Async and returns a CompletableFuture.
// Synchronous
Optional<T> findOneAndUpdate(Filter filter, Update update);
// Asynchronous
CompletableFuture<Optional<T>> findOneAndUpdateAsync(Filter filter, Update update);
Returns:
Optional<T> - The document matching the filter, or Optional.empty() if no document is found.
Parameters:
Name | Type | Summary |
---|---|---|
|
Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression. |
|
|
Set the different options for the |
What you need to know: To build the different parts of the requests a set of helper classes are provided suffixed by a Update is no different and you can leverage the class
|
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.Update;
import com.datastax.astra.client.model.Updates;
import java.util.Optional;
import static com.datastax.astra.client.model.Filters.lt;
public class FindOneAndUpdate {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Building a filter
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
// Building the update
Update update = Updates.set("field1", "value1")
.inc("field2", 1d)
.unset("field3");
Optional<Document> doc = collection.findOneAndUpdate(filter, update);
}
}
The following Data API findOneAndUpdate command uses a sort clause on $vector to select the most similar document, and the $set operator to set that document's status to active.
curl -s --location \
--request POST ${DB_DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"findOneAndUpdate": {
"sort": {
"$vector": [
0.25,
0.045,
0.38,
0.31,
0.67
]
},
"update": {
"$set": {
"status": "active"
}
},
"options": {
"returnDocument": "after"
}
}
}' | json_pp
Response:
In this case, notice that the response returns a |
{
"data": {
"document": {
"_id": "5",
"purchase_type": "Online",
"$vector": [
0.25,
0.045,
0.38,
0.31,
0.67
],
"customer": {
"name": "David C.",
"phone": "123-456-5555",
"age": 50,
"credit_score": 800,
"address": {
"address_line": "32345 Main Ave",
"city": "Jersey City",
"state": "NJ"
}
},
"purchase_date": {
"$date": 1690996291
},
"seller": {
"name": "Jim A.",
"location": "Jersey City NJ"
},
"items": [
{
"car": "Tesla Model S",
"color": "Red"
},
"Extended warranty - 5 years"
],
"amount": 94990,
"status": "active"
}
},
"status": {
"matchedCount": 1,
"modifiedCount": 0
}
}
Parameters:
Name | Type | Summary |
---|---|---|
findOneAndUpdate | command | Finds one document based on certain criteria and determines whether it should be updated. |
sort | clause | Contains an object specifying the sort criteria for selecting the document. |
$vector | array | Indicates a vector-based sort operation, where the documents are sorted based on the provided vector values. In this example, |
$vectorize | string | A string to be vectorized and used as the sorting criterion in a vector search. |
update | clause | Contains the changes to be applied to the selected document. |
$set | update operator | Used to set the value of a field. Here, it is used to set the |
options | clause | Provides additional settings for the |
returnDocument | clause | In this example, the |
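The response reports `matchedCount: 1` but `modifiedCount: 0`. This is consistent with document `5` already having `status` set to `active` in the earlier `find` results, so the `$set` left it unchanged. The sketch below shows that counting rule under this assumption. `apply_set` is a hypothetical helper that handles only one already-matched document.

```python
_MISSING = object()  # sentinel: distinguishes "absent" from a stored None


def apply_set(doc: dict, set_fields: dict) -> dict:
    """Apply $set to one matched document and report the resulting counts."""
    changed = any(doc.get(k, _MISSING) != v for k, v in set_fields.items())
    doc.update(set_fields)
    return {"matchedCount": 1, "modifiedCount": 1 if changed else 0}


doc5 = {"_id": "5", "status": "active"}
print(apply_set(doc5, {"status": "active"}))    # {'matchedCount': 1, 'modifiedCount': 0}
print(apply_set(doc5, {"status": "inactive"}))  # {'matchedCount': 1, 'modifiedCount': 1}
```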
Update a document
Update a single document on the collection as requested.
-
Python
-
TypeScript
-
Java
-
cURL
View this topic in more detail on the API Reference.
update_result = collection.update_one(
{"_id": 456},
{"$set": {"name": "John Smith"}},
)
Update a single document on the collection, inserting a new one if no match is found.
update_result = collection.update_one(
{"_id": 456},
{"$set": {"name": "John Smith"}},
upsert=True,
)
Returns:
UpdateResult - An object representing the response from the database after the update operation. It includes information about the operation.
Example response
UpdateResult(raw_results=[{'data': {'document': {'_id': '1', 'name': 'John Doe'}}, 'status': {'matchedCount': 1, 'modifiedCount': 1}}], update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1})
Parameters:
Name | Type | Summary |
---|---|---|
filter |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
update |
|
The update prescription to apply to the document, expressed as a dictionary as per Data API syntax. Examples are: |
vector |
|
A suitable vector (a list of float numbers of the appropriate dimensionality) to use vector search, that is, Approximate Nearest Neighbors (ANN) search, as the sorting criterion. This way, the matched document (if any) is the one most similar to the provided vector. This parameter cannot be used together with |
vectorize |
|
A string to be vectorized and used as the sorting criterion in a vector search.
This parameter cannot be used together with |
sort |
|
With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the updated one. See the |
upsert |
|
This parameter controls the behavior in absence of matches. If True, a new document (resulting from applying the |
max_time_ms |
|
A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default. |
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
collection.insert_one({"Marco": "Polo"})
collection.update_one({"Marco": {"$exists": True}}, {"$inc": {"rank": 3}})
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1})
collection.update_one({"Mirko": {"$exists": True}}, {"$inc": {"rank": 3}})
# prints: UpdateResult(raw_results=..., update_info={'n': 0, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0})
collection.update_one(
{"Mirko": {"$exists": True}},
{"$inc": {"rank": 3}},
upsert=True,
)
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '2a45ff60-...'})
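To see where the update_info values in the example above come from, here is a minimal in-memory sketch of the update_one behavior described in this section. It is not astrapy code: matches, apply_update, and update_one_in_memory are hypothetical helpers that model only equality and $exists filters plus the $set and $inc operators. The upsert branch assumes that the new document is seeded from the plain-equality fields of the filter.

```python
import uuid
from typing import Any, Dict, List


def matches(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Tiny subset of the filter syntax: plain equality and $exists."""
    for key, cond in flt.items():
        if isinstance(cond, dict) and "$exists" in cond:
            if (key in doc) != cond["$exists"]:
                return False
        elif key not in doc or doc[key] != cond:
            return False
    return True


def apply_update(doc: Dict[str, Any], update: Dict[str, Any]) -> bool:
    """Apply $set and $inc in place; return True if the document actually changed."""
    before = dict(doc)
    for field, value in update.get("$set", {}).items():
        doc[field] = value
    for field, delta in update.get("$inc", {}).items():
        doc[field] = doc.get(field, 0) + delta
    return doc != before


def update_one_in_memory(docs: List[Dict[str, Any]], flt: Dict[str, Any],
                         update: Dict[str, Any], upsert: bool = False) -> Dict[str, Any]:
    """Return a dict shaped like the update_info of the example above."""
    for doc in docs:
        if matches(doc, flt):
            # Only the first matching document is touched
            modified = apply_update(doc, update)
            return {"n": 1, "updatedExisting": True, "nModified": int(modified)}
    if upsert:
        # Assumed: the new document starts from the filter's plain-equality fields
        new_doc = {k: v for k, v in flt.items() if not isinstance(v, dict)}
        new_doc.setdefault("_id", str(uuid.uuid4()))
        apply_update(new_doc, update)
        docs.append(new_doc)
        return {"n": 1, "updatedExisting": False, "nModified": 0,
                "upserted": new_doc["_id"]}
    return {"n": 0, "updatedExisting": False, "nModified": 0}


store = [{"_id": "1", "Marco": "Polo"}]
print(update_one_in_memory(store, {"Marco": {"$exists": True}}, {"$inc": {"rank": 3}}))
# {'n': 1, 'updatedExisting': True, 'nModified': 1}
```

In the sketch, nModified counts only documents whose content actually changed, so a match that sets a field to its current value reports n as 1 and nModified as 0.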
View this topic in more detail on the API Reference.
const result = await collection.updateOne(
{ $and: [{ name: 'Jesse' }, { gender: 'M' }] },
{ $set: { title: 'Mr.' } },
);
Update a single document on the collection, inserting a new one if no match is found.
const result = await collection.updateOne(
{ $and: [{ name: 'Jesse' }, { gender: 'M' }] },
{ $set: { title: 'Mr.' } },
{ upsert: true },
);
Parameters:
Name | Type | Summary |
---|---|---|
filter |
A filter to select the document to update. |
|
update |
The update to apply to the selected document. |
|
options? |
The options for this operation. |
Options (UpdateOneOptions
):
Name | Type | Summary |
---|---|---|
|
If true, creates a new document if no document matches the filter. |
|
Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk. |
||
|
An optional vector to use to perform a vector search on the collection to find the closest matching document. Equivalent to setting the If you really need to use both, you can set the |
|
|
A string to be vectorized and used as the sorting criterion in a vector search. Equivalent to setting the If you really need to use both, you can set the |
|
|
The maximum time in milliseconds that the client should wait for the operation to complete for each single one of the underlying HTTP requests. |
Returns:
Promise<UpdateOneResult<Schema>>
- The result of the
update operation.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert a document
await collection.insertOne({ 'Marco': 'Polo' });
// Prints 1
const updated1 = await collection.updateOne(
{ 'Marco': 'Polo' },
{ $set: { title: 'Mr.' } },
);
console.log(updated1?.modifiedCount);
// Prints 0 0
const updated2 = await collection.updateOne(
{ name: 'Johnny' },
{ $set: { rank: 0 } },
);
console.log(updated2.matchedCount, updated2?.upsertedCount);
// Prints 0 1
const updated3 = await collection.updateOne(
{ name: 'Johnny' },
{ $set: { rank: 0 } },
{ upsert: true },
);
console.log(updated3.matchedCount, updated3?.upsertedCount);
})();
- Operations on documents are performed at the Collection level. For details on each signature, see the Collection Javadoc.
- Collection is a generic class whose default type is Document, but you can specify your own type, and the object is then serialized by Jackson.
- Most methods come in synchronous and asynchronous flavors. The asynchronous version is suffixed by Async and returns a CompletableFuture.
// Synchronous
UpdateResult updateOne(Filter filter, Update update);
// Asynchronous
CompletableFuture<UpdateResult> updateOneAsync(Filter filter, Update update);
Returns:
UpdateResults<T>
- Result of the operation with the number of documents matched (matchedCount
) and updated (modifiedCount
)
Parameters:
Name | Type | Summary |
---|---|---|
|
Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression. |
|
|
Set the different options for the |
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.Update;
import com.datastax.astra.client.model.UpdateResult;
import com.datastax.astra.client.model.Updates;
import static com.datastax.astra.client.model.Filters.lt;
public class UpdateOne {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Building a filter
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
// Building the update
Update update = Updates.set("field1", "value1")
.inc("field2", 1d)
.unset("field3");
// Update the first document matching the filter
UpdateResult result = collection.updateOne(filter, update);
}
}
The following Data API updateOne command uses the $set update operator to set the value of a property (customer.name, which uses dot notation) to a new value.
curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"updateOne": {
"filter": {
"_id": "upsert-id"
},
"update" : {"$set" : { "customer.name" : "CUSTOMER 22"}}
}
}' | json_pp
Response:
{
"status" : {
"matchedCount" : 1,
"modifiedCount" : 1
}
}
Parameters:
Name | Type | Summary |
---|---|---|
updateOne | command | Updates a single document that matches the given criteria within a database collection. |
filter | clause | Used to select the document to be updated. |
_id | key | This key within the |
update | object | Specifies what updates are applied to the document that meets the filter criteria. It’s an object that contains database update operators and the modifications they perform. |
$set | Update operator | Sets the value of a property in a document. In this example, |
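The $set in the cURL example targets customer.name, which dot notation reads as the name key inside the customer subdocument. The following sketch shows that path interpretation on a plain Python dictionary. set_by_path is a hypothetical helper, not part of any client, and it ignores array indexes and type conflicts along the path; the sample document is illustrative.

```python
from typing import Any, Dict


def set_by_path(doc: Dict[str, Any], path: str, value: Any) -> None:
    """Set the value at a dot-notation path, creating missing subdocuments."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        # Descend one level, creating an empty subdocument if the key is absent
        node = node.setdefault(key, {})
    node[leaf] = value


doc = {"_id": "upsert-id", "customer": {"name": "old name", "city": "Turin"}}
set_by_path(doc, "customer.name", "CUSTOMER 22")
print(doc)
# {'_id': 'upsert-id', 'customer': {'name': 'CUSTOMER 22', 'city': 'Turin'}}
```

Only the addressed key changes and sibling fields such as city survive. Setting {"customer": {"name": "CUSTOMER 22"}} instead would replace the whole subdocument.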
Update multiple documents
Update multiple documents in a collection.
-
Python
-
TypeScript
-
Java
-
cURL
View this topic in more detail on the API Reference.
results = collection.update_many(
{"name": {"$exists": False}},
{"$set": {"name": "unknown"}},
)
Update multiple documents in a collection, inserting a new one if no matches are found.
results = collection.update_many(
{"name": {"$exists": False}},
{"$set": {"name": "unknown"}},
upsert=True,
)
Returns:
UpdateResult
- An object representing the response from the database after the update operation. It includes information about the operation.
Example response
UpdateResult(raw_results=[{'status': {'matchedCount': 2, 'modifiedCount': 2}}], update_info={'n': 2, 'updatedExisting': True, 'ok': 1.0, 'nModified': 2})
Parameters:
Name | Type | Summary |
---|---|---|
filter |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
update |
|
The update prescription to apply to the document, expressed as a dictionary as per Data API syntax. Examples are: |
upsert |
|
This parameter controls the behavior in absence of matches. If True, a single new document (resulting from applying |
max_time_ms |
|
A timeout, in milliseconds, for the operation. This method uses the collection-level timeout by default. You may need to increase the timeout duration when updating a large number of documents, as the update will require multiple HTTP requests in sequence. |
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
collection.insert_many([{"c": "red"}, {"c": "green"}, {"c": "blue"}])
collection.update_many({"c": {"$ne": "green"}}, {"$set": {"nongreen": True}})
# prints: UpdateResult(raw_results=..., update_info={'n': 2, 'updatedExisting': True, 'ok': 1.0, 'nModified': 2})
collection.update_many({"c": "orange"}, {"$set": {"is_also_fruit": True}})
# prints: UpdateResult(raw_results=..., update_info={'n': 0, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0})
collection.update_many(
{"c": "orange"},
{"$set": {"is_also_fruit": True}},
upsert=True,
)
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '46643050-...'})
View this topic in more detail on the API Reference.
const result = await collection.updateMany(
{ name: { $exists: false } },
{ $set: { title: 'unknown' } },
);
Update multiple documents in a collection, inserting a new one if no matches are found.
const result = await collection.updateMany(
{ name: { $exists: false } },
{ $set: { title: 'unknown' } },
{ upsert: true },
);
Parameters:
Name | Type | Summary |
---|---|---|
filter |
A filter to select the documents to update. |
|
update |
The update to apply to the selected documents. |
|
options? |
The options for this operation. |
Options (UpdateManyOptions
):
Name | Type | Summary |
---|---|---|
|
If true, creates a new document if no document matches the filter. |
|
|
The maximum time in milliseconds that the client should wait for the operation to complete for each single one of the underlying HTTP requests. |
Returns:
Promise<UpdateManyResult<Schema>>
- The result of the
update operation.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some documents
await collection.insertMany([{ c: 'red' }, { c: 'green' }, { c: 'blue' }]);
// { modifiedCount: 2, matchedCount: 2, upsertedCount: 0 }
await collection.updateMany({ c: { $ne: 'green' } }, { $set: { nongreen: true } });
// { modifiedCount: 0, matchedCount: 0, upsertedCount: 0 }
await collection.updateMany({ c: 'orange' }, { $set: { is_also_fruit: true } });
// { modifiedCount: 0, matchedCount: 0, upsertedCount: 1, upsertedId: '...' }
await collection.updateMany({ c: 'orange' }, { $set: { is_also_fruit: true } }, { upsert: true });
})();
- Operations on documents are performed at the Collection level. For details on each signature, see the Collection Javadoc.
- Collection is a generic class whose default type is Document, but you can specify your own type, and the object is then serialized by Jackson.
- Most methods come in synchronous and asynchronous flavors. The asynchronous version is suffixed by Async and returns a CompletableFuture.
// Synchronous
UpdateResult updateMany(Filter filter, Update update);
UpdateResult updateMany(Filter filter, Update update, UpdateManyOptions options);
// Asynchronous
CompletableFuture<UpdateResult> updateManyAsync(Filter filter, Update update);
CompletableFuture<UpdateResult> updateManyAsync(Filter filter, Update update, UpdateManyOptions options);
Returns:
UpdateResults<T>
- Result of the operation with the number of documents matched (matchedCount
) and updated (modifiedCount
)
Parameters:
Name | Type | Summary |
---|---|---|
|
Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression. |
|
|
Set the different options for the |
|
|
Contains the options for the updateMany operation, where you can set the |
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.Update;
import com.datastax.astra.client.model.UpdateManyOptions;
import com.datastax.astra.client.model.UpdateResult;
import com.datastax.astra.client.model.Updates;
import static com.datastax.astra.client.model.Filters.lt;
public class UpdateMany {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Building a filter
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
Update update = Updates.set("field1", "value1")
.inc("field2", 1d)
.unset("field3");
UpdateManyOptions options =
new UpdateManyOptions().upsert(true);
UpdateResult result = collection.updateMany(filter, update, options);
}
}
Use the Data API updateMany command to update multiple documents in a collection.
In this example, the JSON payload uses the $set update operator to change the status to "inactive" for documents that have an "active" status.
The updateMany command supports pagination for cases where more documents matching the filter remain on a subsequent page. For more, see the pagination note after the cURL example.
The JSON structure is sent via an HTTP POST request to a server within an authenticated, vector-enabled HCD database.
In the environment variables used here, the keyspace name is default_keyspace and the collection name is vector_collection.
Example:
curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"updateMany": {
"filter": {"status" : "active" },
"update" : {"$set" : { "status" : "inactive"}}
}
}' | json_pp
Result:
{
"status" : {
"matchedCount" : 20,
"modifiedCount" : 20,
"moreData" : true
}
}
Name | Type | Summary |
---|---|---|
updateMany | command | Updates multiple documents in the database’s collection. |
filter | object | Defines the criteria for selecting documents to which the command applies. The filter looks for documents where: * |
update | object | Specifies the modifications to be applied to all documents that match the criteria set by the filter. |
$set | operator | An update operator indicating that the operation should overwrite the value of a property (or properties) in the selected documents. |
status | String | Specifies the property in the document to update. In this example, the status of all selected documents is set to "inactive". In this context, it’s changing the |
In the
|
During the pagination process, you then send a sequence of one or more updateMany commands until the update has been applied to all pages of documents matching the filter.
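The pagination note above amounts to a loop that re-issues updateMany until the status no longer reports moreData. A minimal sketch, assuming a hypothetical send_command callable that POSTs one JSON command and returns the parsed response, and assuming that a continuation token, when present, is returned as nextPageState and passed back as options.pageState. Without such a token the loop re-sends the same command, which only finishes when the update removes documents from the filter's match set, as the "active" to "inactive" example does.

```python
from typing import Any, Callable, Dict, Optional


def update_many_all_pages(
    send_command: Callable[[Dict[str, Any]], Dict[str, Any]],
    flt: Dict[str, Any],
    update: Dict[str, Any],
) -> Dict[str, int]:
    """Re-issue updateMany until moreData is absent, summing the per-page counts."""
    totals = {"matchedCount": 0, "modifiedCount": 0}
    page_state: Optional[str] = None
    while True:
        body: Dict[str, Any] = {"filter": flt, "update": update}
        if page_state is not None:
            # Assumed continuation mechanism: hand back the token from the previous page
            body["options"] = {"pageState": page_state}
        status = send_command({"updateMany": body})["status"]
        totals["matchedCount"] += status.get("matchedCount", 0)
        totals["modifiedCount"] += status.get("modifiedCount", 0)
        if not status.get("moreData"):
            return totals
        page_state = status.get("nextPageState")


# A fake sender standing in for the HTTP call, replaying two pages of responses
pages = iter([
    {"status": {"matchedCount": 20, "modifiedCount": 20, "moreData": True, "nextPageState": "p1"}},
    {"status": {"matchedCount": 5, "modifiedCount": 5}},
])
print(update_many_all_pages(lambda cmd: next(pages),
                            {"status": "active"}, {"$set": {"status": "inactive"}}))
# {'matchedCount': 25, 'modifiedCount': 25}
```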
Find distinct values across documents
Get a list of the distinct values of a certain key in a collection.
-
Python
-
TypeScript
-
Java
View this topic in more detail on the API Reference.
collection.distinct("category")
Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.
collection.distinct(
"food.allergies",
filter={"registered_for_dinner": True},
)
Returns:
List[Any]
- A list of the distinct values encountered. Documents that lack the requested key are ignored.
Example response
['home_appliance', None, 'sports_equipment', {'cat_id': 54, 'cat_name': 'gardening_gear'}]
Parameters:
Name | Type | Summary |
---|---|---|
key |
|
The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels. Example of acceptable |
filter |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
max_time_ms |
|
A timeout, in milliseconds, for the operation. This method uses the collection-level timeout by default. |
Keep in mind that |
For details on the behavior of "distinct" in conjunction with real-time changes in the collection contents, see the discussion in the Sort examples values section.
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
collection.insert_many(
[
{"name": "Marco", "food": ["apple", "orange"], "city": "Helsinki"},
{"name": "Emma", "food": {"likes_fruit": True, "allergies": []}},
]
)
collection.distinct("name")
# prints: ['Marco', 'Emma']
collection.distinct("city")
# prints: ['Helsinki']
collection.distinct("food")
# prints: ['apple', 'orange', {'likes_fruit': True, 'allergies': []}]
collection.distinct("food.1")
# prints: ['orange']
collection.distinct("food.allergies")
# prints: []
collection.distinct("food.likes_fruit")
# prints: [True]
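The key-resolution rules that the example above illustrates can be reproduced locally. The sketch below is a hypothetical re-implementation (extract_values and distinct_in_memory are not astrapy names) that yields the same outputs: a list found at the end of the path is unrolled, a numeric path segment indexes into a list, a non-numeric segment applied to a list visits every item, documents without the key are skipped, and duplicates are dropped in first-seen order.

```python
import json
from typing import Any, Dict, Iterator, List


def extract_values(node: Any, segments: List[str]) -> Iterator[Any]:
    """Yield every value reachable at the dotted path described by segments."""
    if not segments:
        if isinstance(node, list):
            yield from node  # a list at the end of the path is unrolled
        else:
            yield node
        return
    head, rest = segments[0], segments[1:]
    if isinstance(node, dict):
        if head in node:
            yield from extract_values(node[head], rest)
    elif isinstance(node, list):
        if head.isdigit():
            index = int(head)
            if index < len(node):
                yield from extract_values(node[index], rest)
        else:
            for item in node:  # no index given: visit every list item
                yield from extract_values(item, segments)
    # Scalars have no sub-keys, so nothing is yielded for them


def distinct_in_memory(docs: List[Dict[str, Any]], key: str) -> List[Any]:
    seen, result = set(), []
    for doc in docs:
        for value in extract_values(doc, key.split(".")):
            # JSON text serves as a hashable identity, also for dicts and lists
            marker = json.dumps(value, sort_keys=True, default=str)
            if marker not in seen:
                seen.add(marker)
                result.append(value)
    return result


people = [
    {"name": "Marco", "food": ["apple", "orange"], "city": "Helsinki"},
    {"name": "Emma", "food": {"likes_fruit": True, "allergies": []}},
]
print(distinct_in_memory(people, "food.1"))  # ['orange']
```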
View this topic in more detail on the API Reference.
const unique = await collection.distinct('category');
Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.
const unique = await collection.distinct(
'food.allergies',
{ registeredForDinner: true },
);
Parameters:
Name | Type | Summary |
---|---|---|
key |
|
The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels. Example of acceptable key values: |
filter? |
A filter to select the documents to use. If not provided, all documents will be used. |
Returns:
Promise<Flatten<(SomeDoc & ToDotNotation<FoundDoc<Schema>>)[Key]>[]>
- A promise which resolves to the
unique distinct values.
The return type is mostly accurate, but with complex keys, it may be required to manually cast the return type to the expected type. |
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some documents
await collection.insertOne({ name: 'Marco', food: ['apple', 'orange'], city: 'Helsinki' });
await collection.insertOne({ name: 'Emma', food: { likes_fruit: true, allergies: [] } });
// ['Marco', 'Emma']
console.log(await collection.distinct('name'));
// ['Helsinki']
console.log(await collection.distinct('city'));
// ['apple', 'orange', { likes_fruit: true, allergies: [] }]
console.log(await collection.distinct('food'));
// ['orange']
console.log(await collection.distinct('food.1'));
// []
console.log(await collection.distinct('food.allergies'));
// [true]
console.log(await collection.distinct('food.likes_fruit'));
})();
Gets the distinct values of the specified field name.
// Synchronous
DistinctIterable<T,F> distinct(String fieldName, Filter filter, Class<F> resultClass);
DistinctIterable<T,F> distinct(String fieldName, Class<F> resultClass);
// Asynchronous
CompletableFuture<DistinctIterable<T,F>> distinctAsync(String fieldName, Filter filter, Class<F> resultClass);
CompletableFuture<DistinctIterable<T,F>> distinctAsync(String fieldName, Class<F> resultClass);
Returns:
DistinctIterable<T,F>
- An iterable over the distinct values of the specified field.
Parameters:
Name | Type | Summary |
---|---|---|
|
|
The name of the field whose distinct values are returned. |
|
Criteria list to filter the document. The filter is a JSON object that can contain any valid Data API filter expression. |
|
|
|
The Java class of the field values (the F type parameter). |
Keep in mind that |
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DistinctIterable;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import static com.datastax.astra.client.model.Filters.lt;
public class Distinct {
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Building a filter
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
// Get the distinct values of a field, without and with a filter
DistinctIterable<Document, String> result = collection
.distinct("field", String.class);
DistinctIterable<Document, String> result2 = collection
.distinct("field", filter, String.class);
// Iterate over the result
for (String fieldValue : result) {
System.out.println(fieldValue);
}
}
}
Count documents in a collection
Get the count of documents in a collection. Count all documents or apply filtering to count a subset of documents.
-
Python
-
TypeScript
-
Java
-
cURL
View this topic in more detail on the API Reference.
collection.count_documents({}, upper_bound=500)
Get the count of the documents in a collection matching a condition.
collection.count_documents({"seq":{"$gt": 15}}, upper_bound=50)
Returns:
int
- The exact count of the documents counted as requested, unless it exceeds the caller-provided or API-set upper bound. In case of overflow, an exception is raised.
Example response
320
Parameters:
Name | Type | Summary |
---|---|---|
filter |
|
A predicate expressed as a dictionary according to the Data API filter syntax.
Examples are |
upper_bound |
|
A required ceiling on the result of the count operation.
If the actual number of documents exceeds this value, an exception is raised.
An exception is also raised if the actual number of documents exceeds the maximum count that the Data API can reach, regardless of |
max_time_ms |
|
A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default. |
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
collection.insert_many([{"seq": i} for i in range(20)])
collection.count_documents({}, upper_bound=100)
# prints: 20
collection.count_documents({"seq":{"$gt": 15}}, upper_bound=100)
# prints: 4
collection.count_documents({}, upper_bound=10)
# Raises: astrapy.exceptions.TooManyDocumentsToCountException
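The upper-bound contract can be modeled in a few lines. In this sketch, TooManyToCount is a hypothetical stand-in for astrapy's TooManyDocumentsToCountException, a Python predicate replaces the Data API filter, and the server-side limit defaults to 1000, a value taken from the comment in the Java example for this operation. Treat that limit as an assumption, since the actual value is set by the server.

```python
from typing import Any, Callable, Dict, List


class TooManyToCount(Exception):
    """Hypothetical stand-in for astrapy.exceptions.TooManyDocumentsToCountException."""


def count_in_memory(
    docs: List[Dict[str, Any]],
    predicate: Callable[[Dict[str, Any]], bool],
    upper_bound: int,
    server_limit: int = 1000,
) -> int:
    """Exact count, refusing to answer past upper_bound or the server-side limit."""
    count = 0
    for doc in docs:
        if predicate(doc):
            count += 1
            if count > server_limit:
                # The server stops counting here, so no exact answer is possible
                raise TooManyToCount(f"more than {server_limit} documents (server limit)")
    if count > upper_bound:
        raise TooManyToCount(f"{count} documents exceed upper_bound={upper_bound}")
    return count


seq_docs = [{"seq": i} for i in range(20)]
print(count_in_memory(seq_docs, lambda d: True, upper_bound=100))          # 20
print(count_in_memory(seq_docs, lambda d: d["seq"] > 15, upper_bound=100))  # 4
```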
View this topic in more detail on the API Reference.
const numDocs = await collection.countDocuments({}, 500);
Get the count of the documents in a collection matching a filter.
const numDocs = await collection.countDocuments({ seq: { $gt: 15 } }, 50);
Parameters:
Name | Type | Summary |
---|---|---|
filter |
A filter to select the documents to count. If not provided, all documents will be counted. |
|
upperBound |
|
A required ceiling on the result of the count operation.
If the actual number of documents exceeds this value, an exception is raised.
An exception is also raised if the actual number of documents exceeds the maximum count that the Data API can reach, regardless of |
options? |
The options (the timeout) for this operation. |
Returns:
Promise<number>
- A promise that resolves to the exact count of the documents counted as requested, unless it exceeds
the caller-provided or API-set upper bound, in which case an exception is raised.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some documents
await collection.insertMany(Array.from({ length: 20 }, (_, i) => ({ seq: i })));
// Prints 20
console.log(await collection.countDocuments({}, 100));
// Prints 4
console.log(await collection.countDocuments({ seq: { $gt: 15 } }, 100));
// Throws TooManyDocumentsToCountError
await collection.countDocuments({}, 10);
})();
// Synchronous
int countDocuments(int upperBound)
throws TooManyDocumentsToCountException;
int countDocuments(Filter filter, int upperBound)
throws TooManyDocumentsToCountException;
Get the count of the documents in a collection matching a condition.
Returns:
int
- The exact count of the documents counted as requested, unless it exceeds the caller-provided or API-set upper bound. In case of overflow, an exception is raised.
Parameters:
Name | Type | Summary |
---|---|---|
filter (optional) |
|
A Filter object expressing a predicate according to the Data API filter syntax. Examples are |
upperBound |
|
A required ceiling on the result of the count operation. If the actual number of documents exceeds this value, an exception is thrown. An exception is also thrown if the actual number of documents exceeds the maximum count that the Data API can reach, regardless of upperBound. |
The exception declared in the signatures above is a checked exception, so callers must catch or declare it. Consider modifying your conditions to count fewer documents at once.
If you need to count large numbers of documents, consider using the Data API |
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.exception.TooManyDocumentsToCountException;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import static com.datastax.astra.client.model.Filters.lt;
public class CountDocuments {
public static void main(String[] args) {
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Building a filter
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
try {
// Count with no filter
int total = collection.countDocuments(500);
// Count with a filter
int filtered = collection.countDocuments(filter, 500);
} catch(TooManyDocumentsToCountException tmde) {
// Explicit error if the count is above the upper limit or above the 1000 limit
}
}
}
Use the Data API countDocuments command to obtain the exact count of documents in a collection:
curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"countDocuments": {
}
}' | json_pp
You can provide an optional filter condition:
curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"countDocuments": {
"filter": {
"year": {"$gt": 2000}
}
}
}' | json_pp
Returns:
count
- The exact count of the documents counted as requested, unless it exceeds the API-set upper bound, in which case the overflow is reported in the response by the moreData
flag.
Example response
{ "status": { "count": 105 } }
Properties:
Name | Type | Summary |
---|---|---|
countDocuments |
command |
Returns an exact count of documents in a collection. By default, all documents are counted. |
filter |
JSON object |
Optional filtering clause for |
This operation is suited to use cases where the number of documents to count is moderate.
Exact counting of an arbitrary number of documents is a slow, expensive operation that is not supported by the Data API.
If the count total exceeds the server-side threshold, the response reports the overflow with the moreData flag.
If you need to count large numbers of documents, consider using the Data API |
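When you call the command directly, your code has to read the overflow signal itself. A minimal sketch using only the standard library; parse_count_response is a hypothetical helper, and it assumes that an overflowing response still carries the count reached so far next to moreData.

```python
import json
from typing import Tuple


def parse_count_response(body: str) -> Tuple[int, bool]:
    """Return (count, is_exact) from a countDocuments response body."""
    status = json.loads(body)["status"]
    # moreData means the server stopped at its limit, so count is only a lower bound
    return status.get("count", 0), not status.get("moreData", False)


print(parse_count_response('{ "status": { "count": 105 } }'))
# (105, True)
```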
Estimate document count in a collection
Get an approximate document count for an entire collection. Filtering isn’t supported.
In the |
-
Python
-
TypeScript
-
Java
-
cURL
View this topic in more detail on the API Reference.
collection.estimated_document_count()
Returns:
int
- A server-side estimate of the total number of documents in the collection.
Example response
37500
Parameters:
Name | Type | Summary |
---|---|---|
max_time_ms |
|
A timeout, in milliseconds, for the underlying HTTP request. |
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
collection.estimated_document_count()
# prints: 37500
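A common pattern is to ask for an exact count under a modest bound and fall back to the estimate when the bound is exceeded. The sketch below is one way to do that, assuming only the count_documents and estimated_document_count methods shown in this section. count_or_estimate is a hypothetical name, and the exception class is passed in so the function can be tried without astrapy installed.

```python
from typing import Any, Tuple, Type


def count_or_estimate(
    collection: Any,
    too_many: Type[Exception],
    upper_bound: int = 1000,
) -> Tuple[int, bool]:
    """Return (count, is_exact): exact below upper_bound, else the server-side estimate."""
    try:
        return collection.count_documents({}, upper_bound=upper_bound), True
    except too_many:
        # Exact counting was refused; fall back to the cheap approximate figure
        return collection.estimated_document_count(), False
```

With astrapy, pass astrapy.exceptions.TooManyDocumentsToCountException as too_many. The boolean in the result tells the caller whether the number is exact or a server-side estimate.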
View this topic in more detail on the API Reference.
const estNumDocs = await collection.estimatedDocumentCount();
Parameters:
Name | Type | Summary |
---|---|---|
options? |
The options (the timeout) for this operation. |
Returns:
Promise<number>
- A promise that resolves to a server-side estimate of the total number of documents in the collection.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
console.log(await collection.estimatedDocumentCount());
})();
View this topic in more detail on the API Reference.
long estimatedDocumentCount();
long estimatedDocumentCount(EstimatedCountDocumentsOptions options);
Parameters:
Name | Type | Summary |
---|---|---|
options? |
Set different options for the |
Returns:
long
- A server-side estimate of the total number of documents in the collection. This estimate is built from the SSTable files.
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.EstimatedCountDocumentsOptions;
import com.datastax.astra.internal.command.LoggingCommandObserver;
public class EstimateCountDocuments {
public static void main(String[] args) {
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Estimated count of the whole collection (filters are not supported)
long estimatedCount = collection.estimatedDocumentCount();
// Count with options (adding a logger)
EstimatedCountDocumentsOptions options = new EstimatedCountDocumentsOptions()
.registerObserver("logger", new LoggingCommandObserver(DataAPIClient.class));
long estimateCount2 = collection.estimatedDocumentCount(options);
}
}
Use the Data API estimatedDocumentCount command to return the approximate number of documents in the collection.
curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"estimatedDocumentCount": {
}
}' | json_pp
Returns:
count
- An estimate of the total number of documents in the collection.
Example response
{ "status": { "count": 37500 } }
Properties:
Name | Type | Summary |
---|---|---|
estimatedDocumentCount |
command |
Returns an estimated count of documents within the context of the specified collection. |
The estimatedDocumentCount object is empty ({}) because there are no filters or options for this command.
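All of the cURL examples in this section share one shape: a JSON command sent with POST to the endpoint/keyspace/collection URL, with the token in a Token header. The sketch below builds that request with the Python standard library without sending it. build_command_request is a hypothetical helper, and https://example.invalid is a placeholder for your API endpoint.

```python
import json
import urllib.request
from typing import Any, Dict


def build_command_request(api_endpoint: str, keyspace: str, collection: str,
                          token: str, command: Dict[str, Any]) -> urllib.request.Request:
    """Build, but do not send, the POST request that the cURL examples issue."""
    url = f"{api_endpoint.rstrip('/')}/{keyspace}/{collection}"
    return urllib.request.Request(
        url,
        data=json.dumps(command).encode("utf-8"),
        headers={
            "Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


req = build_command_request("https://example.invalid", "default_keyspace",
                            "vector_collection", "TOKEN", {"estimatedDocumentCount": {}})
print(req.get_method(), req.full_url)
# POST https://example.invalid/default_keyspace/vector_collection
```

Sending it against a real endpoint is then a matter of urllib.request.urlopen(req) followed by json.load on the response.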
Find and replace a document
Locate a document matching a filter condition and replace it with a new document, returning the document itself.
-
Python
-
TypeScript
-
Java
-
cURL
View this topic in more detail on the API Reference.
collection.find_one_and_replace(
{"_id": "rule1"},
{"text": "some animals are more equal!"},
)
Locate and replace a document, returning the document itself, additionally creating it if nothing is found.
collection.find_one_and_replace(
{"_id": "rule1"},
{"text": "some animals are more equal!"},
upsert=True,
)
Returns:
Dict[str, Any]
- The document that was found, either before or after the
replacement (or a projection thereof, as requested). If no matches are found,
None
is returned.
Example response
{'_id': 'rule1', 'text': 'all animals are equal'}
Parameters:
Name | Type | Summary |
---|---|---|
filter |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
replacement |
|
The new document to write into the collection. |
projection |
|
Used to select a subset of fields in the document being returned. The projection can be: an iterable over the included field names; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to exclude specific fields from the response. Special document fields (e.g. |
vector |
|
A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to use for vector search, that is, Approximate Nearest Neighbors (ANN) search, as the sorting criterion. In this way, the matched document (if any) will be the one that is most similar to the provided vector. This parameter cannot be used together with |
vectorize |
|
A string to be vectorized and used as the sorting criterion in a vector search.
This parameter cannot be used together with |
sort |
|
With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the replaced one. See the |
upsert |
|
This parameter controls the behavior in absence of matches. If True, |
return_document |
|
A flag controlling what document is returned: if set to |
max_time_ms |
|
A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default. |
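The projection forms described for the `projection` parameter can be modelled in a few lines of plain Python. `apply_projection` below is a hypothetical helper and is not part of astrapy. It follows the behavior visible in this page's examples: `_id` is returned unless it is explicitly excluded. It does not model the special document fields mentioned in the parameter description.

```python
from collections.abc import Iterable, Mapping
from typing import Any, Dict, Union

Projection = Union[Iterable, Mapping]

def apply_projection(doc: Dict[str, Any], projection: Projection) -> Dict[str, Any]:
    """Hypothetical helper modelling the three projection forms."""
    if isinstance(projection, Mapping):
        if not any(projection.values()):
            # Exclusion form: every listed field is False, so drop just those fields.
            return {k: v for k, v in doc.items() if k not in projection}
        # Inclusion form: keep the fields marked True.
        included = {k for k, v in projection.items() if v}
        # _id is returned unless it is explicitly set to False.
        if projection.get("_id", True):
            included.add("_id")
    else:
        # An iterable of field names is an inclusion list; _id is kept.
        included = set(projection) | {"_id"}
    return {k: v for k, v in doc.items() if k in included}
```

For example, `apply_projection(doc, {"_id": False})` drops only `_id`, which matches the last call in the example below.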
Example:
from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
collection.insert_one({"_id": "rule1", "text": "all animals are equal"})
collection.find_one_and_replace(
{"_id": "rule1"},
{"text": "some animals are more equal!"},
)
# prints: {'_id': 'rule1', 'text': 'all animals are equal'}
collection.find_one_and_replace(
{"text": "some animals are more equal!"},
{"text": "and the pigs are the rulers"},
return_document=astrapy.constants.ReturnDocument.AFTER,
)
# prints: {'_id': 'rule1', 'text': 'and the pigs are the rulers'}
collection.find_one_and_replace(
{"_id": "rule2"},
{"text": "F=ma^2"},
return_document=astrapy.constants.ReturnDocument.AFTER,
)
# (returns None for no matches)
collection.find_one_and_replace(
{"_id": "rule2"},
{"text": "F=ma"},
upsert=True,
return_document=astrapy.constants.ReturnDocument.AFTER,
projection={"_id": False},
)
# prints: {'text': 'F=ma'}
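The following plain-Python model runs in memory and makes the `return_document` and `upsert` semantics of the example above concrete. It is a hypothetical illustration, not astrapy code. The helper name, the equality-only filter, and the rule that an upsert reuses an `_id` given in the filter are assumptions made for this sketch. `return_after=False` mirrors `ReturnDocument.BEFORE`, and `return_after=True` mirrors `ReturnDocument.AFTER`.

```python
import uuid
from typing import Any, Dict, List, Optional

Doc = Dict[str, Any]

def matches(doc: Doc, flt: Doc) -> bool:
    # Equality-only filter: every filter field must be present with the same value.
    return all(key in doc and doc[key] == value for key, value in flt.items())

def find_one_and_replace_model(
    docs: List[Doc],
    flt: Doc,
    replacement: Doc,
    upsert: bool = False,
    return_after: bool = False,
) -> Optional[Doc]:
    """Hypothetical in-memory stand-in for find_one_and_replace."""
    for i, doc in enumerate(docs):
        if matches(doc, flt):
            # The replacement keeps the _id of the document it replaces.
            new_doc = {"_id": doc["_id"], **replacement}
            docs[i] = new_doc
            return dict(new_doc if return_after else doc)
    if upsert:
        # Assumption: an _id given in the filter is reused for the new document.
        new_doc = {"_id": flt.get("_id", str(uuid.uuid4())), **replacement}
        docs.append(new_doc)
        # Assumption: with "before" semantics there is no prior document to return.
        return dict(new_doc) if return_after else None
    return None  # no match and no upsert: nothing changes
```

The example's four calls can be replayed against `docs = [{"_id": "rule1", "text": "all animals are equal"}]`. They return the same documents as the example, except that this model does not apply the projection in the last call.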
View this topic in more detail on the API Reference.
const docBefore = await collection.findOneAndReplace(
{ _id: 123 },
{ text: 'some animals are more equal!' },
{ returnDocument: 'before' },
);
Locate and replace a document, returning the document itself, additionally creating it if nothing is found.
const docBefore = await collection.findOneAndReplace(
{ _id: 123 },
{ text: 'some animals are more equal!' },
{ returnDocument: 'before', upsert: true },
);
Parameters:
Name | Type | Summary |
---|---|---|
filter |
A filter to select the document to replace. |
|
replacement |
The replacement document, which contains no _id field. |
|
options |
The options for this operation. |
Options (FindOneAndReplaceOptions
):
Name | Type | Summary |
---|---|---|
|
Specifies whether to return the original or replaced document. |
|
|
If true, creates a new document if no document matches the filter. |
|
Specifies which fields should be included/excluded in the returned documents. Defaults to including all fields. When specifying a projection, it’s the user’s responsibility to handle the return type carefully. Consider type-casting. Can only be used when performing a vector search. |
||
Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk. |
||
|
An optional vector to use to perform a vector search on the collection to find the closest matching document. Equivalent to setting the If you really need to use both, you can set the |
|
|
A string to be vectorized and used as the sorting criterion in a vector search. Equivalent to setting the If you really need to use both, you can set the |
|
|
The maximum time in milliseconds that the client should wait for the operation to complete for each single one of the underlying HTTP requests. |
|
|
When true, the result includes, alongside the document, an ok field with a value of 1 if the command executed successfully. |
Returns:
Promise<WithId<Schema> | null>
- The document before/after
the update, depending on the type of returnDocument
, or null
if no matches are found.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some document
await collection.insertOne({ _id: "rule1", text: "all animals are equal" });
// { _id: 'rule1', text: 'all animals are equal' }
await collection.findOneAndReplace(
{ _id: "rule1" },
{ text: "some animals are more equal!" },
{ returnDocument: 'before' }
);
// { _id: 'rule1', text: 'and the pigs are the rulers' }
await collection.findOneAndReplace(
{ text: "some animals are more equal!" },
{ text: "and the pigs are the rulers" },
{ returnDocument: 'after' }
);
// null
await collection.findOneAndReplace(
{ _id: "rule2" },
{ text: "F=ma^2" },
{ returnDocument: 'after' }
);
// { text: 'F=ma' }
await collection.findOneAndReplace(
{ _id: "rule2" },
{ text: "F=ma" },
{ upsert: true, returnDocument: 'after', projection: { _id: false } }
);
})();
// Synchronous
Optional<T> findOneAndReplace(Filter filter, T replacement);
Optional<T> findOneAndReplace(Filter filter, T replacement, FindOneAndReplaceOptions options);
// Asynchronous
CompletableFuture<Optional<T>> findOneAndReplaceAsync(Filter filter, T replacement);
CompletableFuture<Optional<T>> findOneAndReplaceAsync(Filter filter, T replacement, FindOneAndReplaceOptions options);
Returns:
Optional<T>
- The document that matches the filter, if any. Depending on whether returnDocument
is set to before or after, this is the document as it was before or after the replacement.
Parameters:
Name | Type | Summary |
---|---|---|
filter (optional) |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
replacement |
|
The document that replaces the matching document, if one exists. |
options(optional) |
A list of options for the findOneAndReplace operation, provided as a |
Sample definition of FindOneAndReplaceOptions
:
FindOneAndReplaceOptions options = FindOneAndReplaceOptions.Builder
.projection(Projections.include("field1"))
.sort(Sorts.ascending("field1"))
.upsert(true)
.returnDocumentAfter();
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.FindOneAndReplaceOptions;
import com.datastax.astra.client.model.Projections;
import com.datastax.astra.client.model.Sorts;
import java.util.Optional;
import static com.datastax.astra.client.model.Filters.lt;
public class FindOneAndReplace {
public static void main(String[] args) {
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Building a filter
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
FindOneAndReplaceOptions options = new FindOneAndReplaceOptions()
.projection(Projections.include("field1"))
.sort(Sorts.ascending("field1"))
.upsert(true)
.returnDocumentAfter();
Document docForReplacement = new Document()
.append("field1", "value1")
.append("field2", 20)
.append("field3", 30)
.append("field4", "value4");
// Returns the document after the replacement, because returnDocumentAfter() is set
Optional<Document> docAfterReplace = collection
.findOneAndReplace(filter, docForReplacement, options);
}
}
Use the Data API findOneAndReplace
command to find an existing document that matches the filter criteria and replace it with a new one.
curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"findOneAndReplace": {
"filter" : {
"_id" : "14"
},
"replacement" : { "customer.name": "Ann Jones", "status" : "inactive" }
}
}' | json_pp
Parameters:
Name | Type | Summary |
---|---|---|
findOneAndReplace |
command |
Finds a single document that matches a specified filter and replaces it with the provided replacement document. This operation is atomic within a single document. |
filter |
clause |
Specifies the criteria for selecting the document to replace. In this example, it’s a document with an |
replacement |
clause |
Specifies the new content of the document that will replace the existing document found using the filter criteria. The replacement content provided is a document with two fields:
|
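You can also assemble the same request in Python using only the standard library. In this minimal sketch, the environment variable names follow the cURL command. The one exception is `DB_API_ENDPOINT`, which is an assumed name. The fallback values are placeholders. The sketch builds the request but does not send it.

```python
import json
import os
import urllib.request

# Placeholder fallbacks let the sketch run without real credentials.
endpoint = os.environ.get("DB_API_ENDPOINT", "https://db-api-endpoint.example")
keyspace = os.environ.get("DB_KEYSPACE", "default_keyspace")
collection_name = os.environ.get("DB_COLLECTION", "my_collection")
token = os.environ.get("DB_APPLICATION_TOKEN", "TOKEN")

# Same command body as the cURL example.
payload = {
    "findOneAndReplace": {
        "filter": {"_id": "14"},
        "replacement": {"customer.name": "Ann Jones", "status": "inactive"},
    }
}

request = urllib.request.Request(
    f"{endpoint}/{keyspace}/{collection_name}",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Token": token,
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    method="POST",
)

# To send it (requires a reachable endpoint and a valid token):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```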
Replace a document
Replace a document in the collection with a new one.
-
Python
-
TypeScript
-
Java
-
cURL
View this topic in more detail on the API Reference.
replace_result = collection.replace_one(
{"Marco": {"$exists": True}},
{"Buda": "Pest"},
)
Replace a document in the collection with a new one, creating a new one if no match is found.
replace_result = collection.replace_one(
{"Marco": {"$exists": True}},
{"Buda": "Pest"},
upsert=True,
)
Returns:
UpdateResult
- An object representing the response from the database after the replace operation. It includes information about the operation.
Example response
UpdateResult(raw_results=[{'data': {'document': {'_id': '1', 'Marco': 'Polo'}}, 'status': {'matchedCount': 1, 'modifiedCount': 1}}], update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1})
Parameters:
Name | Type | Summary |
---|---|---|
filter |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
replacement |
|
the new document to write into the collection. |
vector |
|
A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to use for vector search, that is, Approximate Nearest Neighbors (ANN) search, as the sorting criterion. In this way, the matched document (if any) will be the one that is most similar to the provided vector. This parameter cannot be used together with |
vectorize |
|
A string to be vectorized and used as the sorting criterion in a vector search.
This parameter cannot be used together with |
sort |
|
With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the replaced one. See the |
upsert |
|
This parameter controls the behavior in absence of matches. If True, |
max_time_ms |
|
A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default. |
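The update_info dictionary reported by replace_one follows a simple pattern, which the example below also shows. A match replaces one document. No match changes nothing. An upsert inserts a new document and reports its _id. The following plain-Python model reproduces that pattern in memory. It is a hypothetical illustration, not astrapy code, and its filter matcher supports only equality and `$exists`. Reporting `nModified` as 0 when the replacement equals the stored document is an assumption.

```python
import uuid
from typing import Any, Dict, List

Doc = Dict[str, Any]
_MISSING = object()  # sentinel for "field not present"

def matches(doc: Doc, flt: Doc) -> bool:
    """Tiny filter matcher: plain equality plus {"$exists": bool}."""
    for key, cond in flt.items():
        if isinstance(cond, dict) and "$exists" in cond:
            if (key in doc) != cond["$exists"]:
                return False
        elif doc.get(key, _MISSING) != cond:
            return False
    return True

def replace_one_model(docs: List[Doc], flt: Doc, replacement: Doc, upsert: bool = False) -> Dict[str, Any]:
    """Hypothetical in-memory stand-in for replace_one, returning an update_info-like dict."""
    for i, doc in enumerate(docs):
        if matches(doc, flt):
            new_doc = {"_id": doc["_id"], **replacement}
            modified = int(new_doc != doc)  # assumption: identical content is not counted
            docs[i] = new_doc
            return {"n": 1, "updatedExisting": True, "ok": 1.0, "nModified": modified}
    if upsert:
        new_id = str(uuid.uuid4())
        docs.append({"_id": new_id, **replacement})
        return {"n": 1, "updatedExisting": False, "ok": 1.0, "nModified": 0, "upserted": new_id}
    return {"n": 0, "updatedExisting": False, "ok": 1.0, "nModified": 0}
```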
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
collection.insert_one({"Marco": "Polo"})
collection.replace_one({"Marco": {"$exists": True}}, {"Buda": "Pest"})
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1})
collection.find_one({"Buda": "Pest"})
# prints: {'_id': '8424905a-...', 'Buda': 'Pest'}
collection.replace_one({"Mirco": {"$exists": True}}, {"Oh": "yeah?"})
# prints: UpdateResult(raw_results=..., update_info={'n': 0, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0})
collection.replace_one({"Mirco": {"$exists": True}}, {"Oh": "yeah?"}, upsert=True)
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '931b47d6-...'})
View this topic in more detail on the API Reference.
const result = await collection.replaceOne(
{ 'Marco': 'Polo' },
{ 'Buda': 'Pest' },
);
Replace a document in the collection with a new one, creating a new one if no match is found.
const result = await collection.replaceOne(
{ 'Marco': 'Polo' },
{ 'Buda': 'Pest' },
{ upsert: true },
);
Parameters:
Name | Type | Summary |
---|---|---|
filter |
A filter to select the document to replace. |
|
replacement |
The replacement document, which contains no _id field. |
|
options? |
The options for this operation. |
Options (ReplaceOneOptions
):
Name | Type | Summary |
---|---|---|
|
If true, creates a new document if no document matches the filter. |
|
Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk. |
||
|
An optional vector to use to perform a vector search on the collection to find the closest matching document. Equivalent to setting the If you really need to use both, you can set the |
|
|
A string to be vectorized and used as the sorting criterion in a vector search. Equivalent to setting the If you really need to use both, you can set the |
|
|
The maximum time in milliseconds that the client should wait for the operation to complete for each single one of the underlying HTTP requests. |
Returns:
Promise<ReplaceOneResult<Schema>>
- The result of the
replacement operation.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some document
await collection.insertOne({ 'Marco': 'Polo' });
// { modifiedCount: 1, matchedCount: 1, upsertedCount: 0 }
await collection.replaceOne(
{ 'Marco': { '$exists': true } },
{ 'Buda': 'Pest' }
);
// { _id: '3756ce75-aaf1-430d-96ce-75aaf1730dd3', Buda: 'Pest' }
await collection.findOne({ 'Buda': 'Pest' });
// { modifiedCount: 0, matchedCount: 0, upsertedCount: 0 }
await collection.replaceOne(
{ 'Mirco': { '$exists': true } },
{ 'Oh': 'yeah?' }
);
// { modifiedCount: 0, matchedCount: 0, upsertedId: '...', upsertedCount: 1 }
await collection.replaceOne(
{ 'Mirco': { '$exists': true } },
{ 'Oh': 'yeah?' },
{ upsert: true }
);
})();
// Synchronous
UpdateResult replaceOne(Filter filter, T replacement);
UpdateResult replaceOne(Filter filter, T replacement, ReplaceOneOptions options);
// Asynchronous
CompletableFuture<UpdateResult> replaceOneAsync(Filter filter, T replacement);
CompletableFuture<UpdateResult> replaceOneAsync(Filter filter, T replacement, ReplaceOneOptions options);
Returns:
UpdateResult - A wrapper object with the result of the operation. It contains the number of documents matched (matchedCount) and modified (modifiedCount).
Parameters:
Name | Type | Summary |
---|---|---|
filter (optional) |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
replacement |
|
The document that replaces the matching document, if one exists. |
options(optional) |
A list of options for the |
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.ReplaceOneOptions;
import com.datastax.astra.client.model.UpdateResult;
import static com.datastax.astra.client.model.Filters.lt;
public class ReplaceOne {
public static void main(String[] args) {
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Building a filter
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
// Create a new document if nothing matches the filter
ReplaceOneOptions options = new ReplaceOneOptions()
.upsert(true);
Document docForReplacement = new Document()
.append("field1", "value1")
.append("field2", 20)
.append("field3", 30)
.append("field4", "value4");
// Replace the matching document (or insert one, because upsert is set)
UpdateResult result = collection.replaceOne(filter, docForReplacement, options);
System.out.println("Matched: " + result.getMatchedCount());
System.out.println("Modified: " + result.getModifiedCount());
}
}
Find and delete a document
Locate a document matching a filter condition and delete it, returning the document itself.
-
Python
-
TypeScript
-
Java
-
cURL
View this topic in more detail on the API Reference.
collection.find_one_and_delete({"status": "stale_entry"})
Returns:
Dict[str, Any]
- The document that was just deleted (or a projection thereof, as requested). If no matches are
found, None
is returned.
Example response
{'_id': 199, 'status': 'stale_entry', 'request_id': 'A4431'}
Parameters:
Name | Type | Summary |
---|---|---|
filter |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
projection |
|
Used to select a subset of fields in the documents being returned. The projection can be: an iterable over the included field names; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to exclude specific fields from the response. Special document fields (e.g. |
vector |
|
A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to perform vector search. That is, Approximate Nearest Neighbors (ANN) search, extracting the most similar document in the collection matching the filter. This parameter cannot be used together with |
vectorize |
|
A string to be vectorized and used as the sorting criterion in a vector search.
This parameter cannot be used together with |
sort |
|
With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the deleted one. See the |
max_time_ms |
|
A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default. |
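The `sort` parameter decides which of several matching documents gets deleted. The following in-memory model illustrates this in plain Python. It is a hypothetical sketch, not astrapy code. For brevity, the filter is a Python predicate. The sort is a dictionary that maps field names to 1 (ascending) or -1 (descending), the same convention as the TypeScript example `{ sort: { seq: -1 } }`.

```python
from typing import Any, Callable, Dict, List, Optional

Doc = Dict[str, Any]

def find_one_and_delete_model(
    docs: List[Doc],
    predicate: Callable[[Doc], bool],
    sort: Optional[Dict[str, int]] = None,
) -> Optional[Doc]:
    """Hypothetical in-memory stand-in for find_one_and_delete."""
    matching = [d for d in docs if predicate(d)]
    if not matching:
        return None  # no match: nothing is deleted
    if sort:
        # Stable sorts applied from the last key to the first give a multi-key ordering.
        for field, direction in reversed(list(sort.items())):
            matching.sort(key=lambda d: d[field], reverse=(direction == -1))
    target = matching[0]  # the first document in sort order is the one deleted
    docs.remove(target)
    return target
```

Starting from `[{"seq": 1}, {"seq": 0}, {"seq": 2}]`, deleting `seq == 1` and then deleting any document with `seq` sorted by `{"seq": -1}` leaves only `{"seq": 0}`. This is the same outcome as the delete_one example in the next section.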
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
collection.insert_many(
[
{"species": "swan", "class": "Aves"},
{"species": "frog", "class": "Amphibia"},
],
)
collection.find_one_and_delete(
{"species": {"$ne": "frog"}},
projection={"species": True},
)
# prints: {'_id': '5997fb48-...', 'species': 'swan'}
collection.find_one_and_delete({"species": {"$ne": "frog"}})
# (returns None for no matches)
View this topic in more detail on the API Reference.
const deletedDoc = await collection.findOneAndDelete({ status: 'stale_entry' });
Parameters:
Name | Type | Summary |
---|---|---|
filter |
A filter to select the document to delete. |
|
options? |
The options for this operation. |
Options (FindOneAndDeleteOptions
):
Name | Type | Summary |
---|---|---|
Specifies which fields should be included/excluded in the returned documents. Defaults to including all fields. When specifying a projection, it’s the user’s responsibility to handle the return type carefully. Consider type-casting. Can only be used when performing a vector search. |
||
Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk. |
||
|
An optional vector to use to perform a vector search on the collection to find the closest matching document. Equivalent to setting the If you really need to use both, you can set the |
|
|
A string to be vectorized and used as the sorting criterion in a vector search. Equivalent to setting the If you really need to use both, you can set the |
|
|
The maximum time in milliseconds that the client should wait for the operation to complete for each single one of the underlying HTTP requests. |
|
|
When true, the result includes, alongside the document, an ok field with a value of 1 if the command executed successfully. |
Returns:
Promise<WithId<Schema> | null>
- The document that was deleted, or
null
if no matches are found.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some document
await collection.insertMany([
{ species: 'swan', class: 'Aves' },
{ species: 'frog', class: 'Amphibia' },
]);
// { _id: '...', species: 'swan' }
await collection.findOneAndDelete(
{ species: { $ne: 'frog' } },
{ projection: { species: 1 } },
);
// null
await collection.findOneAndDelete(
{ species: { $ne: 'frog' } },
);
})();
// Synchronous
Optional<T> findOneAndDelete(Filter filter);
Optional<T> findOneAndDelete(Filter filter, FindOneAndDeleteOptions options);
// Asynchronous
CompletableFuture<Optional<T>> findOneAndDeleteAsync(Filter filter);
CompletableFuture<Optional<T>> findOneAndDeleteAsync(Filter filter, FindOneAndDeleteOptions options);
Returns:
DeleteResult
- Wrapper that contains the deleted count.
Parameters:
Name | Type | Summary |
---|---|---|
filter (optional) |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
options(optional) |
A list of options for the delete operation, such as a |
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import java.util.Optional;
import static com.datastax.astra.client.model.Filters.lt;
public class FindOneAndDelete {
public static void main(String[] args) {
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Building a filter
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
// Returns the deleted document, or an empty Optional if nothing matched
Optional<Document> deletedDoc = collection.findOneAndDelete(filter);
}
}
Use the Data API findOneAndDelete
command to find and delete a single document.
curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"findOneAndDelete": {
"filter": {
"customer.name": "Fred Smith",
"_id": "13"
}
}
}' | json_pp
Response:
{
"status" : {
"deletedCount" : 1
}
}
Parameters:
Name | Type | Summary |
---|---|---|
findOneAndDelete |
command |
Deletes the first document that matches the given criteria. If no matching document is found, no action is taken. |
filter |
clause |
Used to identify the document meant for deletion. In this example, the filter consists of |
Delete a document
Locate and delete a single document from a collection.
-
Python
-
TypeScript
-
Java
-
cURL
View this topic in more detail on the API Reference.
response = collection.delete_one({ "_id": "1" })
Locate and delete a single document from a collection by any attribute (as long as it is covered by the collection’s indexing configuration).
result = collection.delete_one({"location": "warehouse_C"})
Locate and delete a single document from a collection by an arbitrary filtering clause.
result = collection.delete_one({"tag": {"$exists": True}})
Delete the most similar document to a given vector.
result = collection.delete_one({}, vector=[.12, .52, .32])
Generate a vector from a string and delete the most similar document.
result = collection.delete_one({}, vectorize="Text to vectorize")
Returns:
DeleteResult
- An object representing the response from the database after the delete operation. It includes information about the success of the operation.
Example response
DeleteResult(raw_results=[{'status': {'deletedCount': 1}}], deleted_count=1)
Parameters:
Name | Type | Summary |
---|---|---|
filter |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
vector |
|
A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to use for vector search, that is, Approximate Nearest Neighbors (ANN) search, as the sorting criterion. In this way, the matched document (if any) will be the one that is most similar to the provided vector. This parameter cannot be used together with |
vectorize |
|
A string to be vectorized and used as the sorting criterion in a vector search.
This parameter cannot be used together with |
sort |
|
With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the deleted one. See the |
max_time_ms |
|
A timeout, in milliseconds, for the underlying HTTP request. This method uses the collection-level timeout by default. |
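Deleting by vector removes the stored document whose vector is most similar to the query. The following brute-force model illustrates this in plain Python. It is a hypothetical sketch with two simplifications. First, the database uses an Approximate Nearest Neighbors index rather than an exhaustive scan. Second, the cosine similarity used here is an assumption, because the actual metric depends on how the collection was configured. The model stores vectors under a `$vector` key, as Data API collection documents do. It does not handle zero vectors.

```python
import math
from typing import Any, Dict, List, Optional, Sequence

Doc = Dict[str, Any]

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def delete_most_similar(docs: List[Doc], query: Sequence[float]) -> Optional[Doc]:
    """Exact (not approximate) nearest-neighbor delete over an in-memory list."""
    candidates = [d for d in docs if "$vector" in d]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: cosine_similarity(d["$vector"], query))
    docs.remove(best)
    return best
```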
Example:
from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
collection.insert_many([{"seq": 1}, {"seq": 0}, {"seq": 2}])
collection.delete_one({"seq": 1})
# prints: DeleteResult(raw_results=..., deleted_count=1)
collection.distinct("seq")
# prints: [0, 2]
collection.delete_one(
{"seq": {"$exists": True}},
sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
# prints: DeleteResult(raw_results=..., deleted_count=1)
collection.distinct("seq")
# prints: [0]
collection.delete_one({"seq": 2})
# prints: DeleteResult(raw_results=..., deleted_count=0)
View this topic in more detail on the API Reference.
const result = await collection.deleteOne({ _id: '1' });
Locate and delete a single document from a collection by any attribute (as long as it is covered by the collection’s indexing configuration).
const result = await collection.deleteOne({ location: 'warehouse_C' });
Locate and delete a single document from a collection by an arbitrary filtering clause.
const result = await collection.deleteOne({ tag: { $exists: true } });
Delete the most similar document to a given vector.
const result = await collection.deleteOne({}, { vector: [.12, .52, .32] });
Generate a vector from a string and delete the most similar document.
const result = await collection.deleteOne({}, { vectorize: 'Text to vectorize' });
Parameters:
Name | Type | Summary |
---|---|---|
filter |
A filter to select the document to delete. |
|
options? |
The options for this operation. |
Options (DeleteOneOptions
):
Name | Type | Summary |
---|---|---|
Specifies the order in which the documents are returned. Defaults to the order in which the documents are stored on disk. |
||
|
An optional vector to use to perform a vector search on the collection to find the closest matching document. Equivalent to setting the If you really need to use both, you can set the |
|
|
A string to be vectorized and used as the sorting criterion in a vector search. Equivalent to setting the If you really need to use both, you can set the |
|
|
The maximum time in milliseconds that the client should wait for the operation to complete for each single one of the underlying HTTP requests. |
Returns:
Promise<DeleteOneResult>
- The result of the
deletion operation.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some document
await collection.insertMany([{ seq: 1 }, { seq: 0 }, { seq: 2 }]);
// { deletedCount: 1 }
await collection.deleteOne({ seq: 1 });
// [0, 2]
await collection.distinct('seq');
// { deletedCount: 1 }
await collection.deleteOne({ seq: { $exists: true } }, { sort: { seq: -1 } });
// [0]
await collection.distinct('seq');
// { deletedCount: 0 }
await collection.deleteOne({ seq: 2 });
})();
// Synchronous
DeleteResult deleteOne(Filter filter);
DeleteResult deleteOne(Filter filter, DeleteOneOptions options);
// Asynchronous
CompletableFuture<DeleteResult> deleteOneAsync(Filter filter);
CompletableFuture<DeleteResult> deleteOneAsync(Filter filter, DeleteOneOptions options);
Returns:
DeleteResult
- Wrapper that contains the deleted count.
Parameters:
Name | Type | Summary |
---|---|---|
filter (optional) |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
options(optional) |
A list of options for the delete operation, such as a |
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DeleteOneOptions;
import com.datastax.astra.client.model.DeleteResult;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
import com.datastax.astra.client.model.Sorts;
import static com.datastax.astra.client.model.Filters.lt;
public class DeleteOne {
public static void main(String[] args) {
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Sample Filter
Filter filter = Filters.and(
Filters.gt("field2", 10),
lt("field3", 20),
Filters.eq("field4", "value"));
// Delete one options
DeleteOneOptions options = new DeleteOneOptions()
.sort(Sorts.ascending("field2"));
DeleteResult result = collection.deleteOne(filter, options);
System.out.println("Deleted Count:" + result.getDeletedCount());
}
}
The Data API deleteOne command deletes a single document. In this example, it deletes one document whose tags field matches the value first.
curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"deleteOne": {
"filter": {
"tags": "first"
}
}
}' | json_pp
Response:
{
"status" : {
"deletedCount" : 1
}
}
Properties:
Name | Type | Summary |
---|---|---|
deleteOne |
command |
Delete a matching document from a collection based on the provided filter criteria. |
filter |
clause |
Provides the conditions that the database uses to identify one or more document(s) meant for deletion. |
tags |
string |
A filtering key that targets a specific property in the database’s documents. |
first |
string |
A value associated with the |
Delete documents
Delete multiple documents from a collection.
-
Python
-
TypeScript
-
Java
-
cURL
View this topic in more detail on the API Reference.
delete_result = collection.delete_many({"status": "processed"})
Returns:
DeleteResult
- An object representing the response from the database after the delete operation. It includes information about the success of the operation.
Example response
DeleteResult(raw_results=[{'status': {'deletedCount': 2}}], deleted_count=2)
Parameters:
Name | Type | Summary |
---|---|---|
filter |
|
A predicate expressed as a dictionary according to the Data API filter syntax. Examples are |
max_time_ms |
|
A timeout, in milliseconds, for the operation. This method uses the collection-level timeout by default. You may need to increase the timeout duration when deleting a large number of documents, as the operation will require multiple HTTP requests in sequence. |
This method does not admit an empty filter clause: use the |
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
collection.insert_many([{"seq": 1}, {"seq": 0}, {"seq": 2}])
collection.delete_many({"seq": {"$lte": 1}})
# prints: DeleteResult(raw_results=..., deleted_count=2)
collection.distinct("seq")
# prints: [2]
collection.delete_many({"seq": {"$lte": 1}})
# prints: DeleteResult(raw_results=..., deleted_count=0)
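A large delete_many runs as several HTTP requests in sequence, so its total duration grows with the number of matching documents. The following plain-Python model shows that loop. It is a hypothetical sketch. The per-request cap of 20 comes from the note in the Java section of this page. The model stops when a request deletes fewer than 20 documents, but that is an assumption made for this sketch, not the clients' documented stopping rule.

```python
from typing import Any, Callable, Dict, List, Tuple

Doc = Dict[str, Any]
BATCH_LIMIT = 20  # assumed per-request cap (see the note in the Java section)

def delete_batch(docs: List[Doc], predicate: Callable[[Doc], bool]) -> int:
    """Hypothetical stand-in for one deleteMany HTTP request."""
    batch = [d for d in docs if predicate(d)][:BATCH_LIMIT]
    for d in batch:
        docs.remove(d)
    return len(batch)

def delete_all_matching(docs: List[Doc], predicate: Callable[[Doc], bool]) -> Tuple[int, int]:
    """Repeat batch deletes; return (documents deleted, requests made)."""
    total = requests = 0
    while True:
        deleted = delete_batch(docs, predicate)
        total += deleted
        requests += 1
        if deleted < BATCH_LIMIT:  # assumed stopping rule for this model
            return total, requests
```

With 50 matching documents, the model issues three requests that delete 20, 20, and then 10 documents.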
View this topic in more detail on the API Reference.
const result = await collection.deleteMany({ status: 'processed' });
Parameters:
Name | Type | Summary |
---|---|---|
filter |
A filter to select the documents to delete. |
|
options? |
The options (the timeout) for this operation. |
This method does not admit an empty filter clause; use the |
Returns:
Promise<DeleteManyResult>
- The result of the
deletion operation.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert some documents
await collection.insertMany([{ seq: 1 }, { seq: 0 }, { seq: 2 }]);
// { deletedCount: 2 }
await collection.deleteMany({ seq: { $lte: 1 } });
// [2]
await collection.distinct('seq');
// { deletedCount: 0 }
await collection.deleteMany({ seq: { $lte: 1 } });
})();
// Synchronous
DeleteResult deleteMany(Filter filter);
// Asynchronous
CompletableFuture<DeleteResult> deleteManyAsync(Filter filter);
Returns:
DeleteResult
- Wrapper that contains the deleted count.
As with a few other methods, the delete operation can delete only 20 documents at a time.
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DeleteResult;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.Filter;
import com.datastax.astra.client.model.Filters;
public class DeleteMany {
public static void main(String[] args) {
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Sample filter: field2 > 10 AND field3 < 20 AND field4 == "value"
Filter filter = Filters.and(
Filters.gt("field2", 10),
Filters.lt("field3", 20),
Filters.eq("field4", "value"));
DeleteResult result = collection.deleteMany(filter);
System.out.println("Deleted Count: " + result.getDeletedCount());
}
}
The following JSON payload is designed to delete documents where the status is inactive.
curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"deleteMany": {
"filter": {
"status": "inactive"
}
}
}' | json_pp
Response:
{
"status" : {
"deletedCount" : 20
}
}
Properties:
Name | Type | Summary |
---|---|---|
deleteMany | command | Deletes all matching documents from a collection based on the provided filter criteria. |
filter | option | Provides the conditions that the database uses to identify one or more document(s) meant for deletion. |
status | option | Used for filtering to decide which documents may be deleted. |
inactive | string | The |
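The cURL command above is a plain HTTP POST whose JSON body names the command and its filter. As a sketch, the same request can be assembled with Python's standard library; build_delete_many_request and the https://example.invalid endpoint are hypothetical placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request
from typing import Any, Dict


def build_delete_many_request(
    endpoint: str, keyspace: str, collection: str, token: str, filter_: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a POST mirroring the cURL deleteMany example."""
    url = f"{endpoint.rstrip('/')}/{keyspace}/{collection}"
    # Same JSON body as the cURL --data argument.
    body = json.dumps({"deleteMany": {"filter": filter_}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


req = build_delete_many_request(
    "https://example.invalid", "DB_KEYSPACE", "DB_COLLECTION", "TOKEN",
    {"status": "inactive"},
)
print(req.get_method(), req.full_url)
# POST https://example.invalid/DB_KEYSPACE/DB_COLLECTION
print(json.loads(req.data))
# {'deleteMany': {'filter': {'status': 'inactive'}}}
```

Passing req to urllib.request.urlopen would send it, provided the endpoint, keyspace, collection, and token are real values from your deployment.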
Execute multiple write operations
Execute a (reusable) list of write operations on a collection with a single method call.
- Python
- TypeScript
- Java
View this topic in more detail on the API Reference.
bw_results = collection.bulk_write(
[
InsertMany([{"a": 1}, {"a": 2}]),
ReplaceOne(
{"z": 9},
replacement={"z": 9, "replaced": True},
upsert=True,
),
],
)
Returns:
BulkWriteResult
- A single object summarizing the whole list of requested operations. The keys in the map attributes of the result (when present) are the integer indices of the corresponding operations in the requests iterable.
Example response
BulkWriteResult(bulk_api_results={0: ..., 1: ...}, deleted_count=0, inserted_count=3, matched_count=0, modified_count=0, upserted_count=1, upserted_ids={1: '2addd676-...'})
Parameters:
Name | Type | Summary |
---|---|---|
requests | | An iterable over concrete subclasses of |
ordered | | Whether to launch the |
concurrency | | Maximum number of concurrent operations executing at a given time. It cannot be more than one for ordered bulk writes. |
max_time_ms | | A timeout, in milliseconds, for the whole bulk write. This method uses the collection-level timeout by default. You may need to increase the timeout duration depending on the number of operations. If the method call times out, there’s no guarantee about how much of the bulk write was completed. |
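The ordered and concurrency parameters interact as the table describes: ordered writes run strictly one after another, unordered writes may overlap up to the concurrency limit, and results are reported per operation index. The following minimal sketch models only that scheduling with concurrent.futures.ThreadPoolExecutor; run_bulk, insert, and the zero-argument callables are hypothetical and do not reflect how astrapy implements bulk_write.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence


def run_bulk(
    operations: Sequence[Callable[[], Any]],
    ordered: bool = True,
    concurrency: int = 1,
) -> Dict[int, Any]:
    """Run zero-argument operations; return results keyed by list index."""
    if ordered and concurrency > 1:
        # Mirrors the documented rule for ordered bulk writes.
        raise ValueError("concurrency cannot exceed 1 for ordered bulk writes")
    if ordered:
        # Strictly sequential: an exception stops the remaining operations.
        return {i: op() for i, op in enumerate(operations)}
    # Unordered: at most `concurrency` operations run at the same time.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {i: pool.submit(op) for i, op in enumerate(operations)}
        return {i: future.result() for i, future in futures.items()}


inserted: List[Dict[str, int]] = []


def insert(doc: Dict[str, int]) -> Callable[[], int]:
    """Build a fake single-document insert that reports one inserted document."""
    def op() -> int:
        inserted.append(doc)
        return 1
    return op


print(run_bulk([insert({"a": 1}), insert({"a": 2})], ordered=False, concurrency=2))
# {0: 1, 1: 1}
```

Keying results by position rather than by completion order keeps the summary stable even when unordered operations finish in a different order.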
Example:
from astrapy import DataAPIClient
from astrapy.operations import (
InsertOne,
InsertMany,
UpdateOne,
UpdateMany,
ReplaceOne,
DeleteOne,
DeleteMany,
)
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
op1 = InsertMany([{"a": 1}, {"a": 2}])
op2 = ReplaceOne({"z": 9}, replacement={"z": 9, "replaced": True}, upsert=True)
collection.bulk_write([op1, op2])
# prints: BulkWriteResult(bulk_api_results={0: ..., 1: ...}, deleted_count=0, inserted_count=3, matched_count=0, modified_count=0, upserted_count=1, upserted_ids={1: '2addd676-...'})
collection.count_documents({}, upper_bound=100)
# prints: 3
collection.distinct("replaced")
# prints: [True]
View this topic in more detail on the API Reference.
const results = await collection.bulkWrite([
{ insertOne: { document: { a: 1 } } },
{ insertOne: { document: { a: 2 } } },
{ replaceOne: { filter: { z: 9 }, replacement: { z: 9, replaced: true }, upsert: true } },
]);
Parameters:
Name | Type | Summary |
---|---|---|
operations | | The operations to perform. |
options? | | The options for this operation. |
Options (BulkWriteOptions):
Name | Type | Summary |
---|---|---|
|
You may set the |
|
|
You can set the Not available for ordered operations. |
|
|
The maximum time in milliseconds that the client should wait for the operation to complete. |
Returns:
Promise<BulkWriteResult<Schema>>
- A promise that resolves to a summary of the performed operations.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('DB_API_ENDPOINT', { keyspace: 'DB_KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert two documents and upsert a third
await collection.bulkWrite([
{ insertOne: { document: { a: 1 } } },
{ insertOne: { document: { a: 2 } } },
{ replaceOne: { filter: { z: 9 }, replacement: { z: 9, replaced: true }, upsert: true } },
]);
// 3
await collection.countDocuments({}, 100);
// [true]
await collection.distinct('replaced');
})();
// Synchronous
BulkWriteResult bulkWrite(List<Command> commands);
BulkWriteResult bulkWrite(List<Command> commands, BulkWriteOptions options);
// Asynchronous
CompletableFuture<BulkWriteResult> bulkWriteAsync(List<Command> commands);
CompletableFuture<BulkWriteResult> bulkWriteAsync(List<Command> commands, BulkWriteOptions options);
Returns:
BulkWriteResult
- Wrapper with the list of responses for each command.
Parameters:
Name | Type | Summary |
---|---|---|
commands | | List of the generic |
options (optional) | | Provide list of options for those commands like |
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.BulkWriteOptions;
import com.datastax.astra.client.model.BulkWriteResult;
import com.datastax.astra.client.model.Command;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.internal.api.ApiResponse;
import java.util.List;
public class BulkWrite {
public static void main(String[] args) {
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Set a couple of Commands
Command cmd1 = Command.create("insertOne").withDocument(new Document().id(1).append("name", "hello"));
Command cmd2 = Command.create("insertOne").withDocument(new Document().id(2).append("name", "hello"));
// Set the options for the bulk write
BulkWriteOptions options1 = BulkWriteOptions.Builder.ordered(false).concurrency(1);
// Execute the queries
BulkWriteResult result = collection.bulkWrite(List.of(cmd1, cmd2), options1);
// Retrieve the LIST of responses
for(ApiResponse res : result.getResponses()) {
System.out.println(res.getData());
}
}
}
Delete all documents from a collection
Delete all documents in a collection.
- Python
- TypeScript
- Java
- cURL
View this topic in more detail on the API Reference.
result = collection.delete_all()
Returns:
Dict
- A dictionary in the form {"ok": 1} if the method succeeds.
Example response
{'ok': 1}
Parameters:
Name | Type | Summary |
---|---|---|
max_time_ms | | A timeout, in milliseconds, for the underlying HTTP request. If not passed, the collection-level setting is used instead. |
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("DB_API_ENDPOINT")
collection = database.my_collection
collection.distinct("seq")
# prints: [2, 1, 0]
collection.count_documents({}, upper_bound=100)
# prints: 4
collection.delete_all()
# prints: {'ok': 1}
collection.count_documents({}, upper_bound=100)
# prints: 0
// Synchronous
DeleteResult deleteAll();
// Asynchronous
CompletableFuture<DeleteResult> deleteAllAsync();
Returns:
DeleteResult
- Wrapper that contains the deleted count.
As with a few other methods, the delete operation can delete only 20 documents at a time. To implement a
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.DeleteResult;
import com.datastax.astra.client.model.Document;
public class DeleteAll {
public static void main(String[] args) {
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Show the deleted count
DeleteResult result = collection.deleteAll();
System.out.println("Deleted Count:" + result.getDeletedCount());
}
}
The following JSON payload is designed to delete all documents in a collection.
If used with an empty |
curl -s --location \
--request POST ${DB_API_ENDPOINT}/${DB_KEYSPACE}/${DB_COLLECTION} \
--header "Token: ${DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
"deleteMany": {
}
}' | json_pp
Response:
{
"status" : {
"deletedCount" : -1
}
}
Properties:
Name | Type | Summary |
---|---|---|
deleteMany | command | Deletes all matching documents from a collection based on the provided filter criteria. |
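Both cURL responses on this page report the outcome in status.deletedCount: 20 for the filtered delete and -1 for the call with an empty deleteMany body. The sketch below reads that field from a response body. It assumes, based only on those two examples, that -1 means the number of deleted documents was not reported rather than an actual count; summarize_delete is a hypothetical helper.

```python
import json
from typing import Any, Dict


def summarize_delete(response_text: str) -> str:
    """Turn a deleteMany JSON response into a short, readable summary."""
    payload: Dict[str, Any] = json.loads(response_text)
    count = payload["status"]["deletedCount"]
    if count == -1:
        # Assumption: -1 is a sentinel for "all documents deleted, count not reported".
        return "all documents deleted (count not reported)"
    return f"{count} document(s) deleted"


print(summarize_delete('{"status": {"deletedCount": 20}}'))
# 20 document(s) deleted
print(summarize_delete('{"status": {"deletedCount": -1}}'))
# all documents deleted (count not reported)
```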
See also
See the Keyspaces reference topic.