Insert documents reference
A document represents a single row or record of data in an Astra DB Serverless database.
You use the Collection class to work with documents through the Data API.
For instructions to get a Collection object, see the Collections reference.
Astra DB APIs use the term keyspace to refer to both namespaces and keyspaces.
For general information about working with documents, including common operations and operators, see the Documents reference.
Prerequisites
- Review the prerequisites and other information in Intro to Astra DB APIs.
- Create a Serverless (Vector) database.
- Learn how to instantiate a DataAPIClient object and connect to your database.
Insert a single document
Insert a single document into a collection.
When you create a collection, you decide if the collection can store structured vector data. For vector-enabled collections, you also decide how to provide embeddings. You can either configure the collection to automatically generate embeddings with vectorize or provide embeddings when you load data (also known as bring your own embeddings). You must decide this when you create the collection.
When working with documents in the Astra Portal or Data API, there are two reserved fields for vector data:
- The $vector parameter is a reserved field that stores vector arrays.
  - If the collection requires that you bring your own embeddings, you can include this parameter when you load data.
  - If the collection uses vectorize, you don’t include $vector when you load data. Instead, Astra DB populates the $vector field with the automatically generated embeddings.
  Regardless of the embedding generation method, when you find, update, replace, or delete documents, you can use $vector to fetch documents by vector search. You can also use projections to include $vector in responses.
- The $vectorize parameter is a reserved field that generates embeddings automatically based on a given text string.
  - If the collection requires that you bring your own embeddings, you cannot use this parameter.
  - If the collection uses vectorize, you must include this parameter for each document that needs an embedding when you load data. The value of $vectorize is the text string from which you want to generate a document’s embedding. Astra DB stores the resulting vector array in $vector.
  When you find, update, replace, or delete documents in a collection that uses vectorize, you can use $vectorize to fetch documents by vector search with vectorize. You can also use projections to include $vectorize in responses.
If you load a document that doesn’t need an embedding, then you can omit $vector and $vectorize.
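The rules above for the two reserved fields can be sketched as a small client-side check. This is only an illustration under stated assumptions: the function name check_vector_fields and the uses_vectorize flag are hypothetical and not part of any Astra DB client, and the Data API performs its own validation regardless.

```python
from typing import Any, Dict


def check_vector_fields(document: Dict[str, Any], uses_vectorize: bool) -> None:
    """Raise ValueError if the reserved fields conflict with the collection's mode.

    uses_vectorize is a hypothetical flag: True when the collection generates
    embeddings automatically, False when you bring your own embeddings.
    """
    has_vector = "$vector" in document
    has_vectorize = "$vectorize" in document

    # Vectorize collections: send the text in $vectorize, not a ready-made $vector.
    if uses_vectorize and has_vector:
        raise ValueError("Collection uses vectorize: send $vectorize, not $vector.")
    # Bring-your-own-embeddings collections cannot use $vectorize.
    if not uses_vectorize and has_vectorize:
        raise ValueError("Collection expects your own embeddings: $vectorize is not allowed.")
    # $vector stores an array of numbers.
    if has_vector and not all(isinstance(x, (int, float)) for x in document["$vector"]):
        raise ValueError("$vector must be an array of numbers.")
    # $vectorize holds the text string to embed.
    if has_vectorize and not isinstance(document["$vectorize"], str):
        raise ValueError("$vectorize must be a text string.")


# A document that needs no embedding may omit both fields.
check_vector_fields({"name": "Jane Doe"}, uses_vectorize=True)
check_vector_fields({"name": "Jane Doe", "$vector": [0.08, 0.68, 0.30]}, uses_vectorize=False)
```

Running a check like this before loading data surfaces a mismatch early, for example when a script written for one collection is pointed at another that uses a different embedding method.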
-
Python
-
TypeScript
-
Java
-
curl
For more information, see the Client reference.
Insert a document:
insert_result = collection.insert_one({"name": "Jane Doe"})
Insert a document with an associated vector:
insert_result = collection.insert_one(
{
"name": "Jane Doe",
"$vector": [.08, .68, .30],
},
)
Insert a document and generate a vector automatically:
insert_result = collection.insert_one(
{
"name": "Jane Doe",
"$vectorize": "Text to vectorize",
},
)
Parameters:
Name | Type | Summary |
---|---|---|
| | The dictionary expressing the document to insert. The |
| | A timeout, in milliseconds, for the underlying HTTP request. If not passed, the collection-level setting is used instead. |
Returns:
InsertOneResult - An object representing the response from the database after the insert operation. It includes information about the success of the operation and details of the inserted document.
Example response
InsertOneResult(inserted_id='92b4c4f4-db44-4440-b4c4-f4db44e440b8', raw_results=...)
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection
# Insert a document with a specific ID
response1 = collection.insert_one(
{
"_id": 101,
"name": "John Doe",
"$vector": [.12, .52, .32],
},
)
# Insert a document without specifying an ID
# so that _id is generated automatically
response2 = collection.insert_one(
{
"name": "Jane Doe",
"$vector": [.08, .68, .30],
},
)
For more information, see the Client reference.
Insert a document:
const result = await collection.insertOne({ name: 'Jane Doe' });
Insert a document with an associated vector:
const result = await collection.insertOne({
name: 'Jane Doe',
$vector: [.08, .68, .30],
});
Insert a document and generate a vector automatically:
const result = await collection.insertOne({
name: 'Jane Doe',
$vectorize: 'Text to vectorize',
});
Parameters:
Name | Type | Summary |
---|---|---|
|
The document to insert. If the document does not have an |
|
|
The options for this operation. |
Options (InsertOneOptions):
Name | Type | Summary |
---|---|---|
|
The maximum time in milliseconds that the client should wait for the operation to complete. |
Returns:
Promise<InsertOneResult<Schema>> - A promise that resolves to the inserted ID.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert a document with a specific ID
await collection.insertOne({ _id: '1', name: 'John Doe' });
// Insert a document with an autogenerated ID
await collection.insertOne({ name: 'Jane Doe' });
// Insert a document with a vector
await collection.insertOne({ name: 'Jane Doe', $vector: [.12, .52, .32] });
})();
Operations on documents are performed at the Collection level.
For more information, see the Client reference.
Collection is a generic class with the default type of Document.
You can specify your own type, and the object is serialized by Jackson.
Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:
InsertOneResult insertOne(DOC document);
InsertOneResult insertOne(DOC document, float[] embeddings);
// Equivalent in asynchronous
CompletableFuture<InsertOneResult> insertOneAsync(DOC document);
CompletableFuture<InsertOneResult> insertOneAsync(DOC document, float[] embeddings);
Parameters:
Name | Type | Summary |
---|---|---|
|
|
Object representing the document to insert.
The |
|
|
A vector of embeddings (a list of numbers appropriate for the collection) for the document. Passing this parameter is equivalent to providing the vector in the |
Returns:
InsertOneResult - Wrapper with the inserted document ID.
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertOneOptions;
import com.datastax.astra.client.model.InsertOneResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
public class InsertOne {
@Data @AllArgsConstructor
public static class Product {
@JsonProperty("_id")
private String id;
private String name;
}
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Insert a document
Document doc1 = new Document("1").append("name", "joe");
InsertOneResult res1 = collectionDoc.insertOne(doc1);
System.out.println(res1.getInsertedId()); // should be "1"
// Insert a document with embeddings
Document doc2 = new Document("2").append("name", "joe");
collectionDoc.insertOne(doc2, new float[] {.1f, .2f});
// Given an existing collection
Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION2_NAME", Product.class);
// Insert a document with custom bean
collectionProduct.insertOne(new Product("1", "joe"));
collectionProduct.insertOne(new Product("2", "joe"), new float[] {.1f, .2f});
}
}
Insert a document with a predefined vector:
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertOne": {
"document": {
"$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
"key1": "value1",
"key2": "value2"
}
}
}' | jq
Insert one and generate a vector automatically:
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertOne": {
"document": {
"$vectorize": "Text to use to generate a vector",
"key1": "value1",
"key2": "value2"
}
}
}' | jq
Parameters:
Name | Type | Summary |
---|---|---|
|
|
Data API command to insert one document in a collection. |
|
|
Contains the details of the record to add. With the exception of reserved fields (
|
|
reserved, multi-type |
An optional identifier for the document. If omitted, the server automatically generates a document ID. You can include identifiers in other fields as well. For more information, see Document IDs and The defaultId option. |
|
reserved |
An optional reserved property used to store an array of numbers representing a vector embedding. Serverless (Vector) databases have specialized handling for vector data, including optimized query performance for similarity search.
|
|
reserved |
An optional reserved property used to store a string that you want to use to automatically generate an embedding with vectorize.
|
Returns:
A successful response contains the _id of the inserted document:
{
"status": {
"insertedIds": [
"12"
]
}
}
The insertedIds content depends on the ID type and how it was generated, for example:
- "insertedIds": [{"$objectId": "6672e1cbd7fabb4e5493916f"}]
- "insertedIds": [{"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"}]
For more information, see Document IDs.
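Because insertedIds can hold plain values or wrapped values such as {"$objectId": ...} and {"$uuid": ...}, client code that reads raw HTTP responses may want to unwrap them. The following is a minimal sketch that assumes only the three shapes shown above; plain_ids is a hypothetical helper name, not a Data API or client function.

```python
import json
from typing import Any, List


def plain_ids(inserted_ids: List[Any]) -> List[Any]:
    """Unwrap {"$objectId": ...} and {"$uuid": ...} entries; pass other IDs through."""
    plain = []
    for entry in inserted_ids:
        if isinstance(entry, dict) and len(entry) == 1:
            key, value = next(iter(entry.items()))
            if key in ("$objectId", "$uuid"):
                plain.append(value)
                continue
        # Strings, numbers, and any unrecognized shape are kept unchanged.
        plain.append(entry)
    return plain


# Sample response body built from the shapes shown above.
response = json.loads(
    '{"status": {"insertedIds": ["12", {"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"}]}}'
)
ids = plain_ids(response["status"]["insertedIds"])
```

Unwrapping discards the type marker, so keep the original entry if you later need to send the ID back to the API in the same typed form.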
Example:
Example with $vector
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertOne": {
"document": {
"purchase_type": "Online",
"$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
"customer": {
"name": "Jim A.",
"phone": "123-456-1111",
"age": 51,
"credit_score": 782,
"address": {
"address_line": "1234 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": { "$date": 1690045891 },
"seller": {
"name": "Jon B.",
"location": "Manhattan NYC"
},
"items": [
{
"car": "BMW 330i Sedan",
"color": "Silver"
},
"Extended warranty - 5 years"
],
"amount": 47601,
"status": "active",
"preferred_customer": true
}
}
}' | jq
Example with $vectorize
curl --location "ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace/cars_collection" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--header "x-embedding-api-key;" \
--data '{
"insertOne": {
"document": {
"_id": "1",
"purchase_type": "Online",
"$vectorize": "Purchase of a silver BMW sedan in New York.",
"customer": {
"name": "Jim A.",
"phone": "123-456-1111",
"age": 51,
"credit_score": 782,
"address": {
"address_line": "1234 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": { "$date": 1690045891 },
"seller": {
"name": "Jon B.",
"location": "Manhattan NYC"
},
"items": [
{
"car": "BMW 330i Sedan",
"color": "Silver"
},
"Extended warranty - 5 years"
],
"amount": 47601,
"status": "active",
"preferred_customer": true
}
}
}'
Insert many documents
Insert multiple documents into a collection.
When you create a collection, you decide if the collection can store structured vector data. For vector-enabled collections, you also decide how to provide embeddings. You can either configure the collection to automatically generate embeddings with vectorize or provide embeddings when you load data (also known as bring your own embeddings). You must decide this when you create the collection.
When working with documents in the Astra Portal or Data API, there are two reserved fields for vector data:
- The $vector parameter is a reserved field that stores vector arrays.
  - If the collection requires that you bring your own embeddings, you can include this parameter when you load data.
  - If the collection uses vectorize, you don’t include $vector when you load data. Instead, Astra DB populates the $vector field with the automatically generated embeddings.
  Regardless of the embedding generation method, when you find, update, replace, or delete documents, you can use $vector to fetch documents by vector search. You can also use projections to include $vector in responses.
- The $vectorize parameter is a reserved field that generates embeddings automatically based on a given text string.
  - If the collection requires that you bring your own embeddings, you cannot use this parameter.
  - If the collection uses vectorize, you must include this parameter for each document that needs an embedding when you load data. The value of $vectorize is the text string from which you want to generate a document’s embedding. Astra DB stores the resulting vector array in $vector.
  When you find, update, replace, or delete documents in a collection that uses vectorize, you can use $vectorize to fetch documents by vector search with vectorize. You can also use projections to include $vectorize in responses.
If you load a document that doesn’t need an embedding, then you can omit $vector and $vectorize.
-
Python
-
TypeScript
-
Java
-
curl
For more information, see the Client reference.
Insert documents with vector embeddings:
response = collection.insert_many(
[
{
"_id": 101,
"name": "John Doe",
"$vector": [.12, .52, .32],
},
{
# ID is generated automatically
"name": "Jane Doe",
"$vector": [.08, .68, .30],
},
],
)
Insert multiple documents and generate vectors automatically:
response = collection.insert_many(
[
{
"name": "John Doe",
"$vectorize": "Text to vectorize for John Doe",
},
{
"name": "Jane Doe",
"$vectorize": "Text to vectorize for Jane Doe",
},
],
)
Parameters:
Name | Type | Summary |
---|---|---|
| | An iterable of dictionaries, each a document to insert. Documents may specify their |
| | If False (default), the insertions can occur in arbitrary order and possibly concurrently. If True, they are processed sequentially. If you don’t need ordered inserts, DataStax recommends setting this parameter to False for faster performance. |
| | How many documents to include in a single API request. The default is 50, and the maximum is 100. |
| | Maximum number of concurrent requests to the API at a given time. It cannot be more than one for ordered insertions. |
| | A timeout, in milliseconds, for the operation. If not passed, the collection-level setting is used instead. If you are inserting many documents, this method requires multiple HTTP requests, so you may need to increase the timeout for the method to complete successfully. |
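To see why a large insert needs several HTTP requests, the sketch below splits a document list into chunks using the default (50) and maximum (100) sizes stated above and counts the resulting requests. It illustrates the arithmetic only and is not the client's actual implementation; chunk_documents and request_count are hypothetical helper names.

```python
from math import ceil
from typing import Any, Dict, Iterator, List

DEFAULT_CHUNK_SIZE = 50   # default stated in the table above
MAX_CHUNK_SIZE = 100      # maximum stated in the table above


def chunk_documents(
    documents: List[Dict[str, Any]], chunk_size: int = DEFAULT_CHUNK_SIZE
) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most chunk_size documents."""
    if not 1 <= chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(f"chunk_size must be between 1 and {MAX_CHUNK_SIZE}")
    for start in range(0, len(documents), chunk_size):
        yield documents[start:start + chunk_size]


def request_count(n_documents: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Number of API requests needed to send n_documents in chunks."""
    return ceil(n_documents / chunk_size)


docs = [{"seq": i} for i in range(120)]
chunk_lengths = [len(chunk) for chunk in chunk_documents(docs)]
```

With 120 documents and the default chunk size, the list splits into chunks of 50, 50, and 20 documents, so the overall timeout has to cover three requests (fewer in wall-clock terms if they run concurrently).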
Returns:
InsertManyResult - An object representing the response from the database after the insert operation. It includes information about the success of the operation and details of the inserted documents.
Example response
InsertManyResult(inserted_ids=[101, '81077d86-05dc-43ca-877d-8605dce3ca4d'], raw_results=...)
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection
collection.insert_many([{"a": 10}, {"a": 5}, {"b": [True, False, False]}])
collection.insert_many(
[{"seq": i} for i in range(50)],
concurrency=5,
)
collection.insert_many(
[
{"tag": "a", "$vector": [1, 2]},
{"tag": "b", "$vector": [3, 4]},
]
)
For more information, see the Client reference.
Insert multiple documents with vectors:
const result = await collection.insertMany([
{
_id: '1',
name: 'John Doe',
$vector: [.12, .52, .32],
},
{
name: 'Jane Doe',
$vector: [.08, .68, .30],
},
], {
ordered: true,
});
Insert multiple documents and generate vectors automatically:
const result = await collection.insertMany([
{
name: 'John Doe',
$vectorize: 'Text to vectorize for John Doe',
},
{
name: 'Jane Doe',
$vectorize: 'Text to vectorize for Jane Doe',
},
], {
ordered: true,
});
Parameters:
Name | Type | Summary |
---|---|---|
|
The documents to insert. If any document does not have an |
|
|
The options for this operation. |
Options (InsertManyOptions):
Name | Type | Summary | ||
---|---|---|---|---|
|
You may set the
|
|||
|
You can set the |
|||
|
Control how many documents are sent with each network request. The default is 50, and the maximum is 100. |
|||
|
The maximum time in milliseconds that the client should wait for the operation to complete. |
Returns:
Promise<InsertManyResult<Schema>> - A promise that resolves to the inserted IDs.
Example:
import { DataAPIClient, InsertManyError } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
try {
// Insert many documents
await collection.insertMany([
{ _id: '1', name: 'John Doe' },
{ name: 'Jane Doe' }, // Will autogen ID
], { ordered: true });
// Insert many with vectors
await collection.insertMany([
{ name: 'John Doe', $vector: [.12, .52, .32] },
{ name: 'Jane Doe', $vector: [.32, .52, .12] },
]);
} catch (e) {
if (e instanceof InsertManyError) {
console.log(e.partialResult);
}
}
})();
Operations on documents are performed at the Collection level.
Collection is a generic class with the default type of Document.
You can specify your own type, and the object is serialized by Jackson.
For more information, see the Client reference.
Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:
// Synchronous
InsertManyResult insertMany(List<? extends DOC> documents);
InsertManyResult insertMany(List<? extends DOC> documents, InsertManyOptions options);
// Asynchronous
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList);
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList, InsertManyOptions options);
Parameters:
Name | Type | Summary |
---|---|---|
|
|
A list of documents to insert.
Documents may specify their |
|
Set the different options for the insert operation. The options are The java operation As a best practice, try to always provide
The default value of If not provided the default values are |
Returns:
InsertManyResult - Wrapper with the list of inserted document IDs.
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertManyOptions;
import com.datastax.astra.client.model.InsertManyResult;
import com.datastax.astra.client.model.InsertOneResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import java.util.List;
public class InsertMany {
@Data @AllArgsConstructor
public static class Product {
@JsonProperty("_id")
private String id;
private String name;
}
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Insert multiple documents
Document doc1 = new Document("1").append("name", "joe");
Document doc2 = new Document("2").append("name", "joe");
InsertManyResult res1 = collectionDoc.insertMany(List.of(doc1, doc2));
System.out.println("Identifiers inserted: " + res1.getInsertedIds());
// Given an existing collection
Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION2_NAME", Product.class);
// Insert custom beans with explicit options
InsertManyOptions options = new InsertManyOptions()
.chunkSize(20) // number of documents sent per request
.concurrency(1) // number of concurrent requests
.ordered(false) // unordered inserts allow parallel processing
.timeout(1000); // timeout in milliseconds
InsertManyResult res2 = collectionProduct.insertMany(
List.of(new Product("1", "joe"),
new Product("2", "joe")),
options);
}
}
With insertMany, you provide an array of document objects.
The document objects have the same format as insertOne.
The Data API accepts up to 100 documents per insertMany request.
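Hand-written JSON request bodies are easy to get wrong (for example, a stray or missing comma), so one option is to build the body from native data structures and serialize it. The following minimal sketch assumes only the insertMany body shape described here (a documents array plus an options object with ordered); insert_many_body is a hypothetical helper name, and sending the request is left out.

```python
import json
from typing import Any, Dict, List

MAX_DOCUMENTS_PER_REQUEST = 100  # limit stated above


def insert_many_body(documents: List[Dict[str, Any]], ordered: bool = False) -> str:
    """Return a JSON string for an insertMany command."""
    if len(documents) > MAX_DOCUMENTS_PER_REQUEST:
        raise ValueError("The Data API accepts up to 100 documents per insertMany request.")
    payload = {
        "insertMany": {
            "documents": documents,
            "options": {"ordered": ordered},
        }
    }
    # json.dumps guarantees syntactically valid JSON (no trailing commas).
    return json.dumps(payload)


body = insert_many_body(
    [
        {"$vector": [0.1, 0.15, 0.3, 0.12, 0.05], "key1": "value1", "key2": "value2"},
        {"$vector": [0.25, 0.25, 0.25, 0.25, 0.25], "key1": "value3", "key2": "value4"},
    ]
)
```

The resulting string can be passed as the request body to any HTTP client, or written to a file and referenced from curl with --data @file.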
Insert multiple documents with vectors:
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertMany": {
"documents": [
{
"$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
"key1": "value1",
"key2": "value2"
},
{
"$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
"key1": "value3",
"key2": "value4"
},
{
"$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
"key1": "value3",
"key2": "value4"
}
],
"options": {
"ordered": false
}
}
}' | jq
Insert multiple documents and generate vectors automatically:
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertMany": {
"documents": [
{
"$vectorize": "text to vectorize for first document",
"key1": "value1",
"key2": "value2"
},
{
"$vectorize": "text to vectorize for second document",
"key1": "value3",
"key2": "value4"
},
{
"$vectorize": "text to vectorize for third document",
"key1": "value3",
"key2": "value4"
}
],
"options": {
"ordered": false
}
}
}' | jq
Parameters:
Name | Type | Summary |
---|---|---|
|
|
Data API command to insert multiple documents. You can insert up to 100 documents at a time. |
|
|
Contains the details of the records to add. It is an array of objects where each object represents a document. With the exception of reserved fields (
|
|
reserved multi-type |
An optional identifier for a document. If omitted, the server automatically generates a document ID. You can include identifiers in other fields as well. For more information, see Document IDs and The defaultId option. |
|
reserved |
An optional reserved property used to store an array of numbers representing a vector embedding for a document. Serverless (Vector) databases have specialized handling for vector data, including optimized query performance for similarity search.
|
|
reserved |
An optional reserved property used to store a string that you want to use to automatically generate an embedding for a document.
|
|
|
If false, insertions occur in an arbitrary order with possible concurrency.
If true, insertions occur sequentially.
If you don’t need ordered inserts, DataStax recommends |
Returns:
A successful response contains the _id of the inserted documents:
{
"status": {
"insertedIds": [
"4",
"7",
"10"
]
}
}
The insertedIds content depends on the ID type and how it was generated, for example:
- "insertedIds": [{"$objectId": "6672e1cbd7fabb4e5493916f"}]
- "insertedIds": [{"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"}]
For more information, see Document IDs.
Example:
Example request
The following insertMany
request adds three documents to a collection:
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertMany": {
"documents": [
{
"purchase_type": "Online",
"$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
"customer": {
"name": "Jack B.",
"phone": "123-456-2222",
"age": 34,
"credit_score": 700,
"address": {
"address_line": "888 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": { "$date": 1690391491 },
"seller": {
"name": "Tammy S.",
"location": "Staten Island NYC"
},
"items": [
{
"car": "Tesla Model 3",
"color": "White"
},
"Extended warranty - 10 years",
"Service - 5 years"
],
"amount": 53990,
"status": "active"
},
{
"purchase_type": "Online",
"$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
"customer": {
"name": "Jill D.",
"phone": "123-456-3333",
"age": 30,
"credit_score": 742,
"address": {
"address_line": "12345 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": { "$date": 1690564291 },
"seller": {
"name": "Jasmine S.",
"location": "Brooklyn NYC"
},
"items": "Extended warranty - 10 years",
"amount": 4600,
"status": "active"
},
{
"purchase_type": "In Person",
"$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
"customer": {
"name": "Rachel I.",
"phone": null,
"age": 62,
"credit_score": 786,
"address": {
"address_line": "1234 Park Ave",
"city": "New York",
"state": "NY"
}
},
"purchase_date": { "$date": 1706202691 },
"seller": {
"name": "Jon B.",
"location": "Manhattan NYC"
},
"items": [
{
"car": "BMW M440i Gran Coupe",
"color": "Silver"
},
"Extended warranty - 5 years",
"Gap Insurance - 5 years"
],
"amount": 65250,
"status": "active"
}
],
"options": {
"ordered": false
}
}
}' | jq
Bulk write (deprecated)
Bulk write is deprecated and scheduled for removal in a future client release. Use insertMany and insertOne instead. You can use a loop or other standard practice to execute multiple sequential insert operations.
Bulk write is a single command that executes a (reusable) list of insertMany operations on a collection.
It is available in the clients only.
For HTTP, use insertMany and insertOne instead.
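As one possible way to replace a bulk write with a plain loop, the sketch below runs a list of insert steps in order against any object that exposes insert_one and insert_many methods. FakeCollection is a hypothetical in-memory stand-in used only so the example runs without a database, and run_inserts and the (kind, payload) tuple format are likewise illustrative assumptions, not client APIs.

```python
from typing import Any, Dict, List, Tuple


class FakeCollection:
    """Hypothetical in-memory stand-in for a collection (illustration only)."""

    def __init__(self) -> None:
        self.documents: List[Dict[str, Any]] = []

    def insert_one(self, document: Dict[str, Any]) -> None:
        self.documents.append(document)

    def insert_many(self, documents: List[Dict[str, Any]]) -> None:
        self.documents.extend(documents)


def run_inserts(collection: Any, operations: List[Tuple[str, Any]]) -> int:
    """Execute ("insert_one", doc) and ("insert_many", docs) steps sequentially.

    Returns the number of documents submitted for insertion.
    """
    submitted = 0
    for kind, payload in operations:
        if kind == "insert_one":
            collection.insert_one(payload)
            submitted += 1
        elif kind == "insert_many":
            collection.insert_many(payload)
            submitted += len(payload)
        else:
            raise ValueError(f"Unsupported operation: {kind}")
    return submitted


collection = FakeCollection()
count = run_inserts(
    collection,
    [
        ("insert_many", [{"a": 1}, {"a": 2}]),
        ("insert_one", {"z": 9}),
    ],
)
```

Because the steps run one after another, a failure stops the loop at a known position, which makes it straightforward to log or retry the remaining operations.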
Bulk write (Python)
For more information, see the Client reference.
bw_results = collection.bulk_write(
[
InsertMany([{"a": 1}, {"a": 2}]),
ReplaceOne(
{"z": 9},
replacement={"z": 9, "replaced": True},
upsert=True,
),
],
)
Parameters:
Name | Type | Summary |
---|---|---|
|
|
An iterable over concrete subclasses of |
|
|
Whether to launch the |
|
|
Maximum number of concurrent operations executing at a given time. It cannot be more than one for ordered bulk writes. |
|
|
A timeout, in milliseconds, for the whole bulk write. This method uses the collection-level timeout by default. You may need to increase the timeout duration depending on the number of operations. If the method call times out, there’s no guarantee about how much of the bulk write was completed. |
Returns:
BulkWriteResult - A single object summarizing the whole list of requested operations. The keys in the map attributes of the result (when present) are the integer indices of the corresponding operation in the requests iterable.
BulkWriteResult(deleted_count=0, inserted_count=3, matched_count=0, modified_count=0, upserted_count=1, upserted_ids={1: '2addd676-...'}, bulk_api_results=...)
Example:
from astrapy import DataAPIClient
from astrapy.operations import (
InsertOne,
InsertMany,
UpdateOne,
UpdateMany,
ReplaceOne,
DeleteOne,
DeleteMany,
)
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection
op1 = InsertMany([{"a": 1}, {"a": 2}])
op2 = ReplaceOne({"z": 9}, replacement={"z": 9, "replaced": True}, upsert=True)
collection.bulk_write([op1, op2])
# prints: BulkWriteResult(deleted_count=0, inserted_count=3, matched_count=0, modified_count=0, upserted_count=1, upserted_ids={1: '2addd676-...'}, bulk_api_results=...)
collection.count_documents({}, upper_bound=100)
# prints: 3
collection.distinct("replaced")
# prints: [True]
Bulk write (TypeScript)
For more information, see the Client reference.
const results = await collection.bulkWrite([
{ insertOne: { document: { a: '1' } } },
{ insertOne: { document: { a: '2' } } },
{ replaceOne: { filter: { z: '9' }, replacement: { z: '9', replaced: true }, upsert: true } },
]);
Parameters:
Name | Type | Summary |
---|---|---|
|
The operations to perform. |
|
|
The options for this operation. |
Options (BulkWriteOptions):
Name | Type | Summary |
---|---|---|
|
You may set the |
|
|
You can set the Not available for ordered operations. |
|
|
The maximum time in milliseconds that the client should wait for the operation to complete. |
Returns:
Promise<BulkWriteResult<Schema>> - A promise that resolves to a summary of the performed operations.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert two documents and upsert a replacement
await collection.bulkWrite([
{ insertOne: { document: { a: 1 } } },
{ insertOne: { document: { a: 2 } } },
{ replaceOne: { filter: { z: 9 }, replacement: { z: 9, replaced: true }, upsert: true } },
]);
// 3
await collection.countDocuments({}, 100);
// [true]
await collection.distinct('replaced');
})();
Bulk write (Java)
// Synchronous
BulkWriteResult bulkWrite(List<Command> commands);
BulkWriteResult bulkWrite(List<Command> commands, BulkWriteOptions options);
// Asynchronous
CompletableFuture<BulkWriteResult> bulkWriteAsync(List<Command> commands);
CompletableFuture<BulkWriteResult> bulkWriteAsync(List<Command> commands, BulkWriteOptions options);
Parameters:
Name | Type | Summary |
---|---|---|
|
List of the generic |
|
|
Provide list of options for those commands like |
Returns:
BulkWriteResult - Wrapper with the list of responses for each command.
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.BulkWriteOptions;
import com.datastax.astra.client.model.BulkWriteResult;
import com.datastax.astra.client.model.Command;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.internal.api.ApiResponse;
import java.util.List;
public class BulkWrite {
public static void main(String[] args) {
Collection<Document> collection = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Set a couple of Commands
Command cmd1 = Command.create("insertOne").withDocument(new Document().id(1).append("name", "hello"));
Command cmd2 = Command.create("insertOne").withDocument(new Document().id(2).append("name", "hello"));
// Set the options for the bulk write
BulkWriteOptions options1 = BulkWriteOptions.Builder.ordered(false).concurrency(1);
// Execute the queries
BulkWriteResult result = collection.bulkWrite(List.of(cmd1, cmd2), options1);
// Retrieve the LIST of responses
for(ApiResponse res : result.getResponses()) {
System.out.println(res.getData());
}
}
}