Insert documents reference

Documents represent a single row or record of data in Astra DB Serverless databases. You use the Collection class to work with documents through the Data API. For instructions to get a Collection object, see the Collections reference.

Astra DB APIs use the term keyspace to refer to both namespaces and keyspaces.

For general information about working with documents, including common operations and operators, see the Documents reference.

Insert a single document

Insert a single document into a collection.

When you create a collection, you decide if the collection can store structured vector data. For vector-enabled collections, you also decide how to provide embeddings. You can either configure the collection to automatically generate embeddings with vectorize or provide embeddings when you load data (also known as bring your own embeddings). You must decide this when you create the collection.

When working with documents in the Astra Portal or Data API, there are two reserved fields for vector data:

  • The $vector parameter is a reserved field that stores vector arrays.

    • If the collection requires that you bring your own embeddings, you can include this parameter when you load data.

    • If the collection uses vectorize, you don’t include $vector when you load data. Instead, Astra DB populates the $vector field with the automatically generated embeddings.

    Regardless of the embedding generation method, when you find, update, replace, or delete documents, you can use $vector to fetch documents by vector search. You can also use projections to include $vector in responses.

  • The $vectorize parameter is a reserved field that generates embeddings automatically based on a given text string.

    • If the collection requires that you bring your own embeddings, you can not use this parameter.

    • If the collection uses vectorize, you must include this parameter when you load data. The value of $vectorize is the text string from which you want to generate a document’s embedding. Astra DB stores the resulting vector array in $vector.

    When you find, update, replace, or delete documents in a collection that uses vectorize, you can use $vectorize to fetch documents by vector search with vectorize. You can also use projections to include $vectorize in responses.

If you load a document that doesn’t need an embedding, then you can omit $vector and $vectorize.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the Client reference.

Insert a document:

insert_result = collection.insert_one({"name": "Jane Doe"})

Insert a document with an associated vector:

insert_result = collection.insert_one(
    {
      "name": "Jane Doe",
      "$vector": [.08, .68, .30],
    },
)

Insert a document and generate a vector automatically:

insert_result = collection.insert_one(
    {
      "name": "Jane Doe",
      "$vectorize": "Text to vectorize",
    },
)

Parameters:

Name Type Summary

document

Dict

The dictionary expressing the document to insert. The _id field of the document can be left out, in which case it will be created automatically. The document may contain the $vector or the $vectorize fields, but not both.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request. If not passed, the collection-level setting is used instead.

Returns:

InsertOneResult - An object representing the response from the database after the insert operation. It includes information about the success of the operation and details of the inserted documents.

Example response
InsertOneResult(inserted_id='92b4c4f4-db44-4440-b4c4-f4db44e440b8', raw_results=...)

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

# Insert a document with a specific ID
response1 = collection.insert_one(
    {
        "_id": 101,
        "name": "John Doe",
        "$vector": [.12, .52, .32],
    },
)

# Insert a document without specifying an ID
# so that _id is generated automatically
response2 = collection.insert_one(
    {
        "name": "Jane Doe",
        "$vector": [.08, .68, .30],
    },
)

For more information, see the Client reference.

const result = await collection.insertOne({ name: 'Jane Doe' });

Insert a document with an associated vector:

const result = await collection.insertOne({
  name: 'Jane Doe',
  $vector: [.08, .68, .30],
});

Insert a document and generate a vector automatically:

const result = await collection.insertOne({
  name: 'Jane Doe',
  $vectorize: 'Text to vectorize',
});

Parameters:

Name Type Summary

document

MaybeId<Schema>

The document to insert. If the document does not have an _id field, the server generates one. It may contain a $vector or $vectorize field to enable semantic searching.

options?

InsertOneOptions

The options for this operation.

Options (InsertOneOptions):

Name Type Summary

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete.

Returns:

Promise<InsertOneResult<Schema>> - A promise that resolves to the inserted ID.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert a document with a specific ID
  await collection.insertOne({ _id: '1', name: 'John Doe' });

  // Insert a document with an autogenerated ID
  await collection.insertOne({ name: 'Jane Doe' });

  // Insert a document with a vector
  await collection.insertOne({ name: 'Jane Doe', $vector: [.12, .52, .32] });
})();

Operations on documents are performed at the Collection level. For more information, see the Client reference.

Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

InsertOneResult insertOne(DOC document);
InsertOneResult insertOne(DOC document, float[] embeddings);

// Equivalent in asynchronous
CompletableFuture<InsertOneResult> insertOneAsync(DOC document);
CompletableFuture<InsertOneResult> insertOneAsync(DOC document, float[] embeddings);

Parameters:

Name Type Summary

document

DOC

Object representing the document to insert. The _id field of the document can be left out, in which case it will be created automatically. If the collection is associated with an embedding service, it will generate a vector automatically from the $vectorize field.

embeddings

float[]

A vector of embeddings (a list of numbers appropriate for the collection) for the document. Passing this parameter is equivalent to providing the vector in the $vector field of the document itself.

Returns:

InsertOneResult - Wrapper with the inserted document Id.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertOneOptions;
import com.datastax.astra.client.model.InsertOneResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;

public class InsertOne {

    @Data @AllArgsConstructor
    public static class Product {
        @JsonProperty("_id")
        private String id;
        private String name;
    }

    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Insert a document
        Document doc1 = new Document("1").append("name", "joe");
        InsertOneResult res1 = collectionDoc.insertOne(doc1);
        System.out.println(res1.getInsertedId()); // should be "1"

        // Insert a document with embeddings
        Document doc2 = new Document("2").append("name", "joe");
        collectionDoc.insertOne(doc2, new float[] {.1f, .2f});

        // Given an existing collection
        Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION2_NAME", Product.class);

        // Insert a document with custom bean
        collectionProduct.insertOne(new Product("1", "joe"));
        collectionProduct.insertOne(new Product("2", "joe"), new float[] {.1f, .2f});

    }
}

Insert a document with a predefined vector:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertOne": {
    "document": {
      "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
      "key1": "value1",
      "key2": "value2"
    }
  }
}' | jq
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertOne": {
    "document": {
      "$vectorize": "Text to use to generate a vector",
      "key1": "value1",
      "key2": "value2"
    }
  }
}' | jq

Parameters:

Name Type Summary

insertOne

command

Data API command to insert one document in a collection.

document

object

Contains the details of the record to add.

With the exception of reserved fields (_id, $vector, and $vectorize), document data can be any valid JSON, including strings, integers, booleans, dates, objects, nested objects, and arrays:

    "document": {
      "string_example": "string value",
      "object_example": {
        "a": "one",
        "b": 2,
        "nested_object": {
          "c": false
        }
      },
      "date_example": { "$date": 1690045891 },
      "array_example": [
        {
          "d.e": "hello",
          "f.g": "goodbye"
        },
        "arbitrary string in an array"
      ]
    }

_id

reserved, multi-type

An optional identifier for the document. If omitted, the server automatically generates a document ID. You can include identifiers in other fields as well. For more information, see Document IDs and The defaultId option.

$vector

reserved array

An optional reserved property used to store an array of numbers representing a vector embedding. Serverless (Vector) databases have specialized handling for vector data, including optimized query performance for similarity search.

$vector and $vectorize are mutually exclusive.

$vectorize

reserved string

An optional reserved property used to store a string that you want to use to automatically generate an embedding with vectorize.

$vector and $vectorize are mutually exclusive.

Returns:

A successful response contains the _id of the inserted document:

{
  "status": {
    "insertedIds": [
      "12"
    ]
  }
}

The insertedIds content depends on the ID type and how it was generated, for example:

  • "insertedIds": [{"$objectId": "6672e1cbd7fabb4e5493916f"}]

  • `"insertedIds": [{"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"}]"

For more information, see Document IDs.

Example:

Example with $vector
curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertOne": {
    "document": {
      "purchase_type": "Online",
      "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
      "customer": {
        "name": "Jim A.",
        "phone": "123-456-1111",
        "age": 51,
        "credit_score": 782,
        "address": {
          "address_line": "1234 Broadway",
          "city": "New York",
          "state": "NY"
        }
      },
      "purchase_date": { "$date": 1690045891 },
      "seller": {
        "name": "Jon B.",
        "location": "Manhattan NYC"
      },
      "items": [
        {
          "car": "BMW 330i Sedan",
          "color": "Silver"
        },
        "Extended warranty - 5 years"
      ],
      "amount": 47601,
      "status": "active",
      "preferred_customer": true
    }
  }
}' | jq
Example with $vectorize
curl --location "ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace/cars_collection" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--header "x-embedding-api-key;" \
--data '{
  "insertOne": {
    "document": {
      "_id": "1",
      "purchase_type": "Online",
      "$vectorize": "Purchase of a silver BMW sedan in New York.",
      "customer": {
        "name": "Jim A.",
        "phone": "123-456-1111",
        "age": 51,
        "credit_score": 782,
        "address": {
          "address_line": "1234 Broadway",
          "city": "New York",
          "state": "NY"
        }
      },
      "purchase_date": { "$date": 1690045891 },
      "seller": {
        "name": "Jon B.",
        "location": "Manhattan NYC"
      },
      "items": [
        {
          "car": "BMW 330i Sedan",
          "color": "Silver"
        },
        "Extended warranty - 5 years"
      ],
      "amount": 47601,
      "status": "active",
      "preferred_customer": true
    }
  }
}'

Insert many documents

Insert multiple documents into a collection.

When you create a collection, you decide if the collection can store structured vector data. For vector-enabled collections, you also decide how to provide embeddings. You can either configure the collection to automatically generate embeddings with vectorize or provide embeddings when you load data (also known as bring your own embeddings). You must decide this when you create the collection.

When working with documents in the Astra Portal or Data API, there are two reserved fields for vector data:

  • The $vector parameter is a reserved field that stores vector arrays.

    • If the collection requires that you bring your own embeddings, you can include this parameter when you load data.

    • If the collection uses vectorize, you don’t include $vector when you load data. Instead, Astra DB populates the $vector field with the automatically generated embeddings.

    Regardless of the embedding generation method, when you find, update, replace, or delete documents, you can use $vector to fetch documents by vector search. You can also use projections to include $vector in responses.

  • The $vectorize parameter is a reserved field that generates embeddings automatically based on a given text string.

    • If the collection requires that you bring your own embeddings, you can not use this parameter.

    • If the collection uses vectorize, you must include this parameter when you load data. The value of $vectorize is the text string from which you want to generate a document’s embedding. Astra DB stores the resulting vector array in $vector.

    When you find, update, replace, or delete documents in a collection that uses vectorize, you can use $vectorize to fetch documents by vector search with vectorize. You can also use projections to include $vectorize in responses.

If you load a document that doesn’t need an embedding, then you can omit $vector and $vectorize.

  • Python

  • TypeScript

  • Java

  • curl

For more information, see the Client reference.

Insert documents with vector embeddings:

response = collection.insert_many(
    [
        {
            "_id": 101,
            "name": "John Doe",
            "$vector": [.12, .52, .32],
        },
        {
            # ID is generated automatically
            "name": "Jane Doe",
            "$vector": [.08, .68, .30],
        },
    ],
)

Insert multiple documents and generate vectors automatically:

response = collection.insert_many(
    [
        {
            "name": "John Doe",
            "$vectorize": "Text to vectorize for John Doe",
        },
        {
            "name": "Jane Doe",
            "$vectorize": "Text to vectorize for Jane Doe",
        },
    ],
)

Parameters:

Name Type Summary

documents

Iterable[Dict[str, Any]]

An iterable of dictionaries, each a document to insert. Documents may specify their _id field or leave it out, in which case it will be added automatically. Each document may contain the $vector or the $vectorize fields, but not both.

ordered

bool

If False (default), the insertions can occur in arbitrary order and possibly concurrently. If True, they are processed sequentially. If you don’t need ordered inserts, DataStax recommends setting this parameter to False for faster performance.

DataStax recommends ordered = False, which typically results in a much higher insert throughput than an equivalent ordered insertion.

chunk_size

Optional[int]

How many documents to include in a single API request. The default is 50, and the maximum is 100.

concurrency

Optional[int]

Maximum number of concurrent requests to the API at a given time. It cannot be more than one for ordered insertions.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the operation. If not passed, the collection-level setting is used instead: If you are inserting many documents, this method will require multiple HTTP requests. You may need to increase the timeout duration for the method to complete successfully.

Returns:

InsertManyResult - An object representing the response from the database after the insert operation. It includes information about the success of the operation and details of the inserted documents.

Example response
InsertManyResult(inserted_ids=[101, '81077d86-05dc-43ca-877d-8605dce3ca4d'], raw_results=...)

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_many([{"a": 10}, {"a": 5}, {"b": [True, False, False]}])

collection.insert_many(
    [{"seq": i} for i in range(50)],
    concurrency=5,
)

collection.insert_many(
    [
        {"tag": "a", "$vector": [1, 2]},
        {"tag": "b", "$vector": [3, 4]},
    ]
)

For more information, see the Client reference.

Insert multiple documents with vectors:

const result = await collection.insertMany([
  {
    _id: '1',
    name: 'John Doe',
    $vector: [.12, .52, .32],
  },
  {
    name: 'Jane Doe',
    $vector: [.08, .68, .30],
  },
], {
  ordered: true,
});

Insert multiple documents and generate vectors automatically:

const result = await collection.insertMany([
  {
    name: 'John Doe',
    $vectorize: 'Text to vectorize for John Doe',
  },
  {
    name: 'Jane Doe',
    $vectorize: 'Text to vectorize for Jane Doe',
  },
], {
  ordered: true,
});

Parameters:

Name Type Summary

documents

MaybeId<Schema>[]

The documents to insert. If any document does not have an _id field, the server generates one. They may each contain a $vector or $vectorize field to enable semantic searching.

options?

InsertManyOptions

The options for this operation.

Options (InsertManyOptions):

Name Type Summary

ordered?

boolean

You may set the ordered option to true to stop the operation after the first error; otherwise all documents may be parallelized and processed in arbitrary order, improving, perhaps vastly, performance.

DataStax recommends ordered: false, which typically results in a much higher insert throughput than an equivalent ordered insertion.

concurrency?

number

You can set the concurrency option to control how many network requests are made in parallel on unordered insertions. Defaults to 8. This is not available for ordered insertions.

chunkSize?

number

Control how many documents are sent with each network request. The default is 50, and the maximum is 100.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete.

Returns:

Promise<InsertManyResult<Schema>> - A promise that resolves to the inserted IDs.

Example:

import { DataAPIClient, InsertManyError } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  try {
    // Insert many documents
    await collection.insertMany([
      { _id: '1', name: 'John Doe' },
      { name: 'Jane Doe' }, // Will autogen ID
    ], { ordered: true });

    // Insert many with vectors
    await collection.insertMany([
      { name: 'John Doe', $vector: [.12, .52, .32] },
      { name: 'Jane Doe', $vector: [.32, .52, .12] },
    ]);
  } catch (e) {
    if (e instanceof InsertManyError) {
      console.log(e.partialResult);
    }
  }
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson. For more information, see the Client reference.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
InsertManyResult insertMany(List<? extends DOC> documents);
InsertManyResult insertMany(List<? extends DOC> documents, InsertManyOptions options);

// Asynchronous
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList);
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList, InsertManyOptions options);

Parameters:

Name Type Summary

docList

List<? extends DOC>

A list of documents to insert. Documents may specify their _id field or leave it out, in which case it will be added automatically. If the collection is associated with an embedding service, it will generate vectors automatically from the $vectorize field in each document. You can also set the $vector field directly.

options (optional)

InsertManyOptions

Set the different options for the insert operation. The options are ordered, concurrency, chunkSize.

The java operation insertMany can take as many documents as you want as long as it fits in your JVM memory. It will split the documents in chunks of chunkSize and send them to the server in a distributed way through an ExecutorService.

As a best practice, try to always provide InsertManyOptions, even when using defaults, because it brings visibility to the readers:

InsertManyOptions.Builder
  .chunkSize(20)  // batch size, 100 is max
  .concurrency(8) // concurrent insertions
  .ordered(false) // unordered insertions
  .build();

The default value of chunkSize is 50, and the maximum value is 100. To set the size of the executor use concurrency. DataStax recommends ordered(false) for performance reasons because it can insert chunks in parallel.

If not provided the default values are chunkSize=50, concurrency=1 and ordered=false.

Returns:

InsertManyResult - Wrapper with the list of inserted document ids.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertManyOptions;
import com.datastax.astra.client.model.InsertManyResult;
import com.datastax.astra.client.model.InsertOneResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;

import java.util.List;

public class InsertMany {

    @Data @AllArgsConstructor
    public static class Product {
        @JsonProperty("_id")
        private String id;
        private String name;
    }

    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Insert a document
        Document doc1 = new Document("1").append("name", "joe");
        Document doc2 = new Document("2").append("name", "joe");
        InsertManyResult res1 = collectionDoc.insertMany(List.of(doc1, doc2));
        System.out.println("Identifiers inserted: " + res1.getInsertedIds());

        // Given an existing collection
        Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION2_NAME", Product.class);

        // Insert a document with embeddings
        InsertManyOptions options = new InsertManyOptions()
                .chunkSize(20)  // how many process per request
                .concurrency(1) // parallel processing
                .ordered(false) // allows parallel processing
                .timeout(1000); // timeout in millis

        InsertManyResult res2 = collectionProduct.insertMany(
                List.of(new Product("1", "joe"),
                        new Product("2", "joe")),
                options);
    }
}

With insertMany, you provide an array of document objects. The document objects have the same format as insertOne.

The Data API accepts up to 100 documents per insertMany request.

Insert multiple documents with vectors:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertMany": {
    "documents": [
      {
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
        "key1": "value1",
        "key2": "value2"
      },
      {
        "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
        "key1": "value3",
        "key2": "value4"
      },
      {
        "$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
        "key1": "value3",
        "key2": "value4"
      },
    ]
    "options": {
      "ordered": false
    }
  }
}' | jq

Insert multiple documents and generate vectors automatically:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertMany": {
    "documents": [
      {
        "$vectorize": "text to vectorize for first document",
        "key1": "value1",
        "key2": "value2"
      },
      {
        "$vectorize": "text to vectorize for second document",
        "key1": "value3",
        "key2": "value4"
      },
      {
        "$vectorize": "text to vectorize for third document",
        "key1": "value3",
        "key2": "value4"
      },
    ]
    "options": {
      "ordered": false
    }
  }
}' | jq

Parameters:

Name Type Summary

insertMany

command

Data API command to insert multiple documents. You can insert up to 100 documents at a time.

documents

array of objects

Contains the details of the records to add. It is an array of objects where each object represents a document.

With the exception of reserved fields (_id, $vector, and $vectorize), document data can be any valid JSON, including strings, integers, booleans, dates, objects, nested objects, and arrays:

    "documents": [
      {
        "string_example": "string value",
        "object_example": {
          "a": "one",
          "b": 2,
          "nested_object": {
            "c": false
          }
        },
        "date_example": { "$date": 1690045891 },
        "array_example": [
          {
            "d.e": "hello",
            "f.g": "goodbye"
          },
          "arbitrary string in an array"
        ]
      }
    ]

_id

reserved multi-type

An optional identifier for a document. If omitted, the server automatically generates a document ID. You can include identifiers in other fields as well. For more information, see Document IDs and The defaultId option.

$vector

reserved array

An optional reserved property used to store an array of numbers representing a vector embedding for a document. Serverless (Vector) databases have specialized handling for vector data, including optimized query performance for similarity search.

$vector and $vectorize are mutually exclusive.

$vectorize

reserved string

An optional reserved property used to store a string that you want to use to automatically generate an embedding for a document.

$vector and $vectorize are mutually exclusive.

options.ordered

boolean

If false, insertions occur in an arbitrary order with possible concurrency. If true, insertions occur sequentially. If you don’t need ordered inserts, DataStax recommends "ordered": false, which typically results in a much higher insert throughput than an equivalent ordered insertion.

Returns:

A successful response contains the _id of the inserted documents:

{
  "status": {
    "insertedIds": [
      "4",
      "7",
      "10"
    ]
  }
}

The insertedIds content depends on the ID type and how it was generated, for example:

  • "insertedIds": [{"$objectId": "6672e1cbd7fabb4e5493916f"}]

  • `"insertedIds": [{"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"}]"

For more information, see Document IDs.

Example:

Example request

The following insertMany request adds three documents to a collection:

curl -sS --location -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertMany": {
    "documents": [
      {
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
        "customer": {
          "name": "Jack B.",
          "phone": "123-456-2222",
        "age": 34,
        "credit_score": 700,
          "address": {
            "address_line": "888 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": { "$date": 1690391491 },
        "seller": {
          "name": "Tammy S.",
          "location": "Staten Island NYC"
        },
        "items": [
          {
            "car": "Tesla Model 3",
            "color": "White"
          },
          "Extended warranty - 10 years",
            "Service - 5 years"
        ],
        "amount": 53990,
      "status": "active"
      },
      {
        "purchase_type": "Online",
        "$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
        "customer": {
          "name": "Jill D.",
          "phone": "123-456-3333",
        "age": 30,
        "credit_score": 742,
          "address": {
            "address_line": "12345 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": { "$date": 1690564291 },
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": "Extended warranty - 10 years",
        "amount": 4600,
        "status": "active"
      },
      {
        "purchase_type": "In Person",
        "$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
        "customer": {
          "name": "Rachel I.",
          "phone": null,
        "age": 62,
        "credit_score": 786,
          "address": {
            "address_line": "1234 Park Ave",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": { "$date": 1706202691 },
        "seller": {
          "name": "Jon B.",
          "location": "Manhattan NYC"
        },
        "items": [
          {
            "car": "BMW M440i Gran Coupe",
            "color": "Silver"
          },
          "Extended warranty - 5 years",
          "Gap Insurance - 5 years"
        ],
        "amount": 65250,
        "status": "active"
      }
    ],
    "options": {
      "ordered": false
    }
  }
}' | jq

Bulk write (deprecated)

Bulk write is deprecated and scheduled for removal in a future client release.

Instead, use insertMany and insertOne instead. You can use a loop or other standard practice to execute multiple sequential insert operations.

Bulk write is a single command that executes a (reusable) list of insertMany operations on a collection. It is available in the clients only. For HTTP, use insertMany and insertOne instead.

Bulk write (Python)

For more information, see the Client reference.

bw_results = collection.bulk_write(
    [
        InsertMany([{"a": 1}, {"a": 2}]),
        ReplaceOne(
            {"z": 9},
            replacement={"z": 9, "replaced": True},
            upsert=True,
        ),
    ],
)

Parameters:

Name Type Summary

requests

Iterable[BaseOperation]

An iterable over concrete subclasses of BaseOperation, such as InsertMany or ReplaceOne. Each such object represents an operation ready to be executed on a collection, and is instantiated by passing the same parameters as one would the corresponding collection method.

ordered

bool

Whether to launch the requests one after the other or in arbitrary order, possibly in a concurrent fashion. DataStax suggests False (default) when possible for faster performance.

concurrency

Optional[int]

Maximum number of concurrent operations executing at a given time. It cannot be more than one for ordered bulk writes.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the whole bulk write. This method uses the collection-level timeout by default. You may need to increase the timeout duration depending on the number of operations. If the method call times out, there’s no guarantee about how much of the bulk write was completed.

Returns:

BulkWriteResult - A single object summarizing the whole list of requested operations. The keys in the map attributes of the result (when present) are the integer indices of the corresponding operation in the requests iterable.

Example response
BulkWriteResult(deleted_count=0, inserted_count=3, matched_count=0, modified_count=0, upserted_count=1, upserted_ids={1: '2addd676-...'}, bulk_api_results=...)

Example:

from astrapy import DataAPIClient
from astrapy.operations import (
    InsertOne,
    InsertMany,
    UpdateOne,
    UpdateMany,
    ReplaceOne,
    DeleteOne,
    DeleteMany,
)
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

op1 = InsertMany([{"a": 1}, {"a": 2}])
op2 = ReplaceOne({"z": 9}, replacement={"z": 9, "replaced": True}, upsert=True)
collection.bulk_write([op1, op2])
# prints: BulkWriteResult(deleted_count=0, inserted_count=3, matched_count=0, modified_count=0, upserted_count=1, upserted_ids={1: '2addd676-...'}, bulk_api_results=...)
collection.count_documents({}, upper_bound=100)
# prints: 3
collection.distinct("replaced")
# prints: [True]
Bulk write (TypeScript)

For more information, see the Client reference.

const results = await collection.bulkWrite([
  { insertOne: { a: '1' } },
  { insertOne: { a: '2' } },
  { replaceOne: { z: '9' }, replacement: { z: '9', replaced: true }, upsert: true },
]);

Parameters:

Name Type Summary

operations

AnyBulkWriteOperation<Schema>[]

The operations to perform.

options?

BulkWriteOptions

The options for this operation.

Options (BulkWriteOptions):

Name Type Summary

ordered?

boolean

You may set the ordered option to true to stop the operation after the first error; otherwise all operations may be parallelized and processed in arbitrary order, improving, perhaps vastly, performance.

concurrency?

number

You can set the concurrency option to control how many network requests are made in parallel on unordered operations. Defaults to 8.

Not available for ordered operations.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete.

Returns:

Promise<BulkWriteResult<Schema>> - A promise that resolves to a summary of the performed operations.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  // Insert some document
  await collection.bulkWrite([
    { insertOne: { document: { a: 1 } } },
    { insertOne: { document: { a: 2 } } },
    { replaceOne: { filter: { z: 9 }, replacement: { z: 9, replaced: true }, upsert: true } },
  ]);

  // 3
  await collection.countDocuments({}, 100);

  // [true]
  await collection.distinct('replaced');
})();
Bulk write (Java)
// Synchronous
BulkWriteResult bulkWrite(List<Command> commands);
BulkWriteResult bulkWrite(List<Command> commands, BulkWriteOptions options);

// Asynchronous
CompletableFuture<BulkWriteResult> bulkWriteAsync(List<Command> commands);
CompletableFuture<BulkWriteResult> bulkWriteAsync(List<Command> commands, BulkWriteOptions options);

Parameters:

Name Type Summary

commands

List<Command>

List of the generic Command to execute.

options (optional)

BulkWriteOptions

Provide list of options for those commands like ordered or concurrency.

Returns:

BulkWriteResult - Wrapper with the list of responses for each command.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.BulkWriteOptions;
import com.datastax.astra.client.model.BulkWriteResult;
import com.datastax.astra.client.model.Command;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.internal.api.ApiResponse;

import java.util.List;

public class BulkWrite {
    public static void main(String[] args) {
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Set a couple of Commands
        Command cmd1 = Command.create("insertOne").withDocument(new Document().id(1).append("name", "hello"));
        Command cmd2 = Command.create("insertOne").withDocument(new Document().id(2).append("name", "hello"));

        // Set the options for the bulk write
        BulkWriteOptions options1 = BulkWriteOptions.Builder.ordered(false).concurrency(1);

        // Execute the queries
        BulkWriteResult result = collection.bulkWrite(List.of(cmd1, cmd2), options1);

        // Retrieve the LIST of responses
        for(ApiResponse res : result.getResponses()) {
            System.out.println(res.getData());
        }
    }

}

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com