Insert documents

A document represents a single row or record of data in an Astra DB Serverless database.

You use the Collection class to work with documents through the Data API clients. For instructions to get a Collection object, see Work with collections.

For general information about working with documents, including common operations and operators, see Work with documents.

For more information about the Data API and clients, see Get started with the Data API.

Insert many documents

Insert multiple documents into a collection.

For information about the $vector and $vectorize reserved fields, see Vector and vectorize.

Python
TypeScript
Java
curl

For more information, see the Client reference.

Insert documents with vector embeddings:

response = collection.insert_many(
    [
        {
            "_id": 101,
            "name": "John Doe",
            "$vector": [.12, .52, .32],
        },
        {
            # ID is generated automatically
            "name": "Jane Doe",
            "$vector": [.08, .68, .30],
        },
    ],
)

Insert multiple documents and generate vector embeddings automatically:

response = collection.insert_many(
    [
        {
            "name": "John Doe",
            "$vectorize": "Text to vectorize for John Doe",
        },
        {
            "name": "Jane Doe",
            "$vectorize": "Text to vectorize for Jane Doe",
        },
    ],
)

Parameters:

Name Type Summary

documents

Iterable[Dict[str, Any]]

An iterable of dictionaries, each a document to insert. Documents may specify their _id field or leave it out, in which case it will be added automatically. Each document may contain the $vector or the $vectorize fields, but not both.

ordered

bool

If False (default), the insertions can occur in arbitrary order and possibly concurrently. If True, they are processed sequentially. If you don’t need ordered inserts, DataStax recommends ordered = False, which typically results in a much higher insert throughput than an equivalent ordered insertion.

chunk_size

Optional[int]

How many documents to include in a single API request. The default is 50, and the maximum is 100.

concurrency

Optional[int]

Maximum number of concurrent requests to the API at a given time. It cannot be more than one for ordered insertions.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the operation. If not passed, the collection-level setting is used instead. Inserting many documents requires multiple HTTP requests, so you may need to increase the timeout for the method to complete successfully.
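
Because chunk_size and concurrency together determine how many HTTP requests a large insert needs, a rough count can help when choosing max_time_ms. The following is a minimal sketch in plain Python. The helper name estimate_requests is hypothetical, it is not part of the client, and it only mirrors the parameter descriptions above (default chunk size of 50, maximum of 100):

```python
import math
from typing import Tuple


def estimate_requests(
    num_documents: int, chunk_size: int = 50, concurrency: int = 1
) -> Tuple[int, int]:
    """Return (total_requests, sequential_rounds) for a planned insert.

    Hypothetical helper for illustration only. It follows the parameter
    descriptions above and does not reflect the client's internals.
    """
    if not 1 <= chunk_size <= 100:
        raise ValueError("chunk_size must be between 1 and 100")
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # One request per chunk of documents.
    total_requests = math.ceil(num_documents / chunk_size)
    # Requests that cannot run at the same time must wait for another round.
    sequential_rounds = math.ceil(total_requests / concurrency)
    return total_requests, sequential_rounds


print(estimate_requests(1020))                                 # (21, 21)
print(estimate_requests(1020, chunk_size=100, concurrency=5))  # (11, 3)
```

Multiplying the number of rounds by a typical per-request latency gives a starting point for the timeout, not a guarantee.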

Returns:

InsertManyResult - An object representing the response from the database after the insert operation. It includes information about the success of the operation and details of the inserted documents.

Example response

InsertManyResult(inserted_ids=[101, '81077d86-05dc-43ca-877d-8605dce3ca4d'], raw_results=...)

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection

collection.insert_many([{"a": 10}, {"a": 5}, {"b": [True, False, False]}])

collection.insert_many(
    [{"seq": i} for i in range(50)],
    concurrency=5,
)

collection.insert_many(
    [
        {"tag": "a", "$vector": [1, 2]},
        {"tag": "b", "$vector": [3, 4]},
    ]
)

For more information, see the Client reference.

Insert multiple documents with vectors:

const result = await collection.insertMany([
  {
    _id: '1',
    name: 'John Doe',
    $vector: [.12, .52, .32],
  },
  {
    name: 'Jane Doe',
    $vector: [.08, .68, .30],
  },
], {
  ordered: true,
});

Insert multiple documents and generate vector embeddings automatically:

const result = await collection.insertMany([
  {
    name: 'John Doe',
    $vectorize: 'Text to vectorize for John Doe',
  },
  {
    name: 'Jane Doe',
    $vectorize: 'Text to vectorize for Jane Doe',
  },
], {
  ordered: true,
});

Parameters:

Name Type Summary

documents

MaybeId<Schema>[]

The documents to insert. If any document does not have an _id field, the server generates one. They may each contain a $vector or $vectorize field to enable semantic searching.

options?

InsertManyOptions

The options for this operation.

Options (InsertManyOptions):

Name Type Summary

ordered?

boolean

If true, the operation stops after the first error. Otherwise, documents may be processed in parallel and in arbitrary order, which can improve performance significantly.

DataStax recommends ordered: false, which typically results in a much higher insert throughput than an equivalent ordered insertion.

concurrency?

number

You can set the concurrency option to control how many network requests are made in parallel on unordered insertions. Defaults to 8. This is not available for ordered insertions.

chunkSize?

number

Control how many documents are sent with each network request. The default is 50, and the maximum is 100.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete.

Returns:

Promise<InsertManyResult<Schema>> - A promise that resolves to the inserted IDs.

Example:

import { DataAPIClient, InsertManyError } from '@datastax/astra-db-ts';

// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');

(async function () {
  try {
    // Insert many documents
    await collection.insertMany([
      { _id: '1', name: 'John Doe' },
      { name: 'Jane Doe' }, // Will autogen ID
    ], { ordered: true });

    // Insert many with vectors
    await collection.insertMany([
      { name: 'John Doe', $vector: [.12, .52, .32] },
      { name: 'Jane Doe', $vector: [.32, .52, .12] },
    ]);
  } catch (e) {
    if (e instanceof InsertManyError) {
      console.log(e.partialResult);
    }
  }
})();

Operations on documents are performed at the Collection level. Collection is a generic class with the default type of Document. You can specify your own type, and the object is serialized by Jackson. For more information, see the Client reference.

Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:

// Synchronous
InsertManyResult insertMany(List<? extends DOC> documents);
InsertManyResult insertMany(List<? extends DOC> documents, InsertManyOptions options);

// Asynchronous
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList);
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList, InsertManyOptions options);

Parameters:

Name Type Summary

docList

List<? extends DOC>

A list of documents to insert. Documents may specify their _id field or leave it out, in which case it will be added automatically. If the collection is associated with an embedding service, it will generate vector embeddings automatically from the $vectorize field in each document. You can also set the $vector field directly.

options (optional)

InsertManyOptions

Set the different options for the insert operation. The options are ordered, concurrency, chunkSize.

The Java operation insertMany can take as many documents as you want, as long as they fit in your JVM memory. It splits the documents into chunks of chunkSize and sends them to the server in parallel through an ExecutorService.

As a best practice, always provide InsertManyOptions, even when using the defaults, because it makes the settings explicit to readers of the code:

InsertManyOptions.Builder
  .chunkSize(20)  // batch size, 100 is max
  .concurrency(8) // concurrent insertions
  .ordered(false) // unordered insertions
  .build();

The default value of chunkSize is 50, and the maximum value is 100. To set the size of the executor use concurrency. DataStax recommends ordered(false) for performance reasons because it can insert chunks in parallel.

If options are not provided, the default values are chunkSize=50, concurrency=1, and ordered=false.

Returns:

InsertManyResult - Wrapper with the list of inserted document ids.

Example:

package com.datastax.astra.client.collection;

import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertManyOptions;
import com.datastax.astra.client.model.InsertManyResult;
import com.datastax.astra.client.model.InsertOneResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;

import java.util.List;

public class InsertMany {

    @Data @AllArgsConstructor
    public static class Product {
        @JsonProperty("_id")
        private String id;
        private String name;
    }

    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // Insert two documents
        Document doc1 = new Document("1").append("name", "joe");
        Document doc2 = new Document("2").append("name", "joe");
        InsertManyResult res1 = collectionDoc.insertMany(List.of(doc1, doc2));
        System.out.println("Identifiers inserted: " + res1.getInsertedIds());

        // Given an existing collection
        Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION2_NAME", Product.class);

        // Insert typed documents with explicit options
        InsertManyOptions options = new InsertManyOptions()
                .chunkSize(20)  // documents per request
                .concurrency(1) // number of parallel requests
                .ordered(false) // allows parallel processing
                .timeout(1000); // timeout in milliseconds

        InsertManyResult res2 = collectionProduct.insertMany(
                List.of(new Product("1", "joe"),
                        new Product("2", "joe")),
                options);
    }
}

With insertMany, you provide an array of document objects. Each document object has the same format as for insertOne.

The Data API accepts up to 100 documents per insertMany request.
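
When a data set is larger than this limit, it has to be split across several requests. As a rough illustration, the sketch below builds one JSON request body per batch in plain Python. The function name build_insert_many_payloads is hypothetical, and sending the bodies (for example with curl or an HTTP library) is left out:

```python
import json
from typing import Any, Dict, Iterator, List

MAX_DOCS_PER_REQUEST = 100  # per-request limit stated above


def build_insert_many_payloads(
    documents: List[Dict[str, Any]],
    ordered: bool = False,
    batch_size: int = MAX_DOCS_PER_REQUEST,
) -> Iterator[str]:
    """Yield one insertMany JSON body per batch of documents.

    Hypothetical helper for illustration; it only prepares request bodies.
    """
    if not 1 <= batch_size <= MAX_DOCS_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 100")
    for start in range(0, len(documents), batch_size):
        body = {
            "insertMany": {
                "documents": documents[start:start + batch_size],
                "options": {"ordered": ordered},
            }
        }
        yield json.dumps(body)


docs = [{"key1": f"value{i}"} for i in range(250)]
payloads = list(build_insert_many_payloads(docs))
print(len(payloads))  # 3
print([len(json.loads(p)["insertMany"]["documents"]) for p in payloads])  # [100, 100, 50]
```

Serializing with json.dumps also avoids hand-written JSON mistakes such as trailing commas.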

Insert multiple documents with vectors:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertMany": {
    "documents": [
      {
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
        "key1": "value1",
        "key2": "value2"
      },
      {
        "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
        "key1": "value3",
        "key2": "value4"
      },
      {
        "$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
        "key1": "value3",
        "key2": "value4"
      }
    ],
    "options": {
      "ordered": false
    }
  }
}' | jq

Insert multiple documents and generate vector embeddings automatically:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertMany": {
    "documents": [
      {
        "$vectorize": "text to vectorize for first document",
        "key1": "value1",
        "key2": "value2"
      },
      {
        "$vectorize": "text to vectorize for second document",
        "key1": "value3",
        "key2": "value4"
      },
      {
        "$vectorize": "text to vectorize for third document",
        "key1": "value3",
        "key2": "value4"
      }
    ],
    "options": {
      "ordered": false
    }
  }
}' | jq

Parameters:

Name Type Summary

insertMany

command

Data API command to insert multiple documents. You can insert up to 100 documents per request.

documents

array

Contains the details of the records to add. It is an array of objects where each object represents a document.

With the exception of reserved fields (_id, $vector, and $vectorize), document data can be any valid JSON, including strings, integers, booleans, dates, objects, nested objects, and arrays:

    "documents": [
      {
        "string_example": "string value",
        "object_example": {
          "a": "one",
          "b": 2,
          "nested_object": {
            "c": false
          }
        },
        "date_example": { "$date": 1690045891 },
        "array_example": [
          {
            "d.e": "hello",
            "f.g": "goodbye"
          },
          "arbitrary string in an array"
        ]
      }
    ]

_id

reserved multi-type

An optional identifier for a document. If omitted, the server automatically generates a document ID. You can include identifiers in other fields as well. For more information, see Document IDs and The defaultId option.

$vector

reserved array

An optional reserved property used to store an array of numbers representing a vector embedding for a document. Serverless (Vector) databases have specialized handling for vector data, including optimized query performance for similarity search.

$vector and $vectorize are mutually exclusive.

$vectorize

reserved string

An optional reserved property used to store a string that you want to use to automatically generate an embedding for a document.

$vector and $vectorize are mutually exclusive.

options.ordered

boolean

If false, insertions occur in an arbitrary order with possible concurrency. If true, insertions occur sequentially. If you don’t need ordered inserts, DataStax recommends "ordered": false, which typically results in a much higher insert throughput than an equivalent ordered insertion.
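
Because $vector and $vectorize are mutually exclusive, it can be useful to check documents on the client before building a request. A minimal sketch in Python, assuming the documents are plain dictionaries; the helper name find_conflicting_documents is hypothetical:

```python
from typing import Any, Dict, Iterable, List


def find_conflicting_documents(documents: Iterable[Dict[str, Any]]) -> List[int]:
    """Return the positions of documents that set both reserved vector fields."""
    return [
        index
        for index, doc in enumerate(documents)
        if "$vector" in doc and "$vectorize" in doc
    ]


documents = [
    {"key1": "value1", "$vector": [0.1, 0.15, 0.3, 0.12, 0.05]},
    {"key1": "value3", "$vectorize": "text to vectorize for second document"},
    # This document sets both fields and is reported below.
    {"key1": "value5", "$vector": [0.25] * 5, "$vectorize": "conflicting text"},
]
print(find_conflicting_documents(documents))  # [2]
```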

Returns:

A successful response contains the _id of each inserted document:

{
  "status": {
    "insertedIds": [
      "4",
      "7",
      "10"
    ]
  }
}

The insertedIds content depends on the ID type and how it was generated, for example:

"insertedIds": [{"$objectId": "6672e1cbd7fabb4e5493916f"}]
"insertedIds": [{"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"}]

For more information, see Document IDs.
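
Since insertedIds can mix plain values with wrapped ones such as $uuid and $objectId, client code that reads a raw response may want to normalize them. The following is a minimal sketch in Python using only the standard library. The function name normalize_inserted_ids is hypothetical, and keeping $objectId values as hex strings is an assumption made for this example:

```python
import json
import uuid
from typing import Any, List


def normalize_inserted_ids(response_text: str) -> List[Any]:
    """Unwrap the insertedIds of a raw insertMany response body."""
    status = json.loads(response_text).get("status", {})
    normalized: List[Any] = []
    for raw_id in status.get("insertedIds", []):
        if isinstance(raw_id, dict) and "$uuid" in raw_id:
            # Convert the wrapped string into a uuid.UUID object.
            normalized.append(uuid.UUID(raw_id["$uuid"]))
        elif isinstance(raw_id, dict) and "$objectId" in raw_id:
            # Assumption: keep the ObjectId as its hex string.
            normalized.append(raw_id["$objectId"])
        else:
            # Plain IDs (strings, numbers) pass through unchanged.
            normalized.append(raw_id)
    return normalized


sample = json.dumps({
    "status": {
        "insertedIds": [
            "4",
            {"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"},
            {"$objectId": "6672e1cbd7fabb4e5493916f"},
        ]
    }
})
print(normalize_inserted_ids(sample))
```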

Example:

Example request

The following insertMany request adds three documents to a collection:

curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
  "insertMany": {
    "documents": [
      {
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
        "customer": {
          "name": "Jack B.",
          "phone": "123-456-2222",
          "age": 34,
          "credit_score": 700,
          "address": {
            "address_line": "888 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": { "$date": 1690391491 },
        "seller": {
          "name": "Tammy S.",
          "location": "Staten Island NYC"
        },
        "items": [
          {
            "car": "Tesla Model 3",
            "color": "White"
          },
          "Extended warranty - 10 years",
          "Service - 5 years"
        ],
        "amount": 53990,
        "status": "active"
      },
      {
        "purchase_type": "Online",
        "$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
        "customer": {
          "name": "Jill D.",
          "phone": "123-456-3333",
          "age": 30,
          "credit_score": 742,
          "address": {
            "address_line": "12345 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": { "$date": 1690564291 },
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": "Extended warranty - 10 years",
        "amount": 4600,
        "status": "active"
      },
      {
        "purchase_type": "In Person",
        "$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
        "customer": {
          "name": "Rachel I.",
          "phone": null,
          "age": 62,
          "credit_score": 786,
          "address": {
            "address_line": "1234 Park Ave",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": { "$date": 1706202691 },
        "seller": {
          "name": "Jon B.",
          "location": "Manhattan NYC"
        },
        "items": [
          {
            "car": "BMW M440i Gran Coupe",
            "color": "Silver"
          },
          "Extended warranty - 5 years",
          "Gap Insurance - 5 years"
        ],
        "amount": 65250,
        "status": "active"
      }
    ],
    "options": {
      "ordered": false
    }
  }
}' | jq
