Insert documents reference
Documents represent a single row or record of data in Astra DB Serverless databases.
You use the Collection class to work with documents through the Data API clients.
For instructions to get a Collection object, see Work with collections.
For general information about working with documents, including common operations and operators, see Work with documents.
For more information about the Data API and clients, see Get started with the Data API.
Insert a single document
Insert a single document into a collection.
For information about the $vector and $vectorize reserved fields, see Vector and vectorize.
- Python
- TypeScript
- Java
- curl
Python
For more information, see the Client reference.
Insert a document:
insert_result = collection.insert_one({"name": "Jane Doe"})
Insert a document with an associated vector:
insert_result = collection.insert_one(
{
"name": "Jane Doe",
"$vector": [.08, .68, .30],
},
)
Insert a document and generate an embedding automatically:
insert_result = collection.insert_one(
{
"name": "Jane Doe",
"$vectorize": "Text to vectorize",
},
)
Parameters:
Name | Type | Summary |
---|---|---|
document | dict | The dictionary expressing the document to insert. The _id field can be omitted, in which case it is generated automatically. |
max_time_ms | Optional[int] | A timeout, in milliseconds, for the underlying HTTP request. If not passed, the collection-level setting is used instead. |
Returns:
InsertOneResult
- An object representing the response from the database after the insert operation. It includes information about the success of the operation and the ID of the inserted document.
Example response
InsertOneResult(inserted_id='92b4c4f4-db44-4440-b4c4-f4db44e440b8', raw_results=...)
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection
# Insert a document with a specific ID
response1 = collection.insert_one(
{
"_id": 101,
"name": "John Doe",
"$vector": [.12, .52, .32],
},
)
# Insert a document without specifying an ID
# so that _id is generated automatically
response2 = collection.insert_one(
{
"name": "Jane Doe",
"$vector": [.08, .68, .30],
},
)
TypeScript
For more information, see the Client reference.
Insert a document:
const result = await collection.insertOne({ name: 'Jane Doe' });
Insert a document with an associated vector:
const result = await collection.insertOne({
name: 'Jane Doe',
$vector: [.08, .68, .30],
});
Insert a document and generate an embedding automatically:
const result = await collection.insertOne({
name: 'Jane Doe',
$vectorize: 'Text to vectorize',
});
Parameters:
Name | Type | Summary |
---|---|---|
document | MaybeId<Schema> | The document to insert. If the document does not have an _id field, the server generates one. |
options? | InsertOneOptions | The options for this operation. |
Options (InsertOneOptions):
Name | Type | Summary |
---|---|---|
maxTimeMS? | number | The maximum time in milliseconds that the client should wait for the operation to complete. |
Returns:
Promise<InsertOneResult<Schema>>
- A promise that resolves to the inserted ID.
Example:
import { DataAPIClient } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
// Insert a document with a specific ID
await collection.insertOne({ _id: '1', name: 'John Doe' });
// Insert a document with an autogenerated ID
await collection.insertOne({ name: 'Jane Doe' });
// Insert a document with a vector
await collection.insertOne({ name: 'Jane Doe', $vector: [.12, .52, .32] });
})();
Java
Operations on documents are performed at the Collection level.
For more information, see the Client reference.
Collection is a generic class with the default type of Document.
You can specify your own type, and the object is serialized by Jackson.
Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:
InsertOneResult insertOne(DOC document);
InsertOneResult insertOne(DOC document, float[] embeddings);
// Equivalent in asynchronous
CompletableFuture<InsertOneResult> insertOneAsync(DOC document);
CompletableFuture<InsertOneResult> insertOneAsync(DOC document, float[] embeddings);
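Parameters:
Name | Type | Summary |
---|---|---|
document | DOC | Object representing the document to insert. The _id field is optional; if it is omitted, the server generates an ID automatically. |
embeddings | float[] | A vector of embeddings (a list of numbers appropriate for the collection) for the document. Passing this parameter is equivalent to providing the vector in the $vector field of the document. |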
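Passing the Java embeddings argument has the same effect as placing the vector in the document's own $vector field. The following Python sketch illustrates that equivalence at the document level. The with_vector helper and its optional dimension check are hypothetical illustrations, not part of any Data API client; dimension is assumed to be the vector dimension your collection was created with.

```python
from typing import Any, Dict, List, Optional


def with_vector(document: Dict[str, Any], embeddings: List[float],
                dimension: Optional[int] = None) -> Dict[str, Any]:
    """Return a copy of document with embeddings stored in the reserved $vector field.

    dimension is a hypothetical check: pass the vector dimension of your
    collection to catch length mismatches before anything is sent.
    """
    if dimension is not None and len(embeddings) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(embeddings)}")
    merged = dict(document)  # shallow copy, so the caller's dict is unchanged
    merged["$vector"] = [float(x) for x in embeddings]
    return merged


doc = {"_id": "2", "name": "joe"}
print(with_vector(doc, [0.1, 0.2]))
```

The returned dictionary is what you would pass as the document in a single-argument insert call.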
Returns:
InsertOneResult
- Wrapper with the inserted document ID.
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertOneOptions;
import com.datastax.astra.client.model.InsertOneResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
public class InsertOne {
@Data @AllArgsConstructor
public static class Product {
@JsonProperty("_id")
private String id;
private String name;
}
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Insert a document
Document doc1 = new Document("1").append("name", "joe");
InsertOneResult res1 = collectionDoc.insertOne(doc1);
System.out.println(res1.getInsertedId()); // should be "1"
// Insert a document with embeddings
Document doc2 = new Document("2").append("name", "joe");
collectionDoc.insertOne(doc2, new float[] {.1f, .2f});
// Given an existing collection
Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION2_NAME", Product.class);
// Insert a document with custom bean
collectionProduct.insertOne(new Product("1", "joe"));
collectionProduct.insertOne(new Product("2", "joe"), new float[] {.1f, .2f});
}
}
curl
Insert a document with a predefined vector:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertOne": {
"document": {
"$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
"key1": "value1",
"key2": "value2"
}
}
}' | jq
Insert a document and generate an embedding automatically:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertOne": {
"document": {
"$vectorize": "Text to use to generate an embedding",
"key1": "value1",
"key2": "value2"
}
}
}' | jq
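If you are not using one of the clients, any HTTP library can send the same request as the curl commands above. The following Python sketch uses only the standard library to build, but not send, an equivalent insertOne request. build_insert_one_request is a hypothetical helper, the endpoint is assumed to include its https:// scheme, and all endpoint, keyspace, collection, and token values are placeholders.

```python
import json
import urllib.request
from typing import Any, Dict


def build_insert_one_request(api_endpoint: str, keyspace: str, collection: str,
                             token: str, document: Dict[str, Any]) -> urllib.request.Request:
    """Build, without sending, the POST request used by the curl insertOne examples.

    api_endpoint is assumed to include the scheme, for example https://...
    """
    url = f"{api_endpoint}/api/json/v1/{keyspace}/{collection}"
    body = json.dumps({"insertOne": {"document": document}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Token": token, "Content-Type": "application/json"},
    )


request = build_insert_one_request(
    "https://DATABASE_ID-REGION.apps.astra.datastax.com",  # placeholder endpoint
    "ASTRA_DB_KEYSPACE",
    "ASTRA_DB_COLLECTION",
    "ASTRA_DB_APPLICATION_TOKEN",
    {"$vector": [0.25, 0.25, 0.25, 0.25, 0.25], "key1": "value1", "key2": "value2"},
)
print(request.get_method(), request.full_url)
# With real values, urllib.request.urlopen(request) would send the request.
```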
Parameters:
Name | Type | Summary |
---|---|---|
insertOne | command | Data API command to insert one document in a collection. |
document | object | Contains the details of the record to add. The reserved fields _id, $vector, and $vectorize are described in the following rows. |
document._id | reserved, multi-type | An optional identifier for the document. If omitted, the server automatically generates a document ID. You can include identifiers in other fields as well. For more information, see Document IDs and The defaultId option. |
document.$vector | reserved | An optional reserved property used to store an array of numbers representing a vector embedding. Serverless (Vector) databases have specialized handling for vector data, including optimized query performance for similarity search. |
document.$vectorize | reserved | An optional reserved property used to store a string that you want to use to automatically generate an embedding with vectorize. |
Returns:
A successful response contains the _id
of the inserted document:
{
"status": {
"insertedIds": [
"12"
]
}
}
The insertedIds content depends on the ID type and how it was generated, for example:
- "insertedIds": [{"$objectId": "6672e1cbd7fabb4e5493916f"}]
- "insertedIds": [{"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"}]
For more information, see Document IDs.
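If your code reads insertedIds from raw Data API responses, you may want to handle these shapes uniformly. The following Python sketch unwraps the $objectId and $uuid wrappers shown above into plain strings. normalize_inserted_ids is a hypothetical helper; note that it discards the ID type, so keep the wrapped form if you need to tell ObjectIds and UUIDs apart.

```python
from typing import Any, List

# Wrapper keys shown in the insertedIds examples above.
ID_WRAPPER_KEYS = ("$objectId", "$uuid")


def normalize_inserted_ids(inserted_ids: List[Any]) -> List[Any]:
    """Unwrap {"$objectId": ...} and {"$uuid": ...} entries into plain strings.

    Other values, such as "12" or 101, are returned unchanged.
    """
    normalized: List[Any] = []
    for value in inserted_ids:
        if isinstance(value, dict) and len(value) == 1:
            key, inner = next(iter(value.items()))
            if key in ID_WRAPPER_KEYS:
                normalized.append(inner)
                continue
        normalized.append(value)
    return normalized


response = {"status": {"insertedIds": [{"$objectId": "6672e1cbd7fabb4e5493916f"}]}}
print(normalize_inserted_ids(response["status"]["insertedIds"]))
```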
Example:
Example with $vector
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertOne": {
"document": {
"purchase_type": "Online",
"$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
"customer": {
"name": "Jim A.",
"phone": "123-456-1111",
"age": 51,
"credit_score": 782,
"address": {
"address_line": "1234 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": { "$date": 1690045891 },
"seller": {
"name": "Jon B.",
"location": "Manhattan NYC"
},
"items": [
{
"car": "BMW 330i Sedan",
"color": "Silver"
},
"Extended warranty - 5 years"
],
"amount": 47601,
"status": "active",
"preferred_customer": true
}
}
}' | jq
Example with $vectorize
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/default_keyspace/cars_collection" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--header "x-embedding-api-key;" \
--data '{
"insertOne": {
"document": {
"_id": "1",
"purchase_type": "Online",
"$vectorize": "Purchase of a silver BMW sedan in New York.",
"customer": {
"name": "Jim A.",
"phone": "123-456-1111",
"age": 51,
"credit_score": 782,
"address": {
"address_line": "1234 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": { "$date": 1690045891 },
"seller": {
"name": "Jon B.",
"location": "Manhattan NYC"
},
"items": [
{
"car": "BMW 330i Sedan",
"color": "Silver"
},
"Extended warranty - 5 years"
],
"amount": 47601,
"status": "active",
"preferred_customer": true
}
}
}'
Insert many documents
Insert multiple documents into a collection.
For information about the $vector and $vectorize reserved fields, see Vector and vectorize.
- Python
- TypeScript
- Java
- curl
Python
For more information, see the Client reference.
Insert documents with vector embeddings:
response = collection.insert_many(
[
{
"_id": 101,
"name": "John Doe",
"$vector": [.12, .52, .32],
},
{
# ID is generated automatically
"name": "Jane Doe",
"$vector": [.08, .68, .30],
},
],
)
Insert multiple documents and generate vector embeddings automatically:
response = collection.insert_many(
[
{
"name": "John Doe",
"$vectorize": "Text to vectorize for John Doe",
},
{
"name": "Jane Doe",
"$vectorize": "Text to vectorize for Jane Doe",
},
],
)
Parameters:
Name | Type | Summary |
---|---|---|
documents | Iterable[dict] | An iterable of dictionaries, each a document to insert. Documents may specify their _id field or leave it out, in which case it is generated automatically. |
ordered | bool | If False (default), the insertions can occur in arbitrary order and possibly concurrently. If True, they are processed sequentially. If you don’t need ordered inserts, DataStax recommends setting this parameter to False for faster performance. |
chunk_size | Optional[int] | How many documents to include in a single API request. The default is 50, and the maximum is 100. |
concurrency | Optional[int] | Maximum number of concurrent requests to the API at a given time. It must be 1 for ordered insertions. |
max_time_ms | Optional[int] | A timeout, in milliseconds, for the operation. If not passed, the collection-level setting is used instead. Because inserting many documents can require multiple HTTP requests, you may need to increase the timeout for the method to complete successfully. |
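The chunk_size, concurrency, and ordered parameters interact: documents are split into batches of at most 100, ordered batches go out one at a time, and unordered batches may be sent in parallel. The following Python sketch models that behavior with the standard library. It is not astrapy's implementation, and send_chunk is a hypothetical callable standing in for one insertMany HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
# Hypothetical stand-in for one insertMany HTTP request: takes a chunk of
# documents and returns the IDs inserted for that chunk.
SendChunk = Callable[[List[Document]], List[Any]]


def chunk_documents(documents: List[Document], chunk_size: int = 50) -> List[List[Document]]:
    """Split documents into consecutive chunks of at most chunk_size (1 to 100)."""
    if not 1 <= chunk_size <= 100:
        raise ValueError("chunk_size must be between 1 and 100")
    return [documents[i:i + chunk_size] for i in range(0, len(documents), chunk_size)]


def insert_in_chunks(documents: List[Document], send_chunk: SendChunk,
                     chunk_size: int = 50, ordered: bool = False,
                     concurrency: int = 1) -> List[Any]:
    """Send documents chunk by chunk and collect the inserted IDs."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if ordered and concurrency != 1:
        raise ValueError("ordered insertions require concurrency=1")
    chunks = chunk_documents(documents, chunk_size)
    if ordered:
        inserted: List[Any] = []
        for chunk in chunks:
            # Each request starts only after the previous one has returned.
            inserted.extend(send_chunk(chunk))
        return inserted
    # Unordered: up to `concurrency` requests may be in flight at once.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        per_chunk_ids = list(pool.map(send_chunk, chunks))
    # pool.map yields results in chunk order, even if requests finish out of order.
    return [doc_id for ids in per_chunk_ids for doc_id in ids]
```

Keeping ordered inserts strictly sequential is why a concurrency greater than 1 has no meaning for them.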
Returns:
InsertManyResult
- An object representing the response from the database after the insert operation. It includes information about the success of the operation and details of the inserted documents.
Example response
InsertManyResult(inserted_ids=[101, '81077d86-05dc-43ca-877d-8605dce3ca4d'], raw_results=...)
Example:
from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database("API_ENDPOINT")
collection = database.my_collection
collection.insert_many([{"a": 10}, {"a": 5}, {"b": [True, False, False]}])
collection.insert_many(
[{"seq": i} for i in range(50)],
concurrency=5,
)
collection.insert_many(
[
{"tag": "a", "$vector": [1, 2]},
{"tag": "b", "$vector": [3, 4]},
]
)
TypeScript
For more information, see the Client reference.
Insert multiple documents with vectors:
const result = await collection.insertMany([
{
_id: '1',
name: 'John Doe',
$vector: [.12, .52, .32],
},
{
name: 'Jane Doe',
$vector: [.08, .68, .30],
},
], {
ordered: true,
});
Insert multiple documents and generate vector embeddings automatically:
const result = await collection.insertMany([
{
name: 'John Doe',
$vectorize: 'Text to vectorize for John Doe',
},
{
name: 'Jane Doe',
$vectorize: 'Text to vectorize for Jane Doe',
},
], {
ordered: true,
});
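Parameters:
Name | Type | Summary |
---|---|---|
documents | MaybeId<Schema>[] | The documents to insert. If any document does not have an _id field, the server generates one. |
options? | InsertManyOptions | The options for this operation. |
Options (InsertManyOptions):
Name | Type | Summary |
---|---|---|
ordered? | boolean | You may set the ordered option to true to process the insertions sequentially. Otherwise, insertions can occur in arbitrary order and possibly in parallel, which is faster. |
concurrency? | number | You can set the concurrency option to control how many network requests are made in parallel for unordered insertions. |
chunkSize? | number | Control how many documents are sent with each network request. The default is 50, and the maximum is 100. |
maxTimeMS? | number | The maximum time in milliseconds that the client should wait for the operation to complete. |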
Returns:
Promise<InsertManyResult<Schema>>
- A promise that resolves to the inserted IDs.
Example:
import { DataAPIClient, InsertManyError } from '@datastax/astra-db-ts';
// Reference an untyped collection
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });
const collection = db.collection('COLLECTION');
(async function () {
try {
// Insert many documents
await collection.insertMany([
{ _id: '1', name: 'John Doe' },
{ name: 'Jane Doe' }, // Will autogen ID
], { ordered: true });
// Insert many with vectors
await collection.insertMany([
{ name: 'John Doe', $vector: [.12, .52, .32] },
{ name: 'Jane Doe', $vector: [.32, .52, .12] },
]);
} catch (e) {
if (e instanceof InsertManyError) {
console.log(e.partialResult);
}
}
})();
Java
Operations on documents are performed at the Collection level.
Collection is a generic class with the default type of Document.
You can specify your own type, and the object is serialized by Jackson.
For more information, see the Client reference.
Most methods have synchronous and asynchronous flavors, where the asynchronous version is suffixed by Async and returns a CompletableFuture:
// Synchronous
InsertManyResult insertMany(List<? extends DOC> documents);
InsertManyResult insertMany(List<? extends DOC> documents, InsertManyOptions options);
// Asynchronous
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList);
CompletableFuture<InsertManyResult> insertManyAsync(List<? extends DOC> docList, InsertManyOptions options);
Parameters:
Name | Type | Summary |
---|---|---|
documents | List<? extends DOC> | A list of documents to insert. Documents may specify their _id field or leave it out, in which case an ID is generated automatically. |
options | InsertManyOptions | Options for the insert operation: ordered, concurrency, chunkSize, and timeout. If options are not provided, default values are used. |
Returns:
InsertManyResult
- Wrapper with the list of inserted document IDs.
Example:
package com.datastax.astra.client.collection;
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.InsertManyOptions;
import com.datastax.astra.client.model.InsertManyResult;
import com.fasterxml.jackson.annotation.JsonProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import java.util.List;
public class InsertMany {
@Data @AllArgsConstructor
public static class Product {
@JsonProperty("_id")
private String id;
private String name;
}
public static void main(String[] args) {
// Given an existing collection
Collection<Document> collectionDoc = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION_NAME");
// Insert two documents
Document doc1 = new Document("1").append("name", "joe");
Document doc2 = new Document("2").append("name", "joe");
InsertManyResult res1 = collectionDoc.insertMany(List.of(doc1, doc2));
System.out.println("Identifiers inserted: " + res1.getInsertedIds());
// Given an existing collection
Collection<Product> collectionProduct = new DataAPIClient("TOKEN")
.getDatabase("API_ENDPOINT")
.getCollection("COLLECTION2_NAME", Product.class);
// Insert documents with custom options
InsertManyOptions options = new InsertManyOptions()
.chunkSize(20) // number of documents sent per request
.concurrency(1) // number of requests sent in parallel
.ordered(false) // unordered inserts allow parallel processing when concurrency > 1
.timeout(1000); // timeout in milliseconds
InsertManyResult res2 = collectionProduct.insertMany(
List.of(new Product("1", "joe"),
new Product("2", "joe")),
options);
}
}
curl
With insertMany, you provide an array of document objects.
The document objects have the same format as insertOne.
The Data API accepts up to 100 documents per insertMany request.
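Because each request is limited to 100 documents, larger loads sent over raw HTTP must be split across several insertMany requests. The following Python sketch builds one JSON body per batch. build_insert_many_bodies is a hypothetical helper; serializing with json.dumps also avoids hand-written JSON mistakes such as trailing commas.

```python
import json
from typing import Any, Dict, List

# Per-request limit stated for insertMany in this reference.
MAX_DOCS_PER_REQUEST = 100


def build_insert_many_bodies(documents: List[Dict[str, Any]],
                             ordered: bool = False) -> List[str]:
    """Return one insertMany JSON body per batch of at most 100 documents."""
    bodies: List[str] = []
    for start in range(0, len(documents), MAX_DOCS_PER_REQUEST):
        batch = documents[start:start + MAX_DOCS_PER_REQUEST]
        payload = {
            "insertMany": {
                "documents": batch,
                "options": {"ordered": ordered},
            }
        }
        # json.dumps always emits valid JSON (no trailing commas).
        bodies.append(json.dumps(payload))
    return bodies


bodies = build_insert_many_bodies([{"key1": f"value{i}"} for i in range(250)])
print(len(bodies))  # 3 request bodies: 100 + 100 + 50 documents
```

Each body can then be posted with the same URL and headers as the curl examples.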
Insert multiple documents with vectors:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertMany": {
"documents": [
{
"$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
"key1": "value1",
"key2": "value2"
},
{
"$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
"key1": "value3",
"key2": "value4"
},
{
"$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
"key1": "value3",
"key2": "value4"
}
],
"options": {
"ordered": false
}
}
}' | jq
Insert multiple documents and generate vector embeddings automatically:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertMany": {
"documents": [
{
"$vectorize": "text to vectorize for first document",
"key1": "value1",
"key2": "value2"
},
{
"$vectorize": "text to vectorize for second document",
"key1": "value3",
"key2": "value4"
},
{
"$vectorize": "text to vectorize for third document",
"key1": "value3",
"key2": "value4"
}
],
"options": {
"ordered": false
}
}
}' | jq
Parameters:
Name | Type | Summary |
---|---|---|
insertMany | command | Data API command to insert multiple documents. You can insert up to 100 documents per request. |
documents | array | Contains the details of the records to add. It is an array of objects where each object represents a document. The reserved fields _id, $vector, and $vectorize are described in the following rows. |
documents._id | reserved, multi-type | An optional identifier for a document. If omitted, the server automatically generates a document ID. You can include identifiers in other fields as well. For more information, see Document IDs and The defaultId option. |
documents.$vector | reserved | An optional reserved property used to store an array of numbers representing a vector embedding for a document. Serverless (Vector) databases have specialized handling for vector data, including optimized query performance for similarity search. |
documents.$vectorize | reserved | An optional reserved property used to store a string that you want to use to automatically generate an embedding for a document. |
options.ordered | boolean | If false, insertions occur in an arbitrary order with possible concurrency. If true, insertions occur sequentially. If you don’t need ordered inserts, DataStax recommends setting this option to false for faster performance. |
Returns:
A successful response contains the _id
of the inserted documents:
{
"status": {
"insertedIds": [
"4",
"7",
"10"
]
}
}
The insertedIds content depends on the ID type and how it was generated, for example:
- "insertedIds": [{"$objectId": "6672e1cbd7fabb4e5493916f"}]
- "insertedIds": [{"$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739"}]
For more information, see Document IDs.
Example:
Example request
The following insertMany request adds three documents to a collection:
curl -sS -L -X POST "ASTRA_DB_API_ENDPOINT/api/json/v1/ASTRA_DB_KEYSPACE/ASTRA_DB_COLLECTION" \
--header "Token: ASTRA_DB_APPLICATION_TOKEN" \
--header "Content-Type: application/json" \
--data '{
"insertMany": {
"documents": [
{
"purchase_type": "Online",
"$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
"customer": {
"name": "Jack B.",
"phone": "123-456-2222",
"age": 34,
"credit_score": 700,
"address": {
"address_line": "888 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": { "$date": 1690391491 },
"seller": {
"name": "Tammy S.",
"location": "Staten Island NYC"
},
"items": [
{
"car": "Tesla Model 3",
"color": "White"
},
"Extended warranty - 10 years",
"Service - 5 years"
],
"amount": 53990,
"status": "active"
},
{
"purchase_type": "Online",
"$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
"customer": {
"name": "Jill D.",
"phone": "123-456-3333",
"age": 30,
"credit_score": 742,
"address": {
"address_line": "12345 Broadway",
"city": "New York",
"state": "NY"
}
},
"purchase_date": { "$date": 1690564291 },
"seller": {
"name": "Jasmine S.",
"location": "Brooklyn NYC"
},
"items": "Extended warranty - 10 years",
"amount": 4600,
"status": "active"
},
{
"purchase_type": "In Person",
"$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
"customer": {
"name": "Rachel I.",
"phone": null,
"age": 62,
"credit_score": 786,
"address": {
"address_line": "1234 Park Ave",
"city": "New York",
"state": "NY"
}
},
"purchase_date": { "$date": 1706202691 },
"seller": {
"name": "Jon B.",
"location": "Manhattan NYC"
},
"items": [
{
"car": "BMW M440i Gran Coupe",
"color": "Silver"
},
"Extended warranty - 5 years",
"Gap Insurance - 5 years"
],
"amount": 65250,
"status": "active"
}
],
"options": {
"ordered": false
}
}
}' | jq