Document IDs
Documents in a collection are always identified by an ID that is unique within the collection.
This identifier is stored in the reserved field _id.
There are multiple types of document identifiers, such as string, integer, or datetime; however, the uuid and ObjectId types are recommended.
The Data API supports uuid identifiers up to version 8 and ObjectId identifiers as provided by the bson library.
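As a rough illustration of the two recommended identifier types, the following sketch classifies an ID string by its textual shape. The classifyId helper and its IdKind type are hypothetical names introduced here; they are not part of the Data API or any client. The sketch relies only on the standard textual layouts: a UUID is 32 hexadecimal digits in 8-4-4-4-12 groups with the version in the first digit of the third group, and an ObjectId is 24 hexadecimal digits.

```typescript
// Canonical 8-4-4-4-12 hexadecimal UUID layout.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
// An ObjectId is 12 bytes, written as 24 hexadecimal characters.
const OBJECT_ID_PATTERN = /^[0-9a-f]{24}$/i;

// Hypothetical result type for this sketch.
type IdKind =
  | { kind: 'uuid'; version: number }
  | { kind: 'objectId' }
  | { kind: 'other' };

// Hypothetical helper: classify an ID string by its shape.
function classifyId(value: string): IdKind {
  if (UUID_PATTERN.test(value)) {
    // The version is the first hex digit of the third group (string index 14).
    return { kind: 'uuid', version: parseInt(value.charAt(14), 16) };
  }
  if (OBJECT_ID_PATTERN.test(value)) {
    return { kind: 'objectId' };
  }
  return { kind: 'other' };
}

console.log(classifyId('016b1cac-14ce-660e-8974-026c927b9b91')); // a version 6 UUID
console.log(classifyId('65fd9b52d7fabba03349d013'));             // an ObjectId
```

A check like this only inspects the string form. It does not confirm that the Data API will accept the value.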
Default document IDs
When you insert a document into a collection, you can either pass an explicit _id or use an automatically generated ID.
The collection’s defaultId option controls how the Data API allocates an _id for any document that doesn’t otherwise specify an _id when added to a collection.
After you create a collection, you can’t change the defaultId option.
If you omit the defaultId option on createCollection, the default type is uuid.
This means that the server generates a random stringified UUIDv4 as the _id for any document without an explicit _id field.
This enables backwards compatibility with Data API versions 1.0.2 and earlier.
If you include the defaultId option with createCollection, you must specify one of the following case-sensitive ID types:
- objectId: Each document’s generated _id is an objectId.
- uuidv6: Each document’s generated _id is a version 6 UUID. This is field-compatible with version 1 time UUIDs, and it supports lexicographical sorting.
- uuidv7: Each document’s generated _id is a version 7 UUID. This is designed as a replacement for version 1 time UUID, and it is recommended for use in new systems.
- uuid: Each document’s generated _id is a version 4 random UUID. This type is analogous to the uuid type and functions in Apache Cassandra®.
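Because the ID type names are case-sensitive, it can help to validate them before sending a request. The following is a minimal sketch, assuming the createCollection command accepts the setting as options.defaultId.type in its JSON body. That payload shape, the buildCreateCollectionCommand helper, and the CreateCollectionCommand interface are assumptions made for illustration and are not confirmed by this page.

```typescript
// The four case-sensitive type names listed above.
const DEFAULT_ID_TYPES = ['objectId', 'uuidv6', 'uuidv7', 'uuid'] as const;
type DefaultIdType = (typeof DEFAULT_ID_TYPES)[number];

// Assumed payload shape (not confirmed by this page).
interface CreateCollectionCommand {
  createCollection: {
    name: string;
    options?: { defaultId: { type: DefaultIdType } };
  };
}

function isDefaultIdType(value: string): value is DefaultIdType {
  // Strict comparison, so 'UUIDv7' or 'ObjectId' do not match.
  return DEFAULT_ID_TYPES.some((t) => t === value);
}

// Hypothetical helper: build the command body, rejecting unknown type names.
function buildCreateCollectionCommand(
  name: string,
  defaultIdType?: string,
): CreateCollectionCommand {
  if (defaultIdType === undefined) {
    // Omitting defaultId leaves the collection on its default ID type.
    return { createCollection: { name } };
  }
  if (!isDefaultIdType(defaultIdType)) {
    throw new Error(`Unsupported defaultId type: ${defaultIdType}`);
  }
  return {
    createCollection: { name, options: { defaultId: { type: defaultIdType } } },
  };
}

console.log(JSON.stringify(buildCreateCollectionCommand('people', 'uuidv7')));
```

Validating on the client side surfaces a misspelled type name early. The Data API remains the authority on which values it accepts.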
For examples of setting the default ID when creating a collection, see Create a collection.
Set document IDs when inserting documents
When you use the Data API to add documents to a collection, the _id field is optional.
If you omit the _id field, then the server generates a unique identifier for each document based on the collection’s default ID type.
If you provide an explicit _id value, then the server uses this value instead of generating an ID.
If explicitly defined, the _id field must be a top-level document property.
_id cannot be nested within another property.
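The rule above can be pictured with a small client-side sketch. It is only an illustration of the described behavior, not the server's implementation: the withResolvedId helper and the injected generateId function are hypothetical, and the real ID is generated by the server according to the collection's default ID type.

```typescript
type Doc = Record<string, unknown>;

// Hypothetical helper: keep an explicit top-level _id, otherwise add a generated one.
function withResolvedId(doc: Doc, generateId: () => string): Doc {
  // Only a top-level _id is considered; nested properties are not inspected.
  if (Object.prototype.hasOwnProperty.call(doc, '_id')) {
    return doc;
  }
  return { _id: generateId(), ...doc };
}

// Stand-in generator for the sketch; the server would produce a real UUID or ObjectId.
const fakeGenerator = () => 'generated-id';

console.log(withResolvedId({ _id: 'abc', name: 'John' }, fakeGenerator)['_id']); // abc
console.log(withResolvedId({ name: 'Jane' }, fakeGenerator)['_id']);             // generated-id
```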
Benefits of automatically generated document IDs
There are advantages to using generated document IDs instead of manual document IDs. For example, the advantages of generated UUIDv7 document IDs include the following:
- Uniqueness across the database: A generated _id value is designed to be globally unique across the entire database. This uniqueness is achieved through a combination of timestamp, machine identifier, process identifier, and a sequence number. Explicitly numbering documents might lead to clashes unless carefully managed, especially in distributed systems.
- Automatic generation: The _id values are automatically generated by Astra DB Serverless. This means you won’t have to worry about creating and maintaining a unique ID system, reducing the complexity of the code and the risk of errors.
- Timestamp information: A generated _id value includes a timestamp as its first component, representing the document’s creation time. This can be useful for tracking when a document was created without needing an additional field. In particular, type uuidv7 values provide a high degree of granularity (milliseconds) in timestamps.
- Avoids manual sequence management: Managing sequential numeric IDs manually can be challenging, especially in environments with high concurrency or distributed systems. There’s a risk of ID collision or the need to lock tables or sequences to generate a new ID, which can affect performance. Generated _id values are designed to handle these issues automatically.
While numeric _id values might be simpler and more human-readable, the benefits of generated _id values make them a superior choice for most applications, especially those that have many documents.
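To make the timestamp point concrete, the following sketch reads the creation time back out of a version 7 UUID string. It relies on the standard UUIDv7 layout, in which the first 48 bits hold the Unix time in milliseconds. The uuidV7Timestamp helper is a hypothetical name, and the sample UUID is constructed by hand for illustration rather than taken from a real collection.

```typescript
// Hypothetical helper: extract the millisecond timestamp from a version 7 UUID string.
function uuidV7Timestamp(id: string): Date {
  const hex = id.replace(/-/g, '');
  // After removing hyphens, index 12 holds the version digit.
  if (!/^[0-9a-f]{32}$/i.test(hex) || hex.charAt(12) !== '7') {
    throw new Error(`Not a version 7 UUID: ${id}`);
  }
  // The first 12 hex digits (48 bits) are the Unix timestamp in milliseconds.
  return new Date(parseInt(hex.slice(0, 12), 16));
}

// Hand-built sample whose timestamp bits encode 1700000000000 ms.
const sample = '018bcfe5-6800-7000-8000-000000000000';
console.log(uuidV7Timestamp(sample).toISOString()); // 2023-11-14T22:13:20.000Z

// Because the timestamp comes first, later IDs compare greater as strings.
const later = '018bcfe5-6801-7000-8000-000000000000';
console.log(sample < later); // true
```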
Other document identifiers
Regardless of the defaultId setting, the Data API honors document identifiers of any type, anywhere in a document, that you explicitly provide at any time:
- You can include identifiers anywhere in a document, not only in the _id field.
- You can include different types of identifiers in different parts of the same document.
- You can define identifiers at any time, such as when inserting or updating a document.
- You can use any of a document’s identifiers for filter clauses and update/replace operations, just like any other data type.
TypeScript
To use and generate identifiers, astra-db-ts provides the UUID and ObjectId classes.
These are not the same as those exported from the bson or uuid libraries.
Instead, these are custom classes that you must import from the astra-db-ts package:
import { UUID, ObjectId } from '@datastax/astra-db-ts';
To generate new identifiers, you can use UUID.v1(), UUID.v4(), UUID.v6(), UUID.v7(), or new ObjectId().
You can also use the uuid and oid shorthand methods.
import { DataAPIClient, UUID, ObjectId, uuid, oid } from '@datastax/astra-db-ts';

// Schema for the collection
interface Person {
  _id: UUID | ObjectId;
  name: string;
  friendId?: UUID;
}

// Reference the DB instance
const client = new DataAPIClient('TOKEN');
const db = client.db('ENDPOINT', { keyspace: 'KEYSPACE' });

(async function () {
  // Create the collection
  const collection = await db.createCollection<Person>('people');

  // Insert documents with various IDs
  await collection.insertOne({ name: 'John', _id: UUID.v4() });
  await collection.insertOne({ name: 'Jane', _id: new UUID('016b1cac-14ce-660e-8974-026c927b9b91') });
  await collection.insertOne({ name: 'Dan', _id: new ObjectId() });
  await collection.insertOne({ name: 'Tim', _id: new ObjectId('65fd9b52d7fabba03349d013') });
  await collection.insertOne({ name: 'Amy', _id: uuid('bb3def0c-2ff2-43e1-b346-6cf0e5e36f10') });
  await collection.insertOne({ name: 'Beth', _id: uuid.v7() });
  await collection.insertOne({ name: 'Lia', _id: oid('67ea409a5e6499dabe0831bc') });
  await collection.insertOne({ name: 'Tina', _id: oid() });

  // Update a document with a UUID in a non-_id field
  await collection.updateOne(
    { name: 'John' },
    { $set: { friendId: new UUID('016b1cac-14ce-660e-8974-026c927b9b91') } },
  );

  // Find a document by a UUID stored in a non-_id field
  const john = await collection.findOne({ name: 'John' });
  const jane = await collection.findOne({ _id: john!.friendId });

  // Prints 'Jane 016b1cac-14ce-660e-8974-026c927b9b91 6'
  console.log(jane?.name, jane?._id.toString(), (<UUID>jane?._id).version);
})();
All UUID methods return an instance of the same class, which exposes a version property, if you need to access it.
UUIDs can also be constructed from a string representation of the IDs, if you want to use custom generation.
Java
The Java client defines dedicated classes to support different implementations of UUID, particularly v6 and v7.
When a unique identifier is retrieved from the server, it is returned as a uuid, and then it is converted to the appropriate UUID class, based on the class definition in the defaultId option.
The ObjectId class is extracted from the BSON package, and it represents the ObjectId type.
UUIDs from the Java UUID class follow the UUID v4 standard.
To generate new identifiers, you can use constructors like new UUIDv6(), new UUIDv7(), or new ObjectId():
package com.datastax.astra.client.collections;

import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.collections.definition.documents.Document;
import com.datastax.astra.client.collections.definition.documents.types.ObjectId;
import com.datastax.astra.client.collections.definition.documents.types.UUIDv6;
import com.datastax.astra.client.collections.definition.documents.types.UUIDv7;

import java.time.Instant;
import java.util.UUID;

import static com.datastax.astra.client.collections.commands.Updates.set;
import static com.datastax.astra.client.core.query.Filters.eq;

public class WorkingWithDocumentIds {
    public static void main(String[] args) {
        // Given an existing collection
        Collection<Document> collection = new DataAPIClient("TOKEN")
                .getDatabase("API_ENDPOINT")
                .getCollection("COLLECTION_NAME");

        // IDs can be different JSON scalar types
        // ('defaultId' option NOT set for the collection)
        new Document().id("abc");
        new Document().id(123);
        new Document().id(Instant.now());

        // Working with UUIDv4
        new Document().id(UUID.randomUUID());

        // Working with UUIDv6
        collection.insertOne(new Document().id(new UUIDv6()).append("tag", "new_id_v_6"));
        // Wrap an existing UUID value (the version digit of this literal is 8, not 4)
        UUID existingUuid = UUID.fromString("018e77bc-648d-8795-a0e2-1cad0fdd53f5");
        collection.insertOne(new Document().id(new UUIDv6(existingUuid)).append("tag", "id_v_8"));

        // Working with UUIDv7
        collection.insertOne(new Document().id(new UUIDv7()).append("tag", "new_id_v_7"));

        // Working with ObjectIds
        collection.insertOne(new Document().id(new ObjectId()).append("tag", "obj_id"));
        collection.insertOne(new Document().id(new ObjectId("6601fb0f83ffc5f51ba22b88")).append("tag", "obj_id"));

        // Find a document by its ObjectId and store a UUID in a non-_id field
        collection.findOneAndUpdate(
                eq(new ObjectId("6601fb0f83ffc5f51ba22b88")),
                set("item_inventory_id", UUID.fromString("1eeeaf80-e333-6613-b42f-f739b95106e6")));
    }
}
curl
When you insert a document, you can omit _id to automatically generate an ID or you can manually specify an _id, such as "_id": "12".
The following example inserts two documents with manually defined _id values.
One document uses the objectId type, and the other uses the uuid type.
{
  "insertMany": {
    "documents": [
      {
        "_id": { "$objectId": "6672e1cbd7fabb4e5493916f" },
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
        "key": "value",
        "amount": 53990
      },
      {
        "_id": { "$uuid": "1ef2e42c-1fdb-6ad6-aae4-e84679831739" },
        "$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
        "key": "value",
        "amount": 4600
      }
    ]
  }
}
When you add or update a document, you can include additional identifiers in any document property, other than _id, just as you would any other data type.
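The JSON request above wraps typed identifiers as { "$uuid": ... } and { "$objectId": ... }. The following sketch produces those wrappers from plain strings, which can be handy when building raw request bodies by hand. The escapeId helper and the EscapedId type are hypothetical names for this illustration, and the item_inventory_id field is only an example of a non-_id property.

```typescript
// The two wrapper forms used in the JSON example above.
type EscapedId = { $uuid: string } | { $objectId: string };

const UUID_TEXT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const OBJECT_ID_TEXT = /^[0-9a-f]{24}$/i;

// Hypothetical helper: wrap a UUID or ObjectId string in its escaped JSON form.
function escapeId(value: string): EscapedId {
  if (UUID_TEXT.test(value)) {
    return { $uuid: value };
  }
  if (OBJECT_ID_TEXT.test(value)) {
    return { $objectId: value };
  }
  throw new Error(`Not a UUID or ObjectId string: ${value}`);
}

// Identifiers can go in _id or in any other property of the document.
const doc = {
  _id: escapeId('6672e1cbd7fabb4e5493916f'),
  item_inventory_id: escapeId('1ef2e42c-1fdb-6ad6-aae4-e84679831739'),
  key: 'value',
};

console.log(JSON.stringify(doc));
```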