Applies multiple data manipulation language (DML) statements with atomicity and, for single-partition batches, isolation.
Combines multiple DML statements into a single operation. A batch that targets a single partition applies all of its DML statements before the data becomes visible, ensuring both atomicity and isolation. For batches that target multiple partitions, logging ensures that all DML statements are eventually applied (atomicity), but isolation is still achieved only within each individual partition.
A well-constructed batch targeting a single partition can reduce client-server traffic and more efficiently update a table with a single row mutation. A batch can also target multiple partitions when atomicity is required. Multi-partition batches may decrease throughput and increase latency; only use a multi-partition batch when there is no other viable option, such as asynchronous individual statements.
Optionally, a batch can apply a client-supplied timestamp.
BEGIN [UNLOGGED | LOGGED] BATCH
  [USING TIMESTAMP [epoch_microseconds]]
  dml_statement [USING TIMESTAMP [epoch_microseconds]];
  [dml_statement; ...]
APPLY BATCH;
Syntax conventions:
- Italics: Variable value. Replace with a user-defined value.
- [ ]: Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the square brackets.
- ( ): Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.
- |: Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
- ...: Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.
- 'Literal string': Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
- { key: value }: Map collection. Braces ( { } ) enclose map collections or key-value pairs. A colon separates the key and the value.
- <datatype1,datatype2>: Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
- cql_statement;: End CQL statement. A semicolon ( ; ) terminates all CQL statements.
- [--]: Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.
- ' <schema> ... </schema> ': Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.
- @xml_entity='xml_entity_type': Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files.
A batch can contain the following types of dml_statement:
- INSERT
- UPDATE
- DELETE
- LOGGED | UNLOGGED
If multiple partitions are involved, batches are logged by default. Running a batch with logging enabled ensures that either all or none of the batch operations succeed, providing atomicity. Cassandra first writes the serialized batch to the batchlog system table, which stores the serialized batch as blob data. After Cassandra has successfully written and persisted (or hinted) the rows in the batch, it removes the batchlog data. There is a performance penalty associated with the batchlog, because it is also written to two other nodes. Thresholds for warning about, or failing on, batch size can be set.
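The batch-size thresholds are configured in cassandra.yaml; a sketch with the default values from Cassandra 3.x releases (verify the property names against your version, as they have changed across releases):

```yaml
# Warn when any batch (logged or unlogged) exceeds this serialized size.
batch_size_warn_threshold_in_kb: 5
# Fail any batch exceeding this serialized size.
batch_size_fail_threshold_in_kb: 50
# Warn when an unlogged batch spans more than this many partitions.
unlogged_batch_across_partitions_warn_threshold: 10
```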
If you do not want to incur the logging penalty, run the batch operation without the batchlog table by using the UNLOGGED keyword. Unlogged batching issues a warning when too many operations or too many partitions are involved. Single-partition batch operations are unlogged by default, and are the only unlogged batch operations recommended.
Although a logged batch enforces atomicity (that is, it guarantees that either all DML statements in the batch succeed or none do), Cassandra does no other transactional enforcement at the batch level. For example, there is no batch isolation unless the batch writes to a single partition. In multi-partition batch operations, clients can read the first updated rows from the batch while other rows are still being updated on the server. In single-partition batch operations, clients cannot read a partial update from any row until the batch completes.
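For instance, a single-partition batch can group several mutations for one cyclist (one partition) into a single row mutation; a sketch against the cycling.cyclist_expenses table used later on this page, assuming cyclist_name is the partition key (the expense values here are hypothetical):

```
BEGIN UNLOGGED BATCH
  INSERT INTO cycling.cyclist_expenses
    (cyclist_name, expense_id, amount, description, paid)
    VALUES ('Vera ADRIAN', 1, 7.95, 'Breakfast', false);
  UPDATE cycling.cyclist_expenses SET paid = true
    WHERE cyclist_name = 'Vera ADRIAN' AND expense_id = 1;
APPLY BATCH;
```

Because both statements share the partition key 'Vera ADRIAN', the batch is applied as one atomic, isolated mutation without touching the batchlog.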
- USING TIMESTAMP
Sets the write time for transactions executed in a BATCH.
Restriction: USING TIMESTAMP does not support lightweight transactions (LWT), such as DML statements that have an IF NOT EXISTS clause.
By default, Cassandra applies the same timestamp to all data modified by the batch; therefore statement order does not matter within a batch, and a batch statement is not useful for writing data that must be timestamped in a particular order. Use client-supplied timestamps to achieve a particular order.
User-defined timestamp: specify the epoch time in microseconds after USING TIMESTAMP. When the time is not specified, Cassandra inserts the current time.
USING TIMESTAMP [epoch_microseconds]
Same timestamp for all DMLs: insert on the first line of the batch.
BEGIN BATCH USING TIMESTAMP [epoch_microseconds]
  DML_statement1;
  DML_statement2;
  DML_statement3;
APPLY BATCH;
Individual transactions: insert at the end of a DML statement.
BEGIN BATCH
  DML_statement1;
  DML_statement2 USING TIMESTAMP [epoch_microseconds];
  DML_statement3;
APPLY BATCH;
Applying a client-supplied timestamp to all DMLs
Insert the meals paid for by Vera Adrian, applying a user-defined timestamp when inserting the records:
BEGIN BATCH USING TIMESTAMP 1481124356754405
  INSERT INTO cycling.cyclist_expenses
    (cyclist_name, expense_id, amount, description, paid)
    VALUES ('Vera ADRIAN', 2, 13.44, 'Lunch', true);
  INSERT INTO cycling.cyclist_expenses
    (cyclist_name, expense_id, amount, description, paid)
    VALUES ('Vera ADRIAN', 3, 25.00, 'Dinner', true);
APPLY BATCH;
SELECT cyclist_name, expense_id, amount, WRITETIME(amount),
       description, WRITETIME(description), paid, WRITETIME(paid)
FROM cycling.cyclist_expenses
WHERE cyclist_name = 'Vera ADRIAN';
@ Row 1
------------------------+------------------
 cyclist_name           | Vera ADRIAN
 expense_id             | 2
 amount                 | 13.44
 writetime(amount)      | 1481124356754405
 description            | Lunch
 writetime(description) | 1481124356754405
 paid                   | True
 writetime(paid)        | 1481124356754405

@ Row 2
------------------------+------------------
 cyclist_name           | Vera ADRIAN
 expense_id             | 3
 amount                 | 25
 writetime(amount)      | 1481124356754405
 description            | Dinner
 writetime(description) | 1481124356754405
 paid                   | True
 writetime(paid)        | 1481124356754405

(2 rows)
If any DML statement in the batch uses compare-and-set (CAS) logic, for example the following batch with IF NOT EXISTS, an error is returned:
BEGIN BATCH USING TIMESTAMP 1481124356754405
  INSERT INTO cycling.cyclist_expenses
    (cyclist_name, expense_id, amount, description, paid)
    VALUES ('Vera ADRIAN', 2, 13.44, 'Lunch', true);
  INSERT INTO cycling.cyclist_expenses
    (cyclist_name, expense_id, amount, description, paid)
    VALUES ('Vera ADRIAN', 3, 25.00, 'Dinner', false)
    IF NOT EXISTS;
APPLY BATCH;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot provide custom timestamp for conditional BATCH"
Batching conditional updates
Batches can contain conditional updates, introduced as lightweight transactions. However, a batch containing conditional updates can only operate within a single partition, because the underlying Paxos implementation works at partition-level granularity. If one statement in a batch is a conditional update, the condition must return true or the entire batch fails. If the batch contains two or more conditional updates, all the conditions must return true or the entire batch fails. This example shows batching of conditional updates:
BEGIN BATCH
  INSERT INTO purchases (user, balance) VALUES ('user1', -8) IF NOT EXISTS;
  INSERT INTO purchases (user, expense_id, amount, description, paid)
    VALUES ('user1', 1, 8, 'burrito', false);
APPLY BATCH;
BEGIN BATCH
  UPDATE purchases SET balance = -208 WHERE user = 'user1' IF balance = -8;
  INSERT INTO purchases (user, expense_id, amount, description, paid)
    VALUES ('user1', 2, 200, 'hotel room', false);
APPLY BATCH;

Conditional batches cannot provide custom timestamps.
DELETE statements within a conditional batch cannot use IN conditions to filter rows.
A continuation of this example shows how to use a static column with conditional updates in batch.
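The purchases examples above read most naturally if balance is a static column, shared by all rows in a user's partition; a hypothetical schema along those lines (the actual table definition is not shown on this page):

```
CREATE TABLE purchases (
  user text,
  balance int STATIC,   -- one balance value per user (partition)
  expense_id int,
  amount int,
  description text,
  paid boolean,
  PRIMARY KEY (user, expense_id)
);
```

With this schema, the conditional update on balance and the insert of an expense row both target the same partition, which is what allows them to be combined in one conditional batch.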
Batching counter updates
A batch of counters should use the COUNTER option because, unlike other writes in Cassandra, a counter update is not an idempotent operation.
BEGIN COUNTER BATCH
  UPDATE UserActionCounts SET total = total + 2 WHERE keyalias = 523;
  UPDATE AdminActionCounts SET total = total + 2 WHERE keyalias = 701;
APPLY BATCH;

Counter batches cannot include non-counter columns in the DML statements, just as a non-counter batch cannot include counter columns. Counter batch statements cannot provide custom timestamps.