BATCH
The BATCH statement groups multiple Data Manipulation Language (DML) statements into a single statement. A BATCH may only contain UPDATE, INSERT, or DELETE statements.
The behavior of a batch depends on two factors:
- Whether the batch is logged or unlogged.
- Whether the batch modifies a single partition or multiple partitions.
Regarding isolation and atomicity: batches are not isolated. Other clients can read rows already updated by a BATCH statement while other remaining statements in the batch are in progress. If consistency is critical, clients can read data from tables modified by batch statements using a consistency level of SERIAL or LOCAL_SERIAL. This consistency level ensures that a failed batch request returns the most up-to-date result, as any committed batches are replayed before the read. Reading at SERIAL or LOCAL_SERIAL, however, comes with a high performance penalty.
Logged vs unlogged batches
By default, batches in Apache Cassandra® and DataStax Enterprise (DSE) are logged. A logged batch provides a guarantee that all statements in the group will eventually succeed. To achieve this, the coordinator first writes all the statements in the group to a batch log. The batch log on the coordinator is replicated to two other nodes in the cluster. If the coordinator node fails and the batch is not completed, then one of the other logged batch replicas will take over.
An unlogged batch, as the name suggests, means the group of statements is never stored in the batch log. Nor is the group replicated to other nodes in case of coordinator failure. All statements in an unlogged batch are sent to the coordinator and processed "as is."
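The batch-log mechanism described above can be sketched in Python (a toy model, not the server implementation): statements are persisted to a log before being applied, and any batch still in the log after a simulated coordinator failure is replayed until it completes.

```python
batch_log = {}   # batch_id -> statements; replicated to other nodes in the real system
table = []       # destination "table"

def execute_logged_batch(batch_id, statements, fail_after=None):
    """Write the batch to the log first, then apply the statements.
    If the coordinator 'fails' partway through, the log entry survives."""
    batch_log[batch_id] = list(statements)   # persisted before any statement is applied
    for i, stmt in enumerate(statements):
        if fail_after is not None and i >= fail_after:
            return False                     # simulated coordinator failure mid-batch
        table.append(stmt)
    del batch_log[batch_id]                  # completed batches leave the log
    return True

def replay_incomplete_batches():
    """A surviving batch-log replica replays anything left in the log."""
    for batch_id in list(batch_log):
        for stmt in batch_log.pop(batch_id):
            # Dedupe for this toy model; the real system instead relies on
            # replayed writes being idempotent (same data, same timestamp).
            if stmt not in table:
                table.append(stmt)

execute_logged_batch("b1", ["insert-1", "insert-2"], fail_after=1)
# "insert-2" was never applied, but "b1" is still in the batch log.
replay_incomplete_batches()
# After replay, both statements are in the table and the log is empty.
```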
Single vs multiple partitions
A batch statement can contain DML statements that target one or multiple partitions. Batches that contain statements targeting multiple partitions degrade performance, because the onus of distributing the statements to the correct replicas falls on the coordinator node. As a result, the benefit of reducing round-trips between the client and server comes at the expense of slowing down the coordinator node.
Logged batch statements that target multiple partitions should be used in the rare case where consistency between tables is critical. In this case, the number of partitions targeted by the batch statement should be kept to a minimum. An example of this scenario is where a table has been denormalised and there is a requirement that modifications to both tables must succeed.
Never use unlogged batch statements that target multiple partitions, because doing so places the responsibility of statement distribution on the coordinator node. Instead, break such batches up into their individual statements. The responsibility of statement distribution is then placed on the client application, which reduces the pressure on the coordinator node.

Logged batch statements that target a single partition in the same keyspace are converted to an unlogged batch statement. This is an optimisation on the server side. There is no additional work on the coordinator for batch statements that target a single partition, which is the recommended use of a BATCH statement: the statements in the batch are for a single partition in the same table (or only a few tables) in the same keyspace.
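As an illustrative sketch of breaking a multi-partition batch up on the client side (the table name and partition-key values below are hypothetical), the following Python groups statements by partition key so that each group can be sent as its own single-partition batch:

```python
from collections import defaultdict

# Hypothetical (partition_key, cql_text) pairs that a client might otherwise
# have sent as one multi-partition batch.
statements = [
    ("vera", "UPDATE expenses SET paid = true WHERE cyclist_name = 'vera' AND id = 1"),
    ("alex", "UPDATE expenses SET paid = true WHERE cyclist_name = 'alex' AND id = 7"),
    ("vera", "UPDATE expenses SET paid = true WHERE cyclist_name = 'vera' AND id = 2"),
]

def group_by_partition(stmts):
    """Group statements by partition key. Each group is a candidate
    single-partition batch that the client sends separately, so the
    coordinator never has to fan one batch out to multiple replicas."""
    groups = defaultdict(list)
    for key, cql in stmts:
        groups[key].append(cql)
    return dict(groups)

batches = group_by_partition(statements)
# 'vera' gets a two-statement single-partition batch; 'alex' gets one statement.
```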
Synopsis
BEGIN [ ( UNLOGGED | COUNTER ) ] BATCH
  [ USING TIMESTAMP [ epoch_microseconds ] ]
  dml_statement [ USING TIMESTAMP [ epoch_microseconds ] ] ;
  [ dml_statement [ USING TIMESTAMP [ epoch_microseconds ] ] ; [ ... ] ]
APPLY BATCH ;
| Syntax conventions | Description |
|---|---|
| UPPERCASE | Literal keyword. |
| Lowercase | Not literal. |
| Italics | Variable value. Replace with a user-defined value. |
| [] | Optional. Square brackets ( [] ) surround optional command arguments. Do not type the square brackets. |
| ( ) | Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses. |
| \| | Or. A vertical bar ( \| ) separates alternative elements. Type any one of the elements. Do not type the vertical bar. |
| ... | Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required. |
| 'Literal string' | Single quotation marks ( ' ) must surround literal strings in CQL statements. Use single quotation marks to preserve upper case. |
| { key : value } | Map collection. Braces ( { } ) enclose map collections or key-value pairs. A colon separates the key and the value. |
| <datatype1,datatype2> | Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma. |
| cql_statement; | End CQL statement. A semicolon ( ; ) terminates all CQL statements. |
| [--] | Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options. |
| ' <schema> ... </schema> ' | Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration. |
| @xml_entity='xml_entity_type' | Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrConfig files. |
A batch can contain these dml_statements:
- UNLOGGED | COUNTER
- A batch is logged unless the UNLOGGED flag is used. For important points regarding batches and logging, refer to the sections above. Use the COUNTER option for batched counter updates. Unlike other updates, counter updates are not idempotent.
- USING TIMESTAMP
- Sets the write time for all statements executed in a batch, or the time for an individual statement in the batch.
  Restriction: USING TIMESTAMP does not support lightweight transactions (LWT), such as DML statements that have an IF NOT EXISTS clause.
  By default, the database applies the same timestamp to all data modified by the batch, either the timestamp generated by the server or the one supplied at the batch level. Statements may therefore be applied in an order different from the order in which they were added to the batch, due to the way timestamp ties are resolved in Cassandra. Client-supplied timestamps must be used to maintain the ordering of the statements in a batch.
  To use the same timestamp for all DML statements, insert it on the first line of the batch:
  BEGIN BATCH USING TIMESTAMP [ epoch_microseconds ]
  DML_statement1 ;
  DML_statement2 ;
  DML_statement3 ;
  APPLY BATCH ;
  For individual statements, insert it at the end of the DML:
  BEGIN BATCH
  DML_statement1 USING TIMESTAMP [ epoch_microseconds ] ;
  DML_statement2 USING TIMESTAMP [ epoch_microseconds ] ;
  DML_statement3 USING TIMESTAMP [ epoch_microseconds ] ;
  APPLY BATCH ;
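Here, epoch_microseconds is the number of microseconds since the Unix epoch, and the effective order of writes is decided purely by these values. The following Python sketch (illustrative only, not driver code) computes a client-supplied timestamp and models the resulting last-write-wins behavior, under the assumption that the write with the highest timestamp wins:

```python
import time

def epoch_microseconds() -> int:
    """Client-supplied batch timestamp: microseconds since the Unix epoch."""
    return int(time.time() * 1_000_000)

def resolve(writes):
    """Last-write-wins sketch: each write is (timestamp, value); the cell
    keeps the value with the highest timestamp, regardless of arrival order."""
    cell = None  # (timestamp, value)
    for ts, value in writes:
        if cell is None or ts > cell[0]:
            cell = (ts, value)
    return cell

ts = epoch_microseconds()  # a 1481124356754405-style value
# A statement that arrives later does not win unless its timestamp is higher:
winner = resolve([(ts + 2, "first"), (ts + 1, "second")])
# winner is (ts + 2, "first") even though "second" arrived last.
```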
Examples
Applying a client-supplied timestamp to all DML statements
Insert meals paid for by Vera Adrian, using a user-defined timestamp when inserting the records:
BEGIN BATCH USING TIMESTAMP 1481124356754405
INSERT INTO cycling.cyclist_expenses ( cyclist_name, expense_id, amount, description, paid )
VALUES ( 'Vera ADRIAN', 2, 13.44, 'Lunch', true );
INSERT INTO cycling.cyclist_expenses ( cyclist_name, expense_id, amount, description, paid )
VALUES ( 'Vera ADRIAN', 3, 25.00, 'Dinner', true );
APPLY BATCH;
Enable expanded output, then query the write times to confirm that all columns share the batch timestamp:
EXPAND ON
SELECT cyclist_name, expense_id, amount, WRITETIME(amount),
       description, WRITETIME(description), paid, WRITETIME(paid)
FROM cycling.cyclist_expenses
WHERE cyclist_name = 'Vera ADRIAN';
@ Row 1
------------------------+------------------
cyclist_name | Vera ADRIAN
expense_id | 2
amount | 13.44
writetime(amount) | 1481124356754405
description | Lunch
writetime(description) | 1481124356754405
paid | True
writetime(paid) | 1481124356754405
@ Row 2
------------------------+------------------
cyclist_name | Vera ADRIAN
expense_id | 3
amount | 25
writetime(amount) | 1481124356754405
description | Dinner
writetime(description) | 1481124356754405
paid | True
writetime(paid) | 1481124356754405
(2 rows)
If any DML statement in the batch uses compare-and-set (CAS) logic, for example the
following batch with IF NOT EXISTS
, an error is returned:
BEGIN BATCH USING TIMESTAMP 1481124356754405
INSERT INTO cycling.cyclist_expenses ( cyclist_name, expense_id, amount, description, paid )
VALUES ( 'Vera ADRIAN', 2, 13.44, 'Lunch', true );
INSERT INTO cycling.cyclist_expenses ( cyclist_name, expense_id, amount, description, paid )
VALUES ( 'Vera ADRIAN', 3, 25.00, 'Dinner', false ) IF NOT EXISTS;
APPLY BATCH;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot provide custom timestamp for conditional BATCH"
Batching conditional updates
Conditional updates within batches were introduced as lightweight transactions. However, a batch containing conditional updates can only operate within a single partition, because the underlying Paxos implementation only works at partition-level granularity. If one statement in a batch is a conditional update, the conditional logic must return true, or the entire batch fails. If the batch contains two or more conditional updates, all the conditions must return true, or the entire batch fails. This example shows batching of conditional updates.

The first INSERT uses an IF NOT EXISTS conditional clause:
BEGIN BATCH
INSERT INTO purchases ( user, balance ) VALUES ( 'user1', -8 ) IF NOT EXISTS;
INSERT INTO purchases ( user, expense_id, amount, description, paid )
VALUES ( 'user1', 1, 8, 'burrito', false );
APPLY BATCH;

BEGIN BATCH
UPDATE purchases SET balance = -208 WHERE user='user1' IF balance = -8;
INSERT INTO purchases ( user, expense_id, amount, description, paid )
VALUES ( 'user1', 2, 200, 'hotel room', false );
APPLY BATCH;

Conditional batches cannot provide custom timestamps. UPDATE and DELETE statements within a conditional batch cannot use IN conditions to filter rows. A continuation of this example shows how to use a static column with conditional updates in a batch.
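To make the all-or-nothing behavior concrete, here is a hedged Python sketch (a toy model, not Cassandra internals) of a conditional batch against a single partition: every condition is checked against the current state first, and the mutations are applied only if all conditions hold:

```python
def apply_conditional_batch(partition, batch):
    """Sketch of conditional-batch semantics within one partition.

    `batch` is a list of (condition, mutation) pairs: `condition` is a
    predicate over the partition's current state, `mutation` changes it.
    All conditions are evaluated before any mutation runs; if any
    condition fails, nothing is applied. Returns True if applied."""
    if not all(cond(partition) for cond, _ in batch):
        return False  # one false condition fails the entire batch
    for _, mutate in batch:
        mutate(partition)
    return True

# Hypothetical partition state mirroring the purchases example above.
partition = {"balance": -8, "expenses": []}

applied = apply_conditional_batch(partition, [
    (lambda p: p["balance"] == -8,            # IF balance = -8
     lambda p: p.update(balance=-208)),
    (lambda p: True,                          # unconditional INSERT rides along
     lambda p: p["expenses"].append("hotel room")),
])
# applied is True; balance is now -208 and the expense is recorded.
```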
Batching counter updates
A batch of counter updates should use the COUNTER option because, unlike other writes in DataStax Enterprise, a counter update is not an idempotent operation.
BEGIN COUNTER BATCH
UPDATE UserActionCounts SET total = total + 2 WHERE keyalias = 523;
UPDATE AdminActionCounts SET total = total + 2 WHERE keyalias = 701;
APPLY BATCH;
Counter batches cannot include non-counter columns in the DML statements, just as a non-counter batch cannot include counter columns. Counter batch statements cannot provide custom timestamps.
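Why idempotency matters here can be shown with a small Python sketch (illustrative only, not driver behavior): retrying a counter increment after an uncertain failure double-counts, while retrying a plain overwrite converges to the same value:

```python
def retry(op, times=2):
    """Simulate a client that resends the same operation after a timeout,
    because it cannot tell whether the first attempt was applied."""
    for _ in range(times):
        op()

cell = {"counter": 0, "amount": None}

# Counter update: NOT idempotent. Each replay adds again.
retry(lambda: cell.__setitem__("counter", cell["counter"] + 2))
# cell["counter"] is now 4, not the intended 2.

# Plain write: idempotent. Replaying the same overwrite is harmless.
retry(lambda: cell.__setitem__("amount", 13.44))
# cell["amount"] is 13.44 no matter how many times the write is replayed.
```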