BATCH

The BATCH statement groups multiple Data Manipulation Language (DML) statements into a single statement.

A BATCH may only contain UPDATE, INSERT, or DELETE statements.

This CQL batching feature has the benefit of reducing round-trips between client and server to apply the statements. However, the feature comes with caveats, as discussed in this topic. The behavior of the DML statements in a batch varies depending on two important factors:
  • Whether the batch is logged or unlogged.
  • Whether a single partition, or multiple partitions, are modified by the batch.

Regarding isolation and atomicity

Batches are only isolated within a single partition. A single partition batch is atomic: either all the DML statements in the batch succeed, or none do. In effect, a single partition batch has the same semantics as a single row write operation.
Note: If a batch writes to two different tables in the same keyspace, and both tables have the same partition key, this scenario is still considered a single partition batch. There is a single mutation for each table, because each table can have different columns. Batches allow a caller to bundle multiple operations into a single batch request, and all the operations are performed by the same coordinator. Batches also guarantee that mutations are applied in a particular order. The best use of a batch request is for a single partition across multiple tables in the same keyspace.
When writing to multiple partitions, batches provide no isolation between client programs: other clients can read the rows modified by the BATCH statement while the remaining statements in the batch are still in progress. If consistency is critical, clients can read data from tables modified by batch statements using a consistency level of SERIAL or LOCAL_SERIAL. These consistency levels ensure that a failed batch request returns the most up-to-date result, because any committed batches are replayed before the read.
Attention: The SERIAL and LOCAL_SERIAL consistency levels come with a high performance penalty.
There is no batch rollback functionality, which means that a batch cannot be undone. Before implementing or executing a batch, see Batching inserts, updates, and deletes.

Logged vs unlogged batches

By default, batches in Apache Cassandra® and DataStax Enterprise (DSE) are logged. A logged batch provides a guarantee that all statements in the group will eventually succeed. To achieve this, the coordinator first writes all the statements in the group to a batch log. The batch log on the coordinator is replicated to two other nodes in the cluster. If the coordinator node fails and the batch is not completed, then one of the other logged batch replicas will take over.

An unlogged batch, as the name suggests, means the group of statements is never stored in the batch log. Nor is the group replicated to other nodes in case of coordinator failure. All statements in an unlogged batch are sent to the coordinator and processed "as is."

Single vs multiple partitions

A batch statement can contain DML statements that target one or more partitions. Performance degrades when a batch contains statements that target multiple partitions, because the onus of distributing the statements to the correct replicas falls on the coordinator node. As a result, the benefit of reducing round-trips between client and server comes at the expense of slowing down the coordinator node.

Logged batch statements that target multiple partitions should be used only in the rare case where consistency between tables is critical. In this case, keep the number of partitions targeted by the batch statement to a minimum. An example of this scenario is where data has been denormalised across two tables and modifications to both tables must succeed.

Never use unlogged batch statements that target multiple partitions, because doing so places the responsibility of statement distribution on the coordinator node. Instead, break the batch up into its individual statements. The responsibility of statement distribution is then placed on the client application, which reduces the pressure on the coordinator node.

Logged batch statements that target a single partition in the same keyspace are converted to unlogged batch statements. This is an optimisation on the server side. There is no additional work on the coordinator for batch statements that target a single partition, which is the recommended use of a BATCH statement: the statements in the batch modify a single partition in one table (or only a few tables) in the same keyspace.
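The advice above can be applied on the client side. The following Python sketch (statement and key names are hypothetical, not part of any driver) groups pending statements by partition key so that each group can be sent as its own single partition batch, rather than as one multi-partition batch that burdens the coordinator:

```python
from collections import defaultdict

def group_by_partition(statements):
    """Group (partition_key, cql_text) pairs by partition key.

    Each resulting group can be sent as its own single-partition
    batch, instead of one multi-partition batch that forces the
    coordinator to distribute statements to the correct replicas.
    """
    groups = defaultdict(list)
    for partition_key, cql in statements:
        groups[partition_key].append(cql)
    return dict(groups)

def as_batches(groups):
    """Render each per-partition group as a BATCH statement string."""
    batches = []
    for stmts in groups.values():
        body = "\n".join(f"  {s};" for s in stmts)
        batches.append(f"BEGIN BATCH\n{body}\nAPPLY BATCH;")
    return batches

# Hypothetical pending writes: two partitions, three statements.
pending = [
    ("Vera ADRIAN", "INSERT INTO cycling.cyclist_expenses (cyclist_name, expense_id, amount) VALUES ('Vera ADRIAN', 2, 13.44)"),
    ("Vera ADRIAN", "INSERT INTO cycling.cyclist_expenses (cyclist_name, expense_id, amount) VALUES ('Vera ADRIAN', 3, 25.00)"),
    ("Joe SMITH",   "INSERT INTO cycling.cyclist_expenses (cyclist_name, expense_id, amount) VALUES ('Joe SMITH', 1, 9.99)"),
]

groups = group_by_partition(pending)    # two groups, one per partition
batches = as_batches(groups)            # two single-partition batches
```

Each resulting batch touches exactly one partition, so the client (not the coordinator) carries the cost of routing statements.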

Synopsis

BEGIN [ ( UNLOGGED | COUNTER ) ] BATCH 
        [ USING TIMESTAMP [ epoch_microseconds ] ]
        dml_statement [ USING TIMESTAMP [ epoch_microseconds ] ] ;
        [ dml_statement [ USING TIMESTAMP [ epoch_microseconds ] ] ; ... ]
        APPLY BATCH ;
Table 1. Legend
  • UPPERCASE: Literal keyword.
  • Lowercase: Not literal.
  • Italics: Variable value. Replace with a user-defined value.
  • [ ]: Optional. Square brackets surround optional command arguments. Do not type the square brackets.
  • ( ): Group. Parentheses identify a group to choose from. Do not type the parentheses.
  • |: Or. A vertical bar separates alternative elements. Type any one of the elements. Do not type the vertical bar.
  • ...: Repeatable. An ellipsis indicates that you can repeat the syntax element as often as required.
  • 'Literal string': Single quotation marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
  • { key : value }: Map collection. Braces enclose map collections or key-value pairs. A colon separates the key and the value.
  • <datatype1,datatype2>: Set, list, map, or tuple. Angle brackets enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
  • cql_statement;: End CQL statement. A semicolon terminates all CQL statements.
  • [--]: Separate command line options from command arguments with two hyphens. This syntax is useful when arguments might be mistaken for command line options.
  • '<schema> ... </schema>': Search CQL only: single quotation marks surround an entire XML schema declaration.
  • @xml_entity='xml_entity_type': Search CQL only: identify the entity and literal value to overwrite the XML element in the schema and solrConfig files.

A batch can contain these dml_statements: INSERT, UPDATE, and DELETE.

UNLOGGED | COUNTER
A batch is logged unless the UNLOGGED flag is used. For important points regarding batches and logging, see Logged vs unlogged batches above.

Use the COUNTER option for batched counter updates. Unlike other updates, counter updates are not idempotent.
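The idempotency distinction can be illustrated outside CQL. A counter update is relative ("add 2"), so replaying it changes the result; a regular write is absolute ("set to 5"), so replaying it is harmless. A minimal Python sketch (the helper names are hypothetical, for illustration only):

```python
def apply_counter_update(value, delta):
    """A counter update is relative: replaying it changes the result."""
    return value + delta

def apply_regular_write(value, new_value):
    """A regular write is absolute: replaying it is harmless."""
    return new_value

v = 0
v = apply_counter_update(v, 2)
v = apply_counter_update(v, 2)   # replayed once: value drifts to 4, not 2

w = 0
w = apply_regular_write(w, 5)
w = apply_regular_write(w, 5)    # replayed once: still 5
```

This is why batched counter updates need the COUNTER option: a retried counter mutation cannot simply be applied again.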

USING TIMESTAMP
Sets the write time for all statements executed in the batch, or for an individual statement in the batch.
Restriction: USING TIMESTAMP does not support lightweight transactions (LWTs), such as DML statements that have an IF NOT EXISTS clause.

By default, the database applies the same timestamp to all data modified by the batch. That timestamp is either generated by the server or supplied at the batch level. Because of the way timestamp ties are resolved in Cassandra, the statements may be applied in an order that differs from the order in which they were added to the batch.

To preserve the ordering of the statements in a batch, supply a distinct client-side timestamp for each statement.

To use the same timestamp for all DML statements, insert on the first line of the batch:
BEGIN BATCH USING TIMESTAMP [ epoch_microseconds ]
                DML_statement1 ;
                DML_statement2 ;
                DML_statement3 ;
                APPLY BATCH ;
For individual statements, insert at the end of the DML:
BEGIN BATCH
                DML_statement1 USING TIMESTAMP [ epoch_microseconds ] ;
                DML_statement2 USING TIMESTAMP [ epoch_microseconds ] ;
                DML_statement3 USING TIMESTAMP [ epoch_microseconds ] ;
                APPLY BATCH ;
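When supplying per-statement timestamps from the client, each statement needs a distinct, strictly increasing epoch-microsecond value so that later statements win timestamp ties. A minimal Python sketch (the helper name is hypothetical, not part of any driver):

```python
import itertools
import time

def microsecond_timestamps(start=None):
    """Yield strictly increasing epoch-microsecond timestamps.

    Starting from the current wall clock (or a supplied epoch value),
    each statement in a batch gets the next value, so later statements
    win timestamp ties and the batch ordering is preserved.
    """
    base = int(time.time() * 1_000_000) if start is None else start
    return (base + offset for offset in itertools.count())

# Build per-statement USING TIMESTAMP clauses for three placeholder DMLs.
ts = microsecond_timestamps(start=1481124356754405)
stmts = ["DML_statement1", "DML_statement2", "DML_statement3"]
lines = [f"{s} USING TIMESTAMP {next(ts)} ;" for s in stmts]
batch = "BEGIN BATCH\n" + "\n".join(f"  {l}" for l in lines) + "\nAPPLY BATCH ;"
```

Each statement in the rendered batch carries its own timestamp, one microsecond apart, so replicas resolve the writes in batch order.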

Examples

Applying a client-supplied timestamp to all DML statements

Insert meals paid for by Vera Adrian, applying a user-defined timestamp to both records:

BEGIN BATCH USING TIMESTAMP 1481124356754405

  INSERT INTO cycling.cyclist_expenses (
    cyclist_name, expense_id, amount, description, paid
  ) VALUES (
    'Vera ADRIAN', 2, 13.44, 'Lunch', true
  );

  INSERT INTO cycling.cyclist_expenses (
    cyclist_name, expense_id, amount, description, paid
  ) VALUES (
    'Vera ADRIAN', 3, 25.00, 'Dinner', true
  );

APPLY BATCH;
Note: Combining two statements for the same partition results in a single table mutation.
View the records vertically:
EXPAND ON
Verify that the timestamps are all the same:
SELECT
   cyclist_name, expense_id,
   amount, WRITETIME(amount),
   description, WRITETIME(description),
   paid, WRITETIME(paid)
 FROM
   cycling.cyclist_expenses
  WHERE
    cyclist_name = 'Vera ADRIAN';
Both records were entered with the same timestamp.
@ Row 1
------------------------+------------------
 cyclist_name           | Vera ADRIAN
 expense_id             | 2
 amount                 | 13.44
 writetime(amount)      | 1481124356754405
 description            | Lunch
 writetime(description) | 1481124356754405
 paid                   | True
 writetime(paid)        | 1481124356754405

@ Row 2
------------------------+------------------
 cyclist_name           | Vera ADRIAN
 expense_id             | 3
 amount                 | 25.00
 writetime(amount)      | 1481124356754405
 description            | Dinner
 writetime(description) | 1481124356754405
 paid                   | True
 writetime(paid)        | 1481124356754405

(2 rows)

If any DML statement in the batch uses compare-and-set (CAS) logic, for example the following batch with IF NOT EXISTS, an error is returned:

BEGIN BATCH USING TIMESTAMP 1481124356754405

  INSERT INTO cycling.cyclist_expenses (
    cyclist_name, expense_id, amount, description, paid
  ) VALUES (
    'Vera ADRIAN', 2, 13.44, 'Lunch', true
  );

  INSERT INTO cycling.cyclist_expenses (
    cyclist_name, expense_id, amount, description, paid
  ) VALUES (
    'Vera ADRIAN', 3, 25.00, 'Dinner', false
  ) IF NOT EXISTS;

APPLY BATCH;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot provide custom timestamp for conditional BATCH"

Batching conditional updates

Batched conditional updates are executed as lightweight transactions. However, a batch containing conditional updates can only operate within a single partition, because the underlying Paxos implementation works at partition-level granularity. If one statement in a batch is a conditional update, its condition must return true, or the entire batch fails. If the batch contains two or more conditional updates, all the conditions must return true, or the entire batch fails. This example shows batching of conditional updates:

The statements for inserting values into purchase records use the IF conditional clause.
BEGIN BATCH

  INSERT INTO purchases (
    user, balance
  ) VALUES (
    'user1', -8
  ) IF NOT EXISTS;

  INSERT INTO purchases (
    user, expense_id, amount, description, paid
  ) VALUES (
    'user1', 1, 8, 'burrito', false
  );

APPLY BATCH;
BEGIN BATCH

  UPDATE purchases
    SET balance = -208
    WHERE user='user1'
    IF balance = -8;

  INSERT INTO purchases (
    user, expense_id, amount, description, paid
  ) VALUES (
    'user1', 2, 200, 'hotel room', false
  );

APPLY BATCH;
Conditional batches cannot provide custom timestamps. UPDATE and DELETE statements within a conditional batch cannot use IN conditions to filter rows.
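The all-or-nothing condition semantics described above can be sketched outside CQL. This minimal Python model (hypothetical helpers, not driver code) applies a batch of (condition, mutation) pairs only if every condition holds against the current partition state, mirroring the hotel-room example:

```python
def apply_conditional_batch(partition, statements):
    """Apply a batch of (condition, mutation) pairs atomically.

    Every condition is checked against the current partition state
    first; if any returns False, no mutation is applied and the
    batch is rejected, mirroring conditional batch semantics.
    """
    if not all(cond(partition) for cond, _ in statements):
        return False                     # [applied] = False
    for _, mutate in statements:
        mutate(partition)
    return True                          # [applied] = True

# Partition state after the first example batch above.
account = {"balance": -8, "purchases": []}

batch = [
    (lambda p: p["balance"] == -8,                  # IF balance = -8
     lambda p: p.update(balance=-208)),
    (lambda p: True,                                # unconditional INSERT
     lambda p: p["purchases"].append("hotel room")),
]

applied = apply_conditional_batch(account, batch)        # condition holds
applied_again = apply_conditional_batch(account, batch)  # balance is now -208, so rejected
```

The second attempt is rejected whole: the unconditional statement in the batch is not applied either, because one failed condition fails the entire batch.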

A continuation of this example shows how to use a static column with conditional updates in a batch.

Batching counter updates

A batch of counters should use the COUNTER option because, unlike other writes in DataStax Enterprise, a counter update is not an idempotent operation.

BEGIN COUNTER BATCH

  UPDATE UserActionCounts
    SET total = total + 2
    WHERE keyalias = 523;

  UPDATE AdminActionCounts
    SET total = total + 2
    WHERE keyalias = 701;

APPLY BATCH;
Counter batches cannot include non-counter columns in the DML statements, just as a non-counter batch cannot include counter columns. Counter batch statements cannot provide custom timestamps.