BATCH

Write multiple DML statements.

Synopsis

Apache Cassandra™ 2.1 and later

BEGIN UNLOGGED BATCH
  USING TIMESTAMP timestamp
  dml_statement;
  dml_statement;
  ...
APPLY BATCH;

dml_statement is:

INSERT
UPDATE
DELETE

Table 1. Legend
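Uppercase means literal
Lowercase means not literal
Italics mean optional (italics are not preserved in this plain-text rendering; in the synopsis above, UNLOGGED and the USING TIMESTAMP clause are optional)
The pipe (|) symbol means OR or AND/OR
Ellipsis (...) means repeatable
A semicolon that terminates CQL statements is not included in the synopsis.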

Description

A BATCH statement combines multiple data modification language (DML) statements (INSERT, UPDATE, DELETE) into a single logical operation, and can set a client-supplied timestamp for all columns written by the statements in the batch. Batching multiple statements can save network exchanges between the client and the server, and between the coordinator and the replicas. However, because of the distributed nature of Cassandra, spread requests across nearby nodes as much as possible to optimize performance. Using batches to optimize performance is usually not successful, as described in the Using and misusing batches section. For information about the fastest way to load data, see "Cassandra: Batch loading without the Batch keyword."

Batches are atomic by default. In the context of a Cassandra batch operation, atomic means that if any part of the batch succeeds, all of it will. To achieve atomicity, Cassandra first writes the serialized batch to the batchlog system table, which stores the serialized batch as blob data. When the rows in the batch have been successfully written and persisted (or hinted), the batchlog data is removed. There is a performance penalty for atomicity. If you do not want to incur this penalty, prevent Cassandra from writing to the batchlog system table by using the UNLOGGED option: BEGIN UNLOGGED BATCH.

Although an atomic batch guarantees that if any part of the batch succeeds, all of it will, no other transactional enforcement is done at the batch level. For example, there is no batch isolation: clients are able to read the first updated rows from the batch while other rows are still being updated on the server. However, row updates within a single partition are isolated: clients cannot read a partial update.

Statement order does not matter within a batch; Cassandra applies all rows using the same timestamp. Use client-supplied timestamps to achieve a particular order.
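As a minimal sketch of the point above, the following batch puts a different client-supplied timestamp on each statement. It assumes the purchases table used later on this page has the primary key (user, expense_id); that schema is not shown in this section, and the timestamp values are illustrative only.

```cql
-- Assumption: purchases has PRIMARY KEY (user, expense_id).
-- The UPDATE is listed first but carries the later timestamp,
-- so its value for paid takes precedence over the INSERT's value.
BEGIN BATCH
  UPDATE purchases USING TIMESTAMP 1481124356754406
    SET paid = true WHERE user = 'user1' AND expense_id = 1;
  INSERT INTO purchases (user, expense_id, amount, description, paid)
    VALUES ('user1', 1, 8, 'burrito', false)
    USING TIMESTAMP 1481124356754405;
APPLY BATCH;
```

Because Cassandra resolves conflicting writes to the same column by timestamp, the position of a statement in the batch does not decide which value is kept.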

Using a timestamp

BATCH supports setting a client-supplied timestamp, an integer, in the USING clause, with one exception: if the batch contains a compare-and-set (CAS) statement, that is, a DML statement with an IF condition such as the following, do not attempt to use a timestamp:

INSERT INTO users (id, lastname) VALUES (999, 'Sparrow') IF NOT EXISTS;

The timestamp applies to all statements in the batch. If not specified, the current time of the insertion (in microseconds) is used. Individual DML statements inside a BATCH can specify their own timestamps, but only if the batch's USING clause does not specify one.

For example, specify a timestamp in an INSERT statement.

BEGIN BATCH
  INSERT INTO purchases (user, balance) VALUES ('user1', -8) USING TIMESTAMP 19998889022757000;
  INSERT INTO purchases (user, expense_id, amount, description, paid)
    VALUES ('user1', 1, 8, 'burrito', false);
APPLY BATCH;

Verify that the balance column has the client-provided timestamp.

SELECT balance, WRITETIME(balance) FROM purchases;

 balance | writetime_balance
---------+-------------------
      -8 | 19998889022757000
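
By contrast, a timestamp in the batch's USING clause applies to every statement in the batch. The following is a sketch only: it assumes the same purchases table with primary key (user, expense_id), and the rows and timestamp value are made up for illustration.

```cql
-- Assumption: purchases has PRIMARY KEY (user, expense_id).
-- Both inserts are written with the single batch-level timestamp.
BEGIN BATCH USING TIMESTAMP 1481124356754405
  INSERT INTO purchases (user, expense_id, amount, description, paid)
    VALUES ('user1', 3, 12, 'lunch', false);
  INSERT INTO purchases (user, expense_id, amount, description, paid)
    VALUES ('user1', 4, 30, 'taxi', false);
APPLY BATCH;

-- Check the write time recorded for each row's amount column.
SELECT expense_id, WRITETIME(amount) FROM purchases WHERE user = 'user1';
```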

Batching conditional updates

When a batch contains conditional updates, all updates in the batch must be made to the same partition, because the underlying Paxos implementation works at the granularity of the partition. You can group updates that have conditions with those that do not, but when a single statement in a batch uses a condition, the entire batch is committed using a single Paxos proposal, and is applied only if all of the conditions contained in the batch hold.

The following example batches conditional updates. The statements that insert and update purchase records use the IF conditional clause.

BEGIN BATCH
  INSERT INTO purchases (user, balance) VALUES ('user1', -8) IF NOT EXISTS;
  INSERT INTO purchases (user, expense_id, amount, description, paid)
    VALUES ('user1', 1, 8, 'burrito', false);
APPLY BATCH;
  
BEGIN BATCH
  UPDATE purchases SET balance = -208 WHERE user='user1' IF balance = -8;
  INSERT INTO purchases (user, expense_id, amount, description, paid)
    VALUES ('user1', 2, 200, 'hotel room', false);
APPLY BATCH;

A continuation of this example shows how to use a static column with conditional updates in a batch.
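
The purchases table itself is not defined in this section. One schema that would be consistent with the statements above is sketched here; the column types and the choice of balance as a static column are assumptions inferred from how the columns are used, not a definition taken from the source.

```cql
-- Hypothetical schema, inferred from the batch examples above.
-- balance is static so that it can be written with only the
-- partition key (user), once per partition.
CREATE TABLE purchases (
  user text,
  balance int static,
  expense_id int,
  amount int,
  description text,
  paid boolean,
  PRIMARY KEY (user, expense_id)
);
```

With this layout, every statement in each batch above targets the partition for 'user1', which satisfies the same-partition requirement for conditional batches.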

Batching counter updates

In Cassandra 2.1 and later, batches of counter updates should use UNLOGGED because, unlike other writes in Cassandra, counter updates are not idempotent.

Cassandra 2.1 Example

BEGIN UNLOGGED BATCH
  UPDATE UserActionCounts SET total = total + 2 WHERE keyalias = 523;
  UPDATE AdminActionCounts SET total = total + 2 WHERE keyalias = 701;
APPLY BATCH;
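
The two counter tables are not defined in this section. A minimal sketch of definitions that the batch above could run against follows; the key type is an assumption based on the integer literals in the WHERE clauses.

```cql
-- Hypothetical definitions for the tables used in the counter batch.
-- In a counter table, every column outside the primary key is a counter.
CREATE TABLE UserActionCounts (
  keyalias int PRIMARY KEY,
  total counter
);

CREATE TABLE AdminActionCounts (
  keyalias int PRIMARY KEY,
  total counter
);
```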