Using and misusing batches

When to use batches.

Batches are often mistakenly used in an attempt to optimize performance. In an unlogged batch, the coordinator must manage every insert, which can place a heavy load on the coordinator node. If the batch contains partition keys owned by other nodes, the coordinator must forward those writes over an extra network hop, which makes delivery inefficient. Use unlogged batches when all of the updates target the same partition key.

For example, with a primary key of (date, time), the following unlogged batch resolves internally to a single write, regardless of how many rows it contains, as long as every row has the same date value.

  BEGIN UNLOGGED BATCH
    INSERT INTO sensor_readings (date, time, reading) VALUES (20140910, '2014-09-10T11:00:00.00+0000', 6335.2);
    INSERT INTO sensor_readings (date, time, reading) VALUES (20140910, '2014-09-10T11:00:15.00+0000', 5222.2);
  APPLY BATCH;
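
For contrast, the following sketches the misuse described above. It assumes sensor_readings is defined with PRIMARY KEY (date, time). The column types shown (int, timestamp, double) are guesses and do not come from the original schema. The two rows carry different date values, so they belong to different partitions, and those partitions may be owned by different nodes.

  -- Assumed schema: date is the partition key and time is a clustering column.
  -- Column types are illustrative guesses.
  CREATE TABLE IF NOT EXISTS sensor_readings (
    date int,
    time timestamp,
    reading double,
    PRIMARY KEY (date, time)
  );

  -- Misuse: the rows below fall into two partitions (20140910 and 20140911).
  -- The coordinator may have to forward each write to a different set of
  -- replicas, which adds the network hop and coordinator load described above.
  BEGIN UNLOGGED BATCH
    INSERT INTO sensor_readings (date, time, reading) VALUES (20140910, '2014-09-10T11:00:00+0000', 6335.2);
    INSERT INTO sensor_readings (date, time, reading) VALUES (20140911, '2014-09-11T11:00:00+0000', 5222.2);
  APPLY BATCH;

One way to avoid this pattern is to group writes so that each unlogged batch covers a single date value. Another is to issue the writes as individual INSERT statements.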

A logged batch can also put heavy work on the coordinator node, because the coordinator must keep the affected tables consistent. When the coordinator receives a logged batch, it sends a batch log to two other nodes. If the coordinator fails, those nodes replay the batch. Because several nodes take part in this process, a logged batch adds load across the entire cluster, not only on the coordinator. Use a logged batch to keep tables synchronized, as shown in this example:

  -- BEGIN BATCH without UNLOGGED creates a logged batch (the default).
  BEGIN BATCH
    UPDATE users
      SET state = 'TX'
      WHERE user_uuid = 8a172618-b121-4136-bb10-f665cfc469eb;
    UPDATE users_by_ssn
      SET state = 'TX'
      WHERE ssn = '888-99-3987';
  APPLY BATCH;
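
The logged batch for users and users_by_ssn is needed because the same user data is stored twice, keyed for two different queries. The following is a minimal sketch of what two such denormalized tables might look like. It assumes user_uuid is a uuid and ssn is text. The name column is hypothetical, and only user_uuid, ssn, and state appear in the original example.

  -- Hypothetical schema: the same user data stored under two partition keys.
  CREATE TABLE IF NOT EXISTS users (
    user_uuid uuid PRIMARY KEY,
    ssn text,
    name text,       -- hypothetical column
    state text
  );

  CREATE TABLE IF NOT EXISTS users_by_ssn (
    ssn text PRIMARY KEY,
    user_uuid uuid,
    name text,       -- hypothetical column
    state text
  );

Each table is partitioned by a different key, so the two updates usually land in different partitions, possibly on different nodes. The batch log allows the second update to be replayed if the coordinator fails partway through, which prevents the two copies from drifting apart.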

For information about the fastest way to load data, see "Cassandra: Batch loading without the Batch keyword."