Change Data Capture (CDC) logging

Change Data Capture (CDC) logging captures changes to data.

Change Data Capture (CDC) logging captures and tracks data that has changed. CDC logging is enabled per table, and the total disk space that CDC logs can consume is capped by a limit set in cassandra.yaml. CDC logs use the same binary format as the commit log.

In DataStax Enterprise (DSE) 6.8.15 and earlier releases, once all tables that have mutations in a completed commit log segment are flushed, a hard link to that segment is created in the cdc_raw directory. This makes the CDC-enabled mutations in the segment available to consumers.

Starting in DSE 6.8.16, a hard link to each commit log segment is created, as soon as the CommitLogSegment is created, in the directory specified by the cdc_raw_directory property in cassandra.yaml. When the segment is fsynced to disk, if CDC data is present anywhere in the segment, a [segment_name]_cdc.idx file is also created; it contains an integer offset indicating how much of the original segment has been persisted to disk. On the final flush of the segment, a second line containing the human-readable word "COMPLETED" is added to the _cdc.idx file, indicating that Cassandra has finished all processing of the file. An index file is used, rather than having clients parse the log in real time through a memory-mapped handle, because the mapped data may still be in a kernel buffer that has not yet been persisted to disk. Parsing only up to the offset listed in the _cdc.idx file ensures that you parse only CDC data that is durable.
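The following Python sketch shows how a consumer might honor the durable offset described above. The function names read_cdc_index and read_durable_cdc_bytes are hypothetical; the sketch assumes the offset is a byte count and that the index for a segment named <name>.log is <name>_cdc.idx in the same directory. Decoding the commit log binary format itself is not shown.

```python
import os


def read_cdc_index(idx_path):
    """Parse a [segment_name]_cdc.idx file.

    Returns (offset, completed): offset is how many bytes of the segment
    are durable; completed is True once the second line reads COMPLETED.
    """
    with open(idx_path, "r", encoding="ascii") as f:
        lines = [line.strip() for line in f if line.strip()]
    if not lines:
        # Empty file (for example, read before the offset was written).
        return 0, False
    offset = int(lines[0])
    completed = len(lines) > 1 and lines[1] == "COMPLETED"
    return offset, completed


def read_durable_cdc_bytes(segment_path):
    """Return only the durable prefix of a CDC-linked segment, plus its status."""
    # Assumption: <name>.log is indexed by <name>_cdc.idx next to it.
    base, _ = os.path.splitext(segment_path)
    offset, completed = read_cdc_index(base + "_cdc.idx")
    with open(segment_path, "rb") as f:
        # Never read past the offset; later bytes may not be on disk yet.
        data = f.read(offset)
    return data, completed
```

A consumer would typically call read_durable_cdc_bytes repeatedly, processing only newly durable bytes each time, and treat the segment as finished once completed is True.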

In all DSE 5.1.x through 6.8.x releases, once the disk space limit is reached, writes to CDC-enabled tables are rejected until space is freed.
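As a rough monitoring aid, the Python sketch below sums the file sizes in the CDC directory and compares them with cdc_total_space_in_mb. The helper names are hypothetical, and the node's own accounting may differ from a simple directory scan (for example, in how it counts segments that are still being written), so treat the result as an approximation rather than the exact threshold DSE uses.

```python
import os


def cdc_raw_usage_bytes(cdc_raw_dir):
    """Sum the sizes of the regular files directly inside the CDC directory."""
    total = 0
    with os.scandir(cdc_raw_dir) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


def cdc_space_remaining_mb(cdc_raw_dir, cdc_total_space_in_mb):
    """Approximate headroom before CDC-enabled tables start rejecting writes.

    A result <= 0 suggests the configured limit has been reached.
    """
    used_mb = cdc_raw_usage_bytes(cdc_raw_dir) / (1024 * 1024)
    return cdc_total_space_in_mb - used_mb
```

For example, cdc_space_remaining_mb("/var/lib/cassandra/cdc_raw", 4096) could feed an alert that fires well before the headroom reaches zero.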

CDC directory location

The location of the CDC directory depends on the type of installation:
Package installations: /var/lib/cassandra/cdc_raw
Tarball installations: /var/lib/cassandra/cdc_raw

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:
Package installations: /etc/dse/cassandra/cassandra.yaml
Tarball installations: installation_location/resources/cassandra/conf/cassandra.yaml

Prerequisites

Before enabling CDC logging, define a plan for moving and consuming the CDC log information. DataStax recommends placing the CDC log on a physical device that is separate from the data directories.
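One possible shape for the "moving" part of such a plan is sketched below in Python. It assumes the DSE 6.8.16+ layout with _cdc.idx files, treats a segment as finished only when its index file contains the COMPLETED line, and moves the segment to a hypothetical archive_dir before removing the index. The function name and archive location are illustrative, not part of DSE, and a real pipeline would parse or ship the segment before discarding it from cdc_raw.

```python
import os
import shutil


def archive_completed_segments(cdc_raw_dir, archive_dir):
    """Move fully processed CDC segments out of cdc_raw to free space.

    Assumes each segment link <name>.log has a companion <name>_cdc.idx
    whose second line reads COMPLETED once Cassandra is done with it.
    """
    os.makedirs(archive_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(cdc_raw_dir)):
        if not name.endswith("_cdc.idx"):
            continue
        idx_path = os.path.join(cdc_raw_dir, name)
        with open(idx_path, "r", encoding="ascii") as f:
            lines = [line.strip() for line in f if line.strip()]
        if len(lines) < 2 or lines[1] != "COMPLETED":
            continue  # Segment still in use; leave it for a later pass.
        segment = name[: -len("_cdc.idx")] + ".log"
        segment_path = os.path.join(cdc_raw_dir, segment)
        if os.path.exists(segment_path):
            # shutil.move renames on the same filesystem, copies otherwise.
            shutil.move(segment_path, os.path.join(archive_dir, segment))
            moved.append(segment)
        # Remove the index last so an unfinished move is retried next pass.
        os.remove(idx_path)
    return moved
```

Running this periodically keeps cdc_raw small, which matters because CDC-enabled tables reject writes once the CDC space limit is reached.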

Procedure

  1. Enable CDC logging and configure CDC directories and space in cassandra.yaml.
    For example, to enable CDC logging and set the other CDC properties to their default values:
    cdc_enabled: true
    cdc_total_space_in_mb: 4096
    cdc_free_space_check_interval_ms: 250
    cdc_raw_directory: /var/lib/cassandra/cdc_raw
  2. Optional: To enable CDC logging for a table, create or alter the table with the cdc table property set to true.
    For example, to enable CDC logging on the cycling table:
    ALTER TABLE cycling WITH cdc=true;
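To double-check the settings from step 1 after editing the file, a naive Python helper like the one below can pull the top-level cdc_* keys out of the cassandra.yaml text. It is a hypothetical sketch, not a YAML parser: it skips commented-out and indented lines and returns every value as a string.

```python
def parse_cdc_settings(yaml_text):
    """Return top-level cdc_* settings from cassandra.yaml text as strings."""
    settings = {}
    for raw in yaml_text.splitlines():
        if not raw or raw[0] in " \t#":
            continue  # Skip blanks, comments, and nested (indented) keys.
        key, sep, value = raw.partition(":")
        if sep and key.startswith("cdc_"):
            # Drop any trailing inline comment before trimming whitespace.
            settings[key.strip()] = value.split("#", 1)[0].strip()
    return settings


def cdc_is_enabled(yaml_text):
    """True only if an uncommented cdc_enabled line is set to true."""
    value = parse_cdc_settings(yaml_text).get("cdc_enabled", "false")
    return value.lower() == "true"
```

For anything beyond a quick check, a real YAML library is the safer choice, since cassandra.yaml can contain structures this line-based approach does not understand.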