Change Data Capture (CDC) logging
Change Data Capture (CDC) logging captures and tracks data that has changed. CDC logging is configured per table, with limits on the amount of disk space to consume for storing the CDC logs. CDC logs use the same binary format as the commit log.
In DataStax Enterprise (DSE) 6.8.15 and prior releases, when all tables having mutations in a completed commitlog segment are flushed, a hard link to that commitlog segment is created in the cdc_raw_directory.
This action makes the CDC-enabled mutations available.
Starting with DSE v6.8.16, DataStax CDC for Apache Cassandra® is available. DataStax CDC for Apache Cassandra is open-source software (OSS) that sends Cassandra mutations for tables having Change Data Capture (CDC) enabled to Luna Streaming or Apache Pulsar™. In turn, these applications can write the data to platforms such as Elasticsearch® or Snowflake®. For more information, see the DataStax CDC for Apache Cassandra documentation.
Starting in DSE v6.8.16, upon CommitLogSegment creation, a hard link to the segment is created in the directory specified by the cdc_raw_directory property in cassandra.yaml.
On segment fsync to disk, if CDC data is present anywhere in the segment, then a <segment_name>_cdc.idx file is also created with the integer offset of how much data in the original segment is persisted to disk.
Upon final segment flush, a second line with the human-readable word "COMPLETED" is added to the <segment_name>_cdc.idx file. The appended word indicates that Cassandra has completed all processing on the file.
Using an index file, rather than encouraging clients to parse the log in real time from a memory-mapped handle, is preferable because the mapped handle may refer to kernel buffer data that is not yet persisted to disk.
Parsing only up to the offset listed in the <segment_name>_cdc.idx file ensures that you parse only CDC data that is durable.
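As an illustration of the index file layout described above, the following Python sketch reads a <segment_name>_cdc.idx file and returns only the durable prefix of the matching segment. It is a minimal sketch, not DataStax tooling: the names CdcIndex, read_cdc_index, and read_durable_bytes are hypothetical, and it assumes the index file holds the integer offset on its first line and, when present, the word COMPLETED on its second line. Decoding the mutations from the returned bytes is out of scope here.

```python
from pathlib import Path
from typing import NamedTuple


class CdcIndex(NamedTuple):
    """Contents of a <segment_name>_cdc.idx file (hypothetical helper type)."""
    offset: int      # number of segment bytes known to be persisted to disk
    completed: bool  # True once the "COMPLETED" marker line is present


def read_cdc_index(idx_path: Path) -> CdcIndex:
    """Parse the index file: offset on line 1, optional COMPLETED on line 2."""
    lines = idx_path.read_text().splitlines()
    if not lines or not lines[0].strip():
        raise ValueError(f"empty index file: {idx_path}")
    offset = int(lines[0].strip())
    completed = len(lines) > 1 and lines[1].strip() == "COMPLETED"
    return CdcIndex(offset, completed)


def read_durable_bytes(segment_path: Path, idx_path: Path) -> bytes:
    """Return only the part of the segment that the index marks as durable."""
    index = read_cdc_index(idx_path)
    with segment_path.open("rb") as segment:
        # Never read past the indexed offset: later bytes may not be on disk yet.
        return segment.read(index.offset)
```

A consumer would typically call read_cdc_index repeatedly while completed is False, and stop polling a segment once the marker appears.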
In all DSE 5.1.x to 6.8.x releases, after the disk space limit is reached, CDC-enabled tables reject writes until space is freed.
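Because writes are rejected once the limit is reached, it can help to monitor how much space the CDC directory occupies. The sketch below is one rough way to do that in Python. The function names and the 80% warning threshold are hypothetical, and summing file sizes only approximates whatever accounting DSE applies internally against cdc_total_space_in_mb.

```python
from pathlib import Path


def cdc_space_used_mb(cdc_raw_dir: Path) -> float:
    """Approximate the space used by files directly under cdc_raw_directory, in MB."""
    total_bytes = sum(p.stat().st_size for p in cdc_raw_dir.iterdir() if p.is_file())
    return total_bytes / (1024 * 1024)


def is_near_limit(used_mb: float, limit_mb: float, threshold: float = 0.8) -> bool:
    """Return True when usage reaches the given fraction of the limit.

    The 0.8 default is an arbitrary illustrative value, not a DSE setting.
    """
    return used_mb >= limit_mb * threshold
```

For example, is_near_limit(cdc_space_used_mb(Path("/var/lib/cassandra/cdc_raw")), 4096) could drive an alert before CDC-enabled tables start rejecting writes.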
Prerequisites
Before enabling CDC logging, define a plan for moving and consuming the CDC log information.
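One possible shape for such a plan, on releases that write the <segment_name>_cdc.idx file, is to move segments out of the CDC directory once they are marked COMPLETED. The following Python sketch is only an illustration under stated assumptions: archive_completed_segments and is_completed are hypothetical names, the archive directory is whatever location your consumer reads from, and the segment file is assumed to be named <segment_name>.log.

```python
import shutil
from pathlib import Path
from typing import List

IDX_SUFFIX = "_cdc.idx"


def is_completed(idx_path: Path) -> bool:
    """True when the second line of the index file is the COMPLETED marker."""
    lines = idx_path.read_text().splitlines()
    return len(lines) > 1 and lines[1].strip() == "COMPLETED"


def archive_completed_segments(cdc_raw_dir: Path, archive_dir: Path) -> List[str]:
    """Move completed segments and their index files to archive_dir.

    Returns the names of the segments that were moved.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for idx_path in sorted(cdc_raw_dir.glob(f"*{IDX_SUFFIX}")):
        if not is_completed(idx_path):
            continue  # Cassandra may still be writing to this segment
        segment_name = idx_path.name[: -len(IDX_SUFFIX)]
        # Assumption: the segment file carries a .log extension.
        segment_path = cdc_raw_dir / f"{segment_name}.log"
        if not segment_path.exists():
            continue
        shutil.move(str(segment_path), str(archive_dir / segment_path.name))
        shutil.move(str(idx_path), str(archive_dir / idx_path.name))
        moved.append(segment_name)
    return moved
```

Moving files out of the CDC directory is what frees space against the configured limit, so a job like this would need to run often enough to keep up with the write rate.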
DataStax recommends storing the CommitLogSegment and the CDC log directories on a physical device that is mounted separately from the data directories. Keep the log directories in separate sub-folders that are not nested.
A hard link creation to the
Enable CDC logging
- Enable CDC logging and configure CDC directories and space in cassandra.yaml. For example, to enable CDC logging with default values:
    cdc_enabled: true
    cdc_total_space_in_mb: 4096
    cdc_free_space_check_interval_ms: 250
    cdc_raw_directory: /var/lib/cassandra/cdc_raw
- To enable CDC logging for a database table, create or alter the table with the cdc table property. For example, to enable CDC logging on the cycling table:
    ALTER TABLE cycling WITH cdc=true;