Change Data Capture (CDC) logging
Change Data Capture (CDC) logging captures and tracks data that has changed. CDC logging is configured per table, with limits on the amount of disk space to consume for storing the CDC logs. CDC logs use the same binary format as the commit log.
In DataStax Enterprise (DSE) 6.8.15 and prior releases, when all tables having mutations in a completed commitlog segment are flushed, a hard link to that commitlog segment is created in the cdc_raw directory. This action makes the CDC-enabled mutations available.
For DSE 6.8.16 and later, DataStax CDC for Apache Cassandra® is available. DataStax CDC for Apache Cassandra® is open-source software (OSS) that sends Cassandra mutations for tables that have CDC enabled to Luna Streaming or Apache Pulsar™, which in turn can write the data to platforms such as Elasticsearch® or Snowflake®. For more information, see the DataStax CDC for Apache Cassandra® documentation.
Starting in DSE 6.8.16, upon CommitLogSegment creation, a hard link to the segment is created in the directory specified by the cdc_raw_directory property in cassandra.yaml.
On segment fsync to disk, if CDC data is present anywhere in the segment, a [segment_name]_cdc.idx file is also created. This file contains the integer offset up to which data in the original segment has been persisted to disk.
Upon final segment flush, a second line with the human-readable word "COMPLETED" is added to the _cdc.idx file, indicating that Cassandra has completed all processing on the file.
An index file is used, rather than encouraging clients to parse the log in real time from a memory-mapped handle, because the mapped handle may refer to kernel buffer data that is not yet persisted to disk.
Parsing only up to the offset listed in the _cdc.idx file ensures that you parse only CDC data that is durable.
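The index file layout described above (an integer offset on the first line, and an optional "COMPLETED" second line) can be read with a few lines of Python. The following is a minimal sketch, not part of DSE: the function names, the CdcIndex type, and the ASCII encoding are assumptions made for illustration.

```python
from pathlib import Path
from typing import NamedTuple


class CdcIndex(NamedTuple):
    durable_offset: int  # number of segment bytes the index marks as persisted
    completed: bool      # True once the "COMPLETED" line is present


def read_cdc_index(idx_path: Path) -> CdcIndex:
    """Parse a _cdc.idx file: an integer offset, then an optional COMPLETED line."""
    # Assumption: the file is plain ASCII text with one value per line.
    lines = idx_path.read_text(encoding="ascii").splitlines()
    if not lines or not lines[0].strip():
        raise ValueError(f"{idx_path} has no offset line")
    offset = int(lines[0].strip())
    completed = len(lines) > 1 and lines[1].strip() == "COMPLETED"
    return CdcIndex(offset, completed)


def read_durable_bytes(segment_path: Path, idx_path: Path) -> bytes:
    """Return only the prefix of the segment that the index marks as durable."""
    index = read_cdc_index(idx_path)
    with segment_path.open("rb") as segment:
        return segment.read(index.durable_offset)
```

A consumer would pass the returned bytes to its own commit log parser and re-read the index later if completed is still False.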
In all DSE 5.1.x to 6.8.x releases, after the disk space limit is reached, CDC-enabled tables reject writes until space is freed.
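Because writes are rejected once the limit is reached, it can help to monitor how full the CDC directory is. The sketch below sums the file sizes in the directory and compares the total with a limit given in megabytes. The function names are hypothetical, and the way DSE itself accounts for CDC space may differ from this simple sum.

```python
import os
from pathlib import Path


def cdc_usage_bytes(cdc_raw_dir: Path) -> int:
    """Sum the sizes of regular files directly inside the CDC directory."""
    total = 0
    with os.scandir(cdc_raw_dir) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


def cdc_usage_fraction(cdc_raw_dir: Path, limit_mb: int) -> float:
    """Fraction of the limit in use; 1.0 or more means the limit is reached."""
    return cdc_usage_bytes(cdc_raw_dir) / (limit_mb * 1024 * 1024)
```

A monitoring job could alert when the fraction approaches 1.0, leaving time to consume and remove segments before writes start to fail.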
Prerequisites
Before enabling CDC logging, define a plan for moving and consuming the CDC log information. DataStax recommends a physical device for the CDC log that is separate from the data directories.
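One possible plan for DSE 6.8.16 and later is to move segments out of the CDC directory once their index file is marked "COMPLETED". The sketch below illustrates that idea only. The function names and the archive directory are hypothetical, and it assumes that a segment named X.log has an index file named X_cdc.idx in the same directory.

```python
import shutil
from pathlib import Path
from typing import List

IDX_SUFFIX = "_cdc.idx"


def is_completed(idx_path: Path) -> bool:
    """True when the second line of the index file is COMPLETED."""
    lines = idx_path.read_text(encoding="ascii").splitlines()
    return len(lines) > 1 and lines[1].strip() == "COMPLETED"


def archive_completed_segments(cdc_raw_dir: Path, archive_dir: Path) -> List[str]:
    """Move completed segments and their index files; return moved segment names."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for idx_path in sorted(cdc_raw_dir.glob("*" + IDX_SUFFIX)):
        if not is_completed(idx_path):
            continue  # segment may still receive data; leave it in place
        # Assumed naming: strip the index suffix and append ".log".
        segment_name = idx_path.name[: -len(IDX_SUFFIX)] + ".log"
        segment_path = idx_path.with_name(segment_name)
        if not segment_path.exists():
            continue  # nothing to move for this index file
        # Move the segment first, then its index file.
        shutil.move(str(segment_path), str(archive_dir / segment_name))
        shutil.move(str(idx_path), str(archive_dir / idx_path.name))
        moved.append(segment_name)
    return moved
```

Moving files to a different device copies them and then removes the originals, which frees space in the CDC directory for new segments.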
Procedure
- Enable CDC logging and configure CDC directories and space in cassandra.yaml. For example, to enable CDC logging with default values:
    cdc_enabled: true
    cdc_total_space_in_mb: 4096
    cdc_free_space_check_interval_ms: 250
    cdc_raw_directory: /var/lib/cassandra/cdc_raw
- To enable CDC logging for a database table, create or alter the table with the cdc table property set to true. For example, to enable CDC logging on the cycling table:
    ALTER TABLE cycling WITH cdc=true;