Change Data Capture (CDC) logging

Change Data Capture (CDC) logging captures and tracks data that has changed. CDC logging is enabled per table, and a node-wide limit set in cassandra.yaml caps the disk space that CDC logs can consume. CDC logs use the same binary format as the commit log.

In DataStax Enterprise (DSE) 6.8.15 and earlier releases, after a commit log segment is completed and all tables with mutations in that segment are flushed, DSE creates a hard link to the segment in the cdc_raw_directory. This makes the CDC-enabled mutations available to consumers.

Starting with DSE v6.8.16, DataStax CDC for Apache Cassandra® is available. DataStax CDC for Apache Cassandra is open-source software (OSS) that sends mutations for CDC-enabled tables to Luna Streaming or Apache Pulsar™, which can in turn write the data to platforms such as Elasticsearch® or Snowflake®.

For more information, see the DataStax CDC for Apache Cassandra documentation.

Starting in DSE v6.8.16, when a CommitLogSegment is created, DSE creates a hard link to the segment in the directory specified by the cdc_raw_directory property in cassandra.yaml. When the segment is fsynced to disk, if CDC data is present anywhere in the segment, DSE also creates a <segment_name>_cdc.idx file that contains the integer offset up to which the segment's data is persisted to disk. When the segment is finally flushed, a second line containing the human-readable word "COMPLETED" is appended to the <segment_name>_cdc.idx file to indicate that Cassandra has finished all processing on the segment.

Clients should rely on the index file rather than parse the memory-mapped log in real time, because the mapped handle may refer to kernel buffer data that is not yet persisted to disk. Parsing only up to the offset listed in the <segment_name>_cdc.idx file ensures that you parse CDC data only for durable data.
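The index-file protocol described above can be sketched in Python. This is a minimal, hypothetical consumer helper, not part of DSE or DataStax CDC for Apache Cassandra: read_cdc_index and read_durable_cdc_bytes are illustrative names, and the code assumes that <segment_name> is the segment file name without its .log extension. It returns only the raw bytes up to the durable offset; decoding the commit log binary format is out of scope.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple


@dataclass
class CdcIndex:
    offset: int      # bytes of the segment known to be persisted to disk
    completed: bool  # True once the "COMPLETED" line has been appended


def read_cdc_index(idx_path: Path) -> CdcIndex:
    # Line 1 holds the integer offset; an optional line 2 holds "COMPLETED".
    tokens = idx_path.read_text().split()
    # Assumption: an empty index file means nothing is durable yet.
    offset = int(tokens[0]) if tokens else 0
    completed = len(tokens) > 1 and tokens[1] == "COMPLETED"
    return CdcIndex(offset=offset, completed=completed)


def read_durable_cdc_bytes(cdc_raw_dir: Path, segment_name: str) -> Tuple[bytes, bool]:
    """Return the durable prefix of a hard-linked segment and its COMPLETED flag.

    Assumption: the files are <segment_name>.log and <segment_name>_cdc.idx.
    """
    index = read_cdc_index(cdc_raw_dir / f"{segment_name}_cdc.idx")
    with open(cdc_raw_dir / f"{segment_name}.log", "rb") as segment:
        # Never read past the offset DSE reported as persisted.
        durable = segment.read(index.offset)
    return durable, index.completed
```

A consumer would typically poll the index file, process any new bytes up to the reported offset, and treat the segment as finished only after the COMPLETED flag appears.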

In all DSE 5.1.x through 6.8.x releases, after the CDC disk space limit (cdc_total_space_in_mb) is reached, writes to CDC-enabled tables are rejected until space in the cdc_raw_directory is freed.
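To see how close a node is to the limit, you could compare the size of the cdc_raw_directory against cdc_total_space_in_mb. The following Python sketch is an approximation under stated assumptions: it sums the apparent sizes of the files in the directory, which may not match how DSE itself accounts for CDC space (for example, commit log segments can be preallocated). The function names are hypothetical.

```python
from pathlib import Path


def cdc_raw_usage_bytes(cdc_raw_dir: Path) -> int:
    # Sum the apparent size of every regular file directly under cdc_raw.
    # Approximation only: DSE's internal CDC space accounting may differ.
    return sum(p.stat().st_size for p in cdc_raw_dir.iterdir() if p.is_file())


def cdc_space_remaining(cdc_raw_dir: Path, cdc_total_space_in_mb: int) -> int:
    """Bytes left before the configured limit; <= 0 means the limit is reached."""
    limit_bytes = cdc_total_space_in_mb * 1024 * 1024
    return limit_bytes - cdc_raw_usage_bytes(cdc_raw_dir)
```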

Prerequisites

Before enabling CDC logging, define a plan for moving and consuming the CDC log information.
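Part of that plan is freeing space in the cdc_raw_directory once data has been consumed, because CDC-enabled tables reject writes when the limit is reached. The following Python sketch assumes the DSE 6.8.16 and later layout (a <segment_name>.log hard link plus a <segment_name>_cdc.idx file) and deletes only segments that your consumer has finished and that DSE has marked COMPLETED. purge_consumed_segments and the consumed list are hypothetical; adapt them to however your consumer tracks progress.

```python
from pathlib import Path
from typing import Iterable, List


def purge_consumed_segments(cdc_raw_dir: Path, consumed: Iterable[str]) -> List[str]:
    """Delete consumed segments whose _cdc.idx file is marked COMPLETED.

    `consumed` holds segment names (without ".log") that the consumer has
    already processed; this bookkeeping is hypothetical.
    """
    removed = []
    for name in sorted(set(consumed)):
        idx = cdc_raw_dir / f"{name}_cdc.idx"
        if not idx.is_file():
            continue
        tokens = idx.read_text().split()
        if len(tokens) < 2 or tokens[1] != "COMPLETED":
            continue  # DSE may still be writing this segment; keep it.
        # Removing the hard link in cdc_raw does not remove the commit log's own link.
        (cdc_raw_dir / f"{name}.log").unlink(missing_ok=True)
        idx.unlink()
        removed.append(name)
    return removed
```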

DataStax recommends storing the CommitLogSegment and CDC log directories on a physical device that is mounted separately from the data directories. Keep the log directories in separate subdirectories that are not nested.

Because a hard link can only be created within a single file system, the cdc_raw_directory must be on the same partition as the commit log directory (commitlog_directory). If CDC is enabled and the two directories are on separate partitions, DSE fails to start with a fatal exception similar to the following:

    Caused by: java.nio.file.FileSystemException:
    /cassandra/backup/cdc_raw/CommitLog-680-1686048517536.log -> /cassandra/commitlog/commitlog/CommitLog-680-1686048517536.log:
    Invalid cross-device link
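One way to check this condition before starting DSE is to attempt the same operation DSE performs: create a hard link from the commit log directory into the CDC directory. In the following Python sketch, can_hard_link is a hypothetical helper, not a DSE tool; run it as the user that runs DSE, with the paths configured for commitlog_directory and cdc_raw_directory in cassandra.yaml. It creates and removes a temporary probe file in both directories.

```python
import errno
import os
import tempfile
from pathlib import Path


def can_hard_link(commitlog_dir: Path, cdc_raw_dir: Path) -> bool:
    """Return True if a file in commitlog_dir can be hard-linked into cdc_raw_dir."""
    # Create a throwaway probe file in the commit log directory.
    fd, src = tempfile.mkstemp(dir=commitlog_dir, prefix=".cdc-link-probe-")
    os.close(fd)
    dst = Path(cdc_raw_dir) / Path(src).name
    try:
        os.link(src, dst)
    except OSError as exc:
        if exc.errno == errno.EXDEV:  # "Invalid cross-device link"
            return False
        raise  # permission and other errors are reported as-is
    finally:
        os.unlink(src)  # always remove the probe file
    os.unlink(dst)  # link succeeded; remove the probe link too
    return True
```

Attempting the link is more reliable than comparing device IDs from os.stat, because on Linux link() also fails with EXDEV across different mount points of the same file system.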

Enable CDC logging

  1. Enable CDC logging and configure the CDC directory and disk space settings in cassandra.yaml.

    For example, to enable CDC logging and keep the default values for the other CDC properties:

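    cdc_enabled: true                              # enable CDC on this node
    cdc_total_space_in_mb: 4096                    # disk space limit for CDC logs
    cdc_free_space_check_interval_ms: 250          # how often to recheck free space after the limit is reached
    cdc_raw_directory: /var/lib/cassandra/cdc_raw  # must share a partition with the commit log directory
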
  2. To enable CDC logging for a table, create or alter the table with the cdc table property set to true.

    For example, to enable CDC logging on the cycling table:

    ALTER TABLE cycling WITH cdc=true;


© 2024 DataStax

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.
