CDC for Astra DB

CDC for Astra DB automatically captures changes in real time, de-duplicates the changes, and streams the clean set of changed data into Astra Streaming where it can be processed by client applications or sent to downstream systems.

Astra Streaming processes data changes via a Pulsar topic. By design, the Change Data Capture (CDC) component is simple, with a 1:1 correspondence between the table and a single Pulsar topic.

This doc shows you how to create a CDC connector for your Astra DB deployment and send change data to an Elasticsearch sink.

Enabling CDC for Astra DB results in increased costs based on your Astra Streaming usage. See Astra Streaming pricing for pricing details and CDC for Astra DB for CDC metering rates.

Supported data structures

The following Cassandra CQL 3.x data types (with the associated AVRO type or logical-type) are supported for CDC for Astra DB:

  • ascii (string)

  • bigint (long)

  • blob (bytes)

  • boolean (boolean)

  • counter (long)

  • date (int)

  • decimal (cql_decimal)

  • double (double)

  • duration (cql_duration)

  • float (float)

  • inet (string)

  • int (int)

  • list (array)

  • map (map, only string-type keys are supported)

  • set (array)

  • smallint (int)

  • text (string)

  • time (long)

  • timestamp (long)

  • timeuuid (string)

  • tinyint (int)

  • uuid (string)

  • varchar (string)

  • varint (cql_varint / bytes)

Cassandra static columns are supported:

  • On row-level updates, static columns are included in the message value.

  • On partition-level updates, the clustering keys are null in the message key. The message value only has static columns on INSERT/UPDATE operations.
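
As a purely illustrative sketch grounded in the two rules above, the Python snippet below prints hypothetical message key/value pairs for each case (the table, column names, and event shapes are invented for illustration and are not the exact wire format):

    # Hypothetical table:
    #   CREATE TABLE ks1.readings (
    #       sensor text, ts timestamp, location text STATIC, value double,
    #       PRIMARY KEY (sensor, ts));

    # Row-level update: clustering key present in the message key; the static
    # column is included in the message value alongside regular columns.
    row_key = {"sensor": "s1", "ts": 1700000000000}
    row_value = {"location": "lab-a", "value": 21.5}

    # Partition-level update (for example, an INSERT/UPDATE touching only the
    # static column): clustering keys are null in the message key, and the
    # message value carries only the static columns.
    partition_key = {"sensor": "s1", "ts": None}
    partition_value = {"location": "lab-b"}

    print(row_key, row_value)
    print(partition_key, partition_value)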

Columns with unsupported data types are omitted from the events sent to the data topic. If a row update contains both supported and unsupported data types, the event includes only the columns with supported data types.

AVRO interpretation

Astra DB keys are strings, while CDC produces AVRO messages which are structures. The conversion for some AVRO structures requires additional tooling that can result in unexpected output.

The table below describes the conversion of AVRO logical types. The record type is a schema containing the listed fields.

Table 1. AVRO complex types

Name        | AVRO type | Fields                                                           | Explanation
collections | array     | lists, sets                                                      | Sets and lists are treated as the AVRO array type, with the attribute items containing the schema of the array's items.
decimal     | record    | BIG_INT, DECIMAL_SCALE                                           | The Cassandra DECIMAL type is converted to a record with the cql_decimal logical type.
duration    | record    | CQL_DURATION_MONTHS, CQL_DURATION_DAYS, CQL_DURATION_NANOSECONDS | The Cassandra DURATION type is converted to a record with the cql_duration logical type.
maps        | map       |                                                                  | The Cassandra MAP type is converted to the AVRO map type, but the keys are converted to strings. For complex key types, the key is represented in JSON.
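
As a minimal sketch of consuming these records, the Python functions below convert the decimal and duration field sets from the table into native values. The field names follow the table above, and the snippet assumes the unscaled decimal value arrives as big-endian two's-complement bytes; verify both against the actual schema your topic publishes:

    from datetime import timedelta
    from decimal import Decimal

    def decode_cql_decimal(record):
        # record is assumed to look like {"BIG_INT": bytes, "DECIMAL_SCALE": int}.
        unscaled = int.from_bytes(record["BIG_INT"], byteorder="big", signed=True)
        return Decimal(unscaled).scaleb(-record["DECIMAL_SCALE"])

    def decode_cql_duration(record):
        # record is assumed to look like {"CQL_DURATION_MONTHS": int,
        # "CQL_DURATION_DAYS": int, "CQL_DURATION_NANOSECONDS": int}.
        # Months have no fixed length in days, so they are returned separately.
        months = record["CQL_DURATION_MONTHS"]
        rest = timedelta(days=record["CQL_DURATION_DAYS"],
                         microseconds=record["CQL_DURATION_NANOSECONDS"] / 1000)
        return months, rest

    # 12345 with scale 2 decodes to 123.45.
    print(decode_cql_decimal({"BIG_INT": (12345).to_bytes(3, "big"), "DECIMAL_SCALE": 2}))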

Limitations

CDC for Astra DB has the following limitations:

  • Does not manage table truncates.

  • Does not sync data available before starting the CDC agent.

  • Does not replay logged batches.

  • Does not manage time-to-live.

  • Does not support range deletes.

  • CQL column names must not match a Pulsar primitive type name (for example, INT32).

  • Does not support multi-region.

Creating a tenant and a topic

  1. In astra.datastax.com, select Create a Streaming Tenant.

  2. Enter the name for your new streaming tenant and select a provider.

  3. Select Create Tenant.

Use the default persistent and non-partitioned topic.

Astra Streaming CDC can only be used in a region that supports both Astra Streaming and Astra DB. See Regions for more information.

Creating a table

  1. In your database, create a table with a primary key column:

    CREATE TABLE IF NOT EXISTS <keyspacename>.tbl1 (key text PRIMARY KEY, c1 text);
  2. Confirm you created your table:

    select * from ks1.tbl1;

    Result:

    token@cqlsh> select * from ks1.tbl1;
    
     key | c1
    -----+----
    
    (0 rows)
    token@cqlsh>

Connecting to CDC for Astra DB

  1. Select the CDC tab in your database dashboard.

  2. Select Enable CDC.

  3. Complete the fields to connect CDC.

  4. Select Enable CDC. Once created, your CDC connector appears in the CDC tab.
  5. Enabling CDC creates a new astracdc namespace with two new topics, data- and log-. The log- topic receives the raw change data, which CDC processes and then writes as clean data to the data- topic. The log- topic is internal to CDC functionality and should not be used directly. Use the data- topic to consume CDC data in Astra Streaming.
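
To read from the data- topic programmatically, a minimal Python sketch using the pulsar-client package might look like the following. The service URL, token, tenant, and topic names are placeholders; take the real values from your streaming tenant's connection settings and the CDC tab:

    import pulsar

    # Placeholders: replace with your streaming tenant's service URL, a valid
    # token, and the full name of your data- topic in the astracdc namespace.
    SERVICE_URL = "pulsar+ssl://<broker-host>:6651"
    TOKEN = "<astra-streaming-token>"
    TOPIC = "persistent://<tenant>/astracdc/data-<database-id>-<keyspace>.tbl1"

    client = pulsar.Client(SERVICE_URL,
                           authentication=pulsar.AuthenticationToken(TOKEN))
    consumer = client.subscribe(TOPIC, subscription_name="cdc-example")

    try:
        while True:
            msg = consumer.receive()
            # CDC messages are AVRO-encoded; the raw bytes are printed here
            # for brevity rather than decoded against the schema.
            print(msg.partition_key(), msg.data())
            consumer.acknowledge(msg)
    finally:
        client.close()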

Connecting Elasticsearch sink

After creating your CDC connector, connect an Elasticsearch sink to it. DataStax recommends using the default Astra Streaming settings.

  1. Select the CDC-enabled table from the database CDC tab and click Add Elastic Search Sink to use the default settings.

  2. Select the corresponding data topic for the chosen table. The topic name will look something like this: data-64b406e3-28ec-4eaf-a802-69ade0415b58-ks1.tbl1.

  3. Use your Elasticsearch deployment to complete the fields. To find your Elasticsearch URL, open your deployment in the Elastic Cloud console and copy the Elasticsearch endpoint into the Elastic Search URL field.
  4. Complete the remaining fields.

    Most values will auto-populate. These values are recommended:

    • Set Ignore Record Key to false

    • Set Null Value Action to DELETE

    • Set Enable Schema to true

  5. When the fields are completed, select Create.

If creation is successful, <sink-name> created successfully appears at the top of the screen. You can confirm your new sink was created in the Sinks tab.


Sending messages

Let’s process some changes with CDC.

  1. Go to the CQL console.

  2. Modify the table you created:

    INSERT INTO <keyspacename>.tbl1 (key,c1) VALUES ('32a','bob3123');
    INSERT INTO <keyspacename>.tbl1 (key,c1) VALUES ('32b','bob3123b');
  3. Confirm the changes you’ve made:

    token@cqlsh> select * from ks1.tbl1;
    
     key | c1
    -----+----------
     32a |  bob3123
     32b | bob3123b
    
    (2 rows)

Confirming Elasticsearch is receiving data

To confirm Elasticsearch is receiving your CDC changes, issue a curl GET request to your Elasticsearch deployment.

  1. Get your index name from your Elasticsearch sink tab.
  2. Issue your curl GET request with your Elastic username, password, and index name:

    curl  -u <username>:<password>  \
       -XGET "https://asdev.es.westus2.azure.elastic-cloud.com:9243/<index_name>/_search?pretty"  \
       -H 'Content-Type: application/json'

    If you’re using a trial account, the username is elastic.

You will receive a JSON response with your changes to the index, which confirms Astra Streaming is sending your CDC changes to your Elasticsearch sink.

{
    "_index" : "index.tbl1",
    "_type" : "_doc",
    "_id" : "32a",
    "_score" : 1.0,
    "_source" : {
        "c1" : "bob3123"
    }
},
{
    "_index" : "index.tbl1",
    "_type" : "_doc",
    "_id" : "32b",
    "_score" : 1.0,
    "_source" : {
        "c1" : "bob3123b"
    }
}
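
If you prefer Python over curl, a roughly equivalent request using the requests library (the hostname, index name, and credentials are placeholders, as above) is:

    import requests

    # Placeholders: substitute your Elasticsearch endpoint, index name,
    # and credentials.
    resp = requests.get(
        "https://<your-endpoint>.elastic-cloud.com:9243/<index_name>/_search",
        params={"pretty": "true"},
        auth=("<username>", "<password>"),
        headers={"Content-Type": "application/json"},
    )
    print(resp.text)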

What’s next?

For more on Astra Streaming, browse the Astra Streaming FAQ or learn about using Pulsar clients with Astra Streaming.
