Real-time data pipelines with DataStax Astra and Decodable

This guide presents a hands-on approach for defining the objects that make up a real-time data processing pipeline. You’ll create and configure an Astra streaming tenant and an Astra database, connect them with data processing pipelines in Decodable, and send a single data record through to validate your real-time data pipeline.
For extra credit, we’ll put the processing under load with a bulk batch of data.

This guide uses the Astra and Decodable UIs in your web browser, so no terminal or scripting is required! You just need a safe place to temporarily store access tokens.

Architecture

Before we get started on our journey, let’s discuss the objects we’re creating and why we need to create them.
We want to build a pipeline that takes in raw web click data, breaks it into queryable values, and filters for certain values; both the parsed click data and the filtered data will be saved. We will use Decodable’s real-time stream processing (powered by Apache Flink) as well as DataStax’s Astra platform (powered by Apache Pulsar and Apache Cassandra). This pipeline is intended to be production-ready, because the cloud-based services we’re using automatically handle scaling, latency, and security.
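
To make the flow concrete, the sketch below shows roughly what happens to one click record as it moves through the pipeline. The field names are hypothetical, for illustration only; the actual schemas are defined later in the Decodable guide.

```python
# Hypothetical shapes for a click record before and after normalization
# (field names are illustrative, not the actual pipeline schema).

# Raw click event, as posted by the e-commerce site:
raw_click = {
    "requested_url": "https://shop.example.com/product/1234?utm_source=ad",
    "user_agent": "Mozilla/5.0 ...",
    "ts": "2023-01-01T12:00:00Z",
}

# After the normalization pipeline, the URL has been broken into
# separately queryable values:
normalized_click = {
    "host": "shop.example.com",
    "path": "/product/1234",
    "query": "utm_source=ad",
    "browser": "Mozilla/5.0 ...",
    "click_timestamp": "2023-01-01T12:00:00Z",
}

# The product-clicks pipeline then keeps only records whose path
# refers to a product:
is_product_click = normalized_click["path"].startswith("/product/")
```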

The pipeline components are outlined below.

[Architecture diagram: Real-time data pipelines with DataStax Astra and Decodable]

E-Commerce Site Clicks

  • Where the data comes from

DataStax Astra

  • All Clicks Topic: a collection of messages with normalized click data

  • Product Clicks Topic: a collection of messages with normalized and filtered click data

  • All Clicks Sink: a sink connector that writes message data to a specific database table

  • Product Clicks Sink: a sink connector that writes message data to a specific database table

  • Cassandra: data store

Decodable

  • HTTP Connection: a managed endpoint for posting click data (a posting sketch follows this list)

  • Raw Click Stream: the flow of click data that other objects can “listen” to

  • Click Normalization Pipeline: a SQL-based pipeline that takes in raw click data, parses out individual fields, adds context to others, and transforms some values

  • All Clicks Stream: the flow of normalized click data that other objects can “listen” to

  • Astra Streaming Connector: a sink that other objects can publish data to; each record it receives is transformed into a Pulsar message and produced to a given topic

  • Product Clicks Pipeline: a SQL-based pipeline that takes normalized data and filters for only the clicks associated with a product

  • Product Clicks Stream: the flow of filtered product click data that other objects can “listen” to
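
As a preview of how data enters the pipeline, below is a minimal sketch of posting one click record to the HTTP Connection. The endpoint URL and access token shown are placeholders; the real values come from the connection you create later in the Decodable guide.

```python
import requests

# Placeholder values -- the real endpoint URL and access token come from
# the HTTP Connection you create in Decodable.
DECODABLE_ENDPOINT = "https://<account>.api.decodable.co/<http-connection-path>"
ACCESS_TOKEN = "<your-decodable-access-token>"

# A single raw click record destined for the Raw Click Stream
# (hypothetical field names, matching the earlier sketch).
click = {
    "requested_url": "https://shop.example.com/product/1234?utm_source=ad",
    "user_agent": "Mozilla/5.0 ...",
    "ts": "2023-01-01T12:00:00Z",
}

response = requests.post(
    DECODABLE_ENDPOINT,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=click,  # serialized as a JSON request body
)
response.raise_for_status()  # a 2xx status means the record was accepted
```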

Prerequisites

You will need the following prerequisites in place to complete this guide:

  • Astra (free) account - Sign up now↗

  • Decodable (free) account - Sign up now↗

This guide stays within the free tiers of both Astra and Decodable. You won’t need a credit card for any of this guide.

Getting Started

The guide is broken into a few milestones. You’ll want to follow these milestones in order for everything to work.

  1. Real-time data pipeline Astra objects

    In this milestone, you’ll create a new streaming tenant in Astra Streaming, along with a namespace and topics. Then you’ll create a database in Astra DB and connect the streaming topics to the database with a sink connector.

  2. Real-time data pipeline Decodable objects

    In this milestone, you’ll create pipelines for processing incoming data, as well as connectors that bind a Decodable data stream to the Astra Streaming topics created in milestone 1.

  3. Putting the real-time data pipeline to work

    This is where the magic happens! In this milestone, you’ll start the processing pipelines, send a single record of data through them, and validate that everything happened as expected. For extra credit, you can also put the processing under load with a bulk batch of data. (An optional sketch for checking topic contents programmatically follows this list.)

  4. Debugging and cleaning up the real-time data pipeline

    This final milestone helps with debugging the pipelines in case something doesn’t go quite right. It also provides instructions for tearing down and cleaning up all the objects you created, because we’re all about being good citizens of the cloud.
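
If you’d like to verify a topic’s contents programmatically rather than through the Astra UI, something like the sketch below can read one message back. It assumes the pulsar-client Python package is installed; the service URL, token, and topic name are placeholders you’d copy from your Astra Streaming tenant’s connection details.

```python
import pulsar

# Placeholder connection details -- copy the real values from the
# "Connect" tab of your Astra Streaming tenant.
SERVICE_URL = "pulsar+ssl://<broker-hostname>.streaming.datastax.com:6651"
PULSAR_TOKEN = "<your-astra-streaming-token>"
TOPIC = "persistent://<tenant>/<namespace>/<topic>"

client = pulsar.Client(
    SERVICE_URL,
    authentication=pulsar.AuthenticationToken(PULSAR_TOKEN),
)
consumer = client.subscribe(TOPIC, subscription_name="pipeline-check")

# Read one message to confirm the pipeline delivered data to the topic;
# raises an exception if nothing arrives within 30 seconds.
msg = consumer.receive(timeout_millis=30000)
print(msg.data().decode("utf-8"))
consumer.acknowledge(msg)

client.close()
```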

Click the first guide above to get started!
