Understanding the architecture

Components of a DataStax Apache Kafka Connector implementation.

The DataStax Apache Kafka™ Connector is deployed on the Kafka Connect Worker nodes and runs within the worker JVM. The Kafka Connect Worker Framework handles automatic rebalancing of tasks when new nodes are added and also ships with a built-in REST API for operator actions. Running the connector in this framework enables multiple DataStax connector instances to share the load and to scale horizontally when run in Distributed Mode. The diagram below shows how the DataStax Connector fits into the Kafka ecosystem.
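In Distributed Mode, a connector instance is typically registered by POSTing a JSON configuration to a Connect worker's REST API (port 8083 by default); the framework then distributes the connector's tasks across the workers in the cluster. The fragment below is an illustrative sketch only: the connector name, class, topic, and contact points are assumptions to be replaced with values from your deployment.

```json
{
  "name": "dse-sink-example",
  "config": {
    "connector.class": "com.datastax.kafkaconnector.DseSinkConnector",
    "tasks.max": "4",
    "topics": "example_topic",
    "contactPoints": "10.0.0.1,10.0.0.2"
  }
}
```

Submitting this body to `http://<worker-host>:8083/connectors` creates the connector; `tasks.max` caps the number of tasks the framework may spread across workers, which is how the connector scales horizontally and rebalances when nodes are added.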

The environment is composed of the following components:
  • Data sources - Original source of the data, such as databases, applications, and other services like Salesforce and Twitter.
  • Kafka platform
    • Kafka brokers - Responsible for storing Kafka topics.
    • Kafka Connect workers - Nodes that run the Kafka Connect framework and host producer and consumer plug-ins (Kafka connectors).
      • Source connectors - Push messages (data) from the original sources to Kafka brokers.
      • Sink connectors - Workers running one or more instances of the DataStax Kafka Connector, which pulls messages from Kafka topics and writes them to a database table on the DataStax platform using the DataStax Enterprise Java driver.
  • DataStax platform - Connector writes to DataStax Enterprise (DSE) or DataStax Distribution of Apache Cassandra™ (DDAC) nodes.
    Note: All nodes in each cluster must be uniformly licensed to use the same subscription. For example, if a cluster contains 5 nodes, all 5 must be DDAC nodes, or all 5 must be DSE nodes.
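To illustrate how a sink connector instance ties these components together, the sketch below shows a minimal properties-style configuration that pulls records from a Kafka topic and maps record fields to table columns. The topic, keyspace, table, and column names here are hypothetical; adapt the mapping to your own schema.

```properties
name=dse-sink-example
connector.class=com.datastax.kafkaconnector.DseSinkConnector
tasks.max=1
topics=example_topic
contactPoints=127.0.0.1
loadBalancing.localDc=dc1
# Map fields of records on example_topic to columns of my_ks.my_table
topic.example_topic.my_ks.my_table.mapping=id=key, amount=value.amount, created_at=value.created_at
```

The `topic.<topic>.<keyspace>.<table>.mapping` entry is what directs each consumed record's key and value fields into specific columns of the target DSE or DDAC table.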