Start the DataStax Connector in distributed mode

Run an instance of the connector with a worker in distributed mode.

Use distributed mode with multiple workers in production environments so that multiple instances of the DataStax Apache Kafka Connector can run as a group across one or more workers. Distributed mode automatically balances the workload, allows dynamic scaling up or down, and offers fault tolerance for active tasks, configuration, and offset commit data.

The DataStax Connector download package contains a sample distributed mode configuration file (cassandra-sink-distributed.json.sample).

Tip: See the Apache Kafka documentation for Distributed mode.

cassandra-sink-distributed.json.sample

  • The cassandra-sink-distributed.json.sample file is located in the conf directory of the DataStax Apache Kafka Connector distribution package.

Prerequisites

  1. Download and install the DataStax Apache Kafka Connector.
  2. Edit the distributed worker configuration file, connect-distributed.properties, to fit your environment. Use this example from DataStax as a starting point.

    Set the key.converter and value.converter properties to the converters that match the format of your Kafka data. See Configuring converters in the Confluent documentation for more information on these properties.
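
As an illustration of the converter settings described above, the following fragment is a minimal sketch of a connect-distributed.properties file. It assumes a broker at localhost:9092, string keys, and schemaless JSON values. The broker address, group ID, and topic names are placeholder assumptions (they mirror the defaults in the file shipped with Apache Kafka), not values taken from the DataStax example.

```properties
# Kafka brokers the worker connects to (assumed: a single local broker).
bootstrap.servers=localhost:9092

# Workers that share the same group.id form one Kafka Connect cluster.
group.id=connect-cluster

# Converters: pick the pair that matches how your records are serialized.
# Assumed here: plain string keys and JSON values.
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Set to false when the JSON records carry no embedded schema envelope.
value.converter.schemas.enable=false

# Internal topics where distributed mode stores offsets, connector
# configurations, and status.
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
```

If your topics hold Avro or another format, swap in the matching converter classes. The key and value converters do not have to be the same.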

Procedure

  1. From the directory where you installed Apache Kafka, start the distributed worker:
    bin/connect-distributed.sh config/connect-distributed.properties
    The worker outputs a large number of informational messages during startup. The following message displays when startup is complete and the worker is ready:
    [2019-10-13 19:49:25,385] INFO Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:852)
  2. Register the connector configuration with the distributed worker:
    curl -X POST -H "Content-Type: application/json" -d @dse-sink.json "http://ip:port/connectors"

    Replace ip and port with the IP address and REST port number of the Kafka Connect worker.

    Use the same port as the rest.port parameter set in connect-distributed.properties. The default port is 8083.

    Note: You configured the dse-sink.json or dse-sink.properties file when installing the DataStax Apache Kafka Connector.
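
After registering the connector, you may want to confirm that the worker accepted it. The following is a minimal sketch that queries the Kafka Connect REST API (GET /connectors and GET /connectors/{name}/status). The helper function names are hypothetical, and the host, port, and connector name are assumptions. The connector name must match the "name" field in the JSON file you posted.

```shell
#!/bin/sh
# Assumed values: replace with your worker's address and connector name.
CONNECT_HOST="${CONNECT_HOST:-localhost}"
CONNECT_PORT="${CONNECT_PORT:-8083}"          # default Kafka Connect REST port
CONNECTOR_NAME="${CONNECTOR_NAME:-dse-sink}"  # hypothetical; use the "name" from your JSON file

# Build the base URL of the Kafka Connect REST API (hypothetical helper).
connect_base_url() {
  printf 'http://%s:%s' "$CONNECT_HOST" "$CONNECT_PORT"
}

# List the names of all connectors registered with the worker group.
list_connectors() {
  curl -s "$(connect_base_url)/connectors"
}

# Show the state of one connector and its tasks.
# $1 is the connector name.
connector_status() {
  curl -s "$(connect_base_url)/connectors/$1/status"
}
```

To use it, run list_connectors to check that the connector name appears in the returned JSON array, then run connector_status "$CONNECTOR_NAME" to inspect the state reported for the connector and each of its tasks.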