Start the DataStax Connector in distributed mode

Run an instance of the connector with a worker in distributed mode.

To collect large amounts of data, use distributed mode with multiple workers. Distributed mode automatically balances work across the workers, lets you scale up or down dynamically, and provides fault tolerance for both active tasks and offset commit data.
Tip: See the Apache Kafka documentation for Distributed mode.
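
Before starting several workers, it can help to confirm that their properties files agree. In Apache Kafka Connect, workers that use the same group.id and the same config, offset, and status storage topics join one cluster. The following is a minimal sketch, not part of the connector; the function name show_cluster_settings is illustrative, and the file path is whatever you pass to it.

```shell
# Print the settings that workers in one Connect cluster are expected to share.
# Usage: show_cluster_settings path/to/connect-distributed.properties
# (show_cluster_settings is a hypothetical helper name.)
show_cluster_settings() {
  # Match only active (uncommented) property lines for the shared settings.
  grep -E '^(bootstrap\.servers|group\.id|(config|offset|status)\.storage\.topic)[[:space:]]*=' "$1"
}
```

Run show_cluster_settings config/connect-distributed.properties on each worker host and compare the output.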

Procedure

  1. Start the distributed worker:
    bin/connect-distributed.sh config/connect-distributed.properties
    The worker startup process outputs a large number of informational messages. When the process is complete, the following message appears:
    [2018-10-13 19:49:25,385] INFO Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:852)
  2. Register the connector configuration with the distributed worker:
    curl -X POST -H "Content-Type: application/json" -d @config/dse-sink.json "http://localhost:8083/connectors"
    where
    • config is the plugin configuration directory (conf_dir)
    • localhost:8083 is the IP address and port of the Kafka Connect worker (ip:port).

      Use the same port as the rest.port parameter set in the connect-distributed.properties file. The default is 8083.
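
After the POST request returns, one way to confirm the registration is to query the worker's REST interface. The following is a minimal sketch, assuming the worker listens on localhost:8083 as in the command above. The function names and the CONNECT_HOST and CONNECT_PORT variables are illustrative and not part of the connector; GET /connectors and GET /connectors/{name}/status are endpoints of the Kafka Connect REST API.

```shell
# Build the worker's base URL. CONNECT_HOST and CONNECT_PORT are hypothetical
# variables; the port must match rest.port in connect-distributed.properties.
connect_url() {
  printf 'http://%s:%s' "${CONNECT_HOST:-localhost}" "${CONNECT_PORT:-8083}"
}

# List the names of all connectors registered with the worker.
list_connectors() {
  curl -s "$(connect_url)/connectors"
}

# Show the state of one connector and its tasks.
# Usage: connector_status <connector-name>
connector_status() {
  curl -s "$(connect_url)/connectors/$1/status"
}
```

For example, run list_connectors, then pass one of the returned names to connector_status. The name should correspond to the name field in the JSON file that you registered.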