Streaming data with the DataStax Apache Kafka Connector

Deploy the DataStax Apache Kafka™ Connector to stream records from an Apache Kafka topic to your DataStax Astra database.

The DataStax Apache Kafka Connector download package includes a sample JSON properties file (dse-sink-distributed.json.sample). Use the sample file as a reference when configuring your deployment.

dse-sink-distributed.json.sample

  • The dse-sink-distributed.json.sample file is located in the conf directory of the DataStax Apache Kafka Connector distribution package.

Prerequisites

  1. Download and install the DataStax Apache Kafka Connector.
  2. Edit the distributed worker configuration file, connect-distributed.properties, to fit your environment. Use this example from DataStax as a starting point.

    Set the key.converter and value.converter properties to the converters that match the format of your Kafka record keys and values. See Configuring converters in the Confluent documentation for more information on these properties.
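
As a quick check of the converter settings described above, the following sketch prints the converter lines currently set in a worker file. The show_converters helper name and its default path are assumptions for illustration and are not part of the connector distribution.

```shell
# Hypothetical helper (name and default path are assumptions): print the
# key.converter and value.converter settings, including sub-properties such as
# value.converter.schemas.enable, from a Kafka Connect worker properties file.
show_converters() {
    grep -E '^(key|value)\.converter' "${1:-config/connect-distributed.properties}"
}

# Illustrative settings for string keys and schemaless JSON values; these
# converter classes ship with Apache Kafka, but choose what matches your data:
#   key.converter=org.apache.kafka.connect.storage.StringConverter
#   value.converter=org.apache.kafka.connect.json.JsonConverter
#   value.converter.schemas.enable=false
```

Run show_converters from the Kafka installation directory, or pass the path to your worker file as the first argument.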

Procedure

  1. From the directory where you installed Apache Kafka, start the distributed worker:
    bin/connect-distributed.sh config/connect-distributed.properties
    The worker startup process outputs a large number of informational messages. The following message displays after the process completes:
    [2019-10-13 19:49:25,385] INFO Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:852)
  2. Configure the JSON configuration file (such as dse-sink.json) to use the Astra secure connect bundle.
    {
        "name": "dse-sink",
        "config": {
            "connector.class": "com.datastax.kafkaconnector.DseSinkConnector",
            "cloud.secureConnectBundle": "/path/to/secure-connect-database-name.zip",
            "auth.username": "username",
            "auth.password": "password"
    ...
        }
    }
    name
    Unique name for the connector.

    Default: dse-sink

    connector.class
    DataStax connector Java class provided in the kafka-connect-dse-N.N.N.jar file.

    Default: com.datastax.kafkaconnector.DseSinkConnector

    cloud.secureConnectBundle
    The full path to the secure connect bundle for your DataStax Astra database (secure-connect-database_name.zip). Download the secure connect bundle from the DataStax Cloud console.

    If this option is specified, you must also include the auth.username and auth.password for the database user.

    auth.username
    Astra database username.
    Note: When authorization is enabled, the DataStax connector login role must have at least modify privileges on the tables receiving data from the DataStax Apache Kafka Connector.
    auth.password
    Astra database password for the specified username.
  3. Register the connector configuration with the distributed worker:
    curl -X POST -H "Content-Type: application/json" -d @dse-sink.json "http://ip:port/connectors"

    ip and port are the IP address and port number of the Kafka Connect distributed worker.

    Use the same port as the rest.port parameter set in connect-distributed.properties. The default port is 8083.

    Note: You configured the dse-sink.json or dse-sink.properties file when installing the DataStax Apache Kafka Connector.
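
After registering the connector, one way to confirm that it started is to query the worker's REST interface. The sketch below assumes the connector name dse-sink from the example configuration. The CONNECT_HOST and CONNECT_PORT variables and both function names are hypothetical placeholders standing in for the ip and port values above.

```shell
# Hypothetical placeholders for the worker address; 8083 is the default port
# mentioned above. Override them to match your deployment.
CONNECT_HOST="${CONNECT_HOST:-localhost}"
CONNECT_PORT="${CONNECT_PORT:-8083}"

# Build the Kafka Connect REST URL for a connector's status endpoint.
connector_status_url() {
    printf 'http://%s:%s/connectors/%s/status\n' "$CONNECT_HOST" "$CONNECT_PORT" "$1"
}

# Fetch the status of the named connector from the distributed worker.
check_connector() {
    curl -s "$(connector_status_url "$1")"
}
```

Calling check_connector dse-sink returns a JSON document that reports the state of the connector and its tasks.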

What's next

For more information about the DataStax Apache Kafka Connector, see Maintaining the DataStax Connector.