Scaling the DataStax Apache Pulsar™ Connector
Use the Apache Pulsar™ administration tool (pulsar-admin) to increase or decrease the number of workers that run for a given sink by setting its parallelism factor. You can specify the parallelism factor when you create a Pulsar sink, and you can change it afterward. The default parallelism factor is 1.
Configuring parallelism during sink creation
To configure parallelism during sink creation, add the --parallelism flag to the pulsar-admin sinks create command and specify the number of workers.
Example: create a Pulsar sink with a parallelism factor of 3:
bin/pulsar-admin sinks create \
--name dse-sink-kv \
--classname com.datastax.oss.sink.pulsar.StringCassandraSinkTask \
--parallelism 3 \
--sink-config-file conf/qs.yml \
--sink-type cassandra-enhanced \
--tenant public \
--namespace default \
--inputs "persistent://public/default/example_topic"
"Created successfully"
The sink will run three parallel Pulsar workers.
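To confirm that the factor was applied, one option is to read the sink configuration back with pulsar-admin sinks get. The following is a minimal sketch, not part of the connector itself. It assumes that the command prints the sink configuration as JSON containing a numeric "parallelism" field. PULSAR_ADMIN, extract_parallelism, and sink_parallelism are hypothetical names introduced for this example.

```shell
#!/usr/bin/env bash
# Sketch: read back the parallelism factor of an existing sink.
# Assumption: `sinks get` prints JSON with a numeric "parallelism" field.
# PULSAR_ADMIN is a hypothetical variable pointing at the pulsar-admin launcher.

PULSAR_ADMIN="${PULSAR_ADMIN:-bin/pulsar-admin}"

# Reads JSON on stdin and prints the first numeric "parallelism" value found.
extract_parallelism() {
  sed -n 's/.*"parallelism"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage: sink_parallelism <sink-name>
# Prints the configured parallelism factor of a sink in public/default.
sink_parallelism() {
  "$PULSAR_ADMIN" sinks get \
    --tenant public \
    --namespace default \
    --name "$1" | extract_parallelism
}
```

With the sink created above, running sink_parallelism dse-sink-kv would be expected to print the factor that was passed to --parallelism. The sed-based parsing keeps the sketch dependency-free; a JSON-aware tool is a more robust choice if one is available.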
Modifying parallelism after sink creation
To modify the parallelism factor of an existing sink, use the pulsar-admin sinks update command and increase or reduce the factor as required.
Example: change the parallelism factor of an existing sink to 1:
bin/pulsar-admin sinks update --name dse-sink-kv --parallelism 1
"Updated successfully"
The sink will terminate all but a single Pulsar worker.
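When scaling is scripted, it can help to validate the requested factor before calling pulsar-admin sinks update. The following is a minimal sketch under the assumption that only positive integers are meaningful parallelism values. PULSAR_ADMIN, is_valid_parallelism, and scale_sink are hypothetical names introduced for this example.

```shell
#!/usr/bin/env bash
# Sketch: validate a requested parallelism factor, then update the sink.
# Assumption: a parallelism factor must be an integer of at least 1.
# PULSAR_ADMIN is a hypothetical variable pointing at the pulsar-admin launcher.

PULSAR_ADMIN="${PULSAR_ADMIN:-bin/pulsar-admin}"

# Succeeds only if the argument consists of digits and is at least 1.
is_valid_parallelism() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 1 ]
}

# Usage: scale_sink <sink-name> <parallelism>
scale_sink() {
  if ! is_valid_parallelism "$2"; then
    echo "parallelism must be a positive integer, got: '$2'" >&2
    return 1
  fi
  "$PULSAR_ADMIN" sinks update --name "$1" --parallelism "$2"
}
```

For example, scale_sink dse-sink-kv 1 issues the same update command shown above, while scale_sink dse-sink-kv 0 returns an error without contacting Pulsar.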