Scale the DataStax Apache Pulsar™ connector
The parallelism parameter specifies the number of workers (instances) that run for an Apache Pulsar™ sink.
You can set the parallelism factor when you create a sink or modify it afterwards.
The default parallelism factor is 1.
Configure parallelism when creating a sink
To configure parallelism when you create a sink, add the --parallelism flag to the pulsar-admin sinks create command and specify the number of instances.
The following example creates a Pulsar sink with a parallelism factor of 3:
bin/pulsar-admin sinks create \
--name dse-sink-kv \
--classname com.datastax.oss.sink.pulsar.StringCassandraSinkTask \
--parallelism 3 \
--sink-config-file conf/qs.yml \
--sink-type cassandra-enhanced \
--tenant public \
--namespace default \
--inputs "persistent://public/default/example_topic"
With this configuration, Pulsar runs three parallel instances of the sink.
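To check how many instances are actually running, one option is the pulsar-admin sinks status command. The following is a minimal sketch, not part of the connector itself: the sink_status function name and the PULSAR_ADMIN variable are hypothetical, introduced here only so the CLI path can be overridden, and the tenant and namespace are assumed to match the example above.

```shell
# Hypothetical helper: print the status of a sink in public/default.
# PULSAR_ADMIN is an assumed override for the CLI path; it defaults to bin/pulsar-admin.
sink_status() {
  "${PULSAR_ADMIN:-bin/pulsar-admin}" sinks status \
    --tenant public \
    --namespace default \
    --name "$1"
}
```

Calling sink_status dse-sink-kv should print a JSON document. In recent Pulsar releases it includes numInstances and numRunning fields, which you can compare against the parallelism factor you requested.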
Modify parallelism for an existing sink
To modify the parallelism factor of an existing sink, use the pulsar-admin sinks update command and specify the desired parallelism factor.
For example, to change the parallelism factor to 3 for a sink named dse-sink-kv, run the following command:
bin/pulsar-admin sinks update --name dse-sink-kv --parallelism 3
If the new parallelism factor is higher than the current one, the command starts additional instances to scale up to the new value. If it is lower, the command stops existing instances to scale down to the new value.
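If you script scaling operations, it can help to validate the requested factor before calling the CLI. The following is a minimal sketch under stated assumptions: the scale_sink function and the PULSAR_ADMIN variable are hypothetical names introduced for illustration, and the check only rejects values that are not positive integers. The update command itself is the one shown above.

```shell
# Hypothetical helper: scale_sink NAME PARALLELISM
# PULSAR_ADMIN is an assumed override for the CLI path; it defaults to bin/pulsar-admin.
scale_sink() {
  # Reject empty values and anything containing a non-digit character.
  case "$2" in
    ''|*[!0-9]*)
      echo "parallelism must be a positive integer" >&2
      return 1
      ;;
  esac
  # Reject zero; this sketch requires at least one instance.
  if [ "$2" -lt 1 ]; then
    echo "parallelism must be at least 1" >&2
    return 1
  fi
  "${PULSAR_ADMIN:-bin/pulsar-admin}" sinks update --name "$1" --parallelism "$2"
}
```

For example, scale_sink dse-sink-kv 3 runs the same update as the command above, while scale_sink dse-sink-kv abc returns a non-zero status without calling pulsar-admin.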