Scaling the DataStax Apache Pulsar™ Connector

Use the Apache Pulsar™ administration tool (pulsar-admin) to increase or decrease the number of workers that run a given sink by setting its parallelism factor. You can set the parallelism factor when you create a Pulsar sink, and you can change it afterward. The default parallelism factor is 1.

Configuring parallelism during sink creation

To configure parallelism during sink creation, add the --parallelism flag to the pulsar-admin sinks create command and specify the number of workers:

For example, to create a Pulsar sink with a parallelism factor of 3:

bin/pulsar-admin sinks create \
  --name dse-sink-kv \
  --classname com.datastax.oss.sink.pulsar.StringCassandraSinkTask \
  --parallelism 3 \
  --sink-config-file conf/qs.yml \
  --sink-type cassandra-enhanced \
  --tenant public \
  --namespace default \
  --inputs "persistent://public/default/example_topic"
"Created successfully"

The sink runs three Pulsar workers in parallel.
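If you create sinks from a script, a small wrapper can check the parallelism value before handing it to pulsar-admin. This is a minimal sketch under stated assumptions: the function name create_sink_with_parallelism and the PULSAR_ADMIN variable are hypothetical, and the remaining flags are copied from the example above. Setting PULSAR_ADMIN=echo prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of pulsar-admin: create the example sink with a
# caller-supplied parallelism factor.
# PULSAR_ADMIN is an assumption of this sketch; it defaults to bin/pulsar-admin
# and can be set to "echo" for a dry run.
PULSAR_ADMIN="${PULSAR_ADMIN:-bin/pulsar-admin}"

create_sink_with_parallelism() {
  local name="$1" parallelism="$2"

  # --parallelism takes a plain positive integer as its argument.
  case "$parallelism" in
    ''|*[!0-9]*)
      echo "parallelism must be a positive integer, got '$parallelism'" >&2
      return 1 ;;
  esac
  if [ "$parallelism" -lt 1 ]; then
    echo "parallelism must be at least 1, got '$parallelism'" >&2
    return 1
  fi

  # Same flags as the documented example, with the parallelism substituted.
  "$PULSAR_ADMIN" sinks create \
    --name "$name" \
    --classname com.datastax.oss.sink.pulsar.StringCassandraSinkTask \
    --parallelism "$parallelism" \
    --sink-config-file conf/qs.yml \
    --sink-type cassandra-enhanced \
    --tenant public \
    --namespace default \
    --inputs "persistent://public/default/example_topic"
}
```

For example, create_sink_with_parallelism dse-sink-kv 3 issues the same create command as the example above, while a value such as 0 or 3: is rejected before pulsar-admin is called.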

Modifying parallelism post sink creation

To modify the parallelism factor of an existing sink, use the pulsar-admin sinks update command to increase or reduce the factor as required:

For example, to change the parallelism factor of an existing sink to 1:

bin/pulsar-admin sinks update --name dse-sink-kv --parallelism 1
"Updated successfully"

Pulsar stops all but one of the sink's workers, leaving a single worker running.
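Because the number of running workers may not change the instant the update command returns, you might want to confirm the new parallelism before moving on. The sketch below polls pulsar-admin sinks status until the reported number of running instances matches the expected value. It assumes the status output is JSON containing a numRunning field; the function names, the PULSAR_ADMIN variable, and the retry defaults are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a sink reports the expected number of
# running instances. Assumes "sinks status" prints JSON with a numRunning field.
PULSAR_ADMIN="${PULSAR_ADMIN:-bin/pulsar-admin}"

# Read status JSON on stdin and print the first numRunning value found.
running_instances() {
  grep -o '"numRunning"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | head -n 1 \
    | grep -o '[0-9][0-9]*$'
}

# Usage: wait_for_parallelism NAME EXPECTED [ATTEMPTS] [DELAY_SECONDS]
wait_for_parallelism() {
  local name="$1" expected="$2" attempts="${3:-30}" delay="${4:-2}"
  local i running=""
  for ((i = 1; i <= attempts; i++)); do
    running="$("$PULSAR_ADMIN" sinks status --tenant public --namespace default \
      --name "$name" | running_instances)"
    if [ "$running" = "$expected" ]; then
      echo "$name: $running instance(s) running"
      return 0
    fi
    sleep "$delay"
  done
  echo "$name: expected $expected running instance(s), last saw '${running:-unknown}'" >&2
  return 1
}
```

After the update above, wait_for_parallelism dse-sink-kv 1 would return 0 once a single instance is reported as running, and a non-zero status if that does not happen within the retry window.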


© 2024 DataStax

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.
