Install DataStax Apache Pulsar™ connector

Install the DataStax Apache Pulsar™ connector from the DataStax distribution tar file using an account that has write access to the Pulsar configuration directory.

System requirements

The system requirements for the DataStax Pulsar connector depend on the workload and network capacity. Relevant factors include the characteristics of the Pulsar topics, the data models of the target cluster, and the data volume. DataStax recommends testing with realistic data flows before committing to an instance type for the connector.

Database

The DataStax Pulsar connector supports the following databases:

  • DataStax Astra DB

  • DataStax Enterprise (DSE) 4.7 and later (non-EOL versions recommended)

  • Open source Apache Cassandra® 2.1 and later (non-EOL versions recommended, full compatibility with 5.x isn’t guaranteed)

Pulsar

The DataStax Pulsar connector requires Apache Pulsar 2.7.0 or later.
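If you're checking an existing installation against this requirement, a small shell comparison can help. The following is a minimal sketch, assuming you already know the installed Pulsar version string and that your sort command supports the -V (version sort) option, as GNU coreutils sort does. The version_ge function and the installed value are hypothetical and are not part of the connector.

```shell
#!/usr/bin/env bash
# Sketch: check whether a Pulsar version string meets the connector's
# minimum requirement of 2.7.0. How you obtain the installed version
# is an assumption left to you.

MIN_PULSAR_VERSION="2.7.0"

# version_ge A B: succeeds when version A is greater than or equal to B.
# sort -V orders dotted version strings numerically (2.7.0 < 2.10.4).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical value: replace with the version of your Pulsar installation.
installed="2.10.4"
if version_ge "$installed" "$MIN_PULSAR_VERSION"; then
  echo "Pulsar $installed meets the minimum ($MIN_PULSAR_VERSION)"
else
  echo "Pulsar $installed is older than $MIN_PULSAR_VERSION" >&2
fi
```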

The connector supports the following data structures in Pulsar topics:

  • Primitive string values

  • Avro

  • JSON formatted string with JSON schema

  • JSON formatted string inside a schemaless topic

Operating system

The supported operating systems are Linux and macOS.
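As a quick sanity check before installing, you could test the host operating system in the shell. This hypothetical sketch relies only on uname -s, which reports Linux on Linux hosts and Darwin on macOS; the is_supported_os helper is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: warn if the current host is not Linux or macOS.
# uname -s prints "Linux" on Linux and "Darwin" on macOS.

is_supported_os() {
  case "$1" in
    Linux|Darwin) return 0 ;;
    *) return 1 ;;
  esac
}

os="$(uname -s)"
if is_supported_os "$os"; then
  echo "Supported OS: $os"
else
  echo "Unsupported OS for the connector: $os" >&2
fi
```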

CPU

The DataStax Pulsar connector is bound by the amount of CPU available on the host.

Memory

The connector holds all the records pulled from Pulsar topics in memory, along with the cluster metadata and prepared statements.

Memory pressure is influenced by the following:

  • Record size of Pulsar topics

  • Number of records pulled at the same time; the maximum is set by the workers' batchSize parameter

  • Number of simultaneous tasks run by the connector
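To get a rough feel for how these factors combine, you could multiply them. The following back-of-the-envelope sketch assumes that each task holds at most batchSize records of a given average size at once. This is an illustrative assumption, not a documented DataStax sizing formula, and it ignores cluster metadata, prepared statements, and JVM overhead. The function name and input values are hypothetical.

```shell
#!/usr/bin/env bash
# Rough sketch (not an official sizing formula): estimate the bytes taken
# by records held in flight, assuming each task holds at most batch_size
# records of about avg_record_bytes each. Cluster metadata, prepared
# statements, and JVM overhead are NOT included.

estimate_inflight_bytes() {
  local avg_record_bytes=$1 batch_size=$2 tasks=$3
  echo $(( avg_record_bytes * batch_size * tasks ))
}

# Hypothetical inputs: 2 KiB records, a batchSize of 3000, and 4 tasks.
bytes=$(estimate_inflight_bytes 2048 3000 4)
# Integer division rounds the MiB figure down.
echo "~$(( bytes / 1024 / 1024 )) MiB of record payload in flight"
```

Treat the result as a lower bound on the memory used by records alone, and confirm actual usage by testing with realistic data flows.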

Network

The DataStax Pulsar connector needs adequate network capacity for the payload, including the connections from Pulsar servers to the target database platform. To increase overall throughput, scale the connector horizontally by adding workers.

When workers are added, the DataStax Pulsar connector framework automatically rebalances the load by reallocating the tasks among the workers.

Install the connector

Perform the following steps on a Pulsar node running Apache Pulsar 2.7.0 or later. If you need to install Pulsar, see Pulsar connector single instance quickstart for DSE.

  1. Download the DataStax Apache Pulsar connector tar file from the DataStax downloads site.

  2. Extract the files, replacing VERSION with the version number of the tar file you downloaded:

    tar zxf cassandra-enhanced-pulsar-sink-VERSION.tar.gz

    The following files are unpacked into a directory named after the tar file, such as cassandra-enhanced-pulsar-sink-1.4.0:

    LICENSE.txt
    README.md
    THIRD-PARTY.txt
    conf/example.yml
    cassandra-enhanced-pulsar-sink-1.4.0.nar
  3. In your Pulsar home directory, find the connectors directory. If there isn’t a connectors directory, create one.

  4. Move the DataStax Pulsar connector NAR file to the Pulsar connectors directory:

    mv installation_location/cassandra-enhanced-pulsar-sink-1.4.0.nar pulsar_home/connectors
  5. Copy the sample configuration file example.yml from cassandra-enhanced-pulsar-sink-VERSION/conf/ to your Pulsar config directory.

    If you plan to create multiple sinks from this connector, give your configuration file a unique name.

  6. Edit the configuration file as necessary using the information provided in this documentation. For example, for information about connection, authentication, and encryption parameters, see Connect the DataStax Apache Pulsar™ connector.

  7. Ensure that the user running Pulsar has permission to access the configuration and NAR files.
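The steps above can be scripted. The following is a minimal sketch, assuming the downloaded tarball is in the current directory and that your Pulsar config directory is either the value of a hypothetical PULSAR_CONF_DIR variable or the conf directory under your Pulsar home. The install_connector function is illustrative: it checks file permissions only for the user running the script, not necessarily the user running Pulsar, and it leaves step 6 (editing the configuration) to you.

```shell
#!/usr/bin/env bash
# Sketch of install steps 2 to 5 and a partial check for step 7.
# Assumptions (adjust for your environment): the tarball is in the current
# directory, and the Pulsar config directory is $PULSAR_CONF_DIR if set,
# otherwise <pulsar_home>/conf.
set -euo pipefail

install_connector() {
  local version=$1 pulsar_home=$2 config_name=${3:-example.yml}
  local config_dir=${PULSAR_CONF_DIR:-${pulsar_home}/conf}
  local dist="cassandra-enhanced-pulsar-sink-${version}"

  # Step 2: extract the tarball into ./cassandra-enhanced-pulsar-sink-VERSION
  tar zxf "${dist}.tar.gz"

  # Step 3: create the connectors directory if it doesn't exist
  mkdir -p "${pulsar_home}/connectors"

  # Step 4: move the NAR file into the connectors directory
  mv "${dist}/${dist}.nar" "${pulsar_home}/connectors/"

  # Step 5: copy the sample config, optionally under a unique name
  cp "${dist}/conf/example.yml" "${config_dir}/${config_name}"

  # Step 7 (partial): verify the files are readable by the current user;
  # checking the account that actually runs Pulsar is left to you.
  [ -r "${pulsar_home}/connectors/${dist}.nar" ]
  [ -r "${config_dir}/${config_name}" ]
}

# Hypothetical usage:
#   install_connector 1.4.0 /opt/pulsar my-sink.yml
```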

Start Pulsar with the connector

  1. Start or restart your Pulsar instance, or reload the available connectors:

    bin/pulsar-admin sinks reload
  2. Check that the DataStax Pulsar connector is available:

    bin/pulsar-admin sinks available-sinks
  3. Create a Pulsar sink:

    bin/pulsar-admin sinks create \
    	--name SINK_NAME \
    	--classname com.datastax.oss.sink.pulsar.StringCassandraSinkTask \
    	--sink-config-file config/example.yml \
    	--sink-type cassandra-enhanced \
    	--tenant TENANT_NAME \
    	--namespace NAMESPACE_NAME \
    	--inputs "persistent://TENANT_NAME/NAMESPACE_NAME/TOPIC_NAME"

    Replace the following:

    • SINK_NAME: A unique name for the sink.

    • TENANT_NAME: The name of the relevant Pulsar tenant.

    • NAMESPACE_NAME: The name of the relevant Pulsar namespace.

    • TOPIC_NAME: The name of the Pulsar topic that you want to stream to your database.

      The topic mapping is set in the connector’s configuration YAML file. Make sure that it matches the schema of the table where you want to write the messages. For more information, see Pulsar topic-to-table parameters.

  4. Use pulsar-client produce to send some messages to the sink's input topic, and then use cqlsh to verify that the messages were written to the table in your database.
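For step 4, the following dry-run sketch prints the commands instead of running them, so you can review them before use. It assumes the fully qualified topic form used with --inputs above; the tenant, namespace, topic, keyspace, table, and message values are hypothetical placeholders. Because pulsar-client produce --messages treats commas as separators between messages, the sample message contains no commas.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (rather than run) commands that send a test message
# to the sink's input topic and then query the target table.
# All names passed in below are hypothetical placeholders.
set -euo pipefail

# Build the fully qualified topic name used by --inputs.
topic_url() {
  echo "persistent://$1/$2/$3"
}

smoke_test_commands() {
  local tenant=$1 namespace=$2 topic=$3 keyspace=$4 table=$5 message=$6
  # --messages splits on commas, so keep the message comma-free.
  printf 'bin/pulsar-client produce "%s" --messages "%s"\n' \
    "$(topic_url "$tenant" "$namespace" "$topic")" "$message"
  printf 'cqlsh -e "SELECT * FROM %s.%s LIMIT 10;"\n' "$keyspace" "$table"
}

smoke_test_commands my-tenant my-namespace my-topic my_keyspace my_table hello
```

Whether the message appears as a row depends on the topic-to-table mapping in your configuration file.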

Next steps

Explore the rest of the documentation to learn more about configuring and using the DataStax Pulsar connector.

© 2025 DataStax, an IBM Company

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.
