Troubleshoot the DataStax Apache Pulsar connector

This page describes some common issues that you might encounter when using the DataStax Apache Pulsar™ connector and how to resolve them.

Load balancing datacenter is not specified

The following error message occurs when the load balancing datacenter isn’t set in the connector’s configuration file:

org.apache.pulsar.common.config.ConfigException: Invalid value [127.0.0.2] for
          configuration contactPoints: When contact points is provided, loadBalancing.localDc must
          also be specified

The connector requires that you set loadBalancing.localDc for DSE clusters when contactPoints isn’t set to localhost.

Check the datacenter name in the cluster by running nodetool status, and then verify that loadBalancing.localDc has the exact same name, including capitalization. DSE datacenter names are case sensitive.
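
For example, if nodetool status reports a datacenter named dc1 (a hypothetical name used here for illustration), the configs section of the sink configuration file might look like the following sketch:

    configs:
      contactPoints: 127.0.0.2
      loadBalancing.localDc: dc1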

For Astra clusters, the datacenter is set by the Secure Connect Bundle (SCB). Make sure your connector configuration file includes a valid path to the database’s SCB zip file.
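
A minimal sketch of such an entry follows; the cloud.secureConnectBundle key name and the bundle path are assumptions that might differ in your connector version:

    configs:
      cloud.secureConnectBundle: /path/to/secure-connect-database_name.zip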

Writes fail because of mutation size

The Pulsar connector collects records to write to the database in CQL BATCH commands. Data insertions and deletions in the records are known as mutations. The connector uses single-partition batches, which are submitted as a single mutation operation.

For a given batch, if the total size of the mutation exceeds the maximum allowed by DSE in max_mutation_size_in_kb, the batch is rejected and an error message is written to system.log, located by default in /var/log/cassandra.

For example:

Mutation of 28087887 bytes is too large for the maximum size of 16777216
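
To locate these messages, you can search the log directly. The following sketch assumes the default log location:

    grep -i "mutation of" /var/log/cassandra/system.log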

Before making any changes, understand the relationship between the max_mutation_size_in_kb and commitlog_segment_size_in_mb settings:

  • These parameters have default values if they aren’t explicitly set in cassandra.yaml.

  • By default, the max_mutation_size_in_kb value is calculated as half of commitlog_segment_size_in_mb, converted to KB.

  • If you set max_mutation_size_in_kb explicitly, then you must also set commitlog_segment_size_in_mb to the result of 2 * max_mutation_size_in_kb / 1024.
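
For example, if you set max_mutation_size_in_kb to 32768 (32 MB), you must set commitlog_segment_size_in_mb to 2 * 32768 / 1024 = 64.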

Before changing these settings, DataStax recommends that you investigate why the mutations are larger than expected. Look for underlying issues with your client application, access patterns, and data model, because increasing the commit log segment size is only a limited fix.

You can decrease the batch size by lowering maxNumberOfRecordsInBatch, which controls the number of records collected in each batch.
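
For example, the following sketch lowers the number of records per batch in the sink configuration file; the value 16 is only an illustrative assumption:

    configs:
      maxNumberOfRecordsInBatch: 16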

If you cannot decrease the size of your batches, test whether increasing max_mutation_size_in_kb and commitlog_segment_size_in_mb allows batches to complete successfully without consuming too much RAM on the nodes that own the affected partitions. You can also raise the DSE batch size warning threshold with batch_size_warn_threshold_in_kb.
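
If you do test larger limits, keep the two settings consistent in cassandra.yaml. The values in the following sketch are illustrative assumptions; they would accommodate the 28 MB mutation from the earlier error message:

    # cassandra.yaml (example values only)
    commitlog_segment_size_in_mb: 64
    max_mutation_size_in_kb: 32768
    batch_size_warn_threshold_in_kb: 128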

Records fail to write

Missing records can have many causes, such as node unavailability, transient errors that prevent writes from satisfying the required consistency level, or schema changes that removed a mapped column from the target table.

To investigate this issue, do the following:

  1. Change the connector’s verbose option to true in your sink configuration file:

    configs:
      verbose: true
  2. Reload the sink configuration file:

    bin/pulsar-admin sinks update --name cass-sink-kv --sinkConfigFile conf/qs.yml
    "Updated successfully"
  3. Check for WARN and ERROR entries in the output log.
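
You can also check the sink’s runtime status for recent failures; for example, assuming the sink name cass-sink-kv from the update command above:

    bin/pulsar-admin sinks status --name cass-sink-kv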

Data parsing fails

When the data from a Pulsar field isn’t compatible with the data type of the mapped column in the target table, the data conversion fails and a message is logged in the Pulsar sink log. For example:

java.lang.IllegalArgumentException: Could not parse 'jack'; accepted formats are: a valid
          number (e.g. '1234.56'), a valid Java numeric format (e.g. '-123.45e6'), a valid date-time
          pattern (e.g. '2018-10-17T18:37:52.704Z'), or a valid boolean word

Ensure that the Pulsar fields are mapped to the correct database columns. The connector cannot automatically convert incompatible types.

If they do not match, consider changing your table schema to accommodate the Pulsar field types.
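
For example, you can confirm the column types on the target table in cqlsh before adjusting the mapping or schema; the keyspace and table names here are hypothetical:

    cqlsh> DESCRIBE TABLE my_keyspace.my_table;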

When adding or changing columns in a database table, ensure that the schema is fully propagated before continuing. With DSE, use nodetool describering to show the schema version.
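
For example, the output of the following command includes the schema version for a keyspace; the keyspace name is a placeholder:

    nodetool describering my_keyspace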
