Troubleshoot the DataStax Apache Pulsar connector
This page describes some common issues that you might encounter when using the DataStax Apache Pulsar™ connector and how to resolve them.
Load balancing datacenter is not specified
The following error message occurs when the load balancing datacenter isn’t set in the connector’s configuration file:
org.apache.pulsar.common.config.ConfigException: Invalid value [127.0.0.2] for
configuration contactPoints: When contact points is provided, loadBalancing.localDc must
also be specified
The connector requires that you set loadBalancing.localDc for DSE clusters when contactPoints isn’t set to localhost.
Check the datacenter name in the cluster by running nodetool status, and then verify that loadBalancing.localDc has the exact same name, including capitalization.
DSE datacenter names are case sensitive.
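The rule above can be expressed as a quick pre-flight check. This Python sketch is hypothetical: it assumes the connector settings are available as a flat dict keyed by option name (for example loadBalancing.localDc), and that nodetool status prints one "Datacenter: <name>" line per datacenter. The helper names are illustrative, not part of the connector.

```python
import re

def datacenters_from_nodetool_status(output: str) -> list[str]:
    """Return datacenter names from `nodetool status` output.

    Assumes each datacenter section starts with a line like "Datacenter: DC1".
    """
    return re.findall(r"^Datacenter:[ \t]*(\S+)[ \t]*$", output, flags=re.MULTILINE)


def check_local_dc(config: dict, status_output: str) -> list[str]:
    """Return problems with loadBalancing.localDc (an empty list means none found)."""
    problems = []
    contact_points = config.get("contactPoints", "localhost")
    local_dc = config.get("loadBalancing.localDc")
    if contact_points != "localhost" and not local_dc:
        problems.append("contactPoints is set, so loadBalancing.localDc is required")
    elif local_dc:
        dcs = datacenters_from_nodetool_status(status_output)
        if local_dc not in dcs:  # exact, case-sensitive comparison
            msg = f"loadBalancing.localDc '{local_dc}' not found in {dcs}"
            # Point out a match that differs only in capitalization.
            near = [dc for dc in dcs if dc.lower() == local_dc.lower()]
            if near:
                msg += f"; did you mean '{near[0]}'? (names are case sensitive)"
            problems.append(msg)
    return problems
```

For example, a configuration with contactPoints set to 127.0.0.2 and loadBalancing.localDc set to dc1 is flagged when nodetool status reports a datacenter named DC1, because the comparison is case sensitive.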
For Astra clusters, the datacenter is set by the Secure Connect Bundle (SCB). Make sure your connector configuration file includes a valid path to the database’s SCB zip file.
Writes fail because of mutation size
The Pulsar connector collects records to write to the database in CQL BATCH commands.
Data insertions and deletions in the records are known as mutations.
The connector uses single-partition batches, each of which is submitted as one mutation operation.
If the total size of a batch's mutation exceeds the maximum allowed by DSE in max_mutation_size_in_kb, the batch is rejected and an error message is written to system.log, which is in /var/log/cassandra by default.
For example:
Mutation of 28087887 bytes is too large for the maximum size of 16777216
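To see how far rejected batches exceed the limit, you can scan system.log for this message. The following is a minimal Python sketch, assuming the message text appears in the log exactly as shown above; the function names are illustrative.

```python
import re

# Matches the DSE message shown above; assumes the wording is unchanged in your version.
MUTATION_TOO_LARGE = re.compile(
    r"Mutation of (\d+) bytes is too large for the maximum size of (\d+)"
)


def oversized_mutations(log_lines):
    """Yield (mutation_bytes, limit_bytes) for each 'too large' message."""
    for line in log_lines:
        match = MUTATION_TOO_LARGE.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2))


def summarize(log_lines):
    """Return the largest rejected mutation and its limit in KB, or None if none found."""
    hits = list(oversized_mutations(log_lines))
    if not hits:
        return None
    size, limit = max(hits)  # tuples compare by mutation size first
    return {"largest_kb": size / 1024, "limit_kb": limit / 1024, "ratio": size / limit}
```

For the example message, the limit is 16384 KB and the rejected mutation is roughly 1.67 times that limit. To run it against a real log, open the file (adjust the path for your installation) and pass the file object to summarize.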
Before making any changes, understand the relationship between the max_mutation_size_in_kb and commitlog_segment_size_in_mb settings:
- These parameters have default values if they aren’t explicitly set in cassandra.yaml.
- By default, max_mutation_size_in_kb is half of commitlog_segment_size_in_mb, converted to kilobytes.
- If you set max_mutation_size_in_kb explicitly, then you must also set commitlog_segment_size_in_mb to at least 2 * max_mutation_size_in_kb / 1024.
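The arithmetic behind these rules can be sketched in Python. This assumes the default commitlog_segment_size_in_mb of 32, which gives a 16384 KB (16777216-byte) limit, matching the limit in the example error above. Rounding the segment size up to a whole megabyte is a choice made here because the setting takes an integer; the helper names are illustrative.

```python
import math


def default_max_mutation_kb(commitlog_segment_mb: int) -> int:
    """Default max_mutation_size_in_kb: half the commitlog segment, in KB."""
    return commitlog_segment_mb * 1024 // 2


def min_commitlog_segment_mb(max_mutation_kb: int) -> int:
    """Smallest whole commitlog_segment_size_in_mb for an explicit max_mutation_size_in_kb.

    Implements 2 * max_mutation_size_in_kb / 1024, rounded up to a whole MB.
    """
    return math.ceil(2 * max_mutation_kb / 1024)


def settings_for_mutation(mutation_bytes: int) -> tuple[int, int]:
    """Return (max_mutation_size_in_kb, commitlog_segment_size_in_mb) that would admit it."""
    max_kb = math.ceil(mutation_bytes / 1024)
    return max_kb, min_commitlog_segment_mb(max_kb)
```

For the 28087887-byte mutation in the example, settings_for_mutation returns (27430, 54). This only shows the arithmetic; it does not account for the RAM cost of larger segments, which is why reducing batch size is usually the better first step.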
However, DataStax recommends that you investigate why the mutations are larger than expected. Look for underlying issues with your client application, access patterns, and data model because increasing the commitlog segment size is a limited fix.
You can decrease the batch size by lowering the number of records collected in each batch with maxNumberOfRecordsInBatch.
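To see why a lower record cap yields smaller mutations, here is a hypothetical Python illustration of grouping records into single-partition batches capped at a maximum record count. It is not the connector's implementation; plan_batches, the partition_key argument, and the record dicts are assumptions made for the example, with max_records standing in for maxNumberOfRecordsInBatch.

```python
from collections import defaultdict
from typing import Iterable


def plan_batches(records: Iterable[dict], partition_key: str, max_records: int) -> list[list[dict]]:
    """Group records into single-partition batches of at most max_records each."""
    by_partition = defaultdict(list)
    for record in records:
        by_partition[record[partition_key]].append(record)

    batches = []
    for partition_records in by_partition.values():
        # Split each partition's records into chunks no larger than the cap.
        for start in range(0, len(partition_records), max_records):
            batches.append(partition_records[start:start + max_records])
    return batches
```

With ten records for one partition and three for another, a cap of 4 produces batches of 4, 4, 2, and 3 records, while a cap of 32 produces batches of 10 and 3. Each batch still targets a single partition, but no batch carries more records than the cap.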
If you cannot decrease the size of your batches, test whether increasing max_mutation_size_in_kb and commitlog_segment_size_in_mb lets batches complete successfully without consuming too much RAM on the partition’s node.
You can also raise the threshold at which DSE logs warnings about large batches using batch_size_warn_threshold_in_kb.
Records fail to write
Missing records can have many causes, such as unavailable nodes, transient errors that prevent writes from meeting the required consistency level, or schema changes that removed a mapped column from the target table.
To investigate this issue, do the following:
- Change the connector’s verbose option to true in your sink configuration file:
    configs:
      verbose: true
- Reload the sink configuration file:
    bin/pulsar-admin sinks update --name cass-sink-kv --sinkConfigFile conf/qs.yml
  The command returns "Updated successfully".
- Check for WARN and ERROR entries in the output log.
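With verbose output enabled, the log can get long. A rough Python sketch for pulling out WARN and ERROR entries follows; it assumes the level appears as a standalone word in each log line, which depends on your logging configuration, and it will also match those words if they appear inside a message body.

```python
import re
from collections import Counter

# Assumes the level is a standalone token, e.g. "... [main] WARN  com.example.Foo - msg".
# \b keeps "WARNING" from matching "WARN".
LEVEL = re.compile(r"\b(WARN|ERROR)\b")


def problem_lines(lines):
    """Return (level, line) pairs for WARN and ERROR entries."""
    found = []
    for line in lines:
        match = LEVEL.search(line)
        if match:
            found.append((match.group(1), line.rstrip("\n")))
    return found


def count_levels(lines):
    """Count WARN and ERROR entries."""
    return Counter(level for level, _ in problem_lines(lines))
```

Pass it an open log file (its location depends on your Pulsar deployment) and review the ERROR entries first, since they usually name the record or column that failed.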
Data parsing fails
When the data from a Pulsar field isn’t compatible with the data type of the mapped column in the target table, the data conversion fails and a message is logged in the Pulsar sink log. For example:
java.lang.IllegalArgumentException: Could not parse 'jack'; accepted formats are: a valid
number (e.g. '1234.56'), a valid Java numeric format (e.g. '-123.45e6'), a valid date-time
pattern (e.g. '2018-10-17T18:37:52.704Z'), or a valid boolean word
Ensure that the Pulsar fields are mapped to the correct database columns. The connector cannot automatically convert between incompatible types.
If the types do not match, consider changing your table schema to accommodate the Pulsar field types.
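Before changing the schema, it can help to check a sample record against the target column types. The following Python sketch is hypothetical: the CONVERTERS table, the field-to-column mapping dict, and the sample records are illustrative, and the connector's own parsing accepts more types and formats than shown here.

```python
from decimal import Decimal, InvalidOperation


def _to_bool(value):
    text = str(value).strip().lower()
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(f"not a boolean: {value!r}")


def _to_decimal(value):
    try:
        return Decimal(str(value))
    except InvalidOperation:
        raise ValueError(f"not a number: {value!r}") from None


# Simplified converters for a few CQL types (illustrative only).
CONVERTERS = {
    "int": lambda v: int(str(v)),
    "bigint": lambda v: int(str(v)),
    "decimal": _to_decimal,
    "double": lambda v: float(str(v)),
    "boolean": _to_bool,
    "text": str,
}


def incompatible_fields(record, mapping, column_types):
    """Return (field, column, cql_type, value) for values that fail to convert."""
    failures = []
    for field, column in mapping.items():
        cql_type = column_types[column]
        value = record.get(field)
        converter = CONVERTERS.get(cql_type)
        if value is None or converter is None:
            continue  # nulls and unknown types are not checked in this sketch
        try:
            converter(value)
        except (ValueError, TypeError):
            failures.append((field, column, cql_type, value))
    return failures
```

A record whose age field holds 'jack' while the mapped column is an int is reported, mirroring the parse error shown above.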
When adding or changing columns in a database table, ensure that the schema has fully propagated to all nodes before continuing.
With DSE, use nodetool describecluster to show the schema version reported by each node.
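Schema agreement is reported in the "Schema versions" section of nodetool describecluster output: when every node lists the same version, the change has propagated. A minimal Python sketch that parses that section follows, assuming the layout shown in the comment; the exact output can vary between versions, and the function names are illustrative.

```python
import re

# Matches lines such as "    65e78f0e-e81e-30d8-a631-a65dff93bf82: [127.0.0.1, 127.0.0.2]"
SCHEMA_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(describecluster_output: str) -> dict[str, list[str]]:
    """Map each reported schema version to the node addresses that report it."""
    versions = {}
    in_section = False
    for line in describecluster_output.splitlines():
        if line.strip().startswith("Schema versions:"):
            in_section = True
            continue
        if in_section:
            match = SCHEMA_LINE.match(line)
            if match:
                nodes = [n.strip() for n in match.group(2).split(",") if n.strip()]
                versions[match.group(1)] = nodes
            elif line.strip():
                break  # a non-empty, non-matching line ends the section
    return versions


def schema_agreed(describecluster_output: str) -> bool:
    """True when exactly one schema version is reported and no node is unreachable."""
    versions = schema_versions(describecluster_output)
    return len(versions) == 1 and "UNREACHABLE" not in versions
```

If schema_agreed returns False, wait and run nodetool describecluster again before restarting or reconfiguring the sink.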