Kafka
The Kafka source connector pulls data from a Kafka topic and persists the data into an Apache Pulsar™ topic.
For more, see Apache Pulsar’s Kafka source documentation.
Get Started
Set the following environment variables for use with the pulsar-admin and curl commands on this page:
export TENANT=<replace-me>
export DESTINATION_TOPIC=<replace-me>
export NAMESPACE=default
export SOURCE_NAME=kafka-src
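Every command on this page reads these variables, so it can help to confirm they are set before running anything. The following is a minimal sketch, not part of the Astra Streaming or Pulsar tooling; `require_vars` is a hypothetical helper name.

```shell
# Hypothetical helper: fail early if any named environment variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check the variables defined above; print a hint instead of aborting the shell.
require_vars TENANT DESTINATION_TOPIC NAMESPACE SOURCE_NAME \
  || echo "Set the variables listed above before continuing." >&2
```

The function returns a non-zero status when anything is missing, so a script can stop with `require_vars ... || exit 1`.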
Pulsar Admin
Refer to the complete pulsar-admin sources spec for all available options.
Assuming you have downloaded client.conf to the Pulsar folder:
./bin/pulsar-admin sources create \
--source-type kafka \
--name "$SOURCE_NAME" \
--destination-topic-name "persistent://$TENANT/$NAMESPACE/$DESTINATION_TOPIC" \
--tenant "$TENANT" \
--source-config '{
"bootstrapServers": "asdasd",
"consumerConfigProperties": {
"sasl.jaas.config": "sensitive_data_removed",
"sasl.mechanism": "PLAIN",
"sasl.password": "sensitive_data_removed",
"sasl.username": "asdasd",
"security.protocol": "SASL_SSL"
},
"groupId": "asd",
"topic": "asdasd"
}'
curl
You need a Pulsar token for REST API authentication. This is different from your Astra DB application tokens.
- In the Astra Portal, click Streaming tenants.
- Click your tenant’s name, and then click the Settings tab.
- Click Create Token.
- Copy the token, store it securely, and then click Close.
- Click the Connect tab, and then copy the Web Service URL.
- Create environment variables for your tenant’s token and web service URL:
export WEB_SERVICE_URL=<replace-me>
export ASTRA_STREAMING_TOKEN=<replace-me>
Refer to the complete Pulsar sources REST API spec for all available options.
curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/astrasources/$TENANT/$NAMESPACE/$SOURCE_NAME" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: $ASTRA_STREAMING_TOKEN" \
-d '{
"tenant": "'$TENANT'",
"topicName": "persistent://'$TENANT'/'$NAMESPACE'/'$DESTINATION_TOPIC'",
"name": "'$SOURCE_NAME'",
"namespace": "'$NAMESPACE'",
"archive": "builtin://kafka",
"parallelism": 1,
"processingGuarantees": "ATLEAST_ONCE",
"configs": {
"bootstrapServers": "asdasd",
"consumerConfigProperties": {
"sasl.jaas.config": "sensitive_data_removed",
"sasl.mechanism": "PLAIN",
"sasl.password": "sensitive_data_removed",
"sasl.username": "asdasd",
"security.protocol": "SASL_SSL"
},
"groupId": "asd",
"topic": "asdasd"
}
}'
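The request body above splices shell variables into a single-quoted JSON string, which is easy to get wrong. As an alternative sketch, the payload can be generated with a heredoc and sent from a file. `build_source_payload` and the `KAFKA_BOOTSTRAP_SERVERS`, `KAFKA_GROUP_ID`, and `KAFKA_TOPIC` variables are hypothetical names introduced here for illustration, and `consumerConfigProperties` is left out for brevity.

```shell
# Hypothetical helper: print the source definition as JSON on stdout.
# The unquoted heredoc expands shell variables, so no quote splicing is needed.
build_source_payload() {
  cat <<EOF
{
  "tenant": "$TENANT",
  "topicName": "persistent://$TENANT/$NAMESPACE/$DESTINATION_TOPIC",
  "name": "$SOURCE_NAME",
  "namespace": "$NAMESPACE",
  "archive": "builtin://kafka",
  "parallelism": 1,
  "processingGuarantees": "ATLEAST_ONCE",
  "configs": {
    "bootstrapServers": "$KAFKA_BOOTSTRAP_SERVERS",
    "groupId": "$KAFKA_GROUP_ID",
    "topic": "$KAFKA_TOPIC"
  }
}
EOF
}

# Example: write the payload to a file and send it with curl's -d @file form.
# build_source_payload > source.json
# curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/astrasources/$TENANT/$NAMESPACE/$SOURCE_NAME" \
#   -H "Content-Type: application/json" \
#   -H "Authorization: $ASTRA_STREAMING_TOKEN" \
#   -d @source.json
```

Keeping the payload in a file also makes it easy to review the exact JSON before it is sent.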
Sample Config Data
{
"bootstrapServers": "asdasd",
"consumerConfigProperties": {
"sasl.jaas.config": "sensitive_data_removed",
"sasl.mechanism": "PLAIN",
"sasl.password": "sensitive_data_removed",
"sasl.username": "asdasd",
"security.protocol": "SASL_SSL"
},
"groupId": "asd",
"topic": "asdasd"
}
Managing the Connector
Start
Pulsar Admin
Refer to the complete pulsar-admin sources spec for all available options.
Assuming you have downloaded client.conf to the Pulsar folder:
# Start all instances of a connector
./bin/pulsar-admin sources start \
--namespace "$NAMESPACE" \
--name "$SOURCE_NAME" \
--tenant "$TENANT"
# optionally add --instance-id to only start an individual instance
curl
The commands below use the WEB_SERVICE_URL and ASTRA_STREAMING_TOKEN environment variables created in Get Started.
Refer to the complete Pulsar sources REST API spec for all available options.
Start all instances of a connector:
curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/start" \
-H "Authorization: $ASTRA_STREAMING_TOKEN"
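Start an individual instance of a connector. SOURCE_INSTANCEID must be set to that instance's ID; the other per-instance commands on this page use the same variable: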
curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/start" \
-H "Authorization: $ASTRA_STREAMING_TOKEN"
Stop
Pulsar Admin
Refer to the complete pulsar-admin sources spec for all available options.
Assuming you have downloaded client.conf to the Pulsar folder:
# Stop all instances of a connector
./bin/pulsar-admin sources stop \
--namespace "$NAMESPACE" \
--name "$SOURCE_NAME" \
--tenant "$TENANT"
# optionally add --instance-id to only stop an individual instance
curl
The commands below use the WEB_SERVICE_URL and ASTRA_STREAMING_TOKEN environment variables created in Get Started.
Refer to the complete Pulsar sources REST API spec for all available options.
Stop all instances of a connector:
curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/stop" \
-H "Authorization: $ASTRA_STREAMING_TOKEN"
Stop an individual instance of a connector:
curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/stop" \
-H "Authorization: $ASTRA_STREAMING_TOKEN"
Restart
Pulsar Admin
Refer to the complete pulsar-admin sources spec for all available options.
Assuming you have downloaded client.conf to the Pulsar folder:
# Restart all instances of a connector
./bin/pulsar-admin sources restart \
--namespace "$NAMESPACE" \
--name "$SOURCE_NAME" \
--tenant "$TENANT"
# optionally add --instance-id to only restart an individual instance
curl
The commands below use the WEB_SERVICE_URL and ASTRA_STREAMING_TOKEN environment variables created in Get Started.
Refer to the complete Pulsar sources REST API spec for all available options.
# Restart all instances of a connector
curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/restart" \
-H "Authorization: $ASTRA_STREAMING_TOKEN"
# Restart an individual instance of a connector
curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/restart" \
-H "Authorization: $ASTRA_STREAMING_TOKEN"
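The start, stop, and restart endpoints shown above share one URL pattern: the action name is appended to the source path, optionally preceded by an instance ID. A small sketch of that pattern follows; `source_action_url` is a hypothetical helper, not part of any Pulsar or Astra tooling.

```shell
# Hypothetical helper: print the REST URL for a lifecycle action.
# Usage: source_action_url <start|stop|restart> [instance-id]
source_action_url() {
  action=$1
  instance_id=${2:-}
  base="$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME"
  # Reject anything other than the three actions documented above.
  case "$action" in
    start|stop|restart) ;;
    *) echo "Unsupported action: $action" >&2; return 1 ;;
  esac
  if [ -n "$instance_id" ]; then
    echo "$base/$instance_id/$action"
  else
    echo "$base/$action"
  fi
}

# Example: restart instance 0 only.
# curl -sS --fail -X POST "$(source_action_url restart 0)" \
#   -H "Authorization: $ASTRA_STREAMING_TOKEN"
```

Omitting the second argument produces the URL that acts on all instances of the connector.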
Update
Pulsar Admin
Refer to the complete pulsar-admin sources spec for all available options.
Assuming you have downloaded client.conf to the Pulsar folder:
./bin/pulsar-admin sources update \
--source-type kafka \
--name "$SOURCE_NAME" \
--destination-topic-name "persistent://$TENANT/$NAMESPACE/$DESTINATION_TOPIC" \
--tenant "$TENANT" \
--parallelism 2 \
--source-config '{}'
curl
The command below uses the WEB_SERVICE_URL and ASTRA_STREAMING_TOKEN environment variables created in Get Started.
Refer to the complete Pulsar sources REST API spec for all available options.
curl -sS --fail -X PUT "$WEB_SERVICE_URL/admin/v3/astrasources/$TENANT/$NAMESPACE/$SOURCE_NAME" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: $ASTRA_STREAMING_TOKEN" \
-d '{
"tenant": "'$TENANT'",
"topicName": "persistent://'$TENANT'/'$NAMESPACE'/'$DESTINATION_TOPIC'",
"name": "'$SOURCE_NAME'",
"namespace": "'$NAMESPACE'",
"archive": "builtin://kafka",
"parallelism": 1,
"processingGuarantees": "ATLEAST_ONCE",
"configs": {
"bootstrapServers": "asdasd",
"consumerConfigProperties": {
"sasl.jaas.config": "sensitive_data_removed",
"sasl.mechanism": "PLAIN",
"sasl.password": "sensitive_data_removed",
"sasl.username": "asdasd",
"security.protocol": "SASL_SSL"
},
"groupId": "asd",
"topic": "asdasd"
}
}'
Result
{
"tenant": "string",
"namespace": "string",
"name": "string",
"className": "string",
"topicName": "string",
"producerConfig": {
"maxPendingMessages": 0,
"maxPendingMessagesAcrossPartitions": 0,
"useThreadLocalProducers": true,
"cryptoConfig": {
"cryptoKeyReaderClassName": "string",
"cryptoKeyReaderConfig": {
"property1": {},
"property2": {}
},
"encryptionKeys": [
"string"
],
"producerCryptoFailureAction": "FAIL",
"consumerCryptoFailureAction": "FAIL"
},
"batchBuilder": "string"
},
"serdeClassName": "string",
"schemaType": "string",
"configs": {
"property1": {},
"property2": {}
},
"secrets": {
"property1": {},
"property2": {}
},
"parallelism": 0,
"processingGuarantees": "ATLEAST_ONCE",
"resources": {
"cpu": 0,
"ram": 0,
"disk": 0
},
"archive": "string",
"runtimeFlags": "string",
"customRuntimeOptions": "string",
"batchSourceConfig": {
"discoveryTriggererClassName": "string",
"discoveryTriggererConfig": {
"property1": {},
"property2": {}
}
},
"batchBuilder": "string"
}
Delete
Pulsar Admin
Refer to the complete pulsar-admin sources spec for all available options.
Assuming you have downloaded client.conf to the Pulsar folder:
# Delete all instances of a connector
./bin/pulsar-admin sources delete \
--namespace "$NAMESPACE" \
--name "$SOURCE_NAME" \
--tenant "$TENANT"
curl
The command below uses the WEB_SERVICE_URL and ASTRA_STREAMING_TOKEN environment variables created in Get Started.
Refer to the complete Pulsar sources REST API spec for all available options.
# Delete all instances of a connector
curl -sS --fail -X DELETE "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME" \
-H "Authorization: $ASTRA_STREAMING_TOKEN"
Monitoring the Connector
Info
Pulsar Admin
Refer to the complete pulsar-admin sources spec for all available options.
Assuming you have downloaded client.conf to the Pulsar folder:
# Get information about connector
./bin/pulsar-admin sources get \
--namespace "$NAMESPACE" \
--name "$SOURCE_NAME" \
--tenant "$TENANT"
curl
Use the WEB_SERVICE_URL and ASTRA_STREAMING_TOKEN environment variables created in Get Started to form curl commands to the REST API, for example:
# Get a connector's information
curl -sS --fail "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME" \
-H "accept: application/json" \
-H "Authorization: $ASTRA_STREAMING_TOKEN"
Sample Config Data
{
"bootstrapServers": "asdasd",
"consumerConfigProperties": {
"sasl.jaas.config": "sensitive_data_removed",
"sasl.mechanism": "PLAIN",
"sasl.password": "sensitive_data_removed",
"sasl.username": "asdasd",
"security.protocol": "SASL_SSL"
},
"groupId": "asd",
"topic": "asdasd"
}
Health
Pulsar Admin
Refer to the complete pulsar-admin sources spec for all available options.
Assuming you have downloaded client.conf to the Pulsar folder:
# Check connector status
./bin/pulsar-admin sources status \
--instance-id "$SOURCE_INSTANCEID" \
--namespace "$NAMESPACE" \
--name "$SOURCE_NAME" \
--tenant "$TENANT"
curl
Use the WEB_SERVICE_URL and ASTRA_STREAMING_TOKEN environment variables created in Get Started to form curl commands to the REST API, for example:
# Get the status of all connector instances
curl -sS --fail "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/status" \
-H "accept: application/json" \
-H "Authorization: $ASTRA_STREAMING_TOKEN"
# Get the status of an individual connector instance
curl "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/status" \
-H "accept: application/json" \
-H "Authorization: $ASTRA_STREAMING_TOKEN"
Result
Status response for all connector instances:
{
"numInstances": 0,
"numRunning": 0,
"instances": [
{
"instanceId": 0,
"status": {
"running": true,
"error": "string",
"numRestarts": 0,
"numReceivedFromSource": 0,
"numSystemExceptions": 0,
"latestSystemExceptions": [
{
"exceptionString": "string",
"timestampMs": 0
}
],
"numSourceExceptions": 0,
"latestSourceExceptions": [
{
"exceptionString": "string",
"timestampMs": 0
}
],
"numWritten": 0,
"lastReceivedTime": 0,
"workerId": "string"
}
}
]
}
Status response for individual connector instance:
{
"running": true,
"error": "string",
"numRestarts": 0,
"numReceivedFromSource": 0,
"numSystemExceptions": 0,
"latestSystemExceptions": [
{
"exceptionString": "string",
"timestampMs": 0
}
],
"numSourceExceptions": 0,
"latestSourceExceptions": [
{
"exceptionString": "string",
"timestampMs": 0
}
],
"numWritten": 0,
"lastReceivedTime": 0,
"workerId": "string"
}
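The all-instances response above reports `numInstances` and `numRunning`, which is enough for a simple health check. The sketch below pulls those two numbers out with `sed`. `json_number` and `all_instances_running` are hypothetical helper names, and the pattern match assumes each key appears once in the response, as it does in the sample above. A JSON-aware tool would be more robust if one is available.

```shell
# Hypothetical helper: read JSON on stdin and print the first integer value of a key.
json_number() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only when at least one instance exists
# and every instance is reported as running.
all_instances_running() {
  status_json=$1
  total=$(printf '%s\n' "$status_json" | json_number numInstances)
  running=$(printf '%s\n' "$status_json" | json_number numRunning)
  [ -n "$total" ] && [ "$total" -gt 0 ] && [ "$total" = "$running" ]
}

# Example: check the live status response.
# status=$(curl -sS --fail "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/status" \
#   -H "accept: application/json" \
#   -H "Authorization: $ASTRA_STREAMING_TOKEN")
# all_instances_running "$status" && echo "healthy" || echo "degraded"
```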
Metrics
Astra Streaming exposes Prometheus-formatted metrics for every connector. Refer to the Scrape metrics with Prometheus page for more detail.
Connector Reference
There are two sets of parameters that support source connectors.
Astra Streaming
Name | Required | Default | Description |
---|---|---|---|
archive | true | | The connector type, like 'builtin://debezium-mysql' |
batchBuilder | false | | BatchBuilder provides two types of batch construction methods, DEFAULT and KEY_BASED. The default value is: DEFAULT |
batchSourceConfig | false | | Batch source config key/value (as a JSON string) |
className | true | | The connector type’s class reference, like 'org.apache.pulsar.io.debezium.mysql.DebeziumMysqlSource' |
configs | false | {} | JSON key/value config of source type specific settings. Example: {"property1":"1234","property2":{"subProperty":"asdf"}} |
customRuntimeOptions | false | | A string that encodes options to customize the runtime. See the Apache Pulsar docs for the configured runtime for details |
name | true | | A name for your source, for later reference. The name must start with a lowercase alphabetic character and can contain only lowercase alphanumeric characters and hyphens (kebab-case). |
namespace | true | | The namespace you’d like the source created under |
parallelism | true | 1 | The number of Pulsar source instances to run |
processingGuarantees | true | ATLEAST_ONCE | The delivery semantics applied to the Pulsar source. Values are 'ATLEAST_ONCE', 'ATMOST_ONCE', 'EFFECTIVELY_ONCE' |
producerConfig | false | | The custom producer configuration (as a JSON string) |
resources | false | | The compute resources that need to be allocated per source instance (applicable only to the process) (as a JSON string). Example: {"cpu": 0.25,"disk":1000000000,"ram":500000000} |
runtimeFlags | false | | A string that encodes options to customize the runtime. See the Apache Pulsar docs for the configured runtime for details |
schemaType | false | | The schema type (either a builtin schema like 'avro' or 'json', or a custom Schema class name) used to encode messages emitted from the Pulsar source |
secrets | false | | A map of secretName (how the secret is accessed in the function via context) to an object that encapsulates how the secret is fetched by the underlying secrets provider. The type of a value here can be found by the SecretProviderConfigurator.getSecretObjectType() method |
serdeClassName | false | | The SerDe classname for the Pulsar source |
tenant | true | | The tenant you’d like the source created under |
topicName | true | | The name of an existing topic in Astra Streaming where messages will be published. Should be in the format [non-]persistent://<tenant>/<namespace>/<topic-name> |
Kafka configuration options
These values are provided in the configs area:
Name | Type | Required | Default | Description |
---|---|---|---|---|
bootstrapServers | String | true | " " (empty string) | A comma-separated list of host and port pairs for establishing the initial connection to the Kafka cluster. |
groupId | String | true | " " (empty string) | A unique string that identifies the group of consumer processes to which this consumer belongs. |
fetchMinBytes | long | false | 1 | The minimum number of bytes expected for each fetch response. |
autoCommitEnabled | boolean | false | true | If set to true, the consumer’s offset is periodically committed in the background. |
autoCommitIntervalMs | long | false | 5000 | The frequency in milliseconds that the consumer offsets are auto-committed to Kafka if autoCommitEnabled is set to true. |
heartbeatIntervalMs | long | false | 3000 | The interval between heartbeats to the consumer when using Kafka’s group management facilities. |
sessionTimeoutMs | long | false | 30000 | The timeout used to detect consumer failures when using Kafka’s group management facility. |
topic | String | true | " " (empty string) | The Kafka topic that sends messages to Pulsar. |
consumerConfigProperties | Map | false | " " (empty string) | The consumer configuration properties to be passed to consumers. |
keyDeserializationClass | String | false | org.apache.kafka.common.serialization.StringDeserializer | The deserializer class for Kafka consumers to deserialize keys. |
valueDeserializationClass | String | false | org.apache.kafka.common.serialization.ByteArrayDeserializer | The deserializer class for Kafka consumers to deserialize values. |
autoOffsetReset | String | false | earliest | The default offset reset policy. |
The Astra Streaming Kafka source connector supports all configuration properties provided by Apache Pulsar. For a complete list, see the Kafka source connector properties.
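As an illustration of how optional properties sit alongside the required ones in the configs object, the sketch below writes a config file that overrides three defaults. The property names `autoOffsetReset`, `sessionTimeoutMs`, and `fetchMinBytes` are taken from Apache Pulsar's Kafka source connector documentation. The file name and the override values are arbitrary choices for this example, not recommendations.

```shell
# Write a configs object that sets the required properties and overrides a few optional ones.
# The quoted heredoc delimiter keeps the content literal (no variable expansion).
cat > kafka-configs.json <<'EOF'
{
  "bootstrapServers": "<replace-me>",
  "groupId": "<replace-me>",
  "topic": "<replace-me>",
  "autoOffsetReset": "latest",
  "sessionTimeoutMs": 45000,
  "fetchMinBytes": 1024
}
EOF

# Example: pass the file contents as the --source-config value.
# ./bin/pulsar-admin sources create \
#   --source-type kafka \
#   --name "$SOURCE_NAME" \
#   --destination-topic-name "persistent://$TENANT/$NAMESPACE/$DESTINATION_TOPIC" \
#   --tenant "$TENANT" \
#   --source-config "$(cat kafka-configs.json)"
```

Any property left out of the file keeps the default listed in the table.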