Data Generator

The Data Generator source connector publishes fake data to an Apache Pulsar topic. It uses the jfairy library to generate messages containing "person" data.

"Person" data includes first and last name, home address, and email address, among other details.

The connector will produce data indefinitely while it is running.

Astra Streaming currently supports Apache Pulsar 2.10, which uses version 0.5.9 of the jfairy library.

For a reference of the full "Person" class, view the source.

Get Started

Set the required variables, then create the connector using any of the methods below.

export TENANT=<replace-me>
export DESTINATION_TOPIC=<replace-me>
export NAMESPACE=default
export SOURCE_NAME=data-gen-src
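
Every later command depends on these variables, so it can help to confirm they are set before creating the connector. The following is a minimal sketch in POSIX shell. The `require_vars` helper is a hypothetical name introduced here for illustration and is not part of Pulsar or Astra Streaming.

```shell
# Hypothetical helper: fail if any named variable is empty, unset,
# or still holds the <replace-me> placeholder from this page.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the variable by name; names come from the caller, not user input.
    eval "value=\${$name:-}"
    case "$value" in
      ''|*'<replace-me>'*)
        echo "Variable $name is not set" >&2
        missing=1
        ;;
    esac
  done
  return "$missing"
}

# Example call (run it after the exports above):
#   require_vars TENANT DESTINATION_TOPIC NAMESPACE SOURCE_NAME || exit 1
```

The function only reports problems and returns a non-zero status, so the caller decides whether to stop.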
  • Pulsar Admin

  • cURL

  • Sample Config Data

Refer to the complete pulsar-admin sources spec for all available options.

Assuming you have downloaded client.conf to the Pulsar folder:

./bin/pulsar-admin sources create \
  --source-type data-generator \
  --name "$SOURCE_NAME" \
  --destination-topic-name "persistent://$TENANT/$NAMESPACE/$DESTINATION_TOPIC" \
  --namespace "$NAMESPACE" \
  --tenant "$TENANT" \
  --source-config '{
    "sleepBetweenMessages": "50"
    }'

You’ll need to create an Astra Streaming API token to be used with the REST API. This is different from your Astra tokens.

Navigate to the "Settings" area in the Astra Streaming UI and choose "Create Token".

Retrieve the web service URL from the "Connect" tab in the Astra Streaming UI.

Refer to the complete Pulsar sources REST API spec for all available options.

export WEB_SERVICE_URL=<replace-me>
export ASTRA_STREAMING_TOKEN=<replace-me>
curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: $ASTRA_STREAMING_TOKEN" \
  -d '{
        "topicName": "persistent://'$TENANT'/'$NAMESPACE'/'$DESTINATION_TOPIC'",
        "archive": "builtin://data-generator",
        "configs": {
          "sleepBetweenMessages": "50"
        }
      }'

Sample Config Data

{
  "topicName": "string",
  "producerConfig": {
    "maxPendingMessages": 0,
    "maxPendingMessagesAcrossPartitions": 0,
    "useThreadLocalProducers": true,
    "cryptoConfig": {
      "cryptoKeyReaderClassName": "string",
      "cryptoKeyReaderConfig": {
        "property1": {},
        "property2": {}
      },
      "encryptionKeys": [
        "string"
      ],
      "producerCryptoFailureAction": "FAIL",
      "consumerCryptoFailureAction": "FAIL"
    },
    "batchBuilder": "string"
  },
  "serdeClassName": "string",
  "schemaType": "string",
  "configs": {
    "property1": {},
    "property2": {}
  },
  "secrets": {
    "property1": {},
    "property2": {}
  },
  "parallelism": 0,
  "processingGuarantees": "ATLEAST_ONCE",
  "resources": {
    "cpu": 0,
    "ram": 0,
    "disk": 0
  },
  "archive": "string",
  "runtimeFlags": "string",
  "customRuntimeOptions": "string",
  "batchSourceConfig": {
    "discoveryTriggererClassName": "string",
    "discoveryTriggererConfig": {
      "property1": {},
      "property2": {}
    }
  },
  "batchBuilder": "string"
}
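
The cURL request body above splices shell variables into a single-quoted JSON string, which is easy to get wrong when editing. As an alternative sketch, the body can be assembled with `printf`. The `build_source_payload` function is a hypothetical helper, and it assumes the values contain no characters that need JSON escaping (such as double quotes or backslashes).

```shell
# Hypothetical helper: print the create-request body for the data generator.
#   $1 = tenant, $2 = namespace, $3 = topic, $4 = sleepBetweenMessages
# Assumption: none of the arguments need JSON escaping.
build_source_payload() {
  printf '{"topicName":"persistent://%s/%s/%s","archive":"builtin://data-generator","configs":{"sleepBetweenMessages":"%s"}}\n' \
    "$1" "$2" "$3" "$4"
}

# Example call, reusing the variables and headers from the cURL example:
#   curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME" \
#     -H "Content-Type: application/json" \
#     -H "Authorization: $ASTRA_STREAMING_TOKEN" \
#     -d "$(build_source_payload "$TENANT" "$NAMESPACE" "$DESTINATION_TOPIC" 50)"
```

Only the fields used in the create example are emitted. Any other field from the sample config would need to be added to the format string.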

Managing the Connector

Start

  • Pulsar Admin

  • cURL

Refer to the complete pulsar-admin sources spec for all available options.

Assuming you have downloaded client.conf to the Pulsar folder:

# Start all instances of a connector
./bin/pulsar-admin sources start \
  --namespace "$NAMESPACE" \
  --name "$SOURCE_NAME" \
  --tenant "$TENANT"

# optionally add --instance-id to only start an individual instance

You’ll need to create an Astra Streaming API token to be used with the REST API. This is different from your Astra tokens.

Navigate to the "Settings" area in the Astra Streaming UI and choose "Create Token".

Retrieve the web service URL from the "Connect" tab in the Astra Streaming UI.

Refer to the complete Pulsar sources REST API spec for all available options.

export WEB_SERVICE_URL=<replace-me>
export ASTRA_STREAMING_TOKEN=<replace-me>
# Start all instances of a connector
curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/start" \
  -H "Authorization: $ASTRA_STREAMING_TOKEN"

# Start an individual instance of a connector (set SOURCE_INSTANCEID to the instance's ID first)
curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/start" \
  -H "Authorization: $ASTRA_STREAMING_TOKEN"

Stop

  • Pulsar Admin

  • cURL

Refer to the complete pulsar-admin sources spec for all available options.

Assuming you have downloaded client.conf to the Pulsar folder:

# Stop all instances of a connector
./bin/pulsar-admin sources stop \
  --namespace "$NAMESPACE" \
  --name "$SOURCE_NAME" \
  --tenant "$TENANT"

# optionally add --instance-id to only stop an individual instance

You’ll need to create an Astra Streaming API token to be used with the REST API. This is different from your Astra tokens.

Navigate to the "Settings" area in the Astra Streaming UI and choose "Create Token".

Retrieve the web service URL from the "Connect" tab in the Astra Streaming UI.

Refer to the complete Pulsar sources REST API spec for all available options.

export WEB_SERVICE_URL=<replace-me>
export ASTRA_STREAMING_TOKEN=<replace-me>
# Stop all instances of a connector
curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/stop" \
  -H "Authorization: $ASTRA_STREAMING_TOKEN"

# Stop an individual instance of a connector
#curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/stop" \
#  -H "Authorization: $ASTRA_STREAMING_TOKEN"

Restart

  • Pulsar Admin

  • cURL

Refer to the complete pulsar-admin sources spec for all available options.

Assuming you have downloaded client.conf to the Pulsar folder:

# Restart all instances of a connector
./bin/pulsar-admin sources restart \
  --namespace "$NAMESPACE" \
  --name "$SOURCE_NAME" \
  --tenant "$TENANT"

# optionally add --instance-id to only restart an individual instance

You’ll need to create an Astra Streaming API token to be used with the REST API. This is different from your Astra tokens.

Navigate to the "Settings" area in the Astra Streaming UI and choose "Create Token".

Retrieve the web service URL from the "Connect" tab in the Astra Streaming UI.

Refer to the complete Pulsar sources REST API spec for all available options.

export WEB_SERVICE_URL=<replace-me>
export ASTRA_STREAMING_TOKEN=<replace-me>
# Restart all instances of a connector
curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/restart" \
  -H "Authorization: $ASTRA_STREAMING_TOKEN"

# Restart an individual instance of a connector
#curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/restart" \
#  -H "Authorization: $ASTRA_STREAMING_TOKEN"
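
The start, stop, and restart requests shown above share one URL pattern: the action is the last path segment, optionally preceded by an instance ID. A small sketch of that pattern follows. `source_action_url` is a hypothetical helper name, and it relies on the `WEB_SERVICE_URL`, `TENANT`, `NAMESPACE`, and `SOURCE_NAME` variables already exported on this page.

```shell
# Hypothetical helper: print the REST URL for a lifecycle action.
#   $1 = action (start, stop, or restart)
#   $2 = optional instance ID; omit it to target all instances
source_action_url() {
  base="$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME"
  case "$1" in
    start|stop|restart) ;;
    *) echo "unsupported action: $1" >&2; return 1 ;;
  esac
  if [ -n "${2:-}" ]; then
    printf '%s/%s/%s\n' "$base" "$2" "$1"
  else
    printf '%s/%s\n' "$base" "$1"
  fi
}

# Example call:
#   curl -sS --fail -X POST "$(source_action_url restart)" \
#     -H "Authorization: $ASTRA_STREAMING_TOKEN"
```

Rejecting unknown actions keeps a typo from being sent to the API as a malformed path.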

Update

  • Pulsar Admin

  • cURL

  • Response

Refer to the complete pulsar-admin sources spec for all available options.

Assuming you have downloaded client.conf to the Pulsar folder:

./bin/pulsar-admin sources update \
  --source-type data-generator \
  --name "$SOURCE_NAME" \
  --destination-topic-name "persistent://$TENANT/$NAMESPACE/$DESTINATION_TOPIC" \
  --namespace "$NAMESPACE" \
  --tenant "$TENANT" \
  --source-config '{
    "sleepBetweenMessages": "100"
    }'

You’ll need to create an Astra Streaming API token to be used with the REST API. This is different from your Astra tokens.

Navigate to the "Settings" area in the Astra Streaming UI and choose "Create Token".

Retrieve the web service URL from the "Connect" tab in the Astra Streaming UI.

Refer to the complete Pulsar sources REST API spec for all available options.

export WEB_SERVICE_URL=<replace-me>
export ASTRA_STREAMING_TOKEN=<replace-me>
# Update all instances of a connector currently running
curl -sS --fail -X PUT "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: $ASTRA_STREAMING_TOKEN" \
  -d '{
        "topicName": "persistent://'$TENANT'/'$NAMESPACE'/'$DESTINATION_TOPIC'",
        "configs": {
          "sleepBetweenMessages": "100"
        }
      }'

# REQUEST DATA SAMPLE
# {
#   "topicName": "string",
#   "producerConfig": {
#     "maxPendingMessages": 0,
#     "maxPendingMessagesAcrossPartitions": 0,
#     "useThreadLocalProducers": true,
#     "cryptoConfig": {
#       "cryptoKeyReaderClassName": "string",
#       "cryptoKeyReaderConfig": {
#         "property1": {},
#         "property2": {}
#       },
#       "encryptionKeys": [
#         "string"
#       ],
#       "producerCryptoFailureAction": "FAIL",
#       "consumerCryptoFailureAction": "FAIL"
#     },
#     "batchBuilder": "string"
#   },
#   "serdeClassName": "string",
#   "schemaType": "string",
#   "configs": {
#     "property1": {},
#     "property2": {}
#   },
#   "secrets": {
#     "property1": {},
#     "property2": {}
#   },
#   "parallelism": 0,
#   "processingGuarantees": "ATLEAST_ONCE",
#   "resources": {
#     "cpu": 0,
#     "ram": 0,
#     "disk": 0
#   },
#   "archive": "string",
#   "runtimeFlags": "string",
#   "customRuntimeOptions": "string",
#   "batchSourceConfig": {
#     "discoveryTriggererClassName": "string",
#     "discoveryTriggererConfig": {
#       "property1": {},
#       "property2": {}
#     }
#   },
#   "batchBuilder": "string"
# }

Response

{
   "tenant": "string",
   "namespace": "string",
   "name": "string",
   "className": "string",
   "topicName": "string",
   "producerConfig": {
     "maxPendingMessages": 0,
     "maxPendingMessagesAcrossPartitions": 0,
     "useThreadLocalProducers": true,
     "cryptoConfig": {
       "cryptoKeyReaderClassName": "string",
       "cryptoKeyReaderConfig": {
         "property1": {},
         "property2": {}
       },
       "encryptionKeys": [
         "string"
       ],
       "producerCryptoFailureAction": "FAIL",
       "consumerCryptoFailureAction": "FAIL"
     },
     "batchBuilder": "string"
   },
   "serdeClassName": "string",
   "schemaType": "string",
   "configs": {
     "property1": {},
     "property2": {}
   },
   "secrets": {
     "property1": {},
     "property2": {}
   },
   "parallelism": 0,
   "processingGuarantees": "ATLEAST_ONCE",
   "resources": {
     "cpu": 0,
     "ram": 0,
     "disk": 0
   },
   "archive": "string",
   "runtimeFlags": "string",
   "customRuntimeOptions": "string",
   "batchSourceConfig": {
     "discoveryTriggererClassName": "string",
     "discoveryTriggererConfig": {
       "property1": {},
       "property2": {}
     }
   },
   "batchBuilder": "string"
}

Delete

  • Pulsar Admin

  • cURL

Refer to the complete pulsar-admin sources spec for all available options.

Assuming you have downloaded client.conf to the Pulsar folder:

# Delete all instances of a connector
./bin/pulsar-admin sources delete \
  --namespace "$NAMESPACE" \
  --name "$SOURCE_NAME" \
  --tenant "$TENANT"

You’ll need to create an Astra Streaming API token to be used with the REST API. This is different from your Astra tokens.

Navigate to the "Settings" area in the Astra Streaming UI and choose "Create Token".

Retrieve the web service URL from the "Connect" tab in the Astra Streaming UI.

Refer to the complete Pulsar sources REST API spec for all available options.

export WEB_SERVICE_URL=<replace-me>
export ASTRA_STREAMING_TOKEN=<replace-me>
# Delete all instances of a connector
curl -sS --fail -X DELETE "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME" \
  -H "Authorization: $ASTRA_STREAMING_TOKEN"

Monitoring the Connector

Info

  • Pulsar Admin

  • cURL

  • Sample Config Data

Assuming you have downloaded client.conf to the Pulsar folder:

# Get information about connector
./bin/pulsar-admin sources get \
  --namespace "$NAMESPACE" \
  --name "$SOURCE_NAME" \
  --tenant "$TENANT"

You’ll need to create an Astra Streaming API token to be used with the REST API. This is different from your Astra tokens.

Navigate to the "Settings" area in the Astra Streaming UI and choose "Create Token".

Retrieve the web service URL from the "Connect" tab in the Astra Streaming UI.

export WEB_SERVICE_URL=<replace-me>
export ASTRA_STREAMING_TOKEN=<replace-me>
# Get a connector's information
curl -sS --fail "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME" \
  -H "accept: application/json" \
  -H "Authorization: $ASTRA_STREAMING_TOKEN"

Sample Config Data

{
  "topicName": "string",
  "producerConfig": {
    "maxPendingMessages": 0,
    "maxPendingMessagesAcrossPartitions": 0,
    "useThreadLocalProducers": true,
    "cryptoConfig": {
      "cryptoKeyReaderClassName": "string",
      "cryptoKeyReaderConfig": {
        "property1": {},
        "property2": {}
      },
      "encryptionKeys": [
        "string"
      ],
      "producerCryptoFailureAction": "FAIL",
      "consumerCryptoFailureAction": "FAIL"
    },
    "batchBuilder": "string"
  },
  "serdeClassName": "string",
  "schemaType": "string",
  "configs": {
    "property1": {},
    "property2": {}
  },
  "secrets": {
    "property1": {},
    "property2": {}
  },
  "parallelism": 0,
  "processingGuarantees": "ATLEAST_ONCE",
  "resources": {
    "cpu": 0,
    "ram": 0,
    "disk": 0
  },
  "archive": "string",
  "runtimeFlags": "string",
  "customRuntimeOptions": "string",
  "batchSourceConfig": {
    "discoveryTriggererClassName": "string",
    "discoveryTriggererConfig": {
      "property1": {},
      "property2": {}
    }
  },
  "batchBuilder": "string"
}

Health

  • Pulsar Admin

  • cURL

  • Response

Assuming you have downloaded client.conf to the Pulsar folder:

# Get the status of an individual connector instance (set SOURCE_INSTANCEID to the instance's ID first)
./bin/pulsar-admin sources status \
  --instance-id "$SOURCE_INSTANCEID" \
  --namespace "$NAMESPACE" \
  --name "$SOURCE_NAME" \
  --tenant "$TENANT"

You’ll need to create an Astra Streaming API token to be used with the REST API. This is different from your Astra tokens.

Navigate to the "Settings" area in the Astra Streaming UI and choose "Create Token".

Retrieve the web service URL from the "Connect" tab in the Astra Streaming UI.

export WEB_SERVICE_URL=<replace-me>
export ASTRA_STREAMING_TOKEN=<replace-me>
# Get the status of all connector instances
curl -sS --fail "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/status" \
  -H "accept: application/json" \
  -H "Authorization: $ASTRA_STREAMING_TOKEN"

# Get the status of an individual connector instance
curl -sS --fail "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/status" \
  -H "accept: application/json" \
  -H "Authorization: $ASTRA_STREAMING_TOKEN"

Status response for all connector instances

{
  "numInstances": 0,
  "numRunning": 0,
  "instances": [
    {
      "instanceId": 0,
      "status": {
        "running": true,
        "error": "string",
        "numRestarts": 0,
        "numReceivedFromSource": 0,
        "numSystemExceptions": 0,
        "latestSystemExceptions": [
          {
            "exceptionString": "string",
            "timestampMs": 0
          }
        ],
        "numSourceExceptions": 0,
        "latestSourceExceptions": [
          {
            "exceptionString": "string",
            "timestampMs": 0
          }
        ],
        "numWritten": 0,
        "lastReceivedTime": 0,
        "workerId": "string"
      }
    }
  ]
}

Status response for individual connector instance

{
  "running": true,
  "error": "string",
  "numRestarts": 0,
  "numReceivedFromSource": 0,
  "numSystemExceptions": 0,
  "latestSystemExceptions": [
    {
      "exceptionString": "string",
      "timestampMs": 0
    }
  ],
  "numSourceExceptions": 0,
  "latestSourceExceptions": [
    {
      "exceptionString": "string",
      "timestampMs": 0
    }
  ],
  "numWritten": 0,
  "lastReceivedTime": 0,
  "workerId": "string"
}
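
The all-instances status response above reports `numInstances` and `numRunning`, so a simple health check can compare the two. The sketch below does this with `sed` so that it needs no extra tools. `json_number` and `all_instances_running` are hypothetical helper names, and the text matching assumes each field name appears once at the top level, as in the sample response. A real JSON parser would be more robust.

```shell
# Hypothetical helper: print the first numeric value of a JSON field.
#   $1 = field name; JSON is read from stdin
json_number() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if at least one instance exists
# and numRunning equals numInstances. Reads the status JSON from stdin.
all_instances_running() {
  status_json=$(cat)
  total=$(printf '%s\n' "$status_json" | json_number numInstances)
  running=$(printf '%s\n' "$status_json" | json_number numRunning)
  [ -n "$total" ] && [ "$total" -gt 0 ] && [ "$total" = "$running" ]
}

# Example call, piping in the all-instances status request:
#   curl -sS --fail "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/status" \
#     -H "Authorization: $ASTRA_STREAMING_TOKEN" | all_instances_running && echo "healthy"
```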

Metrics

Astra Streaming exposes Prometheus-formatted metrics for every connector. Refer to the Scrape metrics with Prometheus page for more detail.

Connector Reference

There are two sets of parameters that support source connectors.

Astra Streaming

| Name | Required | Default | Description |
|---|---|---|---|
| archive | true | | The connector type, like 'builtin://debezium-mysql' |
| batchBuilder | false | | BatchBuilder provides two types of batch construction methods, DEFAULT and KEY_BASED. The default value is DEFAULT. |
| batchSourceConfig | false | | Batch source config key/value (as a JSON string) |
| className | true | | The connector type's class reference, like 'org.apache.pulsar.io.debezium.mysql.DebeziumMysqlSource' |
| configs | false | {} | JSON key/value config of source-type-specific settings. Example: {"property1":"1234","property2":{"subProperty":"asdf"}} |
| customRuntimeOptions | false | | A string that encodes options to customize the runtime. See the Apache Pulsar docs for the configured runtime for details. |
| name | true | | Give your source a good name for later reference. The name must start with a lowercase alphabetic character. It can only contain lowercase alphanumeric characters and hyphens (kebab-case). |
| namespace | true | | The namespace you'd like the source created under |
| parallelism | true | 1 | The number of Pulsar source instances to run |
| processingGuarantees | true | ATLEAST_ONCE | The delivery semantics applied to the Pulsar source. Values are 'ATLEAST_ONCE', 'ATMOST_ONCE', 'EFFECTIVELY_ONCE' |
| producerConfig | false | | The custom producer configuration (as a JSON string) |
| resources | false | | The compute resources to allocate per source instance (applicable only to the process), as a JSON string. Example: {"cpu": 0.25,"disk":1000000000,"ram":500000000} |
| runtimeFlags | false | | A string that encodes options to customize the runtime. See the Apache Pulsar docs for the configured runtime for details. |
| schemaType | false | | The schema type (either a builtin schema like 'avro', 'json', etc., or a custom Schema class name) to be used to encode messages emitted from the Pulsar source |
| secrets | false | | A map of secretName (how the secret is accessed in the function via context) to an object that encapsulates how the secret is fetched by the underlying secrets provider. The type of a value here can be found with the SecretProviderConfigurator.getSecretObjectType() method. |
| serdeClassName | false | | The SerDe class name for the Pulsar source |
| tenant | true | | The tenant you'd like the source created under |
| topicName | true | | The name of an existing topic in Astra Streaming where messages will be published. Should be in the format [non-]persistent://<tenant>/<namespace>/<topic-name> |
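
The `name` and `topicName` rules in this reference can be checked locally before calling the API. The sketch below uses POSIX `case` patterns. `valid_source_name` and `valid_topic_name` are hypothetical helpers, and the topic check is an assumption that only enforces the scheme plus exactly three non-empty path segments. The service may apply further rules.

```shell
# Use byte-wise character ranges so [a-z] does not match uppercase letters.
LC_ALL=C
export LC_ALL

# Hypothetical helper: the name must start with a lowercase letter and contain
# only lowercase letters, digits, and hyphens.
valid_source_name() {
  case "$1" in
    ''|[!a-z]*) return 1 ;;     # empty, or first character is not a-z
    *[!a-z0-9-]*) return 1 ;;   # contains a character outside a-z, 0-9, hyphen
    *) return 0 ;;
  esac
}

# Hypothetical helper: expects [non-]persistent://<tenant>/<namespace>/<topic-name>.
valid_topic_name() {
  case "$1" in
    persistent://*) rest=${1#persistent://} ;;
    non-persistent://*) rest=${1#non-persistent://} ;;
    *) return 1 ;;
  esac
  case "$rest" in
    */*/*/*) return 1 ;;        # more than three segments
    /*|*//*|*/) return 1 ;;     # an empty segment
    */*/*) return 0 ;;
    *) return 1 ;;              # fewer than three segments
  esac
}

# Example call:
#   valid_source_name "$SOURCE_NAME" || echo "invalid source name" >&2
```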

Data Generator (configs)

These values are provided in the "configs" area.

| Name | Required | Default | Description |
|---|---|---|---|
| sleepBetweenMessages | false | 50 | How long to sleep between emitting messages, in milliseconds |


© 2024 DataStax | Privacy policy | Terms of use