Apache Pulsar Functions

Functions are lightweight compute processes that enable you to process each message received on a topic. You can apply custom logic to that message, transforming or enriching it, and then output it to a different topic.

Functions run inside Astra Streaming and are therefore serverless. You write the code for your function in Java, Python, or Go, then upload the code. It will be automatically run for each message published to the specified input topic.

Functions are implemented using Apache Pulsar® functions. See Pulsar Functions overview for more information about Pulsar functions.

Custom functions can only be created by qualified organizations. A qualified organization is an organization on the Pay As You Go plan with a payment method on file. Upgrade your organization to a qualified organization by:

Unqualified orgs can still use transform functions.

Deploy Python functions in a ZIP file

Astra Streaming supports Python-based Pulsar Functions. These functions can be packaged in a ZIP file and deployed to Astra Streaming or Pulsar. The same ZIP file can be deployed to either environment. We’ll create a ZIP file in the proper format, then use the pulsar-admin command to deploy the ZIP. We’ll pass a “create function" configuration file (a .yaml file) as a parameter to pulsar-admin, which defines the Pulsar Function options and parameters.

Assuming the ZIP file is named testpythonfunction.zip, an unzipped testpythonfunction.zip folder looks like this:

/my-python-function
   python-code/my-python-function.zip
   python-code/deps/sh-1.12.14-py2.py3-none-any.whl
   python-code/src/my-python-function.py
  1. To deploy a ZIP, first create the proper ZIP file directory structure. That file format/layout looks like this:

    mkdir my-python-function
    mkdir my-python-function/python-code
    mkdir my-python-function/python-code/deps/
    mkdir my-python-function/python-code/src/
    
    touch my-python-function/python-code/src/my-python-function.py
  2. Add your code to my-python-function.py. For this example, we’ll just use a basic exclamation function:

    from pulsar import Function
    
    class ExclamationFunction(Function):
      def __init__(self):
        pass
    
      def process(self, input, context):
        return input + '!'
  3. Add your dependencies to the /deps folder. For this example, we’ll use the pulsar-client library.

    cd deps
    pip install pulsar-client==2.10.0
  4. Run the following command to add my-pulsar-function.zip to the root of the file structure:

    cd deps
    zip -r ../my-python-function.zip .
      adding: sh-1.12.14-py2.py3-none-any.whl (deflated 2%)
  5. Ensure your package has the ZIP file at the root of the file structure:

    python-code ls -al
    deps
    my-python-function.zip
    src

Deploy a Python function with configuration file

  1. Create a deployment configuration file. In this example we’ll call this file “func-create-config.yaml”. This file will be passed to the pulsar-admin create function command.
    The contents of the YAML file should be:

    py: </absolute/path/to/my/testpythonfunction.zip>
    className: pythonfunc.ExclamationFunction
    parallelism: 1
    inputs:
     - persistent://mytenant/n0/t1
    output: persistent://mytenant/ns/t2
    autoAck: true
    tenant: mytenant
    namespace: ns0
    name: testpythonfunction
    logTopic:
    userConfig:
     logging_level: ERROR
  2. Use pulsar-admin to deploy the Python ZIP to Astra Streaming or Pulsar. The command below assumes you’ve properly configured the client.conf file for pulsar-admin commands against your Pulsar cluster. If you are using Astra Streaming see the documentation here for more information.

    bin/pulsar-admin functions create --function-config-file </absolute/path/to/your/python/func-create-config.yml>
  3. Check results: Go to the Astra Portal to see your newly deployed function listed under the “Functions” tab for your Tenant. See Controlling your function for more information on testing and monitoring your function in Astra Streaming.

    1. You can also use the pulsar-admin command to list your functions:

      bin/pulsar-admin functions list --tenant <mytenant> --namespace <my-namespace>

Deploy Java functions in a JAR file

Astra Streaming supports Java-based Pulsar Functions which are packaged in a JAR file. The JAR can be deployed to Astra Streaming or Pulsar. The same JAR file can be deployed to either environment.

We’ll create a JAR file using Maven, then use the pulsar-admin command to deploy the JAR. We’ll pass a "create function" configuration file (a .yaml file) as a parameter to pulsar-admin, which defines the Pulsar function options and parameters.

  1. To deploy a JAR, first create the proper JAR with the Java code of the Pulsar Function. An example pom.xml file is shown below:

    Function pom.xml
     <?xml version="1.0" encoding="UTF-8"?>
     <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
         <modelVersion>4.0.0</modelVersion>
    
         <groupId>java-function</groupId>
         <artifactId>java-function</artifactId>
         <version>1.0-SNAPSHOT</version>
    
         <dependencies>
             <dependency>
                 <groupId>org.apache.pulsar</groupId>
                 <artifactId>pulsar-functions-api</artifactId>
                 <version>3.0.0</version>
             </dependency>
         </dependencies>
    
         <build>
             <plugins>
                 <plugin>
                     <artifactId>maven-assembly-plugin</artifactId>
                     <configuration>
                         <appendAssemblyId>false</appendAssemblyId>
                         <descriptorRefs>
                             <descriptorRef>jar-with-dependencies</descriptorRef>
                         </descriptorRefs>
                         <archive>
                         <manifest>
                             <mainClass>org.example.test.ExclamationFunction</mainClass>
                         </manifest>
                     </archive>
                     </configuration>
                     <executions>
                         <execution>
                             <id>make-assembly</id>
                             <phase>package</phase>
                             <goals>
                                 <goal>assembly</goal>
                             </goals>
                         </execution>
                     </executions>
                 </plugin>
                 <plugin>
                     <groupId>org.apache.maven.plugins</groupId>
                     <artifactId>maven-compiler-plugin</artifactId>
                     <version>3.11.0</version>
                     <configuration>
                         <release>17</release>
                     </configuration>
                 </plugin>
             </plugins>
         </build>
    
     </project>
  2. Package the JAR file with Maven.

    • Maven

    • Result

    mvn package
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time:  10.989 s
    [INFO] Finished at: 2023-05-16T16:19:05-04:00
    [INFO] ------------------------------------------------------------------------
  3. Create a deployment configuration file. In this example we’ll call this file “func-create-config.yaml”. This file will be passed to the pulsar-admin create function command.
    The contents of the YAML file should be:

    jar: </absolute/path/to/my/testpythonfunction.jar>
    className: com.example.pulsar.ExclamationFunction
    parallelism: 1
    inputs:
      - persistent://mytenant/n0/t1
    output: persistent://mytenant/ns/t2
    autoAck: true
    tenant: mytenant
    namespace: ns0
    name: testjarfunction
    logTopic:
    userConfig:
      logging_level: ERROR

    Astra Streaming requires the “inputs” topic to have a message schema defined before deploying the function. Otherwise, deployment errors may occur. Use the Astra Portal to define the message schema for a topic.

  4. Use pulsar-admin to deploy your new JAR to Astra Streaming or Pulsar. The command below assumes you’ve properly configured the client.conf file for pulsar-admin commands against your Pulsar cluster. If you are using Astra Streaming see the documentation here for more information.

    bin/pulsar-admin functions create --function-config-file  </absolute/path/to/your/javajar/func-create-config.yml>
  5. Check results: Go to the Astra Portal to see your newly deployed function listed under the “Functions” tab for your Tenant. See Controlling your function for more information on testing and monitoring your function in Astra Streaming.

    1. You can also use the pulsar-admin command to list your functions:

      bin/pulsar-admin functions list --tenant <mytenant> --namespace <my-namespace>

Add functions in Astra Streaming dashboard

Add functions in the Functions tab of the Astra Streaming dashboard.

  1. Select Create Function to get started.

  2. Choose your function name and namespace.

    Function and Namespace
  3. Select the file you want to pull the function from and which function you want to use within that file.

Astra generates a list of acceptable classes. Python and Java functions are added a little differently from each other.

Python functions are added by loading a Python file (.py) or a zipped Python file (.zip). When adding Python files, the Class Name is specified as the name of the Python file without the extension plus the class you want to execute.

For example, if the Python file is called testfunction.py and the class is ExclamationFunction, then the class name is testfunction.ExclamationFunction. The file can contain multiple classes, but only one is used. If there is no class in the Python file (when using a basic function, for example), specify the filename without the extension (ex. function).

Java functions are added by loading a Java jar file (.jar). When adding Java files, you also need to specify the name of the class to execute as the function.

Exclamation Function
  1. Choose your input topics.

    IO Topics

  2. Choose Optional Destination Topics for output and logging.

    Optional Topics

  3. Choose Advanced Options and run at least one sink instance.

    Advanced Configuration

  4. Choose your Processing Guarantee. The default value is ATLEAST_ONCE. Processing Guarantee offers three options:

    • ATLEAST_ONCE: Each message sent to the function can be processed more than once.

    • ATMOST_ONCE: The message sent to the function is processed at most once. Therefore, there is a chance that the message is not processed.

    • EFFECTIVELY_ONCE: Each message sent to the function will have one output associated with it.

  5. Provide an Option Configuration Key. See the Pulsar Docs for a list of configuration keys.

    Provide Config Key

  6. Select Create.

You have created a function for this namespace. You can confirm your function was created in the Functions tab.

Add function with Pulsar CLI

You can also add functions using the Pulsar CLI. We will create a new Python function to consume a message from one topic, add an exclamation point, and publish the results to another topic.

  1. Create the following Python function in testfunction.py:

    from pulsar import Function
    
    class ExclamationFunction(Function):
      def __init__(self):
        pass
    
      def process(self, input, context):
        return input + '!'
  2. Deploy testfunction.py to your Pulsar cluster using the Pulsar CLI:

    $ ./pulsar-admin functions create \
      --py /full/path/to/testfunction.py \
      --classname testfunction.ExclamationFunction \
      --tenant <tenant-name> \
      --namespace default \
      --name exclamation \
      --auto-ack true \
      --inputs persistent://<tenant-name>/default/in \
      --output persistent://<tenant-name>/default/out \
      --log-topic persistent://<tenant-name>/default/log

    You will see "Created Successfully!" if the function is set up and ready to accept messages.

    If you receive a 402 error with "Reason: only qualified organizations can create functions", this means your organization needs to be upgraded to a Pay As You Go plan with a payment method. A qualified organization is an organization on the Pay As You Go plan with a payment method on file. Upgrade your organization to a qualified organization by:

  3. Use ./pulsar-admin functions list --tenant <tenant-name> to list the functions in your tenant and confirm your new function was created.

Testing Your Function

Triggering a function is a convenient way to test that the function is working. When you trigger a function, you are publishing a message on the function’s input topic, which triggers the function to run. If the function has an output topic and the function returns data to the output topic, that data is displayed.

Send a test value with Pulsar CLI’s trigger to test a function you’ve set up.

  1. Listen for messages on the output topic:

    $ ./pulsar-client consume persistent://<tenant-name>/default/<topic-name> \
      --subscription-name my-subscription \
      --num-messages 0 # Listen indefinitely
  2. Test your exclamation function with trigger:

    $ ./pulsar-admin functions trigger \
      --name exclamation \
      --tenant <tenant-name> \
      --namespace default \
      --trigger-value "Hello world"

    The trigger sends the string Hello world to your exclamation function. Your function should output Hello world! to your consumed output.

Controlling Your Function

You can start, stop, and restart your function by selecting it in the Functions dashboard.

Function Controls

Monitoring Your Function

Functions produce logs to help you in debugging. To view your function’s logs, open your function in the Functions dashboard.

Function Log

In the upper right corner of the function log are controls to Refresh, Copy to Clipboard, and Save your function log.

Updating Your Function

A function that is already running can be updated with new configuration. The following settings can be updated:

  • Function code

  • Output topic

  • Log topic

  • Number of instances

  • Configuration keys

If you need to update any other setting of the function, delete and then re-add the function.

To update your function, select your function in the Functions dashboard.

Update Function
  1. Select Change File to find your function locally and click Open.

  2. Update your function’s Instances and Timeout. When you’re done, click Update.

  3. An Updates Submitted Successfully flag will appear to let you know your function has been updated.

Deleting Your Function

To delete a function, select the function to be deleted in the Functions dashboard.

Delete Function
  1. Click Delete.

  2. A popup will ask you to confirm deletion by entering the function’s name and clicking Delete.

  3. A Function-name Deleted Successfully! flag will appear to let you know you’ve deleted your function.

Pulsar functions video

Follow along with this video from our Five Minutes About Pulsar series to see a Pulsar Python function in action.

Next

Learn more about developing functions for Astra Streaming and Pulsar here.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com