Broker autoscaler

The operator scales the number of broker pods in a cluster up and down based on current CPU usage. The CPU usage of each broker is checked at the Pulsar load balancer, not just at the Kubenetes pod level. This means that the operator can scale brokers based on the CPU usage of all brokers in the cluster, not just the CPU usage of a single broker pod. When the operator sees that the Pulsar load balancer is having trouble finding brokers to assign topic bundles to, it will scale up the number of brokers to handle the load. When the operator sees that the CPU usage of all brokers is low, it will scale down the number of brokers to save resources. CPU usage is tightly coupled to traffic, so you can expect to see significant scaling activity with broker autoscaler enabled. This value can be controlled with the stabilizationWindowMs parameter, which tells the operator how long to wait between scaling events.

Install Operator with broker autoscaler enabled

helm install pulsar-operator helm/pulsar-stack \
    --values helm/examples/broker-autoscaling/values.yaml

The operator’s thresholds are set in the values.yaml file.

      broker:
        replicas: 2
        autoscaler:
          enabled: true
          periodMs: 20000
          min: 2
          max: 10
          lowerCpuThreshold: 0.4
          higherCpuThreshold: 0.8
          scaleUpBy: 1
          scaleDownBy: 1
          stabilizationWindowMs: 60000
Broker autoscaler configuration
Name Type Description Notes

enabled

Boolean

Enable autoscaling for brokers.

higherCpuThreshold

Double

The threshold to trigger a scale up. The autoscaler will scale up if all the brokers' CPU usage is higher than this threshold.

Default is 0.8. Min 0.0, Max 1.0.

lowerCpuThreshold

Double

The threshold to trigger a scale down. The autoscaler will scale down if all the brokers' CPU usage is lower than this threshold.

Default is 0.4. Min(0.0), Max(1.0)

max

Integer

Maximum number of brokers. If the number of brokers is equal to this value, the autoscaler will never scale up.

Min 1.

min

Integer

Minimum number of brokers. If the number of brokers is equal to this value, the autoscaler will never scale down.

Min 1.

periodMs

Long

The interval in milliseconds between two consecutive autoscaling checks.

Minimum value is 1000 ms.

scaleDownBy

Integer

The number of brokers to remove at each scale down.

Default is 1. Min 1.

scaleUpBy

Integer

The number of brokers to add at each scale up.

Default is 1. Min 1.

stabilizationWindowMs

Long

The stabilization window restricts rapid changes in replica count when the metrics used for scaling are fluctuating. The autoscaling algorithm uses this window to infer a previous desired state and avoid unwanted changes to workload scale.

Default value is 5 minutes after the pod’s state is Ready.

Test broker autoscaler

Once you’ve deployed a Pulsar cluster with broker autoscaling enabled, test it by adding load to the cluster and watching the operator pod’s logs.

If you don’t have a bastion pod, you can add it to your cluster in the same values.yaml file you used to deploy the operator.

bastion:
  replicas: 1
  resources:
    requests:
      cpu: "0.1"
      memory: "128Mi"
  1. Exec into your bastion pod.

    kubectl exec --stdin --tty <pulsar-bastion-74777cbbf9-blb4t> -- /bin/bash
  2. Run a Pulsar perf test in your deployment, and follow the operator’s logs to see the autoscaler in action.

    • Bastion pod

    • Result

    bin/pulsar-perf produce topic
    2023-05-19T14:39:34,726+0000 [pulsar-perf-producer-exec-1-1] INFO  org.apache.pulsar.testclient.PerformanceProducer - Created 1 producers
    2023-05-19T14:39:34,778+0000 [pulsar-client-io-2-1] INFO  com.scurrilous.circe.checksum.Crc32cIntChecksum - SSE4.2 CRC32C provider initialized
    2023-05-19T14:39:43,190+0000 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     817 msg ---     81.7 msg/s ---      0.6 Mbit/s  --- failure      0.0 msg/s --- Latency: mean:  12.008 ms - med:  10.571 - 95pct:  20.821 - 99pct:  32.194 - 99.9pct:  46.759 - 99.99pct:  56.243 - Max:  56.243
  3. The operator notices the differing values and patches the broker-set to keep up with the increased CPU usage of the broker pods.

    17:06:09 INFO  [com.dat.oss.pul.aut.bro.LoadReportResourceUsageSource] (pool-9-thread-1) Broker pulsar-broker-0 cpu usage: 7.000000 %
    │ 17:06:10 INFO  [com.dat.oss.pul.aut.bro.LoadReportResourceUsageSource] (pool-9-thread-1) Broker pulsar-broker-1 cpu usage: 15.000001 %
    │ 17:06:10 INFO  [com.dat.oss.pul.aut.BrokerSetAutoscaler] (pool-9-thread-1) Scaled brokers for broker set broker from 2 to 3
    │ 17:06:10 INFO  [com.dat.oss.pul.crd.SpecDiffer] (ReconcilerExecutor-pulsar-broker-controller-67) 'broker.replicas' value differs:
    was:2
    now: 3

    The default for lowerCpuThreshold is 0.4 and higherCpuThreshold is 0.8. You may need to set these values lower in values.yaml to trigger broker scaling.

  4. Cancel the Pulsar perf test with Ctrl-C. The operator will notice the decreased load and scale down the number of brokers. Notice that the operator scales down the number of brokers by 1 at a time, as specified in the scaleDownBy parameter, and properly decommissions them.

    17:18:39 INFO  [com.dat.oss.pul.aut.bro.LoadReportResourceUsageSource] (pool-9-thread-1) Broker pulsar-broker-6 cpu usage: 2.000000 %
    17:18:39 INFO  [com.dat.oss.pul.aut.BrokerSetAutoscaler] (pool-9-thread-1) Scaled brokers for broker set broker from 7 to 6
    17:18:39 INFO  [com.dat.oss.pul.crd.SpecDiffer] (ReconcilerExecutor-pulsar-broker-controller-74) 'broker.replicas' value differs:
    was: 7
    now: 6                                                                                                                                          │
    │ 17:18:40 INFO  [com.dat.oss.pul.con.AbstractResourceSetsController] (ReconcilerExecutor-pulsar-broker-controller-74) broker-set 'broker' patched

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com