Broker autoscaler

The operator scales the number of broker pods in a cluster up and down based on current CPU usage. The CPU usage of each broker is checked at the Pulsar load balancer, not just at the Kubenetes pod level. This means that the operator can scale brokers based on the CPU usage of all brokers in the cluster, not just the CPU usage of a single broker pod. When the operator sees that the Pulsar load balancer is having trouble finding brokers to assign topic bundles to, it will scale up the number of brokers to handle the load. When the operator sees that the CPU usage of all brokers is low, it will scale down the number of brokers to save resources. CPU usage is tightly coupled to traffic, so you can expect to see significant scaling activity with broker autoscaler enabled. This value can be controlled with the stabilizationWindowMs parameter, which tells the operator how long to wait between scaling events.

Install Operator with broker autoscaler enabled

helm install pulsar-operator helm/pulsar-stack \
    --values helm/examples/broker-autoscaling/values.yaml

The operator’s thresholds are set in the values.yaml file.

      broker:
        replicas: 2
        autoscaler:
          enabled: true
          periodMs: 20000
          min: 2
          max: 10
          lowerCpuThreshold: 0.4
          higherCpuThreshold: 0.8
          scaleUpBy: 1
          scaleDownBy: 1
          stabilizationWindowMs: 60000

Broker autoscaler configuration
Name	Type	Description	Notes
enabled	Boolean	Enable autoscaling for brokers.
higherCpuThreshold	Double	The threshold to trigger a scale up. The autoscaler will scale up if all the brokers' CPU usage is higher than this threshold.	Default is 0.8. Min 0.0, Max 1.0.
lowerCpuThreshold	Double	The threshold to trigger a scale down. The autoscaler will scale down if all the brokers' CPU usage is lower than this threshold.	Default is 0.4. Min(0.0), Max(1.0)
max	Integer	Maximum number of brokers. If the number of brokers is equal to this value, the autoscaler will never scale up.	Min 1.
min	Integer	Minimum number of brokers. If the number of brokers is equal to this value, the autoscaler will never scale down.	Min 1.
periodMs	Long	The interval in milliseconds between two consecutive autoscaling checks.	Minimum value is 1000 ms.
scaleDownBy	Integer	The number of brokers to remove at each scale down.	Default is 1. Min 1.
scaleUpBy	Integer	The number of brokers to add at each scale up.	Default is 1. Min 1.
stabilizationWindowMs	Long	The stabilization window restricts rapid changes in replica count when the metrics used for scaling are fluctuating. The autoscaling algorithm uses this window to infer a previous desired state and avoid unwanted changes to workload scale.	Default value is 5 minutes after the pod’s state is Ready.

Test broker autoscaler

Once you’ve deployed a Pulsar cluster with broker autoscaling enabled, test it by adding load to the cluster and watching the operator pod’s logs.

If you don’t have a bastion pod, you can add it to your cluster in the same values.yaml file you used to deploy the operator.

bastion:
  replicas: 1
  resources:
    requests:
      cpu: "0.1"
      memory: "128Mi"

Exec into your bastion pod.

kubectl exec --stdin --tty <pulsar-bastion-74777cbbf9-blb4t> -- /bin/bash

Run a Pulsar perf test in your deployment, and follow the operator’s logs to see the autoscaler in action.

Bastion pod
Result

bin/pulsar-perf produce topic

2023-05-19T14:39:34,726+0000 [pulsar-perf-producer-exec-1-1] INFO  org.apache.pulsar.testclient.PerformanceProducer - Created 1 producers
2023-05-19T14:39:34,778+0000 [pulsar-client-io-2-1] INFO  com.scurrilous.circe.checksum.Crc32cIntChecksum - SSE4.2 CRC32C provider initialized
2023-05-19T14:39:43,190+0000 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     817 msg ---     81.7 msg/s ---      0.6 Mbit/s  --- failure      0.0 msg/s --- Latency: mean:  12.008 ms - med:  10.571 - 95pct:  20.821 - 99pct:  32.194 - 99.9pct:  46.759 - 99.99pct:  56.243 - Max:  56.243

The operator notices the differing values and patches the broker-set to keep up with the increased CPU usage of the broker pods.

17:06:09 INFO  [com.dat.oss.pul.aut.bro.LoadReportResourceUsageSource] (pool-9-thread-1) Broker pulsar-broker-0 cpu usage: 7.000000 %
│ 17:06:10 INFO  [com.dat.oss.pul.aut.bro.LoadReportResourceUsageSource] (pool-9-thread-1) Broker pulsar-broker-1 cpu usage: 15.000001 %
│ 17:06:10 INFO  [com.dat.oss.pul.aut.BrokerSetAutoscaler] (pool-9-thread-1) Scaled brokers for broker set broker from 2 to 3
│ 17:06:10 INFO  [com.dat.oss.pul.crd.SpecDiffer] (ReconcilerExecutor-pulsar-broker-controller-67) 'broker.replicas' value differs:
was:2
now: 3

The default for lowerCpuThreshold is 0.4 and higherCpuThreshold is 0.8. You may need to set these values lower in values.yaml to trigger broker scaling.

Cancel the Pulsar perf test with Ctrl-C. The operator will notice the decreased load and scale down the number of brokers. Notice that the operator scales down the number of brokers by 1 at a time, as specified in the scaleDownBy parameter, and properly decommissions them.

17:18:39 INFO  [com.dat.oss.pul.aut.bro.LoadReportResourceUsageSource] (pool-9-thread-1) Broker pulsar-broker-6 cpu usage: 2.000000 %
17:18:39 INFO  [com.dat.oss.pul.aut.BrokerSetAutoscaler] (pool-9-thread-1) Scaled brokers for broker set broker from 7 to 6
17:18:39 INFO  [com.dat.oss.pul.crd.SpecDiffer] (ReconcilerExecutor-pulsar-broker-controller-74) 'broker.replicas' value differs:
was: 7
now: 6                                                                                                                                          │
│ 17:18:40 INFO  [com.dat.oss.pul.con.AbstractResourceSetsController] (ReconcilerExecutor-pulsar-broker-controller-74) broker-set 'broker' patched

Broker autoscaler

Install Operator with broker autoscaler enabled

Test broker autoscaler

Was this helpful?

Give Feedback