Broker autoscaler
The operator scales the number of broker pods in a cluster up and down based on current CPU usage.
The CPU usage of each broker is checked at the Pulsar load balancer, not just at the Kubenetes pod level. This means that the operator can scale brokers based on the CPU usage of all brokers in the cluster, not just the CPU usage of a single broker pod.
When the operator sees that the Pulsar load balancer is having trouble finding brokers to assign topic bundles to, it will scale up the number of brokers to handle the load.
When the operator sees that the CPU usage of all brokers is low, it will scale down the number of brokers to save resources.
CPU usage is tightly coupled to traffic, so you can expect to see significant scaling activity with broker autoscaler enabled. This value can be controlled with the stabilizationWindowMs
parameter, which tells the operator how long to wait between scaling events.
Install Operator with broker autoscaler enabled
helm install pulsar-operator helm/pulsar-stack \
--values helm/examples/broker-autoscaling/values.yaml
The operator’s thresholds are set in the values.yaml file.
broker:
replicas: 2
autoscaler:
enabled: true
periodMs: 20000
min: 2
max: 10
lowerCpuThreshold: 0.4
higherCpuThreshold: 0.8
scaleUpBy: 1
scaleDownBy: 1
stabilizationWindowMs: 60000
Name | Type | Description | Notes |
---|---|---|---|
enabled |
Boolean |
Enable autoscaling for brokers. |
|
higherCpuThreshold |
Double |
The threshold to trigger a scale up. The autoscaler will scale up if all the brokers' CPU usage is higher than this threshold. |
Default is 0.8. Min 0.0, Max 1.0. |
lowerCpuThreshold |
Double |
The threshold to trigger a scale down. The autoscaler will scale down if all the brokers' CPU usage is lower than this threshold. |
Default is 0.4. Min(0.0), Max(1.0) |
max |
Integer |
Maximum number of brokers. If the number of brokers is equal to this value, the autoscaler will never scale up. |
Min 1. |
min |
Integer |
Minimum number of brokers. If the number of brokers is equal to this value, the autoscaler will never scale down. |
Min 1. |
periodMs |
Long |
The interval in milliseconds between two consecutive autoscaling checks. |
Minimum value is 1000 ms. |
scaleDownBy |
Integer |
The number of brokers to remove at each scale down. |
Default is 1. Min 1. |
scaleUpBy |
Integer |
The number of brokers to add at each scale up. |
Default is 1. Min 1. |
stabilizationWindowMs |
Long |
The stabilization window restricts rapid changes in replica count when the metrics used for scaling are fluctuating. The autoscaling algorithm uses this window to infer a previous desired state and avoid unwanted changes to workload scale. |
Default value is 5 minutes after the pod’s state is Ready. |
Test broker autoscaler
Once you’ve deployed a Pulsar cluster with broker autoscaling enabled, test it by adding load to the cluster and watching the operator pod’s logs.
If you don’t have a bastion pod, you can add it to your cluster in the same values.yaml file you used to deploy the operator.
|
-
Exec into your bastion pod.
kubectl exec --stdin --tty <pulsar-bastion-74777cbbf9-blb4t> -- /bin/bash
-
Run a Pulsar perf test in your deployment, and follow the operator’s logs to see the autoscaler in action.
-
Bastion pod
-
Result
bin/pulsar-perf produce topic
2023-05-19T14:39:34,726+0000 [pulsar-perf-producer-exec-1-1] INFO org.apache.pulsar.testclient.PerformanceProducer - Created 1 producers 2023-05-19T14:39:34,778+0000 [pulsar-client-io-2-1] INFO com.scurrilous.circe.checksum.Crc32cIntChecksum - SSE4.2 CRC32C provider initialized 2023-05-19T14:39:43,190+0000 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 817 msg --- 81.7 msg/s --- 0.6 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 12.008 ms - med: 10.571 - 95pct: 20.821 - 99pct: 32.194 - 99.9pct: 46.759 - 99.99pct: 56.243 - Max: 56.243
-
-
The operator notices the differing values and patches the broker-set to keep up with the increased CPU usage of the broker pods.
17:06:09 INFO [com.dat.oss.pul.aut.bro.LoadReportResourceUsageSource] (pool-9-thread-1) Broker pulsar-broker-0 cpu usage: 7.000000 % │ 17:06:10 INFO [com.dat.oss.pul.aut.bro.LoadReportResourceUsageSource] (pool-9-thread-1) Broker pulsar-broker-1 cpu usage: 15.000001 % │ 17:06:10 INFO [com.dat.oss.pul.aut.BrokerSetAutoscaler] (pool-9-thread-1) Scaled brokers for broker set broker from 2 to 3 │ 17:06:10 INFO [com.dat.oss.pul.crd.SpecDiffer] (ReconcilerExecutor-pulsar-broker-controller-67) 'broker.replicas' value differs: was:2 now: 3
The default for lowerCpuThreshold is
0.4
and higherCpuThreshold is0.8
. You may need to set these values lower in values.yaml to trigger broker scaling. -
Cancel the Pulsar perf test with Ctrl-C. The operator will notice the decreased load and scale down the number of brokers. Notice that the operator scales down the number of brokers by 1 at a time, as specified in the
scaleDownBy
parameter, and properly decommissions them.17:18:39 INFO [com.dat.oss.pul.aut.bro.LoadReportResourceUsageSource] (pool-9-thread-1) Broker pulsar-broker-6 cpu usage: 2.000000 % 17:18:39 INFO [com.dat.oss.pul.aut.BrokerSetAutoscaler] (pool-9-thread-1) Scaled brokers for broker set broker from 7 to 6 17:18:39 INFO [com.dat.oss.pul.crd.SpecDiffer] (ReconcilerExecutor-pulsar-broker-controller-74) 'broker.replicas' value differs: was: 7 now: 6 │ │ 17:18:40 INFO [com.dat.oss.pul.con.AbstractResourceSetsController] (ReconcilerExecutor-pulsar-broker-controller-74) broker-set 'broker' patched