Kubernetes Autoscaling for Apache Pulsar (KAAP)

Kubernetes Autoscaling for Apache Pulsar (KAAP) simplifies running Apache Pulsar™ on Kubernetes by applying the familiar Operator pattern to the Pulsar components, and horizontally scaling resources up or down based on CPU and memory workloads.

Operating and maintaining Apache Pulsar clusters traditionally involves complex manual configurations, making it challenging for developers and operators to effectively manage the system’s lifecycle. However, with the KAAP Operator, these complexities are abstracted away, enabling developers to focus on their applications rather than the underlying infrastructure.

Benefits and features

Whether you are a developer looking to leverage the power of Apache Pulsar in your Kubernetes environment or an operator seeking to streamline the management of Pulsar clusters, the KAAP Operator provides benefits and features that make it a robust and user-friendly solution:

Easy deployment

Deploying an Apache Pulsar cluster on Kubernetes is simplified through declarative configurations and automation provided by the operator.

Operators are a common pattern for packaging, deploying, and managing Kubernetes applications. Operators extend Kubernetes functionality to automate common tasks in stateful applications. Think of the KAAP Operator as a manager for the individual components of Pulsar. By implementing the PulsarCluster CRD, the operator knows enough to manage the deployment, configuration, and scaling of Pulsar components with reusable, automated tasks, such as:

  • Deploying a Pulsar cluster.

  • Deploying monitoring and logging components.

  • Autoscaling bookies based on storage usage, or brokers based on CPU load.

  • Assigning resources to specific availability zones (AZs).

The KAAP Operator is packaged, configured, and deployed with Helm charts, and is based on the Quarkus Operator SDK.
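The Operator pattern described above can be pictured as a reconcile loop: compare the desired state declared in a custom resource with the observed state of the cluster, then act on the difference. The Python sketch below illustrates only that idea. The function name `reconcile`, the component names, and the replica counts are hypothetical, and KAAP itself is built on the Quarkus Operator SDK, not on this code.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], observed: Dict[str, int]) -> List[str]:
    """Return the actions needed to move observed replica counts to the desired ones."""
    actions = []
    for component in sorted(desired):
        want = desired[component]
        have = observed.get(component, 0)  # a missing component counts as 0 replicas
        if have < want:
            actions.append(f"scale up {component}: {have} -> {want}")
        elif have > want:
            actions.append(f"scale down {component}: {have} -> {want}")
    return actions


# Hypothetical desired state (as declared in a custom resource) and observed state.
desired = {"zookeeper": 3, "broker": 3, "bookkeeper": 4}
observed = {"zookeeper": 3, "broker": 2}

for action in reconcile(desired, observed):
    print(action)
```

A real operator runs this comparison continuously and applies the resulting changes through the Kubernetes API instead of printing them.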

Scalability

The KAAP Operator scales Pulsar clusters by automatically creating and configuring new Pulsar brokers and bookies according to the rules you define. Broker autoscaling is integrated with the Pulsar broker load balancer to make smart resource management decisions, and bookies are scaled up and down based on storage usage in a safe, controlled manner.
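One way to picture rule-based scaling is a pair of thresholds: add a replica when usage rises above a high watermark, remove one when it falls below a low watermark, and always stay within fixed bounds. The sketch below is a simplified illustration under those assumptions. The function `desired_replicas`, its parameters, and the threshold values are hypothetical and do not represent KAAP's actual algorithm, which also takes the broker load balancer into account.

```python
def desired_replicas(current, usage, low=0.3, high=0.8, min_replicas=1, max_replicas=10):
    """Return the next replica count for a usage ratio between 0.0 and 1.0.

    All thresholds and bounds here are illustrative assumptions.
    """
    if usage > high:
        target = current + 1  # overloaded: add one replica
    elif usage < low:
        target = current - 1  # underused: remove one replica
    else:
        target = current      # within the comfortable band: no change
    # Clamp to the configured bounds so scaling never runs away.
    return max(min_replicas, min(max_replicas, target))


print(desired_replicas(3, 0.9))  # above the high watermark
print(desired_replicas(3, 0.1))  # below the low watermark
print(desired_replicas(1, 0.1))  # already at the minimum
```

Changing one replica at a time keeps each step small, which matters for bookies because removing one means its data must first be safely available elsewhere.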

Installing a CRD adds a new custom resource (CR) type to your cluster and extends the Kubernetes API to support it. You can then create instances of the resource based on its specification, and the operator automates away the tedious aspects of managing a Pulsar cluster.
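As a rough illustration, a custom resource instance is a structured document with `apiVersion`, `kind`, `metadata`, and `spec` fields. The sketch below builds one as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The helper `pulsar_cluster_manifest`, the `apiVersion` value, and every field under `spec` are assumptions made for illustration; check the PulsarCluster CRD installed in your cluster for the real schema.

```python
import json


def pulsar_cluster_manifest(name, namespace, broker_replicas):
    """Build a hypothetical PulsarCluster custom resource as a plain dictionary."""
    return {
        "apiVersion": "kaap.oss.datastax.com/v1alpha1",  # assumed group/version
        "kind": "PulsarCluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The fields below are illustrative assumptions, not the confirmed schema.
            "global": {"name": name},
            "broker": {"replicas": broker_replicas},
        },
    }


manifest = pulsar_cluster_manifest("pulsar", "kaap", 3)
print(json.dumps(manifest, indent=2))
```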

High availability

The operator implements best practices for high availability, ensuring that Pulsar clusters are fault-tolerant and can sustain failures without service disruptions.

Lifecycle management

The operator takes care of common Pulsar cluster lifecycle tasks, such as cluster creation, upgrade, configuration updates, and graceful shutdowns.

Extended management and monitoring with KAAP stack

The KAAP stack deploys additional Kubernetes-native tooling with your Pulsar cluster. Along with the PulsarCluster custom resource definition (CRD), the KAAP stack includes the following:

  • KAAP Operator

  • Prometheus Stack (Grafana)

  • Pulsar Grafana dashboards

  • Cert Manager

  • Keycloak

Pulsar component architecture

A typical Pulsar cluster requires the following components:

  • Apache ZooKeeper™: This is the Pulsar metadata store. It stores data about a cluster’s configuration, helps the proxy direct messages to the correct broker, and holds bookie configurations.

  • Broker: This is the Pulsar message router.

  • BookKeeper (bookie): This is the Pulsar data store. BookKeeper stores message data in a low-latency, resilient way.

In addition to the required components, you can include optional components:

  • Apache BookKeeper AutoRecovery: This is a Pulsar component that recovers BookKeeper data in the event of a bookie outage.

  • Pulsar proxy: This is a proxy that runs at the edge of the cluster with public-facing endpoints and support for cluster extensions.

  • Dedicated function workers: You can optionally run dedicated function workers in a Pulsar cluster.

  • Pulsar AdminConsole: This is an optional web-based admin console for managing Pulsar clusters.

  • Pulsar Heartbeat: This is an optional component that monitors the health of the Pulsar cluster and emits metrics that are helpful for observing and debugging issues.

  • Prometheus/Grafana/Alertmanager stack: This is the default observability stack for a cluster. The DataStax Pulsar Helm chart includes pre-made Grafana dashboards and pre-wires all of the metrics scraping.

Get started with the KAAP Operator and KAAP stack

Use Helm to install the KAAP Operator alone or with the extended management and monitoring capabilities of the KAAP stack.
