Kubernetes Autoscaling for Apache Pulsar (KAAP)
Kubernetes Autoscaling for Apache Pulsar (KAAP) simplifies running Apache Pulsar™ on Kubernetes by applying the familiar Operator pattern to the Pulsar components, and horizontally scaling resources up or down based on CPU and memory workloads.
Operating and maintaining Apache Pulsar clusters traditionally involves complex manual configurations, making it challenging for developers and operators to effectively manage the system’s lifecycle. However, with the KAAP Operator, these complexities are abstracted away, enabling developers to focus on their applications rather than the underlying infrastructure.
Benefits and features
Whether you are a developer looking to leverage the power of Apache Pulsar in your Kubernetes environment or an operator seeking to streamline the management of Pulsar clusters, the KAAP Operator provides benefits and features that make it a robust and user-friendly solution:
Easy deployment
Declarative configuration and the automation provided by the operator simplify deploying an Apache Pulsar cluster on Kubernetes.
Operators are a common pattern for packaging, deploying, and managing Kubernetes applications. They extend Kubernetes functionality to automate common tasks in stateful applications. Think of the KAAP Operator as a manager for the individual components of Pulsar. Because it implements the PulsarCluster CRD, the operator knows enough to manage the deployment, configuration, and scaling of Pulsar components with reusable, automated tasks, such as:
- Deploying a Pulsar cluster.
- Deploying monitoring and logging components.
- Autoscaling bookies based on storage usage, or brokers based on CPU load.
- Assigning resources to specific availability zones (AZs).
The KAAP Operator is configured, deployed, and packaged with Helm charts, and is built on the Quarkus Operator SDK.
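At the heart of the operator pattern is a reconcile loop: compare the desired state declared in a custom resource with what is actually running, then act to close the gap. The Python sketch below illustrates only that idea and is not KAAP's code (KAAP is built on the Quarkus Operator SDK). The `ComponentSpec` fields, the action strings, and the image tags are hypothetical placeholders, not the PulsarCluster CRD schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentSpec:
    """Hypothetical state of one component (not the real PulsarCluster CRD schema)."""
    replicas: int
    image: str


def reconcile(desired: dict[str, ComponentSpec],
              observed: dict[str, ComponentSpec]) -> list[str]:
    """Compute the actions that move the observed state toward the desired state."""
    actions = []
    for name, want in desired.items():
        have = observed.get(name)
        if have is None:
            # Declared but not running yet: create it from scratch.
            actions.append(f"create {name} with {want.replicas} replicas")
            continue
        if have.image != want.image:
            actions.append(f"roll {name} to image {want.image}")
        if have.replicas < want.replicas:
            actions.append(f"scale up {name} to {want.replicas}")
        elif have.replicas > want.replicas:
            actions.append(f"scale down {name} to {want.replicas}")
    # Running but no longer declared: remove it.
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(f"delete {name}")
    return actions


# Placeholder image tags for illustration only.
desired = {"zookeeper": ComponentSpec(3, "pulsar:3.0"), "broker": ComponentSpec(3, "pulsar:3.0")}
observed = {
    "zookeeper": ComponentSpec(3, "pulsar:3.0"),
    "broker": ComponentSpec(2, "pulsar:2.10"),
    "proxy": ComponentSpec(1, "pulsar:2.10"),
}
print(reconcile(desired, observed))
# ['roll broker to image pulsar:3.0', 'scale up broker to 3', 'delete proxy']
```

Because the function only derives actions from the two states, it is safe to run repeatedly: once the observed state matches the desired state, it returns an empty list.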
Scalability
The KAAP Operator scales Pulsar clusters by automatically creating and configuring new Pulsar brokers and bookies according to defined rules. The broker autoscaler is integrated with the Pulsar broker load balancer to make informed resource-management decisions, and bookies are scaled up and down based on storage usage in a safe, controlled manner.
Installing a CRD adds a new custom resource (CR) type to the Kubernetes API, and you can then create instances of that resource based on its specification. The operator acts on those instances, automating away the tedious aspects of managing a Pulsar cluster:
- Apache BookKeeper™ autoscaler: Automatically scale the number of bookies based on storage usage.
- Broker autoscaler: Automatically scale the number of brokers based on CPU load.
- Rack-aware BookKeeper placement: Place bookies in different racks to improve availability.
- Kafka API: Use the Starlight for Kafka API to bring your Kafka message traffic to Pulsar.
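Both autoscalers follow the same basic shape: read a utilization signal, compare it with thresholds, and change the replica count by a bounded step. The Python sketch below illustrates that shape under stated assumptions: the function names, threshold values, and step sizes are hypothetical, not KAAP's configuration. The broker rule uses average CPU load; the bookie rule uses per-bookie disk usage with a high and a low watermark, and removes a bookie only if enough would remain. A production autoscaler would typically also wait for the signal to stay stable over a time window, which this sketch does not model.

```python
def broker_replicas(current: int, avg_cpu: float, high: float = 0.8,
                    low: float = 0.3, step: int = 1,
                    min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Hypothetical broker rule driven by average CPU load (0.0 to 1.0)."""
    if avg_cpu > high:
        target = current + step
    elif avg_cpu < low:
        target = current - step
    else:
        # Inside the band between the thresholds: keep the current size.
        target = current
    # Clamp the result to the allowed replica range.
    return max(min_replicas, min(max_replicas, target))


def bookie_replicas(current: int, disk_usage: list[float],
                    high_watermark: float = 0.9, low_watermark: float = 0.75,
                    min_writable: int = 3, step: int = 1) -> int:
    """Hypothetical bookie rule driven by per-bookie disk usage (0.0 to 1.0)."""
    # In this sketch, a bookie counts as writable while its usage is below the high watermark.
    writable = sum(1 for usage in disk_usage if usage < high_watermark)
    if writable < min_writable:
        # Too few bookies have room for new data: add capacity.
        return current + step
    if all(usage < low_watermark for usage in disk_usage) and current - step >= min_writable:
        # Every bookie has spare room, and enough would remain after removing one.
        return current - step
    return current


print(broker_replicas(3, 0.9))  # 4
print(bookie_replicas(3, [0.95, 0.5, 0.5]))  # 4
```

Leaving a gap between the upper and lower thresholds (hysteresis) keeps the replica count from oscillating when the signal hovers around a single cutoff.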
High availability
The operator implements best practices for high availability, so that Pulsar clusters are fault-tolerant and can sustain the failure of individual components without service disruption.
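One common high-availability practice, and the idea behind the rack-aware BookKeeper placement and AZ assignment mentioned earlier, is to spread replicas across failure domains so that losing one rack or zone does not take every copy offline. The Python sketch below is a simplified illustration, not KAAP's placement logic: it assigns pods to zones round-robin and checks whether a group of bookies keeps at least `min_remaining` members after losing any single zone. The zone names, pod names, and function names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def assign_zones(pods: list[str], zones: list[str]) -> dict[str, str]:
    """Spread pods over zones round-robin, so per-zone counts differ by at most one."""
    return dict(zip(pods, cycle(zones)))


def survives_zone_loss(ensemble: list[str], placement: dict[str, str],
                       min_remaining: int) -> bool:
    """True if losing any single zone leaves at least min_remaining ensemble members."""
    per_zone = Counter(placement[member] for member in ensemble)
    # The worst case is losing the zone that hosts the most members.
    return len(ensemble) - max(per_zone.values()) >= min_remaining


bookies = [f"bookie-{i}" for i in range(6)]
placement = assign_zones(bookies, ["zone-a", "zone-b", "zone-c"])
print(placement["bookie-3"])  # zone-a
print(survives_zone_loss(["bookie-0", "bookie-1", "bookie-2"], placement, 2))  # True
print(survives_zone_loss(["bookie-0", "bookie-3", "bookie-1"], placement, 2))  # False
```

The second ensemble fails the check because two of its three members share `zone-a`, which is exactly the situation rack- or zone-aware placement is meant to avoid.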
Lifecycle management
The operator takes care of common Pulsar cluster lifecycle tasks, such as cluster creation, upgrades, configuration updates, and graceful shutdowns.
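Upgrades are one lifecycle task where ordering matters: a component is usually upgraded only after the components it relies on, and pods are restarted one at a time so the component keeps serving. The Python sketch below derives such an order with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The dependency map and the one-pod-at-a-time policy are illustrative assumptions, not a description of how KAAP sequences upgrades.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each component lists the components it relies on.
DEPENDS_ON = {
    "zookeeper": set(),
    "bookkeeper": {"zookeeper"},
    "autorecovery": {"bookkeeper"},
    "broker": {"zookeeper", "bookkeeper"},
    "proxy": {"broker"},
}


def upgrade_plan(replicas: dict[str, int]) -> list[str]:
    """Order restart steps so every component is upgraded after its dependencies."""
    steps = []
    for component in TopologicalSorter(DEPENDS_ON).static_order():
        # One pod at a time; a real rollout would also wait for each pod to become
        # ready before continuing, which this sketch does not model.
        for ordinal in range(replicas.get(component, 0)):
            steps.append(f"restart {component}-{ordinal}")
    return steps


print(upgrade_plan({"zookeeper": 3, "bookkeeper": 3, "broker": 2, "proxy": 1}))
```

Components missing from the `replicas` mapping contribute no steps, so the same dependency map works for clusters that leave out optional components such as AutoRecovery.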
Extended management and monitoring with KAAP stack
The KAAP stack deploys additional Kubernetes-native tooling with your Pulsar cluster. Along with the PulsarCluster custom resource definition (CRD), the KAAP stack includes the following:
- KAAP Operator
- Prometheus Stack (Grafana)
- Pulsar Grafana dashboards
- Cert Manager
- Keycloak
Pulsar component architecture
A typical Pulsar cluster requires the following components:
- Apache ZooKeeper™: This is the Pulsar metadata store. It stores data about a cluster’s configuration, helps the proxy direct messages to the correct broker, and holds bookie configurations.
- Broker: This is the Pulsar message router.
- BookKeeper (bookie): This is the Pulsar data store. BookKeeper stores message data in a low-latency, resilient way.
In addition to the required components, you can include optional components:
- Apache BookKeeper AutoRecovery: This is a Pulsar component that recovers BookKeeper data in the event of a bookie outage.
- Pulsar proxy: This is a proxy that runs at the edge of the cluster with public-facing endpoints and support for cluster extensions.
- Dedicated function workers: You can run Pulsar Functions workers as a dedicated component instead of inside the brokers.
- Pulsar AdminConsole: This is an optional web-based admin console for managing Pulsar clusters.
- Pulsar Heartbeat: This is an optional component that monitors the health of the Pulsar cluster and emits metrics that help you observe the cluster and debug issues.
- Prometheus/Grafana/Alertmanager stack: This is the default observability stack for a cluster. The DataStax Pulsar Helm chart includes prebuilt Grafana dashboards and preconfigures metrics scraping.
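The split between required and optional components can be captured in a small data structure. The Python sketch below checks a hypothetical component-to-replica-count mapping against the two lists above; the component keys and the `check_components` function are illustrative and are not the field names of the PulsarCluster CRD.

```python
# Hypothetical component keys mirroring the required and optional lists above.
REQUIRED = {"zookeeper", "broker", "bookkeeper"}
OPTIONAL = {"autorecovery", "proxy", "functions-worker", "admin-console",
            "heartbeat", "observability"}


def check_components(replicas: dict[str, int]) -> list[str]:
    """Return the problems found in a component -> replica-count mapping."""
    problems = []
    for name in sorted(REQUIRED):
        # Every required component needs at least one running replica.
        if replicas.get(name, 0) < 1:
            problems.append(f"missing required component: {name}")
    # Anything that is neither required nor optional is probably a typo.
    for name in sorted(replicas.keys() - REQUIRED - OPTIONAL):
        problems.append(f"unknown component: {name}")
    return problems


print(check_components({"zookeeper": 3, "broker": 2, "bookkeeper": 3, "proxy": 1}))  # []
print(check_components({"zookeeper": 3, "broker": 2, "proxi": 1}))
# ['missing required component: bookkeeper', 'unknown component: proxi']
```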
Get started with the KAAP Operator and KAAP stack
Use Helm to install the KAAP Operator alone or with the extended management and monitoring capabilities of the KAAP stack: