Kubernetes Autoscaling for Apache Pulsar (KAAP)
Kubernetes Autoscaling for Apache Pulsar (KAAP) simplifies running Apache Pulsar on Kubernetes by applying the familiar Operator pattern to Pulsar’s components, and horizontally scaling resources up or down based on CPU and memory workloads.
Operating and maintaining Apache Pulsar clusters traditionally involves complex manual configurations, making it challenging for developers and operators to effectively manage the system’s lifecycle. However, with the KAAP operator, these complexities are abstracted away, enabling developers to focus on their applications rather than the underlying infrastructure.
Some of the key features and benefits of the KAAP operator include:
-
Easy Deployment: Deploying an Apache Pulsar cluster on Kubernetes is simplified through declarative configurations and automation provided by the operator.
-
Scalability: The KAAP operator enables effortless scaling of Pulsar clusters by automatically handling the creation and configuration of new Pulsar brokers and bookies as per defined rules. The broker autoscaling is integrated with the Pulsar broker load balancer to make smart resource management decisions, and bookkeepers are scaled up and down based on storage usage in a safe, controlled manner.
-
High Availability: The operator implements best practices for high availability, ensuring that Pulsar clusters are fault-tolerant and can sustain failures without service disruptions.
-
Lifecycle Management: The operator takes care of common Pulsar cluster lifecycle tasks, such as cluster creation, upgrade, configuration updates, and graceful shutdowns.
We also offer the KAAP stack if you’re looking for more Kubernetes-native tooling deployed with your Pulsar cluster. Along with the PulsarCluster CRDs, the KAAP stack also includes:
-
Pulsar Operator
-
Prometheus Stack (Grafana)
-
Pulsar Grafana dashboards
-
Cert Manager
-
Keycloak
Whether you are a developer looking to leverage the power of Apache Pulsar in your Kubernetes environment or an operator seeking to streamline the management of Pulsar clusters, the KAAP Operator provides a robust and user-friendly solution.
This guide offers a starting point for KAAP Operator. We will cover installation and deployment, configuration points, and further options for managing Pulsar components with the KAAP Operator.
If you’re upgrading from KAAP v0.1.0 to v0.2.0, you must upgrade the CRDs to v1beta1. See Upgrading to v0.2.0.
Features
Installing a CRD extends the Kubernetes API with a new custom resource type. After the resource type is added to your cluster, you can create instances of the resource based on its specification, automating away the tedious aspects of managing a Pulsar cluster.
-
Bookkeeper autoscaler - Automatically scale the number of bookies based on memory usage.
-
Broker autoscaler - Automatically scale the number of brokers based on CPU load.
-
Rack-aware bookkeeper placement - Place bookies in different racks to guarantee high availability.
-
Kafka API - Use the Starlight for Kafka API to bring your Kafka message traffic to Pulsar.
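The autoscaling features listed above can be pictured as a threshold rule: add replicas when utilization is high, remove them when it is low, and stay within fixed bounds. The following is a minimal sketch of that idea only. The `ScalingRule` class, its field names, and the threshold values are hypothetical and are not KAAP configuration keys, and the operator's real decision logic (which also involves the Pulsar broker load balancer) is not shown in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical scaling rule; names are illustrative, not KAAP settings."""
    min_replicas: int
    max_replicas: int
    lower_threshold: float  # scale down when utilization falls below this
    upper_threshold: float  # scale up when utilization rises above this
    step: int = 1           # replicas added or removed per decision


def desired_replicas(current: int, utilization: float, rule: ScalingRule) -> int:
    """Return the replica count a simple threshold-based autoscaler would request."""
    if utilization > rule.upper_threshold:
        target = current + rule.step
    elif utilization < rule.lower_threshold:
        target = current - rule.step
    else:
        target = current
    # Never leave the configured bounds, whatever the metric says.
    return max(rule.min_replicas, min(rule.max_replicas, target))


# Assumed example values for a broker set scaled on CPU load (0.0 to 1.0).
broker_rule = ScalingRule(
    min_replicas=2, max_replicas=6, lower_threshold=0.3, upper_threshold=0.8
)

if __name__ == "__main__":
    for cpu in (0.9, 0.5, 0.1):
        print(cpu, desired_replicas(3, cpu, broker_rule))
```

The same function could be applied to bookies by feeding it a different metric. Clamping to a minimum and maximum keeps a noisy metric from scaling a component to zero or without limit.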
How KAAP Operator makes Pulsar easier
Operators are a common pattern for packaging, deploying, and managing Kubernetes applications. Operators extend Kubernetes functionality to automate common tasks in stateful applications. Think of KAAP Operator as a manager for the individual components of Pulsar. By implementing the PulsarCluster Custom Resource Definition, the operator knows enough to manage the deployment, configuration, and scaling of Pulsar components with reusable and automated tasks, such as:
-
Deploying a Pulsar cluster
-
Deploying monitoring and logging components
-
Autoscaling bookies based on memory usage, or brokers based on CPU load
-
Assigning resources to specific availability zones (AZs)
KAAP Operator is configured, deployed, and packaged with Helm charts and based on the Quarkus Operator SDK.
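To illustrate why spreading components across availability zones or racks helps availability, here is a small sketch that assigns bookies to racks in round-robin order and reports the worst-case number of bookies lost if one rack fails. The function names, bookie names, and rack names are hypothetical, and this is not how KAAP or BookKeeper's rack-aware placement is implemented. It only shows the general idea.

```python
from collections import defaultdict
from itertools import cycle


def assign_racks(bookies: list[str], racks: list[str]) -> dict[str, list[str]]:
    """Spread bookies over racks in round-robin order (illustrative only)."""
    if not racks:
        raise ValueError("at least one rack is required")
    placement: dict[str, list[str]] = defaultdict(list)
    # zip stops when the bookie list is exhausted; cycle repeats the racks.
    for bookie, rack in zip(bookies, cycle(racks)):
        placement[rack].append(bookie)
    return dict(placement)


def worst_case_loss(placement: dict[str, list[str]]) -> int:
    """Number of bookies lost if the most populated rack goes down."""
    return max((len(members) for members in placement.values()), default=0)


if __name__ == "__main__":
    layout = assign_racks(
        ["bookie-0", "bookie-1", "bookie-2", "bookie-3"],
        ["az-a", "az-b", "az-c"],
    )
    print(layout)
    print("worst-case bookies lost:", worst_case_loss(layout))
```

With an even spread, losing a single zone removes only a fraction of the bookies, so the remaining ones can keep serving reads and writes.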
Pulsar component architecture
A typical Pulsar cluster requires the following components:
-
Zookeeper (https://pulsar.apache.org/docs/concepts-architecture-overview/#metadata-store) - This is Pulsar’s metadata store. It stores data about a cluster’s configuration, helps the proxy direct messages to the correct broker, and holds bookie configurations.
-
Broker (https://pulsar.apache.org/docs/concepts-architecture-overview/#brokers) - This is Pulsar’s message router.
-
Bookkeeper (bookie) (https://pulsar.apache.org/docs/concepts-architecture-overview/#apache-bookkeeper) - This is Pulsar’s data store. Bookkeeper stores message data in a low-latency, resilient way.
In addition to the required components, you might want to include some optional components:
-
Bookkeeper AutoRecovery (https://bookkeeper.apache.org/docs/admin/autorecovery) - This is a Pulsar component that recovers Bookkeeper data in the event of a bookie outage.
-
Dedicated functions worker(s) (https://pulsar.apache.org/docs/functions-worker-run-separately/) - You can optionally run dedicated function workers in a Pulsar cluster.
-
Pulsar AdminConsole - This is an optional web-based admin console for managing Pulsar clusters.
-
Pulsar Heartbeat - This is an optional component that monitors the health of the Pulsar cluster and emits metrics that are helpful for observing and debugging issues.
-
Prometheus/Grafana/Alert manager stack - This is the default observability stack for a cluster. The Luna Helm chart includes pre-made dashboards in Grafana and pre-wires all the metrics scraping.
How KAAP Operator installs Pulsar
KAAP Operator can be installed in two ways.
-
Pulsar Operator - Installs just the operator and PulsarCluster CRDs into an existing Pulsar cluster.
-
Pulsar Stack - Installs and deploys the operator, a Pulsar cluster, and a full Prometheus monitoring stack.
You can also scan an existing Pulsar cluster and generate an equivalent PulsarCluster CRD. For more, see Migrate existing cluster to KAAP operator.
To get started, see Getting Started.