Quick Start for Helm Chart installs
You have options for installing DataStax Luna Streaming:
- Via the provided DataStax Helm chart for an existing Kubernetes environment, either locally or with a cloud provider, as covered in this topic.
- Via the DataStax Luna Streaming tarball for deployment to a single server/VM or to multiple servers/VMs. See Quick Start for Server/VM installs.
- Via the DataStax Ansible scripts provided at https://github.com/datastax/pulsar-ansible.
The Helm chart and options described below configure an Apache Pulsar cluster designed for production use, but the chart can also be used in local development environments with the proper settings.
The resulting configuration includes support for:
- WebSocket Proxy
- Standalone Functions Workers
- Pulsar IO Connectors
- Tiered Storage, including Tardigrade distributed cloud storage
- Pulsar Admin Console for managing the cluster
- Pulsar Heartbeat
- Burnell for API-based token generation
- Prometheus, Grafana, and Alertmanager stack with default Grafana dashboards and Pulsar-specific alerting rules
- cert-manager with support for self-signed certificates as well as public certificates using ACME, such as Let's Encrypt
- Ingress for all HTTP ports (Pulsar Admin Console, Prometheus, Grafana, others)
Prerequisites
For an example set of production cluster values, see the DataStax production-ready Helm chart.
DataStax recommends the following resources for running Luna Streaming in a Kubernetes environment:
- Helm version 3
- A Kubernetes cluster
- Two node pools:
  - One function-worker node pool for deploying sink and source connectors, and the other node pool for everything else
- Nodes must use SSD disks
- Depending on the cloud provider, the latest storage driver should be used, along with the fastest disk type (for example, GP3 in AWS)
- 5 ZooKeeper replicas
- 3 Bookies
- 3 Brokers
- 3 Proxies
For the local machine running the Helm chart, you will need:
- Helm version 3
- A Google, AWS, or Azure account with privileges to install and configure Kubernetes resources
Interested in a production benchmark of a Pulsar cluster? Check out Improve Apache Pulsar Performance on 3rd Gen Intel Xeon Scalable Processors, where Intel and DataStax benchmark a Pulsar cluster on 3rd Gen Intel® Xeon® processors running in AWS VM instances.
Storage Class Settings
The default_storage parameter in values.yaml controls the default storage class for all persistent volumes created by the Helm chart.

default_storage:
  existingStorageClassName: default

For a component like BookKeeper, which requires stateful storage, we need to override the default_storage class when the BookKeeper Persistent Volume Claims (PVCs) are created.
There are two ways to override default_storage:
- Leave existingStorageClassName blank and specify the storage class parameters below.

  GKE:

  volumes:
    journal:
      name: journal
      size: 20Gi
      existingStorageClassName:
      storageClass:
        provisioner: kubernetes.io/gce-pd
        type: pd-ssd
        fsType: ext4
        extraParams:
          replication-type: none

  EKS:

  volumes:
    journal:
      name: journal
      size: 20Gi
      existingStorageClassName:
      storageClass:
        provisioner: kubernetes.io/aws-ebs
        type: gp2
        fsType: ext4
        extraParams:
          iopsPerGB: "10"

  AKS:

  volumes:
    journal:
      name: journal
      size: 20Gi
      existingStorageClassName:
      storageClass:
        provisioner: kubernetes.io/azure-disk
        type: pd-ssd
        fsType: ext4
        extraParams:
          replication-type: none
- Create a custom storage configuration as a yaml file (like the DataStax example) and tell the Helm chart to use that storage configuration when it creates the BookKeeper PVCs, as sketched below:

  volumes:
    journal:
      name: journal
      size: 20Gi
      existingStorageClassName: bookkeeper-storageclass.yaml
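A minimal sketch of what such a storage class file might contain (assuming GKE; the name and parameters here are illustrative, not the DataStax example itself), applied with kubectl before installing the chart:

# bookkeeper-storageclass.yaml (illustrative)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: bookkeeper-storageclass
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  fsType: ext4
  replication-type: none

kubectl apply -f bookkeeper-storageclass.yaml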
Install Luna Streaming in a cloud provider
First, create the namespace; in this example, we use pulsar.
kubectl create namespace pulsar
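If you have not already added the DataStax Helm repository on this machine, add it and refresh it (the same commands used in the local install section below):

helm repo add datastax-pulsar https://datastax.github.io/pulsar-helm-chart
helm repo update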
Then run this helm command:
helm install pulsar datastax-pulsar/pulsar --namespace pulsar --values storage_values.yaml --create-namespace
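The storage_values.yaml file referenced here is a values file you create with your storage overrides from the previous section. A minimal sketch, assuming the GKE example above, could look like this:

default_storage:
  existingStorageClassName: default
volumes:
  journal:
    name: journal
    size: 20Gi
    existingStorageClassName:
    storageClass:
      provisioner: kubernetes.io/gce-pd
      type: pd-ssd
      fsType: ext4
      extraParams:
        replication-type: none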
To avoid having to specify the pulsar namespace on each subsequent command, set the namespace context. Example:
kubectl config set-context $(kubectl config current-context) --namespace=pulsar
Once Pulsar is installed, you can now access your Luna Streaming cluster.
Access the Luna Streaming cluster
The default values will create a ClusterIP for all components. ClusterIPs are only accessible within the Kubernetes cluster. The easiest way to work with Pulsar is to log into the bastion host (assuming it is in the pulsar namespace):
kubectl exec $(kubectl get pods -l component=bastion -o jsonpath="{.items[*].metadata.name}" -n pulsar) -it -n pulsar -- /bin/bash
Once you are logged into the bastion, you can run Pulsar admin commands:
bin/pulsar-admin tenants list
For external access, you can use a load balancer. Here is an example set of values for a load balancer on the proxy:

proxy:
  service:
    type: LoadBalancer
    ports:
      - name: http
        port: 8080
        protocol: TCP
      - name: pulsar
        port: 6650
        protocol: TCP
If you are using a load balancer on the proxy, you can find the IP address using:
kubectl get service -n pulsar
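Once an external IP is assigned, clients outside the cluster can reach the proxy on those ports. For example, from a machine with the Pulsar binaries (with <EXTERNAL-IP> standing in for the address reported above):

bin/pulsar-admin --admin-url http://<EXTERNAL-IP>:8080 tenants list
bin/pulsar-client --url pulsar://<EXTERNAL-IP>:6650 produce my-topic -m "hello"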
Manage Luna Streaming with Pulsar Admin Console
The Pulsar Admin Console is installed in your cluster by enabling the console with this values setting:
component:
  pulsarAdminConsole: yes
The Pulsar Admin Console will be automatically configured to connect to the Pulsar cluster.
By default, the Pulsar Admin Console has authentication disabled. You can enable authentication with these settings:
pulsarAdminConsole:
  authMode: k8s
To learn more about using the Pulsar Admin Console, see Admin Console Tutorial.
Install Luna Streaming locally
With the prerequisites listed above met, start minikube with adequate resources. For example:
minikube start --cpus 5 --memory 16G
Next, enter these commands:
helm repo add datastax-pulsar https://datastax.github.io/pulsar-helm-chart
helm repo update
curl -LOs https://datastax.github.io/pulsar-helm-chart/examples/dev-values.yaml
The dev-values.yaml file downloaded by the curl command above can be inspected before installing.
To list the version of the chart in the local Helm repository:
helm search repo datastax-pulsar
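With the repository added and the dev-values.yaml file downloaded, install the chart. A typical invocation, assuming you want the release named pulsar in the default namespace, is:

helm install pulsar datastax-pulsar/pulsar --values dev-values.yaml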
It may take 10 or more minutes for all the pods to reach a Ready state in your Kubernetes environment.
Example of checking the pods' status:
kubectl get pods

NAME                                                  READY   STATUS      RESTARTS   AGE
prometheus-pulsar-kube-prometheus-sta-prometheus-0    2/2     Running     1          10m
pulsar-adminconsole-9669f6d98-dxjvp                   2/2     Running     3          12m
pulsar-autorecovery-7cf8d598d6-6fwpn                  1/1     Running     4          12m
pulsar-bastion-67776dddc-xc6tb                        1/1     Running     0          12m
pulsar-bookkeeper-0                                   1/1     Running     1          12m
pulsar-broker-7d9b8974dc-hd8xz                        1/1     Running     11         12m
pulsar-cert-manager-76c9d8d4d-szzh9                   1/1     Running     3          12m
pulsar-cert-manager-cainjector-dbff95bff-fbsmk        1/1     Running     5          12m
pulsar-cert-manager-webhook-8469dc9ff6-c5x29          1/1     Running     3          12m
pulsar-function-0                                     2/2     Running     0          12m
pulsar-grafana-6f7d749d86-bzgwb                       2/2     Running     0          12m
pulsar-kube-prometheus-sta-operator-c68c6bf4b-xrpdl   1/1     Running     0          12m
pulsar-kube-state-metrics-55fb767d74-ddqp4            1/1     Running     1          12m
pulsar-prometheus-node-exporter-cst5r                 1/1     Running     3          12m
pulsar-proxy-7685b58f69-jqpcl                         3/3     Running     4          12m
pulsar-pulsarheartbeat-5f897b5948-m4r7s               1/1     Running     2          12m
pulsar-zookeeper-0                                    1/1     Running     0          12m
pulsar-zookeeper-metadata-5l58k                       0/1     Completed   0          12m
Once all the pods are running, you can access the Pulsar Admin Console by forwarding to localhost:
kubectl port-forward $(kubectl get pods -l component=adminconsole -o jsonpath='{.items[0].metadata.name}') 8080:80
Now open a browser to http://localhost:8080. In the Pulsar Admin Console, you can test your Pulsar setup using the built-in clients (Test Clients in the left-hand menu).
Access the Pulsar cluster on localhost
To port forward the proxy admin and Pulsar ports to your local machine:
kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=proxy -o jsonpath='{.items[0].metadata.name}') 8080:8080
kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=proxy -o jsonpath='{.items[0].metadata.name}') 6650:6650
Or if you would rather go directly to the broker:
kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=broker -o jsonpath='{.items[0].metadata.name}') 8080:8080
kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=broker -o jsonpath='{.items[0].metadata.name}') 6650:6650
Access Admin Console on your local machine
To access Pulsar Admin Console on your local machine, forward local port 8888 to the console's port 80, then open http://localhost:8888 in a browser:
kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=adminconsole -o jsonpath='{.items[0].metadata.name}') 8888:80
While using the Admin Console and Pulsar Monitoring, if the connection to localhost:3000 is refused, set a port-forward to the Grafana pod. Example:
kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l app.kubernetes.io/name=grafana -o jsonpath='{.items[0].metadata.name}') 3000:3000 &
Example configurations
There are several example configurations in the examples directory:
- dev-values.yaml. A configuration for setting up a development environment to run in a local Kubernetes environment (for example, minikube or kind). Message/state persistence, redundancy, authentication, and TLS are disabled. With message/state persistence disabled, the cluster will not survive a restart of the ZooKeeper or BookKeeper pods.
- dev-values-persistence.yaml. Same as above, but persistence is enabled. This allows the cluster to survive pod restarts, but requires persistent volume claims (PVCs) to be supported by the Kubernetes environment.
- dev-values-auth.yaml. A development environment with authentication enabled. New keys, and tokens from those keys, are automatically generated and stored in Kubernetes secrets. You can retrieve the superuser token from the admin console (Credentials menu) or from the secret token-superuser.

  helm install pulsar -f dev-values-auth.yaml datastax-pulsar/pulsar

- dev-values-tls.yaml. A development environment with a self-signed certificate created by cert-manager. You need to install the cert-manager CRDs before installing the Helm chart; the chart will install the cert-manager application itself.

  kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager.crds.yaml
  helm install pulsar -f dev-values-tls.yaml datastax-pulsar/pulsar
Enabling the Prometheus stack
You can enable a full Prometheus stack (Prometheus, Alertmanager, Grafana) from kube-prometheus (https://github.com/prometheus-operator/kube-prometheus). This includes default Prometheus rules and Grafana dashboards for Kubernetes.
In addition, this chart can deploy Grafana dashboards for Pulsar as well as Pulsar-specific rules for Prometheus.
To enable the Prometheus stack, use the following setting in your values file:
kube-prometheus-stack:
  enabled: yes
To enable the Grafana dashboards for Pulsar, use the following setting:

grafanaDashboards:
  enabled: yes
To enable the Kubernetes default rules, use the following setting:
kube-prometheus-stack:
  defaultRules:
    create: yes
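Taken together, a values snippet enabling the stack, the Pulsar dashboards, and the default rules might look like this (a sketch combining the three settings above):

kube-prometheus-stack:
  enabled: yes
  defaultRules:
    create: yes
grafanaDashboards:
  enabled: yes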
Tiered Storage Configuration
Tiered storage (offload to blob storage) can be configured in the storageOffload section of the values.yaml file. Instructions for AWS S3 and Google Cloud Storage are provided in that file.
In addition, you can configure any S3-compatible storage. There is explicit support for Tardigrade, a provider of secure, decentralized storage. You can enable the Tardigrade S3 gateway in the extra configuration. The instructions for configuring the gateway are provided in the tardigrade section of the values.yaml file.
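As an illustrative sketch only (the key names and values here are assumptions; check them against the storageOffload instructions in your chart's values.yaml), an AWS S3 offload configuration has this general shape:

storageOffload:
  driver: aws-s3
  bucket: my-offload-bucket
  region: us-east-1
  accessKey: <AWS access key>
  accessSecret: <AWS secret key>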
Pulsar SQL Configuration
If you enable Pulsar SQL, the cluster provides Trino access to the data stored in BookKeeper (and tiered storage, if enabled). Trino is exposed on the service named <release>-sql.
The easiest way to access the Trino command line is to log into the bastion host and then connect to the Trino service port, like this:
bin/pulsar sql --server pulsar-sql:8081
The value for the server option should be the service name plus the port. Once you are connected, you can enter Trino commands. Example:
trino> SELECT * FROM system.runtime.nodes;
node_id | http_uri | node_version | coordinator | state
--------------------------------------+--------------------------+--------------+-------------+--------
64b7c5a1-9a72-4598-b494-b140169abc55 | http://10.244.5.164:8080 | 0.206 | true | active
0a92962e-8b44-4bd2-8988-81cbde6bab5b | http://10.244.5.196:8080 | 0.206 | false | active
(2 rows)
Query 20200608_155725_00000_gpdae, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0:04 [2 rows, 144B] [0 rows/s, 37B/s]
To access Pulsar SQL from outside the cluster, you can enable the ingress option, which will expose the Trino port on a hostname. We have tested with the Traefik ingress, but any Kubernetes ingress should work. You can then run SQL queries using the Trino CLI and monitor Trino using the built-in UI (point a browser to the ingress hostname). Authentication is not enabled on the UI, so you can log in with any username.
It is recommended that you match the Trino CLI version to the version running as part of Pulsar SQL.
The Trino CLI supports basic authentication, so if you enabled that on the Ingress (using annotations), you can have secure Trino access. Example:
trino --server https://trino.example.com --user admin --password
Password:
trino> show catalogs;
Catalog
---------
pulsar
system
(2 rows)
Query 20200610_131641_00027_tzc7t, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
Dependencies
The Helm chart has the following optional dependencies, both described elsewhere on this page:
- cert-manager
- kube-prometheus-stack
Authentication
The chart can enable token-based authentication for your Pulsar cluster. For information on token-based authentication in Pulsar, see Pulsar token authentication admin documentation.
For authentication to work, the token-generation keys need to be stored in Kubernetes secrets along with some default tokens (for superuser access).
The chart includes tooling to automatically create the necessary secrets or you can do this manually.
Automatic generation of secrets for token authentication
Use the following settings to enable automatic generation of the secrets and enable token-based authentication:
enableTokenAuth: yes
autoRecovery:
  enableProvisionContainer: yes
When the provision container is enabled, it will check if the required secrets exist. If they don’t exist, it will generate new token keys and use those keys to generate the default set of tokens.
The names of the key secrets are:
- token-private-key
- token-public-key
Using these keys, it will generate tokens for each role listed in superUserRoles. Based on the default settings, the following secrets will be created to store the tokens:
- token-superuser
- token-admin
- token-proxy
- token-websocket
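To confirm the generated secrets exist and inspect one of them (token values are stored base64-encoded), you can use standard kubectl commands:

kubectl get secrets -n pulsar | grep token
kubectl get secret token-superuser -n pulsar -o yaml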
Manual secret creation for token authentication
A number of values need to be stored in secrets prior to enabling token-based authentication.
- Generate a key pair for signing the tokens using the Pulsar tokens command:

  bin/pulsar tokens create-key-pair --output-private-key my-private.key --output-public-key my-public.key

  The names of the files used in this section match the default values in the Helm chart's values.yaml file. If you used different names, you will have to update the corresponding values.
- Store those keys as secrets:

  kubectl create secret generic token-private-key \
    --from-file=my-private.key \
    --namespace pulsar

  kubectl create secret generic token-public-key \
    --from-file=my-public.key \
    --namespace pulsar
- Using those keys, generate tokens for the following subjects (roles):
  - admin
  - superuser
  - proxy
  - websocket (only required if using the standalone WebSocket proxy)

  bin/pulsar tokens create --private-key file:///pulsar/token-private-key/my-private.key --subject <subject>

- Add each token as a secret:

  kubectl create secret generic token-<subject> \
    --from-file=<subject>.jwt \
    --namespace pulsar
Enable token-based authentication with this setting in
values.yaml
:enableTokenAuth: yes
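For example, to generate and store the superuser token end to end, a sketch combining the steps above (the redirect of the token output to superuser.jwt is an assumption) would be:

bin/pulsar tokens create --private-key file:///pulsar/token-private-key/my-private.key \
  --subject superuser > superuser.jwt
kubectl create secret generic token-superuser \
  --from-file=superuser.jwt \
  --namespace pulsar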
Manually configuring certificate secrets for TLS
To use TLS, you must first create a certificate and store it in the secret defined by tlsSecretName. You can create the secret like this:
kubectl create secret tls <tlsSecretName> --key <keyFile> --cert <certFile>
The resulting secret will be of type kubernetes.io/tls. The key should not be in PKCS 8 format, even though that is the format used by Pulsar; the chart will convert the key to PKCS 8 for you.
You can also specify the certificate information directly in the values:
# secrets:
#   key: |
#   certificate: |
#   caCertificate: |
This is useful if you are using a self-signed certificate.
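For example, a throwaway self-signed certificate for testing could be generated with openssl and stored as the TLS secret (the hostname and file names here are illustrative):

openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -keyout tls.key -out tls.crt -subj "/CN=pulsar.example.com"
kubectl create secret tls <tlsSecretName> --key tls.key --cert tls.crt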
For automated handling of publicly signed certificates, you can use a tool such as cert-manager. This page on GitHub describes how to set up cert-manager in AWS.
Once you have created the secrets that store the certificate info (or specified it in the values), you can enable TLS in the values:
enableTls: yes
Getting started with Kubernetes video
Follow along with this video from our Five Minutes About Pulsar series to get started with a Helm installation.
What’s next?
To learn about installing Luna Streaming for Bare Metal/VM, see Quickstart for Bare Metal/VM Installs.