Quick Start for Helm Chart installs

You have two options for installing DataStax Luna Streaming (Pulsar):

  • Via the provided Helm chart for an existing Kubernetes environment on a laptop or with a cloud provider, as covered in this topic.

  • Via replicated.io packaging for deployment to a single server/VM, or to multiple servers/VMs. See Quick Start for Server/VM installs.

The Helm chart and options described below configure an Apache Pulsar cluster. It is designed for production use, but can also be used in local development environments with the proper settings.

The resulting configuration includes support for:

  • TLS

  • Authentication

  • WebSocket Proxy

  • Standalone Functions Workers

  • Pulsar IO Connectors

  • Tiered Storage including Tardigrade distributed cloud storage

  • Pulsar SQL Workers

  • Pulsar Admin Console for managing the cluster

  • Pulsar heartbeat

  • Burnell for API-based token generation

  • Prometheus, Grafana, and Alertmanager stack with default Grafana dashboards and Pulsar-specific alerting rules

  • cert-manager with support for self-signed certificates as well as public certificates using ACME, such as Let’s Encrypt

  • Ingress for all HTTP ports (Pulsar Admin Console, Prometheus, Grafana, others)

Helm version 3 must be installed and initialized to use the chart. Helm version 2 is not supported. To get started with Helm, refer to the Helm documentation.

Prerequisites and tips

  • Helm version 3

  • A Kubernetes cluster

If you haven’t already, install Helm version 3, create a Kubernetes cluster (such as minikube), and ensure you have access to it. Example:

Download and install Homebrew:

Follow the Homebrew instructions regarding the path, such as adding the following to your ~/.profile:

# set PATH so it includes user's private bin directories
PATH="$HOME/bin:$HOME/.local/bin:$PATH"

Update your environment:

source ~/.profile

Use brew to install gcc (recommended by Homebrew) and Helm v3:

brew install gcc
brew install helm

Again, only if you don’t already have a Kubernetes cluster, create one. For details on setting up minikube, see its documentation. Here’s a quick summary of the commands, which assume you already have Homebrew and kubectl installed:

brew install minikube
which minikube
minikube start
kubectl get pods -A
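
Before moving on, it can help to confirm that the prerequisites are actually on your PATH and that the installed Helm is version 3. The following is a minimal sketch, not part of the chart: the function names missing_tools and is_helm3 are hypothetical, and it assumes helm version --short prints a string beginning with the major version, such as v3.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical helper: succeed only if the version string starts with "v3.".
is_helm3() {
  case "$1" in
    v3.*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Report what is missing first; only query Helm if both tools are present.
missing=$(missing_tools helm kubectl)
if [ -n "$missing" ]; then
  echo "Missing tools: $missing"
elif is_helm3 "$(helm version --short)"; then
  echo "Helm 3 detected"
else
  echo "Helm 3 is required"
fi
```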

Use Helm to add the repo and install Luna Streaming (Pulsar) on your laptop

With the prerequisites listed above met, enter these commands:

helm repo add datastax-pulsar https://datastax.github.io/pulsar-helm-chart
helm repo update
curl -LOs https://datastax.github.io/pulsar-helm-chart/examples/dev-values.yaml
helm install pulsar -f dev-values.yaml datastax-pulsar/pulsar

To list the version of the chart in the local Helm repository:

helm search repo datastax-pulsar

It may take 10 or more minutes for all the pods to reach a Ready state in your Kubernetes environment. Example of checking the pods' status:

kubectl get pods

NAME                                                  READY   STATUS     RESTARTS  AGE
prometheus-pulsar-kube-prometheus-sta-prometheus-0    2/2     Running    1         10m
pulsar-adminconsole-9669f6d98-dxjvp                   2/2     Running    3         12m
pulsar-autorecovery-7cf8d598d6-6fwpn                  1/1     Running    4         12m
pulsar-bastion-67776dddc-xc6tb                        1/1     Running    0         12m
pulsar-bookkeeper-0                                   1/1     Running    1         12m
pulsar-broker-7d9b8974dc-hd8xz                        1/1     Running    11        12m
pulsar-cert-manager-76c9d8d4d-szzh9                   1/1     Running    3         12m
pulsar-cert-manager-cainjector-dbff95bff-fbsmk        1/1     Running    5         12m
pulsar-cert-manager-webhook-8469dc9ff6-c5x29          1/1     Running    3         12m
pulsar-function-0                                     2/2     Running    0         12m
pulsar-grafana-6f7d749d86-bzgwb                       2/2     Running    0         12m
pulsar-kube-prometheus-sta-operator-c68c6bf4b-xrpdl   1/1     Running    0         12m
pulsar-kube-state-metrics-55fb767d74-ddqp4            1/1     Running    1         12m
pulsar-prometheus-node-exporter-cst5r                 1/1     Running    3         12m
pulsar-proxy-7685b58f69-jqpcl                         3/3     Running    4         12m
pulsar-pulsarheartbeat-5f897b5948-m4r7s               1/1     Running    2         12m
pulsar-zookeeper-0                                    1/1     Running    0         12m
pulsar-zookeeper-metadata-5l58k                       0/1     Completed  0         12m

In the example output, the pulsar-zookeeper-metadata-5l58k pod is special: it is associated with a Job. Unlike regular pods, Job pods run some code and then shut down. The Completed state means the pod started, ran what it needed to, and then shut down. This is normal. In fact, if pulsar-zookeeper-metadata-5l58k had not run successfully, many of the other pods would not have started, because most of them depend on it to set up the Pulsar cluster correctly.
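
When scanning a long pod list, a small filter can pick out the pods that still need attention while ignoring Completed Job pods. This is a rough sketch under stated assumptions: not_ready_pods is a hypothetical helper name, and it expects the single-namespace column layout shown above (NAME, READY, STATUS, ...), not the output of kubectl get pods -A.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# names of pods whose READY count is incomplete, skipping the header row and
# any pod in the Completed state.
not_ready_pods() {
  awk 'NR > 1 && NF >= 3 {
    split($2, ready, "/")
    if ($3 != "Completed" && ready[1] != ready[2]) print $1
  }'
}

# Usage against a live cluster (assumes kubectl is configured):
#   kubectl get pods | not_ready_pods
```

An empty result means every pod is either fully Ready or Completed.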

Once all the pods are running, you can access the Pulsar Admin Console by forwarding to localhost:

kubectl port-forward $(kubectl get pods -l component=adminconsole -o jsonpath='{.items[0].metadata.name}') 8080:80

Now open a browser to http://localhost:8080. In the Pulsar Admin Console, you can test your Pulsar setup using the built-in clients (Test Clients in the left-hand menu).

Installing Luna Streaming (Pulsar) in a Cloud Provider

Before you can install the chart, you need to configure the storage class settings for your cloud provider, such as AWS, GCP, or Azure. The handling of storage varies from cloud provider to cloud provider.

Create a new file called storage_values.yaml for the storage class settings. To use an existing storage class (including the default one) set this value:

default_storage:
  existingStorageClassName: default or <name of storage class>
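
To see which storage classes your cluster offers, kubectl get storageclass lists them. The sketch below then writes the values file for a chosen class. The function name write_storage_values is hypothetical, and the class name in the usage comment is only an illustration.

```shell
# Hypothetical helper: write a minimal storage values file that points the
# chart at an existing storage class. The output path defaults to
# storage_values.yaml in the current directory.
write_storage_values() {
  class_name="$1"
  out_file="${2:-storage_values.yaml}"
  cat > "$out_file" <<EOF
default_storage:
  existingStorageClassName: ${class_name}
EOF
}

# Usage (the class name "standard" is an assumed example):
#   kubectl get storageclass
#   write_storage_values standard
```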

For each volume of each component (ZooKeeper, BookKeeper), you can override the default_storage setting by specifying a different existingStorageClassName. This allows you to match the optimum storage type to the volume.

If you have a specific storage class requirement, for example fixed-IOPS disks in AWS, you can have the chart configure the storage classes for you. Here are examples from the cloud providers:

# For AWS
# default_storage:
#   provisioner: kubernetes.io/aws-ebs
#   type: gp2
#   fsType: ext4
#   extraParams:
#     iopsPerGB: "10"

# For GCP
# default_storage:
#   provisioner: kubernetes.io/gce-pd
#   type: pd-ssd
#   fsType: ext4
#   extraParams:
#     replication-type: none

# For Azure
# default_storage:
#   provisioner: kubernetes.io/azure-disk
#   fsType: ext4
#   type: managed-premium
#   extraParams:
#     storageaccounttype: Premium_LRS
#     kind: Managed
#     cachingmode: ReadOnly

See this values file for more details on the settings.

Once you have your storage settings in the values file, install the chart. First, create the namespace; in this example, we use pulsar.

kubectl create namespace pulsar

Then run this helm command:

helm install pulsar datastax-pulsar/pulsar --namespace pulsar --values storage_values.yaml --create-namespace

To avoid having to specify the pulsar namespace on each subsequent command, set the namespace context. Example:

kubectl config set-context $(kubectl config current-context) --namespace=pulsar

Installing Luna Streaming (Pulsar) for development

This chart is designed for production use, but it can be used in development environments. To use this chart in a development environment (such as minikube), you need to:

  • Disable anti-affinity rules that ensure components run on different nodes

  • Reduce resource requirements

  • Disable persistence (configuration and messages are not stored, so they are lost on restart). If you want persistence, you will have to configure storage settings that are compatible with your development environment, as described above.

For an example set of values, download this dev-values.yaml file. Use that values file or one like it to start the cluster.

Then run this command:

helm install pulsar datastax-pulsar/pulsar --namespace pulsar --values dev-values.yaml --create-namespace

Accessing the Pulsar cluster in cloud

The default values will create a ClusterIP for all components. ClusterIPs are only accessible within the Kubernetes cluster. The easiest way to work with Pulsar is to log into the bastion host (assuming it is in the pulsar namespace):

kubectl exec $(kubectl get pods -l component=bastion -o jsonpath="{.items[*].metadata.name}" -n pulsar) -it -n pulsar -- /bin/bash

Once you are logged into the bastion, you can run Pulsar admin commands:

bin/pulsar-admin tenants list

For external access, you can use a load balancer. Here is an example set of values to use for load balancer on the proxy:

proxy:
  service:
    type: LoadBalancer
    ports:
    - name: http
      port: 8080
      protocol: TCP
    - name: pulsar
      port: 6650
      protocol: TCP

If you are using a load balancer on the proxy, you can find the IP address using:

kubectl get service -n pulsar
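
The service listing covers every component, so a short filter can narrow it to the load-balanced services and their external addresses. This is a sketch that assumes the default kubectl column order (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE). The function name external_ips is hypothetical.

```shell
# Hypothetical helper: read `kubectl get service` output on stdin and print
# the name and EXTERNAL-IP column of each LoadBalancer service.
external_ips() {
  awk 'NR > 1 && $2 == "LoadBalancer" { print $1, $4 }'
}

# Usage against a live cluster (assumes kubectl is configured):
#   kubectl get service -n pulsar | external_ips
```

While the cloud provider is still provisioning the load balancer, kubectl reports the address as <pending>, and the filter prints that value unchanged.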

Accessing the Pulsar cluster on localhost

To port forward the proxy admin and Pulsar ports to your local machine:

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=proxy -o jsonpath='{.items[0].metadata.name}') 8080:8080

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=proxy -o jsonpath='{.items[0].metadata.name}') 6650:6650

Or if you would rather go directly to the broker:

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=broker -o jsonpath='{.items[0].metadata.name}') 8080:8080

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=broker -o jsonpath='{.items[0].metadata.name}') 6650:6650

Managing Pulsar using Pulsar Admin Console

You can install the Pulsar Admin Console in your cluster by enabling the console with this values setting:

component:
  pulsarAdminConsole: yes

The Pulsar Admin Console will be automatically configured to connect to the Pulsar cluster.

By default, the Pulsar Admin Console has authentication disabled. You can enable authentication with these settings:

pulsarAdminConsole:
    authMode: k8s

When k8s authentication mode is enabled, the Pulsar Admin Console gets the users from Kubernetes secrets that start with dashboard-user- in the same namespace where it is deployed. The text that follows the prefix is the username. For example, for a user admin you need to have a secret dashboard-user-admin. The secret data must have a key named password with the base-64 encoded password. The following command will create a secret for a user admin with a password of password:

kubectl create secret generic dashboard-user-admin --from-literal=password=password
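
Note that kubectl create secret performs the base-64 encoding for you when you pass --from-literal. You only encode by hand if you write the secret manifest yourself. As an illustration of what ends up stored under the password key, here is a small sketch. The function names are hypothetical.

```shell
# Hypothetical helper: encode a literal password as it is stored in the secret.
encode_password() { printf '%s' "$1" | base64; }

# Hypothetical helper: decode a value read back from the secret.
decode_password() { printf '%s' "$1" | base64 --decode; }

encode_password 'password'       # prints cGFzc3dvcmQ=
decode_password 'cGFzc3dvcmQ='   # prints password

# Reading the stored value back from a live cluster:
#   kubectl get secret dashboard-user-admin -o jsonpath='{.data.password}' | base64 --decode
```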

You can create multiple users for the Pulsar Admin Console by creating multiple secrets. To change the password for a user, delete the secret then recreate it with a new password. Example:

kubectl delete secret dashboard-user-admin
kubectl create secret generic dashboard-user-admin --from-literal=password=newpassword

For convenience, the Helm chart is able to create an initial user for the Pulsar Admin Console with the following settings:

pulsarAdminConsole:
    createUserSecret:
      enabled: yes
      user: 'admin'
      password: 'password'

Accessing Admin Console on your local machine

To access Pulsar Admin Console on your local machine, forward port 80:

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=adminconsole -o jsonpath='{.items[0].metadata.name}') 8888:80

While using the Admin Console and Pulsar Monitoring, if the connection to localhost:3000 is refused, set a port-forward to the Grafana pod. Example:

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l app.kubernetes.io/name=grafana -o jsonpath='{.items[0].metadata.name}') 3000:3000 &

Accessing Pulsar Admin Console from a cloud provider

To access Pulsar Admin Console from a cloud provider, the chart supports Kubernetes Ingress. Your Kubernetes cluster must have a running Ingress controller, such as Nginx or Traefik.

Set these values to configure the Ingress for the Pulsar Admin Console:

pulsarAdminConsole:
  ingress:
    enabled: yes
    host: pulsar-ui.example.com

Enabling the Prometheus stack

You can enable a full Prometheus stack (Prometheus, Alertmanager, Grafana) from [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus). This includes default Prometheus rules and Grafana dashboards for Kubernetes.

In addition, this chart can deploy Grafana dashboards for Pulsar as well as Pulsar-specific rules for Prometheus.

To deploy the Prometheus stack, use the following setting in your values file:

kube-prometheus-stack:
  enabled: yes

To enable the Grafana dashboards, set the following value to yes:

grafanaDashboards:
  enabled: yes

To enable the Kubernetes default rules, use the following setting:

kube-prometheus-stack:
  defaultRules:
    create: yes

Example configurations

There are several example configurations in the examples directory:

  • dev-values.yaml. A configuration for setting up a development environment in a local Kubernetes environment (for example, minikube or kind). Message/state persistence, redundancy, authentication, and TLS are disabled.

With message/state persistence disabled, the cluster will not survive a restart of ZooKeeper or BookKeeper.

  • dev-values-persistence.yaml. Same as above, but with persistence enabled. This allows the cluster to survive pod restarts, but requires the Kubernetes environment to support persistent volume claims (PVCs).

  • dev-values-auth.yaml. A development environment with authentication enabled. New keys and tokens from those keys are automatically generated and stored in Kubernetes secrets. You can retrieve the superuser token from the admin console (Credentials menu) or from the secret token-superuser.

helm install pulsar -f dev-values-auth.yaml datastax-pulsar/pulsar

  • dev-values-tls.yaml. A development environment with a self-signed certificate created by cert-manager. You need to install the cert-manager CRDs before installing the Helm chart. The chart will install the cert-manager application.

kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager.crds.yaml
helm install pulsar -f dev-values-tls.yaml datastax-pulsar/pulsar

Tiered Storage

Tiered storage (offload to blob storage) can be configured in the storageOffload section of the values.yaml file. Instructions for AWS S3 and Google Cloud Storage are provided in the file.

In addition, you can configure any S3-compatible storage. There is explicit support for Tardigrade, which is a provider of secure, decentralized storage. You can enable the Tardigrade S3 gateway in the extra configuration. The instructions for configuring the gateway are provided in the tardigrade section of the values.yaml file.

Pulsar SQL

If you enable Pulsar SQL, the cluster provides Presto access to the data stored in BookKeeper (and tiered storage, if enabled). Presto is exposed on the service named <release>-sql.

The easiest way to access the Presto command line is to log into the bastion host and then connect to the Presto service port, like this:

bin/pulsar sql --server pulsar-sql:8090

The value for the server option is the service name plus the port. Once you are connected, you can enter Presto commands. Example:

presto> SELECT * FROM system.runtime.nodes;
               node_id                |         http_uri         | node_version | coordinator | state
--------------------------------------+--------------------------+--------------+-------------+--------
 64b7c5a1-9a72-4598-b494-b140169abc55 | http://10.244.5.164:8080 | 0.206        | true        | active
 0a92962e-8b44-4bd2-8988-81cbde6bab5b | http://10.244.5.196:8080 | 0.206        | false       | active
(2 rows)

Query 20200608_155725_00000_gpdae, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0:04 [2 rows, 144B] [0 rows/s, 37B/s]

To access Pulsar SQL from outside the cluster, you can enable the ingress option, which exposes the Presto port on a hostname. We have tested with the Traefik ingress, but any Kubernetes ingress should work. You can then run SQL queries using the Presto CLI and monitor Presto using the built-in UI (point a browser to the ingress hostname). Authentication is not enabled on the UI, so you can log in with any username.

It is recommended that you match the Presto CLI version to the version running as part of Pulsar SQL.

The Presto CLI supports basic authentication, so if you enabled that on the Ingress (using annotations), you can have secure Presto access. Example:

presto --server https://presto.example.com --user admin --password
Password:
presto> show catalogs;
 Catalog
---------
 pulsar
 system
(2 rows)

Query 20200610_131641_00027_tzc7t, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

Dependencies

The Helm chart has the following optional dependencies:

Authentication

The chart can enable token-based authentication for your Pulsar cluster. For information on token-based authentication in Pulsar, see the Apache Pulsar documentation.

For authentication to work, the token-generation keys need to be stored in Kubernetes secrets along with some default tokens (for superuser access).

The chart includes tooling to automatically create the necessary secrets or you can do this manually.

Automatic generation of secrets for token authentication

Use the following settings to enable automatic generation of the secrets and enable token-based authentication:

enableTokenAuth: yes
autoRecovery:
  enableProvisionContainer: yes

When the provision container is enabled, it will check if the required secrets exist. If they don’t exist, it will generate new token keys and use those keys to generate the default set of tokens.

The names of the key secrets are:

  • token-private-key

  • token-public-key

Using these keys, it will generate tokens for each role listed in superUserRoles. Based on the default settings, the following secrets will be created to store the tokens:

  • token-superuser

  • token-admin

  • token-proxy

  • token-websocket
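
Pulsar tokens are JSON Web Tokens, so you can inspect the role a token carries by decoding its middle segment locally. The sketch below is an illustration only, not part of the chart: jwt_payload is a hypothetical helper name, it assumes a standard three-part JWT string, and it does not verify the signature.

```shell
# Hypothetical helper: print the decoded JSON payload of a JWT.
jwt_payload() {
  # Take the second dot-separated segment and map base64url to base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that JWT encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 --decode
}

# Usage, given a token string in the shell variable TOKEN:
#   jwt_payload "$TOKEN"
```

The sub field of the printed payload is the token subject, which should match one of the roles listed above.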

Manual secret creation for token authentication

A number of values need to be stored in secrets prior to enabling token-based authentication. First, you need to generate a key-pair for signing the tokens using the Pulsar tokens command:

bin/pulsar tokens create-key-pair --output-private-key my-private.key --output-public-key my-public.key

The names of the files used in this section match the default values in the chart. If you used different names, then you will have to update the corresponding values.

Then you need to store those keys as secrets.

kubectl create secret generic token-private-key \
 --from-file=my-private.key \
 --namespace pulsar
kubectl create secret generic token-public-key \
 --from-file=my-public.key \
 --namespace pulsar

Using those keys, generate tokens with subjects (roles):

bin/pulsar tokens create --private-key file:///pulsar/token-private-key/my-private.key --subject <subject>

You need to generate tokens with the following subjects:

  • admin

  • superuser

  • proxy

  • websocket (only required if using the standalone WebSocket proxy)

Once you have created those tokens, add each as a secret:

kubectl create secret generic token-<subject> \
 --from-file=<subject>.jwt \
 --namespace pulsar
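
Because the same command is repeated for every subject, a loop can generate the full set. The sketch below only prints the commands so you can review them first. The function name token_secret_commands is hypothetical, and it assumes each token was saved as <subject>.jwt in the current directory.

```shell
# Hypothetical helper: print (not run) one `kubectl create secret` command
# per token subject. The first argument is the namespace; the rest are subjects.
token_secret_commands() {
  ns="$1"
  shift
  for subject in "$@"; do
    printf 'kubectl create secret generic token-%s --from-file=%s.jwt --namespace %s\n' \
      "$subject" "$subject" "$ns"
  done
}

token_secret_commands pulsar admin superuser proxy websocket
```

To run the commands after reviewing them, pipe the output to sh.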

Once you have created the required secrets, you can enable token-based authentication with this setting in the values:

enableTokenAuth: yes

TLS

Automatically generating certificates using cert-manager

Manually configuring certificate secrets for TLS

To use TLS, you must first create a certificate and store it in the secret defined by tlsSecretName. You can create the certificate like this:

kubectl create secret tls <tlsSecretName> --key <keyFile> --cert <certFile>

The resulting secret will be of type kubernetes.io/tls. The key should not be in PKCS 8 format even though that is the format used by Pulsar. The format will be converted by the chart to PKCS 8.

You can also specify the certificate information directly in the values:

# secrets:
  # key: |
  # certificate: |
  # caCertificate: |

This is useful if you are using a self-signed certificate.

For automated handling of publicly signed certificates, you can use a tool such as cert-manager. This page on GitHub describes how to set up cert-manager in AWS.

Once you have created the secrets that store the certificate info (or specified it in the values), you can enable TLS in the values:

enableTls: yes

Next

To learn about installing Luna Streaming via the Replicated package, see Quick Start for Server/VM installs.