Quick Start for Helm Chart installs

You have options for installing DataStax Luna Streaming:

The Helm chart and options described below configure an Apache Pulsar cluster. It is designed for production use, but can also be used in local development environments with the proper settings.

The resulting configuration includes support for:

  • TLS

  • Authentication

  • WebSocket Proxy

  • Standalone Functions Workers

  • Pulsar IO Connectors

  • Tiered Storage including Tardigarde distributed cloud storage

  • Pulsar SQL Workers

  • Pulsar Admin Console for managing the cluster

  • Pulsar heartbeat

  • Burnell for API-based token generation

  • Prometheus, Grafana, and Alertmanager stack with default Grafana dashboards and Pulsar-specific alerting rules

  • cert-manager with support for self-signed certificates as well as public certificates using ACME; such as Let’s Encrypt

  • Ingress for all HTTP ports (Pulsar Admin Console, Prometheus, Grafana, others)

Prerequisites

For an example set of production cluster values, see the DataStax production-ready Helm chart.

DataStax recommends these hardware resources for running Luna Streaming in a Kubernetes environment:

  • Helm version 3

  • A Kubernetes cluster

  • Two node pools

    • One function-worker node pool for deploying sink and source connectors, and the other node pool for everything else

  • Must use SSD disks

  • Depending on the cloud provider, the latest 'Storage Driver' should be used, along with the fastest disk type (for example, GP3 in AWS)

  • 5 Zookeeper replicas

  • 3 Bookies

  • 3 Brokers

  • 3 Proxies

For the local machine running the Helm chart, you will need:

  • Helm version 3

  • A Google, AWS, or Azure account with privileges to install and configure Kubernetes resources.

Interested in a production benchmark of a Pulsar cluster?

Check out Improve Apache Pulsar Performance on 3rd Gen Intel Xeon Scalable Processors, where Intel and DataStax benchmark a Pulsar cluster on 3rd Gen Intel® Xeon® processors running in AWS VM instances.

Storage Class Settings

The default_storage parameter in values.yaml controls the default storage class for all persistent volumes created by the Helm chart.

default_storage:
  existingStorageClassName: default

For a component like BookKeeper, which requires stateful storage, we need to override the default_storage class when the BookKeeper Persistent Volume Claims (PVCs) are created.

There are two ways to override default_storage:

  • Leave existingStorageClassName blank and specify the storage class parameters below.

    • GKE

    • EKS

    • AKS

      volumes:
        journal:
          name: journal
          size: 20Gi
          existingStorageClassName:
            storageClass:
             provisioner: kubernetes.io/gce-pd
             type: pd-ssd
             fsType: ext4
             extraParams:
                replication-type: none
      volumes:
        journal:
          name: journal
          size: 20Gi
          existingStorageClassName:
            storageClass:
             provisioner: kubernetes.io/aws-ebs
             type: gp2
             fsType: ext4
             extraParams:
                iopsPerGB: "10"
      volumes:
        journal:
          name: journal
          size: 20Gi
          existingStorageClassName:
            storageClass:
             provisioner: kubernetes.io/azure-disk
             type: pd-ssd
             fsType: ext4
             extraParams:
                replication-type: none
  • Create a custom storage configuration as a yaml file (like the DataStax example) and tell the Helm chart to use that storage configuration when it creates the BookKeeper PVCs.

      volumes:
        journal:
          name: journal
          size: 20Gi
          existingStorageClassName: bookkeeper-storageclass.yaml

Install Luna Streaming in a cloud provider

First, create the namespace; in this example, we use pulsar.

kubectl create namespace pulsar

Then run this helm command:

helm install pulsar datastax-pulsar/pulsar --namespace pulsar --values storage_values.yaml --create-namespace

To avoid having to specify the pulsar namespace on each subsequent command, set the namespace context. Example:

kubectl config set-context $(kubectl config current-context) --namespace=pulsar

Once Pulsar is installed, you can now access your Luna Streaming cluster.

Access the Luna Streaming cluster

The default values will create a ClusterIP for all components. ClusterIPs are only accessible within the Kubernetes cluster. The easiest way to work with Pulsar is to log into the bastion host (assuming it is in the pulsar namespace):

kubectl exec $(kubectl get pods -l component=bastion -o jsonpath="{.items[*].metadata.name}" -n pulsar) -it -n pulsar — /bin/bash

Once you are logged into the bastion, you can run Pulsar admin commands:

bin/pulsar-admin tenants list

For external access, you can use a load balancer. Here is an example set of values to use for load balancer on the proxy:

proxy:
 service:
    type: LoadBalancer
    ports:
    - name: http
      port: 8080
      protocol: TCP
    - name: pulsar
      port: 6650
      protocol: TCP

If you are using a load balancer on the proxy, you can find the IP address using:

kubectl get service -n pulsar

Manage Luna Streaming with Pulsar Admin Console

Or if you would rather go directly to the broker:

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=broker -o jsonpath='{.items[0].metadata.name}') 8080:8080

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=broker -o jsonpath='{.items[0].metadata.name}') 6650:6650

Manage Luna Streaming with Pulsar Admin Console

The Pulsar Admin Console is installed in your cluster by enabling the console with this values setting:

component:
  pulsarAdminConsole: yes

The Pulsar Admin Console will be automatically configured to connect to the Pulsar cluster.

By default, the Pulsar Admin Console has authentication disabled. You can enable authentication with these settings:

pulsarAdminConsole:
    authMode: k8s

To learn more about using the Pulsar Admin Console, see Admin Console Tutorial.

Install Luna Streaming locally

With the prerequisites listed above met, start minikube with adequate resources. For example:

minikube start --cpus 5 --memory 16G

Next, enter these commands:

helm repo add datastax-pulsar https://datastax.github.io/pulsar-helm-chart
helm repo update
curl -LOs https://datastax.github.io/pulsar-helm-chart/examples/dev-values.yaml

The dev-values.yaml file can be viewed (here).

To list the version of the chart in the local Helm repository:

helm search repo datastax-pulsar

It may take 10 or more minutes for all the pods to reach a Ready state in your Kubernetes environment.

Example of checking the pods' status:

kubectl get pods

NAME                                                  READY   STATUS     RESTARTS  AGE
prometheus-pulsar-kube-prometheus-sta-prometheus-0    2/2     Running    1         10m
pulsar-adminconsole-9669f6d98-dxjvp                   2/2     Running    3         12m
pulsar-autorecovery-7cf8d598d6-6fwpn                  1/1     Running    4         12m
pulsar-bastion-67776dddc-xc6tb                        1/1     Running    0         12m
pulsar-bookkeeper-0                                   1/1     Running    1         12m
pulsar-broker-7d9b8974dc-hd8xz                        1/1     Running    11        12m
pulsar-cert-manager-76c9d8d4d-szzh9                   1/1     Running    3         12m
pulsar-cert-manager-cainjector-dbff95bff-fbsmk        1/1     Running    5         12m
pulsar-cert-manager-webhook-8469dc9ff6-c5x29          1/1     Running    3         12m
pulsar-function-0                                     2/2     Running    0         12m
pulsar-grafana-6f7d749d86-bzgwb                       2/2     Running    0         12m
pulsar-kube-prometheus-sta-operator-c68c6bf4b-xrpdl   1/1     Running    0         12m
pulsar-kube-state-metrics-55fb767d74-ddqp4            1/1     Running    1         12m
pulsar-prometheus-node-exporter-cst5r                 1/1     Running    3         12m
pulsar-proxy-7685b58f69-jqpcl                         3/3     Running    4         12m
pulsar-pulsarheartbeat-5f897b5948-m4r7s               1/1     Running    2         12m
pulsar-zookeeper-0                                    1/1     Running    0         12m
pulsar-zookeeper-metadata-5l58k                       0/1     Completed  0         12m

Once all the pods are running, you can access the Pulsar Admin Console by forwarding to localhost:

kubectl port-forward $(kubectl get pods -l component=adminconsole -o jsonpath='{.items[0].metadata.name}') 8080:80

Now open a browser to http://localhost:8080. In the Pulsar Admin Console, you can test your Pulsar setup using the built-in clients (Test Clients in the left-hand menu).

Access the Pulsar cluster on localhost

To port forward the proxy admin and Pulsar ports to your local machine:

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=proxy -o jsonpath='{.items[0].metadata.name}') 8080:8080

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=proxy -o jsonpath='{.items[0].metadata.name}') 6650:6650

Or if you would rather go directly to the broker:

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=broker -o jsonpath='{.items[0].metadata.name}') 8080:8080

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=broker -o jsonpath='{.items[0].metadata.name}') 6650:6650

Access Admin Console on your local machine

To access Pulsar Admin Console on your local machine, forward port 80:

kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l component=adminconsole -o jsonpath='{.items[0].metadata.name}') 8888:80
While using the Admin Console and Pulsar Monitoring, if the connection to localhost:3000 is refused, set a port-forward to the Grafana pod. Example:
kubectl port-forward -n pulsar $(kubectl get pods -n pulsar -l app.kubernetes.io/name=grafana -o jsonpath='{.items[0].metadata.name}') 3000:3000 &

Example configurations

There are several example configurations in the examples directory:

  • dev-values.yaml example file. A configuration for setting up a development environment to run in a local Kubernetes environment (for example, minikube, or kind). Message/state persistence, redundancy, authentication, and TLS are disabled.

    With message/state persistence disabled, the cluster will not survive a restart of the ZooKeeper or BookKeeper.
  • dev-values-persistence.yaml. Same as above, but persistence is enabled. This will allow for the cluster to survive the restarts of the pods, but requires persistent volume claims (PVC) to be supported by the Kubernetes environment.

  • dev-values-auth.yaml. A development environment with authentication enabled. New keys and tokens from those keys are automatically generated and stored in Kubernetes secrets. You can retrieve the superuser token from the admin console (Credentials menu) or from the secret token-superuser.

    helm install pulsar -f dev-values-auth.yaml datastax-pulsar/pulsar

  • dev-values-tls.yaml. Development environment with self-signed certificate created by cert-manager. You need to install the cert-manager CRDs before installing the Helm chart. The chart will install the cert-manager application.

    kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager.crds.yaml
    helm install pulsar -f dev-values-auth.yaml datastax-pulsar/pulsar

Enabling the Prometheus stack

You can enable a full Prometheus stack (Prometheus, Alertmanager, Grafana) from [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus). This includes default Prometheus rules and Grafana dashboards for Kubernetes.

In an addition, this chart can deploy Grafana dashboards for Pulsar as well as Pulsar-specific rules for Prometheus.

To enable the Prometheus stack, use the following setting in your values file:

kube-prometheus-stack:
  enabled: yes

To enable the Grafana dashboards, modify the following setting:

grafanaDashboards:
  enabled: no

To enable the Kubernetes default rules, use the following setting:

kube-prometheus-stack:
  defaultRules:
    create: yes

Tiered Storage Configuration

Tiered storage (offload to blob storage) can be configured in the storageOffload section of the values.yaml file. Instructions for AWS S3 and Google Cloud Storage are provided in the file.

In addition, you can configure any S3 compatible storage. There is explicit support for Tardigrade, which is a provider of secure, decentralized storage. You can enable the Tardigarde S3 gateway in the extra configuration. The instructions for configuring the gateway are provided in the tardigrade section of the values.yaml file.

Pulsar SQL Configuration

If you enable Pulsar SQL, the cluster provides Trino access to the data stored in BookKeeper (and tiered storage, if enabled). Trino is exposed on the service named <release>-sql.

The easiest way to access the Trino command line is to log into the bastion host and then connect to the Trino service port, like this:

bin/pulsar sql --server pulsar-sql:8081

Where the value for the server option should be the service name plus port. Once you are connected, you can enter Trino commands. Example:

trino> SELECT * FROM system.runtime.nodes;
               node_id                |         http_uri         | node_version | coordinator | state
--------------------------------------+--------------------------+--------------+-------------+--------
 64b7c5a1-9a72-4598-b494-b140169abc55 | http://10.244.5.164:8080 | 0.206        | true        | active
 0a92962e-8b44-4bd2-8988-81cbde6bab5b | http://10.244.5.196:8080 | 0.206        | false       | active
(2 rows)

Query 20200608_155725_00000_gpdae, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0:04 [2 rows, 144B] [0 rows/s, 37B/s]

To access Pulsar SQL from outside the cluster, you can enable the ingress option which will expose the Trino port on hostname. We have tested with the Traefik ingress, but any Kubernetes ingress should work. You can then run SQL queries using the Trino CLI and monitoring Trino using the built-in UI (point browser to the ingress hostname). Authentication is not enabled on the UI, so you can log in with any username.

It is recommended that you match the Trino CLI version to the version running as part of Pulsar SQL.

The Trino CLI supports basic authentication, so if you enabled that on the Ingress (using annotations), you can have secure Trino access. Example:

trino --server https://trino.example.com --user admin --password
Password:
trino> show catalogs;
 Catalog
---------
 pulsar
 system
(2 rows)

Query 20200610_131641_00027_tzc7t, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

Dependencies

The Helm chart has the following optional dependencies:

Authentication

The chart can enable token-based authentication for your Pulsar cluster. For information on token-based authentication in Pulsar, see Pulsar token authentication admin documentation.

For authentication to work, the token-generation keys need to be stored in Kubernetes secrets along with some default tokens (for superuser access).

The chart includes tooling to automatically create the necessary secrets or you can do this manually.

Automatic generation of secrets for token authentication

Use the following settings to enable automatic generation of the secrets and enable token-based authentication:

enableTokenAuth: yes
autoRecovery:
  enableProvisionContainer: yes

When the provision container is enabled, it will check if the required secrets exist. If they don’t exist, it will generate new token keys and use those keys to generate the default set of tokens.

The name of the key secrets are:

  • token-private-key

  • token-public-key

Using these keys, it will generate tokens for each role listed in superUserRoles. Based on the default settings, the following secrets will be created to store the tokens:

  • token-superuser

  • token-admin

  • token-proxy

  • token-websocket

Manual secret creation for token authentication

A number of values need to be stored in secrets prior to enabling token-based authentication.

  1. Generate a key-pair for signing the tokens using the Pulsar tokens command:

    bin/pulsar tokens create-key-pair --output-private-key my-private.key --output-public-key my-public.key

    The names of the files used in this section match the default values in the Helm chart’s values.yaml file. If you used different names, then you will have to update the corresponding values.

  2. Store those keys as secrets.

    kubectl create secret generic token-private-key \
     --from-file=my-private.key \
     --namespace pulsar
    kubectl create secret generic token-public-key \
     --from-file=my-public.key \
     --namespace pulsar
  3. Using those keys, generate tokens for the following subjects (roles):

    • admin

    • superuser

    • proxy

    • websocket (only required if using the standalone WebSocket proxy)

      bin/pulsar tokens create --private-key file:///pulsar/token-private-key/my-private.key --subject <subject>
  4. Add each token as a secret:

    kubectl create secret generic token-<subject> \
     --from-file=<subject>.jwt \
     --namespace pulsar
  5. Enable token-based authentication with this setting in values.yaml:

    enableTokenAuth: yes

Manually configuring certificate secrets for TLS

To use TLS, you must first create a certificate and store it in the secret defined by tlsSecretName. You can create the certificate like this:

kubectl create secret tls <tlsSecretName> --key <keyFile> --cert <certFile>

The resulting secret will be of type kubernetes.io/tls. The key should not be in PKCS 8 format even though that is the format used by Pulsar. The format will be converted by the chart to PKCS 8.

You can also specify the certificate information directly in the values:

# secrets:
  # key: |
  # certificate: |
  # caCertificate: |

This is useful if you are using a self-signed certificate.

For automated handling of publicly signed certificates, you can use a tool such as cert-manager. This page on GitHub describes how to set up cert-manager in AWS.

Once you have created the secrets that store the certificate info (or specified it in the values), you can enable TLS in the values:

enableTls: yes

Getting started with Kubernetes video

Follow along with this video from our Five Minutes About Pulsar series to get started with a Helm installation.

What’s next?

To learn about installing Luna Streaming for Bare Metal/VM, see Quickstart for Bare Metal/VM Installs.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com