Provision Cassandra or DSE in Kubernetes with Kubernetes Operator for Apache Cassandra

Complete the following procedure to provision Apache Cassandra® or DataStax Enterprise (DSE) in a Kubernetes cluster.

Prerequisites

If you have not already done so, create a Kubernetes cluster.

Apply the example operator configuration by following the steps in Configure Cassandra or DSE in Kubernetes with Kubernetes Operator for Apache Cassandra. These steps create a new resource type in your Kubernetes cluster named CassandraDatacenter.

Procedure

With the Kubernetes Operator for Apache Cassandra configuration applied, you can define a cluster topology for the operator to create and monitor. This topic provisions a three-node cluster with one datacenter made up of three racks, with one node per rack. The steps show how to set serverType to cassandra or dse for your Kubernetes environment, plus related values.

For all the parameters presented in the steps below, DataStax provides example YAML files that you can download and customize for your purposes.

Define the cluster and datacenter parameters

A logical datacenter is the primary resource managed by the Kubernetes Operator for Apache Cassandra. Within a single Kubernetes namespace:

  • A single CassandraDatacenter resource defines a single-datacenter cluster.

  • Two or more CassandraDatacenter resources with different clusterName values define separate and unrelated single-datacenter clusters. Kubernetes Operator for Apache Cassandra manages all of these clusters because they reside within the same Kubernetes namespace.

  • Two or more CassandraDatacenter resources that share the same clusterName define a multi-datacenter cluster. Kubernetes Operator for Apache Cassandra joins the instances in each datacenter into a logical topology that acts as a single cluster.

The provisioning example in this topic defines a single-datacenter cluster. The cluster name is cluster1 and the datacenter name is dc1.
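
As a minimal sketch, the two names used in this topic map onto a CassandraDatacenter resource as shown here. The fragment is incomplete on its own; the remaining fields are covered in the steps that follow.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  # Datacenter name used throughout this topic.
  name: dc1
spec:
  # Datacenters that share this value form one multi-datacenter cluster.
  clusterName: cluster1
```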

Define the rack parameters

Cassandra and DSE are rack aware, and the racks parameter configures the Kubernetes Operator for Apache Cassandra to set up pods in a rack-aware way.

To specify the availability zone for a given rack, the Kubernetes worker nodes must have labels matching failure-domain.beta.kubernetes.io/zone. For more information, see the Kubernetes documentation.

Racks must have identifiers. In this topic, the example configuration identifies the racks as r1, r2, and r3. Ensure that the number of racks matches the replication factor in the keyspaces you plan to create.

The number of racks cannot easily be changed after the CassandraDatacenter resource is deployed.
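
A sketch of the racks section for the three racks named in this topic. The zone field and the zone names shown are assumptions for illustration; substitute the zone label values actually present on your worker nodes, or omit zone if you do not need to pin racks to availability zones.

```yaml
spec:
  racks:
    - name: r1
      zone: us-central1-a   # hypothetical zone name
    - name: r2
      zone: us-central1-b   # hypothetical zone name
    - name: r3
      zone: us-central1-c   # hypothetical zone name
```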

Define the node count parameters

The size parameter is the number of Cassandra or DSE nodes (server instances) to run in the datacenter. This is distinct from a Kubernetes node, which is a worker machine and may be a virtual machine (VM) or a physical machine, depending on the cluster. Multiple pods can run on one Kubernetes node.

For optimal performance, DataStax recommends running only one Cassandra or DSE server instance per Kubernetes worker node. The Kubernetes Operator for Apache Cassandra enforces that limit by default, and pods may get stuck in the Pending status if there are insufficient Kubernetes workers available.

Use a kubectl get node command to list the worker nodes in the Kubernetes cluster.

For example:

kubectl get node --selector='!node-role.kubernetes.io/master'

This topic and its examples assume you have at least three worker nodes available. If you are working with minikube or another setup that has a single Kubernetes worker node, you must reduce the size value accordingly or set the allowMultipleNodesPerWorker parameter to true.
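
The second alternative for a single-worker environment can be sketched as follows. The resource figures are illustrative assumptions, not recommendations. The sketch also assumes that the operator expects explicit resource requests and limits when allowMultipleNodesPerWorker is enabled, so verify that against your operator version.

```yaml
spec:
  # Keep three server instances but let them share one worker.
  size: 3
  allowMultipleNodesPerWorker: true
  # Per-pod resources; the values are placeholders for a small test machine.
  resources:
    requests:
      cpu: 1000m
      memory: 1Gi
    limits:
      cpu: 1000m
      memory: 1Gi
```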

Define the storage parameters

Define the storage with a combination of the previously provisioned storage class and size parameters. These values tell the storage provisioner how much space to request from the backend.
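
A sketch of the storage section, reusing the values from the NodePort example in this topic. The storage class name server-storage is assumed to exist already in your cluster, and 1Gi is a demo-sized request.

```yaml
spec:
  storageConfig:
    cassandraDataVolumeClaimSpec:
      # Must match a storage class that is already provisioned.
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          # Capacity requested for each server pod's data volume.
          storage: 1Gi
```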

Configure the database

The config key in the CassandraDatacenter resource contains the parameters used to configure the server process running in each pod. In general, it is not necessary to specify any parameters here. Settings omitted from the config key receive reasonable default values, and it is common to run demo clusters with no custom configuration.

If you are familiar with configuring Cassandra or DSE outside of containers on traditional operating systems, you will recognize that some familiar configuration parameters are specified elsewhere in the CassandraDatacenter resource, outside of the config section. Do not repeat these parameters inside the config section of the provisioning YAML file; Kubernetes Operator for Apache Cassandra populates them automatically from the CassandraDatacenter resource. For example:

  • cluster_name parameter, which is normally specified in the cassandra.yaml file.

    The location of this file depends on the type of installation:

    • Package installations: /etc/dse/cassandra/cassandra.yaml

    • Tarball installations: <install_location>/resources/cassandra/conf/cassandra.yaml

  • rack and datacenter parameters.

Similarly, Kubernetes Operator for Apache Cassandra automatically populates any values that are normally customized on a per-node basis. Do not specify these in the CassandraDatacenter resource. For example, do not specify in the config key:

  • initial_token

  • listen_address and other IP addresses.

A large number of keys and values can be specified in the config section. The config key data structure resembles the DataStax OpsCenter Lifecycle Manager (LCM) configuration profiles. If needed, extrapolate the parameters to use from the LCM config profile options.
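
A sketch of a small config section, using the same keys as the DSE NodePort example in this topic. The values are sized for a demo cluster and are not tuning advice. Note that none of the operator-managed parameters listed above appear in it.

```yaml
spec:
  config:
    jvm-server-options:
      initial_heap_size: "800m"
      max_heap_size: "800m"
    cassandra-yaml:
      file_cache_size_in_mb: 100
      memtable_space_in_mb: 100
      # Intentionally absent: cluster_name, rack, datacenter,
      # initial_token, and listen_address. The operator sets these.
```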

View information about the superuser credentials

By default, Kubernetes Operator for Apache Cassandra creates a Cassandra superuser. A Kubernetes secret is created, named <clusterName>-superuser, which contains username and password keys.

To define the superuser with custom credentials, skip to Define superuser with custom credentials.

The following example assumes that you already created your environment’s CassandraDatacenter resource for the Kubernetes Operator for Apache Cassandra in the previous configuration topic, and uses the following kubectl commands to return information about the superuser:

kubectl -n my-db-ns get secret cluster1-superuser

Sample output:

NAME                       TYPE                                  DATA   AGE
cluster1-superuser         Opaque                                2      13m

kubectl -n my-db-ns get secret cluster1-superuser -o yaml

Sample output:

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: cluster1-superuser
data:
  password: d0g0UXRaTTg0VzVXbENCZVo4WmNqRWVFMGx0SXVvWnhMU0k5allsampBYnVLWU9WRTU2NENSWEpwY2twYjArSDlmSnZOcHdrSExZVU8rTk11N1BJRWhhZkpXM1U0WitsdlI1U3owcUhzWmNjRHQ0enhTSFpzeHRNcEFiMzNXVWQ3R25IdUE=
  username: Y2x1c3RlcjEtc3VwZXJ1c2Vy

echo Y2x1c3RlcjEtc3VwZXJ1c2Vy | base64 -d

Sample output:

cluster1-superuser

echo 'd0g0UXRaTTg0VzVXbENCZVo4WmNqRWVFMGx0SXVvWnhMU0k5allsampBYnVLWU9WRTU2NENSWEpwY2twYjArSDlmSnZOcHdrSExZVU8rTk11N1BJRWhhZkpXM1U0WitsdlI1U3owcUhzWmNjRHQ0enhTSFpzeHRNcEFiMzNXVWQ3R25IdUE=' | base64 -d

Sample output:

wH4QtZM84W5WlCBeZ8ZcjEeE0ltIuoZxLSI9jYljjAbuKYOVE564CRXJpckpb0+H9fJvNpwkHLYUO+NMu7PIEhafJW3U4Z+lvR5Sz0qHsZccDt4zxSHZsxtMpAb33WUd7GnHuA

Define superuser with custom credentials

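To define the superuser with your own custom credentials rather than take the default values, create a secret with kubectl. The secret must contain username and password keys.

For example, if my-secret.yaml is a Secret manifest whose name is superuser-secret:

kubectl -n my-db-ns create -f my-secret.yaml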
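
A hypothetical my-secret.yaml is sketched below. The username and password values are placeholders, and the namespace my-db-ns is taken from the other examples in this topic. The stringData field is a standard Kubernetes convenience that accepts plain text and stores it base64-encoded under data.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: superuser-secret
  namespace: my-db-ns
type: Opaque
stringData:
  # Placeholder credentials; replace both values before applying.
  username: my-superuser
  password: replace-with-a-strong-password
```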

To use the new superuser secret, specify the name of the secret in the CassandraDatacenter configuration YAML file that you load into the cluster.

For example:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  superuserSecretName: superuser-secret

Specify the server type and version

In the spec section of your YAML configuration file for the CassandraDatacenter resource, specify:

  • (Required) serverType - the value must be cassandra or dse.

  • (Required) serverVersion. For:

    • serverType: dse, the version value must be 6.8.0 or later.

    • serverType: cassandra, the version value must be 3.11.7 or later.

  • (Optional) serverImage - only meant as an override.

DataStax recommends that you do not specify serverImage in the provisioning example file. When unspecified, Kubernetes Operator for Apache Cassandra provides the appropriate image path, filename, and supported version.

Using a default image; intentionally not specifying the serverImage:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: dse
  serverVersion: 6.8.4

When you are provisioning Apache Cassandra in the Kubernetes cluster and want to use the default image, change serverType: dse from the previous example to serverType: cassandra, and specify serverVersion: 3.11.7.

Using a specific Cassandra image:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: cassandra
  serverVersion: 3.11.7
  serverImage: my-private-docker-registry.example.com/datastax/cassandra-mgmtapi-3_11_7

Using a specific DSE image:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: dse
  serverVersion: 6.8.4
  serverImage: my-private-docker-registry.example.com/datastax/dse-server:6.8.4-20200731

Configure a NodePort service

Request a NodePort service in a CassandraDatacenter configuration YAML by setting the following fields:

spec:
  networking:
    nodePort:
      cql: 30001
      broadcast: 30002

To request the SSL versions of the ports:

spec:
  networking:
    nodePort:
      cqlSSL: 30010
      broadcastSSL: 30020

If any of the nodePort fields are configured, a NodePort service is created that routes from the specified external port to the identically numbered internal port. Cassandra is configured to listen on the specified ports.

Full example of a nodeport-service-dc.yaml file:

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: dse
  serverVersion: "6.8.4"
  managementApiAuth:
    insecure: {}
  networking:
    nodePort:
      cql: 30001
      broadcast: 30002
  size: 2
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  racks:
    - name: r1
    - name: r2
  config:
    jvm-server-options:
      initial_heap_size: "800m"
      max_heap_size: "800m"
    cassandra-yaml:
      file_cache_size_in_mb: 100
      memtable_space_in_mb: 100

Configure a nodeSelector

To pin pods to labeled Kubernetes worker nodes, use node selectors. Define node selectors in a Pod spec similar to the following example:

apiVersion: v1
kind: Pod
metadata:
  name: my-db-pod
  labels:
    env: mytest
spec:
  containers:
  - name: my-db-pod
    image: my-db-pod
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd

If this local configuration file is named my-db-pod-fast-storage.yaml, and the namespace is cass-operator, then apply the YAML file by running:

kubectl apply -n cass-operator -f my-db-pod-fast-storage.yaml

For more information about nodeSelector options, see the Kubernetes documentation.
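
For database pods managed by the operator, the pod spec is generated from the CassandraDatacenter resource rather than written by hand. The sketch below assumes that your operator version exposes a nodeSelector field on the CassandraDatacenter spec and that worker nodes carry the disktype: ssd label from the example above. Check the resource definition installed in your cluster before relying on it.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  # Assumed field: restricts scheduling of the database pods
  # to workers whose labels match every entry below.
  nodeSelector:
    disktype: ssd
```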

Encryption

Kubernetes Operator for Apache Cassandra automates the creation of key stores and trust stores for client-to-node and internode encryption. For each datacenter created with the operator, credentials are injected into the stateful set using secrets with the name <datacenter-name>-keystore. To use client-to-node or internode encryption, reference only the injected keystores from the Cassandra parameters provided in the datacenter configuration. For an example, see the datacenter encryption test YAML files.

Due to limitations of Kubernetes stateful sets, the current strategy primarily focuses on internode encryption with CA-only verification (peer name verification is not currently available). Peer verification can be achieved with init containers capable of leveraging external certificate issuance architecture to enable per-node and per-client peer name verification.

By storing the certificate authority (CA) in Kubernetes secrets, you can create secrets ahead of time from user-provided or organizational certificate authorities. Leverage a single CA across multiple datacenters by copying the secrets generated for one datacenter to the secondary datacenter prior to launching the secondary datacenter.

While you could change from encrypted internode communications to unencrypted internode communications and the reverse, this change as a rolling configuration is not currently supported. The entire cluster must be stopped and started to update these features.
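
As a sketch of what referencing the injected keystores can look like, the fragment below enables internode encryption through the standard cassandra.yaml server_encryption_options keys. The keystore path and the passwords are assumptions for illustration only; take the actual mount path and credentials from the encryption test YAML files for your operator version.

```yaml
spec:
  config:
    cassandra-yaml:
      server_encryption_options:
        internode_encryption: all
        # Assumed mount path of the operator-injected keystore.
        keystore: /etc/encryption/node-keystore.jks
        keystore_password: placeholder-password
        # Assumed: the same injected store also serves as the trust store.
        truststore: /etc/encryption/node-keystore.jks
        truststore_password: placeholder-password
```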

Download, optionally customize, and apply the provisioning YAML

If you have not already done so, download the example provisioning YAML file from the DataStax public GitHub repo.

Customize the example YAML to suit your requirements and save the file, for example, as cluster1-dc1.yaml. Then use the kubectl command to apply the YAML file.

For example:

kubectl -n my-db-ns apply -f ./cluster1-dc1.yaml
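
For reference, a cluster1-dc1.yaml consistent with the values used in this topic could look like the following sketch. It is assembled from the fragments shown in this topic rather than copied from the DataStax example file, and the storage class name server-storage is assumed to exist in your cluster.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: dse
  serverVersion: "6.8.4"
  managementApiAuth:
    insecure: {}
  # Three server instances, one per rack.
  size: 3
  racks:
    - name: r1
    - name: r2
    - name: r3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
```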

As Kubernetes Operator for Apache Cassandra proceeds with the specified deployment in your Kubernetes cluster, watch the list of pods. Completing a deployment takes several minutes per node. The best way to track the operator’s progress is to use the following commands and check the Status and Events values returned.

For example:

kubectl -n my-db-ns get pods

Sample output:

NAME                            READY   STATUS    RESTARTS   AGE
cass-operator-f74447c57-kdf2p   1/1     Running   0          13m
gke-cluster1-dc1-r1-sts-0       1/1     Running   0          5m38s
gke-cluster1-dc1-r2-sts-0       1/1     Running   0          42s
gke-cluster1-dc1-r3-sts-0       1/1     Running   0          6m7s

kubectl -n my-db-ns describe cassdc dc1

Sample output:

...
Status:
  Cassandra Operator Progress:  Updating
  Last Server Node Started:     2021-01-30T11:37:28Z
  Super User Upserted:          2021-01-30T11:38:37Z
Events:
  Type     Reason           Age                  From                Message
  ----     ------           ----                 ----                -------
  Normal   CreatedResource  9m49s                cassandra-operator  Created service cluster1-dc1-service
  Normal   CreatedResource  9m49s                cassandra-operator  Created service cluster1-seed-service
  Normal   CreatedResource  9m49s                cassandra-operator  Created service cluster1-dc1-all-pods-service
  Normal   CreatedResource  9m49s                cassandra-operator  Created statefulset cluster1-dc1-r1-sts
  Normal   CreatedResource  9m49s                cassandra-operator  Created statefulset cluster1-dc1-r2-sts
  Normal   CreatedResource  9m49s                cassandra-operator  Created statefulset cluster1-dc1-r3-sts

What’s next?

Learn how to use the provisioned cluster in the Kubernetes environment.
