Provision Cassandra or DSE in Kubernetes with Cass Operator


cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:
  • Package installations: /etc/dse/cassandra/cassandra.yaml
  • Tarball installations: installation_location/resources/cassandra/conf/cassandra.yaml

Complete the following procedure to provision Apache Cassandra® or DataStax Enterprise (DSE) in a Kubernetes cluster.

Tip: If you haven't already, create a Kubernetes cluster. For a walkthrough of the steps, especially if you're new to Kubernetes, see the Google Kubernetes Engine (GKE) cloud example in this guide's topic, Create a Kubernetes cluster.

This provisioning topic assumes you've applied the example operator configuration, as shown in the previous topic. Those steps created a new resource type in your Kubernetes cluster: CassandraDatacenter.

You can now define a cluster topology for Cass Operator to create and monitor. In this topic, a three-node cluster is provisioned with one datacenter made up of three racks, one node per rack. The examples also show how to set serverType to cassandra or dse for your Kubernetes environment, plus related values.
Tip: For the parameters presented in each step below, DataStax provides example YAML files in its public GitHub repo that you can download and customize for your purposes.

Define the cluster and datacenter parameters

A logical datacenter is the primary resource managed by the Cass Operator. Within a single Kubernetes namespace:
  • A single CassandraDatacenter resource defines a single-datacenter cluster.
  • Two or more CassandraDatacenter resources with different clusterName values define separate and unrelated single-datacenter clusters. The operator manages both clusters because they reside in the same Kubernetes namespace.
  • Two or more CassandraDatacenter resources that share the same clusterName define a multi-datacenter cluster. Cass Operator joins the instances in each datacenter into a logical topology that acts as a single cluster.
The provisioning example in this topic defines a single-datacenter cluster. The cluster is named cluster1 and the datacenter is named dc1.
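The relationship between clusterName and datacenters can be sketched as follows. This is a minimal, partial example: the first resource is the dc1 datacenter used in this topic, and dc2 is a hypothetical second datacenter included only to illustrate the multi-datacenter case from the list above. Required fields such as serverType, size, and storageConfig are omitted here and are covered in later steps.

```yaml
# dc1: the datacenter provisioned in this topic.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  # serverType, serverVersion, size, storageConfig, racks: see later steps.
---
# dc2 (hypothetical): it shares clusterName cluster1, so Cass Operator
# joins it to cluster1 as a second datacenter. A different clusterName
# would instead define a separate, unrelated cluster.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc2
spec:
  clusterName: cluster1
  # Remaining fields as for dc1.
```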

Define the Rack parameters

Cassandra and DSE are rack-aware, and the racks parameter configures Cass Operator to set up pods in a rack-aware way.

If you want to specify which availability zone a given rack belongs to, the Kubernetes worker nodes must have labels matching failure-domain.beta.kubernetes.io/zone. For more, see the Kubernetes documentation.

Racks must have identifiers. In this topic, the example configuration identifies the racks as r1, r2, and r3. The number of racks should match the replication factor in the keyspaces you plan to create.
CAUTION: The number of racks cannot easily be changed once the CassandraDatacenter is deployed.
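A sketch of the racks section for the three-rack topology in this topic follows. The zone field and the zone names are assumptions: zone is an optional per-rack setting in some Cass Operator versions, and its values must match the failure-domain.beta.kubernetes.io/zone labels on your own worker nodes. Omit zone if you don't need to pin racks to availability zones.

```yaml
spec:
  racks:
    # Each rack needs an identifier; three racks to match a replication
    # factor of 3 in the keyspaces you plan to create.
    - name: r1
      # Assumption: optional zone pinning; zone names are hypothetical.
      zone: us-central1-a
    - name: r2
      zone: us-central1-b
    - name: r3
      zone: us-central1-c
```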

Define the node count parameters

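The size parameter is the number of Cassandra or DSE nodes to run in the datacenter; each database node runs in its own pod. Don't confuse these with Kubernetes nodes: a Kubernetes node is a worker machine, which may be a VM or a physical machine depending on the cluster, and multiple pods can run on one Kubernetes node.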

For optimal performance, DataStax recommends running only one Cassandra or DSE server instance per Kubernetes worker node. The Cass Operator enforces that limit, and pods may get stuck in the Pending status if there are insufficient Kubernetes workers available.
Tip: Use the kubectl get nodes command to list the nodes in the Kubernetes cluster, and kubectl describe node <node-name> to return details about a specific node. Example that lists only the worker nodes:
kubectl get nodes --selector='!node-role.kubernetes.io/master'
This topic and its examples assume you have at least three worker nodes available. If you're working with minikube or another setup with a single Kubernetes worker node, you must reduce the size value accordingly, or set the allowMultipleNodesPerWorker parameter to true.
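The two options can be sketched as a spec fragment. The size of 3 matches this topic's topology. The resources block and its CPU and memory values are assumptions: some Cass Operator versions reject allowMultipleNodesPerWorker unless CPU and memory requests and limits are set, so check the behavior of the version you run and size the values for your own workers.

```yaml
spec:
  # Three Cassandra nodes: one per rack, one per Kubernetes worker by default.
  size: 3
  # Single-worker setups only (for example, minikube): let several
  # Cassandra pods share one Kubernetes worker node.
  allowMultipleNodesPerWorker: true
  # Assumption: explicit requests and limits; the values are illustrative.
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "1"
      memory: 2Gi
```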
Tip: For related information, see Resizing a cluster in the Kubernetes documentation.

Define the storage parameters

Define storage by combining the previously provisioned storage class with a size parameter. These values tell the storage provisioner how much capacity to request from the backend.
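A sketch of the storage settings, using the same field layout as the NodePort example later in this topic. The storage class name server-storage and the 1Gi request come from that example; substitute the storage class you provisioned and a capacity suited to your data.

```yaml
spec:
  storageConfig:
    # Volume claim template used for each Cassandra pod's data volume.
    cassandraDataVolumeClaimSpec:
      # Name of the previously provisioned StorageClass.
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          # Capacity the provisioner requests from the backend per node.
          storage: 1Gi
```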

Configure the database

The config key in the CassandraDatacenter resource contains the parameters used to configure the server process running in each pod. In general, you don't need to specify any parameters here. Settings omitted from the config key receive reasonable default values, and it's common to run demo clusters with no custom configuration.

If you're familiar with configuring Cassandra or DSE outside of containers on traditional operating systems, you may recognize that some familiar configuration parameters are specified elsewhere in the CassandraDatacenter resource, outside of the config section. Do not repeat these parameters inside the config section of the provisioning YAML; Cass Operator populates them automatically from the CassandraDatacenter resource. For example, you would not need to repeat:
  • cluster_name, which is normally specified in cassandra.yaml
  • The rack and datacenter parameters
Similarly, Cass Operator automatically populates any values that are normally customized on a per-node basis. Do not specify these in the CassandraDatacenter resource. For example:
  • initial_token
  • listen_address and other IP addresses

A large number of keys and values can be specified in the config section. The config key data structure resembles the DataStax OpsCenter Lifecycle Manager (LCM) configuration profiles. If needed, use the LCM config profile options as a guide to the available parameters.
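A minimal sketch of a config section for a DSE 6.8 datacenter, mirroring the keys used in the NodePort example later in this topic. The heap and cache values are illustrative only. Note what is absent: cluster_name, initial_token, and listen_address are left out because Cass Operator fills them in.

```yaml
spec:
  config:
    # JVM options; this example (like the NodePort example) uses the
    # jvm-server-options key for DSE 6.8.
    jvm-server-options:
      initial_heap_size: "800m"
      max_heap_size: "800m"
    # Keys under cassandra-yaml correspond to cassandra.yaml settings.
    cassandra-yaml:
      file_cache_size_in_mb: 100
      # Do not set cluster_name, initial_token, or listen_address here;
      # Cass Operator populates them automatically.
```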

View information about the superuser credentials

By default, Cass Operator creates a Cassandra superuser. A Kubernetes secret is created, named <clusterName>-superuser, which contains username and password keys.
Note: If you want to define the superuser with custom credentials, skip to the next step.
For example, once the CassandraDatacenter for your environment has been created, you can use the following commands to return information about the superuser:
kubectl -n my-db-ns get secret cluster1-superuser
NAME                       TYPE                                  DATA   AGE
cluster1-superuser         Opaque                                2      13m
kubectl -n my-db-ns get secret cluster1-superuser -o yaml
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: cluster1-superuser
data:
  password: d0g0UXRaTTg0VzVXbENCZVo4WmNqRWVFMGx0SXVvWnhMU0k5allsampBYnVLWU9WRTU2NENSWEpwY2twYjArSDlmSnZOcHdrSExZVU8rTk11N1BJRWhhZkpXM1U0WitsdlI1U3owcUhzWmNjRHQ0enhTSFpzeHRNcEFiMzNXVWQ3R25IdUE=
  username: Y2x1c3RlcjEtc3VwZXJ1c2Vy
echo Y2x1c3RlcjEtc3VwZXJ1c2Vy | base64 -d
cluster1-superuser
echo 'd0g0UXRaTTg0VzVXbENCZVo4WmNqRWVFMGx0SXVvWnhMU0k5allsampBYnVLWU9WRTU2NENSWEpwY2twYjArSDlmSnZOcHdrSExZVU8rTk11N1BJRWhhZkpXM1U0WitsdlI1U3owcUhzWmNjRHQ0enhTSFpzeHRNcEFiMzNXVWQ3R25IdUE=' | base64 -d
wH4QtZM84W5WlCBeZ8ZcjEeE0ltIuoZxLSI9jYljjAbuKYOVE564CRXJpckpb0+H9fJvNpwkHLYUO+NMu7PIEhafJW3U4Z+lvR5Sz0qHsZccDt4zxSHZsxtMpAb33WUd7GnHuA

Define superuser with custom credentials

To instead define the superuser with your own credentials, create a secret that contains username and password keys, in the same namespace as the CassandraDatacenter. For example:
kubectl -n my-db-ns create secret generic superuser-secret --from-literal=username=<username> --from-literal=password=<password>
To use the new superuser secret, specify the name of the secret in the CassandraDatacenter configuration YAML that you load into the cluster. Example:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  superuserSecretName: superuser-secret
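If you prefer to keep the secret in a file such as my-secret.yaml, a declarative sketch follows. The username and password values are hypothetical placeholders, and the namespace my-db-ns is assumed to be the one that holds your CassandraDatacenter. stringData lets you write plain values; Kubernetes stores them base64-encoded under data, as in the default superuser secret shown earlier.

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Must match spec.superuserSecretName in the CassandraDatacenter.
  name: superuser-secret
  namespace: my-db-ns
type: Opaque
stringData:
  # Hypothetical credentials; replace with your own.
  username: my-superuser
  password: my-superuser-password
```

Apply it with kubectl -n my-db-ns apply -f my-secret.yaml before applying the CassandraDatacenter.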

Specify the server type and version

In the spec section of your configuration YAML for the CassandraDatacenter resource, specify:
  • serverType - required. The value must be cassandra or dse.
  • serverVersion - required. For serverType: dse, the value must be 6.8.0 or later; for serverType: cassandra, currently the value is 3.11.7.
  • serverImage - optional, and only meant as an override.
    Tip: DataStax recommends that you not specify serverImage in the provisioning example file. When unspecified, Cass Operator provides the appropriate image path/filename and supported version.
Using a default image; intentionally not specifying the serverImage:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: dse
  serverVersion: 6.8.2
If you're provisioning Apache Cassandra in the Kubernetes cluster, and want to use the default image, just change serverType from the example above to cassandra, and specify (currently) serverVersion: 3.11.7.
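The Cassandra variant of the default-image example would look like this sketch; only serverType and serverVersion change.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  # Apache Cassandra instead of DSE.
  serverType: cassandra
  serverVersion: 3.11.7
  # serverImage intentionally omitted: Cass Operator picks the default image.
```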
Using a specific Cassandra image - for example:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: cassandra
  serverVersion: 3.11.7
  serverImage: my-private-docker-registry.example.com/datastax/cassandra-mgmtapi-3_11_7
Using a specific DSE image - for example:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: dse
  serverVersion: 6.8.2
  serverImage: my-private-docker-registry.example.com/datastax/dse-server:6.8.2-20200731

Configure a NodePort service

Request a NodePort service in a CassandraDatacenter configuration YAML by setting the following fields:
spec:
  networking: 
    nodePort: 
      cql: 30001 
      broadcast: 30002
To request the SSL versions of the ports:
spec:
  networking:
    nodePort:
      cqlSSL: 30010
      broadcastSSL: 30020
If any of the nodePort fields have been configured, a NodePort service will be created that routes from the specified external port to the identically numbered internal port. Cassandra will be configured to listen on the specified ports.
Full example of a nodeport-service-dc.yaml:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: dse
  serverVersion: "6.8.2"
  managementApiAuth:
    insecure: {}
  networking:
    nodePort:
      cql: 30001
      broadcast: 30002
  size: 2
  storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: server-storage
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
  racks:
    - name: r1
    - name: r2
  config:
    jvm-server-options:
      initial_heap_size: "800m"
      max_heap_size: "800m"
    cassandra-yaml:
      file_cache_size_in_mb: 100
      memtable_space_in_mb: 100

Configure a nodeSelector

Use node selectors to pin pods to labeled worker nodes in the Kubernetes cluster. In a Pod spec, node selectors are defined under the nodeSelector field. Example:
apiVersion: v1
kind: Pod
metadata:
  name: my-db-pod
  labels:
    env: mytest
spec:
  containers:
  - name: my-db-pod
    image: my-db-pod
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
For example, if this local configuration file is named my-db-pod-fast-storage.yaml, and the namespace is cass-operator:
kubectl apply -n cass-operator -f my-db-pod-fast-storage.yaml

For more about nodeSelector options, see the Kubernetes documentation.
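For the database pods that Cass Operator creates, the selector would go in the CassandraDatacenter rather than in a hand-written Pod. A sketch, assuming your Cass Operator version supports a spec.nodeSelector field; the disktype: ssd label is carried over from the Pod example above, and you would first label the target workers, for example with kubectl label nodes <node-name> disktype=ssd.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  # Assumption: spec.nodeSelector is supported by your operator version.
  # Cassandra pods are scheduled only on workers labeled disktype=ssd.
  nodeSelector:
    disktype: ssd
  # Remaining fields (serverType, size, storageConfig, racks) omitted.
```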

Encryption

Cass Operator automates the creation of keystores and truststores for client-to-node and internode encryption. For each datacenter created with the operator, credentials are injected into the StatefulSet through a secret named <datacenter-name>-keystore. To use client-to-node or internode encryption, you only need to reference the injected keystores from the Cassandra parameters in the datacenter configuration. See the example in the datacenter encryption test YAMLs.

Due to limitations of Kubernetes StatefulSets, the current strategy focuses primarily on internode encryption with CA-only verification; peer name verification is not currently available. Peer name verification can be achieved with init containers, which may be able to use an external certificate issuance architecture to enable per-node and per-client peer name verification.

Because the certificate authority is stored in Kubernetes secrets, you can create those secrets ahead of time from user-provided or organizational certificate authorities. You can also use a single CA across multiple datacenters by copying the secrets generated for one datacenter to the secondary datacenter before launching the secondary datacenter.

Although you can switch from encrypted to unencrypted internode communication, or the reverse, applying this change as a rolling configuration update is not currently supported. To update these settings, you must stop and restart the entire cluster.
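A sketch of internode encryption settings passed through the config section. The server_encryption_options keys are standard cassandra.yaml settings, but the keystore path and password placeholders are assumptions, not the operator's documented values: check the datacenter encryption test YAMLs for where the <datacenter-name>-keystore secret is mounted and which password it uses.

```yaml
spec:
  config:
    cassandra-yaml:
      server_encryption_options:
        # Encrypt all internode traffic.
        internode_encryption: all
        # Assumption: hypothetical mount path of the injected keystore secret.
        keystore: /etc/encryption/node-keystore.jks
        keystore_password: <keystore-password>
        truststore: /etc/encryption/node-keystore.jks
        truststore_password: <keystore-password>
        # CA-only verification: peer name verification is not available.
        require_endpoint_verification: false
```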

Download, optionally customize, and apply the provisioning YAML

If you haven't already done so, download the example provisioning YAML from the DataStax public GitHub repo and customize it to suit your requirements. Save the file, for example as cluster1-dc1.yaml, and then use kubectl to apply it. Example:
kubectl -n my-db-ns apply -f ./cluster1-dc1.yaml
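For reference, a cluster1-dc1.yaml for the topology in this topic (cluster1, dc1, three racks, one node per rack) might look like the following sketch. It assembles fields shown earlier in this topic; it is not the downloadable DataStax example, and the server type, version, and storage values are choices you should adjust.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: "3.11.7"
  managementApiAuth:
    insecure: {}
  # Three nodes: one per rack.
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  racks:
    - name: r1
    - name: r2
    - name: r3
```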
As Cass Operator proceeds with the specified deployment in your Kubernetes cluster, watch the list of pods. Completing a deployment may take several minutes per node. The best way to track the operator's progress is by using the following commands and checking the Status and Events. Example:
kubectl -n my-db-ns get pods
NAME                            READY   STATUS    RESTARTS   AGE
cass-operator-f74447c57-kdf2p   1/1     Running   0          13m
gke-cluster1-dc1-r1-sts-0       1/1     Running   0          5m38s
gke-cluster1-dc1-r2-sts-0       1/1     Running   0          42s
gke-cluster1-dc1-r3-sts-0       1/1     Running   0          6m7s
kubectl -n my-db-ns describe cassdc dc1
...
Status:
  Cassandra Operator Progress:  Updating
  Last Server Node Started:     2020-07-30T11:37:28Z
  Super User Upserted:          2020-07-30T11:38:37Z
Events:
  Type     Reason           Age                  From                Message
  ----     ------           ----                 ----                -------
  Normal   CreatedResource  9m49s                cassandra-operator  Created service cluster1-dc1-service
  Normal   CreatedResource  9m49s                cassandra-operator  Created service cluster1-seed-service
  Normal   CreatedResource  9m49s                cassandra-operator  Created service cluster1-dc1-all-pods-service
  Normal   CreatedResource  9m49s                cassandra-operator  Created statefulset cluster1-dc1-r1-sts
  Normal   CreatedResource  9m49s                cassandra-operator  Created statefulset cluster1-dc1-r2-sts
  Normal   CreatedResource  9m49s                cassandra-operator  Created statefulset cluster1-dc1-r3-sts

What's next?

Learn how to use the provisioned cluster in the Kubernetes environment.