Provisioning a Cassandra Operator cluster in Kubernetes

Provision a Cassandra or DSE cluster in Kubernetes.

Complete the following procedure to provision Apache Cassandra® or DataStax Enterprise (DSE) in a Kubernetes cluster.

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:
Package installations: /etc/dse/cassandra/cassandra.yaml
Tarball installations: installation_location/resources/cassandra/conf/cassandra.yaml

Prerequisites

If you have not yet created a Kubernetes cluster, see the tutorials on the kubernetes.io site.
Tip: On the Kubernetes page, expand the left sidebar to navigate between the 6+ interactive tutorials.

This provisioning topic assumes you've applied the example operator configuration, as shown in the previous topic. Those steps created a new resource type in your Kubernetes cluster, the CassandraDatacenter.

You can now define a cluster topology for the Cass Operator to create and monitor. This topic provisions a three-node cluster: one datacenter made up of three racks, with one node per rack. The examples also show how to indicate whether the serverType for your Kubernetes environment is cassandra or dse, plus related values.
Tip: For all the parameters presented in each step below, DataStax provides example YAML files that you can download and customize for your purposes.

Procedure

  1. Define the cluster and datacenter parameters
    A logical datacenter is the primary resource managed by the Cass Operator. Within a single Kubernetes namespace:
    • A single CassandraDatacenter resource defines a single-datacenter cluster.
    • Two or more CassandraDatacenter resources with different clusterName values define separate and unrelated single-datacenter clusters. Note that the operator manages both clusters because they reside within the same Kubernetes namespace.
    • Two or more CassandraDatacenter resources that share the same clusterName define a multi-datacenter cluster. Cass Operator joins the instances in each datacenter into a logical topology that acts as a single cluster.
    The provisioning example in this topic defines a single-datacenter cluster. The cluster is named cluster1 and the datacenter is named dc1.
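    As a minimal sketch of how these two names map onto the resource, the fragment below sets the datacenter name in metadata.name and the cluster name in spec.clusterName. The apiVersion and kind match the examples later in this topic. The remaining spec fields are omitted here, so this fragment is not a complete manifest on its own.

    ```yaml
    # Sketch only: naming fields for the example cluster in this topic.
    apiVersion: cassandra.datastax.com/v1beta1
    kind: CassandraDatacenter
    metadata:
      name: dc1                # datacenter name
    spec:
      clusterName: cluster1    # datacenters sharing this value form one cluster
    ```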
  2. Define the Rack parameters

    Cassandra and DSE are rack aware, and the racks parameter will configure the Cass Operator to set up pods in a rack-aware way.

    If you want to specify which availability zone a given rack belongs to, the Kubernetes worker nodes must have labels matching failure-domain.beta.kubernetes.io/zone. For more, see the Kubernetes documentation.

    Racks must have identifiers. In this topic, the example configuration identifies the racks as r1, r2, and r3. The number of racks should match the replication factor in the keyspaces you plan to create.
    CAUTION: The number of racks cannot easily be changed once the CassandraDatacenter is deployed.
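    A sketch of a racks definition for the three racks used in this topic might look like the following. The zone values are hypothetical placeholders, not values from this topic. Omit the zone entries if you do not want to pin racks to availability zones.

    ```yaml
    # Sketch only: fragment of the CassandraDatacenter spec.
    spec:
      racks:
        - name: r1
          zone: us-central1-a   # hypothetical zone label value
        - name: r2
          zone: us-central1-b   # hypothetical zone label value
        - name: r3
          zone: us-central1-c   # hypothetical zone label value
    ```

    With three racks and a replication factor of 3, each rack can hold one replica of every partition.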
  3. Define the node count parameters

    The size parameter is the number of Cassandra or DSE nodes to run in the datacenter. Do not confuse these with Kubernetes nodes: in Kubernetes, a node is a worker machine, which may be a VM or a physical machine depending on the cluster, and multiple pods can run on one Kubernetes node.

    For optimal performance, DataStax recommends running only one Cassandra or DSE server instance per Kubernetes worker node. The Cass Operator enforces that limit, and pods may get stuck in the Pending status if there are insufficient Kubernetes workers available.
    Tip: Use a kubectl get node command to list the worker nodes in the Kubernetes cluster. Example:
    kubectl get node --selector='!node-role.kubernetes.io/master' 
    This topic and its examples assume you have at least three worker nodes available. If you're working on a minikube or other setup with a single Kubernetes worker node, you must reduce the size value accordingly, or set the allowMultipleNodesPerWorker parameter to true.
    Tip: For related information, see Resizing a cluster in the Kubernetes documentation.
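    The node count for the example in this topic could be expressed as in the sketch below. The commented-out line shows where the allowMultipleNodesPerWorker parameter would go for a single-worker setup. Whether additional settings (such as resource requests) are required alongside it depends on your operator version, so treat that line as an assumption to verify.

    ```yaml
    # Sketch only: fragment of the CassandraDatacenter spec.
    spec:
      size: 3    # one server node in each of the three racks
      # For a single Kubernetes worker (for example, minikube), either
      # reduce size to 1 or uncomment the following line:
      # allowMultipleNodesPerWorker: true
    ```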
  4. Define the storage parameters

    Define the storage with a combination of the previously provisioned storage class and size parameters. These values inform the storage provisioner how much room to require from the backend.
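    As a sketch, a storage definition might look like the following. The storage class name server-storage and the 5Gi size are assumptions for illustration. Substitute the storage class you provisioned earlier and a size that fits your data.

    ```yaml
    # Sketch only: fragment of the CassandraDatacenter spec.
    spec:
      storageConfig:
        cassandraDataVolumeClaimSpec:
          storageClassName: server-storage   # assumed name of your storage class
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi                   # illustrative size per server node
    ```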

  5. Configure the database

    The config key in the CassandraDatacenter resource contains the parameters used to configure the server process running in each pod. In general, it's not necessary to specify any parameters here. Settings omitted from the config key receive reasonable default values, and it's common to run demo clusters with no custom configuration.

    If you're familiar with configuring Cassandra or DSE outside of containers on traditional operating systems, you may recognize that some familiar configuration parameters have been specified elsewhere in the CassandraDatacenter resource, outside of the config section. Do not repeat these parameters inside the config section of the provisioning YAML; Cass Operator populates them automatically from the CassandraDatacenter resource. For example, you do not need to repeat:
    • cluster_name, which is normally specified in cassandra.yaml
    • The rack and datacenter parameters
    Similarly, Cass Operator automatically populates any values that are normally customized per Cassandra node. Do not specify these in the CassandraDatacenter resource. For example:
    • initial_token
    • listen_address and other IP address settings

    A large number of keys and values can be specified in the config section. The config key data structure resembles the configuration profiles in DataStax OpsCenter Lifecycle Manager (LCM); see Using configuration profiles in the LCM documentation. If needed, extrapolate the parameters to use from the LCM config profile options.
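    If you do need custom settings, a config section might look like the sketch below. It assumes serverType: cassandra, and the file-level key names (cassandra-yaml, jvm-options) and the values shown are illustrative assumptions. Check them against the example YAML files for your server type and version before use.

    ```yaml
    # Sketch only: fragment of the CassandraDatacenter spec.
    spec:
      config:
        cassandra-yaml:             # settings normally placed in cassandra.yaml
          num_tokens: 8             # illustrative value
        jvm-options:                # JVM settings for the server process
          initial_heap_size: "800M" # illustrative value
          max_heap_size: "800M"     # illustrative value
    ```

    Note that cluster_name, rack, datacenter, initial_token, and listen_address are deliberately absent, because the operator populates them.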

  6. View information about the superuser credentials
    By default, Cass Operator creates a cassandra superuser. A Kubernetes secret is created, named <clusterName>-superuser, which contains username and password keys.
    Note: If you want to define the superuser with custom credentials, skip to the next step.
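    For example, assuming you already created your environment's CassandraDatacenter in the previous configuration topic for the operator, you can use the following commands to return information about the superuser. The base64 -D flag shown here is the macOS form; with GNU coreutils on Linux, use base64 -d or --decode instead: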
    kubectl -n my-db-ns get secret cluster1-superuser
    NAME                       TYPE                                  DATA   AGE
    cluster1-superuser         Opaque                                2      13m
    kubectl -n my-db-ns get secret cluster1-superuser -o yaml
    apiVersion: v1
    kind: Secret
    type: Opaque
    metadata:
      name: cluster1-superuser
    data:
      password: d0g0UXRaTTg0VzVXbENCZVo4WmNqRWVFMGx0SXVvWnhMU0k5allsampBYnVLWU9WRTU2NENSWEpwY2twYjArSDlmSnZOcHdrSExZVU8rTk11N1BJRWhhZkpXM1U0WitsdlI1U3owcUhzWmNjRHQ0enhTSFpzeHRNcEFiMzNXVWQ3R25IdUE=
      username: Y2x1c3RlcjEtc3VwZXJ1c2Vy
    echo Y2x1c3RlcjEtc3VwZXJ1c2Vy | base64 -D
    cluster1-superuser
    echo 'd0g0UXRaTTg0VzVXbENCZVo4WmNqRWVFMGx0SXVvWnhMU0k5allsampBYnVLWU9WRTU2NENSWEpwY2twYjArSDlmSnZOcHdrSExZVU8rTk11N1BJRWhhZkpXM1U0WitsdlI1U3owcUhzWmNjRHQ0enhTSFpzeHRNcEFiMzNXVWQ3R25IdUE=' | base64 -D
    wH4QtZM84W5WlCBeZ8ZcjEeE0ltIuoZxLSI9jYljjAbuKYOVE564CRXJpckpb0+H9fJvNpwkHLYUO+NMu7PIEhafJW3U4Z+lvR5Sz0qHsZccDt4zxSHZsxtMpAb33WUd7GnHuA
  7. Define superuser with custom credentials
    To instead define the superuser with your own credentials, use kubectl to create a secret from a manifest file that defines a Secret named superuser-secret with username and password keys. For example:
    kubectl -n my-db-ns create -f my-secret.yaml
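    As a minimal sketch, my-secret.yaml could contain a manifest like the following. The username and password values are placeholders introduced for illustration. Replace them with your own credentials. The stringData field lets you supply plain-text values, which Kubernetes stores base64-encoded under data.

    ```yaml
    # Sketch only: placeholder credentials, not for real use.
    apiVersion: v1
    kind: Secret
    metadata:
      name: superuser-secret
    type: Opaque
    stringData:
      username: my-superuser                  # placeholder
      password: replace-with-strong-password  # placeholder
    ```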
    To use the new superuser secret, specify the name of the secret in the CassandraDatacenter configuration YAML that you load into the cluster. Example:
    apiVersion: cassandra.datastax.com/v1beta1
    kind: CassandraDatacenter
    metadata:
      name: dtcntr
    spec:
      superuserSecretName: superuser-secret
    CAUTION: Never use passwords from documentation examples in your environment.
  8. Specify the server type and version
    In the spec section of your configuration YAML for the CassandraDatacenter resource, specify:
    • serverType - required. The value must be cassandra or dse.
    • serverVersion - required. For serverType: dse, the value must be 6.8.0 or later; for serverType: cassandra, currently the value must be 3.11.6.
    • serverImage - optional, and only meant as an override.
      Tip: DataStax recommends that you not specify serverImage in the provisioning example file. When unspecified, Cass Operator provides the appropriate image path/filename and supported version.
    Using a default image; intentionally not specifying the serverImage:
    apiVersion: cassandra.datastax.com/v1beta1
    kind: CassandraDatacenter
    metadata:
      name: dtcntr
    spec:
      serverType: dse
      serverVersion: 6.8.0
    If you're provisioning Apache Cassandra in the Kubernetes cluster, and want to use the default image, just change serverType from the example above to cassandra, and specify (currently) serverVersion: 3.11.6.
    Using a specific Cassandra image - for example:
    apiVersion: cassandra.datastax.com/v1beta1
    kind: CassandraDatacenter
    metadata:
      name: dtcntr
    spec:
      serverType: cassandra
      serverVersion: 3.11.6
      serverImage: my-private-docker-registry.example.com/datastax/cassandra-mgmtapi-3_11_6
    Using a specific DSE image - for example:
    apiVersion: cassandra.datastax.com/v1beta1
    kind: CassandraDatacenter
    metadata:
      name: dtcntr
    spec:
      serverType: dse
      serverVersion: 6.8.0
      serverImage: my-private-docker-registry.example.com/datastax/dse-server:6.8.0-20200330
  9. Download, optionally customize, and apply the provisioning YAML
    If you haven't already done so, download the example provisioning YAML from the DataStax public GitHub repo and customize it to suit your requirements. Save the file, for example, as cluster1-dc1.yaml. Use the kubectl command to apply the YAML file. Example:
    kubectl -n my-db-ns apply -f ./cluster1-dc1.yaml
    As Cass Operator proceeds with the specified deployment in your Kubernetes cluster, watch the list of pods. Completing a deployment may take several minutes per node. To track the operator's progress, run the following commands and check the Status and Events sections of the describe output. Example:
    kubectl -n my-db-ns get pods
    NAME                            READY   STATUS    RESTARTS   AGE
    cass-operator-f74447c57-kdf2p   1/1     Running   0          13m
    gke-cluster1-dc1-r1-sts-0       1/1     Running   0          5m38s
    gke-cluster1-dc1-r2-sts-0       1/1     Running   0          42s
    gke-cluster1-dc1-r3-sts-0       1/1     Running   0          6m7s
    kubectl -n my-db-ns describe cassdc dc1
    ...
    Status:
      Cassandra Operator Progress:  Updating
      Last Server Node Started:     2020-03-30T11:37:28Z
      Super User Upserted:          2020-03-30T11:38:37Z
    Events:
      Type     Reason           Age                  From                Message
      ----     ------           ----                 ----                -------
      Normal   CreatedResource  9m49s                cassandra-operator  Created service cluster1-dc1-service
      Normal   CreatedResource  9m49s                cassandra-operator  Created service cluster1-seed-service
      Normal   CreatedResource  9m49s                cassandra-operator  Created service cluster1-dc1-all-pods-service
      Normal   CreatedResource  9m49s                cassandra-operator  Created statefulset cluster1-dc1-r1-sts
      Normal   CreatedResource  9m49s                cassandra-operator  Created statefulset cluster1-dc1-r2-sts
      Normal   CreatedResource  9m49s                cassandra-operator  Created statefulset cluster1-dc1-r3-sts

What's next

Learn how to use the provisioned cluster in the Kubernetes environment.