Provision Cassandra or DSE in Kubernetes with Kubernetes Operator for Apache Cassandra
Complete the following procedure to provision Apache Cassandra® or DataStax Enterprise (DSE) in a Kubernetes cluster.
After you apply the Kubernetes Operator for Apache Cassandra configuration, use these steps to define a cluster topology for the operator to create and monitor.
This guide provisions a three-node cluster: one datacenter made up of three racks, with one node per rack.
The steps also show how to set serverType to cassandra or dse for your Kubernetes environment, along with related values.
For all the parameters in this guide, DataStax provides example YAML files that you can download and customize for your purposes.
Define the cluster and datacenter parameters
A logical datacenter is the primary resource managed by the Kubernetes Operator for Apache Cassandra. Within a single Kubernetes namespace:
- A single CassandraDatacenter resource defines a single-datacenter cluster.
- Two or more CassandraDatacenter resources with different clusterName values define separate and unrelated single-datacenter clusters. Kubernetes Operator for Apache Cassandra manages both clusters because they reside within the same Kubernetes namespace.
- Two or more CassandraDatacenter resources that share the same clusterName define a multi-datacenter cluster. Kubernetes Operator for Apache Cassandra joins the instances in each datacenter into a logical topology that acts as a single cluster.
The provisioning example in this topic defines a single-datacenter cluster. The cluster name is cluster1 and the datacenter name is dc1.
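A minimal sketch of how these names map onto the resource: metadata.name sets the datacenter name and spec.clusterName sets the cluster name. Required fields covered in later sections (serverType, serverVersion, size, storageConfig) are omitted, so this fragment is not deployable on its own. The namespace my-db-ns is the example namespace used later in this guide, and the second document, dc2, is hypothetical: it only illustrates how a shared clusterName in the same namespace produces a multi-datacenter cluster.

```yaml
# Fragment: only the naming fields are shown.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1              # datacenter name
  namespace: my-db-ns    # example namespace used in this guide
spec:
  clusterName: cluster1  # logical cluster this datacenter belongs to
---
# Hypothetical second datacenter: same namespace and same clusterName,
# so the operator would join it to cluster1 as a second datacenter.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc2
  namespace: my-db-ns
spec:
  clusterName: cluster1
```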
Define the rack parameters
Cassandra and DSE are rack aware, and the racks parameter configures the Kubernetes Operator for Apache Cassandra to set up pods in a rack-aware way.
To specify the availability zone for a given rack, the Kubernetes worker nodes must carry the failure-domain.beta.kubernetes.io/zone label, with values matching the zones you assign to the racks. For more information, see the Kubernetes documentation.
Racks must have identifiers. In this topic, the example configuration identifies the racks as r1, r2, and r3. Ensure that the number of racks matches the replication factor of the keyspaces you plan to create.
The number of racks cannot easily be changed after the CassandraDatacenter resource is deployed.
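The following fragment sketches the rack definitions for this guide's topology. It assumes your operator version's rack schema accepts a per-rack zone field (this field may be deprecated in later releases, so check the installed CRD). The zone values are hypothetical and must match the zone labels on your worker nodes.

```yaml
# Fragment: rack layout for one datacenter with three racks.
spec:
  size: 3                    # one node per rack
  racks:
    - name: r1
      zone: us-central1-a    # hypothetical zone; must match a worker node's zone label
    - name: r2
      zone: us-central1-b    # hypothetical
    - name: r3
      zone: us-central1-c    # hypothetical
```

With three racks and a replication factor of 3, each rack can hold one replica of every partition.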
Define the node count parameters
The size parameter is the number of Cassandra or DSE nodes (pods) to run in the datacenter. These database nodes are distinct from Kubernetes nodes: a Kubernetes node is a worker machine, which may be a virtual machine (VM) or a physical machine depending on the cluster, and multiple pods can run on one Kubernetes node.
For optimal performance, DataStax recommends running only one Cassandra or DSE server instance per Kubernetes worker node. The Kubernetes Operator for Apache Cassandra enforces that limit, and pods may get stuck in the Pending status if there are insufficient Kubernetes workers available.
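To list the worker nodes available in the Kubernetes cluster, use kubectl get node; use kubectl describe node to return more detailed information about each node.
For example, to list only the nodes that are not labeled as masters:
kubectl get node --selector='!node-role.kubernetes.io/master'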
This topic and its examples assume you have at least three worker nodes available. If you are working with minikube or another setup with a single Kubernetes worker node, you must reduce the size value accordingly or set the allowMultipleNodesPerWorker parameter to true.
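For a single-worker setup such as minikube, a fragment like the following keeps size at 3 and sets allowMultipleNodesPerWorker. Some operator versions may reject allowMultipleNodesPerWorker unless CPU and memory requests and limits are set, so the sketch includes them; the resource values are assumptions sized for a small test environment, not recommendations.

```yaml
# Fragment: three database pods sharing one Kubernetes worker.
spec:
  size: 3
  # Lift the one-instance-per-worker limit so all pods can schedule on one worker.
  allowMultipleNodesPerWorker: true
  # Hypothetical values: three pods must fit within the single worker's capacity.
  resources:
    requests:
      cpu: "500m"
      memory: 2Gi
    limits:
      cpu: "1"
      memory: 2Gi
```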
Define the storage parameters
Define the storage by combining the previously provisioned storage class with a requested size. These values tell the storage provisioner how much capacity to request from the backend for each node.
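As a sketch, the storage parameters live under spec.storageConfig.cassandraDataVolumeClaimSpec, which follows the shape of a Kubernetes PersistentVolumeClaim spec; the field layout matches the full NodePort example later in this guide. Here server-storage stands in for whatever storage class you provisioned, and the 5Gi size is an arbitrary assumption.

```yaml
# Fragment: per-node data volume request.
spec:
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage   # the storage class provisioned earlier
      accessModes:
        - ReadWriteOnce                   # each volume is mounted by a single pod
      resources:
        requests:
          storage: 5Gi                    # hypothetical capacity requested for each node
```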
Configure the database
The config key in the CassandraDatacenter resource contains the parameters used to configure the server process running in each pod. In general, it is not necessary to specify any parameters here. Settings omitted from the config key receive reasonable default values, and it is common to run demo clusters with no custom configuration.
If you are familiar with configuring Cassandra or DSE outside of containers on traditional operating systems, you will notice that some familiar configuration parameters are specified elsewhere in the CassandraDatacenter resource, outside of the config section. Do not repeat these parameters inside the config section of the provisioning YAML file; Kubernetes Operator for Apache Cassandra populates them automatically from the CassandraDatacenter resource. For example:
- The cluster_name parameter, which is normally specified in the cassandra.yaml file. The location of this file depends on the type of installation:
  - Package installations: /etc/dse/cassandra/cassandra.yaml
  - Tarball installations: <install_location>/resources/cassandra/conf/cassandra.yaml
- The rack and datacenter parameters.
Similarly, Kubernetes Operator for Apache Cassandra automatically populates any values that are normally customized per Cassandra node. Do not specify these in the CassandraDatacenter resource. For example, in the config key, don’t specify initial_token, listen_address, or other IP addresses.
A large number of keys and values can be specified in the config section. The config key data structure resembles the DataStax OpsCenter Lifecycle Manager (LCM) configuration profiles. If needed, extrapolate the parameters to use from the LCM config profile options.
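The following fragment sketches a config key that adjusts the JVM heap and one cassandra.yaml setting, reusing values from the NodePort example in this guide; the values are illustrative, not tuning recommendations. Note what is absent: cluster_name, rack, datacenter, initial_token, and listen_address are left for the operator to populate.

```yaml
# Fragment: server configuration passed through the config key.
spec:
  config:
    # JVM options for the server process.
    jvm-server-options:
      initial_heap_size: "800m"
      max_heap_size: "800m"
    # Keys destined for cassandra.yaml. Do not add cluster_name,
    # initial_token, listen_address, or other per-node values here.
    cassandra-yaml:
      file_cache_size_in_mb: 100
```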
View information about the superuser credentials
By default, Kubernetes Operator for Apache Cassandra creates a Cassandra superuser. A Kubernetes secret is created, named <clusterName>-superuser, which contains username and password keys.
To define the superuser with custom credentials instead, skip to Define superuser with custom credentials.
The following example assumes that you already created your environment’s CassandraDatacenter resource for the Kubernetes Operator for Apache Cassandra in the previous configuration topic. It uses the following kubectl commands to return information about the superuser:
kubectl -n my-db-ns get secret cluster1-superuser
Sample output:
NAME                 TYPE     DATA   AGE
cluster1-superuser   Opaque   2      13m
kubectl -n my-db-ns get secret cluster1-superuser -o yaml
Sample output:
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: cluster1-superuser
data:
  password: d0...c2Vy
Decode the base64-encoded username value:
echo Y2x1c3RlcjEtc3VwZXJ1c2Vy | base64 -d
Sample output:
cluster1-superuser
Decode the base64-encoded password value:
echo 'd0g0U...IdUE=' | base64 -d
Sample output:
wH4QtZ....GnHuA
Define superuser with custom credentials
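To define the superuser with your own custom credentials rather than the default values, create a secret with kubectl in the same namespace as the CassandraDatacenter resource. Like the default secret, it contains username and password keys.
For example, if my-secret.yaml defines a secret named superuser-secret, apply it with:
kubectl -n my-db-ns apply -f my-secret.yaml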
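The contents of my-secret.yaml are not shown in this guide. A minimal sketch, assuming the custom secret uses the same username and password keys as the default <clusterName>-superuser secret, might look like this; the credential values and the my-db-ns namespace are placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: superuser-secret   # referenced later by superuserSecretName
  namespace: my-db-ns      # assumption: same namespace as the CassandraDatacenter
type: Opaque
# stringData accepts plain text; Kubernetes stores it base64-encoded under data.
stringData:
  username: my-superuser       # hypothetical username
  password: change-me-please   # hypothetical password; use a strong secret value
```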
To use the new superuser secret, specify the name of the secret in the superuserSecretName field of the CassandraDatacenter configuration YAML file that you load into the cluster.
For example:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  superuserSecretName: superuser-secret
Specify the server type and version
In the spec section of your YAML configuration file for the CassandraDatacenter resource, specify:
- (Required) serverType: the value must be cassandra or dse.
- (Required) serverVersion:
  - If serverType: dse, the version value must be 6.8.0 or later.
  - If serverType: cassandra, the version value must be 3.11.7 or later.
- (Optional) serverImage: intended only as an override.
  DataStax recommends that you don’t specify serverImage in the provisioning example file. When it is unspecified, Kubernetes Operator for Apache Cassandra provides the appropriate image path, filename, and supported version.
- Use a default image (serverImage intentionally not specified):
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: dse
  serverVersion: 6.8.4
When you provision Apache Cassandra in the Kubernetes cluster and want to use the default image, set serverType: cassandra, and then specify the serverVersion for your version of Cassandra, such as serverVersion: 3.11.7.
This approach isn’t supported for all versions of Cassandra. Version 3.11.8 and later require the fully qualified image name, as shown in Use a specific Cassandra image.
- Use a specific Cassandra image:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: cassandra
  serverVersion: 3.11.9
  serverImage: datastax/cassandra-mgmtapi-3_11_9:v0.1.23
  dockerImageRunsAsCassandra: true
Set the version in serverVersion, and then provide the fully qualified image name in serverImage. Make sure serverVersion matches the Cassandra version of the image.
If you’re using a private registry, provide the fully qualified image name within your registry, such as my-private-docker-registry.example.com/datastax/cassandra-mgmtapi-3_11_9:v0.1.23.
Check the DataStax Docker Hub server images for the latest build numbers, such as v0.1.23, for each release.
management-api images tagged v0.1.22 or v0.1.23 are built to run Cassandra as non-root. However, by default, Kubernetes Operator for Apache Cassandra assumes Cassandra runs as root. For this reason, you must also set dockerImageRunsAsCassandra: true in the CassandraDatacenter spec. This setting prompts cass-operator to add a PodSecurityContext that explicitly configures the Cassandra pod to run as a non-root user and group.
- Use a specific DSE image:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: dse
  serverVersion: 6.8.4
  serverImage: my-private-docker-registry.example.com/datastax/dse-server:6.8.4-20200731
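For the default-image case with Apache Cassandra, described under Use a default image, a sketch of the corresponding resource follows. It omits serverImage so that the operator selects the image; 3.11.7 is the example version from that description, and later versions may instead require the fully qualified image name.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: cassandra
  # No serverImage: the operator supplies the default image for this version.
  serverVersion: "3.11.7"
```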
Configure a NodePort service
Request a NodePort service in a CassandraDatacenter configuration YAML by setting the following fields:
spec:
  networking:
    nodePort:
      cql: 30001
      broadcast: 30002
To request the SSL versions of the ports:
spec:
  networking:
    nodePort:
      cqlSSL: 30010
      broadcastSSL: 30020
If any of the nodePort fields are configured, a NodePort service is created that routes from each specified external port to the identically numbered internal port, and Cassandra is configured to listen on the specified ports.
Full example of a nodeport-service-dc.yaml file:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: dse
  serverVersion: "6.8.4"
  managementApiAuth:
    insecure: {}
  networking:
    nodePort:
      cql: 30001
      broadcast: 30002
  size: 2
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  racks:
    - name: r1
    - name: r2
  config:
    jvm-server-options:
      initial_heap_size: "800m"
      max_heap_size: "800m"
    cassandra-yaml:
      file_cache_size_in_mb: 100
      memtable_space_in_mb: 100
Configure a nodeSelector
To pin pods to labeled K8s worker nodes in the Kubernetes cluster, use node selectors.
Define node selectors in a Pod spec similar to the following example:
apiVersion: v1
kind: Pod
metadata:
  name: my-db-pod
  labels:
    env: mytest
spec:
  containers:
    - name: my-db-pod
      image: my-db-pod
      imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
If this local configuration file is named my-db-pod-fast-storage.yaml, and the namespace is cass-operator, then apply the YAML file by running:
kubectl apply -n cass-operator -f my-db-pod-fast-storage.yaml
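To apply the same pinning to the database pods themselves, a sketch like the following places a nodeSelector in the CassandraDatacenter spec. It assumes your operator version exposes spec.nodeSelector (check the installed CRD) and that the target workers were labeled beforehand, for example with kubectl label nodes <node-name> disktype=ssd.

```yaml
# Fragment: only the fields relevant to node selection are shown.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  size: 3
  # Schedule this datacenter's pods only on workers labeled disktype=ssd.
  # With the default one-instance-per-worker limit, at least 3 such workers are needed.
  nodeSelector:
    disktype: ssd
```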
Encryption
Kubernetes Operator for Apache Cassandra automates the creation of key stores and trust stores for client-to-node and internode encryption. For each datacenter created with the operator, credentials are injected into the stateful set using secrets with the name <datacenter-name>-keystore. To use client-to-node or internode encryption, reference only the injected keystores from the Cassandra parameters provided in the datacenter configuration. For an example, see the datacenter encryption test YAML files.
Due to limitations of Kubernetes stateful sets, the current strategy primarily focuses on internode encryption with ca-only verification (peer name verification is not currently available). Peer verification can be achieved with init containers capable of leveraging external certificate issuance architecture to enable per-node and per-client peer name verification.
By storing the certificate authority (CA) in Kubernetes secrets, you can create secrets ahead of time from user-provided or organizational certificate authorities. Leverage a single CA across multiple datacenters by copying the secrets generated for one datacenter to the secondary datacenter prior to launching the secondary datacenter.
Although you can switch between encrypted and unencrypted internode communication in either direction, the change cannot currently be applied as a rolling configuration update. You must stop and then start the entire cluster to update these settings.
Download, optionally customize, and apply the provisioning YAML
If you have not already done so, download the example provisioning YAML file from the DataStax public GitHub repo.
Customize the example YAML to suit your requirements, and save the file, for example, as cluster1-dc1.yaml. Then use kubectl to apply the YAML file.
For example:
kubectl -n my-db-ns apply -f ./cluster1-dc1.yaml
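The downloaded example file is not reproduced in this guide. As a rough sketch, a cluster1-dc1.yaml matching the topology described here (cluster cluster1, datacenter dc1, three racks with one node each) might look like the following. The server version, storage size, and heap values are assumptions to adjust, and the insecure management API setting is copied from the NodePort example in this guide.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1                    # datacenter name
spec:
  clusterName: cluster1        # cluster name
  serverType: cassandra
  serverVersion: "3.11.7"      # assumption: default image, no serverImage override
  managementApiAuth:
    insecure: {}
  size: 3                      # one node per rack
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi         # hypothetical per-node capacity
  racks:
    - name: r1
    - name: r2
    - name: r3
  config:
    jvm-server-options:
      initial_heap_size: "800m"
      max_heap_size: "800m"
```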
As Kubernetes Operator for Apache Cassandra proceeds with the specified deployment in your Kubernetes cluster, watch the list of pods. Completing a deployment takes several minutes per node. The best way to track the operator’s progress is to use the following commands and check the Status and Events values returned.
For example:
kubectl -n my-db-ns get pods
Sample output:
NAME                            READY   STATUS    RESTARTS   AGE
cass-operator-f74447c57-kdf2p   1/1     Running   0          13m
gke-cluster1-dc1-r1-sts-0       1/1     Running   0          5m38s
gke-cluster1-dc1-r2-sts-0       1/1     Running   0          42s
gke-cluster1-dc1-r3-sts-0       1/1     Running   0          6m7s
kubectl -n my-db-ns describe cassdc dc1
Sample output:
...
Status:
  Cassandra Operator Progress:  Updating
  Last Server Node Started:     2021-01-30T11:37:28Z
  Super User Upserted:          2021-01-30T11:38:37Z
Events:
  Type    Reason           Age    From                Message
  ----    ------           ----   ----                -------
  Normal  CreatedResource  9m49s  cassandra-operator  Created service cluster1-dc1-service
  Normal  CreatedResource  9m49s  cassandra-operator  Created service cluster1-seed-service
  Normal  CreatedResource  9m49s  cassandra-operator  Created service cluster1-dc1-all-pods-service
  Normal  CreatedResource  9m49s  cassandra-operator  Created statefulset cluster1-dc1-r1-sts
  Normal  CreatedResource  9m49s  cassandra-operator  Created statefulset cluster1-dc1-r2-sts
  Normal  CreatedResource  9m49s  cassandra-operator  Created statefulset cluster1-dc1-r3-sts