Provision Cassandra or DSE in Kubernetes with Kubernetes Operator for Apache Cassandra
Complete the following procedure to provision Apache Cassandra® or DataStax Enterprise (DSE) in a Kubernetes cluster.
Prerequisites
If you have not already, create a Kubernetes cluster.
Apply the example operator configuration by following the steps in Configure Cassandra or DSE in Kubernetes with Kubernetes Operator for Apache Cassandra. These steps create a new resource type named CassandraDatacenter in your Kubernetes cluster.
Procedure
With the Kubernetes Operator for Apache Cassandra configuration applied, you can now define a cluster topology for the Kubernetes Operator for Apache Cassandra to create and monitor. In this topic, a three-node cluster is provisioned with one datacenter made up of three racks, with one node per rack. The steps show how to indicate whether the serverType for your Kubernetes environment is cassandra or dse, plus related values.
Define the cluster and datacenter parameters
A logical datacenter is the primary resource managed by the Kubernetes Operator for Apache Cassandra. Within a single Kubernetes namespace:
- A single CassandraDatacenter resource defines a single-datacenter cluster.
- Two or more CassandraDatacenter resources with different clusterName values define separate and unrelated single-datacenter clusters. Kubernetes Operator for Apache Cassandra manages both clusters because they reside within the same Kubernetes namespace.
- Two or more CassandraDatacenter resources that share the same clusterName define a multi-datacenter cluster. Kubernetes Operator for Apache Cassandra joins the instances in each datacenter into a logical topology that acts as a single cluster.
The provisioning example in this topic defines a single-datacenter cluster. The cluster name is cluster1 and the datacenter name is dc1.
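A minimal sketch of how these two names map onto a CassandraDatacenter resource, based on the full NodePort example in this topic: the datacenter name is the resource's metadata.name, and the cluster name is spec.clusterName. The fragment is deliberately incomplete (server type, size, storage, and racks are omitted), so it is not meant to be applied on its own.

```yaml
# Naming fields only; other required spec fields are omitted from this sketch.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  # Datacenter name.
  name: dc1
spec:
  # Cluster name. Reusing the same clusterName in another CassandraDatacenter
  # resource would add a second datacenter to this cluster.
  clusterName: cluster1
```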
Define the Rack parameters
Cassandra and DSE are rack aware, and the racks parameter configures the Kubernetes Operator for Apache Cassandra to set up pods in a rack-aware way.
To specify the availability zone for a given rack, the Kubernetes worker nodes must have labels matching failure-domain.beta.kubernetes.io/zone. For more information, see the Kubernetes documentation.
Racks must have identifiers. In this topic, the example configuration identifies the racks as r1, r2, and r3. Ensure that the number of racks matches the replication factor in the keyspaces you plan to create.
The number of racks cannot easily be changed after the CassandraDatacenter resource is deployed.
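In the spec, racks is a list of named entries, as in the full NodePort example in this topic. A sketch of the three racks used here follows; with size set to 3, the intent is one node per rack, and a keyspace replicated across all three racks in dc1 would use a replication factor of 3. Only the rack-related fields are shown.

```yaml
# Fragment of a CassandraDatacenter spec; other fields omitted.
spec:
  # Three nodes for three racks, so each rack holds one node.
  size: 3
  racks:
    - name: r1
    - name: r2
    - name: r3
```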
Define the node count parameters
The size parameter is the number of Cassandra or DSE nodes to run in the datacenter; each of these nodes runs in its own pod. In Kubernetes terms, a node is a worker machine, which may be a virtual machine (VM) or a physical machine, depending on the cluster. Multiple pods can run on one Kubernetes worker node.
For optimal performance, DataStax recommends running only one Cassandra or DSE server instance per Kubernetes worker node. The Kubernetes Operator for Apache Cassandra enforces that limit, and pods may get stuck in the Pending status if there are insufficient Kubernetes workers available.
Use a kubectl get node command to list the worker nodes in the Kubernetes cluster, excluding master nodes. To return detailed information about a specific node, use kubectl describe node.
For example:
kubectl get node --selector='!node-role.kubernetes.io/master'
This topic and its examples assume that you have at least three worker nodes available. If you are working on minikube or another setup with a single Kubernetes worker node, then you must reduce the size value accordingly or set the allowMultipleNodesPerWorker parameter to true.
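For a single-worker setup such as minikube, a sketch of the second option follows; only the relevant spec fields are shown. The alternative, noted in a comment, is to lower size instead. Keep in mind that running several server pods on one worker trades away the performance isolation that the one-instance-per-worker recommendation is meant to provide.

```yaml
# Fragment of a CassandraDatacenter spec for a cluster with a single worker node.
spec:
  # Alternative: set size: 1 and leave allowMultipleNodesPerWorker unset.
  size: 3
  # Allows more than one server pod to be scheduled on the same worker node.
  allowMultipleNodesPerWorker: true
```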
Define the storage parameters
Define the storage with a combination of the previously provisioned storage class and a storage size. These values tell the storage provisioner how much space to request from the backend.
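In the full NodePort example in this topic, both values live under storageConfig.cassandraDataVolumeClaimSpec, which takes the fields of a standard Kubernetes PersistentVolumeClaim spec. The following sketch reuses that example's values; server-storage stands in for whatever storage class you provisioned, and 1Gi is a demo-sized request, not a sizing recommendation.

```yaml
# Fragment of a CassandraDatacenter spec; other fields omitted.
spec:
  storageConfig:
    cassandraDataVolumeClaimSpec:
      # Name of the previously provisioned storage class (placeholder).
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          # Amount of storage requested for a node's data volume.
          storage: 1Gi
```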
Configure the database
The config key in the CassandraDatacenter resource contains the parameters used to configure the server process running in each pod. In general, it is not necessary to specify any parameters here. Settings omitted from the config key receive reasonable default values, and it is common to run demo clusters with no custom configuration.
If you are familiar with configuring Cassandra or DSE outside of containers on traditional operating systems, you will notice that some familiar configuration parameters are specified elsewhere in the CassandraDatacenter resource, outside of the config section. Do not repeat these parameters inside the config section of the provisioning YAML file; Kubernetes Operator for Apache Cassandra populates them automatically from the CassandraDatacenter resource. For example:
- The cluster_name parameter, which is normally specified in the cassandra.yaml file. The location of this file depends on the type of installation:
  - Package installations: /etc/dse/cassandra/cassandra.yaml
  - Tarball installations: <install_location>/resources/cassandra/conf/cassandra.yaml
- The rack and datacenter parameters.
Similarly, Kubernetes Operator for Apache Cassandra automatically populates any values that are normally customized on a per-node basis. Do not specify these in the CassandraDatacenter resource. For example, do not specify the following in the config key:
- initial_token
- listen_address and other IP addresses.
A large number of keys and values can be specified in the config section. The config key data structure resembles the DataStax OpsCenter Lifecycle Manager (LCM) configuration profiles. If needed, extrapolate the parameters to use from the LCM config profile options.
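A sketch of a config section, using the values from the full NodePort example in this topic. In that example the keys under config appear to be grouped by configuration file: jvm-server-options for JVM settings and cassandra-yaml for cassandra.yaml settings. The small heap and cache values suit only a demo cluster.

```yaml
# Fragment of a CassandraDatacenter spec; other fields omitted.
spec:
  config:
    # JVM settings for the server process.
    jvm-server-options:
      initial_heap_size: "800m"
      max_heap_size: "800m"
    # cassandra.yaml settings. Do not set cluster_name, initial_token,
    # listen_address, or other per-node values here; the operator manages them.
    cassandra-yaml:
      file_cache_size_in_mb: 100
      memtable_space_in_mb: 100
```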
View information about the superuser credentials
By default, Kubernetes Operator for Apache Cassandra creates a Cassandra superuser. A Kubernetes secret named <clusterName>-superuser is created, which contains username and password keys.
To define the superuser with custom credentials instead, skip to Define superuser with custom credentials.
The following example assumes that you already created your environment’s CassandraDatacenter resource for the Kubernetes Operator for Apache Cassandra in the previous configuration topic, and uses the following kubectl commands to return information about the superuser:
kubectl -n my-db-ns get secret cluster1-superuser
Sample output:
NAME                 TYPE     DATA   AGE
cluster1-superuser   Opaque   2      13m
kubectl -n my-db-ns get secret cluster1-superuser -o yaml
Sample output:
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: cluster1-superuser
data:
  password: d0g0UXRaTTg0VzVXbENCZVo4WmNqRWVFMGx0SXVvWnhMU0k5allsampBYnVLWU9WRTU2NENSWEpwY2twYjArSDlmSnZOcHdrSExZVU8rTk11N1BJRWhhZkpXM1U0WitsdlI1U3owcUhzWmNjRHQ0enhTSFpzeHRNcEFiMzNXVWQ3R25IdUE=
  username: Y2x1c3RlcjEtc3VwZXJ1c2Vy
echo Y2x1c3RlcjEtc3VwZXJ1c2Vy | base64 -d
Sample output:
cluster1-superuser
echo 'd0g0UXRaTTg0VzVXbENCZVo4WmNqRWVFMGx0SXVvWnhMU0k5allsampBYnVLWU9WRTU2NENSWEpwY2twYjArSDlmSnZOcHdrSExZVU8rTk11N1BJRWhhZkpXM1U0WitsdlI1U3owcUhzWmNjRHQ0enhTSFpzeHRNcEFiMzNXVWQ3R25IdUE=' | base64 -d
Sample output:
wH4QtZM84W5WlCBeZ8ZcjEeE0ltIuoZxLSI9jYljjAbuKYOVE564CRXJpckpb0+H9fJvNpwkHLYUO+NMu7PIEhafJW3U4Z+lvR5Sz0qHsZccDt4zxSHZsxtMpAb33WUd7GnHuA
Define superuser with custom credentials
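To define the superuser with your own custom credentials rather than take the default values, create a secret with kubectl that contains username and password keys, in the same namespace as the CassandraDatacenter resource.
For example:
kubectl -n my-db-ns create secret generic superuser-secret --from-literal=username=<username> --from-literal=password=<password>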
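If you prefer to keep the credentials in a manifest file such as my-secret.yaml, the following is a minimal sketch of a Secret with the same username and password keys that the default <clusterName>-superuser secret uses. The credential values are placeholders. A file like this can be applied with kubectl -n my-db-ns apply -f my-secret.yaml.

```yaml
# my-secret.yaml: hypothetical Secret holding custom superuser credentials.
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  # Must match spec.superuserSecretName in the CassandraDatacenter resource.
  name: superuser-secret
# stringData takes plain-text values; Kubernetes stores them base64-encoded under data.
stringData:
  username: my-superuser       # placeholder
  password: change-me-please   # placeholder; use a strong password
```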
To use the new superuser secret, specify the name of the secret in the CassandraDatacenter configuration YAML file that you load into the cluster.
For example:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  superuserSecretName: superuser-secret
Specify the server type and version
In the spec section of your YAML configuration file for the CassandraDatacenter resource, specify:
- (Required) serverType: the value must be cassandra or dse.
- (Required) serverVersion:
  - For serverType: dse, the version value must be 6.8.0 or later.
  - For serverType: cassandra, the version value must be 3.11.7 or later.
- (Optional) serverImage: only meant as an override.
DataStax recommends that you do not specify serverImage in the provisioning example file. When serverImage is unspecified, Kubernetes Operator for Apache Cassandra provides the appropriate image path, filename, and supported version.
Using a default image, intentionally not specifying the serverImage:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: dse
  serverVersion: 6.8.4
When you are provisioning Apache Cassandra in the Kubernetes cluster and want to use the default image, change serverType: dse in the previous example to serverType: cassandra, and specify serverVersion: 3.11.7.
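With that change applied, the default-image example looks like the following sketch. Like the DSE example, it shows only the server fields; a complete resource also needs fields such as clusterName, size, and storageConfig.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: cassandra
  serverVersion: "3.11.7"
  # serverImage is intentionally omitted so that the operator selects the default image.
```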
Using a specific Cassandra image:
For example:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: cassandra
  serverVersion: 3.11.7
  serverImage: my-private-docker-registry.example.com/datastax/cassandra-mgmtapi-3_11_7
Using a specific DSE image:
For example:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dtcntr
spec:
  serverType: dse
  serverVersion: 6.8.4
  serverImage: my-private-docker-registry.example.com/datastax/dse-server:6.8.4-20200731
Configure a NodePort service
Request a NodePort service in a CassandraDatacenter configuration YAML by setting the following fields:
spec:
  networking:
    nodePort:
      cql: 30001
      broadcast: 30002
To request the SSL versions of the ports:
spec:
  networking:
    nodePort:
      cqlSSL: 30010
      broadcastSSL: 30020
If any of the nodePort fields are configured, a NodePort service is created that routes from the specified external port to the identically numbered internal port. Cassandra is configured to listen on the specified ports.
Full example of a nodeport-service-dc.yaml file:
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: dse
  serverVersion: "6.8.4"
  managementApiAuth:
    insecure: {}
  networking:
    nodePort:
      cql: 30001
      broadcast: 30002
  size: 2
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  racks:
    - name: r1
    - name: r2
  config:
    jvm-server-options:
      initial_heap_size: "800m"
      max_heap_size: "800m"
    cassandra-yaml:
      file_cache_size_in_mb: 100
      memtable_space_in_mb: 100
Configure a nodeSelector
To pin pods to labeled worker nodes in the Kubernetes cluster, use node selectors. Define node selectors in a Pod spec similar to the following example:
apiVersion: v1
kind: Pod
metadata:
  name: my-db-pod
  labels:
    env: mytest
spec:
  containers:
    - name: my-db-pod
      image: my-db-pod
      imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
If this local configuration file is named my-db-pod-fast-storage.yaml and the namespace is cass-operator, apply the YAML file by running:
kubectl apply -n cass-operator -f my-db-pod-fast-storage.yaml
For more information about nodeSelector options, see the Kubernetes documentation.
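For the database pods that the operator creates, you may prefer to set the selector on the CassandraDatacenter resource instead of on individual Pod specs. The following sketch assumes that your operator version exposes a spec.nodeSelector field (check the CassandraDatacenter CRD for your release) and that your workers carry the disktype=ssd label from the Pod example. Storage and rack settings are omitted for brevity.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: "3.11.7"
  size: 3
  # Assumption: spec.nodeSelector is supported by your operator version.
  # Server pods are then scheduled only on workers labeled disktype=ssd.
  nodeSelector:
    disktype: ssd
```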
Encryption
Kubernetes Operator for Apache Cassandra automates the creation of keystores and truststores for client-to-node and internode encryption. For each datacenter created with the operator, credentials are injected into the stateful set using secrets with the name <datacenter-name>-keystore. To use client-to-node or internode encryption, reference only the injected keystores from the Cassandra parameters provided in the datacenter configuration. For an example, see the datacenter encryption test yamls.
Due to limitations of Kubernetes stateful sets, the current strategy primarily focuses on internode encryption with ca-only verification (peer name verification is not currently available). Peer verification can be achieved with init containers capable of leveraging external certificate issuance architecture to enable per-node and per-client peer name verification.
By storing the certificate authority (CA) in Kubernetes secrets, you can create secrets ahead of time from user-provided or organizational certificate authorities. Leverage a single CA across multiple datacenters by copying the secrets generated for one datacenter to the secondary datacenter prior to launching the secondary datacenter.
Although you can switch from encrypted to unencrypted internode communication and back, the switch cannot currently be applied as a rolling configuration change. The entire cluster must be stopped and started to update these features.
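As a rough sketch of the shape such a configuration can take, internode encryption in Apache Cassandra 3.11 is controlled by the server_encryption_options block of cassandra.yaml, which you can set through the cassandra-yaml key under config. The keystore and truststore paths and passwords below are hypothetical placeholders; replace them with the locations and credentials of the keystores injected from the <datacenter-name>-keystore secret, as shown in the datacenter encryption test yamls.

```yaml
# Fragment of a CassandraDatacenter spec; other fields omitted.
spec:
  config:
    cassandra-yaml:
      server_encryption_options:
        # Encrypt all internode traffic.
        internode_encryption: all
        # Hypothetical paths and passwords: point these at the injected stores.
        keystore: /path/to/injected/keystore.jks
        keystore_password: <keystore-password>
        truststore: /path/to/injected/truststore.jks
        truststore_password: <truststore-password>
```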
Download, optionally customize, and apply the provisioning YAML
If you have not already done so, download the example provisioning YAML file from the DataStax public GitHub repo.
Customize the example YAML to suit your requirements, and save the file, for example, as cluster1-dc1.yaml. Use the kubectl command to apply the YAML file.
For example:
kubectl -n my-db-ns apply -f ./cluster1-dc1.yaml
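If you would rather start from a minimal file than from the downloaded example, the following sketch combines the values used throughout this topic: cluster1, dc1, three racks, and one node per rack. It assumes a storage class named server-storage and uses insecure management API authentication, both taken from the NodePort example in this topic; neither is a production recommendation.

```yaml
# cluster1-dc1.yaml: hypothetical sketch of the topology described in this topic.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: "3.11.7"
  managementApiAuth:
    insecure: {}
  # Three nodes spread across three racks, one node per rack.
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      # Assumes a storage class named server-storage already exists.
      storageClassName: server-storage
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  racks:
    - name: r1
    - name: r2
    - name: r3
```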
As Kubernetes Operator for Apache Cassandra proceeds with the specified deployment in your Kubernetes cluster, watch the list of pods. Completing a deployment takes several minutes per node. The best way to track the operator’s progress is to use the following commands and check the Status and Events values returned.
For example:
kubectl -n my-db-ns get pods
Sample output:
NAME                            READY   STATUS    RESTARTS   AGE
cass-operator-f74447c57-kdf2p   1/1     Running   0          13m
gke-cluster1-dc1-r1-sts-0       1/1     Running   0          5m38s
gke-cluster1-dc1-r2-sts-0       1/1     Running   0          42s
gke-cluster1-dc1-r3-sts-0       1/1     Running   0          6m7s
kubectl -n my-db-ns describe cassdc dc1
Sample output:
...
Status:
  Cassandra Operator Progress:  Updating
  Last Server Node Started:     2021-01-30T11:37:28Z
  Super User Upserted:          2021-01-30T11:38:37Z
Events:
  Type    Reason           Age    From                Message
  ----    ------           ----   ----                -------
  Normal  CreatedResource  9m49s  cassandra-operator  Created service cluster1-dc1-service
  Normal  CreatedResource  9m49s  cassandra-operator  Created service cluster1-seed-service
  Normal  CreatedResource  9m49s  cassandra-operator  Created service cluster1-dc1-all-pods-service
  Normal  CreatedResource  9m49s  cassandra-operator  Created statefulset cluster1-dc1-r1-sts
  Normal  CreatedResource  9m49s  cassandra-operator  Created statefulset cluster1-dc1-r2-sts
  Normal  CreatedResource  9m49s  cassandra-operator  Created statefulset cluster1-dc1-r3-sts
What’s next?
Learn how to use the provisioned cluster in the Kubernetes environment.