Use Cassandra or DSE in Kubernetes with Cass Operator

This topic explains how to use your configured and provisioned Apache Cassandra® or DSE cluster in Kubernetes.

Tip: If you haven't already, create a Kubernetes cluster. For a walkthrough of the steps, especially if you're new to Kubernetes, see the Google Kubernetes Engine (GKE) cloud example in this guide's topic, Create a Kubernetes cluster.

This topic assumes you've completed the steps to configure Cass Operator and to provision and deploy a Cassandra or DSE cluster in your existing Kubernetes environment.

Connecting from inside the Kubernetes cluster

For an example of invoking cqlsh from inside a Kubernetes cluster, refer to Connect to Cassandra via cqlsh within Kubernetes cluster.

Cass Operator creates a Kubernetes headless service named <clusterName>-<datacenterName>-service. Any client app inside the Kubernetes cluster that submits CQL commands should be able to connect to this service (for example, cluster1-dc1-service.my-db-ns, where my-db-ns is the namespace) and use the nodes it resolves to as contact points, in round-robin fashion.
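
The following Java sketch illustrates the naming pattern and the round-robin idea. It assumes the service name is qualified by the namespace (my-db-ns, the namespace used in this topic's kubectl examples), and the class and method names are hypothetical. In a real app, you would normally hand the resolved addresses to the DataStax driver as contact points rather than cycle through them yourself.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper: builds the headless-service hostname that Cass Operator
 * creates (<clusterName>-<datacenterName>-service, qualified by namespace) and
 * hands out the resolved node addresses in round-robin order.
 */
public final class HeadlessServiceRoundRobin {
    private final List<String> nodeAddresses;
    private final AtomicInteger next = new AtomicInteger();

    public HeadlessServiceRoundRobin(List<String> nodeAddresses) {
        if (nodeAddresses.isEmpty()) {
            throw new IllegalArgumentException("no node addresses");
        }
        this.nodeAddresses = List.copyOf(nodeAddresses);
    }

    /** For example: serviceHost("cluster1", "dc1", "my-db-ns") returns "cluster1-dc1-service.my-db-ns". */
    public static String serviceHost(String clusterName, String datacenterName, String namespace) {
        return clusterName + "-" + datacenterName + "-service." + namespace;
    }

    /** A headless service has no cluster IP, so DNS returns an address for each backing pod. */
    public static HeadlessServiceRoundRobin resolve(String host) throws UnknownHostException {
        List<String> addresses = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            addresses.add(address.getHostAddress());
        }
        return new HeadlessServiceRoundRobin(addresses);
    }

    /** Returns the next node address, wrapping around at the end of the list. */
    public String nextNode() {
        return nodeAddresses.get(Math.floorMod(next.getAndIncrement(), nodeAddresses.size()));
    }

    public static void main(String[] args) throws UnknownHostException {
        // This name resolves only from inside the Kubernetes cluster.
        HeadlessServiceRoundRobin nodes = resolve(serviceHost("cluster1", "dc1", "my-db-ns"));
        for (int i = 0; i < 3; i++) {
            System.out.println(nodes.nextNode());
        }
    }
}
```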

When an app uses the DataStax Java driver to connect to a DSE cluster from within the Kubernetes cluster, there are two overall choices. The Java driver accepts either:
  • multiple Inet[Socket]Address contact points, or
  • a single Inet[Socket]Address contact point, in which case the driver uses it for the control connection, queries the cluster to discover the other nodes, and then connects to them.

With version 4.x of the DataStax Java driver, when you specify a hostname as a contact point in the configuration file, the driver first resolves all the addresses associated with that hostname. For a cluster deployed by Cass Operator, the hostname of the Kubernetes service (for example, cluster1-dc1-service) resolves to all the IP addresses of the DSE nodes. The driver chooses one of those nodes for the control connection, connects to the other resolved nodes, and performs cluster verification against all of its connected local DSE nodes.

For example, if you configure the DataStax Java driver programmatically, your app might use either of the following:
  • InetAddress.getByName("cluster1-dc1-service"), which resolves to a single host. The driver uses that host at initialization time and then connects to DSE to discover the rest of the nodes.
  • InetAddress.getAllByName("cluster1-dc1-service"), which resolves directly to all the nodes. The driver treats the result as if you had specified each node's IP address as a contact point.
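
The Java sketch below shows both options side by side. It only builds the contact-point lists and does not open a session; it assumes the default CQL native transport port, 9042, and its class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that turns the service hostname into driver contact points. */
public final class ContactPoints {
    static final int CQL_PORT = 9042; // assumption: default CQL native transport port

    /** One address: the driver bootstraps from it and discovers the other nodes itself. */
    public static List<InetSocketAddress> single(String host) throws UnknownHostException {
        return List.of(new InetSocketAddress(InetAddress.getByName(host), CQL_PORT));
    }

    /** Every address behind the hostname, as if each node IP had been listed explicitly. */
    public static List<InetSocketAddress> all(String host) throws UnknownHostException {
        List<InetSocketAddress> points = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            points.add(new InetSocketAddress(address, CQL_PORT));
        }
        return points;
    }

    public static void main(String[] args) throws UnknownHostException {
        // The default name resolves only from inside the Kubernetes cluster.
        String host = args.length > 0 ? args[0] : "cluster1-dc1-service";
        System.out.println("single: " + single(host));
        System.out.println("all:    " + all(host));
    }
}
```

With version 4.x of the driver, a list like either of these could be passed to CqlSession.builder().addContactPoints(...), usually together with withLocalDatacenter("dc1"); confirm the builder methods against the driver documentation for your version.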

Connecting from outside the Kubernetes cluster

Applications that run within a Kubernetes cluster often need to be reachable from outside it. Connecting to a Cassandra cluster running within Kubernetes can range from trivial to complex, depending on where the client runs and on its latency and security requirements. See Connect to Cassandra and apps from outside the Kubernetes cluster.

Scaling up the datacenter

The size parameter of the CassandraDatacenter resource determines how many Cassandra or DSE instances are present in the datacenter. To add nodes, increase size in the YAML file described in the steps of the provisioning topic. Then reapply the CassandraDatacenter configuration using the same command shown in that topic:
kubectl -n my-db-ns apply -f ./cluster1-dc1.yaml
When you reapply the YAML with the additional nodes defined, Cass Operator reconciles the change and Kubernetes adds the new pods to your datacenter, provided there are enough Kubernetes worker nodes available.
Important: As part of the scaling-up process, each rack in the datacenter must contain the same number of server instances.
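
Because every rack must hold the same number of instances, a new size value should be a whole multiple of the number of racks. The Java sketch below expresses that arithmetic as a pre-check you might run before editing the YAML; it is a hypothetical helper, not part of Cass Operator.

```java
/**
 * Hypothetical pre-check (not part of Cass Operator): every rack must hold the
 * same number of server instances, so the new size must divide evenly by the
 * number of racks.
 */
public final class RackBalance {
    /** Returns the number of nodes each rack gets, or throws if size cannot be split evenly. */
    public static int nodesPerRack(int size, int rackCount) {
        if (size <= 0 || rackCount <= 0) {
            throw new IllegalArgumentException("size and rackCount must be positive");
        }
        if (size % rackCount != 0) {
            throw new IllegalArgumentException(
                    "size " + size + " cannot be split evenly across " + rackCount + " racks");
        }
        return size / rackCount;
    }

    public static void main(String[] args) {
        // Three racks (r1, r2, r3), as in the pod listing shown later in this topic.
        System.out.println(nodesPerRack(6, 3)); // prints 2
    }
}
```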

Changing the server configuration

To change the Cassandra or DSE configuration, edit the config section under the spec key of the CassandraDatacenter definition in the same YAML file. Then reapply the CassandraDatacenter configuration using the same command shown in the provisioning topic:
kubectl -n my-db-ns apply -f ./cluster1-dc1.yaml
Important: Cass Operator updates the config and restarts one node at a time in a rolling fashion.

Establishing a multi-datacenter cluster

To set up a multi-datacenter cluster, create two CassandraDatacenter resources and give both the same clusterName value in their spec.

Note: Multi-region clusters and advanced workloads are not supported, which makes Cass Operator unsuitable for many multi-datacenter use cases.

Using kubectl to monitor resources in the Kubernetes cluster

Use kubectl commands to get more information about the Cassandra or DSE pods running in the Kubernetes cluster.
  • To get information about ongoing or recent events:
    kubectl get event --all-namespaces
    Note: By default, Kubernetes gives each event a Time to Live (TTL) of only one hour.
  • To check the operator instance's Kubernetes log for errors, use kubectl logs. First, get the name of the operator pod by running kubectl get pod with your namespace. Example:
    kubectl -n my-db-ns get pod
    NAME                            READY   STATUS    RESTARTS   AGE
    cass-operator-f74447c57-kdf2p   1/1     Running   0          13m
    gke-cluster1-dc1-r1-sts-0       1/1     Running   0          5m38s
    gke-cluster1-dc1-r2-sts-0       1/1     Running   0          42s
    gke-cluster1-dc1-r3-sts-0       1/1     Running   0          6m7s
    Then use kubectl logs. The log entries may be large; consider writing the output to a file. Example:
    kubectl -n my-db-ns logs cass-operator-f74447c57-kdf2p > ~/cass-operator-log.txt
    Tip: To tail the Cassandra/DSE logs, use a command such as:
    kubectl -n my-db-ns logs --container server-system-logger --follow gke-cluster1-dc1-r1-sts-0
  • You can also use the kubectl describe pod command to get identifying information about your pod. Example:
    kubectl -n my-db-ns describe pod cass-operator-f74447c57-kdf2p
    Name:         cass-operator-f74447c57-kdf2p
    Namespace:    my-db-ns
    Priority:     0
    Node:         ip-10-101-34-70.srv101.myinternal.org/10.101.34.70
    Start Time:   Thu, 10 Sep 2020 23:39:42 -0600
    Labels:       name=cass-operator
                  pod-template-hash=f74447c57
    Annotations:  <none>
    Status:       Running
    IP:           10.244.2.2
    IPs:
      IP:           10.244.2.2
    Controlled By:  ReplicaSet/cass-operator-f74447c57
    Containers:
      dse-operator:
        Container ID:   docker://bacfba382ed6be8893a0c344089d40fbb6c36db93a3e3677464390dd358fef35
        Image:          datastax/cass-operator:1.4.1-20200910
        Image ID:       docker-pullable://datastax/cass-operator@sha256:4e80f26c54594133a99adefc9e2e7e9b2b5915788d8c6b24457407e2d470a36a
        Port:           <none>
        Host Port:      <none>
        State:          Running
          Started:      Thu, 10 Sep 2020 23:39:51 -0600
        Ready:          True
        Restart Count:  0
        Environment:
          WATCH_NAMESPACE:  my-db-ns (v1:metadata.namespace)
          POD_NAME:         cass-operator-f74447c57-kdf2p (v1:metadata.name)
          OPERATOR_NAME:    cass-operator
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from cass-operator-token-q9hq5 (ro)
    Conditions:
      Type              Status
      Initialized       True 
      Ready             True 
      ContainersReady   True 
      PodScheduled      True 
    Volumes:
      cass-operator-token-q9hq5:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  cass-operator-token-q9hq5
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:          <none>
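
If you check pod status often, a small program can pick out the pods to investigate first. The Java sketch below parses the table printed by kubectl get pod (assuming the default NAME, READY, STATUS, RESTARTS, and AGE columns) and returns the pods whose containers are not all ready or whose status is not Running. The class name is hypothetical, and for scripted use, kubectl's structured output (for example, -o json) is sturdier than parsing columns.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: scans the table printed by `kubectl get pod` and returns
 * the pods whose containers are not all ready or whose STATUS is not Running.
 * Assumes the default columns: NAME READY STATUS RESTARTS AGE.
 */
public final class PodTriage {
    public static List<String> podsNeedingAttention(String getPodOutput) {
        List<String> flagged = new ArrayList<>();
        String[] lines = getPodOutput.trim().split("\\R");
        for (int i = 1; i < lines.length; i++) { // line 0 is the header row
            String[] cols = lines[i].trim().split("\\s+");
            if (cols.length < 3) {
                continue; // not a pod row
            }
            String[] ready = cols[1].split("/"); // READY column, e.g. "1/1"
            boolean allReady = ready.length == 2 && ready[0].equals(ready[1]);
            if (!allReady || !cols[2].equals("Running")) {
                flagged.add(cols[0]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // The sample output from the kubectl get pod example above.
        String sample = String.join("\n",
                "NAME                            READY   STATUS    RESTARTS   AGE",
                "cass-operator-f74447c57-kdf2p   1/1     Running   0          13m",
                "gke-cluster1-dc1-r1-sts-0       1/1     Running   0          5m38s",
                "gke-cluster1-dc1-r2-sts-0       1/1     Running   0          42s",
                "gke-cluster1-dc1-r3-sts-0       1/1     Running   0          6m7s");
        System.out.println(podsNeedingAttention(sample)); // prints []
    }
}
```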

What's next

Learn how to use the metric reporter dashboards for Cassandra or DSE clusters in Kubernetes.