Connect to Cassandra and apps from outside the Kubernetes cluster

This topic describes various ways to connect to Apache Cassandra® resources and applications running in Kubernetes from outside the cluster.
Tip: Solutions described in this topic include Ingress Controllers such as Kong and Traefik, along with other options. DataStax provides GitHub-hosted samples to help you get started. See the Ingress samples.

Introduction

When applications run within a Kubernetes cluster, you often need to access their services from outside the cluster. Connecting to a Cassandra cluster running within Kubernetes can range from trivial to complex, depending on where the client is running, latency requirements, and security requirements. This topic presents a number of solutions along with the motivation for each. The following approaches assume that the Cassandra cluster is already up and reported as running.

Pod access

Any pod running within a Kubernetes cluster may communicate with any other pod, provided the container network policies permit it. Within the cluster, communication and service discovery are rarely a problem. The options below address access to pods from outside the cluster.

  • Network supported direct access
    One method for communicating with Cassandra pods involves having Kubernetes run in an environment where the pod network address space is known and advertised, with routes at the network layer. In these types of environments, Border Gateway Protocol (BGP) and static routes may be defined at layer 3 in the OSI model. This allows for IP connectivity routing directly to pods and services running within Kubernetes from both inside and outside the cluster. Additionally, this approach allows for the consumption of service addresses externally. Unfortunately, this requires an advanced understanding of both Kubernetes networking and the infrastructure available within the enterprise or cloud where it is hosted.
    • Pros: Zero additional configuration within the application; works inside and outside of the Kubernetes network.
    • Cons: Requires configuration at the networking layer within the cloud / enterprise environment; not all environments can support this approach. Some cloud environments do not have the tooling exposed for users to enable this functionality.
  • Host network configuration
    Host Network configuration exposes all of the host's network interfaces to the pod instead of a single virtual interface. This allows Cassandra to bind on the worker's interface with an externally accessible IP. Every container launched as part of the pod has access to the host's interfaces; access cannot be restricted to a specific container. To enable this behavior, pass hostNetwork: true in the podTemplateSpec at the top level.
    • Pros: External connectivity is possible as the service is available at the node's IP instead of an IP internal to the Kubernetes cluster.
    • Cons:
      • If a pod is rescheduled the IP address of the pod can change
      • In some Kubernetes distributions this configuration is a privileged operation
      • Additional automation would be required to identify the appropriate IP and set it for listen_address and broadcast_address
      • Only one Cassandra pod may be started per worker, regardless of the allowMultiplePodsPerWorker setting.
  • Host Port configuration
    Host Port configuration is similar to host network, but instead of being applied at the pod level, Host Port is applied to specified containers within the pod. For each port listed in the container's block, a hostPort: external_port key-value pair is included. The external_port is the port number on the Kubernetes worker that should be forwarded to this container's port. Currently, Cass Operator does not allow modifying the cassandra container via podTemplateSpec, so configuring this value is not possible without patching each rack's StatefulSet.
    • Pros: External connectivity is possible as the service is available at the node's IP instead of an IP internal to the Kubernetes cluster; configuration is easier because a separate container to determine the appropriate IP is not required.
    • Cons:
      • If a pod is rescheduled the IP address of the pod can change
      • In some Kubernetes distributions this configuration is a privileged operation
      • Only one Cassandra pod may be started per worker, regardless of the allowMultiplePodsPerWorker setting.
      • Not recommended according to Kubernetes Configuration Best Practices.
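
The two host-level options above can be sketched as the following fragments. They are illustrations under stated assumptions, not tested manifests. Placing hostNetwork under podTemplateSpec.spec follows the standard Kubernetes pod template layout and is assumed here to be what "at the top level" refers to. Port 9042 (Cassandra's default native transport port) and the port name native are chosen for the example. Because Cass Operator does not accept changes to the cassandra container through podTemplateSpec, the second fragment shows the shape a patch to a rack's StatefulSet would need, not something to place in the CassandraDatacenter resource.

```yaml
# Fragment 1 (assumed placement): host networking for every container in the pod,
# set inside a CassandraDatacenter resource.
spec:
  podTemplateSpec:
    spec:
      hostNetwork: true
---
# Fragment 2 (illustrative patch shape): host port on a single container,
# applied to the pod template of a rack's StatefulSet.
spec:
  template:
    spec:
      containers:
        - name: cassandra
          ports:
            - name: native        # hypothetical port name
              containerPort: 9042 # port inside the container
              hostPort: 9042      # port opened on the Kubernetes worker
```

In both cases the worker's IP becomes part of the contact point, which is why a rescheduled pod can appear at a different address.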

Services exposed by Cass Operator

If the application is running within the same Kubernetes cluster as the Cassandra cluster, connectivity is straightforward. Cass Operator exposes a number of services representing a Cassandra cluster, datacenters, and seeds. Applications running within the same Kubernetes cluster may leverage these services to discover and identify pods within the target Cassandra cluster.

Unlike internal apps, external apps do not have access to this information via DNS. It is possible to forward DNS requests to Kubernetes from outside the cluster and resolve the configured services. Unfortunately, this approach returns internal pod IP addresses, which are not routable from outside the cluster unless Network Supported Direct Access is possible within the environment. In most scenarios, external applications will not be able to leverage the services exposed by Cass Operator.

Exposing a Load Balancer service

It is possible to configure a service within Kubernetes, separate from those provided by Cass Operator, that is accessible from outside of the Kubernetes cluster. Such a service has a type: LoadBalancer key in its spec: block. In most cloud environments, this configuration results in a native cloud load balancer being provisioned with an external IP and pointed at the appropriate pods. Once the load balancer is provisioned and running, kubectl get svc displays the external IP address that points at the Cassandra nodes.
  • Pros: Available from outside the Kubernetes cluster.
  • Cons:
    • Requires a client-side AddressTranslator to prevent the drivers from attempting to connect directly to pods and to direct connections to the load balancer instead.
    • Removes the possibility of a TokenAwarePolicy Load Balancing Policy (LBP).
    • Does not support Transport Layer Security (TLS) termination at the service layer; TLS must be terminated within the application.
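
A minimal sketch of such a service is shown here. The service name cassandra-external, the cluster name cluster1, and the datacenter name dc1 are hypothetical. The selector labels are an assumption about the labels Cass Operator applies to Cassandra pods and should be checked against the running pods (for example with kubectl get pods --show-labels) before use.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cassandra-external   # hypothetical name
spec:
  type: LoadBalancer         # asks the cloud provider for an external load balancer
  selector:
    # Assumed pod labels; verify against the actual pods.
    cassandra.datastax.com/cluster: cluster1
    cassandra.datastax.com/datacenter: dc1
  ports:
    - name: native
      protocol: TCP
      port: 9042             # port exposed by the load balancer
      targetPort: 9042       # Cassandra native transport port on the pod
```

Because every connection passes through the single external IP, the driver cannot address individual pods, which is the reason for the AddressTranslator requirement and the loss of token-aware routing listed above.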

Ingress introduction

Tip: See the Ingress GitHub-hosted samples provided by DataStax.

Ingress is a feature that forwards requests to services running within a Kubernetes cluster based on rules. These rules may include specifying the protocol, port, or even path. They may provide additional functionality like termination of SSL / TLS traffic, load balancing across a number of protocols, and name-based virtual hosting.

Behind the Ingress Kubernetes type is an Ingress Controller. There are a number of controllers available with varying features to service the defined ingress rules. Think of Ingress as an interface for routing and an Ingress Controller as the implementation of that interface. In this way, any number of Ingress Controllers may be used based on the workload requirements.

Ingress Controllers function at Layers 4 and 7 of the OSI model. When the Ingress specification was created, it focused specifically on HTTP / HTTPS workloads. From the documentation: "An Ingress does not expose arbitrary ports or protocols. Exposing services other than HTTP and HTTPS to the internet typically uses a service of type Service.Type=NodePort or Service.Type=LoadBalancer."

Cassandra workloads don't use HTTP, but rather a Cassandra-specific protocol over TCP. Any Ingress Controller used with Cassandra must therefore support TCP load balancing. This approach provides routing semantics similar to those of LoadBalancer Services.

If the Ingress Controller also supports SSL termination with Server Name Indication (SNI), then secure access is possible from outside the cluster while keeping Token Aware routing support. Additionally, operators should consider whether the chosen Ingress Controller supports client SSL certificates allowing for Mutual TLS to restrict access from unauthorized clients.

  • Pros:
    • Highly-available entry point into the cluster
    • Some implementations support TCP load balancing
    • Some implementations support Mutual TLS (mTLS)
    • Some implementations support SNI
  • Cons:
    • No standard implementation. Requires careful selection.
    • Initially designed for HTTP/HTTPS-only workloads.
      Note: Many Ingress Controllers support pure TCP workloads, but this support is not defined in the original design specification. Some configurations require fairly heavy-handed templating of base configuration files, which may make future upgrades of those components difficult.
    • Only some implementations support TCP load balancing
    • Only some implementations support mTLS
    • Only some implementations support SNI with TCP workloads

Kong as an Ingress

Kong is an open-source API gateway. Built for multi-cloud and hybrid, Kong is optimized for microservices and distributed architectures. Kong does not have to be deployed on Kubernetes; it supports a multitude of environments. The DataStax GitHub-hosted sample installs Kong as an Ingress for a Kubernetes cluster.
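
As a rough sketch of what TCP routing through Kong can look like, the fragment below uses the TCPIngress custom resource from the Kong Ingress Controller. It is not taken from the DataStax sample. The resource name cassandra-native and the backend service name cluster1-dc1-service (for a hypothetical cluster1 / dc1 deployment) are assumptions, and the API version and annotation may differ between Kong Ingress Controller releases, so check them against the installed version.

```yaml
apiVersion: configuration.konghq.com/v1beta1
kind: TCPIngress
metadata:
  name: cassandra-native              # hypothetical name
  annotations:
    kubernetes.io/ingress.class: kong
spec:
  rules:
    - port: 9042                      # port Kong listens on for raw TCP
      backend:
        serviceName: cluster1-dc1-service  # assumed datacenter service name
        servicePort: 9042                  # Cassandra native transport port
```

Kong itself must also be configured to listen for TCP streams on the chosen port; that part depends on how Kong was installed and is omitted here.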

Traefik as an Ingress Controller

Traefik is an open-source Edge Router that is designed to work in a number of environments, and not just Kubernetes. When running on Kubernetes, Traefik is generally installed as an Ingress Controller. Traefik supports TCP load balancing along with SSL termination and SNI. It is automatically included as the default Ingress Controller of K3s and K3d.
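
A hedged sketch of a plain TCP route in Traefik is shown here, using the IngressRouteTCP custom resource. It is not taken from the DataStax sample. The entry point name cassandra, the resource name, and the backend service cluster1-dc1-service (for a hypothetical cluster1 / dc1 deployment) are assumptions. The API group shown is the one used by Traefik 2.x; newer releases use traefik.io/v1alpha1.

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: cassandra-native            # hypothetical name
spec:
  entryPoints:
    - cassandra                     # assumed TCP entry point defined in Traefik's static configuration
  routes:
    - match: HostSNI(`*`)           # without TLS, only the wildcard match is possible
      services:
        - name: cluster1-dc1-service  # assumed datacenter service name
          port: 9042                  # Cassandra native transport port
```

With TLS enabled on the route, the HostSNI match can name individual hosts instead of the wildcard, which is what makes SNI-based routing to specific Cassandra nodes possible.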

Sample Java driver configurations for Ingress

Each of the three reference implementations has a corresponding configuration in the sample-java-application subdirectory with associated configuration files and sample code.

Sample CassandraDatacenter reference for Ingress

See sample-cluster-sample-dc.yaml.

SSL certificate generation for Ingress

See ssl/README.md for directions on creating the CA, client, and ingress certificates.

What's next?

For related information, refer to the Accessing Kubernetes Pods from Outside of the Cluster blog.