Connect to Cassandra and apps from outside the Kubernetes cluster

This topic describes various ways to connect to Apache Cassandra® resources and applications running in Kubernetes from outside of the cluster.

Solutions described in this topic include ingress controllers such as Kong and Traefik, among other options. DataStax provides GitHub-hosted samples to help you get started. See the Ingress samples.

Introduction

When applications run within a Kubernetes cluster, you may also need to access those services from outside the cluster. Connecting to a Cassandra cluster running within Kubernetes can range from trivial to complex, depending on where the client is running, on latency requirements, and on security requirements. This topic provides a number of solutions along with the motivation for each. These solutions assume that the Cassandra cluster is already up and reported as running.

Pod access

Any pod running within a Kubernetes cluster may communicate with any other pod, provided the configured network policies permit it. As a result, communication and service discovery within a Kubernetes cluster is rarely an issue.

  • Network supported direct access

    One method for communicating with Cassandra pods involves having Kubernetes run in an environment where the pod network address space is known and advertised, with routes at the network layer. In these types of environments, Border Gateway Protocol (BGP) and static routes may be defined at layer 3 in the Open Systems Interconnection (OSI) model. This allows for IP connectivity routing directly to pods and services running within Kubernetes from both inside and outside the cluster. Additionally, this approach allows for the consumption of service addresses externally. Unfortunately, this requires an advanced understanding of both Kubernetes networking and the infrastructure available within the enterprise or cloud where it is hosted.

    • Pros: Zero additional configuration within the application; works inside and outside of the Kubernetes network.

    • Cons: Requires configuration at the networking layer within the cloud / enterprise environment; not all environments can support this approach. Some cloud environments do not have the tooling exposed for users to enable this functionality.

  • Host network configuration

    Host Network configuration exposes all network interfaces to the underlying pod instead of a single virtual interface. This allows Cassandra to bind on the worker’s interface with an externally accessible IP. Any container that is launched as part of the pod has access to the host’s interface; it cannot be fenced off to a specific container. To enable this behavior, pass hostNetwork: true in the podTemplateSpec at the top level.

    • Pros: External connectivity is possible as the service is available at the node’s IP instead of an IP internal to the Kubernetes cluster.

    • Cons:

      • If a pod is rescheduled then the IP address of the pod can change.

      • In some Kubernetes distributions this configuration is a privileged operation.

      • Additional automation would be required to identify the appropriate IP and set it for listen_address and broadcast_address.

      • Only one Cassandra pod may be started per worker, regardless of the allowMultiplePodsPerWorker setting.

  • Host Port configuration

    Host Port configuration is similar to host network, but instead of being applied at the pod level, Host Port is applied to specified containers within the pod. For each port listed in the container’s block, a hostPort: external_port key value is included. The external_port is the port number on the Kubernetes worker that should be forwarded to this container’s port. Currently, Kubernetes Operator for Apache Cassandra does not allow modifying the cassandra container via podTemplateSpec. Configuring this value is not possible without patching each rack’s stateful set.

    • Pros: External connectivity is possible as the service is available at the node’s IP instead of an IP internal to the Kubernetes cluster; easier configuration – a separate container to determine the appropriate IP is not required.

    • Cons:

      • If a pod is rescheduled then the IP address of the pod can change.

      • In some Kubernetes distributions this configuration is a privileged operation.

      • Only one Cassandra pod may be started per worker, regardless of the allowMultiplePodsPerWorker setting.

      • Not recommended according to Kubernetes Configuration Best Practices.
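As a sketch of the host network approach described above, the hostNetwork flag can be set through the operator’s podTemplateSpec. The cluster name, datacenter name, and server version below are illustrative placeholders, not values from your deployment.

```yaml
# Hypothetical CassandraDatacenter fragment enabling host networking.
# Names (cluster1, dc1) and the server version are placeholders.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1
  serverType: cassandra
  serverVersion: "3.11.7"
  size: 3
  podTemplateSpec:
    spec:
      hostNetwork: true   # expose the worker's network interfaces to the pod
```

Remember that with hostNetwork enabled, only one Cassandra pod can be scheduled per worker, and additional automation is still needed to set listen_address and broadcast_address appropriately.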

Services exposed by Kubernetes Operator for Apache Cassandra

If the application is running within the same Kubernetes cluster as the Cassandra cluster, connectivity is straightforward. Kubernetes Operator for Apache Cassandra exposes a number of services representing a Cassandra cluster, datacenters, and seeds. Applications running within the same Kubernetes cluster may leverage these services to discover and identify pods within the target Cassandra cluster.
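For example, an in-cluster application using the DataStax Java driver could point its contact points at the datacenter service DNS name. The cluster name (cluster1), datacenter (dc1), and namespace (cass-operator) below are assumptions; substitute the names from your own deployment.

```hocon
# Hypothetical driver application.conf for an app running
# in the same Kubernetes cluster as Cassandra.
# cluster1, dc1, and the cass-operator namespace are placeholders.
datastax-java-driver {
  basic.contact-points = [ "cluster1-dc1-service.cass-operator.svc.cluster.local:9042" ]
  basic.load-balancing-policy.local-datacenter = "dc1"
}
```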

Unlike internal apps, external apps do not have access to this information via DNS. It is possible to forward DNS requests to Kubernetes from outside the cluster and resolve configured services. Unfortunately, this approach returns the internal pod IP addresses, which are not routable externally unless Network Supported Direct Access is possible within the environment. In most scenarios, external applications are not able to leverage the services exposed by Kubernetes Operator for Apache Cassandra.

Exposing a Load Balancer service

It is possible to configure a service within Kubernetes, outside of those provided by Kubernetes Operator for Apache Cassandra, that is accessible from outside of the Kubernetes cluster. These services have a type: LoadBalancer key in the spec: block. In most cloud environments, this configuration results in a native cloud load balancer being provisioned to point at the appropriate pods with an external IP. Once the load balancer is provisioned and running, kubectl get svc displays the external IP address that is pointed at the Cassandra nodes.
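A minimal sketch of such a service follows. The selector labels shown follow the conventions used by Kubernetes Operator for Apache Cassandra, and the cluster and datacenter names (cluster1, dc1) are placeholders; match the selector to the labels actually present on your Cassandra pods.

```yaml
# Hypothetical LoadBalancer service targeting Cassandra pods.
# Names (cluster1, dc1) and selector labels are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: cluster1-dc1-external
spec:
  type: LoadBalancer
  ports:
    - name: cql
      port: 9042
      targetPort: 9042
  selector:
    cassandra.datastax.com/cluster: cluster1
    cassandra.datastax.com/datacenter: dc1
```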

  • Pros: Available from outside the Kubernetes cluster.

  • Cons:

    • Requires a client-side AddressTranslator to keep the drivers from attempting to connect directly to pods, directing connections to the load balancer instead.

    • Removes the possibility of a TokenAwarePolicy Load Balancing Policy (LBP).

    • Does not support Transport Layer Security (TLS) termination at the service layer, but rather within the application.
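The driver side of the AddressTranslator requirement might look like the following configuration fragment. The translator class shown is a hypothetical custom implementation (one you would write to map every node address to the load balancer), and the contact-point IP is a placeholder.

```hocon
# Hypothetical Java driver configuration; LoadBalancerAddressTranslator
# is a custom class you would implement, not a built-in.
datastax-java-driver {
  basic.contact-points = [ "203.0.113.10:9042" ]  # load balancer's external IP (placeholder)
  advanced.address-translator.class = com.example.LoadBalancerAddressTranslator
}
```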

Ingress introduction

See the Ingress GitHub-hosted samples provided by DataStax.

Ingress is a feature that forwards requests to services running within a Kubernetes cluster based on rules. These rules may include specifying the protocol, the port, or even the path. Rules may provide additional functionality such as termination of SSL or TLS traffic, load balancing across a number of protocols, and name-based virtual hosting.

Behind the Ingress Kubernetes type is an Ingress Controller. There are a number of controllers available with varying features to service the defined ingress rules. Think of Ingress as an interface for routing and an Ingress Controller as the implementation of that interface. In this way, any number of Ingress Controllers may be used based on the workload requirements.

Ingress Controllers function at Layers 4 and 7 of the OSI model. At creation, the Ingress specification focused specifically on HTTP or HTTPS workloads. From the documentation: "An Ingress does not expose arbitrary ports or protocols. Exposing services other than HTTP and HTTPS to the internet typically uses a service of type Service.Type=NodePort or Service.Type=LoadBalancer."

Cassandra workloads do not use HTTP as a protocol, but rather a specific TCP protocol. Leveraging Ingress Controllers requires support for TCP load balancing. This approach provides routing semantics similar to those of LoadBalancer Services.

If the Ingress Controller also supports SSL termination with Server Name Indication (SNI), then secure access is possible from outside the cluster while keeping Token Aware routing support. Additionally, operators should consider whether the chosen Ingress Controller supports client SSL certificates, allowing for Mutual TLS to restrict access from unauthorized clients.

  • Pros:

    • Highly-available entry point into the cluster.

    • Some implementations support TCP load balancing.

    • Some implementations support Mutual TLS (mTLS).

    • Some implementations support SNI.

  • Cons:

    • No standard implementation. Requires careful selection.

    • Initially designed for HTTP or HTTPS only workloads.

      Many ingresses support pure TCP workloads, but such support is not defined in the original design specification. Some configurations require fairly heavy-handed templating of base configuration files, which may complicate future upgrades of those components.

    • Only some implementations support TCP load balancing.

    • Only some implementations support mTLS.

    • Only some implementations support SNI with TCP workloads.

Kong as an Ingress

Kong is an open-source API gateway. Built for multi-cloud and hybrid deployments, Kong is optimized for microservices and distributed architectures. Kong does not have to be deployed on Kubernetes; it supports a multitude of environments. The DataStax GitHub-hosted sample installs Kong as an Ingress for a Kubernetes cluster.
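As a sketch, Kong models plain TCP routing with its TCPIngress custom resource. The service name, ports, and resource name below are placeholders; consult the DataStax sample for a complete, tested configuration.

```yaml
# Hypothetical Kong TCPIngress forwarding a TCP port to Cassandra.
# The service name (cluster1-dc1-service) and ports are placeholders.
apiVersion: configuration.konghq.com/v1beta1
kind: TCPIngress
metadata:
  name: cassandra-tcp
  annotations:
    kubernetes.io/ingress.class: kong
spec:
  rules:
    - port: 9042                          # port Kong listens on
      backend:
        serviceName: cluster1-dc1-service # Cassandra datacenter service
        servicePort: 9042
```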

See the sample implementations in the DataStax GitHub-hosted Ingress samples.

Traefik as an Ingress Controller

Traefik is an open-source Edge Router that is designed to work in a number of environments, and not just Kubernetes. When running on Kubernetes, Traefik is generally installed as an Ingress Controller. Traefik supports TCP load balancing along with SSL termination and SNI. It is automatically included as the default Ingress Controller of K3s and K3d.
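A sketch of Traefik's TCP routing with SNI follows, using its IngressRouteTCP custom resource. The entry point, host name, and service name are placeholders, and the entry point itself must be defined separately in Traefik's static configuration; see the DataStax sample for a complete setup.

```yaml
# Hypothetical Traefik IngressRouteTCP using SNI-based routing
# with TLS passthrough to the Cassandra nodes.
# Entry point, host name, and service name are placeholders.
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: cassandra-sni
spec:
  entryPoints:
    - cassandra                # TCP entry point from Traefik's static config
  routes:
    - match: HostSNI(`node-0.cluster1.example.com`)
      services:
        - name: cluster1-dc1-service
          port: 9042
  tls:
    passthrough: true          # terminate TLS at Cassandra, not at Traefik
```

Because routing is keyed on the SNI host name, each Cassandra node can be addressed individually, which is what preserves Token Aware routing through the ingress.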

See the sample implementations in the DataStax GitHub-hosted Ingress samples.

Sample Java driver configurations for Ingress

Each of the three reference implementations has a corresponding configuration in the sample-java-application subdirectory with associated configuration files and sample code.

SSL certificate generation for Ingress

See ssl/README.md for directions around creating a CA, a client, and ingress certificates.

What’s next?

For related information, refer to the Accessing Kubernetes Pods from Outside of the Cluster blog.


© 2024 DataStax | Privacy policy | Terms of use
