Load balancing with DataStax drivers
DataStax drivers use load balancing policies to control the distribution of requests across a cluster. For a given query execution, the load balancing policy determines the node that coordinates the query execution and the nodes that can be used as failover hosts, if any.
The driver creates and maintains connection pools for nodes that are selected by the load balancing policy.
DataStax drivers offer built-in and custom load balancing policies. Generally, DataStax recommends the default built-in load balancing policy. However, you must determine the ideal load balancing policy for your application.
Coordinator selection in load balancing policies
Each time a query is executed, the load balancing policy returns a query plan that determines which hosts are eligible to receive the query. The driver uses the first host on the list to execute the request, leaving the successive hosts for retry and speculative execution.
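The following is a minimal sketch of how a query plan can be consumed, written with the Go standard library only. The `Host` type and the `send` and `executeWithPlan` functions are hypothetical names invented for illustration and are not part of any DataStax driver API. Real drivers also consult retry and speculative execution policies before moving to the next host, which this sketch omits.

```go
package main

import (
	"errors"
	"fmt"
)

// Host is a hypothetical stand-in for a cluster node; it is not a driver type.
type Host struct {
	Addr string
	Up   bool
}

// send is a hypothetical request function that fails when the host is down.
func send(h Host, query string) (string, error) {
	if !h.Up {
		return "", errors.New("host unavailable: " + h.Addr)
	}
	return "result from " + h.Addr, nil
}

// executeWithPlan uses the first host in the plan as the coordinator and
// moves on to the next host in the plan only when an attempt fails.
func executeWithPlan(plan []Host, query string) (string, error) {
	lastErr := errors.New("empty query plan")
	for _, h := range plan {
		res, err := send(h, query)
		if err == nil {
			return res, nil
		}
		lastErr = err
	}
	return "", lastErr
}

func main() {
	// The first host is down, so the request falls through to the second.
	plan := []Host{{"10.0.0.1", false}, {"10.0.0.2", true}, {"10.0.0.3", true}}
	res, err := executeWithPlan(plan, "SELECT ...")
	fmt.Println(res, err)
}
```

Ordering matters here: because hosts are tried in plan order, the load balancing policy controls both the preferred coordinator and the failover sequence.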
Token awareness in load balancing policies
A token-aware load balancing policy routes requests, by priority, to the replicas that own the data being queried.
Specifically, token awareness retrieves replica nodes based on the primary key information for a given query and parameters. Selecting replicas ensures that the query’s coordinator is also a node that owns the data being written or read. This avoids an extra network hop between the coordinator and a replica on the server side.
Token awareness is common to all drivers.
For prepared statement executions, the key is automatically calculated to obtain accurate query routing.
Datacenter awareness in load balancing policies
A datacenter-aware load balancing policy limits requests to a specific datacenter to ensure that the data is returned to the user as efficiently as possible. For example, in a global application, requests from users in North America would be directed to a datacenter in North America. Similarly, requests from users in Europe would be routed to a datacenter in Europe.
To accomplish this, you must specify the local datacenter in the driver’s load balancing policy for Cassandra, DSE, and HCD clusters. For Astra DB, this is provided by the database’s Secure Connect Bundle (SCB).
Explicitly set the local datacenter in datacenter-aware load balancing policies
When using a datacenter-aware load balancing policy, make sure that your application explicitly sets the local datacenter in the cluster object, instead of allowing the driver to infer the local datacenter from the contact points.
If a driver incorrectly selects a remote datacenter instead of the actual closest datacenter, it increases cross-datacenter traffic, which often increases latency and costs compared to intra-datacenter traffic.
For Astra DB, the contact points and datacenter information are provided by the database’s Secure Connect Bundle (SCB).
For Cassandra, DSE, and HCD, it is possible to include contact points in remote datacenters or invalid datacenters in your cluster object’s configuration. For example, an application could include contact points for an internal datacenter used during testing. Explicitly setting the local datacenter avoids these types of errors.
Failover to remote datacenters can spike latency
If requests to the local datacenter fail, most drivers support using remote datacenter hosts for queries. Although this seems like a datacenter failover strategy, this configuration can lead to latency spikes and unexpected behaviors in your application. For more information, see Designing Fault Tolerant Applications with DataStax and Apache Cassandra®.
Configure load balancing policies
DataStax drivers offer built-in and custom load balancing policies, and each driver has a default load balancing policy.
Default load balancing policies include token awareness and datacenter awareness. Specifically, the default policy does the following:
- Retrieves the replicas for a given token.
- Returns a list of hosts in the local datacenter. Hosts that contain the replicas are listed first, followed by the rest of the nodes in the specified local datacenter.
- Uses a load-distributing algorithm to fairly distribute the load across the selected replica nodes.
If the default policy isn’t suitable or your application requires custom routing and load balancing, you can use another built-in policy or extend the existing load balancing interface. For example, some drivers allow filtering policies to restrict the list of hosts that are eligible to receive a query.
For more information about default, built-in, and custom load balancing policies, see your driver’s documentation:
C/C++ driver load balancing policies
For general load balancing policy information, see C/C++ driver load balancing.
For specific functions and parameters, such as cass_cluster_set_load_balance_dc_aware, see struct.CassCluster.
C# driver load balancing policies
GoCQL driver host policies
Specify the PoolConfig.HostSelectionPolicy that best suits your application.
The GoCQL driver’s default host policy is RoundRobinHostPolicy, where each host is tried sequentially for each query.
In most cases, DataStax recommends TokenAwareHostPolicy with DCAwareRoundRobinPolicy as a fallback:
cluster := gocql.NewCluster("127.0.0.1:9043", "127.0.0.1:9044", "127.0.0.1:9045")
cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("datacenter1"))
session, err := cluster.CreateSession()
For multi-region deployments, DataStax recommends that you use DCAwareRoundRobinPolicy with the application’s closest datacenter:
cluster := gocql.NewCluster("172.16.0.19", "172.16.0.20", "172.16.0.21")
cluster.PoolConfig.HostSelectionPolicy = gocql.DCAwareRoundRobinPolicy("DC1")
For more information, see Datacenter awareness and query routing.
Java driver load balancing policies
By default, Java driver 4.x prevents cross-datacenter traffic.
For example, applications deployed in AWS us-east-1 only communicate with Cassandra nodes in the same datacenter.
If there are no available Cassandra nodes in the local datacenter, then the driver can’t execute any queries.
Make sure the load balancing policy has the correct name for the local datacenter where the application instance is deployed.
To allow the driver to connect to nodes in remote datacenters, set max-nodes-per-remote-dc greater than 0.
datastax-java-driver.basic.load-balancing-policy {
  local-datacenter = datacenter1
}
datastax-java-driver.advanced.load-balancing-policy.dc-failover {
  max-nodes-per-remote-dc = 2
}
datastax-java-driver.advanced.connection {
  max-requests-per-connection = 1024
  pool {
    local.size = 1
    remote.size = 1
  }
}
For more information, including custom load balancing policies, see Java driver load balancing.
Node.js driver load balancing policies
PHP driver load balancing policies
Python driver load balancing policies
See cassandra.policies.