Connection pooling in DataStax drivers
DataStax drivers maintain a pool of connections to each node selected by the load balancing policy. With the default load balancing policy, a driver instance opens one connection to each host in the local datacenter.
Connection pools and initial contact points
Connection pooling is separate from the initial contact points. Initial contact points are supplied to the root driver instance and are used only to establish the control connection, which the driver needs to discover the cluster topology.
When you create a cluster object for Cassandra, DSE, or HCD, DataStax recommends that you specify multiple contact points from your cluster. If your cluster is deployed to multiple availability zones, specify contact points from different zones.
The driver discovers all the nodes after it successfully connects to the cluster, so a single reachable contact point is enough. Providing multiple contact points improves resilience if some of them are down. If none of the contact points are reachable, cluster object creation fails.
You don’t need to do this for Astra DB because the contact points are provided in the database’s Secure Connect Bundle (SCB).
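For self-managed clusters, the contact point recommendation above can be sketched as follows. This is a minimal illustration, not a prescribed method: the zone names, addresses, and the pick_contact_points and connect helpers are hypothetical, and connect assumes that the DataStax Python driver (cassandra-driver) is installed and that the cluster is reachable.

```python
from itertools import islice


def pick_contact_points(nodes_by_zone, per_zone=1):
    """Return up to `per_zone` addresses from every availability zone."""
    points = []
    for zone in sorted(nodes_by_zone):
        points.extend(islice(nodes_by_zone[zone], per_zone))
    return points


def connect(contact_points):
    """Open a session. Requires cassandra-driver and a reachable cluster."""
    # Imported lazily so that pick_contact_points works without the driver.
    from cassandra.cluster import Cluster

    return Cluster(contact_points=contact_points).connect()


# Hypothetical inventory: three zones with two nodes each.
NODES_BY_ZONE = {
    "zone-a": ["10.0.1.10", "10.0.1.11"],
    "zone-b": ["10.0.2.10", "10.0.2.11"],
    "zone-c": ["10.0.3.10", "10.0.3.11"],
}

# One contact point per zone, so losing a single zone doesn't block startup.
CONTACT_POINTS = pick_contact_points(NODES_BY_ZONE)
```

Calling connect(CONTACT_POINTS) would then hand the selected addresses to the driver, which discovers the remaining nodes on its own.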
Connection pools are accessed asynchronously, and multiple requests can be submitted on a single connection simultaneously. For most workloads, DataStax recommends one long-lived connection from the driver to each host server, such as a DSE server.
You can use your driver’s default connection pool settings, or you can customize connection pool settings such as the number of connections per host and the maximum number of concurrent requests per connection. Available settings vary by driver, and these settings can be constrained by underlying CQL/Cassandra limitations. For example, the binary protocol (Cassandra Native Protocol) allows no more than 32,768 concurrent requests per connection.
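To make the interaction between these settings concrete, the following sketch estimates how many requests a driver instance can have in flight to a single host. The function and parameter names are illustrative and aren't driver APIs. The only figure taken from the text above is the protocol ceiling of 32,768 concurrent requests per connection.

```python
# Ceiling cited above for the Cassandra Native Protocol.
PROTOCOL_MAX_REQUESTS_PER_CONNECTION = 32_768


def max_in_flight_per_host(connections_per_host, max_requests_per_connection):
    """Upper bound on concurrent requests to one host for the given pool settings."""
    if connections_per_host < 1 or max_requests_per_connection < 1:
        raise ValueError("both settings must be at least 1")
    # A configured value above the protocol ceiling cannot take effect.
    effective = min(max_requests_per_connection, PROTOCOL_MAX_REQUESTS_PER_CONNECTION)
    return connections_per_host * effective


print(max_in_flight_per_host(1, 1024))    # 1024
print(max_in_flight_per_host(2, 50_000))  # 65536
```

The second call shows the cap at work: asking for 50,000 requests per connection still yields at most 32,768 on each of the two connections.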
Configure connection pools
C/C++ driver connection pooling
For the C/C++ driver, you can configure connection pooling in the CassCluster object.
For example, you can set cass_cluster_set_core_connections_per_host and cass_cluster_set_max_concurrent_requests_threshold.
C# driver connection pooling
GoCQL driver connection pooling
See PoolConfig.
Java driver connection pooling
Node.js driver connection pooling
PHP driver connection pooling
For the PHP driver, you can configure connection pooling in Cluster\Builder.
For example, you can set withConnectionsPerHost.
Python driver connection pooling
For the Python driver, you can configure hosts and connection pools with cassandra.pool.
However, you can’t change the number of connections per host because of Python’s single-threaded execution model under the Global Interpreter Lock (GIL).
Ruby driver connection pooling
For the Ruby driver, you can configure connection pooling in CLUSTER_OPTIONS.
For example, you can set connections_per_local_node and requests_per_connection.
Inspect connected nodes
You can check which nodes your driver sessions connect to over time.
Logging node metadata can help with performance monitoring, troubleshooting, cost optimization, and more.
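One way to track connected nodes over time is to take periodic snapshots of node state and log only what changed between them. The sketch below is driver-agnostic and uses plain dictionaries. The diff_snapshots helper and the snapshot shape (a host ID mapped to a state string) are assumptions for illustration and aren't part of any driver API.

```python
def diff_snapshots(previous, current):
    """Compare two {host_id: state} snapshots and describe what changed."""
    changes = []
    # Hosts that appear only in the newer snapshot.
    for host_id in sorted(current.keys() - previous.keys()):
        changes.append(f"added {host_id} ({current[host_id]})")
    # Hosts that disappeared since the older snapshot.
    for host_id in sorted(previous.keys() - current.keys()):
        changes.append(f"removed {host_id}")
    # Hosts present in both snapshots whose state differs.
    for host_id in sorted(previous.keys() & current.keys()):
        if previous[host_id] != current[host_id]:
            changes.append(f"{host_id}: {previous[host_id]} -> {current[host_id]}")
    return changes


before = {"node-1": "UP", "node-2": "UP"}
after = {"node-1": "DOWN", "node-3": "UP"}
for line in diff_snapshots(before, after):
    print(line)
```

In practice, each snapshot would be built from the node metadata that your driver exposes, as described in the driver-specific sections that follow.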
C/C++ driver node metadata
See Host State Changes and CassSession.
C# driver node metadata
See Retrieving metadata.
GoCQL driver node metadata
Each instance of CqlSession
maintains an internal connection pool to many Cassandra nodes from the cluster.
However, the GoCQL driver doesn’t provide a direct way to get information about the nodes a client is connected to at a given time.
Java driver node metadata
Each instance of CqlSession maintains an internal connection pool to many Cassandra nodes from the cluster.
You can use session.getMetadata().getNodes() to get information about the specific nodes that a given client is connected to at a certain point in time:
import com.datastax.oss.driver.api.core.CqlSession;
import org.apache.commons.lang3.StringUtils;

// Builds a table of the nodes known to the session and their open connection counts.
private static String prettyPrintConnectedNodes(CqlSession session) {
    StringBuilder message = new StringBuilder();
    message.append("Connected Cassandra nodes:\n");
    message.append("Endpoint                       | Host ID                              | Location                  | State        | Connections\n");
    message.append("-------------------------------+--------------------------------------+---------------------------+--------------+------------\n");
    session.getMetadata().getNodes().forEach((hostId, node) -> {
        message
            .append(printColumn(node.getEndPoint().toString(), 30)).append(" | ")
            .append(printColumn(node.getHostId(), 36)).append(" | ")
            // Append the rack in parentheses when the node reports one.
            .append(printColumn(node.getDatacenter() + (node.getRack() != null ? " (" + node.getRack() + ")" : ""), 25)).append(" | ")
            .append(printColumn(node.getState(), 12)).append(" | ")
            .append(printColumn(node.getOpenConnections() + (node.isReconnecting() ? " (reconn)" : ""), 12))
            .append("\n");
    });
    return message.toString();
}

// Truncates or right-pads a value so that every column has a fixed width.
private static String printColumn(Object value, int length) {
    return StringUtils.rightPad(StringUtils.abbreviate(value != null ? value.toString() : "NULL", length), length);
}
Example output:
Connected Cassandra nodes:
Endpoint                       | Host ID                              | Location                  | State        | Connections
-------------------------------+--------------------------------------+---------------------------+--------------+------------
/127.0.0.1:9044                | eaadabc5-dc40-4b81-8230-436901eba91d | datacenter1 (rack1)       | UP           | 2
/127.0.0.1:9043                | 109d8339-54f3-4fd7-9706-8dd746800136 | datacenter1 (rack1)       | UP           | 1
/127.0.0.1:9045                | 04b4f617-b5e3-47dc-8f04-117ea63c2e57 | datacenter1 (rack1)       | UP           | 1
Node.js driver node metadata
See ClientState.
Python driver node metadata
Each instance of Session maintains an internal connection pool to many Cassandra nodes from the cluster.
In cassandra.pool, the Host class provides attributes that you can use to get information about the nodes a client is connected to at a given time.
For example, the following code snippet logs information like host_id, endpoint, datacenter, and rack:
import logging

from cassandra import cluster


def log_connected_node(session: cluster.Session):
    log = logging.getLogger(__name__)
    # Right-aligned columns: endpoint, host ID, location, state.
    row_format = "{:>30}|{:>40}|{:>25}|{:>12}"
    log.debug(row_format.format('Endpoint', 'Host ID', 'Location', 'State'))
    log.debug(row_format.format('-' * 30, '-' * 40, '-' * 25, '-' * 12))
    for host in session.hosts:
        # The f-string tolerates a datacenter or rack that is still None.
        log.debug(row_format.format(str(host.endpoint),
                                    str(host.host_id),
                                    f"{host.datacenter} ({host.rack})",
                                    "UP" if host.is_up else "DOWN"))
Ruby driver node metadata
See Cassandra::Host.