Connection pooling in Cassandra drivers
Drivers maintain a pool of connections to each node selected by the load balancing policy. By default, a driver instance opens one connection to each host in the local datacenter, as determined by the default load balancing policy.
Connection pools are accessed asynchronously, and multiple requests can be submitted on a single connection simultaneously. For most Astra DB Serverless workloads, DataStax recommends one long-lived connection from the driver to each host server (Astra cluster/database).
You can use your driver’s default connection pool settings, or you can customize connection pool settings such as the number of connections per host and the maximum number of concurrent requests per connection. Available settings vary by driver, and these settings can be constrained by underlying CQL/Cassandra limitations. For example, the binary protocol (Cassandra Native Protocol) allows no more than 32,768 concurrent requests per connection.
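To get a feel for how these settings bound concurrency, the following sketch computes the theoretical ceiling on in-flight requests for one driver instance. The function name and the example values (6 hosts, 1 connection per host, a cap of 1,024 requests per connection) are illustrative assumptions, not driver defaults. Only the 32,768 protocol limit comes from the text above.

```python
# Protocol ceiling on concurrent requests per connection, as stated above.
PROTOCOL_MAX_REQUESTS_PER_CONNECTION = 32_768


def max_in_flight_requests(hosts: int, connections_per_host: int,
                           max_requests_per_connection: int) -> int:
    """Return the theoretical ceiling on concurrent requests for one driver instance.

    All three arguments are hypothetical configuration values chosen for
    illustration; real option names differ from driver to driver.
    """
    if min(hosts, connections_per_host, max_requests_per_connection) < 1:
        raise ValueError("all arguments must be positive")
    # A configured per-connection cap cannot exceed what the protocol allows.
    per_connection = min(max_requests_per_connection,
                         PROTOCOL_MAX_REQUESTS_PER_CONNECTION)
    return hosts * connections_per_host * per_connection


if __name__ == "__main__":
    print(max_in_flight_requests(6, 1, 1_024))    # 6144
    print(max_in_flight_requests(6, 1, 100_000))  # 196608 (capped by the protocol)
```

In practice, the server and the application usually saturate long before this ceiling is reached, so treat the result as an upper bound rather than a target.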
Connection pools and initial contact points
Connection pooling is separate from the initial contact points.
Initial contact points are supplied to the driver’s root object (typically a cluster). The driver uses them only to establish the control connection, which it needs to discover the cluster topology.
The driver discovers all the nodes after a successful connection to the cluster. If no contact points are available, then the connection fails and the root object cannot be created.
When you create a root object to connect to Astra, the contact points are provided by the database’s Secure Connect Bundle (SCB). Don’t manually set contact points in your driver configuration.
For a multi-region database, you must download the SCB for each region. Then, in your application’s code, create one root object for each region, selecting the appropriate region-specific SCB for each connection. For example, use custom logic to select the appropriate SCB based on the client’s location or other factors.
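The region-selection logic described above could be sketched as follows for the Python driver. The region names, bundle file paths, and the `select_scb` and `connect_to_region` helpers are hypothetical. The sketch assumes the `cassandra-driver` package, which accepts an SCB through the `cloud` argument of `Cluster`, and it assumes token-based authentication.

```python
from pathlib import Path

# Hypothetical mapping from region name to that region's downloaded SCB.
SCB_BY_REGION = {
    "us-east1": Path("bundles/secure-connect-mydb-us-east1.zip"),
    "europe-west1": Path("bundles/secure-connect-mydb-europe-west1.zip"),
}


def select_scb(client_region: str, default_region: str = "us-east1") -> Path:
    """Pick the SCB for the client's region, falling back to a default region."""
    return SCB_BY_REGION.get(client_region, SCB_BY_REGION[default_region])


def connect_to_region(client_region: str, token: str):
    """Create one root object (Cluster) for the region and return a session.

    Requires the cassandra-driver package and a real bundle file.
    """
    # Imported lazily so select_scb can be used without the driver installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    cluster = Cluster(
        # No contact points are set here: the SCB supplies them.
        cloud={"secure_connect_bundle": str(select_scb(client_region))},
        auth_provider=PlainTextAuthProvider("token", token),
    )
    return cluster.connect()
```

An application would call `connect_to_region` once per region it serves and reuse the resulting long-lived sessions, rather than creating a new root object per request.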
Configure connection pools
For Astra, connection pools are configured automatically by the SCB. Don’t manually configure connection pools for Astra. The following information is provided for reference purposes only.
- C/C++ driver
-
Use CassCluster options, such as cass_cluster_set_core_connections_per_host and cass_cluster_set_max_concurrent_requests_threshold.
- C# driver
-
See Connection pooling.
- Go driver
-
Use PoolConfig.
- Java driver
-
See the documentation for your version of the Java driver:
- Node.js driver
-
See the documentation for your version of the Node.js driver:
- Python driver
-
For the Python driver, you can configure hosts and connection pools with cassandra.pool. However, you cannot change the number of connections per host because Python’s Global Interpreter Lock (GIL) limits execution to a single thread at a time. For more information, see the documentation for your version of the Python driver:
Inspect connected nodes
You can check which nodes your driver sessions connect to over time.
Logging node metadata can help with application development requirements like performance monitoring, troubleshooting, and cost optimization. For implementation details and more information, see the documentation for your driver:
- C/C++ driver
-
See Host State Changes and CassSession.
- C# driver
-
See Retrieving metadata.
- Go driver
-
Each instance of Session maintains an internal connection pool to many Cassandra nodes from the cluster. However, the GoCQL driver doesn’t provide a direct way to get information about the nodes a client is connected to at a given time.
- Java driver
-
Each instance of CqlSession maintains an internal connection pool to many Cassandra nodes from the cluster.

You can use session.getMetadata().getNodes() to get information about the specific nodes that a given client is connected to at a certain point in time:

Example: Capture node metadata logs

    import com.datastax.oss.driver.api.core.CqlSession;
    import org.apache.commons.lang3.StringUtils;

    // Builds a table with one row per node known to the session.
    private static String prettyPrintConnectedNodes(CqlSession session) {
      StringBuilder message = new StringBuilder();
      message.append("Connected Cassandra nodes:\n");
      message.append("Endpoint                       | Host ID                              | Location                  | State        | Connections\n");
      message.append("-------------------------------+--------------------------------------+---------------------------+--------------+------------\n");
      session.getMetadata().getNodes().forEach((hostId, node) -> {
        message
            .append(printColumn(node.getEndPoint().toString(), 30)).append(" | ")
            .append(printColumn(node.getHostId(), 36)).append(" | ")
            .append(printColumn(node.getDatacenter() + (node.getRack() != null ? " (" + node.getRack() + ")" : ""), 25)).append(" | ")
            .append(printColumn(node.getState(), 12)).append(" | ")
            .append(printColumn(node.getOpenConnections() + (node.isReconnecting() ? " (reconn)" : ""), 12))
            .append("\n");
      });
      return message.toString();
    }

    // Truncates or right-pads a value so that every column has a fixed width.
    private static String printColumn(Object value, int length) {
      return StringUtils.rightPad(StringUtils.abbreviate(value != null ? value.toString() : "NULL", length), length);
    }

Example: Node metadata output

    Connected Cassandra nodes:
    Endpoint | Host ID | Location | State | Connections
    ---------------+---------------------------------------+--------------------+-------+--
    /127.0.0.1:9044 | eaadabc5-dc40-4b81-8230-436901eba91d | datacenter1 (rack1) | UP | 2
    /127.0.0.1:9043 | 109d8339-54f3-4fd7-9706-8dd746800136 | datacenter1 (rack1) | UP | 1
    /127.0.0.1:9045 | 04b4f617-b5e3-47dc-8f04-117ea63c2e57 | datacenter1 (rack1) | UP | 1

For more information, see the documentation for your version of the Java driver:
- Node.js driver
-
See the documentation for your version of the Node.js driver:
- Python driver
-
Each instance of Session maintains an internal connection pool to many Cassandra nodes from the cluster.

In cassandra.pool, the Host class provides attributes that you can use to get information about the nodes a client is connected to at a given time. For example, the following code snippet logs information like host_id, endpoint, datacenter, and rack:

    import logging

    from cassandra import cluster


    def log_connected_node(session: cluster.Session):
        log = logging.getLogger(__name__)
        row_format = "{:>30}|{:>40}|{:>25}|{:>12}"
        # Header row and separator.
        log.debug(row_format.format('Endpoint', 'Host ID', 'Location', 'State'))
        log.debug(row_format.format('-' * 30, '-' * 40, '-' * 25, '-' * 12))
        # One row per host known to the session.
        for host in session.hosts:
            # The f-string tolerates a datacenter or rack that is still None.
            location = f"{host.datacenter}({host.rack})"
            log.debug(row_format.format(str(host.endpoint), str(host.host_id),
                                        location,
                                        "UP" if host.is_up else "DOWN"))