Get started with the Python driver

Because DataStax Enterprise (DSE) is based on Apache Cassandra®, you can use Cassandra drivers to connect to your DSE databases.

This quickstart explains how to install a driver, connect it to your DSE database, and then send some CQL statements to the database.

To use the Python driver, you need to choose a compatible version, install the driver and its dependencies, and then connect the driver to your DSE database. Once connected, you can write scripts that use the driver to run commands against your database.

Python driver ownership and other important changes in version 3.30

Starting with version 3.30, the Python driver is maintained by the Apache Software Foundation (ASF). Prior versions were maintained by DataStax.

As of version 3.30, there is no change to the Python driver package name. You can still install the driver with pip install cassandra-driver.

However, there are important changes in version 3.30 that you should be aware of, including use of pyproject.toml, the DRIVER_NAME in STARTUP messages, and the supported Python versions. For more information, see the Python driver upgrade guide.

Python driver compatibility

DataStax officially supports the latest 12 months of releases, and DataStax recommends using the latest driver version whenever possible. Compatibility isn’t guaranteed for earlier versions. For upgrade guides and compatibility information for earlier versions, see Unsupported drivers.

New features and bug fixes are developed on the latest minor version of the driver, and users are encouraged to stay current with those minor releases. APIs remain stable according to semantic versioning conventions, so upgrades between minor versions should be straightforward.

Unless otherwise specified, compatibility version ranges include all patch versions. For example, a range of 4.0 to 4.3 includes all versions from 4.0.0 to the last 4.3.z release.

Python driver compatibility
Driver version	DSE 6.9	DSE 6.8	DSE 5.1	Comments
3.28 and later	Fully compatible	Fully compatible	Fully compatible	Starting with version 3.30, this driver is maintained by the Apache Software Foundation (ASF).
3.22 to 3.27	Partially compatible	Fully compatible	Fully compatible	Doesn’t support the vector type.
3.21	Partially compatible	Partially compatible	Fully compatible	Doesn’t support the vector type. Might be incompatible with DSE features like Unified Authentication.
Earlier versions	Not compatible	Not compatible	Not compatible
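For a quick programmatic check, the table above can be encoded as a small lookup. The following is a minimal sketch and not part of the driver: the function names major_minor and dse_compatibility are hypothetical, and the thresholds are copied from the table.

```python
def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '3.29.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def dse_compatibility(driver_version: str, dse_version: str) -> str:
    """Look up the compatibility level from the table (hypothetical helper)."""
    if dse_version not in ("6.9", "6.8", "5.1"):
        raise ValueError(f"DSE version not covered by the table: {dse_version}")
    mm = major_minor(driver_version)
    if mm >= (3, 28):
        return "Fully compatible"
    if mm >= (3, 22):
        # 3.22 to 3.27 lack the vector type, which only affects DSE 6.9.
        return "Partially compatible" if dse_version == "6.9" else "Fully compatible"
    if mm == (3, 21):
        return "Fully compatible" if dse_version == "5.1" else "Partially compatible"
    return "Not compatible"
```

Because only the major and minor components are compared, every patch release in a range gets the same answer, which matches the rule that version ranges include all patch versions.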

Prepare the environment and database

To use this driver, you need the following:

A running DSE cluster
A keyspace
Python version 3.10 or later installed
pip version 23.0 or later installed

This quickstart uses the latest version of the Python driver. For Python support in earlier versions, see the Python driver documentation.
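If you want a script to fail early on an unsupported interpreter, a check along these lines can help. This is a minimal sketch: the helper name python_is_supported is hypothetical, and the minimum version is the one stated in the prerequisites above.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum Python version stated in this quickstart


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later is required.")
```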

Authentication methods for drivers

The DSE-compatible drivers ship with built-in authentication providers that provide the necessary utilities to connect to secure DSE clusters. However, DSE clusters have no authentication service enabled by default. This simplifies initial setup, but it isn’t intended for production deployments. You must configure your preferred authentication method, security schemes, users, and roles in your clusters before attempting to use a non-default authentication method through a driver. The required credentials for the driver connection depend on your cluster and authentication method.

DSE unified authentication provides a single, flexible security model. One DSE server can accept multiple forms of authentication, and clients with different levels of access can use varying authentication schemes to connect to the same server.

Supported authentication methods include internal usernames and passwords, LDAP/Active Directory usernames and passwords, and Kerberos authentication. All of these features are supported directly in most DSE-compatible drivers with built-in classes to enable the desired security configuration. Some drivers include support for custom authentication provider classes if your desired authentication method isn’t supported by the built-in providers.

Internal username and password authentication

Always use this method in conjunction with client-server transport encryption because it transmits credentials in clear text in the native protocol.

Drivers use a plain text authentication provider to perform internal username and password authentication. For DSE, the driver sends a plain text username and password to the server, which authenticates them against the underlying authentication scheme.

To authenticate with a username and password, provide the username and password in the driver configuration. For example, in a test environment, you could use superuser credentials. In production, use narrowly scoped user roles for better security.

In addition to traditional role-based access control (RBAC), DSE supports proxy authentication (authorization through proxy roles). With proxy authentication, the driver authenticates with a fixed set of credentials that authorize access to a cluster in lieu of direct role assignment. The driver uses the credentials to connect to the cluster and execute requests (with proxy execute) within the context of the proxy roles.

This guide uses username and password authentication for simplicity. For examples of other authentication methods, including proxy authentication, see your driver’s documentation.
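As an illustration of proxy execution, the Python driver's Session.execute method accepts an execute_as argument that names the role to run the request as. This argument applies to DSE clusters only. The following sketch assumes that the connected role has already been authorized to proxy-execute as the target role. The helper name run_as and the role name in the usage note are hypothetical.

```python
def run_as(session, query: str, role: str, parameters=None):
    """Execute a statement on behalf of another role (DSE proxy execute).

    `session` is an already connected cassandra.cluster.Session whose
    login role is authorized to proxy-execute as `role`.
    """
    return session.execute(query, parameters, execute_as=role)
```

For example, run_as(session, "SELECT release_version FROM system.local", "app_reader") runs the query with the permissions of the hypothetical app_reader role.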

LDAP/Active Directory authentication

Always use this method in conjunction with client-server transport encryption because it transmits credentials in clear text in the native protocol.

Drivers use a plain text authentication provider to perform LDAP/Active Directory username and password authentication. For DSE, the driver sends a plain text username and password to the server, which authenticates them against the underlying LDAP scheme.

For usage instructions and examples, see your driver’s documentation.

Kerberos authentication

Most DSE-compatible drivers extend authentication providers to support Kerberos authentication for DSE either directly or through custom provider implementations.

Kerberos authentication uses keytabs, ticket caches, and Kerberos configuration files:

Kerberos keytabs: A keytab can be used to authenticate with Kerberos without requiring any additional credentials or a password. Keytab files must have their permissions set properly to restrict access. The permissions must be set to allow the application user to access the keytab.
Kerberos ticket cache: To use the Kerberos ticket cache, use the kinit command to authenticate with the Kerberos server and obtain a ticket. Then, verify the ticket cache contains a ticket for the successful authentication with the klist command. Once you verify there is a ticket in the ticket cache, you can run an application that is configured to use the Kerberos authentication provider. If multiple principals have valid tickets in the ticket cache, and no principal was specified in the application, then the driver arbitrarily chooses one and uses that ticket.
Kerberos configuration file: Driver authentication against a Kerberos-enabled DSE cluster requires a krb5.conf file containing the Kerberos configuration settings. If this file isn’t in the node’s /etc directory, contact your Kerberos system administrator to locate the file. To reference a krb5.conf file in a non-default location, set the KRB5_CONFIG environment variable to the location of your krb5.conf. Kerberos command line tools such as kinit, klist, and kdestroy respect this variable, as do drivers that support Kerberos authentication with krb5.conf.
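The following sketch shows one way these pieces might fit together in Python. It assumes that the optional Kerberos dependencies of the driver are installed, that a valid ticket or keytab is already available, and that the DSE service principal name is dse. The helper names and the example path in the usage note are hypothetical.

```python
import os


def use_krb5_config(path: str) -> None:
    """Point Kerberos libraries at a krb5.conf in a non-default location."""
    os.environ["KRB5_CONFIG"] = path


def kerberos_auth_provider(service: str = "dse"):
    """Build a GSSAPI (Kerberos) auth provider for a DSE cluster.

    The import is deferred because the provider needs optional packages
    (a SASL library and a Kerberos binding) to be installed.
    """
    from cassandra.auth import DSEGSSAPIAuthProvider

    return DSEGSSAPIAuthProvider(service=service, qops=("auth",))
```

A script would typically call use_krb5_config("/path/to/krb5.conf") before connecting, and then pass kerberos_auth_provider() as the auth_provider argument of Cluster.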

For more information, see the documentation for your version of the Python driver.

SSL-encrypted connections for drivers

Cassandra drivers support SSL-encrypted connections between the driver and server. Encrypted driver connections follow a typical SSL workflow:

The client opens a TCP connection to the server on the configured SSL port.
An SSL handshake is initialized by the server, sending its public key (or certificate) to the client.
The client uses that public key certificate to generate an encrypted session key and sends it back to the server.
The server decrypts the message using its private key and retrieves the session key.
All communication from that point on is encrypted using that session key.

SSL isn’t required, but it is recommended for production deployments, especially those with clients communicating over the public internet.

To use SSL-encrypted connections, you must do the following:

Select an identity verification method:

No identity verification (Not recommended)

As a best practice for secure driver communication, never use SSL without identity verification. Always use either client-to-server or server-to-client identity verification.

While most drivers support creating SSL connections to the server without identity verification, DataStax doesn’t recommend this for production deployments. When a secure browser contacts a web server, the browser verifies the identity of the server before sending it requests in case an attacker is masquerading as the web server. A secure communication to a bad actor defeats the purpose of configuring secure communication between the browser and web server in the first place.

Client verifies server

To verify the identity of a server, the driver must be configured with a list of trusted certificate authorities (CAs). When the driver receives the server’s SSL certificate during the SSL handshake, it checks that the certificate was signed by one of the registered CAs. If the certificate wasn’t signed by a registered CA, the client checks that the signer was signed by one of the registered CAs. It continues through the signers until it finds one that is in the client’s list of trusted CAs. If the client doesn’t find a registered CA, then identity verification fails.

Server verifies client

To configure a server to verify the identity of a client, edit cassandra.yaml, find client_encryption_options, and then set require_client_auth to true. This scenario requires that clients have their own certificates to send to the server upon request during the SSL handshake. For more information, see Configure SSL for client-to-node connections in DSE.

Configure SSL in your DSE cluster.

By default, DSE clusters are configured to communicate with clients using an unencrypted binary protocol. This is convenient for getting started, but it isn’t suitable for production environments.

To enable SSL in a DSE cluster, you need access to your cluster’s cassandra.yaml file. The location of the cassandra.yaml file depends on your DSE installation method. For information about editing cassandra.yaml and configuring SSL, see Configure SSL for DataStax Enterprise.

Configure your driver to use the SSL certificates and the SSL-encrypted connection based on your preferred identity verification method. For instructions, see the documentation for your version of the Python driver:
- Apache Cassandra Python driver 3.30 and later: Security
- DataStax Python driver 3.29 and earlier: Security
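As a rough illustration of the identity verification options described above, the following sketch builds a standard library ssl.SSLContext that verifies the server certificate and, optionally, presents a client certificate. The helper name build_ssl_context and all file paths are hypothetical. Hostname checking assumes that each node's certificate lists the address that the driver connects to, which depends on how your certificates were issued.

```python
import ssl


def build_ssl_context(ca_file=None, client_cert=None, client_key=None):
    """Create an SSLContext that verifies the server certificate.

    ca_file: PEM file with trusted CA certificates (system CAs if None).
    client_cert / client_key: PEM files the client presents when the
    server sets require_client_auth to true.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # PROTOCOL_TLS_CLIENT already enables CERT_REQUIRED and hostname
    # checking; both are set explicitly here to make the intent visible.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    if ca_file is not None:
        context.load_verify_locations(cafile=ca_file)
    else:
        context.load_default_certs()
    if client_cert is not None:
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context
```

The resulting context can be passed to the driver through the ssl_context argument of Cluster, for example Cluster(ssl_context=build_ssl_context(ca_file="rootca.crt"), auth_provider=auth_provider).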

Install the Python driver

Install the Python driver:

pip install cassandra-driver

If you install an earlier version of the driver, make sure your version is compatible with DSE and your application’s CQL statements. For example, if you need to query vector data, make sure your driver version supports the vector type.

Optional: Verify the installation:
```
pip show cassandra-driver
```
Make sure the returned Version is the latest version or the specific version that you installed.
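You can also check the installed version from Python itself. This minimal sketch uses the standard library importlib.metadata module. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "cassandra-driver"):
    """Return the installed version string, or None if not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print(installed_version())
```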

Connect the Python driver

In the root of your Python project, create a connect_database.py file:
```
cd python_project
touch connect_database.py
```
Copy one of the following connection code examples into connect_database.py.

Both examples create a Cluster instance to connect to your DSE database. You typically have one Cluster for each DSE database, and only one Session for the entire application. For more information, see Best practices: Session and cluster handling and Connection pools and initial contact points.
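One way to keep a single Session for the whole application is to create it lazily and hand out the same object everywhere. The following is a minimal sketch under that assumption, not a pattern prescribed by the driver. The class name SessionHolder is hypothetical.

```python
import threading


class SessionHolder:
    """Create one session on first use and reuse it afterward."""

    def __init__(self, connect):
        # `connect` is a zero-argument callable that returns a connected
        # session, for example: lambda: Cluster(...).connect()
        self._connect = connect
        self._session = None
        self._lock = threading.Lock()

    def get(self):
        """Return the shared session, connecting only once."""
        with self._lock:
            if self._session is None:
                self._session = self._connect()
            return self._session
```

The lock ensures that concurrent first calls from several threads still result in only one connection attempt.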
Production configuration (recommended)

When using the Python driver in production environments or with simulated production workloads, DataStax recommends robust session configuration with profile and cluster details to help optimize driver performance.

The following example uses authentication credentials stored in environment variables, and it sets options for connection timeout, request timeout, and protocol version. The environment variable names DSE_USERNAME and DSE_PASSWORD and the timeout values are examples. Replace them with the names and values that suit your environment.

connect_database.py

import os

from cassandra import ProtocolVersion
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

auth_provider = PlainTextAuthProvider(
    username=os.environ["DSE_USERNAME"],  # example variable name
    password=os.environ["DSE_PASSWORD"],  # example variable name
)
profile = ExecutionProfile(request_timeout=30)  # request timeout, in seconds
cluster = Cluster(
    auth_provider=auth_provider,
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    connect_timeout=10,  # connection timeout, in seconds (example value)
    protocol_version=ProtocolVersion.V4,
)
session = cluster.connect()

Minimal configuration

You can use a minimal session configuration for testing or lower environments where you don’t need to optimize the cluster details for production workloads.

The following example uses authentication credentials stored in environment variables and default values for all other connection options. The environment variable names DSE_USERNAME and DSE_PASSWORD are examples.

connect_database.py

import os

from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

auth_provider = PlainTextAuthProvider(
    username=os.environ["DSE_USERNAME"],  # example variable name
    password=os.environ["DSE_PASSWORD"],  # example variable name
)
cluster = Cluster(auth_provider=auth_provider)  # defaults for all other options
session = cluster.connect()

To test the connection, add a simple query to the script.

The following example queries the system.local table. You can replace the example SELECT statement with any CQL statement that you want to run against a keyspace and table in your database.
connect_database.py
```
row = session.execute("select release_version from system.local").one()
if row:
    print(row[0])
else:
    print("An error occurred.")
```
Save and run your Python script:
```
python ./connect_database.py
```
If you ran the example SELECT statement on the system.local table and the script runs successfully, the release_version value from the system.local table is printed to the console.
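Note that connection problems usually surface as exceptions rather than as an empty result. For example, Cluster.connect raises cassandra.cluster.NoHostAvailable when no contact point can be reached. The following sketch wraps the test query with basic error reporting. The function names are hypothetical, and catching Exception broadly is a simplification for illustration.

```python
def fetch_release_version(session):
    """Return release_version from system.local, or None if no row."""
    row = session.execute("SELECT release_version FROM system.local").one()
    return None if row is None else row.release_version


def check_connection(connect):
    """Connect and run the test query, reporting driver errors.

    `connect` is a zero-argument callable that returns a connected session.
    """
    try:
        session = connect()
        return fetch_release_version(session)
    except Exception as exc:  # for example, cassandra.cluster.NoHostAvailable
        print(f"Connection check failed: {exc!r}")
        return None
```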

Next, you can extend or modify this script to run other commands against your database or connect to other databases. For more information, see the documentation for your version of the Python driver.

Run a vector search with the Python driver

The following example shows how you can use the Python driver to index vector data and then run a vector search:

Create a table and vector index.

The following code creates a table named vector_test with columns for an integer id, text, and a 5-dimensional vector. Then, it creates a custom index on the vector column using the cosine similarity function for efficient vector searches.

This example uses a keyspace named default_keyspace. Replace this value if you want to use a different keyspace.

keyspace = "default_keyspace"
v_dimension = 5

session.execute((
    "CREATE TABLE IF NOT EXISTS {keyspace}.vector_test (id INT PRIMARY KEY, "
    "text TEXT, vector VECTOR<FLOAT,{v_dimension}>);"
).format(keyspace=keyspace, v_dimension=v_dimension))

session.execute((
    "CREATE CUSTOM INDEX IF NOT EXISTS idx_vector_test "
    "ON {keyspace}.vector_test "
    "(vector) USING 'StorageAttachedIndex' WITH OPTIONS = "
    "{{'similarity_function' : 'cosine'}};"
).format(keyspace=keyspace))

Insert vector data.

The following code inserts some rows with embeddings into the vector_test table:

text_blocks = [
    (1, "Chat bot integrated sneakers that talk to you", [0.1, 0.15, 0.3, 0.12, 0.05]),
    (2, "An AI quilt to help you sleep forever", [0.45, 0.09, 0.01, 0.2, 0.11]),
    (3, "A deep learning display that controls your mood", [0.1, 0.05, 0.08, 0.3, 0.6]),
]
for block in text_blocks:
    id, text, vector = block
    session.execute(
        f"INSERT INTO {keyspace}.vector_test (id, text, vector) VALUES (%s, %s, %s)",
        (id, text, vector)
    )
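When the same INSERT runs many times, a prepared statement lets the server parse the statement once. The following sketch is an alternative to the loop above, assuming a driver version that supports the vector type in bound parameters. The function name insert_blocks is hypothetical.

```python
def insert_blocks(session, keyspace, blocks):
    """Insert (id, text, vector) tuples using one prepared statement."""
    prepared = session.prepare(
        f"INSERT INTO {keyspace}.vector_test (id, text, vector) VALUES (?, ?, ?)"
    )
    for block_id, text, vector in blocks:
        session.execute(prepared, (block_id, text, vector))
    return len(blocks)
```

For example, insert_blocks(session, keyspace, text_blocks) inserts the same three rows as the loop above.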

Perform a vector search.

The following code performs a vector search to find rows that are close to a specific vector embedding:

ann_query = (
    f"SELECT id, text, similarity_cosine(vector, [0.15, 0.1, 0.1, 0.35, 0.55]) as sim FROM {keyspace}.vector_test "
    "ORDER BY vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55] LIMIT 2"
)
for row in session.execute(ann_query):
    print(f"[{row.id}] \"{row.text}\" (sim: {row.sim:.4f})")
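To see why the search is expected to favor certain rows, you can compute plain cosine similarity locally for the three example vectors. This is an illustrative sketch only: the server's similarity_cosine function may report scores on a different scale than the raw cosine computed here, and ANN search is approximate, so compare rankings rather than exact values.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors from the insert step, keyed by row id.
rows = {
    1: [0.1, 0.15, 0.3, 0.12, 0.05],
    2: [0.45, 0.09, 0.01, 0.2, 0.11],
    3: [0.1, 0.05, 0.08, 0.3, 0.6],
}
query = [0.15, 0.1, 0.1, 0.35, 0.55]

# Rank row ids from most to least similar to the query vector.
ranked = sorted(rows, key=lambda i: cosine_similarity(rows[i], query), reverse=True)
print(ranked[:2])  # [3, 2]
```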

Use DSE advanced workloads

If you have enabled DSE advanced workloads, you must configure your driver to connect to compatible nodes when sending DSE Search or Graph queries. For more information, see DSE advanced workloads in Cassandra drivers.

Reconnect the Python driver after a migration

If you migrate your data from one Cassandra database platform to another, you must update your client applications to connect to your new databases.

At minimum, you must update the driver connection strings. Additional changes might be required if you upgraded to a new major driver version or migrated to a database platform with a different feature set. For example, if you migrate to Astra, your drivers cannot create keyspaces because CQL for Astra doesn’t support CREATE KEYSPACE.

For information about updating driver connections after a migration, see the DataStax migration documentation on Connecting client applications to your new target database. Although the referenced documentation is in the context of zero downtime migration, the information applies to most Cassandra-to-Cassandra migrations where you need to update Cassandra driver connection strings.
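Keeping connection details outside the code can make this kind of update a configuration change rather than a code change. The following is a minimal sketch under that assumption. The environment variable names DSE_CONTACT_POINTS and DSE_PORT and the helper name load_connection_settings are hypothetical.

```python
import os


def load_connection_settings(env=None):
    """Read driver connection settings from environment variables.

    DSE_CONTACT_POINTS: comma-separated host names or IP addresses.
    DSE_PORT: native protocol port (defaults to 9042).
    """
    env = os.environ if env is None else env
    hosts = env.get("DSE_CONTACT_POINTS", "127.0.0.1")
    return {
        "contact_points": [h.strip() for h in hosts.split(",") if h.strip()],
        "port": int(env.get("DSE_PORT", "9042")),
    }
```

The returned dictionary uses the contact_points and port argument names of Cluster, so it can be unpacked directly, for example Cluster(**load_connection_settings(), auth_provider=auth_provider).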
