Get started with the Python driver
Because DataStax Enterprise (DSE) is based on Apache Cassandra®, you can use Cassandra drivers to connect to your DSE databases.
This quickstart explains how to install a driver, connect it to your DSE database, and then send some CQL statements to the database.
To use the Python driver, you need to choose a compatible version, install the driver and its dependencies, and then connect the driver to your DSE database. Once connected, you can write scripts that use the driver to run commands against your database.
Python driver ownership and other important changes in version 3.30
Starting with version 3.30, the Python driver is maintained by the Apache Software Foundation (ASF). Prior versions were maintained by DataStax.
As of version 3.30, there is no change to the Python driver package name.
You can still install the driver with pip install cassandra-driver.
However, there are important changes in version 3.30 that you should be aware of, including use of pyproject.toml, the DRIVER_NAME in STARTUP messages, and the supported Python versions.
For more information, see the Python driver upgrade guide.
Python driver compatibility
DataStax officially supports the latest 12 months of releases, and DataStax recommends using the latest driver version whenever possible. Compatibility isn’t guaranteed for earlier versions. For upgrade guides and compatibility information for earlier versions, see Unsupported drivers.
New features and bug fixes are developed on the latest minor version of the driver, and users are encouraged to stay current with those minor releases. The APIs remain stable according to semantic versioning conventions, so upgrades between minor versions should require little or no code change.
Unless otherwise specified, compatibility version ranges include all patch versions. For example, a range of 4.0 to 4.3 includes all versions from 4.0.0 to the last 4.3.z release.
| Driver version | DSE 6.9 | DSE 6.8 | DSE 5.1 | Comments |
|---|---|---|---|---|
| | Fully compatible | Fully compatible | Fully compatible | Starting with version 3.30, this driver is maintained by the Apache Software Foundation (ASF). |
| 3.22 to 3.27 | Partially compatible | Fully compatible | Fully compatible | Doesn’t support the vector type. |
| 3.21 | Partially compatible | Partially compatible | Fully compatible | Doesn’t support the vector type. Might be incompatible with DSE features like Unified Authentication. |
| Earlier versions | Not compatible | Not compatible | Not compatible | |
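The rule that a version range includes all patch versions can be expressed as a short check. The following is a minimal sketch, not part of the driver: the helper names `parse_version` and `in_range` are illustrative, and the code assumes plain numeric versions such as 3.25.1 with no pre-release suffixes.

```python
def parse_version(text):
    """Split a version string such as '3.25.1' into a tuple of ints: (3, 25, 1)."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, low, high):
    """Return True if the major.minor part of version lies within low..high.

    Patch numbers are ignored, so a range of 3.22 to 3.27 includes 3.27.9.
    """
    major_minor = parse_version(version)[:2]
    return parse_version(low)[:2] <= major_minor <= parse_version(high)[:2]


print(in_range("3.27.9", "3.22", "3.27"))
print(in_range("3.28.0", "3.22", "3.27"))
```

The first call falls inside the range because only the major and minor components are compared. The second call falls outside it.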
Prepare the environment and database
To use this driver, you need the following:
- A running DSE cluster
- A keyspace
- Python version 3.10 or later installed
This quickstart uses the latest version of the Python driver. For Python support in earlier versions, see the Python driver documentation.
- pip version 23.0 or later installed
Authentication methods for drivers
The DSE-compatible drivers ship with built-in authentication providers that provide the necessary utilities to connect to secure DSE clusters. However, DSE clusters have no authentication service enabled by default. This simplifies initial setup, but it isn’t intended for production deployments. You must configure your preferred authentication method, security schemes, users, and roles in your clusters before attempting to use a non-default authentication method through a driver. The required credentials for the driver connection depend on your cluster and authentication method.
DSE unified authentication provides a single, flexible security model. One DSE server can accept multiple forms of authentication, and clients with different levels of access can use varying authentication schemes to connect to the same server.
Supported authentication methods include internal usernames and passwords, LDAP/Active Directory usernames and passwords, and Kerberos authentication. All of these features are supported directly in most DSE-compatible drivers with built-in classes to enable the desired security configuration. Some drivers include support for custom authentication provider classes if your desired authentication method isn’t supported by the built-in providers.
- Internal username and password authentication
Always use this method in conjunction with client-server transport encryption because it transmits credentials in clear text in the native protocol.
Drivers use a plain text authentication provider to perform internal username and password authentication. For DSE, the driver sends the username and password in plain text to the server, which authenticates them against the underlying internal authentication scheme.
To authenticate with a username and password, provide the username and password in the driver configuration. For example, in a test environment, you could use superuser credentials. In production, use narrowly scoped user roles for better security.
In addition to traditional role-based access control (RBAC), DSE supports proxy authentication (authorization through proxy roles). With proxy authentication, the driver authenticates with a fixed set of credentials that authorize access to a cluster in lieu of direct role assignment. The driver uses the credentials to connect to the cluster and execute requests (with proxy execute) within the context of the proxy roles.
This guide uses username and password authentication for simplicity. For examples of other authentication methods, including proxy authentication, see your driver’s documentation.
- LDAP/Active Directory authentication
Always use this method in conjunction with client-server transport encryption because it transmits credentials in clear text in the native protocol.
Drivers use a plain text authentication provider to perform LDAP/Active Directory username and password authentication. For DSE, the driver sends the username and password in plain text to the server, which authenticates them against the underlying LDAP scheme.
For usage instructions and examples, see your driver’s documentation.
- Kerberos authentication
Most DSE-compatible drivers extend authentication providers to support Kerberos authentication for DSE either directly or through custom provider implementations.
Kerberos authentication uses keytabs, ticket caches, and Kerberos configuration files:
- Kerberos keytabs: A keytab can be used to authenticate with Kerberos without requiring any additional credentials or a password. Restrict the keytab file permissions so that the application user can access the keytab and other users cannot.
- Kerberos ticket cache: To use the Kerberos ticket cache, use the `kinit` command to authenticate with the Kerberos server and obtain a ticket. Then, use the `klist` command to verify that the ticket cache contains a ticket for the successful authentication. Once you verify there is a ticket in the ticket cache, you can run an application that is configured to use the Kerberos authentication provider. If multiple principals have valid tickets in the ticket cache, and no principal was specified in the application, then the driver arbitrarily chooses one and uses that ticket.
- Kerberos configuration file: Driver authentication against a Kerberos-enabled DSE cluster requires a `krb5.conf` file containing the Kerberos configuration settings. If this file isn’t in the node’s `/etc` directory, contact your Kerberos system administrator to locate the file. To reference a `krb5.conf` file in a non-default location, set the `KRB5_CONFIG` environment variable to the location of your `krb5.conf`. Kerberos command line tools such as `kinit`, `klist`, and `kdestroy` respect this variable, as do drivers that support Kerberos authentication with `krb5.conf`.
For more information, see the documentation for your version of the Python driver.
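Before starting an application that uses Kerberos authentication, it can help to check the two file-related requirements described above. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, it uses only the Python standard library rather than any driver API, the permission check assumes a POSIX file system, and it treats `/etc/krb5.conf` as the default location.

```python
import os
import stat

# Assumed default location, based on the /etc directory mentioned above.
DEFAULT_KRB5_CONF = "/etc/krb5.conf"


def krb5_config_path(env=None):
    """Return the krb5.conf location, honoring KRB5_CONFIG when it is set."""
    env = os.environ if env is None else env
    return env.get("KRB5_CONFIG") or DEFAULT_KRB5_CONF


def keytab_is_private(path):
    """Return True if group and other users have no permissions on the keytab."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, an application could call `keytab_is_private()` on its keytab path at startup and refuse to continue if the check returns `False`.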
SSL-encrypted connections for drivers
Cassandra drivers support SSL-encrypted connections between the driver and server. Encrypted driver connections follow a typical SSL workflow:
- The client opens a TCP connection to the server on the configured SSL port.
- During the SSL handshake, the server sends its certificate, which contains its public key, to the client.
- The client uses that public key to encrypt a session key and sends it back to the server.
- The server decrypts the message using its private key and retrieves the session key.
- All communication from that point on is encrypted using that session key.
SSL isn’t required, but it is recommended for production deployments, especially those with clients communicating over the public internet.
To use SSL-encrypted connections, you must do the following:
- Select an identity verification method:
- No identity verification (Not recommended)
As a best practice for secure driver communication, never use SSL without identity verification. Always use either client-to-server or server-to-client identity verification.
While most drivers support creating SSL connections to the server without identity verification, DataStax doesn’t recommend this for production deployments. When a secure browser contacts a web server, the browser verifies the identity of the server before sending it requests in case an attacker is masquerading as the web server. A secure communication to a bad actor defeats the purpose of configuring secure communication between the browser and web server in the first place.
- Client verifies server
To verify the identity of a server, the driver must be configured with a list of trusted certificate authorities (CAs). When the driver receives the server’s SSL certificate during the SSL handshake, it checks that the certificate was signed by one of the registered CAs. If the certificate wasn’t signed by a registered CA, the client checks that the signer was signed by one of the registered CAs. It continues through the signers until it finds one that is in the client’s list of trusted CAs. If the client doesn’t find a registered CA, then identity verification fails.
- Server verifies client
To configure a server to verify the identity of a client, edit `cassandra.yaml`, find `client_encryption_options`, and then set `require_client_auth` to true. This scenario requires that clients have their own certificates to send to the server upon request during the SSL handshake. For more information, see Configure SSL for client-to-node connections in DSE.
- Configure SSL in your DSE cluster.
By default, DSE clusters are configured to communicate with clients using an unencrypted binary protocol. This is convenient for getting started, but it isn’t suitable for production environments.
To enable SSL in a DSE cluster, you need access to your cluster’s `cassandra.yaml` file. The location of the `cassandra.yaml` file depends on your DSE installation method. For information about editing `cassandra.yaml` and configuring SSL, see Configure SSL for DataStax Enterprise.
- Configure your driver to use the SSL certificates and the SSL-encrypted connection based on your preferred identity verification method. For instructions, see the documentation for your version of the Python driver.
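As an illustration of the driver-side configuration, the following sketch builds a Python `ssl.SSLContext` that verifies the server certificate and optionally presents a client certificate. The function names and file paths are hypothetical. The context is passed to the driver through the `ssl_context` argument of `Cluster`. Treat this as a starting point under those assumptions, not as the documented procedure for your driver version.

```python
import ssl


def build_ssl_context(ca_file=None, client_cert=None, client_key=None):
    """Create a TLS client context that verifies the server's identity.

    ssl.create_default_context() requires a valid server certificate and
    enables hostname checking. When ca_file is given, only the CAs in that
    file are trusted; otherwise the system CA store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    if client_cert:
        # Needed only when the server verifies the client (require_client_auth).
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context


def connect_with_ssl(contact_points, context):
    """Open a session over SSL. Requires the cassandra-driver package."""
    from cassandra.cluster import Cluster

    cluster = Cluster(contact_points=contact_points, ssl_context=context)
    return cluster, cluster.connect()
```

With hostname checking enabled, the server certificates must match the addresses that the driver connects to. If they don't, verification is likely to fail during the handshake.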
Install the Python driver
- Install the driver with pip:
    pip install cassandra-driver
If you install an earlier version of the driver, make sure your version is compatible with DSE and your application’s CQL statements. For example, if you need to query vector data, make sure your driver version supports the vector type.
- Optional: Verify the installation:
    pip show cassandra-driver
Make sure the returned `Version` is the latest version or the specific version that you installed.
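You can run the same verification from Python, for example in a startup check. This sketch uses the standard library `importlib.metadata` module. The helper name `installed_version` is illustrative.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it isn't installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("cassandra-driver"))
```

The script prints the installed driver version, or `None` if the package isn't installed in the active environment.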
Connect the Python driver
- In the root of your Python project, create a `connect_database.py` file:
    cd python_project
    touch connect_database.py
- Copy one of the following connection code examples into `connect_database.py`.
Both examples create a `Cluster` instance to connect to your DSE database. You typically have one `Cluster` for each DSE database, and only one `Session` for the entire application. For more information, see Best practices: Session and cluster handling and Connection pools and initial contact points.
- Production configuration (recommended)
When using the Python driver in production environments or with simulated production workloads, DataStax recommends robust `session` configuration with `profile` and `cluster` details to help optimize driver performance.
The following example reads authentication credentials from environment variables, and it sets options for connection timeout, request timeout, and protocol version.
connect_database.py
    import os

    from cassandra import ProtocolVersion
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

    # The environment variable names are placeholders; use your own.
    auth_provider = PlainTextAuthProvider(
        username=os.environ["DSE_USERNAME"],
        password=os.environ["DSE_PASSWORD"],
    )
    # Request timeout, in seconds.
    profile = ExecutionProfile(request_timeout=30)
    cluster = Cluster(
        auth_provider=auth_provider,
        execution_profiles={EXEC_PROFILE_DEFAULT: profile},
        protocol_version=ProtocolVersion.V4,
        connect_timeout=10,  # Connection timeout, in seconds (example value).
    )
    session = cluster.connect()
- Minimal configuration
You can use a minimal `session` configuration for testing or lower environments where you don’t need to optimize the cluster details for production workloads.
The following example reads authentication credentials from environment variables and uses default values for all other connection options.
connect_database.py
    import os

    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    # The environment variable names are placeholders; use your own.
    auth_provider = PlainTextAuthProvider(
        username=os.environ["DSE_USERNAME"],
        password=os.environ["DSE_PASSWORD"],
    )
    cluster = Cluster(auth_provider=auth_provider)
    session = cluster.connect()
- To test the connection, add a simple query to the script.
The following example queries the `system.local` table. You can replace the example `SELECT` statement with any CQL statement that you want to run against a keyspace and table in your database.
connect_database.py
    row = session.execute("SELECT release_version FROM system.local").one()
    if row:
        print(row[0])
    else:
        print("An error occurred.")
- Save and run your Python script:
    python ./connect_database.py
If you ran the example `SELECT` statement and the script runs successfully, the `release_version` value from the `system.local` table is printed to the console.
Next, you can extend or modify this script to run other commands against your database or connect to other databases. For more information, see the documentation for your version of the Python driver.
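One way to extend the script is to move the query into a function and shut down the cluster when the application exits. The following is a sketch under stated assumptions: the function names are illustrative, and `main()` connects to a node on localhost without authentication, so add your own contact points and `auth_provider` as needed.

```python
def fetch_release_version(session):
    """Return release_version from system.local, or None if no row is returned."""
    row = session.execute("SELECT release_version FROM system.local").one()
    return row.release_version if row else None


def main():
    # Imported here so the helper above can be reused and tested without the driver.
    from cassandra.cluster import Cluster

    cluster = Cluster()  # With no arguments, the driver connects to localhost.
    try:
        session = cluster.connect()
        print(fetch_release_version(session))
    finally:
        # Close all connections that the driver opened.
        cluster.shutdown()
```

Call `main()` from your script's entry point. Because `fetch_release_version` only needs an object with an `execute` method, you can exercise it in unit tests with a stand-in session.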
Run a vector search with the Python driver
The following example shows how you can use the Python driver to index vector data and then run a vector search:
- Create a table and vector index.
The following code creates a table named `vector_test` with columns for an integer id, text, and a 5-dimensional vector. Then, it creates a custom index on the vector column using the cosine similarity function for efficient vector searches.
This example uses a keyspace named `default_keyspace`. Replace this value if you want to use a different keyspace.
    keyspace = "default_keyspace"
    v_dimension = 5

    session.execute((
        "CREATE TABLE IF NOT EXISTS {keyspace}.vector_test (id INT PRIMARY KEY, "
        "text TEXT, vector VECTOR<FLOAT,{v_dimension}>);"
    ).format(keyspace=keyspace, v_dimension=v_dimension))

    session.execute((
        "CREATE CUSTOM INDEX IF NOT EXISTS idx_vector_test "
        "ON {keyspace}.vector_test "
        "(vector) USING 'StorageAttachedIndex' WITH OPTIONS = "
        "{{'similarity_function' : 'cosine'}};"
    ).format(keyspace=keyspace))
- Insert vector data.
The following code inserts some rows with embeddings into the `vector_test` table:
    text_blocks = [
        (1, "Chat bot integrated sneakers that talk to you", [0.1, 0.15, 0.3, 0.12, 0.05]),
        (2, "An AI quilt to help you sleep forever", [0.45, 0.09, 0.01, 0.2, 0.11]),
        (3, "A deep learning display that controls your mood", [0.1, 0.05, 0.08, 0.3, 0.6]),
    ]

    for row_id, text, vector in text_blocks:
        session.execute(
            f"INSERT INTO {keyspace}.vector_test (id, text, vector) VALUES (%s, %s, %s)",
            (row_id, text, vector)
        )
- Perform a vector search.
The following code performs a vector search to find rows that are close to a specific vector embedding:
    ann_query = (
        f"SELECT id, text, similarity_cosine(vector, [0.15, 0.1, 0.1, 0.35, 0.55]) as sim FROM {keyspace}.vector_test "
        "ORDER BY vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55] LIMIT 2"
    )
    for row in session.execute(ann_query):
        print(f"[{row.id}] \"{row.text}\" (sim: {row.sim:.4f})")
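To sanity-check the search results, you can rank the same three example vectors locally. This pure-Python sketch computes the raw cosine similarity between each stored vector and the query vector. It isn't how the database evaluates the query: ANN search is approximate, and the score returned by `similarity_cosine` might be scaled differently from the raw cosine computed here. The expectation is only that the ordering is comparable.

```python
import math


def cosine_similarity(a, b):
    """Raw cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# The same ids and embeddings that the insert step writes to vector_test.
rows = [
    (1, [0.1, 0.15, 0.3, 0.12, 0.05]),
    (2, [0.45, 0.09, 0.01, 0.2, 0.11]),
    (3, [0.1, 0.05, 0.08, 0.3, 0.6]),
]
query = [0.15, 0.1, 0.1, 0.35, 0.55]

# Sort by similarity to the query, highest first, and keep the top 2 (LIMIT 2).
ranked = sorted(rows, key=lambda row: cosine_similarity(row[1], query), reverse=True)
top_ids = [row_id for row_id, _ in ranked[:2]]
print(top_ids)  # [3, 2]
```

Row 3 is nearly parallel to the query vector, so it ranks first, followed by row 2.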
Use DSE advanced workloads
If you have enabled DSE advanced workloads, you must configure your driver to connect to compatible nodes when sending DSE Search or Graph queries. For more information, see DSE advanced workloads in Cassandra drivers.
Reconnect the Python driver after a migration
If you migrate your data from one Cassandra database platform to another, you must update your client applications to connect to your new databases.
At minimum, you must update the driver connection strings.
Additional changes might be required if you upgraded to a new major driver version or migrated to a database platform with a different feature set.
For example, if you migrate to Astra, your drivers cannot create keyspaces because CQL for Astra doesn’t support CREATE KEYSPACE.
For information about updating driver connections after a migration, see the DataStax migration documentation on Connecting client applications to your new target database. Although the referenced documentation is in the context of zero downtime migration, the information applies to most Cassandra-to-Cassandra migrations where you need to update Cassandra driver connection strings.
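Keeping connection details out of the code can make this kind of update a configuration change instead of a code change. The following sketch reads contact points from an environment variable. The variable name `CASSANDRA_CONTACT_POINTS`, the helper name, and the localhost default are assumptions for illustration, not driver conventions.

```python
import os


def contact_points_from_env(env=None, var="CASSANDRA_CONTACT_POINTS", default="127.0.0.1"):
    """Parse a comma-separated list of hosts from an environment variable."""
    env = os.environ if env is None else env
    raw = env.get(var, default)
    # Drop surrounding whitespace and skip empty entries such as a trailing comma.
    return [host.strip() for host in raw.split(",") if host.strip()]


print(contact_points_from_env({"CASSANDRA_CONTACT_POINTS": "10.0.0.1, 10.0.0.2"}))
```

You can pass the returned list to `Cluster(contact_points=...)`. After a migration, you update the environment variable to point at the new database and leave the script unchanged.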