Get started with the Python driver
DataStax recommends the Data API and clients for Serverless (vector) databases. You can use the Data API to run CQL statements on tables in Serverless (vector) databases. Use drivers for Serverless (non-vector) databases, legacy applications that rely on a driver, and use cases that require specific CQL functions that aren’t supported by the Data API.
Because Astra DB Serverless is based on Apache Cassandra®, you can use Cassandra drivers to connect to your Astra databases.
This quickstart explains how to install a driver, connect it to your Astra database, and then send some CQL statements to the database.
To use the Python driver, you need to choose a compatible version, install the driver and its dependencies, and then connect the driver to your Astra database. Once connected, you can write scripts that use the driver to run commands against your database.
Python driver ownership and other important changes in version 3.30
Starting with version 3.30, the Python driver is maintained by the Apache Software Foundation (ASF). Prior versions were maintained by DataStax.
As of version 3.30, there is no change to the Python driver package name.
You can still install the driver with pip install cassandra-driver.
However, there are important changes in version 3.30 that you should be aware of, including use of pyproject.toml, the DRIVER_NAME in STARTUP messages, and the supported Python versions.
For more information, see the Python driver upgrade guide.
Python driver compatibility
DataStax officially supports the latest 12 months of releases, and DataStax recommends using the latest driver version whenever possible. Compatibility isn’t guaranteed for earlier versions. For upgrade guides and compatibility information for earlier versions, see Unsupported drivers.
New features and bug fixes are developed on the latest minor version of the driver, and users are encouraged to stay current with those minor releases. APIs remain stable according to semantic versioning conventions, so upgrades between minor versions should be trivial.
Unless otherwise specified, compatibility version ranges include all patch versions. For example, a range of 4.0 to 4.3 includes all versions from 4.0.0 to the last 4.3.z release.
| Driver version | Astra compatibility | Comments |
|---|---|---|
| | Fully compatible | Starting with version 3.30, this driver is maintained by the Apache Software Foundation (ASF). |
| 3.20 to 3.27 | Partially compatible | Doesn’t support the vector type. |
| Earlier versions | Not compatible | |
Prepare the environment and database
- Install Python version 3.10 or later.
  This quickstart uses the latest version of the Python driver. For Python support in earlier versions, see the Python driver documentation.
- Install pip version 23.0 or later.
- Download your database’s Secure Connect Bundle (SCB).
  For more information, including connections to multi-region databases, see The SCB and encrypted connections for drivers.
- Set the following environment variables:
  - DATABASE_ID: The database ID.
  - APPLICATION_TOKEN: An application token with the Database Administrator role. For more information, see Authentication methods for drivers.
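Before connecting, it can help to fail fast when a required variable is missing. The following is a minimal sketch, not part of the driver: the helper name missing_env_vars is hypothetical, and the variable names are the ones listed above.

```python
import os

# Variable names taken from this quickstart.
REQUIRED_VARS = ("DATABASE_ID", "APPLICATION_TOKEN")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

For example, a script could call missing_env_vars() at startup and exit with a clear message if the returned list isn't empty.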
Authentication methods for drivers
You use an application token and a Secure Connect Bundle (SCB) to connect a driver to an Astra database.
The application token authenticates the driver to the database, and the token’s role determines the actions that the driver is authorized to perform on the database.
When you generate a token, the token details include a clientId, secret, and token:
{
"clientId": "CLIENT_ID",
"secret": "CLIENT_SECRET",
"token": "APPLICATION_TOKEN"
}
- clientId and secret are legacy authentication methods that predate token.
- token is a unified token that comprises everything you need for Astra token authentication.
Cassandra drivers use username and password authentication for Astra connections, typically through an authentication class or argument, such as PlainTextAuthProvider.
To set the username and password for a Cassandra driver connection, you can use either the unified token or the legacy clientId and secret:
- Unified token authentication (recommended): To authenticate with the unified application token, set the username to the literal string token, and set the password to your unified application token. For example: ("token", "APPLICATION_TOKEN")
- Legacy clientId and secret authentication: For legacy applications and older driver versions that don’t use unified application tokens, you can use the clientId as the username and the secret as the password. For example: ("CLIENT_ID", "CLIENT_SECRET"). If you are using a legacy token created before the unified token format was introduced, DataStax recommends rotating it because of its age.
In addition to the application token, you must provide an SCB to set contact points and establish a secure connection to your database. For more information, see The SCB and encrypted connections for drivers.
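The choice between the two credential forms can be sketched as a small helper. This is an illustration under assumptions, not the driver's own logic: the function names are hypothetical, and CLIENT_ID and CLIENT_SECRET are assumed environment variable names that this quickstart doesn't define. Only PlainTextAuthProvider comes from the driver.

```python
def astra_credentials(env):
    """Pick the (username, password) pair for PlainTextAuthProvider.

    Prefers the unified token and falls back to the legacy clientId and secret.
    """
    token = env.get("APPLICATION_TOKEN")
    if token:
        # The username is the literal string "token".
        return ("token", token)
    client_id = env.get("CLIENT_ID")    # assumed variable name
    secret = env.get("CLIENT_SECRET")   # assumed variable name
    if client_id and secret:
        return (client_id, secret)
    raise ValueError("No Astra credentials found in the environment")


def build_auth_provider(env):
    """Create the driver's auth provider from the selected credentials."""
    # Imported lazily so astra_credentials works without the driver installed.
    from cassandra.auth import PlainTextAuthProvider

    username, password = astra_credentials(env)
    return PlainTextAuthProvider(username, password)
```

In a script, you might call build_auth_provider(os.environ) and pass the result as the auth_provider argument when you create the Cluster.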
The SCB and encrypted connections for drivers
In addition to an application token, you must provide an SCB to set contact points and provide certificates necessary to establish a secure mutual TLS (mTLS) connection to your database.
To establish an encrypted connection between your application and database, the driver uses the SSL certificates and trusted certificate authorities (CAs) in the SCB to verify the Astra server’s identity. Mechanically, when the driver receives the server’s SSL certificate during the SSL handshake, it checks that the certificate was signed by one of the registered CAs. If the certificate wasn’t signed by a registered CA, the driver checks that the signer was signed by one of the registered CAs. It continues through the signers until it finds one that is in the list of trusted CAs. If there are no matches, then identity verification fails and the driver connection isn’t established.
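The chain walk described above can be illustrated with a toy model. This sketch only follows signer names in a dictionary; it performs no cryptographic signature checks and isn't how the driver is implemented (the real verification happens in the TLS library). The function name and data layout are hypothetical.

```python
def is_trusted(cert, issuer_of, trusted_cas):
    """Walk the signer chain until a trusted CA is found.

    cert: name of the certificate that the server presented.
    issuer_of: dict that maps each certificate name to the name of its signer.
    trusted_cas: set of CA names registered as trusted.
    """
    seen = set()
    signer = issuer_of.get(cert)
    # Stop when the chain ends or loops back on itself (a self-signed root).
    while signer is not None and signer not in seen:
        if signer in trusted_cas:
            return True
        seen.add(signer)
        signer = issuer_of.get(signer)
    return False
```

With issuer_of = {"server": "intermediate", "intermediate": "root", "root": "root"}, the call is_trusted("server", issuer_of, {"root"}) succeeds after two steps, while a trusted set that contains none of the signers makes verification fail.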
All Astra-compatible drivers have configuration file attributes, builder methods, or constructor parameters to use the SCB. In your driver configuration, you set the path to the SCB zip file, and then the driver automatically gets the required information and files from the SCB. When using an SCB, don’t set any options that are inferred from the SCB, such as contact points and SSL encryption settings. Additionally, don’t extract the SCB zip file; it must be provided to the driver as an unextracted archive.
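Because the SCB must be passed as an unextracted archive, a quick pre-flight check can catch a common mistake before the driver tries to connect. The following is a minimal sketch that uses only the Python standard library; the helper name check_scb_path is hypothetical.

```python
import os
import zipfile


def check_scb_path(path):
    """Return None if path looks like an unextracted SCB archive, else an error message."""
    if os.path.isdir(path):
        return "SCB path is a directory; pass the original zip file instead"
    if not os.path.isfile(path):
        return "SCB file not found"
    if not zipfile.is_zipfile(path):
        return "SCB path is not a zip archive"
    return None
```

This check only confirms that the file is a zip archive. It doesn't validate the bundle's contents.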
For multi-region databases, you need the region-specific SCB for each region that your application will connect to.
To connect to one region of a multi-region database, download the SCB for a region that is geographically close to your application to reduce latency.
To connect to multiple regions or databases in the same application, download the SCB for each region or database.
Then, in your application’s code, create one root driver instance (session or cluster) for each region or database, using custom logic to select the appropriate SCB for each instance.
For more information, see Best practices: Session and cluster handling and Connection pools and initial contact points.
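One way to structure the custom selection logic is a lookup from region to SCB path, with one Cluster and Session created per region. This is a sketch under assumptions: the region names, bundle paths, and function names are placeholders, and only Cluster and PlainTextAuthProvider come from the driver.

```python
# Hypothetical region names and bundle paths; replace them with your own.
SCB_BY_REGION = {
    "us-east1": "PATH/TO/SCB-us-east1.zip",
    "europe-west1": "PATH/TO/SCB-europe-west1.zip",
}


def scb_for_region(region, bundles=SCB_BY_REGION):
    """Return the SCB path for a region, or raise if none is configured."""
    try:
        return bundles[region]
    except KeyError:
        raise ValueError(f"No SCB configured for region {region!r}") from None


def connect_region(region, token, bundles=SCB_BY_REGION):
    """Create one Cluster and one Session for a single region."""
    # Imported lazily so the selection logic runs without the driver installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    cluster = Cluster(
        cloud={"secure_connect_bundle": scb_for_region(region, bundles)},
        auth_provider=PlainTextAuthProvider("token", token),
    )
    return cluster, cluster.connect()
```

An application would typically call connect_region once per region at startup and reuse the returned sessions, rather than reconnecting for each request.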
DataStax recommends that you use a driver version that supports SCB authentication for simplified configuration and reduced chance of connection failures.
However, if you must support a legacy application with an earlier driver, you can use cql-proxy, extract the SCB, and then manually provide the required certificates to the driver.
Additionally, you must use the token’s clientId and secret for the username and password, respectively.
For an example, see DataStax Ruby and PHP drivers (Maintenance).
Install the Python driver
- Install the driver:
pip install cassandra-driver
If you install an earlier version of the driver, make sure your version is compatible with Astra and your application’s CQL statements. For example, if you need to query vector data, make sure your driver version supports the vector type.
- Optional: Verify the installation:
pip show cassandra-driver
Make sure the returned Version is the latest version or the specific version that you installed.
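You can run a similar check from Python, for example at application startup. This minimal sketch uses the standard library's importlib.metadata module; the helper name installed_driver_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_driver_version(package="cassandra-driver"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

A return value of None means that the package isn't installed in the active Python environment.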
Connect the Python driver
- In the root of your Python project, create a connect_database.py file:
cd python_project
touch connect_database.py
- Copy one of the following connection code examples into connect_database.py.
Both examples create a Cluster instance to connect to your Astra database. You typically have one Cluster for each Astra database, and only one Session for the entire application. For more information, see Best practices: Session and cluster handling and Connection pools and initial contact points.
- Production configuration (recommended)
When using the Python driver in production environments or with simulated production workloads, DataStax recommends robust session configuration with profile and cluster details to help optimize driver performance.
The following example uses authentication credentials stored in environment variables, and it sets options for connection timeout, request timeout, and protocol version.
connect_database.py
import os
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT, ProtocolVersion
from cassandra.auth import PlainTextAuthProvider

# SCB location and connection timeout, in seconds
cloud_config = {
    'secure_connect_bundle': "PATH/TO/SCB.zip",
    'connect_timeout': 30
}
# The username is the literal string "token"; the password is the application token
auth_provider = PlainTextAuthProvider("token", os.environ["APPLICATION_TOKEN"])
# Default execution profile with a 30-second request timeout
profile = ExecutionProfile(request_timeout=30)
cluster = Cluster(
    cloud=cloud_config,
    auth_provider=auth_provider,
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    protocol_version=ProtocolVersion.V4
)
session = cluster.connect()
Replace PATH/TO/SCB.zip with the absolute path to your database’s Secure Connect Bundle (SCB) zip file (secure-connect-DATABASE_NAME.zip).
- Minimal configuration
You can use a minimal session configuration for testing or lower environments where you don’t need to optimize the cluster details for production workloads.
The following example uses authentication credentials stored in environment variables and default values for all other connection options.
connect_database.py
import os
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Connect with the SCB and token authentication, using defaults for everything else
session = Cluster(
    cloud={"secure_connect_bundle": "PATH/TO/SCB.zip"},
    auth_provider=PlainTextAuthProvider("token", os.environ["APPLICATION_TOKEN"]),
).connect()
Replace PATH/TO/SCB.zip with the absolute path to your database’s Secure Connect Bundle (SCB) zip file (secure-connect-DATABASE_NAME.zip).
- To test the connection, add a simple query to the script.
The following example queries the system.local table. You can replace the example SELECT statement with any CQL statement that you want to run against a keyspace and table in your database.
connect_database.py
# Fetch a single row from system.local and print its first column
row = session.execute("select release_version from system.local").one()
if row:
    print(row[0])
else:
    print("An error occurred.")
- Save and run your Python script:
python ./connect_database.py
If the script runs successfully with the example SELECT statement, the release_version value from the system.local table is printed to the console.
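If you want to reuse the connection test elsewhere, the query can be wrapped in a small function. This is a sketch, assuming the driver's default row factory, which returns rows that support attribute access by column name. The function name release_version is hypothetical.

```python
def release_version(session):
    """Return release_version from system.local, or None if no row came back."""
    row = session.execute("SELECT release_version FROM system.local").one()
    return row.release_version if row else None
```

For example, print(release_version(session)) prints the version string or None. When the script is finished with the database, calling shutdown() on the Cluster instance closes its connections.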
Next, you can extend or modify this script to run other commands against your database or connect to other databases. For more information, see the documentation for your version of the Python driver.
Run a vector search with the Python driver
The following example shows how you can use the Python driver to index vector data and then run a vector search:
- Create a table and vector index.
The following code creates a table named vector_test with columns for an integer id, text, and a 5-dimensional vector. Then, it creates a custom index on the vector column using the cosine similarity function for efficient vector searches.
This example uses a keyspace named default_keyspace. Replace this value if you want to use a different keyspace.
keyspace = "default_keyspace"
v_dimension = 5

# Create the table with a 5-dimensional float vector column
session.execute((
    "CREATE TABLE IF NOT EXISTS {keyspace}.vector_test (id INT PRIMARY KEY, "
    "text TEXT, vector VECTOR<FLOAT,{v_dimension}>);"
).format(keyspace=keyspace, v_dimension=v_dimension))

# Index the vector column with cosine similarity
session.execute((
    "CREATE CUSTOM INDEX IF NOT EXISTS idx_vector_test "
    "ON {keyspace}.vector_test "
    "(vector) USING 'StorageAttachedIndex' WITH OPTIONS = "
    "{{'similarity_function' : 'cosine'}};"
).format(keyspace=keyspace))
- Insert vector data.
The following code inserts some rows with embeddings into the vector_test table:
text_blocks = [
    (1, "Chat bot integrated sneakers that talk to you", [0.1, 0.15, 0.3, 0.12, 0.05]),
    (2, "An AI quilt to help you sleep forever", [0.45, 0.09, 0.01, 0.2, 0.11]),
    (3, "A deep learning display that controls your mood", [0.1, 0.05, 0.08, 0.3, 0.6]),
]
# row_id avoids shadowing the built-in id() function
for row_id, text, vector in text_blocks:
    session.execute(
        f"INSERT INTO {keyspace}.vector_test (id, text, vector) VALUES (%s, %s, %s)",
        (row_id, text, vector)
    )
- Perform a vector search.
The following code performs a vector search to find rows that are close to a specific vector embedding:
# Return the two nearest rows and their similarity to the query vector
ann_query = (
    f"SELECT id, text, similarity_cosine(vector, [0.15, 0.1, 0.1, 0.35, 0.55]) as sim FROM {keyspace}.vector_test "
    "ORDER BY vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55] LIMIT 2"
)
for row in session.execute(ann_query):
    print(f"[{row.id}] \"{row.text}\" (sim: {row.sim:.4f})")
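To sanity-check which rows a search like this should rank closest, you can compute plain cosine similarity locally for the three example embeddings. This is an illustrative sketch in pure Python that doesn't touch the database. The sim values that the database returns might be scaled differently from the raw cosine computed here, so compare only the ordering.

```python
import math


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# The example embeddings, keyed by row id, and the query vector.
rows = {
    1: [0.1, 0.15, 0.3, 0.12, 0.05],
    2: [0.45, 0.09, 0.01, 0.2, 0.11],
    3: [0.1, 0.05, 0.08, 0.3, 0.6],
}
query = [0.15, 0.1, 0.1, 0.35, 0.55]

# Sort row ids from most to least similar to the query vector.
ranked = sorted(rows, key=lambda i: cosine(rows[i], query), reverse=True)
print(ranked)  # [3, 2, 1]
```

With LIMIT 2, you would therefore expect the search to return rows 3 and 2, keeping in mind that ANN search is approximate.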
Reconnect the Python driver after a migration
If you migrate your data from one Cassandra database platform to another, you must update your client applications to connect to your new databases.
At minimum, you must update the driver connection strings.
Additional changes might be required if you upgraded to a new major driver version or migrated to a database platform with a different feature set.
For example, if you migrate to Astra, your drivers cannot create keyspaces because CQL for Astra doesn’t support CREATE KEYSPACE.
For information about updating driver connections after a migration, see the DataStax migration documentation on Connecting client applications to your new target database. Although the referenced documentation is in the context of zero downtime migration, the information applies to most Cassandra-to-Cassandra migrations where you need to update Cassandra driver connection strings.
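As part of a migration to Astra, it can be useful to scan your application's CQL for statements that won't run there. The following sketch flags CREATE KEYSPACE statements with a regular expression. The helper name is hypothetical, and the check is deliberately simple: it only looks at how each statement starts, so it doesn't handle statements that are preceded by comments.

```python
import re

# Matches statements that begin with CREATE KEYSPACE, ignoring case and leading whitespace.
CREATE_KEYSPACE = re.compile(r"^\s*CREATE\s+KEYSPACE\b", re.IGNORECASE)


def unsupported_statements(statements):
    """Return the statements that start with CREATE KEYSPACE."""
    return [s for s in statements if CREATE_KEYSPACE.match(s)]
```

For example, passing a list of schema statements returns only the CREATE KEYSPACE entries, which you would then remove from the application and handle outside of CQL.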
The following steps summarize the process for updating your driver connection strings after you migrate to Astra:
- Install the Python driver.
The latest version is recommended, but you can use any Astra-compatible version.
- In your existing Python driver code, modify the connection code to use the SCB and token authentication:
import os
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Point the driver at the SCB instead of contact points
cloud_config = {
    'secure_connect_bundle': 'PATH/TO/SCB.zip'
}
# The username is the literal string "token"; the password is the application token
auth_provider = PlainTextAuthProvider("token", os.environ["APPLICATION_TOKEN"])
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect()
For more information, see Connect the Python driver.
- Run your Python script.