Connect with the Python driver
DataStax recommends the Data API and clients for Serverless (vector) databases. You can use the Data API to run CQL statements on tables in Serverless (vector) databases. DataStax recommends drivers only for Serverless (non-vector) databases, legacy applications that rely on a driver, or CQL functions that aren’t supported by the Data API. For more information, see Connect to Astra DB Serverless databases.
Because Astra DB is based on Apache Cassandra®, you can use Cassandra drivers to connect to your Astra DB Serverless databases.
To use the Python driver, you need to install the driver and its dependencies, and then connect the driver to your database. Once connected, you can write scripts that use the driver to run commands against your database.
This quickstart explains how to use the Python driver to connect to an Astra DB Serverless database and send some Cassandra Query Language (CQL) statements to the database. It also explains how to upgrade from an earlier version of the Python driver to a version that supports Astra DB.
Prerequisites
- Install Python 3.7 or later.
- Install pip version 23.0 or later.
- Download your database’s Secure Connect Bundle (SCB).
  For multi-region databases, download the SCB for a region that is geographically close to your application to reduce latency.
  If you need to connect to multiple regions in the same application, you need the SCB for each region, and your driver code must instantiate one root object (session) for each region. For more information, see Best practices for Cassandra drivers.
- Set the following environment variables:
  - DATABASE_ID: The database ID.
  - APPLICATION_TOKEN: An application token with the Database Administrator role.
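You might want your script to fail fast when one of these variables is missing. A small check like the following can run before any driver code. This is a minimal sketch. check_astra_env is a hypothetical helper and is not part of the driver. The AstraCS: prefix check assumes the token format described under Driver authentication methods.

```python
import os


def check_astra_env(environ=os.environ):
    """Return (database_id, token), or raise if a required variable is missing or malformed.

    Hypothetical helper: the variable names match the prerequisites above.
    """
    missing = [name for name in ("DATABASE_ID", "APPLICATION_TOKEN") if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    token = environ["APPLICATION_TOKEN"]
    # Assumption: application tokens start with the "AstraCS:" prefix.
    if not token.startswith("AstraCS:"):
        raise ValueError("APPLICATION_TOKEN does not look like an Astra application token")
    return environ["DATABASE_ID"], token
```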
Driver authentication methods
There are two driver authentication methods:
- token authentication
  The token authentication method is supported and recommended for most recent driver versions. In your driver authentication code, pass the literal string token as the username and your application token value (AstraCS:…) as the password. For example:
  ("token", "AstraCS:...")
- clientId and secret authentication
  If you are on an older driver version that doesn’t support token authentication, then you might need to use clientId and secret. When you generate an application token, download or copy the token.json file, which contains the following values:
  { "clientId": "CLIENT_ID", "secret": "CLIENT_SECRET", "token": "APPLICATION_TOKEN" }
  Then, in your driver authentication code, pass clientId as the username and secret as the password. For example:
  ("CLIENT_ID", "CLIENT_SECRET")
For more information, see Token details.
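The following minimal sketch shows how the two methods map onto the username and password that the driver expects. driver_credentials is a hypothetical helper that reads the text of a token.json file. You would typically unpack the resulting pair into PlainTextAuthProvider, for example PlainTextAuthProvider(*driver_credentials(text)).

```python
import json


def driver_credentials(token_json: str, use_token_auth: bool = True) -> tuple:
    """Return the (username, password) pair for the driver's auth provider.

    token_json is the text of the token.json file downloaded with an application token.
    """
    values = json.loads(token_json)
    if use_token_auth:
        # Token authentication: the literal string "token" plus the AstraCS:... value.
        return ("token", values["token"])
    # Older drivers without token support: clientId as username, secret as password.
    return (values["clientId"], values["secret"])
```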
Install the Python driver
- Install the driver:
  ```shell
  pip install cassandra-driver
  ```
  If you install an earlier version of the driver, make sure your version is compatible with Astra DB. If you need to query vector data in Astra DB Serverless (vector) databases, make sure your version also supports vector data. For more information, see Cassandra drivers supported by DataStax.
- Optional: Verify the installation:
  ```shell
  pip show cassandra-driver
  ```
  Make sure the returned Version is the latest version or the specific version that you installed.
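You can also check the installed version from Python instead of with pip show, by using the standard library's importlib.metadata. This sketch assumes Python 3.8 or later, because importlib.metadata is not available in 3.7. installed_driver_version, version_tuple, and meets_minimum are hypothetical helpers. This page doesn't name a minimum version, so the version you compare against is up to you.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_driver_version(package: str = "cassandra-driver"):
    """Return the installed version string, or None if the package isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text: str) -> tuple:
    """Turn '3.29.1' into (3, 29, 1); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    # Compare numerically so that "3.10" sorts after "3.9".
    return version_tuple(installed) >= version_tuple(minimum)
```

This comparison is deliberately simple. For full version-string handling, such as pre-releases, a dedicated parser like packaging.version is more robust.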
Connect the Python driver
- In the root of your Python project, create a connect_database.py file:
  ```shell
  cd python_project
  touch connect_database.py
  ```
- Copy one of the following connection code examples into connect_database.py, and then replace PATH/TO/SCB.zip with the absolute path to your database’s Secure Connect Bundle (SCB) zip file (secure-connect-DATABASE_NAME.zip):
  - Production configuration (recommended)
    When using the Python driver in production environments or with simulated production workloads, DataStax recommends a robust session configuration with profile and cluster details to help optimize driver performance.
    The following code initializes a session to connect to your database with the cassandra-driver. It uses an SCB and authentication credentials stored in environment variables. It also sets the connection timeout, request timeout, and protocol version.
    connect_database.py
    ```python
    import os

    from cassandra import ProtocolVersion
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

    # SCB path plus a 30-second connection timeout
    cloud_config = {
        "secure_connect_bundle": "PATH/TO/SCB.zip",
        "connect_timeout": 30,
    }
    # Token authentication: the literal "token" as username, the application token as password
    auth_provider = PlainTextAuthProvider("token", os.environ["APPLICATION_TOKEN"])
    # Default execution profile with a 30-second request timeout
    profile = ExecutionProfile(request_timeout=30)

    cluster = Cluster(
        cloud=cloud_config,
        auth_provider=auth_provider,
        execution_profiles={EXEC_PROFILE_DEFAULT: profile},
        protocol_version=ProtocolVersion.V4,
    )
    session = cluster.connect()
    ```
  - Minimal configuration
    You can use a minimal session configuration for testing or lower environments where you don’t need to optimize the cluster details for production workloads.
    The following code initializes a session to connect to your database with the cassandra-driver. It uses an SCB and authentication credentials stored in environment variables.
    connect_database.py
    ```python
    import os

    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    # Connect with the SCB and token authentication; everything else uses driver defaults
    session = Cluster(
        cloud={"secure_connect_bundle": "PATH/TO/SCB.zip"},
        auth_provider=PlainTextAuthProvider("token", os.environ["APPLICATION_TOKEN"]),
    ).connect()
    ```
  The connection code creates a Cluster instance to connect to your Astra DB database. You typically have one Cluster for each Astra DB database and reuse a single Session for that Cluster across the entire application. For more information, see Best practices for Cassandra drivers.
- To test the connection, add a simple query to the script.
  The following example queries the system.local table. You can replace the example SELECT statement with any CQL statement that you want to run against a keyspace and table in your database.
  connect_database.py
  ```python
  # system.local exists in every database, so this query works without creating a table
  row = session.execute("SELECT release_version FROM system.local").one()
  if row:
      print(row[0])
  else:
      print("An error occurred.")
  ```
- Save and run your Python script:
  ```shell
  python ./connect_database.py
  ```
  If the script runs successfully and you used the example SELECT statement, it prints the release_version value from the system.local table to the console.
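The connection examples hard-code PATH/TO/SCB.zip. One way to catch a wrong path before the driver tries to connect is to validate the path first. You can also shut the cluster down when the script finishes. This is a sketch under assumptions. build_cloud_config and print_release_version are hypothetical helpers. The driver imports are deferred into print_release_version so that the path check works without the driver or a database.

```python
import os


def build_cloud_config(scb_path: str, connect_timeout: int = 30) -> dict:
    """Validate the SCB path and return a cloud config dict for Cluster(cloud=...)."""
    path = os.path.abspath(os.path.expanduser(scb_path))
    if not path.endswith(".zip"):
        raise ValueError(f"Expected a .zip Secure Connect Bundle, got {path}")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Secure Connect Bundle not found: {path}")
    return {"secure_connect_bundle": path, "connect_timeout": connect_timeout}


def print_release_version(scb_path: str) -> None:
    # Imported here so build_cloud_config can be used without the driver installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    cluster = Cluster(
        cloud=build_cloud_config(scb_path),
        auth_provider=PlainTextAuthProvider("token", os.environ["APPLICATION_TOKEN"]),
    )
    try:
        session = cluster.connect()
        row = session.execute("SELECT release_version FROM system.local").one()
        print(row.release_version if row else "No row returned.")
    finally:
        # Close connections and background threads when the script is done.
        cluster.shutdown()
```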
Run a vector search with the Python driver
The following example shows how you can use the Python driver to index vector data and then run a vector search:
- Create a table and vector index.
  The following code creates a table named vector_test with columns for an integer id, text, and a 5-dimensional vector. It then creates a custom index on the vector column that uses the cosine similarity function for efficient vector searches.
  This example uses a keyspace named default_keyspace. Replace this value if you want to use a different keyspace.
  ```python
  keyspace = "default_keyspace"
  v_dimension = 5

  # Table with an INT key, a TEXT column, and a 5-dimensional FLOAT vector
  session.execute((
      "CREATE TABLE IF NOT EXISTS {keyspace}.vector_test (id INT PRIMARY KEY, "
      "text TEXT, vector VECTOR<FLOAT,{v_dimension}>);"
  ).format(keyspace=keyspace, v_dimension=v_dimension))

  # Storage-attached index on the vector column, using cosine similarity
  session.execute((
      "CREATE CUSTOM INDEX IF NOT EXISTS idx_vector_test "
      "ON {keyspace}.vector_test "
      "(vector) USING 'StorageAttachedIndex' WITH OPTIONS = "
      "{{'similarity_function' : 'cosine'}};"
  ).format(keyspace=keyspace))
  ```
- Insert vector data.
  The following code inserts some rows with embeddings into the vector_test table:
  ```python
  text_blocks = [
      (1, "Chat bot integrated sneakers that talk to you", [0.1, 0.15, 0.3, 0.12, 0.05]),
      (2, "An AI quilt to help you sleep forever", [0.45, 0.09, 0.01, 0.2, 0.11]),
      (3, "A deep learning display that controls your mood", [0.1, 0.05, 0.08, 0.3, 0.6]),
  ]
  for row_id, text, vector in text_blocks:
      # In a simple (non-prepared) statement, the driver renders the Python list
      # as [...], which CQL accepts as a vector literal
      session.execute(
          f"INSERT INTO {keyspace}.vector_test (id, text, vector) VALUES (%s, %s, %s)",
          (row_id, text, vector),
      )
  ```
- Perform a vector search.
  The following code performs a vector search to find rows that are close to a specific vector embedding:
  ```python
  # Return the two rows nearest to the query vector, with their cosine similarity
  ann_query = (
      "SELECT id, text, similarity_cosine(vector, [0.15, 0.1, 0.1, 0.35, 0.55]) AS sim "
      f"FROM {keyspace}.vector_test "
      "ORDER BY vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55] LIMIT 2"
  )
  for row in session.execute(ann_query):
      print(f"[{row.id}] \"{row.text}\" (sim: {row.sim:.4f})")
  ```
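To build intuition for the sim column, you can reproduce the ranking locally for the three example rows without a database. This sketch assumes that CQL's similarity_cosine returns cosine similarity rescaled to the range 0 to 1 as (1 + cos) / 2. Check your database's documentation if the exact values matter. The sketch ranks every row exhaustively, whereas the ANN index approximates the nearest neighbors at scale.

```python
import math


def similarity_cosine(a, b):
    """Cosine similarity rescaled to [0, 1] as (1 + cos) / 2 (assumed to match CQL)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return (1 + dot / (norm_a * norm_b)) / 2


text_blocks = [
    (1, "Chat bot integrated sneakers that talk to you", [0.1, 0.15, 0.3, 0.12, 0.05]),
    (2, "An AI quilt to help you sleep forever", [0.45, 0.09, 0.01, 0.2, 0.11]),
    (3, "A deep learning display that controls your mood", [0.1, 0.05, 0.08, 0.3, 0.6]),
]
query = [0.15, 0.1, 0.1, 0.35, 0.55]

# Exact (exhaustive) ranking by similarity to the query vector, highest first
ranked = sorted(text_blocks, key=lambda block: similarity_cosine(block[2], query), reverse=True)
for row_id, text, vector in ranked[:2]:
    print(f"[{row_id}] \"{text}\" (sim: {similarity_cosine(vector, query):.4f})")
```

For these three rows, the exact ranking puts row 3 first and row 2 second.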
Upgrade the Python driver
Use these steps if you need to upgrade from an earlier version of the Python driver to a version that supports Astra DB:
- In your existing DataStax Python driver code, modify the connection code to use the SCB and token authentication.
  ```python
  import os

  from cassandra.auth import PlainTextAuthProvider
  from cassandra.cluster import Cluster

  # Connect through the Secure Connect Bundle instead of contact points
  cloud_config = {"secure_connect_bundle": "PATH/TO/SCB.zip"}
  # Token authentication: the literal "token" as username, the application token as password
  auth_provider = PlainTextAuthProvider("token", os.environ["APPLICATION_TOKEN"])
  cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
  session = cluster.connect()
  ```
  For more information, see Connect the Python driver.
- Run your Python script.
Next steps
You can extend or modify the example script used in this guide to run other commands against your database, or connect to other databases. For more information, see the following: