Connect with the Python driver

DataStax recommends the Data API and clients for Serverless (Vector) databases. You can use the Data API to perform CQL operations on your table data in Serverless (Vector) databases.

DataStax recommends drivers only for Serverless (Non-Vector) databases, legacy applications that rely on a driver, or for CQL functions that aren’t supported by the Data API. For more information, see Connect to a database.

Because Astra DB is based on Apache Cassandra®, you can use Cassandra drivers to connect to your Astra DB Serverless databases.

To use the DataStax Python driver, you need to install the driver and its dependencies, and then connect the driver to your Astra DB Serverless database. Once connected, you can write scripts that use the driver to run commands against your database.

This quickstart explains how to use the Python driver to connect to a Serverless (Vector) database and send some CQL statements to the database. It also includes instructions to upgrade an existing DataStax Python driver installation to a version that supports Astra DB.

Prerequisites

Install Python 3.7 or later.
Install pip version 23.0 or later.
Create a database.
Download your database’s Secure Connect Bundle (SCB).

For multi-region databases, download the Secure Connect Bundle (SCB) for a region that is geographically close to your application to reduce latency.

If you need to connect to multiple regions in the same application, you need the Secure Connect Bundle (SCB) for each region, and your driver code must instantiate one root object (session) for each region. For more information, see Best practices for Cassandra drivers.
Set the following environment variables:
- ASTRA_DB_ID: The database ID.
- KEYSPACE_NAME: A keyspace in your database, such as default_keyspace.
- APPLICATION_TOKEN: An application token with the Database Administrator role.
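
As a minimal sketch, a script can check these variables at startup so that a missing value fails early with a clear message. The `load_settings` helper and its error text are hypothetical names used for illustration; they are not part of the driver.

```python
import os

# The three variables listed in the prerequisites.
REQUIRED_VARS = ("ASTRA_DB_ID", "KEYSPACE_NAME", "APPLICATION_TOKEN")


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing.

    `load_settings` is a hypothetical helper, not a driver function.
    """
    env = os.environ if env is None else env
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` with no arguments reads from `os.environ`. Passing a plain dict instead makes the check easy to exercise without touching the real environment.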

Driver authentication methods

There are two driver authentication methods: token authentication, and client ID and secret authentication.

Token authentication

This authentication method is supported and recommended for most recent driver versions.

In your driver authentication code, pass the literal string token as the username and your application token value (AstraCS:…) as the password. For example:

("token", "AstraCS:...")

Client ID and secret authentication

If you are on an older driver version that doesn’t support token authentication, then you might need to use clientId and secret.

When you generate an application token, download or copy the token.json that contains the following values:

{
  "clientId": "CLIENT_ID",
  "secret": "CLIENT_SECRET",
  "token": "APPLICATION_TOKEN"
}

In your driver authentication code, pass clientId as the username and secret as the password. For example:

("CLIENT_ID", "CLIENT_SECRET")

For more information, see Token details.
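
The two methods differ only in which values from token.json are used as the username and password. The following sketch shows that mapping, assuming the file has the three keys shown above. The function name `credentials_from_token_json` is hypothetical.

```python
import json


def credentials_from_token_json(text, prefer_token=True):
    """Return a (username, password) pair from the contents of token.json.

    Hypothetical helper: `text` is the raw JSON string from the file.
    """
    data = json.loads(text)
    if prefer_token:
        # Token authentication: the username is the literal string "token".
        return ("token", data["token"])
    # Older drivers: clientId is the username, secret is the password.
    return (data["clientId"], data["secret"])
```

The returned pair can then be unpacked into the auth provider, for example `PlainTextAuthProvider(*credentials_from_token_json(text))`.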

Install the Python driver

Install the DataStax Python driver:

pip install cassandra-driver

If you choose to install an earlier version, make sure you choose a version that is compatible with Astra DB. If you need to query vector data, make sure your chosen version also supports vector data. For more information, see Cassandra drivers supported by DataStax.
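
To see which driver version is installed before you compare it against the compatibility list, you can query the package metadata. This is a small sketch that uses only the standard library. Note that `importlib.metadata` requires Python 3.8 or later, and that `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="cassandra-driver"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The package is not installed in this environment.
        return None
```

For example, `print(installed_version())` prints the version of cassandra-driver in the active environment, or `None` if the driver is not installed.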

Connect the Python driver

In the root of your Python project, create a connect_database.py file:
```
cd python_project
touch connect_database.py
```
In connect_database.py, add code to import the necessary libraries and establish a connection to your database.
Use one of the following configurations.

Production configuration

When using the Python driver in production environments or with simulated production workloads, DataStax recommends robust session configuration with profile and cluster details to help optimize driver performance.

The following code initializes a session to connect to your database with the cassandra-driver. It uses an SCB and authentication credentials stored in environment variables. Additionally, it includes options for connection timeout, request timeout, and protocol version.

connect_database.py

import os
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT, ProtocolVersion
from cassandra.auth import PlainTextAuthProvider

cloud_config = {
    'secure_connect_bundle': "PATH/TO/SCB.zip",
    'connect_timeout': 30
}
auth_provider = PlainTextAuthProvider("token", os.environ["APPLICATION_TOKEN"])
profile = ExecutionProfile(request_timeout=30)
cluster = Cluster(
    cloud=cloud_config,
    auth_provider=auth_provider,
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    protocol_version=ProtocolVersion.V4
)
session = cluster.connect()

Replace PATH/TO/SCB.zip with the absolute path to your database’s Secure Connect Bundle (SCB) (secure-connect-DATABASE_NAME.zip).

Basic configuration

You can use a minimal session configuration for testing or lower environments where you don’t need to optimize the cluster details for production workloads.

The following code initializes a session to connect to your database with the cassandra-driver. It uses an SCB and authentication credentials stored in environment variables.

connect_database.py

import os
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

session = Cluster(
    cloud={"secure_connect_bundle": "PATH/TO/SCB.zip"},
    auth_provider=PlainTextAuthProvider("token", os.environ["APPLICATION_TOKEN"]),
).connect()

Replace PATH/TO/SCB.zip with the absolute path to your database’s Secure Connect Bundle (SCB) (secure-connect-DATABASE_NAME.zip).

The connection code creates a Cluster instance to connect to your Astra DB database. You typically have one Cluster for each Astra DB database, and only one Session for the entire application. For more information, see Best practices for Cassandra drivers.
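
One way to keep a single Session per database (or per region, when each region has its own SCB) is to cache sessions by bundle path and create each one on first use. The following is a sketch under that assumption. `get_session`, `_default_connect`, and the `connect` parameter are hypothetical names, and the cache is not thread-safe as written.

```python
import os

# Cache of sessions keyed by SCB path: one session per database or region.
_sessions = {}


def _default_connect(bundle_path):
    # Imported lazily so this module can be loaded without the driver.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    auth_provider = PlainTextAuthProvider("token", os.environ["APPLICATION_TOKEN"])
    cluster = Cluster(
        cloud={"secure_connect_bundle": bundle_path},
        auth_provider=auth_provider,
    )
    return cluster.connect()


def get_session(bundle_path, connect=_default_connect):
    """Return the cached session for `bundle_path`, connecting on first use."""
    if bundle_path not in _sessions:
        _sessions[bundle_path] = connect(bundle_path)
    return _sessions[bundle_path]
```

Application code then calls `get_session("PATH/TO/SCB.zip")` wherever it needs the session instead of building a new Cluster each time. The `connect` parameter exists only so that the connection step can be replaced, for example in tests.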

Run commands with the Python driver

After you connect to the database, you can use the driver to perform operations on your database.

Submit a simple query

Add code to run a CQL query and print the output to the console:

row = session.execute("select release_version from system.local").one()
if row:
    print(row[0])
else:
    print("An error occurred.")

Save and run your Python script:
```
python ./connect_database.py
```
The output prints the release_version value from the system.local table in your Astra DB database.

Create a table and vector index

The following code creates a table named vector_test with columns for an integer id, text, and a 5-dimensional vector. Then, it creates a custom index on the vector column using the cosine similarity function for efficient vector searches.

keyspace = "default_keyspace"
v_dimension = 5

session.execute((
    "CREATE TABLE IF NOT EXISTS {keyspace}.vector_test (id INT PRIMARY KEY, "
    "text TEXT, vector VECTOR<FLOAT,{v_dimension}>);"
).format(keyspace=keyspace, v_dimension=v_dimension))

session.execute((
    "CREATE CUSTOM INDEX IF NOT EXISTS idx_vector_test "
    "ON {keyspace}.vector_test "
    "(vector) USING 'StorageAttachedIndex' WITH OPTIONS = "
    "{{'similarity_function' : 'cosine'}};"
).format(keyspace=keyspace))
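
Because keyspace and table names are formatted into the statement text rather than bound as parameters, it can help to validate them before building the CQL. The sketch below produces the same two statements as above from checked inputs. The helper name is hypothetical, and the set of allowed similarity functions (`cosine`, `dot_product`, `euclidean`) is an assumption about what the index accepts.

```python
import re

# Conservative pattern for unquoted identifiers (an assumption for this sketch).
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# Assumed set of similarity functions accepted by the vector index.
_SIMILARITY_FUNCTIONS = {"cosine", "dot_product", "euclidean"}


def vector_table_statements(keyspace, table, dimension, similarity="cosine"):
    """Build CREATE TABLE and CREATE CUSTOM INDEX statements for a vector table."""
    for name in (keyspace, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"Unsafe identifier: {name!r}")
    if not isinstance(dimension, int) or dimension < 1:
        raise ValueError("dimension must be a positive integer")
    if similarity not in _SIMILARITY_FUNCTIONS:
        raise ValueError(f"Unknown similarity function: {similarity!r}")
    create_table = (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} "
        f"(id INT PRIMARY KEY, text TEXT, vector VECTOR<FLOAT,{dimension}>);"
    )
    create_index = (
        f"CREATE CUSTOM INDEX IF NOT EXISTS idx_{table} ON {keyspace}.{table} "
        f"(vector) USING 'StorageAttachedIndex' "
        f"WITH OPTIONS = {{'similarity_function' : '{similarity}'}};"
    )
    return create_table, create_index
```

Each returned string can be passed to `session.execute`. The similarity function chosen here should match the function used later in search queries.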

Insert data

The following code inserts some rows with embeddings into the vector_test table.

text_blocks = [
    (1, "Chat bot integrated sneakers that talk to you", [0.1, 0.15, 0.3, 0.12, 0.05]),
    (2, "An AI quilt to help you sleep forever", [0.45, 0.09, 0.01, 0.2, 0.11]),
    (3, "A deep learning display that controls your mood", [0.1, 0.05, 0.08, 0.3, 0.6]),
]
for block in text_blocks:
    id, text, vector = block
    session.execute(
        f"INSERT INTO {keyspace}.vector_test (id, text, vector) VALUES (%s, %s, %s)",
        (id, text, vector)
    )
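
When the same INSERT runs many times, a prepared statement avoids re-parsing the query for every row. The following sketch shows that variant. It assumes a driver version that can bind a Python list to a vector column in a prepared statement, and `insert_blocks` is a hypothetical helper name.

```python
def insert_blocks(session, keyspace, blocks):
    """Insert (id, text, vector) tuples using one prepared statement.

    `session` is a connected driver session; `blocks` is a list of tuples.
    """
    # Identifiers cannot be bound, so the keyspace is interpolated;
    # the row values are bound through the ? placeholders.
    prepared = session.prepare(
        f"INSERT INTO {keyspace}.vector_test (id, text, vector) VALUES (?, ?, ?)"
    )
    for row_id, text, vector in blocks:
        session.execute(prepared, (row_id, text, vector))
    return len(blocks)
```

With the data above, the call would be `insert_blocks(session, keyspace, text_blocks)`.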

Perform a vector search

The following code performs a vector search to find rows that are close to a specific vector embedding.

ann_query = (
    f"SELECT id, text, similarity_cosine(vector, [0.15, 0.1, 0.1, 0.35, 0.55]) as sim FROM {keyspace}.vector_test "
    "ORDER BY vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55] LIMIT 2"
)
for row in session.execute(ann_query):
    print(f"[{row.id}] \"{row.text}\" (sim: {row.sim:.4f})")
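
To build intuition for the ranking, the cosine similarity between the query vector and each sample row can be computed locally. This is an illustrative sketch in plain Python, not how the database evaluates the query. The sim value that the database reports may use a different scale than the raw cosine, so compare the ordering rather than the exact numbers.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Query vector and sample embeddings from the examples above.
query = [0.15, 0.1, 0.1, 0.35, 0.55]
rows = {
    1: [0.1, 0.15, 0.3, 0.12, 0.05],
    2: [0.45, 0.09, 0.01, 0.2, 0.11],
    3: [0.1, 0.05, 0.08, 0.3, 0.6],
}

# Sort row ids from most to least similar to the query vector.
ranked = sorted(rows, key=lambda row_id: cosine_similarity(rows[row_id], query), reverse=True)
print(ranked[:2])  # prints [3, 2]
```

Row 3 ranks first because its embedding points in nearly the same direction as the query vector, followed by row 2.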

Upgrade the Python driver

Use these steps if you need to upgrade your driver from an earlier version to a version that supports Astra DB:

Complete the prerequisites.
Install the latest Python driver.

In your existing DataStax Python driver code, modify the connection code to use the SCB and token authentication. For more information, see Connect the Python driver.

import os
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

cloud_config = {
    'secure_connect_bundle': 'PATH/TO/SCB.zip'
}
auth_provider = PlainTextAuthProvider("token", os.environ["APPLICATION_TOKEN"])
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect()

Run your Python script.