Python driver quickstart
DataStax recommends the Data API and clients for Serverless (Vector) databases. You can use the Data API to perform CQL operations on your table data in Serverless (Vector) databases. DataStax recommends drivers only for Serverless (Non-Vector) databases, legacy applications that rely on a driver, or for CQL functions that aren’t supported by the Data API. For more information, see Compare connection methods.
To use the DataStax Python driver, you need to install the driver and its dependencies, and then connect the driver to your Astra DB Serverless database. Once connected, you can write scripts that use the driver to run commands against your database.
This quickstart explains how to use the Python driver to connect to a Serverless (Vector) database, create a table, create a vector-compatible index, load data with vector embeddings, and perform a similarity search. It also includes instructions to migrate an existing DataStax Python driver to a version that supports Astra DB.
Prerequisites
- Set the following environment variables:
    - ASTRA_DB_ID: The database ID.
    - ASTRA_DB_REGION: A region where your database is deployed and where you want to connect to the database, such as us-east-2.
    - ASTRA_DB_KEYSPACE: A keyspace in your database, such as default_keyspace.
    - ASTRA_DB_APPLICATION_TOKEN: An application token with the Database Administrator role.
      The token.json file has the following format:
      { "clientId": "CLIENT_ID", "secret": "CLIENT_SECRET", "token": "APPLICATION_TOKEN" }
      For driver authentication, you can use either clientId and secret, or the literal string token and the AstraCS token value. If you are on an older driver version that doesn’t support the token option, then you might need to use clientId and secret. For more information, see Token details.
- Download your database’s Secure Connect Bundle (SCB).
- Install Python 3.7 or later.
- Install pip version 23.0 or later.
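Before you connect, it can help to confirm that the environment variables listed in the prerequisites are set. The following is a minimal sketch only: the helper name missing_env_vars is hypothetical and is not part of the driver.

```python
import os

# Environment variable names taken from the prerequisites list.
REQUIRED_VARS = (
    "ASTRA_DB_ID",
    "ASTRA_DB_REGION",
    "ASTRA_DB_KEYSPACE",
    "ASTRA_DB_APPLICATION_TOKEN",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of cassandra-driver.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a dictionary instead of relying on os.environ makes the check easy to exercise without changing your shell environment.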
Install the Python driver
- Install the DataStax Python driver:
    pip install cassandra-driver
Make sure you use a driver version that is compatible with Astra DB. For more information, see DataStax driver matrix.
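To see which driver version is installed before you compare it against the driver matrix, you can query the package metadata. This sketch assumes Python 3.8 or later, because it relies on the standard library importlib.metadata module; the function name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="cassandra-driver"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"cassandra-driver version: {found or 'not installed'}")
```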
Connect the Python driver
- In the root of your Python project, create a connect_database.py file:
    cd python_project
    touch connect_database.py
- In connect_database.py, add code to import the necessary libraries and establish a connection to your database.
Production configuration
Use this configuration for proofs of concept or production use.
This configuration initializes a session to connect to your database with the cassandra-driver. It uses an SCB and authentication credentials stored in environment variables. Additionally, it includes options for connection timeout, request timeout, and protocol version.
connect_database.py
import os
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT, ProtocolVersion
from cassandra.auth import PlainTextAuthProvider

# SCB location and connection timeout, in seconds
cloud_config = {
    'secure_connect_bundle': "PATH_TO_SCB",
    'connect_timeout': 30
}
# Authenticate with the literal string "token" and the application token
auth_provider = PlainTextAuthProvider("token", os.environ["ASTRA_DB_APPLICATION_TOKEN"])
# Default request timeout, in seconds
profile = ExecutionProfile(request_timeout=30)
cluster = Cluster(
    cloud=cloud_config,
    auth_provider=auth_provider,
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    protocol_version=ProtocolVersion.V4
)
session = cluster.connect()
Replace PATH_TO_SCB with the absolute path to your database’s Secure Connect Bundle (SCB) (secure-connect-DATABASE_NAME.zip).
Basic configuration
DataStax doesn’t recommend this configuration for basic proofs of concept or production use.
The basic configuration initializes a session to connect to your database with the cassandra-driver. It uses an SCB and authentication credentials stored in environment variables.
connect_database.py
import os
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Connect with the SCB and the application token
session = Cluster(
    cloud={"secure_connect_bundle": "PATH_TO_SCB"},
    auth_provider=PlainTextAuthProvider("token", os.environ["ASTRA_DB_APPLICATION_TOKEN"]),
).connect()
Replace PATH_TO_SCB with the absolute path to your database’s Secure Connect Bundle (SCB) (secure-connect-DATABASE_NAME.zip).
The connection code creates a Cluster instance to connect to your Astra DB database. You typically have one instance of Cluster for each Astra DB database that you want to interact with.
Run commands with the Python driver
After you connect to the database, you can use the driver to perform operations on your database.
Create a table and vector index
The following code creates a table named vector_test with columns for an integer id, text, and a 5-dimensional vector. Then, it creates a custom index on the vector column using the cosine similarity function for efficient vector searches.
keyspace = "default_keyspace"
v_dimension = 5
session.execute((
    "CREATE TABLE IF NOT EXISTS {keyspace}.vector_test (id INT PRIMARY KEY, "
    "text TEXT, vector VECTOR<FLOAT,{v_dimension}>);"
).format(keyspace=keyspace, v_dimension=v_dimension))
session.execute((
    "CREATE CUSTOM INDEX IF NOT EXISTS idx_vector_test "
    "ON {keyspace}.vector_test "
    "(vector) USING 'StorageAttachedIndex' WITH OPTIONS = "
    "{{'similarity_function' : 'cosine'}};"
).format(keyspace=keyspace))
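Because the index statement is assembled with string formatting, you might want to validate the interpolated names and the similarity function before sending it. The following sketch is illustrative only: vector_index_cql is a hypothetical helper, and the tuple of similarity function names is an assumption that you should check against the index documentation for your database.

```python
import re

# Assumed set of similarity function names; verify against your database docs.
SIMILARITY_FUNCTIONS = ("cosine", "dot_product", "euclidean")

# Unquoted CQL identifier: a letter followed by letters, digits, or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def vector_index_cql(keyspace, table, column, similarity="cosine"):
    """Build a CREATE CUSTOM INDEX statement for a vector column.

    Hypothetical helper for illustration; not part of cassandra-driver.
    """
    for name in (keyspace, table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"Invalid CQL identifier: {name!r}")
    if similarity not in SIMILARITY_FUNCTIONS:
        raise ValueError(f"Unknown similarity function: {similarity!r}")
    return (
        f"CREATE CUSTOM INDEX IF NOT EXISTS idx_{table} "
        f"ON {keyspace}.{table} ({column}) "
        "USING 'StorageAttachedIndex' WITH OPTIONS = "
        f"{{'similarity_function' : '{similarity}'}};"
    )


if __name__ == "__main__":
    print(vector_index_cql("default_keyspace", "vector_test", "vector"))
```

You could pass the returned string to session.execute in place of the hand-formatted statement.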
Load data
The following code loads some rows with embeddings into the vector_test table.
text_blocks = [
    (1, "Chat bot integrated sneakers that talk to you", [0.1, 0.15, 0.3, 0.12, 0.05]),
    (2, "An AI quilt to help you sleep forever", [0.45, 0.09, 0.01, 0.2, 0.11]),
    (3, "A deep learning display that controls your mood", [0.1, 0.05, 0.08, 0.3, 0.6]),
]
# Insert each row; the driver substitutes the %s placeholders with the tuple values
for row_id, text, vector in text_blocks:
    session.execute(
        f"INSERT INTO {keyspace}.vector_test (id, text, vector) VALUES (%s, %s, %s)",
        (row_id, text, vector)
    )
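Each embedding must have the same number of values as the dimension declared for the vector column (5 in this quickstart). As a minimal sketch, you could check the rows locally before inserting them; check_rows is a hypothetical helper, not a driver function.

```python
def check_rows(rows, dimension):
    """Raise ValueError if a row's vector is not `dimension` numeric values.

    Each row is expected to be an (id, text, vector) tuple.
    Hypothetical helper for illustration; not part of cassandra-driver.
    """
    for row_id, _text, vector in rows:
        if len(vector) != dimension:
            raise ValueError(
                f"Row {row_id}: expected {dimension} values, got {len(vector)}"
            )
        for value in vector:
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"Row {row_id}: non-numeric value {value!r}")


if __name__ == "__main__":
    sample = [
        (1, "Chat bot integrated sneakers that talk to you", [0.1, 0.15, 0.3, 0.12, 0.05]),
        (2, "An AI quilt to help you sleep forever", [0.45, 0.09, 0.01, 0.2, 0.11]),
    ]
    check_rows(sample, 5)
    print("All rows match the expected dimension.")
```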
Perform a similarity search
The following code performs a similarity search to find rows that are close to a specific vector embedding.
ann_query = (
    f"SELECT id, text, similarity_cosine(vector, [0.15, 0.1, 0.1, 0.35, 0.55]) as sim FROM {keyspace}.vector_test "
    "ORDER BY vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55] LIMIT 2"
)
# Print the two nearest rows with their similarity scores
for row in session.execute(ann_query):
    print(f"[{row.id}] \"{row.text}\" (sim: {row.sim:.4f})")
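To build intuition for which rows the query should return, you can compute plain cosine similarity locally for the three sample vectors. This is a sketch for comparison only: the score reported by the database might be scaled differently from the raw cosine value, so compare the ranking rather than the exact numbers.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Sample ids and embeddings from the load step, and the query vector
rows = [
    (1, [0.1, 0.15, 0.3, 0.12, 0.05]),
    (2, [0.45, 0.09, 0.01, 0.2, 0.11]),
    (3, [0.1, 0.05, 0.08, 0.3, 0.6]),
]
query = [0.15, 0.1, 0.1, 0.35, 0.55]

# Sort from most similar to least similar
ranked = sorted(rows, key=lambda row: cosine_similarity(query, row[1]), reverse=True)

for row_id, vector in ranked[:2]:
    print(f"[{row_id}] cos: {cosine_similarity(query, vector):.4f}")
```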
Migrate the Python driver
If necessary, you can migrate an earlier DataStax Python driver to a version that supports Astra DB.
- In your existing DataStax Python driver code, modify the connection code to use the SCB credentials. For more information, see Connect the Python driver.
    import os
    from cassandra.cluster import Cluster
    from cassandra.auth import PlainTextAuthProvider

    cloud_config = {
        'secure_connect_bundle': 'PATH_TO_SCB'
    }
    # Replace 'clientId' and 'secret' with the values from your token.json
    auth_provider = PlainTextAuthProvider('clientId', 'secret')
    cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
    session = cluster.connect()
- Run your Python script.
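The prerequisites describe two ways to authenticate: the literal string token with the application token value, or the clientId and secret pair for older driver versions. The following sketch shows one way to pick the pair from a parsed token.json; driver_credentials is a hypothetical helper, and the sample values are placeholders.

```python
import json


def driver_credentials(token_data, use_token=True):
    """Return a (username, password) pair for PlainTextAuthProvider.

    token_data is the parsed content of token.json.
    Hypothetical helper for illustration; not part of cassandra-driver.
    """
    if use_token:
        # Newer drivers: literal username "token" plus the AstraCS token value
        return "token", token_data["token"]
    # Older drivers: clientId and secret
    return token_data["clientId"], token_data["secret"]


if __name__ == "__main__":
    # Placeholder content in the token.json format shown in the prerequisites
    sample = json.loads(
        '{ "clientId": "CLIENT_ID", "secret": "CLIENT_SECRET", "token": "APPLICATION_TOKEN" }'
    )
    username, _password = driver_credentials(sample)
    print(f"Authenticating as: {username}")
```

You could then unpack the result into the provider, for example PlainTextAuthProvider(*driver_credentials(data)).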