Python driver quickstart

This driver supports both vector and non-vector data types.

DataStax recommends the Data API and clients for Serverless (Vector) databases. You can use the Data API to perform CQL operations on your table data in Serverless (Vector) databases.

DataStax recommends drivers only for Serverless (Non-Vector) databases, for existing applications that previously used a CQL-based driver, or for applications that need CQL functions that the Data API doesn't support. For more information, see Connection methods comparison.

To determine the option that best suits your use case, see Compare connection methods.

This quickstart explains how to use the Python driver to connect to your database, load a set of vector embeddings, and perform a similarity search to find vectors that are close to the one in your query.

Install the cassandra-driver package

Check your pip version and, if necessary, upgrade pip before you install the cassandra-driver package.

  1. Verify that pip is version 23.0 or later:

    pip --version

  2. Upgrade pip, if needed:

    python -m pip install --upgrade pip

  3. Install the cassandra-driver package:

    pip install cassandra-driver
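
The version check above can also be done from Python. The following is a minimal sketch using only the standard library; the helper names meets_minimum and installed_version are illustrative and are not part of pip or the driver.

```python
import re
from importlib import metadata


def meets_minimum(version, minimum=(23, 0)):
    """Return True if a dotted version string is at least `minimum` (major, minor)."""
    parts = []
    for piece in version.split(".")[:2]:
        match = re.match(r"\d+", piece)  # keep only the leading digits, e.g. "0rc1" -> 0
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts) >= tuple(minimum)


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report what is installed in the current environment.
for name in ("pip", "cassandra-driver"):
    print(name, installed_version(name) or "not installed")
```

If pip is reported as older than 23.0, upgrade it before installing the driver.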

Import libraries and connect to the database

Import the necessary libraries and establish a connection to your database.

DataStax recommends this basic configuration only for development and testing. For proofs of concept or production use, see Production configuration.

import os
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

session = Cluster(
    cloud={"secure_connect_bundle": os.environ["ASTRA_DB_SECURE_BUNDLE_PATH"]},
    auth_provider=PlainTextAuthProvider("token", os.environ["ASTRA_DB_APPLICATION_TOKEN"]),
).connect()

After you connect to the database, you can use the driver to perform operations on your database.
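
The connection code reads two environment variables with os.environ[...], which raises KeyError if either one is unset. A small pre-flight check, sketched below, can produce a clearer message. The helper name missing_settings is hypothetical; the variable names are the ones used in the connection code.

```python
import os

# Environment variables read by the connection code in this quickstart.
REQUIRED_SETTINGS = ("ASTRA_DB_SECURE_BUNDLE_PATH", "ASTRA_DB_APPLICATION_TOKEN")


def missing_settings(env=os.environ):
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Set these environment variables before connecting:", ", ".join(missing))
```

Passing a plain dictionary instead of os.environ makes the check easy to exercise without touching the real environment.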

Create a table and vector-compatible Storage Attached Index (SAI)

Create a table named vector_test in your database with columns for an integer id, text, and a 5-dimensional vector.

Then, create a custom index on the vector column of this table using a storage-attached index with a cosine similarity function for efficient vector searches.

keyspace = "default_keyspace"
v_dimension = 5

session.execute((
    "CREATE TABLE IF NOT EXISTS {keyspace}.vector_test (id INT PRIMARY KEY, "
    "text TEXT, vector VECTOR<FLOAT,{v_dimension}>);"
).format(keyspace=keyspace, v_dimension=v_dimension))

session.execute((
    "CREATE CUSTOM INDEX IF NOT EXISTS idx_vector_test "
    "ON {keyspace}.vector_test "
    "(vector) USING 'StorageAttachedIndex' WITH OPTIONS = "
    "{{'similarity_function' : 'cosine'}};"
).format(keyspace=keyspace))
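
Because the keyspace name and vector dimension are formatted directly into the CQL text, it can help to validate them first. The sketch below builds the same two statements after checking the inputs. The function name vector_table_statements is hypothetical, and the identifier pattern assumes unquoted CQL names (a letter followed by letters, digits, or underscores).

```python
import re

# Assumption: unquoted CQL identifiers start with a letter and contain only
# letters, digits, and underscores.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def vector_table_statements(keyspace, dimension, table="vector_test"):
    """Return (create_table, create_index) CQL strings for a vector table."""
    for name in (keyspace, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid CQL identifier: {name!r}")
    if not isinstance(dimension, int) or dimension < 1:
        raise ValueError(f"dimension must be a positive integer, got {dimension!r}")

    create_table = (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} "
        f"(id INT PRIMARY KEY, text TEXT, vector VECTOR<FLOAT,{dimension}>);"
    )
    create_index = (
        f"CREATE CUSTOM INDEX IF NOT EXISTS idx_{table} "
        f"ON {keyspace}.{table} (vector) USING 'StorageAttachedIndex' "
        "WITH OPTIONS = {'similarity_function' : 'cosine'};"
    )
    return create_table, create_index


table_cql, index_cql = vector_table_statements("default_keyspace", 5)
print(table_cql)
print(index_cql)
```

Each returned string can be passed to session.execute in place of the formatted literals.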

Load data

Insert a few rows with embeddings into the table.

text_blocks = [
    (1, "Chat bot integrated sneakers that talk to you", [0.1, 0.15, 0.3, 0.12, 0.05]),
    (2, "An AI quilt to help you sleep forever", [0.45, 0.09, 0.01, 0.2, 0.11]),
    (3, "A deep learning display that controls your mood", [0.1, 0.05, 0.08, 0.3, 0.6]),
]
# Unpack into row_id rather than id to avoid shadowing the built-in id().
for row_id, text, vector in text_blocks:
    session.execute(
        f"INSERT INTO {keyspace}.vector_test (id, text, vector) VALUES (%s, %s, %s)",
        (row_id, text, vector)
    )
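
Every vector must have the same number of elements as the table's vector column (5 in this quickstart). The sketch below checks the rows before any insert is attempted; the function name check_rows is hypothetical, and the sample data repeats the rows above.

```python
def check_rows(rows, dimension):
    """Raise ValueError if any (id, text, vector) row has the wrong vector length.

    Returns the number of rows checked.
    """
    for row_id, _text, vector in rows:
        if len(vector) != dimension:
            raise ValueError(
                f"row {row_id}: expected {dimension} values, got {len(vector)}"
            )
    return len(rows)


sample_rows = [
    (1, "Chat bot integrated sneakers that talk to you", [0.1, 0.15, 0.3, 0.12, 0.05]),
    (2, "An AI quilt to help you sleep forever", [0.45, 0.09, 0.01, 0.2, 0.11]),
    (3, "A deep learning display that controls your mood", [0.1, 0.05, 0.08, 0.3, 0.6]),
]
print(check_rows(sample_rows, 5), "rows ready to insert")
```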

Perform a similarity search

Find the rows whose vectors are closest to a specific query embedding.

ann_query = (
    f"SELECT id, text, similarity_cosine(vector, [0.15, 0.1, 0.1, 0.35, 0.55]) as sim FROM {keyspace}.vector_test "
    "ORDER BY vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55] LIMIT 2"
)
for row in session.execute(ann_query):
    print(f"[{row.id}] \"{row.text}\" (sim: {row.sim:.4f})")
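
To see which rows the query should favor, you can compute cosine similarity for the sample data locally. This is only a sketch of the underlying math: ANN search is approximate, and the sim value that the database reports may be scaled differently from the raw cosine computed here. The helper names cosine and nearest are illustrative.

```python
import math

query = [0.15, 0.1, 0.1, 0.35, 0.55]
rows = [
    (1, [0.1, 0.15, 0.3, 0.12, 0.05]),
    (2, [0.45, 0.09, 0.01, 0.2, 0.11]),
    (3, [0.1, 0.05, 0.08, 0.3, 0.6]),
]


def cosine(a, b):
    """Raw cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(query, rows, limit=2):
    """Return the `limit` most similar (row_id, score) pairs, best first."""
    scored = [(cosine(query, vector), row_id) for row_id, vector in rows]
    scored.sort(reverse=True)
    return [(row_id, score) for score, row_id in scored[:limit]]


for row_id, score in nearest(query, rows):
    print(f"[{row_id}] raw cosine: {score:.4f}")
```

With the sample data, row 3 is the closest match, followed by row 2.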

Migrate the Python driver

If necessary, you can migrate an earlier DataStax Python driver to a version that can connect to your Astra DB database.

  1. Complete the prerequisites.

  2. Install the latest Python driver package.

  3. In your existing DataStax Python driver code, modify the connection code to use the Secure Connect Bundle (SCB) credentials.

    In the cloud_config dictionary, which is passed to the Cluster cloud parameter, include the path to the SCB for your Astra DB database (secure-connect-DATABASE_NAME.zip).

    import os
    from cassandra.cluster import Cluster
    from cassandra.auth import PlainTextAuthProvider
    
    cloud_config = {
        'secure_connect_bundle': '/SECURE_CONNECT_BUNDLE_PATH/secure-connect-DATABASE_NAME.zip'
    }
    auth_provider = PlainTextAuthProvider('clientId', 'clientSecret')
    cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
    session = cluster.connect()
  4. Run your Python script to connect to your database.
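
Before running the migrated script, it can be useful to confirm that the SCB path points to a real zip archive. The following is a minimal sketch using only the standard library. The function name bundle_problem is hypothetical, and the check does not inspect the bundle's contents.

```python
import os
import zipfile


def bundle_problem(path):
    """Return a description of what is wrong with the SCB path, or None if it looks usable."""
    if not os.path.isfile(path):
        return f"no file found at {path}"
    if not zipfile.is_zipfile(path):
        return f"{path} is not a zip archive"
    return None


# Placeholder path from the migration example; replace it with your own bundle path.
problem = bundle_problem("/SECURE_CONNECT_BUNDLE_PATH/secure-connect-DATABASE_NAME.zip")
print(problem or "bundle path looks usable")
```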

