Performance Notes

The Python driver for Cassandra offers several methods for executing queries. You can synchronously block for queries to complete using Session.execute(), you can obtain asynchronous request futures through Session.execute_async(), and you can attach a callback to the future with ResponseFuture.add_callback().
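As a rough illustration of these three styles, the sketch below wraps each one in a small helper. The helper names and the timeout value are hypothetical; session is assumed to be an already connected Session, and only execute(), execute_async(), ResponseFuture.result(), add_callback() and add_errback() are driver calls.

```python
import threading


def query_blocking(session, query):
    """Synchronous: Session.execute() blocks until the response arrives."""
    return list(session.execute(query))


def query_future(session, query):
    """Asynchronous: start the request, then wait on the returned future."""
    future = session.execute_async(query)
    # Other work could be done here while the request is in flight.
    return list(future.result())  # blocks; raises if the request failed


def query_callback(session, query, timeout=10.0):
    """Callback style: the driver invokes on_rows or on_error when done."""
    done = threading.Event()
    outcome = {}

    def on_rows(rows):
        # rows may be None for statements that return no result set.
        outcome["rows"] = list(rows or [])
        done.set()

    def on_error(exc):
        outcome["error"] = exc
        done.set()

    future = session.execute_async(query)
    future.add_callback(on_rows)
    future.add_errback(on_error)
    # Waiting here is only for demonstration; a real application would
    # normally continue with other work instead of blocking.
    done.wait(timeout)
    return outcome
```

Callbacks run on the driver's event loop thread, so they should stay short and hand off any heavy processing elsewhere.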

Examples of multiple request patterns can be found in the benchmark scripts included in the driver project.

The choice of execution pattern will depend on the application context. For applications dealing with multiple requests in a given context, the recommended pattern is to use concurrent asynchronous requests with callbacks. For many use cases, you don’t need to implement this pattern yourself. dse.concurrent.execute_concurrent() and dse.concurrent.execute_concurrent_with_args() provide this pattern with a synchronous API and tunable concurrency.
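A minimal sketch of the helper-based approach follows. The function name insert_users, the users table, and the concurrency value are assumptions made for illustration; session is assumed to be a connected Session.

```python
def insert_users(session, users, concurrency=50):
    """Insert (id, name) pairs concurrently; return the ones that failed."""
    # Imported here so the sketch can be loaded without the driver installed.
    from dse.concurrent import execute_concurrent_with_args

    users = list(users)

    # Hypothetical table; prepare once and reuse for every parameter set.
    insert = session.prepare("INSERT INTO users (id, name) VALUES (?, ?)")

    # Blocks until all requests finish, keeping at most `concurrency`
    # requests in flight. With raise_on_first_error=False, failures are
    # reported per request instead of being raised.
    results = execute_concurrent_with_args(
        session, insert, users,
        concurrency=concurrency,
        raise_on_first_error=False,
    )

    # Results come back in the same order as the parameters, each as a
    # (success, result_or_exc) pair.
    return [
        (params, outcome)
        for params, (success, outcome) in zip(users, results)
        if not success
    ]
```

A caller might retry or log the returned failures; the best concurrency value depends on the workload and cluster, so it is worth measuring rather than assuming.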

Due to the GIL and limited concurrency, the driver can quickly become CPU-bound. The sections below discuss further runtime and design considerations for mitigating this limitation.

PyPy

PyPy is an alternative Python runtime that uses a JIT compiler to reduce CPU consumption. This can substantially improve driver performance, more than doubling throughput for many workloads.

Cython Extensions

Cython is an optimizing compiler and language that can be used to compile the core files and optional extensions for the driver. Cython is not a strict dependency, but the extensions will be built by default.

See Installation for details on controlling this build.

multiprocessing

All of the patterns discussed above may be used over multiple processes using the multiprocessing module. Multiple processes will scale better than multiple threads, so if high throughput is your goal, consider this option.

Never share Cluster, Session, or ResponseFuture objects across processes. Create all of these objects after forking the process, not before.

For further discussion and simple examples using the driver with multiprocessing, see this blog post.