Prepared statements with Cassandra drivers

Prepared statements are queries that you can run multiple times with different parameters. They are ideal for frequently executed queries, such as fetching user profiles or logging actions, and queries that have a fixed structure with variable parameters.

You only need to define the statement once, and then your application can call the prepared statement as needed, passing unique parameters to each execution.
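As a minimal sketch of this pattern with the Python driver, the following class prepares a statement once and then executes it with a different value on each call. The `users` table, its columns, and the `UserQueries` class are hypothetical names used for illustration. `session` is assumed to be a connected `cassandra.cluster.Session`.

```python
class UserQueries:
    """Holds statements that are prepared once and reused for every call."""

    def __init__(self, session):
        # Assumption: `session` is a connected cassandra.cluster.Session,
        # for example the result of Cluster([...]).connect("my_keyspace").
        self._session = session
        # Hypothetical table and columns; replace with your own schema.
        self._select_user = session.prepare(
            "SELECT user_id, name, email FROM users WHERE user_id = ?"
        )

    def fetch_profile(self, user_id):
        # Only the bound value changes between executions; the query
        # string is not sent or parsed again.
        return list(self._session.execute(self._select_user, (user_id,)))
```

Creating one `UserQueries` instance at application startup and sharing it keeps the `prepare()` call out of the request path.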

Prepared statements can increase efficiency by reducing the amount of processing and network traffic required to run a query. For example, prepared statements are preprocessed and cached, allowing queries to be reused without passing and parsing the entire statement each time. Additionally, the server doesn’t send response metadata after the initial preparation, which reduces the data sent over the network and the corresponding client-side processing.

Caching and repreparation

Prepared statements are preprocessed to produce an execution plan that is cached on both the server and client. At runtime, the cached execution plan is used to execute the query.

However, statements can be reprepared if the prepared statement’s metadata or execution environment changes. For example:

Schema changes and node/cluster topology changes require repreparation to ensure consistency and validity.
Clusters can evict prepared statements from the cache when the cache reaches its size limit or when entries expire. An evicted statement must be reprepared before its next execution. For more information and an example of this scenario, see Dynamic queries with encoded literals can evict prepared statements from cache.
Any event requiring reconnection triggers repreparation, such as node failure, driver reconnection, session reinitialization, network partitioning, and restarts.

The repreparation process repeats the preprocessing and caching steps, including generating a new prepared statement ID. When repreparation occurs, it can add latency and tax cluster resources, depending on the number of statements that need to be reprepared and how often repreparation occurs.

Most drivers implicitly reprepare statements when required. However, if your application frequently reprepares statements, DataStax recommends that you investigate and address the root cause to reduce performance impacts from excessive repreparation.

Explicit repreparation is never required with DataStax drivers, assuming there are no other issues in your application’s logic, data model, or query design. If you encounter a situation where you must explicitly reprepare a statement, first check your code for antipatterns or inefficiencies, such as those described in Optimize your prepared statements. If the issue persists, contact DataStax Support because this could indicate a bug in the driver.

Define and use prepared statements

C/C++ driver prepared statements

See C/C++ driver prepared statements and CassPrepared.

C# driver prepared statements

See C# driver prepared statements.

GoCQL driver prepared statements

See GoCQL driver prepared statements.

Java driver prepared statements

See the documentation for your version of the Java driver:

Node.js driver prepared statements

See Using query parameters and prepared statements.

PHP driver prepared statements

See PHP driver prepared statements.

Python driver prepared statements

See Python driver prepared statements and cassandra.query.PreparedStatement.

Ruby driver prepared statements

See Ruby driver prepared statements.

Optimize your prepared statements

Use prepared statements where appropriate, and ensure your statements are designed for optimal performance.

Use prepared statements where appropriate

Prepared statements are best for frequently repeated queries and queries that have a fixed structure with variable parameters.

Don’t use prepared statements for the following types of queries:

Ad hoc queries and queries with unpredictable structures, such as dynamically generated queries. By design, these are not appropriate for prepared statements.
Queries on tables with frequent schema changes. For more information, see Caching and repreparation.
Rarely repeated queries. In this case, consider whether the performance benefits justify the overhead of preparing the query. For example, if a query is repeated only two or three times out of thousands of queries, then the overhead required to prepare the query might not be justified by the comparatively small performance gain.

Don’t explicitly set null values in prepared statements

In prepared statements, avoid explicitly setting parameters to null where not required. Instead, leave values unset in prepared statements.

Explicitly binding parameters to null generates tombstones and removes the original values.
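One way to leave values unset with the Python driver is to bind a dictionary that contains only the columns the caller actually provided. The sketch below assumes native protocol v4 or later, where the driver treats names omitted from the dictionary as unset rather than null. The `users` table, the placeholder names, and both helper functions are hypothetical.

```python
def build_update_values(user_id, name=None, email=None):
    """Return bind values containing only the columns the caller provided."""
    values = {"user_id": user_id}
    if name is not None:
        values["name"] = name
    if email is not None:
        values["email"] = email
    return values


def update_user(session, update_stmt, user_id, name=None, email=None):
    # Assumption: update_stmt was prepared from a query with named
    # placeholders, for example:
    #   UPDATE users SET name = :name, email = :email WHERE user_id = :user_id
    # Columns missing from the dict are left unset, so no null is written
    # and no tombstone is created for them.
    values = build_update_values(user_id, name, email)
    return session.execute(update_stmt, values)
```

For example, `update_user(session, update_stmt, 42, email="a@example.com")` binds `user_id` and `email` only, so the existing `name` value is left untouched.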

While tombstones can create inefficiency in any database, non-prepared queries can dynamically adjust based on live data. In contrast, prepared statements reuse a preprocessed execution plan, which can suffer from performance issues when reading tables with many tombstones. For example, a prepared statement can execute quickly at first, but performance will degrade as the table accumulates tombstones, and the prepared statement requires more time to filter out deleted data.

Regularly reprocessing prepared statements can mitigate performance impacts if your queries must handle some tombstones. However, as a general best practice for CQL tables, minimize tombstones as much as possible:

Use efficient filters in your queries to avoid fetching unnecessary rows, which can contain tombstones.
Optimize garbage collection and compaction strategies to remove tombstones efficiently.
Use tracing to monitor how tombstones impact query performance.

For more information, see Manage tombstones, What are tombstones, and Garbage collection of tombstones.

Dynamic queries with encoded literals can evict prepared statements from cache

If your application uses dynamically generated queries with unique literals encoded into every query string, you can hit node cache limits, which causes entries to be evicted from the cache, including cached prepared statements. Cache eviction can impact performance in the application that caused it and in other applications, because evicted prepared statements must be reprepared in all related client caches.

To reduce the chance of hitting the cache limit, use placeholders to bind concrete values instead of encoded literals.
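The difference can be illustrated without a cluster by counting how many distinct query strings each approach produces. In this sketch, the `users` table and all function names are hypothetical. The literal-encoded version produces one query string per value, while the placeholder version produces a single string regardless of how many values are bound.

```python
def literal_query(user_id):
    # Antipattern: the value is encoded into the query string, so every
    # distinct user_id yields a distinct statement for the cache to hold.
    return f"SELECT name FROM users WHERE user_id = {int(user_id)}"


# Preferred: one query string with a placeholder; values are bound at
# execution time, so only one statement needs to be cached.
PLACEHOLDER_QUERY = "SELECT name FROM users WHERE user_id = ?"


def distinct_statements(user_ids):
    """Count the distinct query strings produced by each approach."""
    literal = {literal_query(uid) for uid in user_ids}
    bound = {PLACEHOLDER_QUERY for _ in user_ids}
    return len(literal), len(bound)
```

For 1,000 distinct IDs, `distinct_statements(range(1000))` returns `(1000, 1)`: the literal approach asks the cache to hold 1,000 statements, and the placeholder approach asks it to hold one.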

When cache eviction occurs, the driver can return warnings like "prepared statements discarded in the last minute because cache limit reached". For more information, see the following DataStax Support articles: