Prepared statements

Use prepared statements for queries that are executed multiple times in your application:

PreparedStatement prepared = session.Prepare("insert into product (sku, description) values (?, ?)");
BoundStatement bound = prepared.Bind("234827", "Mouse");
session.Execute(bound);

When you prepare the statement, Cassandra parses the query string, caches the result, and returns a unique identifier. The PreparedStatement object keeps an internal reference to that identifier.

When you bind and execute a prepared statement, the driver only sends the identifier, which allows Cassandra to skip the parsing phase.

Advantages of prepared statements

Beyond saving a bit of parsing overhead on the server, prepared statements have other advantages; the PREPARED response also contains useful metadata about the CQL query:

  • information about the result set that will be produced when the statement gets executed. The driver caches this, so that the server doesn’t need to include it with every response. This saves a bit of bandwidth, and the resources it would take to decode it every time. This is only enabled for protocol v5+, i.e., Apache Cassandra 4.0+ and DataStax Enterprise 6.0+.
  • the CQL types of the bound variables. This allows the PreparedStatement.Bind method to perform better checks, and fail fast (without a server round-trip) if the types are wrong.
  • which bound variables are part of the partition key. This allows bound statements to automatically compute their routing key.
  • more optimizations might get added in the future. For example, CASSANDRA-10813 suggests adding an idempotent flag to the response.

If you have a unique query that is executed only once, a simple statement will be more efficient. But note that this should be pretty rare: most client applications typically repeat the same queries over and over, and a parameterized version can be extracted and prepared.

Preparing

Session.Prepare() accepts a plain query string.

We recommend avoiding repeated calls to Prepare() for the same query. The driver does not cache prepared statements by query string, so preparing the same query multiple times can cause performance issues. Prepare each query once and reuse the resulting PreparedStatement.
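
One way to follow this advice is to keep prepared statements in an application-level cache keyed by query string. The sketch below is a minimal, hypothetical helper (PrepareOnceCache is not part of the driver). It is generic over the statement type so that it depends only on the .NET base class library; the prepare function is supplied by the caller.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper, not part of the driver: runs the supplied prepare
// function at most once per distinct query string and reuses the result.
public sealed class PrepareOnceCache<TStatement>
{
    private readonly Func<string, TStatement> _prepare;
    private readonly ConcurrentDictionary<string, Lazy<TStatement>> _cache =
        new ConcurrentDictionary<string, Lazy<TStatement>>();

    public PrepareOnceCache(Func<string, TStatement> prepare)
    {
        _prepare = prepare ?? throw new ArgumentNullException(nameof(prepare));
    }

    public TStatement Get(string query)
    {
        // GetOrAdd may construct more than one Lazy wrapper under contention,
        // but only one is stored, and Lazy<T> invokes the prepare function
        // once for the stored wrapper.
        // Caveat: with the default Lazy<T> mode a failed prepare is cached too;
        // evict the entry if the application needs to retry.
        Lazy<TStatement> lazy = _cache.GetOrAdd(
            query, q => new Lazy<TStatement>(() => _prepare(q)));
        return lazy.Value;
    }
}
```

With the driver, this could be constructed as new PrepareOnceCache<PreparedStatement>(q => session.Prepare(q)), after which Get(query).Bind(...) replaces direct calls to session.Prepare(). For a small, fixed set of queries, storing each PreparedStatement in a field that is initialized once at startup is simpler.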

Parameters and binding

The prepared query string will usually contain placeholders, which can be either anonymous or named:

PreparedStatement ps1 = session.Prepare("insert into product (sku, description) values (?, ?)");
PreparedStatement ps2 = session.Prepare("insert into product (sku, description) values (:s, :d)");

To execute the statement, bind values to its placeholders to create a BoundStatement. As shown previously, Bind() accepts the parameter values directly, in the order of the placeholders:

BoundStatement bound = ps1.Bind("324378", "LCD screen");
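
For named placeholders, the values can be passed as an anonymous object whose property names match the marker names. The sketch below assumes an already connected ISession whose keyspace contains the product table used above; the class and method names are hypothetical.

```csharp
using Cassandra;

public static class NamedBindingExample
{
    // Assumes a connected session and an existing product (sku, description) table.
    public static void InsertProduct(ISession session)
    {
        PreparedStatement ps2 = session.Prepare(
            "insert into product (sku, description) values (:s, :d)");

        // The properties s and d are matched by name to the markers :s and :d,
        // so their order in the anonymous object does not matter.
        BoundStatement bound = ps2.Bind(new { d = "LCD screen", s = "324378" });
        session.Execute(bound);
    }
}
```

In real code the statement would be prepared once and reused rather than prepared inside the method; it is inlined here only to keep the example short.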

Unset values

With native protocol V3, all variables must be bound. With native protocol V4 (Cassandra 2.2+ / DSE 5+) or above, variables can be left unset, in which case they will be ignored (no tombstones will be generated). To leave a variable unset, pass Unset.Value in its position:

BoundStatement bound = ps1.Bind("324378", Unset.Value);

How the driver prepares

Cassandra does not replicate prepared statements across the cluster. It is the driver’s responsibility to ensure that each node’s cache is up to date. It uses a number of strategies to achieve this:

  1. When a statement is initially prepared, it is first sent to a single node in the cluster (this avoids hitting all nodes in case the query string is wrong). Once that node replies successfully, the driver re-prepares on all remaining nodes.

    The prepared statement identifier is deterministic (it’s a hash of the query string), so it is the same for all nodes.

  2. If a node crashes, it might lose all of its prepared statements (this depends on the version: since Cassandra 3.10, prepared statements are stored in a table, and the node is able to re-prepare on its own when it restarts). The driver therefore keeps a client-side cache; any time a node is marked back up, the driver re-prepares all statements on it.

  3. Finally, if the driver tries to execute a statement and finds out that the coordinator doesn’t know about it, it will re-prepare the statement on the fly (this is transparent for the client, but will cost two extra roundtrips).
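
The cost of the on-the-fly re-prepare in step 3 can be illustrated with a toy model. The sketch below is not driver or server code: ToyNode and ToyDriver are hypothetical classes, and SHA-256 is an arbitrary stand-in for whatever hash the server uses. It only counts round trips for a known statement versus an unknown one.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Toy stand-in for a node: it only remembers which statement ids it has prepared.
public sealed class ToyNode
{
    private readonly HashSet<string> _prepared = new HashSet<string>();

    // Deterministic id: the same query string yields the same id on every node.
    public static string IdFor(string query)
    {
        using (var sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(Encoding.UTF8.GetBytes(query)));
        }
    }

    public string Prepare(string query)
    {
        string id = IdFor(query);
        _prepared.Add(id);
        return id;
    }

    // Returns false when the node does not know the id (the "unprepared" case).
    public bool TryExecute(string id)
    {
        return _prepared.Contains(id);
    }

    // Simulates a restart that loses the node's prepared statements.
    public void Restart()
    {
        _prepared.Clear();
    }
}

// Toy stand-in for the driver: counts the round trips needed per execution.
public sealed class ToyDriver
{
    public int RoundTrips { get; private set; }

    public void Execute(ToyNode coordinator, string query)
    {
        string id = ToyNode.IdFor(query);

        RoundTrips++;                         // EXECUTE(id)
        if (coordinator.TryExecute(id)) return;

        RoundTrips++;                         // PREPARE(query), on the fly
        coordinator.Prepare(query);

        RoundTrips++;                         // EXECUTE(id), second attempt
        if (!coordinator.TryExecute(id))
            throw new InvalidOperationException("Statement still unknown after re-prepare.");
    }
}
```

In this model, executing against a node that already knows the statement takes one round trip, while executing against a node that has just restarted takes three: the failed execution, the prepare, and the retry.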

You can customize these strategies by passing a QueryOptions instance to the Builder.WithQueryOptions() method:

  • QueryOptions.SetPrepareOnAllHosts() controls whether statements are initially re-prepared on other hosts (step 1 above);
  • QueryOptions.SetReprepareOnUp() controls whether statements are re-prepared on a node that comes back up (step 2 above).
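
As a minimal configuration sketch, both strategies could be turned off when building the cluster as shown below. The ClusterFactory class and the contactPoint parameter are hypothetical, and whether disabling either strategy is appropriate depends on the application.

```csharp
using Cassandra;

public static class ClusterFactory
{
    // contactPoint is a placeholder for the address of one of your nodes.
    public static Cluster Build(string contactPoint)
    {
        QueryOptions queryOptions = new QueryOptions()
            .SetPrepareOnAllHosts(false)   // step 1: do not prepare on the remaining hosts
            .SetReprepareOnUp(false);      // step 2: do not re-prepare when a node comes back up

        return Cluster.Builder()
            .AddContactPoint(contactPoint)
            .WithQueryOptions(queryOptions)
            .Build();
    }
}
```

With both options disabled, the driver falls back on the on-the-fly re-prepare of step 3 whenever a coordinator does not know a statement.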