DataStax Java Driver - Performance

Performance

This page is intended as a checklist for everything related to driver performance. Most of the information is already available in other sections of the manual, but it’s linked here for easy reference if you’re benchmarking your application or diagnosing performance issues.

Statements

Statements are some of the driver types you’ll use the most. Every request needs one – even session.execute(String) creates a SimpleStatement under the hood.

Immutability and builders

Statements are by default implemented with immutable classes: every call to a setter method creates an intermediary copy. If you have multiple attributes to set, use a builder instead:

SimpleStatement statement = SimpleStatement.builder("SELECT * FROM foo")
    .setPageSize(20)
    .setConsistencyLevel(DefaultConsistencyLevel.QUORUM)
    .setIdempotence(true)
    .build();

Also, note that you don’t need a driver Session to create simple statements: they can be initialized statically and stored as constants.

Prepared statements

Prepared statements allow Cassandra to cache parsed query strings server-side, but that’s not their only benefit for performance:

the driver also caches the response metadata, which can then be skipped in subsequent responses. This saves bandwidth, as well as the CPU and memory resources required to parse it.
in some cases, prepared statements set routing information automatically, which allows the driver to target the most appropriate replicas.

You should use prepared statements for all recurring requests in your application. Simple statements should only be used for one-off queries, for example some initialization that will be done only once at startup.

The driver caches prepared statements – see CqlSession.prepare(SimpleStatement) for the fine print. However, if the query is static, it’s still a good practice to cache your PreparedStatement instances to avoid calling prepare() every time. One common pattern is to use some sort of DAO component:

public static class UserDao {

  private final CqlSession session;
  private final PreparedStatement preparedFindById;

  public UserDao(CqlSession session) {
    this.session = session;
    this.preparedFindById = session.prepare("SELECT * FROM user WHERE id = ?");
  }

  public User findById(int id) {
    Row row = session.execute(preparedFindById.bind(id)).one();
    return new User(row.getInt(id), row.getString("first_name"), row.getString("last_name"));
  }
}

Request execution

Connection pooling

By default, the driver opens 1 connection per node, and allows 1024 concurrent requests on each connection. In our experience this is enough for most scenarios.

If your application generates a very high throughput (hundreds of thousands of requests per second), you might want to experiment with different settings. See the tuning section in the connection pooling page.

Compression

Consider compression if your queries return large payloads; it might help to reduce network traffic.

Timestamp generation

Each query is assigned a timestamp to order them relative to each other.

By default, this is done driver-side with AtomicTimestampGenerator. This is a very simple operation so unlikely to be a bottleneck, but note that there are other options, such as a thread-local variant that creates slightly less contention, writing your own implementation or letting the server assign timestamps.

Tracing

Tracing should be used for only a small percentage of your queries. It consumes additional resources on the server, and fetching each trace requires background requests.

Do not enable tracing for every request; it’s a sure way to bring your performance down.

Request trackers

Request trackers are on the hot path (that is, invoked on I/O threads, each time a request is executed), and users can plug custom implementations.

If you experience throughput issues, check if any trackers are configured, and what they are doing. They should avoid blocking calls, as well as any CPU-intensive computations.

Metrics

Similarly, some of the driver’s metrics are updated for every request (if the metric is enabled).

By default, the driver ships with all metrics disabled. Enable them conservatively, and if you’re investigating a performance issue, try disabling them temporarily to check that they are not the cause.

Throttling

Throttling can help establish more predictable server performance, by controlling how much load each driver instance is allowed to put on the cluster. The throttling algorithm itself incurs a bit of overhead in the driver, but that shouldn’t be a problem since the goal is to stay under reasonable rates in the first place.

If you’re debugging an unfamiliar application and experience a throughput plateau, make sure that it’s not caused by a throttler.

Caching reusable objects

Many driver objects are immutable. If you reuse the same values often, consider caching them in private fields or constants to alleviate GC pressure.

Identifiers

The driver uses CqlIdentifier to deal with case sensitivity. When you call methods that take raw strings, the driver generally wraps them under the hood:

session.getMetadata().getKeyspace("inventory"); // shortcut for getKeyspace(CqlIdentifier.fromCql("inventory")

// Caching the identifier:
public static final String INVENTORY_ID = "inventory";

session.getMetadata().getKeyspace(INVENTORY_ID);

This also applies to built queries, although it’s less important because generally the whole query can be cached (see below).

Note however that row getters and bound statement setters do not wrap their argument: because those methods are used very often, they handle raw strings with an optimized algorithm that does not require creating an identifier (the rules are detailed here).

// No need to extract a CqlIdentifier, raw strings are handled efficiently:
Row row = session.execute("SELECT * FROM user WHERE id = 1").one();
row.getInt("age");

PreparedStatement pst = session.prepare("UPDATE user SET name=:name WHERE id=:id");
pst.bind().setInt("age", 25);

Type tokens

GenericType is used to express complex generic types – such as nested collections – in getters and setters. These objects are immutable and stateless, so they are good candidates for constants:

public static final GenericType<Set<List<String>>> SET_OF_LIST_OF_STRING = new GenericType<Set<List<String>>>() {};

Set<List<String>> teams = row.get("teams", SET_OF_LIST_OF_STRING);

GenericType itself already exposes a few of those constants. You can create your own utility class to store yours.

Built queries

Similarly, built queries are immutable and don’t need a reference to a live driver instance. If you create them statically, they can be stored as constants:

public static final BuildableQuery SELECT_SERVER_VERSION =
    selectFrom("system", "local").column("release_version");

Note that you don’t necessarily need to extract CqlIdentifier constants since the construction already happens at initialization time.

Derived configuration profiles

The configuration API allows you to build derived profiles at runtime.

DriverExecutionProfile dynamicProfile =
  defaultProfile.withString(
      DefaultDriverOption.REQUEST_CONSISTENCY, DefaultConsistencyLevel.EACH_QUORUM.name());

Their use is generally discouraged (you should define profiles statically in the configuration file as much as possible), but if there’s no other way and you reuse them over time, store them instead of recreating them each time.

Metadata

The driver maintains metadata about the state of the Cassandra cluster. This work is done on dedicated “admin” threads (see the thread pooling section below), so it’s not in direct competition with regular requests.

Filtering

You can disable entire parts of the metadata with those configuration options:

datastax-java-driver.advanced.metadata {
  schema.enabled = true
  token-map.enabled = true
}

This will save CPU and memory resources, but you lose some driver features:

if schema is disabled, session.getMetadata().getKeyspaces() will always be empty: your application won’t be able to inspect the database schema dynamically.
if the token map is disabled, session.getMetadata().getTokenMap() will always be empty, and you lose the ability to use token-aware routing.

Note that disabling the schema implicitly disables the token map (because computing the token map requires the keyspace replication settings).

Perhaps more interestingly, metadata can be filtered to a specific subset of keyspaces. This is handy if you connect to a shared cluster that holds data for multiple applications:

datastax-java-driver.advanced.metadata {
  schema.refreshed-keyspaces = [ "users", "inventory" ]
}

To get a sense of the time spent on metadata refreshes, enable debug logs and look for entries like this:

[s0-io-0] DEBUG c.d.o.d.i.c.m.s.q.CassandraSchemaQueries - [s0] Schema queries took 88 ms
[s0-admin-0] DEBUG c.d.o.d.i.c.m.s.p.CassandraSchemaParser - [s0] Schema parsing took 71 ms
[s0-admin-0] DEBUG c.d.o.d.i.c.metadata.DefaultMetadata - [s0] Refreshing token map (only schema has changed)
[s0-admin-0] DEBUG c.d.o.d.i.c.m.token.DefaultTokenMap - [s0] Computing keyspace-level data for {system_auth={class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=1}, system_schema={class=org.apache.cassandra.locator.LocalStrategy}, system_distributed={class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=3}, system={class=org.apache.cassandra.locator.LocalStrategy}, system_traces={class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=2}}
[s0-admin-0] DEBUG c.d.o.d.i.c.m.token.DefaultTokenMap - [s0] Computing new keyspace-level data for {class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=1}
[s0-admin-0] DEBUG c.d.o.d.i.c.m.token.KeyspaceTokenMap - [s0] Computing keyspace-level data for {class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=1} took 12 ms
[s0-admin-0] DEBUG c.d.o.d.i.c.m.token.DefaultTokenMap - [s0] Computing new keyspace-level data for {class=org.apache.cassandra.locator.LocalStrategy}
[s0-admin-0] DEBUG c.d.o.d.i.c.m.token.KeyspaceTokenMap - [s0] Computing keyspace-level data for {class=org.apache.cassandra.locator.LocalStrategy} took 1 ms
[s0-admin-0] DEBUG c.d.o.d.i.c.m.token.DefaultTokenMap - [s0] Computing new keyspace-level data for {class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=3}
[s0-admin-0] DEBUG c.d.o.d.i.c.m.token.KeyspaceTokenMap - [s0] Computing keyspace-level data for {class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=3} took 54 us
[s0-admin-0] DEBUG c.d.o.d.i.c.m.token.DefaultTokenMap - [s0] Computing new keyspace-level data for {class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=2}
[s0-admin-0] DEBUG c.d.o.d.i.c.m.token.KeyspaceTokenMap - [s0] Computing keyspace-level data for {class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=2} took 98 us
[s0-admin-0] DEBUG c.d.o.d.i.c.metadata.DefaultMetadata - [s0] Rebuilding token map took 32 ms
[s0-admin-0] DEBUG c.d.o.d.i.c.metadata.MetadataManager - [s0] Applying schema refresh took 34 ms

Debouncing

The driver receives push notifications of schema and topology changes from the Cassandra cluster. These signals are debounced, meaning that rapid series of events will be amortized, for example:

if multiple schema objects are created or modified, only perform a single schema refresh at the end.
if a node’s status oscillates rapidly between UP and DOWN, wait for gossip to stabilize and only apply the last state.

Debouncing is controlled by these configuration options (shown here with their defaults):

datastax-java-driver.advanced.metadata {
  topology-event-debouncer {
    # How long the driver waits to propagate an event. If another event is received within that
    # time, the window is reset and a batch of accumulated events will be delivered.
    window = 1 second

    # The maximum number of events that can accumulate. If this count is reached, the events are
    # delivered immediately and the time window is reset.
    max-events = 20
  }
  schema.debouncer {
    window = 1 second
    max-events = 20
  }
}

You may adjust those settings depending on your application’s needs: higher values mean less impact on performance, but the driver will be slower to react to changes.

Schema updates

You should group your schema changes as much as possible.

Every change made from a client will be pushed to all other clients, causing them to refresh their metadata. If you have multiple client instances, it might be a good idea to deactivate the metadata on all clients while you apply the updates, and reactivate it at the end (reactivating will trigger an immediate refresh, so you might want to ramp up clients to avoid a “thundering herd” effect).

Schema changes have to replicate to all nodes in the cluster. To minimize the chance of schema disagreement errors:

apply your changes serially. The driver handles this automatically by checking for schema agreement after each DDL query. Run them from the same application thread, and, if you use the asynchronous API, chain the futures properly.
send all the changes to the same coordinator. This is one of the rare cases where we recommend using Statement.setNode().

Thread pooling

The driver architecture is designed around two code paths:

the hot path is everything directly related to the execution of requests: encoding/decoding driver types to/from low-level binary payloads, and network I/O. This is where the driver spends most of its cycles in a typical application: when we have to make design tradeoffs, performance is always the priority. Hot code runs on 3 categories of threads:
- your application’s thread for the construction of statements;
- the driver’s “I/O” event loop group for encoding/decoding and network I/O. You can configure it with the options in datastax-java-driver.advanced.netty.io-group.
- the driver’s “timer” thread for request timeouts and speculative executions. See datastax-java-driver.advanced.netty.timer.
the cold path is for all administrative tasks: managing the control connection, parsing metadata, reacting to cluster events (node going up/down, getting added/removed, etc), and scheduling periodic events (reconnections, reloading the configuration). Comparatively, these tasks happen less often, and are less critical (for example, stale schema metadata is not a blocker for request execution). They are scheduled on a separate “admin” event loop group, controlled by the options in datastax-java-driver.advanced.netty.admin-group.

By default, the number of I/O threads is set to Runtime.getRuntime().availableProcessors() * 2, and the number of admin threads to 2. It’s hard to give one-size-fits-all recommendations because every case is different, but you might want to try lowering I/O threads, especially if your application already creates a lot of threads on its side.

Note that you can gain more fine-grained control over thread pools via the internal API (look at the NettyOptions interface). In particular, it is possible to reuse the same event loop group for I/O, admin tasks, and even your application code (the driver’s internal code is fully asynchronous so it will never block any thread). The timer is the only one that will have to stay on a separate thread.