Testing your cluster before production

Tests to ensure that your cluster's memory, CPU, disks, number of nodes, and network settings meet your business needs.

To ensure that your selections of memory, CPU, disks, number of nodes, and network settings meet your business needs, test your cluster prior to production. By simulating your production workload, you can avoid unpleasant surprises and ensure successful production deployment.

Your testing workload should match the production workload as closely as possible. If the test workload differs from production, account for those differences when you interpret the results.

Considerations if you're coming from a relational database world

Cassandra-based applications and clusters are very different from relational databases: the data model is designed around the types of queries the application runs, not around modeling entities and relationships. Besides learning a new database technology and API, writing an application that scales on a distributed database instead of a single machine requires a new way of thinking.

A common strategy new users employ when making the transition from a single machine to a distributed cluster is to make heavy use of materialized views, secondary indexes, and user-defined functions (UDFs). It’s important to realize that in most applications at large scale these techniques are not heavily used, and if they are, the performance penalty must be accounted for in node sizing and SLAs.

Service level agreement targets

Before going into production, decide what constitutes a fast and a slow response under your service level agreements (SLAs), and how often slow responses are allowed to occur.

Table 1. Basic rules for determining request percentiles
Percentile Use case
100 Not recommended: The worst measurement is often irrelevant to testing and based on factors external to the database.
99.9 Aggressive: Use only for latency sensitive use cases. Be aware that hardware expenses will be high if you need to keep this close to average.
99 Sweet spot and the most common.
95 Loose: A good percentile to measure when latency problems are not easily noticed by users.

The following metrics are based on clusters with more data on disk than RAM. Measurements are taken at the 99th percentile using nodetool tablehistograms for Cassandra (3.x | 3.0 | 2.2 | 2.1).
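
For example, to read a single table's latency histogram, run nodetool tablehistograms against that table; the keyspace and table names below are placeholders, and on Cassandra 2.1 and 2.2 the equivalent command is nodetool cfhistograms:

  # 99th percentile read/write latency and partition size for one table
  nodetool tablehistograms my_keyspace my_table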

Table 2. Latency metrics
Hardware Latency
Top-of-the-line high-CPU nodes with NVM flash storage: < 10 ms
Common SATA-based SSDs: < 150 ms. Factors such as testing method, controller, and contention can shift this measurement significantly; some hardware approaches NVM specifications.
Common SATA-based systems: 500 ms is common. This metric varies significantly with the controller, contention, data model, and so on; tuning can shift this figure by an order of magnitude.

Establish TPS targets

Determine how many transactions per second (TPS) you expect your cluster to handle. Ideally, test with the same node count you plan for production so the test cluster can take the same TPS. Alternatively, test with a scaled-down ratio of TPS to nodes to evaluate how the cluster behaves.
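
For example, a run at a fixed thread count with cassandra-stress gives a repeatable baseline to compare against your TPS target; the operation counts, thread count, and consistency level below are illustrative only:

  # steady write load; note the reported op rate and latency percentiles
  cassandra-stress write n=1000000 cl=QUORUM -rate threads=200
  # mixed workload at the same settings for comparison
  cassandra-stress mixed ratio\(write=1,read=3\) n=1000000 cl=QUORUM -rate threads=200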

If the cluster does not meet your SLA requirements, look for the root cause. Helpful questions include:
  • At what TPS level do your SLAs fail?
  • Does the cluster meet SLAs below that level?

Monitor the important metrics

For Cassandra, use the JMX MBeans or the Linux iostat or dstat commands; a sample set of commands follows the list below. Monitor the following metrics:

System hints
Generated system hints should not be continually high. If a node is down or a datacenter link is down, expect this metric to accumulate (by design).
Pending compactions
The number of acceptable pending compactions depends on the type of compaction strategy.
Compaction strategy Pending compactions
SizeTieredCompactionStrategy (STCS) Should not exceed 100 for more than short periods.
LeveledCompactionStrategy (LCS) Not an accurate measure of pending compactions. Generally, a count that stays over 100 indicates stress; over 1,000 is pathological.
TimeWindowCompactionStrategy (TWCS)
Pending writes
Should not go over 10,000 writes for more than short periods. Otherwise, a tuning or sizing problem exists.
Heap usage
If huge spikes with long pauses exist, adjust the heap.
I/O wait
A value greater than 5 indicates that the CPU is waiting too long on I/O and is a sign that the node is I/O bound.
CPU usage
For CPUs using hyper-threading, it is helpful to remember that for each processor core that is physically present, the operating system addresses two virtual (logical) cores and shares the workload between them. This means that if half your threads are at 100% CPU usage, the CPU capacity is at maximum even if the tool reports 50% usage. To see the actual thread usage, use the ttop command.
Monitor your SLAs
Monitor according to the targets you set above.
SSTable counts
If SSTable counts go over 10,000, see CASSANDRA-11831.
Note: DateTieredCompactionStrategy (DTCS) is deprecated in Cassandra 3.0.8/3.8.
Log warnings and errors
Look for the following:
  • SlabPoolCleaner
  • WARN
  • ERROR
  • GCInspector
  • hints
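
The following is a minimal command-line sketch of these checks; the log path assumes a default package installation, and the keyspace name is a placeholder:

  nodetool tpstats                  # pending and blocked tasks, dropped messages
  nodetool compactionstats          # pending compactions
  nodetool tablestats my_keyspace   # per-table SSTable counts and latencies
  iostat -x 5                       # %iowait and per-device utilization every 5 seconds
  grep -iE 'warn|error|gcinspector|slabpoolcleaner|hint' /var/log/cassandra/system.log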

Simulate real outages

Simulating outages is key to ensuring that your cluster functions well in production. To simulate outages:

  1. Take one or two nodes offline (more is preferable, depending on the cluster size); a command sketch follows this list.
    • Do your queries start failing?
    • Do the nodes resume working without intervention when they come back online?
  2. Take a datacenter offline while taking load.

    How long can it be offline before system.hints build up to unmanageable levels?

  3. Run an expensive task that eats up CPU on a couple of nodes, such as find "a" / (a recursive search across the entire filesystem).

    This scenario helps you prepare for the effect on your nodes if a runaway process occurs in production.

  4. Do expensive range queries in CQLSH while taking load to see how the cluster behaves. For example:
    PAGING OFF;
    SELECT * FROM FOO LIMIT 100000000;

    This test ensures your cluster is ready for naive or unintended actions by users.
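
A minimal sketch of step 1, assuming a package installation managed by systemd (adjust the service name for your environment):

  # on the node being taken offline
  sudo systemctl stop cassandra

  # from another node: the downed node should show as DN, and hints should
  # start accumulating on the live nodes (by design)
  nodetool status
  nodetool tpstats    # watch pending tasks and dropped messages while degraded

  # run your application queries against the degraded cluster, then recover
  sudo systemctl start cassandra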

Simulate production operations

Do the following during load testing and repeat under outages as described above:

  1. Run repair to ensure that you have enough overhead to complete it; a command sketch follows this list.
  2. Bootstrap a new node into the cluster.
  3. Add a new datacenter. This exercise gives you an idea of the process when you need to do it in production.
  4. Change the schema by adding new tables, removing tables, and altering existing tables.
  5. Replace a failed node. Note how long it takes to bootstrap a new node.
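
A sketch of steps 1 and 5 as commands; the keyspace name is a placeholder, and the replacement flag is set on the new node before it starts:

  # primary-range repair on one keyspace while the load test runs
  nodetool repair -pr my_keyspace

  # watch the streaming and compaction activity the repair generates
  nodetool netstats
  nodetool compactionstats

  # to replace a failed node: on the new node, add
  #   -Dcassandra.replace_address=<ip_of_dead_node>
  # to jvm.options (or cassandra-env.sh), start the node, and time the bootstrap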

Run a real replication factor on at least 5 nodes

Ideally, run the same number of nodes that you expect to run in production. Even if you plan to run only 3 nodes in production, test with 5: queries that were fast with 3 nodes can be much slower with 5. In this scenario, secondary indexes on a common query path are exposed as being slower at scale.

Be sure to use the same replication factor as in production.

Simulate the production datacenter setup

Each datacenter adds additional write costs. If your production cluster has more datacenters than your test cluster, writes might be more expensive than you anticipate. For example, if you use RF 3 and test your cluster with 2 datacenters, each write goes from the coordinator to 4 nodes: the 3 local replicas plus a forwarding coordinator in the remote datacenter. If your production cluster uses the same RF and has 5 datacenters, each write goes to 7 nodes across the 5 datacenters. Writes are now almost twice as expensive on the coordinator node, because in each remote datacenter the forwarding coordinator sends the data on to the other replicas in its datacenter.
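
For reference, the per-datacenter replication factor is declared on the keyspace; a test keyspace spanning two datacenters might look like the following sketch, where the keyspace and datacenter names are placeholders that must match your snitch configuration:

  cqlsh -e "CREATE KEYSPACE test_ks WITH replication =
            {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"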

Simulate the network topology

Simulate your network topology with the same latency and bandwidth, and test with the expected TPS. Be aware that remote links can be a limiting factor for many use cases and for the bandwidth you expect.
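
If you cannot test over the real WAN links, Linux traffic shaping can approximate them; a rough sketch with tc/netem follows, where the interface name and delay values are assumptions to replace with your own measurements:

  # add ~100 ms of delay with 10 ms of jitter on the inter-datacenter interface
  sudo tc qdisc add dev eth0 root netem delay 100ms 10ms

  # remove the emulated delay after the test
  sudo tc qdisc del dev eth0 root netem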

Simulate expected per node density

Note: Generally, unless using SSDs, a high number of CPUs, and significant RAM, using more than 1 TB per node is rarely a good idea. See Capacity per node recommendations.
Once you’ve determined your target per-node density, add data to the test cluster until the target capacity is reached; a sketch for checking per-node data size follows the list below. Test for the following:
  • What happens to your SLAs as the target density is approached?
  • What happens if you go over your target density?
  • How long does bootstrap take?
  • How long does repair take?
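
Two quick ways to confirm the per-node data size reached during the fill, assuming the default data directory:

  nodetool status                  # the Load column reports live data per node
  du -sh /var/lib/cassandra/data   # on-disk size, including data not yet compacted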

Use the cassandra-stress tool for testing

The cassandra-stress tool (3.x | 3.0 | 2.2 | 2.1) is a Java-based stress testing utility for cluster load testing and basic benchmarking. Use this tool to test hardware choices and discover issues with data modeling. The cassandra-stress tool also supports a YAML-based profile for testing reads, writes, and mixed workloads, plus the ability to specify schemas with various compaction strategies, cache settings, and types. Be sure to test common administrative operations, such as bootstrap, repair, and failure.
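
A sketch of a profile-driven run; the profile file name, operation names, and counts are placeholders that must match the queries defined in your YAML profile:

  # schema-aware test: 1 insert for every 3 executions of the profile's "simple1" query
  cassandra-stress user profile=./stress-profile.yaml ops\(insert=1,simple1=3\) n=1000000 -rate threads=100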