The cassandra-stress tool
A Java-based stress testing utility for basic benchmarking and load testing a Cassandra cluster.
The cassandra-stress tool helps you with benchmarking and load testing by populating your cluster and supporting stress testing of arbitrary CQL tables and arbitrary queries on tables. Use cassandra-stress to:

- Quickly determine how a schema performs.
- Understand how your database scales.
- Optimize your data model and settings.
- Determine production capacity.
The cassandra-stress tool also supports a YAML-based profile for defining a specific schema with compaction strategies, cache settings, and types. Sample files are located in:

- Package installations: /usr/share/doc/cassandra/examples
- Tarball installations: install_location/tools
The cassandra-stress tool creates a keyspace called keyspace1 and, within it, tables named standard1 or counter1, depending on the type of table being tested. These are created automatically the first time you run a stress test and are reused on subsequent runs unless you drop the keyspace using CQL. You cannot change the names; they are hard-coded.
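If you want a clean slate between runs, drop the keyspace before the next test; a minimal reset, assuming cqlsh can reach the cluster:

# Drop the data left behind by a previous stress run
$ cqlsh -e "DROP KEYSPACE IF EXISTS keyspace1;"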
Usage:

- Package installations:
  /usr/bin/cassandra-stress command [options]
- Tarball installations:
  cd install_location/tools
  bin/cassandra-stress command [options]
On tarball installations, you can use these commands and options with or without the cassandra-stress daemon running.
Command | Description |
---|---|
read | Multiple concurrent reads. The cluster must first be populated by a write test. |
write | Multiple concurrent writes against the cluster. |
mixed | Interleave basic commands with configurable ratio and distribution. The cluster must first be populated by a write test. |
counter_write | Multiple concurrent updates of counters. |
counter_read | Multiple concurrent reads of counters. The cluster must first be populated by a counter_write test. |
user | Interleave user-provided queries with configurable ratio and distribution. |
help | Display help for a command or option. |
print | Inspect the output of a distribution definition. |
legacy | Legacy support mode. |
To see the sub-options available for a particular option, run:
cassandra-stress help option
For an example, see View schema help.
Option | Description |
---|---|
-pop | Population distribution and intra-partition visit order. |
Usage | -pop seq=? [no-wrap] [read-lookback=DIST(?)] [contents=?] or -pop [dist=DIST(?)] [contents=?] |
-insert | Insert specific options relating to various methods for batching and splitting partition updates. |
Usage | -insert [revisit=DIST(?)] [visits=DIST(?)] partitions=DIST(?) [batchtype=?] select-ratio=DIST(?) row-population-ratio=DIST(?) |
-col | Column details, such as size and count distribution, data generator, names, and comparator. |
Usage | -col [n=DIST(?)] [slice] [super=?] [comparator=?] [timestamp=?] [size=DIST(?)] |
-rate | Thread count, rate limit, or automatic mode (default is auto). |
Usage | -rate threads=? [limit=?] or -rate [threads>=?] [threads<=?] [auto] |
-mode | Thrift or CQL with options. |
Usage | -mode thrift [smart] [user=?] [password=?] or -mode native [unprepared] cql3 [compression=?] [port=?] [user=?] [password=?] or -mode simplenative [prepared] cql3 [port=?] |
-errors | How to handle errors when encountered during stress. |
Usage | -errors [retries=?] [ignore] |
-sample | Specify the number of samples to collect for measuring latency. |
Usage | -sample [history=?] [live=?] [report=?] |
-schema | Replication settings, compression, compaction, and so on. |
Usage | -schema [replication(?)] [keyspace=?] [compaction(?)] [compression=?] |
-node | Nodes to connect to. |
Usage | -node [whitelist] [file=?] [] |
-log | Where to log progress and the interval to use. |
Usage | -log [level=?] [no-summary] [file=?] [interval=?] |
-transport | Custom transport factories. |
Usage | -transport [factory=?] [truststore=?] [truststore-password=?] [ssl-protocol=?] [ssl-alg=?] [store-type=?] [ssl-ciphers=?] |
-port | Specify the ports used to connect to Cassandra nodes. |
Usage | -port [native=?] [thrift=?] [jmx=?] |
-sendto | Specify the stress server to send this command to. |
Usage | -sendToDaemon <host> |
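These options compose with any command. For example, a write run that fixes the client thread count, targets specific nodes, and logs to a file might look like this (a sketch; the IP addresses and file name are placeholders):

# Write 500,000 rows with 64 client threads against two specific nodes, logging to a file
$ cassandra-stress write n=500000 cl=one -mode native cql3 -rate threads=64 -node 10.0.0.1,10.0.0.2 -log file=~/write_64threads.log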
In Cassandra 2.1.5 and later, there are additional options:
Option | Description |
---|---|
profile=? | Designate the YAML file to use with cassandra-stress. |
ops(?) | Specify what operations (inserts and/or queries) to run and the number of each. |
clustering=DIST(?) | Distribution that clusters runs of operations of the same kind. |
err<? | Specify a standard error of the mean; when this value is reached, cassandra-stress ends. Default is 0.02. |
n>? | Specify the minimum number of iterations to run before accepting uncertainty convergence. |
n<? | Specify the maximum number of iterations to run before accepting uncertainty convergence. |
n=? | Specify the number of operations to run. |
duration=? | Specify the time to run, in seconds, minutes, or hours. |
no-warmup | Do not warm up the process; do a cold start. |
truncate=? | Truncate the table created during cassandra-stress. Options are never, once, or always. Default is never. |
cl=? | Set the consistency level to use during cassandra-stress. Options are ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL, and ANY. Default is LOCAL_ONE. |
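As a sketch of how these options combine, a timed read run at QUORUM that skips the warmup and truncates the stress table once before starting might look like this:

# Read at QUORUM for 10 minutes with no warmup, truncating standard1 once at the start
$ cassandra-stress read duration=10m cl=QUORUM no-warmup truncate=once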
Simple read and write examples
cassandra-stress write n=1000000
Insert (write) one million rows.
cassandra-stress read n=200000
Read two hundred thousand rows.
cassandra-stress read duration=3m
Read rows for a duration of 3 minutes.
cassandra-stress read n=200000 no-warmup
Read 200,000 rows without a warmup of 50,000 rows first.
View schema help
cassandra-stress help -schema
replication([strategy=?][factor=?][<option 1..N>=?]): Define the replication strategy and any parameters
strategy=? (default=org.apache.cassandra.locator.SimpleStrategy) The replication strategy to use
factor=? (default=1) The number of replicas
keyspace=? (default=keyspace1) The keyspace name to use
compaction([strategy=?][<option 1..N>=?]): Define the compaction strategy and any parameters
strategy=? The compaction strategy to use
compression=? Specify the compression to use for SSTable, default:no compression
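Building on that help output, a run that overrides the replication factor and keyspace name might look like this (values are illustrative; quote or escape the parentheses so the shell does not interpret them):

# Write 100,000 rows into a keyspace created with replication factor 3
$ cassandra-stress write n=100000 -schema "replication(factor=3)" keyspace="keyspace1"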
Populate the database
Generally it is easier to let cassandra-stress create the basic schema and then modify it in CQL:
# Load one row with the default schema
$ cassandra-stress write n=1 cl=one -mode native cql3 -log file=~/create_schema.log
# Modify the schema in CQL
$ cqlsh
# Run a real write workload
$ cassandra-stress write n=1000000 cl=one -mode native cql3 -schema keyspace="keyspace1" -log file=~/load_1M_rows.log
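The cqlsh step is where you adjust the generated schema to match your model; for example, switching the table to leveled compaction (an illustrative change, not a requirement):

cqlsh> ALTER TABLE keyspace1.standard1 WITH compaction = {'class': 'LeveledCompactionStrategy'};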
Changing the replication strategy
Changes the replication strategy to NetworkTopologyStrategy. In replication(strategy=NetworkTopologyStrategy, existing=2), existing is a datacenter name and 2 is the number of replicas in that datacenter.
cassandra-stress write n=500000 no-warmup -node existing0 -schema "replication(strategy=NetworkTopologyStrategy, existing=2)"
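To confirm that the keyspace was created with the intended strategy, you can inspect it afterwards (assuming cqlsh connects to the same cluster):

# Check the replication settings that the stress run applied
$ cqlsh -e "DESCRIBE KEYSPACE keyspace1;"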
Running a mixed workload
When running a mixed workload, you must escape parentheses, greater-than and less-than signs, and other characters that are special to the shell. This example invokes a workload that is one-quarter writes and three-quarters reads.
cassandra-stress mixed ratio\(write=1,read=3\) n=100000 cl=ONE -pop dist=UNIFORM\(1..1000000\) -schema keyspace="keyspace1" -mode native cql3 -rate threads\>=16 threads\<=256 -log file=~/mixed_autorate_50r50w_1M.log
Notice the following in this example:

- The ratio parameters require backslash-escaped parentheses (a single-quoted alternative is shown after this list).
- The value of n is different than in the write phase. During the write phase, n records are written. However, in the read phase, if n is too large, it is inconvenient to read all the records for simple testing. Generally, n does not need to be large when validating the persistent storage systems of a cluster.
- The -pop dist=UNIFORM\(1..1000000\) portion says that for the n=100,000 operations, the keys are selected uniformly from the range 1 to 1,000,000. Use this when you want to specify more data per node than fits in DRAM.
- In the -rate section, the greater-than and less-than signs are escaped. If not escaped, the shell attempts to use them for I/O redirection: it tries to read from a non-existent file called =256 and to create a file called =16. The -rate section tells cassandra-stress to automatically try different numbers of client threads, testing no fewer than 16 and no more than 256 client threads.
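As an alternative to backslash escapes, single-quoting the shell-sensitive arguments is equivalent; the same workload could be written as:

# Same mixed workload, using single quotes instead of backslash escapes
$ cassandra-stress mixed 'ratio(write=1,read=3)' n=100000 cl=ONE -pop 'dist=UNIFORM(1..1000000)' -schema keyspace="keyspace1" -mode native cql3 -rate 'threads>=16' 'threads<=256' -log file=~/mixed_autorate_50r50w_1M.log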
Standard mixed read/write workload keyspace for a single node
CREATE KEYSPACE "keyspace1" WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '1' };

USE "keyspace1";

CREATE TABLE "standard1" (
  key blob,
  "C0" blob,
  "C1" blob,
  "C2" blob,
  "C3" blob,
  "C4" blob,
  PRIMARY KEY (key)
) WITH bloom_filter_fp_chance=0.010000
  AND caching='KEYS_ONLY'
  AND comment=''
  AND dclocal_read_repair_chance=0.000000
  AND gc_grace_seconds=864000
  AND index_interval=128
  AND read_repair_chance=0.100000
  AND replicate_on_write='true'
  AND default_time_to_live=0
  AND speculative_retry='99.0PERCENTILE'
  AND memtable_flush_period_in_ms=0
  AND compaction={'class': 'SizeTieredCompactionStrategy'}
  AND compression={'sstable_compression': 'LZ4Compressor'};
Splitting up a load over multiple cassandra-stress instances on different nodes
This example is useful for loading into large clusters, where a single cassandra-stress load generator node cannot saturate the cluster. In this example, $NODES is a variable whose value is a comma-delimited list of IP addresses, such as 10.0.0.1,10.0.0.2, and so on.
# On Node1
$ cassandra-stress write n=1000000 cl=one -mode native cql3 -schema keyspace="keyspace1" -pop seq=1..1000000 -log file=~/node1_load.log -node $NODES
# On Node2
$ cassandra-stress write n=1000000 cl=one -mode native cql3 -schema keyspace="keyspace1" -pop seq=1000001..2000000 -log file=~/node2_load.log -node $NODES
Using a YAML file to run cassandra-stress
This example uses a YAML file for the keyspace and table definitions, as well as the query definitions. The operation defined as simple1 will be completed once. No warmup is specified, and the consistency level is set to QUORUM.
cassandra-stress user profile=tools/cqlstress-example.yaml ops\(simple1=1\) no-warmup cl=QUORUM
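As a rough sketch of what such a profile contains (the field names follow the shipped cqlstress-example.yaml; this abridged version is illustrative, so consult the sample file for the full format):

# Illustrative stress profile (abridged, not the full cqlstress-example.yaml)
keyspace: stresscql
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
table: typestest
table_definition: |
  CREATE TABLE typestest (
    name text,
    choice boolean,
    value blob,
    PRIMARY KEY (name, choice)
  );
columnspec:
  - name: name
    size: uniform(1..10)
insert:
  partitions: fixed(1)
  batchtype: UNLOGGED
queries:
  simple1:
    cql: SELECT * FROM typestest WHERE name = ? AND choice = ? LIMIT 100
    fields: samerow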
For a complete description of how to use these sample files, see Improved Cassandra 2.1 Stress Tool: Benchmark Any Schema – Part 1.