cassandra-stress write

Multiple concurrent writes against the cluster.

Synopsis

cassandra-stress write [arguments]

Command options

cl=?: Set the consistency level to use during cassandra-stress. Options are ONE, QUORUM, LOCAL_ONE (default), LOCAL_QUORUM, EACH_QUORUM, ALL, and ANY.
clustering=DIST(?): Distribution clustering runs of operations of the same kind.
duration=?: Specify the time to run, in seconds, minutes, or hours.
err<?: Specify a target standard error of the mean. cassandra-stress stops once the measured error falls to this value.

Default: 0.02
n>?: Specify a minimum number of iterations to run before accepting uncertainty convergence.
n<?: Specify a maximum number of iterations to run before accepting uncertainty convergence.
n=?: Specify the number of operations to run.
no-warmup: Do not warm up the process. Do a cold start.
ops(?): Specify what operations to run and the number of each. Only valid with the user option.
profile=?: Designate the YAML file to use with cassandra-stress. Only valid with the user option.
truncate=?: Truncate the table created during cassandra-stress. Options are never, once, always.

Default: never

The truncate option must be inserted before the mode argument, otherwise the cassandra-stress tool won’t apply truncation as specified.
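
As a rough illustration of the ordering rule above, the sketch below assembles a write run that combines several command options and places truncate before -mode. The values (a 10-minute run, QUORUM, truncate once, 50 threads) are assumptions chosen for illustration, and the `10m` duration form assumes a number followed by s, m, or h. The script only prints the command rather than running it.

```shell
# Sketch only: prints the command instead of running it.
# Assumed values (not from the reference text above).
DURATION=10m      # run time: a number followed by s, m, or h (assumed form)
CL=QUORUM
TRUNCATE=once

# Command options (duration=, cl=, truncate=) come first; -mode and the other
# dash-prefixed arguments follow, so truncate= sits before -mode.
CMD="cassandra-stress write duration=$DURATION cl=$CL truncate=$TRUNCATE -mode native cql3 -rate threads=50"
echo "$CMD"
```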

Command arguments

-col

Column details, such as size and count distribution, data generator, names, and comparator.

Supports multiple syntax formats:

-col names=? [slice] [super=?] [comparator=?] [timestamp=?] [size=DIST(?)]

-col [n=DIST(?)] [slice] [super=?] [comparator=?] [timestamp=?] [size=DIST(?)]

-errors

How to handle errors when encountered during stress testing:

-errors [retries=N] [ignore] [skip-read-validation]

retries=<N>: Number of times to try each operation before failing.
ignore: If included, won’t fail on errors. Omit this argument to fail on errors.
skip-read-validation: Include to skip read validation and message output.

-graph

Graph results of cassandra-stress tests. Multiple tests can be graphed together.

-graph file=? [revision=?] [title=?] [op=?]

-insert

Insert-specific options relating to various methods for batching and splitting partition updates.

-insert [revisit=DIST(?)] [visits=DIST(?)] partitions=DIST(?) [batchtype=?] select-ratio=DIST(?) row-population-ratio=DIST(?)

-log

Where to log progress and the interval to use.

-log [level=?] [no-summary] [file=?] [hdrfile=?] [interval=?] [no-settings] [no-progress] [show-queries] [query-log-file=?]

-mode

Thrift or CQL with options. Supports multiple syntax formats:

-mode thrift [smart] [user=?] [password=?]

-mode native [unprepared] cql3 [compression=?] [port=?] [user=?] [password=?] [auth-provider=?] [maxPending=?] [connectionsPerHost=?] [protocolVersion=?]

-mode simplenative [prepared] cql3 [port=?]

-node

Nodes to connect to.

-node [datacenter=?] [whitelist] [file=?]

-pop

Population distribution and intra-partition visit order. Supports multiple syntax formats:

-pop seq=? [no-wrap] [read-lookback=DIST(?)] [contents=?]

-pop [dist=DIST(?)] [contents=?]

-port

Specify the ports used to connect to nodes. A port can be specified for the Apache Cassandra native protocol (native=), the Thrift protocol (thrift=), or JMX (jmx=) for retrieving statistics.

-port [native=?] [thrift=?] [jmx=?]

-rate

Set the rate. Supports multiple syntax formats:

Throttle to a fixed rate of operations per second with a specified number of threads:
```
-rate threads=? [throttle=?] [fixed=?]
```
- threads=?: Number of clients to run concurrently.
- throttle=?: Throttle operations per second across all clients to a maximum rate (or less) with no implied schedule. Default: 0
- fixed=?: Expect fixed rate of operations per second across all clients with implied schedule. Default: 0

Gradually increase the number of threads until a specified minimum or maximum is reached, or until throughput saturates:
```
-rate [threads>=?] [threads<=?] [auto]
```
- threads>=?: Run at least this many clients concurrently. Default: 4
- threads<=?: Run at most this many clients concurrently. Default: 1000
- auto: Stop increasing threads once throughput saturates.
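
One practical point when typing these forms in a shell: `>` and `<` are redirection operators, so `threads>=4` and `threads<=64` need quoting to reach the tool intact. The sketch below shows both -rate forms. The thread bounds and the rate are assumed values, and the `1000/s` form for throttle is an assumption about the expected value format that is worth checking against `cassandra-stress help -rate`. The helper `show_args` is hypothetical and only echoes the arguments; nothing here runs cassandra-stress.

```shell
# Sketch only: show_args is a hypothetical helper that prints the arguments
# the tool would receive, one per line. cassandra-stress is never executed.
show_args() { printf '%s\n' "$@"; }

# Fixed thread count, capped at an assumed 1000 operations per second.
THROTTLED=$(show_args cassandra-stress write n=1000000 -rate threads=50 throttle=1000/s)

# Auto mode with assumed bounds of 4 and 64 threads. The single quotes keep
# the shell from reading > and < as redirections.
AUTO=$(show_args cassandra-stress write n=1000000 -rate 'threads>=4' 'threads<=64' auto)

echo "$THROTTLED"
echo "$AUTO"
```

The same quoting applies to the `err<?`, `n>?`, and `n<?` command options.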

-schema

Schema configuration including replication settings, keyspace, compression, and compaction:

-schema [replication(?)] [keyspace=?] [compaction(?)] [compression=?]

-sendto

Specify a server to send the stress command to.

-sendto <host>

-tokenrange

Token range settings.

-tokenrange [no-wrap] [split-factor=?] [savedata=?]

-transport

Custom transport factories.

-transport [factory=?] [truststore=?] [truststore-password=?] [keystore=?] [keystore-password=?] [ssl-protocol=?] [ssl-alg=?] [store-type=?] [ssl-ciphers=?]

Simple write example

Insert (write) one million rows:

cassandra-stress write n=1000000 -rate threads=50

Populate the database

Generally it is easier to let cassandra-stress create the basic schema and then modify it in CQL:

#Load one row with default schema
cassandra-stress write n=1 cl=one -mode native cql3 -log file=create_schema.log

#Modify schema in CQL
cqlsh

#Run a real write workload
cassandra-stress write n=1000000 cl=one -mode native cql3 -schema keyspace="keyspace1" -log file=load_1M_rows.log

Change the replication strategy

Change the replication strategy to NetworkTopologyStrategy and target one node named existing:

cassandra-stress write n=500000 no-warmup -node existing -schema "replication(strategy=NetworkTopologyStrategy, existing=2)"

Split up a load over multiple cassandra-stress instances on different nodes

This example demonstrates loading into large clusters, where a single cassandra-stress load generator node cannot saturate the cluster. In this example, $NODES is a variable whose value is a comma-separated list of IP addresses with no spaces, such as 10.0.0.1,10.0.0.2.

#On Node1
cassandra-stress write n=1000000 cl=one -mode native cql3 -schema keyspace="keyspace1" -pop seq=1..1000000 -log file=~/node1_load.log -node $NODES

#On Node2
cassandra-stress write n=1000000 cl=one -mode native cql3 -schema keyspace="keyspace1" -pop seq=1000001..2000000 -log file=~/node2_load.log -node $NODES
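
The two -pop ranges above follow a simple pattern: generator i writes keys (i-1)*n+1 through i*n. A minimal sketch of that arithmetic for an arbitrary number of load generators might look like the following. The variable names, the `seq_range` helper, and the sample $NODES value are illustrative assumptions, and the script only prints the commands rather than running them.

```shell
# Sketch only: prints one command per load generator instead of running it.
ROWS_PER_NODE=1000000     # rows each generator writes (as in the example above)
GENERATORS=2              # number of load generator hosts (assumed)
NODES=10.0.0.1,10.0.0.2   # comma-separated, no spaces (sample value)

# seq_range I: print the first..last key range for generator I (1-based).
seq_range() {
  first=$(( ($1 - 1) * ROWS_PER_NODE + 1 ))
  last=$(( $1 * ROWS_PER_NODE ))
  echo "${first}..${last}"
}

i=1
while [ "$i" -le "$GENERATORS" ]; do
  echo "cassandra-stress write n=$ROWS_PER_NODE cl=one -mode native cql3 -schema keyspace=\"keyspace1\" -pop seq=$(seq_range "$i") -log file=~/node${i}_load.log -node $NODES"
  i=$(( i + 1 ))
done
```

Because the ranges do not overlap, each generator writes a distinct set of partition keys.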

Run cassandra-stress with authentication and SSL encryption

The following example shows using the -mode option to supply a username and password, and the -transport option for SSL parameters with self-signed CA certificates. Cassandra authentication and SSL encryption must already be configured before executing cassandra-stress with these options.

cassandra-stress write n=100k cl=ONE no-warmup -mode native cql3 user=cassandra password=cassandra \
-transport truststore=/usr/local/lib/dsc-cassandra/conf/server-truststore.jks truststore-password=truststorePass \
factory=org.apache.cassandra.thrift.SSLTransportFactory \
keystore=/usr/local/lib/dsc-cassandra/conf/server-keystore.jks keystore-password=myKeyPass

Run cassandra-stress using the truncate option

The truncate option must be inserted before the mode option, otherwise the cassandra-stress tool won’t apply truncation as specified.

cassandra-stress write n=100000000 cl=QUORUM truncate=always -schema keyspace=keyspace -rate threads=200 -log file=write_$NOW.log
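
The example uses a $NOW variable without defining it. One plausible way to set it, assuming it is meant to be a timestamp that gives each run its own log file, is sketched below. The timestamp format and the `LOG_FILE` name are illustrative choices, not something the tool requires.

```shell
# Assumption: NOW is a caller-defined timestamp; this format is illustrative.
NOW=$(date +%Y%m%d_%H%M%S)

# The log name the example's "-log file=write_$NOW.log" would expand to.
LOG_FILE="write_$NOW.log"
echo "$LOG_FILE"
```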