cassandra-stress user

Interleaves user-provided queries with a configurable ratio and distribution.

Synopsis

cassandra-stress user [arguments]

Table 1. Legend
Syntax conventions	Description
UPPERCASE	Literal keyword.
Lowercase	Not literal.
`Italics`	Variable value. Replace with a valid option or user-defined value.
`[ ]`	Optional. Square brackets ( `[ ]` ) surround optional command arguments. Do not type the square brackets.
`( )`	Group. Parentheses ( `( )` ) identify a group to choose from. Do not type the parentheses.
`\|`	Or. A vertical bar ( `\|` ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
`...`	Repeatable. An ellipsis ( `...` ) indicates that you can repeat the syntax element as often as required.
`'Literal string'`	Single quotation ( `'` ) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
`{ key:value }`	Map collection. Braces ( `{ }` ) enclose map collections or key value pairs. A colon separates the key and the value.
`<datatype1,datatype2>`	Set, list, map, or tuple. Angle brackets ( `< >` ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
`cql_statement;`	End CQL statement. A semicolon ( `;` ) terminates all CQL statements.
`[ -- ]`	Separate the command line options from the command arguments with two hyphens ( `--` ). This syntax is useful when arguments might be mistaken for command line options.
`' <schema> ... </schema> '`	Search CQL only: Single quotation marks ( `'` ) surround an entire XML schema declaration.
`@xml_entity='xml_entity_type'`	Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrconfig files.

Definition

Command options

cl=?: Set the consistency level to use during cassandra-stress. Options include LOCAL_ONE, ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL, and ANY. Default is LOCAL_ONE.
clustering=DIST(?): Distribution that clusters runs of operations of the same kind.
duration=?: Specify the time to run, in seconds, minutes, or hours.
err<?: Specify a standard error of the mean; when this value is reached, cassandra-stress will end. Default is 0.02.
n>?: Specify a minimum number of iterations to run before accepting uncertainty convergence.
n<?: Specify a maximum number of iterations to run before accepting uncertainty convergence.
n=?: Specify the number of operations to run.
no-warmup: Do not warm up the process; do a cold start.
ops(?): Specify what operations to run and the number of each. (only with the user option)
profile=?: Designate the YAML file to use with cassandra-stress. (only with the user option)
truncate=?: Truncate the table created during cassandra-stress. Options are never, once, or always. Default is never.
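
The err< option ends a run once the standard error of the mean reaches the given value. As a rough illustration of that kind of stopping rule, and not the tool's actual implementation, the sketch below computes the standard error of the mean for a handful of throughput samples and compares it with the default of 0.02. The sample figures and the helper name rel_sem are hypothetical, and dividing by the mean to get a relative value is an assumption.

```shell
# rel_sem: hypothetical helper, not part of cassandra-stress.
# Prints the standard error of the mean divided by the mean
# for the measurements passed as arguments (needs at least two).
rel_sem() {
  printf '%s\n' "$@" | awk '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
      mean = sum / n
      var = (sumsq - n * mean * mean) / (n - 1)   # sample variance
      if (var < 0) var = 0                        # guard against rounding
      printf "%.4f\n", sqrt(var / n) / mean
    }'
}

# Made-up per-interval throughput samples (operations per second).
err=$(rel_sem 100 102 98 101 99)
echo "relative standard error: $err"

# Compare against the documented default threshold of 0.02.
if awk -v e="$err" 'BEGIN { exit !(e < 0.02) }'; then
  echo "below 0.02: a run with err<0.02 could stop here"
else
  echo "not yet below 0.02: keep sampling"
fi
```

A tighter threshold needs more samples before it is met, which is why n> and n< exist to bound the number of iterations on either side.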

Command arguments

-col

Column details, such as size and count distribution, data generator, names, and comparator.

Usage:

-col names=? [slice] [super=?] [comparator=?] [timestamp=?] [size=DIST(?)]
 or 
-col [n=DIST(?)] [slice] [super=?] [comparator=?] [timestamp=?] [size=DIST(?)]

-errors

How to handle errors when encountered during stress testing.

Usage:

-errors [retries=N] [ignore] [skip-read-validation]

retries=N Number of times to try each operation before failing.
ignore Do not fail on errors.
skip-read-validation Skip read validation and message output.

-graph

Graph results of cassandra-stress tests. Multiple tests can be graphed together.

Usage:

-graph file=? [revision=?] [title=?] [op=?]

-insert

Insert-specific options relating to various methods for batching and splitting partition updates.

Usage:

-insert [revisit=DIST(?)] [visits=DIST(?)] partitions=DIST(?) [batchtype=?] select-ratio=DIST(?) row-population-ratio=DIST(?)

-log

Where to log progress and the interval to use.

Usage:

-log [level=?] [no-summary] [file=?] [hdrfile=?] [interval=?] [no-settings] [no-progress] [show-queries] [query-log-file=?]

-mode

Thrift or CQL with options.

Usage:

-mode thrift [smart] [user=?] [password=?]
  or 
-mode native [unprepared] cql3 [compression=?] [port=?] [user=?] [password=?] [auth-provider=?] [maxPending=?] [connectionsPerHost=?] [protocolVersion=?]
  or
-mode simplenative [prepared] cql3 [port=?]

-node

Nodes to connect to.

Usage:

-node [datacenter=?] [whitelist] [file=?] []

-pop

Population distribution and intra-partition visit order.

Usage:

-pop seq=? [no-wrap] [read-lookback=DIST(?)] [contents=?]
  or
-pop [dist=DIST(?)] [contents=?]

-port

Specify the port for connecting to Cassandra nodes. A port can be specified for the Cassandra native protocol, the Thrift protocol, or JMX (for retrieving statistics).

Usage:

-port [native=?] [thrift=?] [jmx=?]

-rate

Set the rate using the following options:

-rate threads=N [throttle=N] [fixed=N]

where

threads=N number of clients to run concurrently.
throttle=N throttle operations per second across all clients to a maximum rate (or less) with no implied schedule. Default is 0.
fixed=N expect fixed rate of operations per second across all clients with implied schedule. Default is 0.

-rate [threads>=N] [threads<=N] [auto]

where

threads>=N run at least this many clients concurrently. Default is 4.
threads<=N run at most this many clients concurrently. Default is 1000.
auto stop increasing threads once throughput saturates.

-schema

Replication settings, compression, compaction, and so on.

Usage:

-schema [replication(?)] [keyspace=?] [compaction(?)] [compression=?]

-sendto

Specify a server to send the stress command to.

Usage:

-sendto <host>

-tokenrange

Token range settings.

Usage:

-tokenrange [no-wrap] [split-factor=?] [savedata=?]

-transport

Custom transport factories.

Usage:

-transport [factory=?] [truststore=?] [truststore-password=?] [keystore=?] [keystore-password=?] [ssl-protocol=?] [ssl-alg=?] [store-type=?] [ssl-ciphers=?]

Use the -graph option

In Cassandra 3.2 and later, the -graph option provides visual feedback for cassandra-stress tests. A file must be named to build the resulting HTML file. A title and revision are optional, but revision must be used if multiple stress tests are graphed on the same output.

cassandra-stress user profile=tools/cqlstress-example.yaml ops\(insert=1\) -graph file=test.html title=test revision=test1

The resulting HTML file can be opened in a web browser to display an interactive graph.
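
To graph several tests on the same output, each run names the same file and supplies its own revision value. The sketch below prints two such commands rather than executing them, so it can be tried without a running cluster. The second revision name, test2, and the helper name graph_runs are assumptions made for illustration. The ops argument is single-quoted here, which is a standard shell alternative to escaping the parentheses with backslashes.

```shell
# graph_runs: hypothetical dry-run helper. It prints each command
# instead of executing it; remove "echo" to run the tests for real.
graph_runs() {
  for rev in test1 test2; do
    echo cassandra-stress user profile=tools/cqlstress-example.yaml \
      'ops(insert=1)' -graph file=test.html title=test revision="$rev"
  done
}

graph_runs
```

Both printed commands write to test.html, and the differing revision values are what let the two result sets be told apart on the shared graph.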

Use a YAML file to run cassandra-stress

This example uses a YAML file named cqlstress-example.yaml, which contains the keyspace and table definitions, and a query definition. The keyspace name and definition are the first entries in the YAML file:

keyspace: perftesting

keyspace_definition: 

  CREATE KEYSPACE perftesting WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': 3};

The table name and definition are created in the next section using CQL:

table: users

table_definition:

  CREATE TABLE users (
    username text,
    first_name text,
    last_name text,
    password text,
    email text,
    last_access timeuuid,
    PRIMARY KEY(username)
  );

In the extra_definitions section you can add secondary indexes or materialized views to the table:

extra_definitions:
  - CREATE MATERIALIZED VIEW perftesting.users_by_first_name AS SELECT * FROM perftesting.users WHERE first_name IS NOT NULL and username IS NOT NULL PRIMARY KEY (first_name, username);
  - CREATE MATERIALIZED VIEW perftesting.users_by_first_name2 AS SELECT * FROM perftesting.users WHERE first_name IS NOT NULL and username IS NOT NULL PRIMARY KEY (first_name, username);
  - CREATE MATERIALIZED VIEW perftesting.users_by_first_name3 AS SELECT * FROM perftesting.users WHERE first_name IS NOT NULL and username IS NOT NULL PRIMARY KEY (first_name, username);

The population distribution can be defined for any column in the table. This section specifies a uniform distribution between 10 and 30 characters for username values in generated rows, a uniform distribution between 20 and 40 for the cluster value of startdate, and a Gaussian distribution between 100 and 500 characters for description values.

columnspec:
  - name: username
    size: uniform(10..30)
  - name: first_name
    size: fixed(16)
  - name: last_name
    size: uniform(1..32)
  - name: password
    size: fixed(80) # sha-512
  - name: email
    size: uniform(16..50)
  - name: startdate
    cluster: uniform(20..40)
  - name: description
    size: gaussian(100..500)

After the column specifications, you can add specifications for how each batch runs. In the following code, the partitions value directs the test to use the column definitions above to insert a fixed number of rows in the partition in each batch:

insert:
  partitions: fixed(10)
  batchtype: UNLOGGED

The last section contains a query, read1, that can be run against the defined table.

queries:
  read1:
    cql: select * from users where username = ? and startdate = ?
    fields: samerow     # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)

The following example shows using the user option and its parameters to run cassandra-stress tests from cqlstress-example.yaml:

cassandra-stress user profile=tools/cqlstress-example.yaml n=1000000 ops\(insert=3,read1=1\) no-warmup cl=QUORUM

Notice that:

The user option is required for the profile and ops parameters.
The value for the profile parameter is the path and filename of the .yaml file.
In this example, n specifies the number of batches that run.
The values supplied for ops specify which operations run and how many of each. These values direct the command to insert rows into the database and run the read1 query.
How many times? Each insert or query counts as one batch, and the values in ops determine how many of each type are run. Since the total number of batches is 1,000,000, and ops says to run three inserts for each query, the result will be 750,000 inserts and 250,000 of the read1 query.
Use escaping backslashes when specifying the ops value.
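
The split described above can be checked with a few lines of shell arithmetic. This is only a sketch of the expected counts implied by the ops ratios; the helper name ops_split is hypothetical and is not part of cassandra-stress, and the tool itself decides how the operations are interleaved.

```shell
# ops_split: hypothetical helper, not part of cassandra-stress.
# Usage: ops_split TOTAL name=weight [name=weight ...]
# Prints the share of TOTAL that each operation receives when
# operations are mixed in proportion to their weights.
ops_split() {
  total=$1
  shift
  sum=0
  for spec in "$@"; do
    sum=$((sum + ${spec#*=}))      # add up the weights
  done
  for spec in "$@"; do
    name=${spec%%=*}               # text before the "="
    weight=${spec#*=}              # text after the "="
    echo "$name=$((total * weight / sum))"
  done
}

# n=1000000 with ops(insert=3,read1=1)
ops_split 1000000 insert=3 read1=1
```

With the values from the example command, this prints insert=750000 and read1=250000, matching the figures given above.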

For more information, see Improved Cassandra 2.1 Stress Tool: Benchmark Any Schema – Part 1.