nodetool nodesyncservice ratesimulator

Simulates rates necessary to achieve the NodeSync deadline.

The simulation is based on configurable assumptions. Rate simulations are useful, but in production they are not a viable substitute for monitoring NodeSync and adjusting the rate.
Restriction: Do not use this command on a keyspace with RF=1 or on a single-node cluster.

Monitor NodeSync status using OpsCenter.

Synopsis

nodetool [connection_options] nodesyncservice ratesimulator
[--deadline-overrides keyspace_name.table_name:deadline_target_time, ...] 
[-e keyspace_name.table_name, ...]
[help] [-i keyspace_name.table_name, ...]
[--ignore-replication-factor]
[simulate -ds factor_integer -rs factor_integer -sg factor_integer | 
recommended | recommended_minimum | theoretical_minimum] 
[] [-v] 
Table 1. Legend
Syntax conventions              Description
UPPERCASE                       Literal keyword.
Lowercase                       Not literal.
Italics                         Variable value. Replace with a valid option or user-defined value.
[ ]                             Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the square brackets.
( )                             Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.
|                               Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
...                             Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.
'Literal string'                Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.
{ key:value }                   Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the key and the value.
<datatype1,datatype2>           Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
cql_statement;                  End CQL statement. A semicolon ( ; ) terminates all CQL statements.
[ -- ]                          Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.
' <schema> ... </schema> '      Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.
@xml_entity='xml_entity_type'   Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrconfig files.

Definition

The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname
The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--deadline-overrides keyspace_name.table_name:deadline_target_time, ...
Overrides the configured deadline for some or all of the tables in the simulation.
-ds, --deadline-safety-factor factor_integer
Represents a factor of how much to decrease table deadlines to account for imperfect conditions. Applies only to the simulate sub-command.
-e, --excludes keyspace_name.table_name, ...
A comma-separated list of tables to exclude from the simulation when NodeSync is enabled on the server-side; this simulates the impact on the rate of disabling NodeSync on those tables.
help
Displays options and usage instructions.
--ignore-replication-factor
Ignores the replication factor for the simulation. Without this option, the default assumes that NodeSync runs on every node of the cluster (which is highly recommended) and assumes that validation work is spread among replicas. When NodeSync runs on every node of the cluster, each node must validate the fraction 1/RF of the data the node owns. This option removes that assumption, and computes a rate that accounts for all the data the node stores.
-i, --includes keyspace_name.table_name, ...
A comma-separated list of tables to include in the simulation when NodeSync is not enabled server-side; simulates the impact on the rate of enabling NodeSync on those tables.
-rs, --rate-safety-factor factor_integer
Represents a factor of how much to increase the final rate to account for imperfect conditions. Applies only to the simulate sub-command.
-sg, --size-growth-factor factor_integer
Represents a factor of how much to increase data sizes to account for data growth. Applies only to the simulate sub-command.
-v, --verbose
Provides details on how the simulation is carried out by displaying every step the simulation takes. This option is useful for understanding a simulation, but the output can be very long when many tables exist.
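
To make the 1/RF assumption behind --ignore-replication-factor concrete, the following Python sketch compares how much data a single node would have to validate with and without that assumption. The function name and the sample figures are hypothetical, chosen only for illustration; the simulator's own computation may differ.

```python
def bytes_to_validate(stored_bytes: float, rf: int, ignore_rf: bool = False) -> float:
    """Return the data volume one node validates under the stated assumption."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    # Default assumption: NodeSync runs on every node, so validation work is
    # shared and each node covers 1/RF of the data it stores.
    # With ignore_rf, the node is assumed to validate everything it stores.
    return stored_bytes if ignore_rf else stored_bytes / rf


# Hypothetical node storing 2,300,000 bytes in a keyspace with RF=2.
shared = bytes_to_validate(2_300_000, rf=2)
alone = bytes_to_validate(2_300_000, rf=2, ignore_rf=True)
print(f"shared: {shared:.0f} B, ignoring RF: {alone:.0f} B")
```

Under this reading, ignoring the replication factor multiplies the data to validate, and therefore the required rate, by RF.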

Examples

Simulate rates for the comments table

nodetool nodesyncservice ratesimulator -i cycling.comments
Computed rate: 420kB/s.

Simulate rates with new target times for the comments table

nodetool nodesyncservice ratesimulator --deadline-overrides cycling.comments:20h

Simulate example

  1. In CQL, create tables with NodeSync enabled in a keyspace with RF > 1. For example:
    CREATE KEYSPACE cycling WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
    USE cycling;
    CREATE TABLE comments (record_id timeuuid, id uuid, commenter text, comment text, created_at timestamp,
      PRIMARY KEY (id, created_at)) WITH nodesync={'enabled': 'true'};
    CREATE TABLE comments2 (record_id timeuuid, id uuid, commenter text, comment text, created_at timestamp,
      PRIMARY KEY (id, created_at)) WITH nodesync={'enabled': 'true'};
  2. Insert data into the tables. For example:
    INSERT INTO cycling.comments (record_id, id, created_at, comment, commenter) VALUES (now(), e7ae5cf3-d358-4d99-b900-85902fda9bb0, '2017-02-14 12:43:20-0800', 'Raining too hard should have postponed', 'Alex');
    INSERT INTO cycling.comments (record_id, id, created_at, comment, commenter) VALUES (now(), e7ae5cf3-d358-4d99-b900-85902fda9bb0, '2017-02-14 12:43:20.234-0800', 'Raining too hard should have postponed', 'Alex');
    INSERT INTO cycling.comments (record_id, id, created_at, comment, commenter) VALUES (now(), e7ae5cf3-d358-4d99-b900-85902fda9bb0, '2017-03-21 13:11:09.999-0800', 'Second rest stop was out of water', 'Alex');
    INSERT INTO cycling.comments (record_id, id, created_at, comment, commenter) VALUES (now(), e7ae5cf3-d358-4d99-b900-85902fda9bb0, '2017-04-01 06:33:02.16-0800', 'LATE RIDERS SHOULD NOT DELAY THE START', 'Alex');
    INSERT INTO cycling.comments (record_id, id, created_at, comment, commenter) VALUES (now(), c7fceba0-c141-4207-9494-a29f9809de6f, toTimestamp(now()), 'The gift certificate for winning was the best', 'Amy');
    INSERT INTO cycling.comments (record_id, id, created_at, comment, commenter) VALUES (now(), c7fceba0-c141-4207-9494-a29f9809de6f, '2017-02-17 12:43:20.234+0400', 'Glad you ran the race in the rain', 'Amy');
    ...
  3. Run the simulator:
    nodetool nodesyncservice ratesimulator recommended
    Computed rate: 16B/s.
    As expected, the computed rate is rather small because very little data was inserted.
  4. Run the simulator with the verbose flag to see how that rate was calculated:
    nodetool nodesyncservice ratesimulator recommended -v
    Using parameters:
     - Size growing factor:    1.00
     - Deadline safety factor: 0.25
     - Rate safety factor:     0.10
    
    cycling.comments:
      - Deadline target=7.5d, adjusted from 10d for safety.
      - Size=1.1MB to validate (2.3MB total (adjusted from 1.1MB for future growth) but RF=2).
      - Added to previous tables, 1.1MB to validate in 7.5d => 2B/s
      => New minimum rate: 2B/s
    cycling.comments2:
      - Deadline target=7.5d, adjusted from 10d for safety.
      - Size=7.1MB to validate (14MB total (adjusted from 7.1MB for future growth) but RF=2).
      - Added to previous tables, 8.3MB to validate in 7.5d => 14B/s
      => New minimum rate: 14B/s
    
    Computed rate: 16B/s, adjusted from 14B/s for safety.
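
The adjustments reported in the verbose output can be approximated with a short Python sketch. The formulas below are assumptions inferred from that output, not the simulator's documented algorithm: a deadline safety factor of 0.25 turns 10d into 7.5d, a size growing factor of 1.00 doubles the table size before it is divided by RF, sizes are accumulated across tables, and the rate safety factor is applied last. All function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def adjusted_deadline_days(deadline_days, deadline_safety):
    # Assumed: a safety factor of 0.25 shortens a 10d deadline to 7.5d.
    return deadline_days * (1 - deadline_safety)


def size_to_validate(size_bytes, size_growth, rf):
    # Assumed: a growth factor of 1.00 doubles the size, and the
    # validation work is then split evenly across RF replicas.
    return size_bytes * (1 + size_growth) / rf


def simulate_rate(tables, size_growth, deadline_safety, rate_safety, rf):
    """tables: iterable of (size_bytes, deadline_days). Returns bytes per second."""
    adjusted = sorted(
        (adjusted_deadline_days(days, deadline_safety),
         size_to_validate(size, size_growth, rf))
        for size, days in tables
    )
    cumulative = 0.0
    minimum_rate = 0.0
    for days, size in adjusted:
        # Each table is added to the previous ones; the rate must be high
        # enough to validate the running total within this table's deadline.
        cumulative += size
        minimum_rate = max(minimum_rate, cumulative / (days * SECONDS_PER_DAY))
    # Assumed: the rate safety factor raises the final rate proportionally.
    return minimum_rate * (1 + rate_safety)


if __name__ == "__main__":
    # Sizes taken from the rounded figures in the verbose output (1.1MB, 7.1MB).
    tables = [(1_100_000, 10), (7_100_000, 10)]
    rate = simulate_rate(tables, size_growth=1.0, deadline_safety=0.25,
                         rate_safety=0.10, rf=2)
    print(f"Simulated rate: {rate:.1f} B/s")
```

Because the sizes shown in the verbose output are rounded and the simulator's unit and rounding conventions are not stated here, this sketch is not expected to reproduce the reported figures exactly.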