Setting the NodeSync rate

Estimate NodeSync rate impacts and set rates.

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:
Package installations: /etc/dse/cassandra/cassandra.yaml
Tarball installations: installation_location/resources/cassandra/conf/cassandra.yaml
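To check which value a node starts with, you can read rate_in_kb from the nodesync section of this file. The following Python sketch is a naive line scanner that uses only the standard library. The helper name read_nodesync_rate_kb is hypothetical, and the sketch assumes that rate_in_kb appears as an indented child of a top-level nodesync: key with a plain integer value. For anything beyond a quick check, a real YAML parser is more robust.

```python
import re
from typing import Optional


def read_nodesync_rate_kb(yaml_text: str) -> Optional[int]:
    """Return rate_in_kb from the top-level nodesync section, or None.

    Hypothetical helper: a naive scanner (no YAML library) that assumes the
    section is laid out like this (example value, not a recommendation):
        nodesync:
            rate_in_kb: 1024
    """
    in_nodesync = False
    for raw in yaml_text.splitlines():
        # Drop comments and trailing whitespace; skip lines that end up empty.
        line = raw.split("#", 1)[0].rstrip()
        if not line:
            continue
        if not line[0].isspace():
            # A new top-level key starts; remember whether it is nodesync.
            in_nodesync = line.strip() == "nodesync:"
            continue
        if in_nodesync:
            match = re.match(r"\s+rate_in_kb:\s*(\d+)\s*$", line)
            if match:
                return int(match.group(1))
    return None
```

On a package installation, for example, open /etc/dse/cassandra/cassandra.yaml, pass its contents to read_nodesync_rate_kb, and compare the result with what nodetool reports at run time.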

Estimating NodeSync rate setting impact

The rate_in_kb setting defines the per-node rate of the local NodeSync service: the maximum number of kilobytes per second used to validate data. There is a fundamental tradeoff between how fast NodeSync validates data and how many resources it consumes. The rate is both a limit on the resources NodeSync uses and a target that NodeSync tries to reach by auto-tuning its internals. The configured rate might not be achieved in practice: validation can complete at a lower rate on a new or small cluster, and a node might temporarily or permanently lack the resources to sustain it.
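Because rate_in_kb caps throughput, it also bounds how quickly a node can work through its data. The Python sketch below uses a hypothetical helper, full_pass_seconds, to compute that bound under two simplifying assumptions: the node must read exactly the given amount of data, and it sustains the full configured rate the whole time. The effective rate can be lower, as described above, so a real pass takes at least this long.

```python
def full_pass_seconds(data_gb: float, rate_in_kb: float) -> float:
    """Lower bound, in seconds, on validating data_gb at rate_in_kb KB/s.

    Simplifying assumptions (not how NodeSync actually schedules work):
    the node reads exactly data_gb of data and always runs at the full rate.
    """
    if rate_in_kb <= 0:
        raise ValueError("rate_in_kb must be positive")
    data_kb = data_gb * 1024 * 1024  # GB -> KB, binary units
    return data_kb / rate_in_kb
```

For example, full_pass_seconds(500, 1024) gives 512,000 seconds, roughly 5.9 days, for 500 GB at 1024 KB/s. Halving the rate doubles that bound.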

Initial NodeSync rate setting

There is no strong requirement to keep all nodes validating at the same rate; some nodes simply validate more data than others. When setting the rate, start with the simplest method: use the default value.

  1. Check the rate_in_kb setting within the nodesync section in the cassandra.yaml file.
  2. Try increasing or decreasing the value at run time:
    nodetool nodesyncservice setrate value_in_kb_sec
  3. Check the configured rate.
    nodetool nodesyncservice getrate
    Tip: The configured rate is different from the effective rate, which can be found in the NodeSync Service metrics.
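If you script steps 2 and 3, a thin wrapper around nodetool keeps the command strings in one place. This Python sketch only builds and runs the two nodetool nodesyncservice commands shown in the steps. The helper names are hypothetical. Rejecting non-positive rates is an assumption of the sketch, not a documented nodetool rule. The sketch also returns getrate output as raw text because this page does not describe its format.

```python
import subprocess
from typing import List, Optional


def build_nodesync_rate_cmd(action: str, rate_kb_sec: Optional[int] = None) -> List[str]:
    """Build the nodetool invocation for step 2 (setrate) or step 3 (getrate)."""
    if action == "setrate":
        # Assumption of this sketch: only positive integer rates are accepted.
        if rate_kb_sec is None or rate_kb_sec <= 0:
            raise ValueError("setrate needs a positive rate in KB/s")
        return ["nodetool", "nodesyncservice", "setrate", str(rate_kb_sec)]
    if action == "getrate":
        return ["nodetool", "nodesyncservice", "getrate"]
    raise ValueError(f"unsupported action: {action}")


def run_nodetool(cmd: List[str]) -> str:
    """Run the command on the local node and return stdout; raises on a non-zero exit."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, call run_nodetool(build_nodesync_rate_cmd("setrate", 2048)) and then run_nodetool(build_nodesync_rate_cmd("getrate")). The getrate output is the configured rate; the effective rate still comes from the NodeSync Service metrics.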

Simulating NodeSync rates

When adjusting rates, use the NodeSync rate simulator to help determine the configuration settings. The simulator computes the rate necessary to validate all tables within their allowed deadlines.
However, no perfect value exists, because NodeSync also deals with many unknown or hard-to-predict factors, such as:
  • Failures - When a node fails, it does not participate in NodeSync validation while it is offline.
  • Temporary overloads - During periods of overload, such as those caused by unexpected events, nodes cannot achieve the configured rate.
  • Data size variation - The rate required to repair all tables within a fixed amount of time directly depends on the size of the data to validate, which is typically a moving target.
All these factors can affect the overall NodeSync rate, so build safety margins into the configured rate. The NodeSync rate simulator helps to set the rate.
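Before using the simulator, you can do a rough back-of-the-envelope check. The Python sketch below is not the simulator's algorithm. It assumes three things. First, each table on the node needs its local data size divided by its deadline. Second, tables share the node's single rate limit, so those needs add up. Third, the result is inflated by two hypothetical parameters: availability, the fraction of time the node is up and not overloaded, and growth, the expected data growth multiplier. Choose both from your own operational history; the sketch deliberately supplies no defaults.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TableLoad:
    """Hypothetical per-table input: local data size and validation deadline."""
    name: str
    size_kb: float
    deadline_sec: float


def base_rate_kb_sec(tables: Iterable[TableLoad]) -> float:
    """Minimum sustained rate so every table is fully validated within its deadline.

    Each table alone needs size/deadline KB/s; because all tables share one
    per-node rate limit in this model, the per-table needs are summed.
    """
    return sum(t.size_kb / t.deadline_sec for t in tables)


def rate_with_margin(tables: Iterable[TableLoad], availability: float, growth: float) -> float:
    """Inflate the base rate for failures/overloads (availability) and data growth.

    availability: assumed fraction of time the node can validate at full rate, in (0, 1].
    growth: assumed multiplier for data growth before the rate is revisited, >= 1.
    """
    if not 0 < availability <= 1:
        raise ValueError("availability must be in (0, 1]")
    if growth < 1:
        raise ValueError("growth must be >= 1")
    return base_rate_kb_sec(tables) * growth / availability
```

For example, a table with 100 GB of local data and a 10-day deadline needs about 121 KB/s on its own. With availability=0.8 and growth=1.25, the margin multiplies the base rate by about 1.56. Compare the result with the simulator's output rather than replacing it.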