Cluster synchronization settings
Use NodeSync or the OpsCenter Repair Service to maintain cluster synchronization.
For clusters running DataStax Enterprise (DSE) 6.0 or later, the built-in NodeSync feature eliminates the need to run the OpsCenter Repair Service. For clusters running versions earlier than DSE 6.0, run the Repair Service to maintain cluster synchronization. NodeSync reduces operator complexity and is the recommended solution for cluster synchronization.
NodeSync settings
Enable NodeSync for all relevant keyspaces and tables, and configure the synchronization rate.
Use the OpsCenter interface to ensure that NodeSync is enabled for all relevant keyspaces and tables. Monitor NodeSync status and, if necessary, adjust the NodeSync rate so that synchronization completes within the time allotted by the NodeSync deadline.
Repair Service settings
Use the Repair Service to run repair operations across nodes and their replicas.
When enabled, the Repair Service runs repair operations, which synchronize the most current data across nodes and their replicas, including repairing any corrupted data encountered at the file system level. Once enabled, the Repair Service resumes running where it left off. If this behavior is not desired (for example, when altering tokenranges_partitions), stop and start the Repair Service in the OpsCenter interface to restart synchronization.
Ensure that you are running the latest OpsCenter patch version (6.7.x, 6.5.x) when running the Repair Service. Nodes managed by OpsCenter 6.5.x must be running DSE 5.0.7 or later, and nodes managed by OpsCenter 6.7.x must be running DSE 5.1.x or later.
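The version requirements above can be expressed as a small check. The following is a minimal sketch, not an OpsCenter API: the helper names parse_version and dse_supported are hypothetical, and "DSE 5.1.x or later" is interpreted here as 5.1.0 or later.

```python
def parse_version(text):
    """Turn a dotted version such as '5.0.7' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


# Minimum DSE version for each OpsCenter release line, as stated in the text.
# Assumption: "5.1.x or later" is treated as 5.1.0 or later.
MIN_DSE = {
    (6, 5): (5, 0, 7),
    (6, 7): (5, 1, 0),
}


def dse_supported(opscenter_version, dse_version):
    """Return True if the DSE version meets the minimum for this OpsCenter line."""
    release_line = parse_version(opscenter_version)[:2]
    if release_line not in MIN_DSE:
        raise ValueError("no requirement listed for OpsCenter " + opscenter_version)
    dse = parse_version(dse_version)
    # Pad short versions such as '5.1' to three components before comparing.
    dse = dse + (0,) * (3 - len(dse))
    return dse >= MIN_DSE[release_line]


print(dse_supported("6.5.4", "5.0.6"))
print(dse_supported("6.7.0", "5.1.2"))
```

Comparing tuples of integers avoids the string-comparison pitfall where "5.0.15" would sort before "5.0.7".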
Parallel repair operations
The maximum for parallel subrange repairs property controls how many repair operations are issued to the cluster in parallel. This number is computed automatically by dividing the number of nodes by the largest replication factor of any keyspace in the cluster that is included in the Repair Service. The automatic calculation ensures that nodes do not receive multiple repair requests.
Ensure that the maximum for parallel subrange repairs is not being automatically calculated to a very low value. A very low calculated value is typically caused by a keyspace in the cluster that has a high replication factor.
When keyspaces with a high replication factor exist, this value might need to be tuned manually. However, modifying this value might result in multiple repair requests being sent to a node.
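The automatic calculation described in this section can be illustrated with a short sketch. This is not the OpsCenter implementation: the function name is hypothetical, each keyspace is simplified to a single replication factor, and rounding down with a floor of 1 is an assumption.

```python
def max_parallel_repairs(node_count, replication_factors):
    """Estimate the automatic maximum for parallel subrange repairs.

    replication_factors maps each keyspace included in the Repair Service
    to its replication factor (simplified to one number per keyspace).
    """
    largest_rf = max(replication_factors.values())
    # Assumption: the result is rounded down and never drops below 1.
    return max(1, node_count // largest_rf)


# Twelve nodes, all keyspaces at replication factor 3.
print(max_parallel_repairs(12, {"orders": 3, "users": 3}))  # 4

# One keyspace replicated to every node drives the value down to 1.
print(max_parallel_repairs(12, {"orders": 3, "audit": 12}))  # 1
```

The second call shows why a single keyspace with a high replication factor is worth checking for: it alone determines the divisor.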
Slower repairs
Slowing the progression of the Repair Service might benefit clusters that have a limited amount of data, and can reduce the impact of the Repair Service on the cluster.
Consider the following advanced Repair Service configurations:
- Increasing the min_repair_time property (by doubling it or more) allows more time to pass between Repair Service requests.
- Increasing the time_to_completion_target_percentage throttle property (from its current value up to 100) further slows Repair Service tasks.
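The two adjustments above can be sketched as a small helper that computes slower settings and renders them as a configuration fragment. This is an illustration only: the function names, the starting values (3600 and 65), the step of 10, and the [repair_service] section name are assumptions, so verify them against your own cluster configuration before applying anything.

```python
import configparser
import io


def slower_settings(min_repair_time, throttle_percentage, throttle_step=10):
    """Double min_repair_time and raise the throttle, capped at 100."""
    return {
        "min_repair_time": min_repair_time * 2,
        "time_to_completion_target_percentage": min(
            100, throttle_percentage + throttle_step
        ),
    }


def render_fragment(settings, section="repair_service"):
    """Render the settings as an INI fragment (section name is an assumption)."""
    parser = configparser.ConfigParser()
    parser[section] = {key: str(value) for key, value in settings.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical current values; substitute the values from your own cluster.
settings = slower_settings(min_repair_time=3600, throttle_percentage=65)
print(render_fragment(settings))
```

Capping the throttle at 100 mirrors the upper bound stated in the list above.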
Faster repairs
Increasing the speed of the Repair Service is necessary when a cluster is large or dense, because the Repair Service must run more quickly to synchronize such a cluster. Increasing the Repair Service speed increases the load on the cluster because of the additional synchronization activity.
Consider the following changes to increase the speed of the Repair Service:
- Enabling distributed subrange repairs can increase throughput when repairing large clusters (100+ nodes).
- Decreasing the min_repair_time property (can be as low as 0) shortens the amount of time between Repair Service tasks.
- Decreasing the time_to_completion_target_percentage throttle property (can be as low as 0) increases the speed of Repair Service tasks.