Performing Cluster Operations

Cluster operations include initiating administrative actions, such as garbage collection, on nodes in a Cassandra or DSE cluster; rebalancing a cluster; and managing API requests sent to the cluster.

Node Administration Methods
Perform bulk operations.	`POST /{cluster_id}/ops`
Initiate JVM garbage collection on a node.	`GET /{cluster_id}/ops/gc/{node_ip}`
Assign a new token to the node.	`PUT /{cluster_id}/ops/move/{node_ip}`
Drain a node.	`GET /{cluster_id}/ops/drain/{node_ip}`
Decommission a node.	`POST /{cluster_id}/ops/decommission/{node_ip}`
Clean up a keyspace.	`POST /{cluster_id}/ops/cleanup/{node_ip}/{ks_name}`
Flush memtables from a keyspace.	`POST /{cluster_id}/ops/flush/{node_ip}/{ks_name}`
Repair a keyspace.	`POST /{cluster_id}/ops/repair/{node_ip}/{ks_name}`
Compact a keyspace.	`POST /{cluster_id}/ops/compact/{node_ip}/{ks_name}`
Process Management Methods
Start Cassandra/DSE on a node.	`POST /{cluster_id}/ops/start/{node_ip}`
Stop Cassandra/DSE on a node.	`POST /{cluster_id}/ops/stop/{node_ip}`
Restart Cassandra/DSE on a node.	`POST /{cluster_id}/ops/restart/{node_ip}`
Perform a rolling restart of the cluster.	`POST /{cluster_id}/ops/restart`
Cluster Rebalancing Methods
List moves to balance a cluster.	`GET /{cluster_id}/ops/rebalance`
Run a list of moves to balance a cluster.	`POST /{cluster_id}/ops/rebalance`
Cluster Services
Get the status of cluster services.	`GET /{cluster_id}/services`
Cluster Repair Service
Turn on the cluster repair service.	`POST /{cluster_id}/services/repair`
Turn off the cluster repair service.	`DELETE /{cluster_id}/services/repair`
Get the status of the repair service.	`GET /{cluster_id}/services/repair`
Get a summary of the repair service progress.	`GET /{cluster_id}/repair-status`
Get details of the repair service progress.	`GET /{cluster_id}/repair-details`
NodeSync Service
Turn on/off NodeSync for tables.	`POST /{cluster_id}/nodesync`
Get the NodeSync status.	`GET /{cluster_id}/nodesync`
Get the summary of NodeSync progress.	`GET /{cluster_id}/nodesync/summary`
Get per-keyspace summary of NodeSync progress.	`GET /{cluster_id}/nodesync/summary/keyspace`
Get per-table summary of NodeSync progress.	`GET /{cluster_id}/nodesync/summary/table`
Get per-table summary of NodeSync progress in a keyspace.	`GET /{cluster_id}/nodesync/summary/table/{keyspace}`
Request Management Methods
Get the status of a long-running request.	`GET /request/{request_id}/status`
Cancel a request.	`POST /request/{request_id}/cancel`
List requests of a specific type.	`GET /{cluster_id}/request/{request_type}`

Node Administration Methods

POST /{cluster_id}/ops

Initiate a bulk set of operations on one or more nodes.

Body:

A JSON dictionary with the following keys:

ips: A list of IP addresses of the nodes on which the operations will run.
action: The operation to perform on each node. Valid values are cleanup, compact, flush, perform_gc, repair, restart, start, and stop.
is_rolling: Whether to run the jobs in a rolling or parallel fashion.
sleep: Seconds to wait between each grouping of jobs. Default is 60.
args: A list of arguments to pass to each operation.

Returns a Request ID.

Example:

curl -X POST http://127.0.0.1:8888/Test_Cluster/ops \
    -d '{"ips":["127.0.0.1"],"action":"cleanup", "is_rolling": true, "sleep": 1, "args":["OpsCenter", "events"]}'
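The same call can be sketched in Python using only the standard library. The sketch builds the request without sending it, so the body can be inspected first. `BASE_URL` mirrors the curl examples, and the helper name `bulk_ops_request` is hypothetical. The action check simply restates the valid values listed above; the server remains the authority on what it accepts.

```python
import json
import urllib.request

# Base URL used in the curl examples; adjust for your OpsCenter host.
BASE_URL = "http://127.0.0.1:8888"

# Valid values for "action", as listed in the documentation above.
VALID_ACTIONS = {"cleanup", "compact", "flush", "perform_gc",
                 "repair", "restart", "start", "stop"}


def bulk_ops_request(cluster_id, ips, action, is_rolling, sleep=60, args=()):
    """Build, but do not send, a POST /{cluster_id}/ops request (hypothetical helper)."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    body = {
        "ips": list(ips),
        "action": action,
        "is_rolling": is_rolling,
        "sleep": sleep,      # seconds between each grouping of jobs
        "args": list(args),  # passed to each operation
    }
    return urllib.request.Request(
        f"{BASE_URL}/{cluster_id}/ops",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
    )


# Same request as the curl example above.
req = bulk_ops_request("Test_Cluster", ["127.0.0.1"], "cleanup",
                       is_rolling=True, sleep=1, args=["OpsCenter", "events"])
```

Passing `req` to `urllib.request.urlopen` would send it. Per the description above, the response is a Request ID.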

GET /{cluster_id}/ops/gc/{node_ip}

Initiate JVM garbage collection on a node.

Path arguments:	cluster_id – The ID of a cluster returned from `GET /cluster-configs`. node_ip – IP address of the target node.

Returns null.

Example:

curl -X GET \
  http://127.0.0.1:8888/Test_Cluster/ops/gc/1.2.3.4

PUT /{cluster_id}/ops/move/{node_ip}

Assign a new token to the node.

Path arguments:	cluster_id – The ID of a cluster returned from `GET /cluster-configs`. node_ip – Node to be assigned a new token.
Body:	The new token to assign to the node, as a JSON string.

Returns a Request ID.

Example:

curl -X PUT \
  http://127.0.0.1:8888/Test_Cluster/ops/move/10.11.12.72 \
  -d '"85070591730234615865843651857942052864"'

Output:

"72ff69b2-9cf5-4777-a600-9173b3fe7e6a"

GET /{cluster_id}/ops/drain/{node_ip}

Initiate a drain operation to flush all memtables from the node.

Path arguments:	cluster_id – The ID of a cluster returned from `GET /cluster-configs`. node_ip – Node to be flushed of memtables.

Returns null.

Example:

curl -X GET \
  http://127.0.0.1:8888/Test_Cluster/ops/drain/1.2.3.4

POST /{cluster_id}/ops/decommission/{node_ip}

Initiate decommissioning of a node.

Path arguments:	cluster_id – The ID of a cluster returned from `GET /cluster-configs`. node_ip – Node to be decommissioned.

Returns null.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/decommission/1.2.3.4

POST /{cluster_id}/ops/cleanup/{node_ip}/{ks_name}

Initiate a cleanup operation for the specified keyspace.

Path arguments:	cluster_id – The ID of a cluster returned from `GET /cluster-configs`. node_ip – Node that initiates cleaning of the keyspace. ks_name – Name of the keyspace to be cleaned. If empty, all keyspaces will be cleaned up.
Body:	List of tables to clean up. If empty, all tables will be cleaned up.

Returns null.

Example

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/cleanup/1.2.3.4/Keyspace1 \
  -d '["ColFam1", "ColFam2"]'
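The cleanup, flush, and compact endpoints share one shape: the keyspace goes in the path, and the body is a JSON list of table names. Below is a minimal Python sketch of a request builder under that reading. `keyspace_op_request` is a hypothetical name, and path segments are percent-encoded as a precaution.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8888"

# Endpoints of the form POST /{cluster_id}/ops/{action}/{node_ip}/{ks_name}
# whose body is a plain JSON list of table names.
TABLE_LIST_ACTIONS = {"cleanup", "flush", "compact"}


def keyspace_op_request(cluster_id, action, node_ip, ks_name, tables=()):
    """Build, but do not send, a cleanup/flush/compact request (hypothetical helper).

    An empty `tables` list applies the operation to every table in the keyspace.
    """
    if action not in TABLE_LIST_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    path = "/".join(urllib.parse.quote(part, safe="")
                    for part in (cluster_id, "ops", action, node_ip, ks_name))
    return urllib.request.Request(
        f"{BASE_URL}/{path}",
        data=json.dumps(list(tables)).encode("utf-8"),
        method="POST",
    )


# Equivalent to the cleanup curl example above.
req = keyspace_op_request("Test_Cluster", "cleanup", "1.2.3.4", "Keyspace1",
                          ["ColFam1", "ColFam2"])
```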

POST /{cluster_id}/ops/flush/{node_ip}/{ks_name}

Flush memtables for a keyspace.

Path arguments:	cluster_id – The ID of a cluster returned from `GET /cluster-configs`. node_ip – Node to be flushed of memtables for a keyspace. ks_name – Keyspace of the memtables to be flushed. If empty, all keyspaces will be flushed.
Body:	List of tables to flush. If empty, all tables will be flushed.

Returns null.

Example

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/flush/1.2.3.4/Keyspace1 \
  -d '["ColFam1", "ColFam2"]'

POST /{cluster_id}/ops/repair/{node_ip}/{ks_name}

Initiate a repair of a keyspace.

Path arguments:

cluster_id – The ID of a cluster returned from GET /cluster-configs.
node_ip – Node that initiates repair.
ks_name – Keyspace to be repaired.

Body:

A JSON dictionary with the following keys:

is_sequential: A boolean indicating whether to run the repair sequentially. Default is true.
is_local: A boolean indicating whether to use only nodes in the same datacenter during the repair. Default is false.
primary_range: A boolean indicating whether to repair only the node's primary range; if false, all ranges are repaired. Default is false.
cfs: List of tables (column families) to repair. If empty, all tables will be repaired.

Returns null.

Example

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/repair/1.2.3.4/Keyspace1 \
  -d '{"is_sequential": false, "cfs":["ColFam1", "ColFam2"]}'
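The example body omits is_local and primary_range, so the server applies the documented defaults. This Python sketch spells the defaults out explicitly, which makes the request self-describing in logs. `repair_body` is a hypothetical helper. Rejecting unknown keys is a client-side choice, not documented server behavior.

```python
import json

# Defaults documented for POST /{cluster_id}/ops/repair/{node_ip}/{ks_name}.
REPAIR_DEFAULTS = {
    "is_sequential": True,
    "is_local": False,
    "primary_range": False,
    "cfs": [],  # empty means all tables in the keyspace
}


def repair_body(**overrides):
    """Return the repair request body with documented defaults filled in."""
    unknown = set(overrides) - set(REPAIR_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown repair options: {sorted(unknown)}")
    body = {**REPAIR_DEFAULTS, **overrides}
    body["cfs"] = list(body["cfs"])  # copy so the default list is never shared
    return json.dumps(body)


# Same options as the curl example above, with defaults made explicit.
print(repair_body(is_sequential=False, cfs=["ColFam1", "ColFam2"]))
```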

POST /{cluster_id}/ops/compact/{node_ip}/{ks_name}

Initiate a major compaction on a keyspace.

Path arguments:	cluster_id – The ID of a cluster returned from `GET /cluster-configs`. node_ip – Node that initiates the compaction. ks_name – Keyspace to be compacted. If empty, all keyspaces will be compacted.
Body:	List of tables to compact. If empty, all tables will be compacted.

Returns null.

Example

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/compact/1.2.3.4/Keyspace1 \
  -d '["ColFam1", "ColFam2"]'

Process Management Methods

POST /{cluster_id}/ops/start/{node_ip}

Start the Cassandra/DSE process on a single node.

Path arguments:	cluster_id – The ID of a cluster returned from `GET /cluster-configs`. node_ip – Node to be started.

Returns a Request ID.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/start/10.11.12.72

Output:

"a34814a6-4896-11e2-a563-e0b9a54a6d93"

POST /{cluster_id}/ops/stop/{node_ip}

Stop the Cassandra/DSE process on a single node.

Path arguments:

cluster_id – The ID of a cluster returned from GET /cluster-configs.
node_ip – Node to be stopped.

Body:

A JSON dictionary with an optional key:

drain_first: A boolean indicating whether to drain the node before stopping it.

Returns a Request ID.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/stop/10.11.12.72 \
  -d '{"drain_first": true}'

Output:

"c0d81d54-4896-11e2-a563-e0b9a54a6d93"

POST /{cluster_id}/ops/restart/{node_ip}

Restart the Cassandra/DSE process on a single node.

Path arguments:

cluster_id – The ID of a cluster returned from GET /cluster-configs.
node_ip – Node to be restarted.

Body:

A JSON dictionary with two optional keys:

wait_for_cassandra: A boolean indicating whether the asynchronous request should wait until Cassandra/DSE has fully started before it completes.
drain_first: A boolean indicating whether to drain the node before stopping it.

Returns a Request ID.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/restart/10.11.12.72 \
  -d '{"wait_for_cassandra": true, "drain_first": true}'

Output:

"e2212500-4896-11e2-a563-e0b9a54a6d93"

POST /{cluster_id}/ops/restart

Perform a rolling restart of the entire cluster or a select list of nodes.

Path arguments:

cluster_id – The ID of a cluster returned from GET /cluster-configs.

Body:

A JSON dictionary with three optional keys:

sleep: Amount of time in seconds to sleep between restarting each node. Default is 60.
ips: A list of node IP addresses to restart. If left empty, all nodes are restarted (the default).
drain_first: A boolean indicating whether to drain each node before stopping it.
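This Python sketch builds the rolling-restart body and sends only the keys you set, so the server applies its own defaults (all nodes, 60 seconds). It also gives a rough lower bound on how long the sleeps alone take. That bound assumes one pause between each consecutive pair of restarts and none after the last node, and the restarts themselves add more time. Both helpers are hypothetical.

```python
import json


def rolling_restart_body(ips=None, sleep=None, drain_first=None):
    """Body for POST /{cluster_id}/ops/restart (hypothetical helper).

    Keys left unset are omitted so the server defaults apply:
    every node is restarted, with a 60-second sleep between nodes.
    """
    body = {}
    if ips:
        body["ips"] = list(ips)
    if sleep is not None:
        body["sleep"] = sleep
    if drain_first is not None:
        body["drain_first"] = drain_first
    return json.dumps(body)


def min_sleep_seconds(node_count, sleep=60):
    """Lower bound on time spent sleeping during a rolling restart.

    Assumes one pause between each consecutive pair of restarts and
    none after the last node; restart time itself is not included.
    """
    return max(node_count - 1, 0) * sleep


print(min_sleep_seconds(6))  # 300: five 60-second pauses between six restarts
```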

Returns a Request ID.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/restart

Output:

"e2212500-4896-11e2-a563-e0b9a54a6d93"

Cluster Rebalancing Methods

GET /{cluster_id}/ops/rebalance

Return a list of proposed moves to run to balance a cluster. This call fails with an error on clusters that use vnodes.

Path arguments:	cluster_id – The ID of a cluster returned from `GET /cluster-configs`.

Returns a list of moves, where each move is a pair consisting of a token and the IP address of the node to be assigned that token. Pass this result as the body of `POST /{cluster_id}/ops/rebalance` to run the moves.

Example

curl http://127.0.0.1:8888/Test_Cluster/ops/rebalance

Output:

[
  [
    "85070591730234615865843651857942052864",
    "10.11.12.152"
  ]
]
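The token in this output equals 2**126, the midpoint of the RandomPartitioner token range [0, 2**127). Assuming a cluster that uses RandomPartitioner with one token per node (no vnodes), evenly spaced tokens can be computed as in this Python sketch. `balanced_tokens`, `moves_for`, and the IP 10.11.12.151 are hypothetical. Note that the sketch lists a target for every node, whereas OpsCenter's list may include only the nodes that need to move.

```python
# Assumption: the cluster uses RandomPartitioner, whose tokens lie in [0, 2**127).
# (Murmur3Partitioner uses the signed 64-bit range instead.)
RANDOM_PARTITIONER_RANGE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced single tokens for node_count nodes (hypothetical helper)."""
    return [i * RANDOM_PARTITIONER_RANGE // node_count for i in range(node_count)]


def moves_for(ips):
    """Pair each IP with its balanced token, in the [token, ip] format shown above."""
    return [[str(token), ip] for token, ip in zip(balanced_tokens(len(ips)), ips)]


print(moves_for(["10.11.12.151", "10.11.12.152"]))
# [['0', '10.11.12.151'], ['85070591730234615865843651857942052864', '10.11.12.152']]
```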

POST /{cluster_id}/ops/rebalance

Run the specified list of moves to balance a cluster. This call fails with an error on clusters that use vnodes.

Path arguments:	cluster_id – The ID of a cluster returned from `GET /cluster-configs`.
Opt. params:	sleep – An optional number of seconds to wait between each move.
Body:	A list of moves to run to balance this cluster. This is typically the result of `GET /{cluster_id}/ops/rebalance`.
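The two rebalance calls are meant to be chained: the GET output becomes the POST body. This Python sketch assumes that the optional sleep parameter is passed in the query string, as the optional parameters of repair-details are. `rebalance_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8888"


def rebalance_request(cluster_id, moves, sleep=None):
    """Build, but do not send, POST /{cluster_id}/ops/rebalance (hypothetical helper).

    `moves` is the list returned by GET /{cluster_id}/ops/rebalance:
    [[token, node_ip], ...].
    """
    for move in moves:
        if len(move) != 2:
            raise ValueError(f"expected [token, ip], got {move!r}")
    url = f"{BASE_URL}/{cluster_id}/ops/rebalance"
    if sleep is not None:
        # Assumption: optional parameters travel in the query string.
        url += "?" + urllib.parse.urlencode({"sleep": sleep})
    return urllib.request.Request(url, data=json.dumps(moves).encode("utf-8"),
                                  method="POST")


# Proposed moves as returned by the GET example.
proposed = json.loads('[["85070591730234615865843651857942052864", "10.11.12.152"]]')
req = rebalance_request("Test_Cluster", proposed, sleep=30)
```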

Returns a Request ID for determining the status of, or cancelling, a running rebalance.

Example

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/rebalance \
  -d \
  '[
     [
       "85070591730234615865843651857942052864",
       "10.11.12.152"
     ]
   ]'

Output:

"e330b179-1b9f-40c2-a2f5-d2f3d24aa85c"

Cluster Services

GET /{cluster_id}/services

Get the status of cluster services.

Returns a dictionary with service names as keys and the status, parameters, and associated activity or progress of the service as the values.

Example

curl "http://localhost:8888/Test_Cluster/services"

{
    "repair": {
        "progress": {
            "completed": 26,
            "total": 256
        },
        "status": {
            "parameters": {
                "time_to_completion": 100000
            },
            "status": true
        }
    }
}
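This Python sketch reads the example response above. `repair_progress_percent` is a hypothetical helper. It returns None when no progress object is reported, since the example alone does not show whether that object is always present (for instance, when the service is off).

```python
import json

# The example response above.
response_text = """
{"repair": {"progress": {"completed": 26, "total": 256},
            "status": {"parameters": {"time_to_completion": 100000},
                       "status": true}}}
"""


def repair_progress_percent(services):
    """Percent of the repair cycle completed, or None if no progress is reported."""
    progress = services.get("repair", {}).get("progress")
    if not progress or not progress.get("total"):
        return None
    return 100.0 * progress["completed"] / progress["total"]


services = json.loads(response_text)
enabled = services["repair"]["status"]["status"]
print(f"repair service enabled: {enabled}")
print(f"{repair_progress_percent(services):.1f}% complete")  # 10.2% (26 of 256)
```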

Cluster Repair Service

POST /{cluster_id}/services/repair

Start the cluster repair service with the given parameters.

Body:

A dictionary of repair service parameters.

time_to_completion: The time in seconds to complete a repair cycle of the entire cluster. For example, 864000 (10 days).
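Because time_to_completion is expressed in seconds, a small conversion helper avoids arithmetic slips: 10 days is 10 * 86400 = 864000 seconds, matching the example above. This is a Python sketch, and `repair_service_body` is a hypothetical name.

```python
import json

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def repair_service_body(days):
    """Body for POST /{cluster_id}/services/repair (hypothetical helper)."""
    return json.dumps({"time_to_completion": int(days * SECONDS_PER_DAY)})


print(repair_service_body(10))  # {"time_to_completion": 864000}
```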

DELETE /{cluster_id}/services/repair

Stop the cluster repair service.

GET /{cluster_id}/services/repair

Get the status of the repair service.

Returns a dictionary describing the status and parameters of the service.

Example

curl "http://127.0.0.1:8888/Test_Cluster/services/repair"

{
    "status": true,
    "parameters": {"time_to_completion": 100000}
}

GET /{cluster_id}/repair-status

Get a status summary of the repair service progress.

Returns a progress summary for the current repair cycle, including statistics on pending, in-progress, failed, and completed repairs, and overall totals.

Example

curl "http://127.0.0.1:8888/Test_Cluster/repair-status"

{
    "config": {
        "cluster_stabilization_period": "30",
        "error_logging_window": "86400",
        "ignore_keyspaces": ""
    },
    "status": "active",
    "time_to_completion": 777600,
    "overview": {
        "completed": 36,
        "failed": 0,
        "in_progress": 1,
        "remaining": 19,
        "repair_times": {
            "50": 1,
            "75": 1,
            "90": 1,
            "99": 5,
            "average": 1.3611111111111112,
            "max": 7,
            "min": 1
        },
        "total": 56
    },
    "incremental": {
        "completed": 8,
        "completed_bytes": 40000,
        "estimated_time": 0,
        "job_state": "success",
        "last_repair_ts": 0,
        "remaining": 0,
        "remaining_bytes": 0,
        "throughput": 1.0,
        "throughput_bytes": 5000,
        "total": 8,
        "total_bytes": 40000,
        "ttc_remaining": 777329
    },
    "subrange": {
        "completed": 28,
        "completed_bytes": 445648829,
        "estimated_time": 190,
        "job_state": "running",
        "last_repair_ts": 0,
        "remaining": 19,
        "remaining_bytes": 164736194,
        "throughput": 0.6829268292682927,
        "throughput_bytes": 11141095,
        "total": 48,
        "total_bytes": 610390023,
        "ttc_remaining": 777329
    },
    "details": {
        "OpsCenter.backup_reports": {
            "attempts": 0,
            "average_time": 0,
            "state": {
                "aborted": 0,
                "failure": 0,
                "pending": 4,
                "running": 0,
                "success": 0
            },
            "time": 0,
            "type": "incremental"
        }
    }
}
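In this example the overview counters add up to the total (36 completed + 0 failed + 1 in progress + 19 remaining = 56), and completed is the sum of the incremental and subrange completed counts (8 + 28). This Python sketch summarizes the overview and treats that additive relationship as an assumption to check rather than a guarantee. `summarize_overview` is hypothetical.

```python
# The "overview" object from the example response above.
status = {
    "overview": {
        "completed": 36, "failed": 0, "in_progress": 1, "remaining": 19,
        "repair_times": {"50": 1, "75": 1, "90": 1, "99": 5,
                         "average": 1.3611111111111112, "max": 7, "min": 1},
        "total": 56,
    }
}


def summarize_overview(status):
    """Summarize the repair-status 'overview' object (hypothetical helper)."""
    ov = status["overview"]
    accounted = ov["completed"] + ov["failed"] + ov["in_progress"] + ov["remaining"]
    return {
        "percent_complete": 100.0 * ov["completed"] / ov["total"] if ov["total"] else 0.0,
        # Assumption: the four counters partition the total, as they do in the example.
        "counts_consistent": accounted == ov["total"],
        "p99_repair_time": ov["repair_times"]["99"],
    }


summary = summarize_overview(status)
print(f"{summary['percent_complete']:.1f}% complete")  # 64.3% complete
```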

GET /{cluster_id}/repair-details

Get a detailed list of the current cycle’s repairs.

Opt. params:	keyspace – Limits results to only the specified keyspace. Optional. table – Limits results to only the specified table. Optional.

Returns a detailed list of every repair and its present status in the current repair cycle.

Example

curl "http://127.0.0.1:8888/Test_Cluster/repair-details"

[
    {
        "attempts": 0,
        "executing": false,
        "ksname": "blackhat",
        "last_error": "",
        "node": "127.0.0.4",
        "repair_range": [
            "0",
            "4611686018427387904"
        ],
        "size": 281480,
        "start_ts": 1492099955.84,
        "tables": [
            "cc"
        ],
        "time": 1,
        "type": "subrange"
    },
    {
        "attempts": 0,
        "executing": false,
        "ksname": "OpsCenter",
        "last_error": "",
        "node": "127.0.0.2",
        "size": 5000,
        "start_ts": 1492099102.985,
        "table": "settings",
        "time": 1,
        "type": "incremental"
    }
]
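In the example, subrange entries list their tables under `tables` (an array), while the incremental entry uses `table` (a string). This Python sketch builds the filtered URL and normalizes both forms. The helper names are hypothetical.

```python
import urllib.parse

BASE_URL = "http://127.0.0.1:8888"


def repair_details_url(cluster_id, keyspace=None, table=None):
    """URL for GET /{cluster_id}/repair-details with optional filters (hypothetical)."""
    params = {name: value
              for name, value in (("keyspace", keyspace), ("table", table))
              if value}
    url = f"{BASE_URL}/{cluster_id}/repair-details"
    return f"{url}?{urllib.parse.urlencode(params)}" if params else url


def tables_of(entry):
    """Return an entry's tables as a list, whether given as 'tables' or 'table'."""
    if "tables" in entry:
        return list(entry["tables"])
    if "table" in entry:
        return [entry["table"]]
    return []


def count_by_type(entries):
    """Count repair entries per type ("subrange", "incremental", ...)."""
    counts = {}
    for entry in entries:
        counts[entry["type"]] = counts.get(entry["type"], 0) + 1
    return counts


print(repair_details_url("Test_Cluster", keyspace="myks", table="mytable"))
```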

NodeSync Service

Terminology:

deadline: Target for the maximum time between two validations of the same data. As long as the deadline is met, all parts of the ring (for the table) are validated at least that often. The deadline can be set with the table's deadline_target_sec property or inferred from its gc_grace_seconds property. The deadline should always be less than or equal to the grace period; as long as it is met, no data is resurrected due to tombstone purging. NodeSync prioritizes segments to try to meet the deadline: the next segment to validate at any given time is the one closest to missing its deadline.
segment: A small local token range of a table. NodeSync recursively splits local ranges in half a certain number of times (the depth) to create segments. The depth is calculated from the total table size, assuming data is evenly distributed. Typically, segments cover no more than 200 MB. Because a token range can be no smaller than a single partition, very large partitions can result in segments larger than the configured size.
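The two ideas above can be modeled in a simplified Python sketch. This is not NodeSync's actual code. The depth is taken as the smallest number of halvings that brings a local range under a 200 MB target, and the next segment is the one whose deadline (last validation time plus deadline) falls earliest. The function names, the tuple layout, and the exact depth rule are assumptions.

```python
MB = 1024 * 1024
TARGET_SEGMENT_BYTES = 200 * MB  # "typically segments cover no more than 200 MB"


def segment_depth(local_range_bytes, target=TARGET_SEGMENT_BYTES):
    """Smallest depth d with local_range_bytes / 2**d <= target.

    Simplified model assuming evenly distributed data; not NodeSync's code.
    """
    depth = 0
    while local_range_bytes > target * 2 ** depth:
        depth += 1
    return depth


def next_segment(segments):
    """Pick the segment closest to missing its deadline.

    segments: iterable of (name, last_validated_ts, deadline_sec) tuples;
    the segment whose deadline falls earliest is chosen.
    """
    return min(segments, key=lambda s: s[1] + s[2])[0]


print(segment_depth(1000 * MB))  # 3: 1000 MB / 2**3 = 125 MB per segment
```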

POST /{cluster_id}/nodesync

Enable or disable NodeSync for the specified tables. The number of tables for which NodeSync can be enabled or disabled in a single request is limited; see the documentation for the nodesync.max_request_tables configuration parameter.

Body:

A dictionary of parameters.

enable: An array of table names for which NodeSync should be enabled. A table name can be given in full as keyspace.table, or as a wildcard: keyspace.* (all tables in the given keyspace) or * (all tables).
disable: An array of table names for which NodeSync should be disabled. If a table is listed in both arrays, enable takes precedence.

Example body:

{
  "enable": ["test.t1", "test.t2"],
  "disable":["test.t3"]
}

Returns true if the request was accepted, or a JSON object describing the error, for example:

{
  "brief": "error",
  "message": "Please select fewer tables and try again. To maximize performance, OpsCenter has been configured to limit the number of concurrent updates to 50 tables. The total number of tables to process in this request, after expanding wild card selectors, was 102.",
  "type": "InvalidArguments"
}
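This client-side Python sketch shows how a request body could be resolved before sending. Wildcards are expanded against a known table list, tables named in both arrays are kept only in enable, and the total is checked against a limit. Several points are assumptions rather than documented behavior: the limit of 50 (taken from the example error message), ignoring unknown names, wildcard support in disable, and counting enable plus disable toward the limit. The server's own checks are authoritative.

```python
def expand(selectors, all_tables):
    """Expand 'ks.table', 'ks.*' and '*' selectors against known 'ks.table' names."""
    result = set()
    for sel in selectors:
        if sel == "*":
            result.update(all_tables)
        elif sel.endswith(".*"):
            keyspace = sel[:-2]
            result.update(t for t in all_tables if t.split(".", 1)[0] == keyspace)
        elif sel in all_tables:
            result.add(sel)  # unknown names are ignored in this sketch
    return result


def resolve_nodesync_request(body, all_tables, max_tables=50):
    """Return (enable, disable) sets; tables listed in both stay enabled."""
    enable = expand(body.get("enable", []), all_tables)
    disable = expand(body.get("disable", []), all_tables) - enable
    total = len(enable) + len(disable)
    if total > max_tables:
        raise ValueError(f"{total} tables requested; limit is {max_tables}")
    return enable, disable


tables = ["test.t1", "test.t2", "test.t3"]
enable, disable = resolve_nodesync_request(
    {"enable": ["test.t1", "test.t2"], "disable": ["test.t3"]}, tables)
```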

GET /{cluster_id}/nodesync

Retrieve the NodeSync status.