Performing Cluster Operations

Cluster operations include initiating administrative actions (such as garbage collection) on nodes in a Cassandra or DSE cluster, rebalancing a cluster, and managing API requests sent to a cluster.

Node Administration Methods
Perform bulk operations. POST /{cluster_id}/ops
Initiate JVM garbage collection on a node. GET /{cluster_id}/ops/gc/{node_ip}
Assign a new token to the node. PUT /{cluster_id}/ops/move/{node_ip}
Drain a node. GET /{cluster_id}/ops/drain/{node_ip}
Decommission a node. POST /{cluster_id}/ops/decommission/{node_ip}
Clean up a keyspace. POST /{cluster_id}/ops/cleanup/{node_ip}/{ks_name}
Flush memtables from a keyspace. POST /{cluster_id}/ops/flush/{node_ip}/{ks_name}
Repair a keyspace. POST /{cluster_id}/ops/repair/{node_ip}/{ks_name}
Compact a keyspace. POST /{cluster_id}/ops/compact/{node_ip}/{ks_name}
Process Management Methods
Start Cassandra/DSE on a node. POST /{cluster_id}/ops/start/{node_ip}
Stop Cassandra/DSE on a node. POST /{cluster_id}/ops/stop/{node_ip}
Restart Cassandra/DSE on a node. POST /{cluster_id}/ops/restart/{node_ip}
Perform a rolling restart of the cluster. POST /{cluster_id}/ops/restart
Cluster Rebalancing Methods
List moves to balance a cluster. GET /{cluster_id}/ops/rebalance
Run a list of moves to balance a cluster. POST /{cluster_id}/ops/rebalance
Cluster Services
Get the status of cluster services. GET /{cluster_id}/services
Cluster Repair Service
Turn on the cluster repair service. POST /{cluster_id}/services/repair
Turn off the cluster repair service. DELETE /{cluster_id}/services/repair
Get the status of the repair service. GET /{cluster_id}/services/repair
Get a summary of the repair service progress. GET /{cluster_id}/repair-status
Get details of the repair service progress. GET /{cluster_id}/repair-details
Request Management Methods
Get the status of a long-running request. GET /request/{request_id}/status
Cancel a request. POST /request/{request_id}/cancel
List requests of a specific type. GET /{cluster_id}/request/{request_type}

Node Administration Methods

POST /{cluster_id}/ops

Initiate a bulk set of operations on one or more nodes.

Body:

A JSON dictionary with the following keys:

  • ips: List of IP addresses of the nodes on which the operation will run.
  • action: The operation to perform on each node. Valid values are
    cleanup, compact, flush, perform_gc, repair, restart, start, and stop.
  • is_rolling: Whether the jobs run in a rolling fashion or in parallel.
  • sleep: Seconds to wait between each grouping of jobs. Default is 60.
  • args: List of arguments to pass to each operation.

Returns a Request ID.
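
The body described above can also be assembled programmatically. The following is a minimal Python sketch, not part of the OpsCenter API itself: the helper name build_bulk_ops_request and the base_url parameter are illustrative assumptions, and the request is only built, not sent.

```python
import json
from urllib.request import Request

# Action names as listed in the key descriptions above.
VALID_ACTIONS = {"cleanup", "compact", "flush", "perform_gc",
                 "repair", "restart", "start", "stop"}


def build_bulk_ops_request(base_url, cluster_id, ips, action,
                           is_rolling, sleep=60, args=None):
    """Build (but do not send) a POST /{cluster_id}/ops request.

    Hypothetical helper; base_url is wherever opscenterd is listening.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    body = {
        "ips": list(ips),
        "action": action,
        "is_rolling": is_rolling,
        "sleep": sleep,  # documented default is 60 seconds
        "args": list(args or []),
    }
    return Request(
        url=f"{base_url}/{cluster_id}/ops",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
    )


req = build_bulk_ops_request(
    "http://127.0.0.1:8888", "Test_Cluster",
    ips=["127.0.0.1"], action="cleanup", is_rolling=True,
    sleep=1, args=["OpsCenter", "events"],
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it; the response body should then contain the Request ID.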

Example:

curl -X POST http://127.0.0.1:8888/Test_Cluster/ops \
    -d '{"ips":["127.0.0.1"],"action":"cleanup", "is_rolling": true, "sleep": 1, "args":["OpsCenter", "events"]}'

GET /{cluster_id}/ops/gc/{node_ip}

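Initiate JVM garbage collection on a node.

Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
  • node_ip – IP address of the node on which to run garbage collection.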

Returns null.

Example:

curl -X GET \
  http://127.0.0.1:8888/Test_Cluster/ops/gc/1.2.3.4

PUT /{cluster_id}/ops/move/{node_ip}

Assign a new token to the node.

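Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
  • node_ip – IP address of the node to assign the new token to.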
Body:

New token to assign to node.

Returns a Request ID.

Example:

curl -X PUT \
  http://127.0.0.1:8888/Test_Cluster/ops/move/10.11.12.72 \
  -d '"85070591730234615865843651857942052864"'

Output:

"72ff69b2-9cf5-4777-a600-9173b3fe7e6a"

GET /{cluster_id}/ops/drain/{node_ip}

Initiate a drain operation to flush all memtables from the node.

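Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
  • node_ip – IP address of the node to drain.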

Returns null.

Example:

curl -X GET \
  http://127.0.0.1:8888/Test_Cluster/ops/drain/1.2.3.4

POST /{cluster_id}/ops/decommission/{node_ip}

Initiate decommissioning of a node.

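Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
  • node_ip – IP address of the node to decommission.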

Returns null.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/decommission/1.2.3.4

POST /{cluster_id}/ops/cleanup/{node_ip}/{ks_name}

Initiate a cleanup operation for the specified keyspace.

Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
  • node_ip – Node that initiates cleaning of the keyspace.
  • ks_name – Name of the keyspace to be cleaned. If empty, all keyspaces will be cleaned up.
Body:

List of tables to cleanup. If empty, all tables will be cleaned up.

Returns null.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/cleanup/1.2.3.4/Keyspace1 \
  -d '["ColFam1", "ColFam2"]'

POST /{cluster_id}/ops/flush/{node_ip}/{ks_name}

Flush memtables for a keyspace.

Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
  • node_ip – Node to be flushed of memtables for a keyspace.
  • ks_name – Keyspace of the memtables to be flushed. If empty, all keyspaces will be flushed.
Body:

List of tables to flush. If empty, all tables will be flushed.

Returns null.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/flush/1.2.3.4/Keyspace1 \
  -d '["ColFam1", "ColFam2"]'

POST /{cluster_id}/ops/repair/{node_ip}/{ks_name}

Initiates repair of a keyspace.

Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
  • node_ip – Node that initiates repair.
  • ks_name – Keyspace to be repaired.
Body:

A JSON dictionary with the following keys:

  • is_sequential: A boolean indicating whether to run the repair sequentially.
    Default is true.
  • is_local: A boolean indicating whether to use only nodes in the same
    datacenter during the repair. Default is false.
  • primary_range: A boolean indicating whether to repair only the primary
    range for that node instead of all ranges. Default is false.
  • cfs: List of tables (column families) to repair. If this is empty, all tables
    will be repaired.

Returns null.
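
To make the documented defaults concrete, here is a small Python sketch that merges caller overrides into those defaults and serializes the result. The helper name repair_body is hypothetical, and sending every key explicitly (rather than omitting defaulted keys) is an assumption about what the server accepts.

```python
import json

# Defaults as stated in the key descriptions above.
REPAIR_DEFAULTS = {
    "is_sequential": True,
    "is_local": False,
    "primary_range": False,
    "cfs": [],  # empty list: repair every table in the keyspace
}


def repair_body(**overrides):
    """Return JSON text for a repair request body (hypothetical helper)."""
    unknown = set(overrides) - set(REPAIR_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown repair options: {sorted(unknown)}")
    # Later keys win, so overrides replace the documented defaults.
    return json.dumps({**REPAIR_DEFAULTS, **overrides}, sort_keys=True)


print(repair_body(is_sequential=False, cfs=["ColFam1", "ColFam2"]))
```

Rejecting unknown keys up front catches typos such as a misspelled option name before the request is sent.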

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/repair/1.2.3.4/Keyspace1 \
  -d '{"is_sequential": false, "cfs":["ColFam1", "ColFam2"]}'

POST /{cluster_id}/ops/compact/{node_ip}/{ks_name}

Initiates a major compaction on a keyspace.

Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
  • node_ip – Node that initiates the compaction.
  • ks_name – Keyspace to be compacted. If empty, all keyspaces will be compacted.
Body:

List of tables to compact. If this is empty, all tables will be compacted.

Returns null.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/compact/1.2.3.4/Keyspace1 \
  -d '["ColFam1", "ColFam2"]'

Process Management Methods

POST /{cluster_id}/ops/start/{node_ip}

Start the Cassandra/DSE process on a single node.

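Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
  • node_ip – IP address of the node to start.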

Returns a Request ID.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/start/10.11.12.72

Output:

"a34814a6-4896-11e2-a563-e0b9a54a6d93"

POST /{cluster_id}/ops/stop/{node_ip}

Stop the Cassandra/DSE process on a single node.

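Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
  • node_ip – IP address of the node to stop.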
Body:

A JSON dictionary with an optional key:

  • drain_first: A boolean to first perform a drain operation before stopping a node.

Returns a Request ID.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/stop/10.11.12.72 \
  -d '{"drain_first": true}'

Output:

"c0d81d54-4896-11e2-a563-e0b9a54a6d93"

POST /{cluster_id}/ops/restart/{node_ip}

Restart the Cassandra/DSE process on a single node.

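Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
  • node_ip – IP address of the node to restart.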
Body:

A JSON dictionary with two optional keys:

  • wait_for_cassandra: A boolean indicating whether to wait until Cassandra/DSE is fully started before the asynchronous request is marked as complete.
  • drain_first: A boolean to first perform a drain operation before stopping a node.

Returns a Request ID.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/restart/10.11.12.72 \
  -d '{"wait_for_cassandra": true, "drain_first": true}'

Output:

"e2212500-4896-11e2-a563-e0b9a54a6d93"

POST /{cluster_id}/ops/restart

Perform a rolling restart of the entire cluster or a select list of nodes.

Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
Body:

A JSON dictionary with three optional keys:

  • sleep: Amount of time in seconds to sleep between restarting each node. Default is 60.
  • ips: A list of IP addresses of the nodes to restart. If left empty, all nodes will be restarted (this is the default behavior).
  • drain_first: A boolean to first perform a drain operation before stopping a node.

Returns a Request ID.

Example:

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/restart

Output:

"e2212500-4896-11e2-a563-e0b9a54a6d93"

Cluster Rebalancing Methods

GET /{cluster_id}/ops/rebalance

Return a list of proposed moves to run to balance a cluster. Returns an error if called on a cluster that uses vnodes.

Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.

Returns a list of moves, where each move is a token and the IP address of its assigned node. The result of this call is passed to POST /{cluster_id}/ops/rebalance.
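
For intuition about what a balanced token assignment looks like, the sketch below computes evenly spaced tokens on a ring. It assumes the cluster uses RandomPartitioner, whose token range is 0 to 2**127; it is an illustration only and not necessarily how OpsCenter derives the moves it proposes. The function name balanced_tokens is hypothetical.

```python
def balanced_tokens(node_count, ring_size=2**127):
    """Evenly spaced tokens for node_count nodes on a ring of ring_size.

    Assumes RandomPartitioner's 0..2**127 token range (an assumption,
    not something the API reports). Tokens are returned as strings,
    matching the format used in the move lists.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [str(i * ring_size // node_count) for i in range(node_count)]


print(balanced_tokens(2))
# → ['0', '85070591730234615865843651857942052864']
```

Under that assumption, the second token of a two-node ring is 2**126, the same value that appears in the move examples in this section.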

Example

curl http://127.0.0.1:8888/Test_Cluster/ops/rebalance

Output:

[
  [
    "85070591730234615865843651857942052864",
    "10.11.12.152"
  ]
]

POST /{cluster_id}/ops/rebalance

Run the specified list of moves to balance a cluster. Returns an error if called on a cluster that uses vnodes.

Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
Opt. params:
  • sleep – An optional number of seconds to wait between each move.
Body:

A list of moves to run to balance this cluster. This is typically the result of GET /{cluster_id}/ops/rebalance.

Returns a Request ID for determining the status of, or cancelling, a running rebalance.

Example

curl -X POST \
  http://127.0.0.1:8888/Test_Cluster/ops/rebalance \
  -d '[
     [
       "85070591730234615865843651857942052864",
       "10.11.12.152"
     ]
   ]'

Output:

"e330b179-1b9f-40c2-a2f5-d2f3d24aa85c"

Cluster Services

GET /{cluster_id}/services

Get the status of cluster services.

Returns a dictionary with service names as keys and the status, parameters, and associated activity or progress of the service as the values.

Example

curl "http://localhost:8888/Test_Cluster/services"
{
    "repair": {
        "progress": {
            "completed": 26,
            "total": 256
        },
        "status": {
            "parameters": {
                "time_to_completion": 100000
            },
            "status": true
        }
    }
}

Cluster Repair Service

POST /{cluster_id}/services/repair

Start the cluster repair service with the given parameters.

Body:

A dictionary of repair service parameters.

  • time_to_completion: The time in seconds to complete a repair cycle of the entire
    cluster. For example, 864000 (10 days).

DELETE /{cluster_id}/services/repair

Stop the cluster repair service.

GET /{cluster_id}/services/repair

Get the status of the repair service.

Returns a dictionary describing the status and parameters of the service.

Example

curl "http://127.0.0.1:8888/Test_Cluster/services/repair"
{
    "status": true,
    "parameters": {"time_to_completion": 100000}
}

GET /{cluster_id}/repair-status

Get a status summary of the repair service progress.

Returns a progress summary for the current repair cycle, including counts of pending, in-progress, failed, and completed repairs, along with the total.
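
As a sketch of how a client might condense this summary, the following Python function reads only the "overview" block of the response. The function name repair_progress and the derived field names are illustrative assumptions; the sample values are copied from the example response for this endpoint.

```python
def repair_progress(status):
    """Summarize the "overview" block of a repair-status response."""
    overview = status["overview"]
    total = overview["total"]
    # Guard against an empty cycle to avoid dividing by zero.
    percent = 100.0 * overview["completed"] / total if total else 0.0
    return {
        "percent_complete": round(percent, 1),
        "outstanding": overview["remaining"] + overview["in_progress"],
        "failed": overview["failed"],
    }


sample = {"overview": {"completed": 36, "failed": 0, "in_progress": 1,
                       "remaining": 19, "total": 56}}
print(repair_progress(sample))
# → {'percent_complete': 64.3, 'outstanding': 20, 'failed': 0}
```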

Example

curl "http://127.0.0.1:8888/Test_Cluster/repair-status"
{
    "config": {
        "cluster_stabilization_period": "30",
        "error_logging_window": "86400",
        "ignore_keyspaces": ""
    },
    "status": "active",
    "time_to_completion": 777600,
    "overview": {
        "completed": 36,
        "failed": 0,
        "in_progress": 1,
        "remaining": 19,
        "repair_times": {
            "50": 1,
            "75": 1,
            "90": 1,
            "99": 5,
            "average": 1.3611111111111112,
            "max": 7,
            "min": 1
        },
        "total": 56
    },
    "incremental": {
        "completed": 8,
        "completed_bytes": 40000,
        "estimated_time": 0,
        "job_state": "success",
        "last_repair_ts": 0,
        "remaining": 0,
        "remaining_bytes": 0,
        "throughput": 1.0,
        "throughput_bytes": 5000,
        "total": 8,
        "total_bytes": 40000,
        "ttc_remaining": 777329
    },
    "subrange": {
        "completed": 28,
        "completed_bytes": 445648829,
        "estimated_time": 190,
        "job_state": "running",
        "last_repair_ts": 0,
        "remaining": 19,
        "remaining_bytes": 164736194,
        "throughput": 0.6829268292682927,
        "throughput_bytes": 11141095,
        "total": 48,
        "total_bytes": 610390023,
        "ttc_remaining": 777329
    },
    "details": {
        "OpsCenter.backup_reports": {
            "attempts": 0,
            "average_time": 0,
            "state": {
                "aborted": 0,
                "failure": 0,
                "pending": 4,
                "running": 0,
                "success": 0
            },
            "time": 0,
            "type": "incremental"
        }
    }
}

GET /{cluster_id}/repair-details

Get a detailed list of the current cycle’s repairs.

Opt. params:
  • keyspace – Limits results to only the specified keyspace. Optional.
  • table – Limits results to only the specified table. Optional.

Returns a detailed list of every repair and its present status in the current repair cycle.

Example

curl "http://127.0.0.1:8888/Test_Cluster/repair-details?keyspace=myks&table=mytable"
[
    {
        "attempts": 0,
        "executing": false,
        "ksname": "blackhat",
        "last_error": "",
        "node": "127.0.0.4",
        "repair_range": [
            "0",
            "4611686018427387904"
        ],
        "size": 281480,
        "start_ts": 1492099955.84,
        "tables": [
            "cc"
        ],
        "time": 1,
        "type": "subrange"
    },
    {
        "attempts": 0,
        "executing": false,
        "ksname": "OpsCenter",
        "last_error": "",
        "node": "127.0.0.2",
        "size": 5000,
        "start_ts": 1492099102.985,
        "table": "settings",
        "time": 1,
        "type": "incremental"
    }
]

Request Management Methods

Request

A Request is how OpsCenter tracks a potentially long-running operation that must be completed asynchronously. When such an API call is made, opscenterd immediately returns a Request ID that can be used to look up the status of the request.

Once a Request is started, you can fetch the status information for it until opscenterd is restarted or a large number of Requests have been started.

A Request status takes the following form:

{
   "id": ID,
   "state": STATE,
   "started": STARTED,
   "finished": FINISHED,
   "cluster_id": CLUSTER_ID,
   "details": DETAILS
}
Data:
  • ID (string) – The unique UUID for this Request. When an operation is potentially long-running, opscenterd will return this ID immediately.
  • STATE (string) – Either “running”, “success”, or “error”
  • STARTED (int) – A unix timestamp representing when the Request started
  • FINISHED (int) – A unix timestamp representing when the Request finished, or null if it has not finished yet
  • CLUSTER_ID (string) – The name of the cluster that the Request is operating on
  • DETAILS – Typically a string containing a status or error message, but may be a dictionary in the form {<subrequest_id>: <Request>} when the Request holds a collection of subrequests.
Content Types:
  • JSON

GET /request/{request_id}/status

Check the status of an asynchronous request sent to OpsCenter.

Path arguments:
  • request_id – The ID returned by the API call that triggered the request.

Returns a dictionary describing the status of the request.
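
Because long-running calls return a Request ID immediately, a client typically polls this endpoint until the request is no longer running. The following is a minimal Python sketch of such a loop; the function names, the interval and max_polls defaults, and the base_url parameter are all illustrative assumptions.

```python
import json
import time
from urllib.request import urlopen


def fetch_status(base_url, request_id):
    """GET /request/{request_id}/status and decode the JSON response."""
    with urlopen(f"{base_url}/request/{request_id}/status") as resp:
        return json.load(resp)


def wait_for_request(request_id, fetch, interval=5.0, max_polls=120):
    """Poll until the request leaves the "running" state.

    fetch is any callable that takes a request ID and returns the status
    dictionary; interval and max_polls are illustrative defaults.
    """
    for _ in range(max_polls):
        status = fetch(request_id)
        # The documented form uses "state"; the example response uses "status".
        state = status.get("state", status.get("status"))
        if state != "running":
            return status
        time.sleep(interval)
    raise TimeoutError(f"request {request_id} is still running")
```

A call might look like wait_for_request(rid, lambda r: fetch_status("http://127.0.0.1:8888", r)). Passing the fetch function in keeps the loop testable without a live opscenterd.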

Example

curl http://127.0.0.1:8888/request/6b6b15aa-df8a-43f1-aab3-efce6b8589e4/status
{
  "status": "running",
  "started": 1334856122,
  "error_message": null,
  "finished": null,
  "moves": [
    {
     "status": null,
     "ip": "10.100.100.100",
     "old": "2",
     "new": "85070591730234615865843651857942052864"
    }
  ],
  "id": "6b6b15aa-df8a-43f1-aab3-efce6b8589e4"
}

POST /request/{request_id}/cancel

Cancel an asynchronous request sent to OpsCenter. Not all requests can be cancelled.

Path arguments:
  • request_id – The ID returned by the API call that triggered the request.

Returns null.

Example

curl -X POST \
  http://127.0.0.1:8888/request/6b6b15aa-df8a-43f1-aab3-efce6b8589e4/cancel

The request is canceled.

GET /{cluster_id}/request/{request_type}

List requests of a particular type. Default is the latest request of that type.

Path arguments:
  • cluster_id – The ID of a cluster returned from GET /cluster-configs.
  • request_type – One of “rolling-restart”, “restore”, or “bulk-operations”.
Query params:
  • list_all – A boolean (0 or 1) indicating whether all of the requests should be returned or just the latest. Default is 0 (false).

Returns a Request ID, or an array of Request IDs if list_all is true.
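
The path and query parameters above can be combined as in the following Python sketch. The helper name list_requests_url and the base_url parameter are hypothetical; only the URL is built, and no request is sent.

```python
from urllib.parse import quote, urlencode

# Request types named in the path argument description above.
REQUEST_TYPES = ("rolling-restart", "restore", "bulk-operations")


def list_requests_url(base_url, cluster_id, request_type, list_all=False):
    """Build the URL for GET /{cluster_id}/request/{request_type}."""
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"unknown request type: {request_type!r}")
    url = f"{base_url}/{quote(cluster_id)}/request/{request_type}"
    if list_all:
        # list_all is documented as a 0/1 boolean; 0 is the default.
        url += "?" + urlencode({"list_all": 1})
    return url


print(list_requests_url("http://127.0.0.1:8888", "Test_Cluster",
                        "rolling-restart", list_all=True))
# → http://127.0.0.1:8888/Test_Cluster/request/rolling-restart?list_all=1
```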

Example

curl -X GET \
  http://127.0.0.1:8888/Test_Cluster/request/rolling-restart
"8f4f71e7-65d3-41a7-bb1a-789af07dbd73"
curl -X GET \
  "http://127.0.0.1:8888/Test_Cluster/request/rolling-restart?list_all=1"
[
    "35dd37a5-4170-4694-9253-faa9532d47b6",
    "8f4f71e7-65d3-41a7-bb1a-789af07dbd73"
]