New features in DSE OpsCenter 6.5

Changes in features, configuration files, metrics, and APIs in OpsCenter 6.5.

New features

The following new and improved features are highlighted for the current DataStax Enterprise (DSE) OpsCenter version 6.5 release.

DataStax Enterprise 6.0
Support for provisioning and monitoring DataStax Enterprise 6.0 clusters, including:
  • Configuring AlwaysOn SQL in Lifecycle Manager
  • Monitoring AlwaysOn SQL nodes in OpsCenter Monitoring
Lifecycle Manager (LCM)
  • Upgrade DSE minor patch versions! Upgrades are supported for minor versions within a major release series, which includes DSE versions 5.0.x, 5.1.x, and 6.0.x and later at this time. Clone a config profile and run an upgrade job to push upgrade versions at the datacenter, rack, or node levels. See Example: Upgrading DSE to a minor release using LCM and Running an upgrade job.
  • Cloning configuration profiles for upgrading DSE patch releases, or for development or testing purposes.
  • Validation for configuration profiles. Any errors in a config profile are highlighted in red text and displayed at the top of the config profile page. Clicking an error brings you to the location of the error within the config profile for correcting the issue.
  • Concurrency Levels: LCM runs jobs at a specified concurrency level. A higher concurrency level completes jobs faster at the expense of cluster availability.
  • LCM no longer requires setting up a repository within LCM if your organization has manually configured a DataStax repo externally from LCM.
  • Declarative password management. When authentication is enabled, it is no longer necessary to enter credentials in the Job dialogs every time a job is run. After enabling authentication in a Config Profile, entering credentials at the cluster level is required only one time. Future credential changes are allowed. See Editing a cluster for details on entering credentials.
NodeSync Service
Automatically synchronize keyspace and table replicas as a background process. Keep data consistent and monitor status and progress with alerts and dashboard graphs based on NodeSync metrics.
New metrics

For a comprehensive list of metrics available in OpsCenter, refer to the OpsCenter Metrics Tooltips Reference.

Backup Service
  • The AWS CLI feature for bulk uploading backups to Amazon S3 has been promoted from an OpsCenter Labs feature to an official production feature. Move your use_s3_cli setting from the [labs] section to the [backups] section.
  • Improved backup and restore for DSE Graphs.
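
The use_s3_cli move described above can be scripted. The following is a minimal sketch, assuming the setting lives in an INI-style configuration file; the function name and the sample value are illustrative and not part of OpsCenter. Note that configparser does not preserve comments, so review the output before replacing a real file.

```python
import configparser
import io


def move_use_s3_cli(conf_text: str) -> str:
    """Return conf_text with use_s3_cli moved from [labs] to [backups]."""
    # interpolation=None keeps literal '%' characters in other values intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    if parser.has_option("labs", "use_s3_cli"):
        value = parser.get("labs", "use_s3_cli")
        parser.remove_option("labs", "use_s3_cli")
        if not parser.has_section("backups"):
            parser.add_section("backups")
        parser.set("backups", "use_s3_cli", value)
        # Drop [labs] only when nothing else is left in it.
        if not parser.options("labs"):
            parser.remove_section("labs")

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(move_use_s3_cli("[labs]\nuse_s3_cli = true\n"))
```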
Best Practice Service
  • The "Check that NodeSync is enabled on all nodes" rule has been added. The NodeSync service is intended to run on every node in the cluster. If any nodes are not running NodeSync, the replica data segments for those nodes will not be validated and synchronized, which could potentially result in data loss. This rule ensures that NodeSync is running on every node. To check whether NodeSync is running, use nodetool nodesyncservice status; to start it on a node, use nodetool nodesyncservice enable. The NodeSync Service status is visible from the Nodes and Services areas of OpsCenter Monitoring.

    See Enabling keyspaces and tables for monitoring NodeSync in OpsCenter for additional details.

Updates from OpsCenter 6.1

The following changes are updates from the OpsCenter 6.1 major release.

Support for multiple user roles using LDAP authentication

Users can have multiple roles when using LDAP authentication. If the list of a user's groups maps to more than one role in OpsCenter, the user is granted each of the listed roles, and the resulting OpsCenter permissions are the union of the permissions for all of those roles. See Adding a role for an LDAP user.

Important: In OpsCenter 6.1.10 and later and OpsCenter 6.5.3 and later, you must update custom scripts and applications that use the OpsCenter API if you want to use multiple user roles with LDAP authentication. If a custom script or application that uses the OpsCenter API did not account for multiple user roles, and a user has multiple roles, the script or application will fail because the role attribute cannot be found. The single role attribute will be provided for users that have only one role. If your application or script has users with only one role, then updates are not required for continued use.
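
A script that reads user records can be made tolerant of both shapes. The following is a minimal sketch only: the text above confirms that a single role attribute is present for single-role users, but it does not name the attribute that carries multiple roles, so the roles key used here is a hypothetical placeholder. Check the actual API response before relying on it.

```python
def roles_for_user(user: dict) -> list:
    """Return every role for a user record, whichever shape it has."""
    # Users with exactly one role still carry the single `role` attribute.
    if "role" in user:
        return [user["role"]]
    # ASSUMPTION: `roles` is a hypothetical key name for the multi-role list;
    # the source does not specify the real attribute name.
    return list(user.get("roles", []))


if __name__ == "__main__":
    print(roles_for_user({"username": "alice", "role": "admin"}))
    print(roles_for_user({"username": "bob", "roles": ["admin", "readonly"]}))
```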

Compact storage no longer supported

Tables in the OpsCenter schema are no longer created with compact storage because this feature for thrift-compatible tables has been removed in DSE version 6.0.

Warning: Before upgrading to DSE 6.0 and attempting to connect to a cluster with OpsCenter, execute the following CQL command for every table in the OpsCenter keyspace.
ALTER TABLE keyspace_name.table_name DROP COMPACT STORAGE;
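
Because the command must be run once per table, it can help to generate the statements from a list of table names. The following is a minimal sketch; the keyspace name "OpsCenter" and the table names in the usage example are assumptions for illustration, so substitute the names reported by your own cluster.

```python
def drop_compact_storage_statements(keyspace: str, tables: list) -> list:
    """Build one ALTER TABLE ... DROP COMPACT STORAGE statement per table."""

    def quote(identifier: str) -> str:
        # Double-quote identifiers so case-sensitive names survive,
        # and escape any embedded double quotes by doubling them.
        return '"' + identifier.replace('"', '""') + '"'

    return [
        f"ALTER TABLE {quote(keyspace)}.{quote(table)} DROP COMPACT STORAGE;"
        for table in tables
    ]


if __name__ == "__main__":
    # Illustrative names only; list the real tables in your keyspace first.
    for statement in drop_compact_storage_statements("OpsCenter", ["events", "settings"]):
        print(statement)
```

The generated statements can then be pasted into cqlsh or saved to a file and executed in one pass.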

For more information, see migrating from compact storage (for DSE version 5.1.x clusters managed by OpsCenter) or migrating from compact storage (for DSE version 5.0.x clusters managed by OpsCenter). If OpsCenter was not upgraded to 6.5 before upgrading DSE to 6.0, refer to the instructions in this KB article for a workaround until the issue is fixed.

Note: This issue was fixed in OpsCenter 6.5.1.

For DSE versions earlier than 6.0, the OpsCenter Backup Service checks for tables that have compact storage and warns that they cannot be created during a restore.

Dropped messages metrics

Dropped messages metrics updates include:
  • The TP: Dropped Paged Range Reads and TP: Dropped Request Responses metrics have been removed for DSE 6.0 and later.
  • Several metrics regarding dropped messages have had their labels changed from TP: <message type> to Dropped Messages: <message type>.
  • New dropped messages metrics have been added.

Renamed RPC address properties to native transport in LCM UI and API

The RPC address fields in the LCM Add Node and Edit Node dialogs, and the Lifecycle Manager API have been renamed to Native Transport to correspond with the changes for DSE 6.0:
  • The RPC Address field in the LCM UI Add Node and Edit Node dialogs has been renamed to Native Transport (RPC) Address.
  • The Broadcast RPC Address field in the LCM UI Add Node and Edit Node dialogs has been renamed to Native Transport (RPC) Broadcast Address.
  • The rpc-address field in the LCM API has been renamed to native-transport-address.
  • The broadcast-rpc-address field in the LCM API has been renamed to native-transport-broadcast-address.
Note: If using the LCM API directly, update any API clients that reference the renamed fields rpc-address and broadcast-rpc-address.
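
For API clients that still build payloads with the old field names, a small translation step is one way to adapt. The following is a minimal sketch; the function name and the sample payload are illustrative, and only the two renames listed above are applied.

```python
# Old LCM API field name -> new field name, as listed above.
RENAMED_FIELDS = {
    "rpc-address": "native-transport-address",
    "broadcast-rpc-address": "native-transport-broadcast-address",
}


def rename_node_fields(payload: dict) -> dict:
    """Return a copy of payload with the renamed LCM node fields updated."""
    return {RENAMED_FIELDS.get(key, key): value for key, value in payload.items()}


if __name__ == "__main__":
    old_payload = {"name": "node1", "rpc-address": "10.0.0.5"}
    print(rename_node_fields(old_payload))
```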

OpsCenter and DataStax Agent API dynamic updates for agent log level

The log level (debug, info, warn, error) can now be dynamically set at the OpsCenter daemon level using the updating logging level method.
PUT /cluster_id/log/level/log_level

The log level update is not persisted to the log4j.properties configuration or the logback.xml configuration file. Restarting opscenterd or the DataStax Agent returns the agent log level to its original configuration value.

Tip: Update the log level for all agents across a cluster using a cURL command:
curl -X PUT http://127.0.0.1:8888/Test_Cluster/log/level/debug

The response body lists the IP addresses of the nodes whose agent log levels were updated and of the nodes that were skipped.
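
The same request can be prepared from a script. The following is a minimal sketch that only builds the PUT request and does not send it; the function name is illustrative, and the host, port, and cluster name are taken from the cURL example above. Authentication, if enabled, is not handled here.

```python
import urllib.request
from urllib.parse import quote

VALID_LEVELS = ("debug", "info", "warn", "error")


def build_log_level_request(base_url: str, cluster_id: str, level: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that sets the agent log level."""
    if level not in VALID_LEVELS:
        raise ValueError(f"log level must be one of {VALID_LEVELS}, got {level!r}")
    url = f"{base_url.rstrip('/')}/{quote(cluster_id, safe='')}/log/level/{level}"
    return urllib.request.Request(url, method="PUT")


if __name__ == "__main__":
    request = build_log_level_request("http://127.0.0.1:8888", "Test_Cluster", "debug")
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```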

Note: The OpsCenter API version remains at v1 (in contrast with LCM v2).

Change password parameters

/api/v2/actions/install/

The change-default-cassandra-password and cassandra-ldap-password parameters are no longer valid. Supplying these parameters at run job time made many edge cases impossible to detect. Corresponding parameters have been added to the cluster model at the /api/v2/lcm/clusters/ endpoint, where they can be persisted across jobs and facilitate more effective error handling.

Consistent error message fields

The msg field has been removed from all API errors and replaced with the message field. Previously the two fields were used inconsistently. Any API clients that expect to process a msg field must be updated to look for the message field instead.
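
A client that must work against both older and newer OpsCenter versions can read either field. The following is a minimal sketch; the function name and the sample error bodies are illustrative.

```python
def error_text(error_body: dict) -> str:
    """Return the error text from an API error body."""
    # Current releases use `message`.
    if "message" in error_body:
        return error_body["message"]
    # Fall back to the removed `msg` field for responses from older releases.
    return error_body.get("msg", "")


if __name__ == "__main__":
    print(error_text({"message": "Invalid config profile"}))
    print(error_text({"msg": "Invalid config profile"}))
```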

More restrictive config_profiles (json validation)

/api/v2/lcm/config_profiles/

While the config_profiles API has not technically changed, it has become considerably stricter about the contents of the json field. Requests that previously returned a 200 success code might now fail with a validation error. Requests that were always invalid or had undefined results are now rejected when they are submitted, rather than failing or behaving ambiguously later.

When processing POST and PUT requests for config_profiles, the system now verifies the format of the json field against the definitions for the relevant DSE version as specified in /api/v2/lcm/definitions/. The following properties are verified against the definitions:
  • Every key-name must be a valid DSE configuration property.
  • Every value-type must match the type specified in the DSE configuration property.
  • Families of fields that have dependencies must be consistent. One cannot disable a parent field (such as client_encryption_options.enabled) and specify a value for a dependent child field (such as client_encryption_options.keystore).
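
The dependency rule can be approximated on the client side before submitting a profile. The following is a rough sketch, not LCM's actual validation: it assumes that any sibling of a parent's enabled flag counts as a dependent child, which is a simplification. The authoritative rules are the definitions served at /api/v2/lcm/definitions/.

```python
def find_disabled_parent_conflicts(section: dict) -> list:
    """List child fields that are set although their parent is disabled."""
    conflicts = []
    for parent, options in section.items():
        if not isinstance(options, dict):
            continue
        # ASSUMPTION: every key next to `enabled` depends on it.
        if options.get("enabled") is False:
            conflicts.extend(
                f"{parent}.{child}" for child in options if child != "enabled"
            )
    return conflicts


if __name__ == "__main__":
    profile_section = {
        "client_encryption_options": {"enabled": False, "keystore": "/path/to/keystore"},
    }
    print(find_disabled_parent_conflicts(profile_section))
```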

LCM API version updates

The following changes have been made to the Lifecycle Manager API:
  • Base URL version bump from v1 to v2.
  • More strict validation of config_profiles API for both behavior and json content.
  • Job password parameters: The parameters for changing the cassandra user password have moved from the run job level to the add (or edit) cluster level. LDAP password changes at the run job level have been removed.
  • Multiple endpoints replaced msg with message in API errors.

Base URL version change

The base URL for LCM has changed from /api/v1/lcm/ to /api/v2/lcm/ to reflect the backwards-incompatible API changes present in the OpsCenter 6.5 release. All API clients must be updated to use the new base URL.
Note: Unless a behavior change is described below in this section of the upgrade guide, all endpoint URLs will continue to operate at their new /api/v2/lcm/ location exactly as they did previously.
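
Where endpoint URLs are stored as strings, the version bump can be applied mechanically. The following is a minimal sketch; the function name is illustrative, and the host and port in the usage example are borrowed from the cURL example elsewhere in this topic.

```python
def to_v2_lcm_url(url: str) -> str:
    """Rewrite an LCM v1 endpoint URL to its v2 location."""
    # Replace only the first occurrence so the rest of the path is untouched.
    return url.replace("/api/v1/lcm/", "/api/v2/lcm/", 1)


if __name__ == "__main__":
    print(to_v2_lcm_url("http://127.0.0.1:8888/api/v1/lcm/clusters/"))
```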

Declarative password management

The password experience for running jobs and managing the password of the cassandra user has been improved in Lifecycle Manager. Instead of entering the password every time a job is run, you now declare the password at the cluster level. Entering credentials is only required once if an associated Config Profile has internal or password authentication enabled (which is the default behavior). The New DSE password field has been removed from the Job dialogs. Password fields have been added to the Add Cluster and Edit Cluster dialogs for changing the cassandra user password. The improved functionality allows changing the password for the cassandra user at any time, or removing the stored password.

New SSL configuration options

A new SSL configuration option, opscenter_ssl_strict_subject_validation, controls whether the OpsCenter SSL agent rejects a certificate whose subject does not match the IP of the server. The default value is false, which means the SSL agent attempts subject validation first. If validation fails, the agent logs a warning and retries the connection without subject validation. If set to true, the SSL agent rejects the certificate without retrying.
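
The behavior of the option can be summarized as a small decision function. The following is an illustrative model of the description above, not the agent's implementation; the function name and the returned labels are made up for this sketch.

```python
def ssl_subject_outcome(subject_matches: bool, strict: bool = False) -> str:
    """Model what happens to a connection for a given validation result."""
    if subject_matches:
        return "accept"
    if strict:
        # opscenter_ssl_strict_subject_validation = true: no retry.
        return "reject"
    # Default (false): log a warning, then retry without subject validation.
    return "warn-and-retry-without-subject-validation"


if __name__ == "__main__":
    for matches in (True, False):
        for strict in (False, True):
            print(matches, strict, ssl_subject_outcome(matches, strict))
```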

Repair Service new subrange repair configuration option for parallel tasks

The parallel_tasks_update_interval configuration option has been added to the Repair Service. The option determines how often the Repair Service recalculates the required number of parallel tasks to run during a subrange repair cycle. The interval is 120 seconds (2 minutes) by default. For more details, see Setting the maximum for parallel subrange repairs.

Backup Service new phased staging configuration option for commit logs

The On Server commit log storage has changed. Commit logs are still initially moved into the backup_staging_dir, but after the commit logs have been sent to any other configured locations, the commit logs are moved to the directory specified by a backup_storage_dir defined in address.yaml. This approach should resolve a number of problems customers have encountered when restarting agents due to large numbers of On Server commit logs being reprocessed. See Configuring commit log backups for details.

Cassandra read request timeout configuration options

The cluster configuration and DataStax Agent configuration files have new host read request timeout configuration options for both monitored and storage clusters.
All of the values default to nil, which forces the Java driver to use its default value of 12 seconds for the read timeout.
Note: The timeouts are per node. If the node selected to do the read operation hits the timeout, an internal retry policy is set in the Java driver to try the request again.
The new read timeout options for cluster_name.conf:
[cassandra]
host_read_timeout_ms=

[storage_cassandra]
host_read_timeout_ms=
The new read timeout options for the DataStax agents in address.yaml:
monitored_dse_host_read_timeout:
storage_dse_host_read_timeout:
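
The defaulting rule for cluster_name.conf can be illustrated as follows. This is a minimal sketch, assuming an empty or missing value stands for nil; the function name and the 30000 value in the usage example are illustrative, and the 12-second default comes from the text above.

```python
import configparser

# Java driver default read timeout (12 seconds), used when the option is unset.
DRIVER_DEFAULT_READ_TIMEOUT_MS = 12_000


def effective_read_timeout_ms(conf_text: str, section: str) -> int:
    """Return the read timeout that applies to a section of cluster_name.conf."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # A missing section, a missing option, and an empty value all mean "unset".
    raw = parser.get(section, "host_read_timeout_ms", fallback="").strip()
    return int(raw) if raw else DRIVER_DEFAULT_READ_TIMEOUT_MS


if __name__ == "__main__":
    conf = (
        "[cassandra]\n"
        "host_read_timeout_ms=\n"
        "\n"
        "[storage_cassandra]\n"
        "host_read_timeout_ms=30000\n"
    )
    print(effective_read_timeout_ms(conf, "cassandra"))
    print(effective_read_timeout_ms(conf, "storage_cassandra"))
```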

New configuration option for OpsCenter failover URL

The new configuration option override_primary_redirect_url, available in opscenterd.conf, overrides the default URL and port of the OpsCenter primary instance.