OpsCenter 6.0.0 Release Notes

Release notes and known issues for the OpsCenter version 6.0.0 release.

Highlights 

  • The opscenterd process now runs on the JVM.
  • Vastly improved the visibility, display, and behavior of agent status and installation.
  • Alerts improvements, including SNMP alerts, HipChat integration and more flexibility.
  • Backup and Restore to and from Local Directories.
  • The new Lifecycle Manager is now available, allowing users to create new clusters with the click of a button, expand existing clusters, and centrally manage configuration for all of their nodes.
  • OpsCenter 6.0 now supports only DataStax Enterprise clusters. Attempts to add a non-DSE cluster fail gracefully with an error. Any currently configured non-DSE clusters do not prevent opscenterd from starting up; however, OpsCenter 6.0 does not monitor them.

Be sure to check out the New features section for more details.

Compatibility 

To see which versions of DataStax Enterprise are supported with OpsCenter 6.0, see the OpsCenter Compatibility chart.

For upgrade instructions, see the DataStax OpsCenter Upgrade Guide.

Known Issues 

Important: Please be sure to review the list of Known Issues in OpsCenter 6.0 before running on a production cluster.

Core 

  • Added full support for the newly released DataStax Enterprise 5.0. (OPSC-5562)
  • OpsCenter now runs on the JVM. See Configuring the OpsCenter JVM. (OPSC-4915)
  • Python has been removed as a requirement for opscenterd to run. (OPSC-7368)
  • If you plan to use, or already use, the opsc_system_key_tool to encrypt values in your OpsCenter configuration file, note that the key size is limited to 128 bits unless the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed for your JDK. If you already use the encryption feature and your system key is larger than 128 bits, you must install these policy files for the encryption feature to continue working. (OPSC-5985)
  • There is now a startup.log file that gets created upon startup of the opscenterd process. This log file can contain debug information or stack traces if opscenterd fails to start up before a normal opscenterd.log is created. (OPSC-7565)
  • Improved the encryption strength of passwords stored in passwd.db when using OpsCenter authentication. Passwords are re-encrypted when users log in again. (OPSC-4400)
  • More detailed agent status can now be found in the Agents tab in the Nodes section. (OPSC-7364)
  • The Agents dialog has been replaced by a less obtrusive banner at the top of the UI. (OPSC-7385)
  • The Add Cluster workflow now incorporates the option to install agents automatically or manually. (OPSC-7388, OPSC-7815)
  • The automated agent install process using SSH has been overhauled to be idempotent and more robust. (OPSC-7399)
  • Improved automatic detection of local node properties in the agent, including but not limited to file location, permission detection, and handling. (OPSC-2445)
  • The datastax_agent_monitor process has been removed. The process existed to automatically restart the DataStax agent if it crashed. Users who want the DataStax agent restarted automatically should now use a third-party tool of their choice. (OPSC-6901)
  • Added stronger validation of configuration options in address.yaml. If invalid properties are found, the agent shuts down. (OPSC-6945)
  • Reduced the verbosity of log messages that stated a message was received from an agent opscenterd could not recognize. These messages are now batched and logged only every 10 minutes by default. (OPSC-7006)
  • Improved error handling when LDAP connection issues are encountered during login. (OPSC-5839)
  • The API timeout for adding an existing cluster to be managed by OpsCenter has been increased from 30 seconds to 5 minutes. The extra time allows clusters with schemas that are still settling to be added successfully. (OPSC-9203)
  • Trace logging was added for all opscenterd incoming stomp messages. (OPSC-8664)
  • Sensitive information is now *REDACTED* from logs rather than excluded completely. (OPSC-6572)
  • The Cluster Report now opens in a new tab rather than automatically downloading a PDF. Users can save and export the page in whatever format is supported by their OS and browsers. (OPSC-7852)
  • Added a timeout to schema agreement to improve the robustness of creating a schema upon startup. (OPSC-8253)
  • Automatic definition file updates now leverage the list of SSL/TLS certificate authorities built-in to the JVM rather than a bundled certificate. (OPSC-6782)
  • The standalone installer (.run for Linux, .dmg for OS X) for OpsCenter has been discontinued. (OPSC-9044)
  • Native browser auto-complete has been disabled for cluster username and password fields. (OPSC-7619)
  • Fixed an issue where opscenterd was running as root when installed using an RPM package. The opscenter user is now created upon install. Note that any OpsCenter files on custom paths might need updated permissions. (OPSC-5487)
  • Fixed an issue with the Manage Roles button being disabled when one or more DataStax Enterprise clusters are down. (OPSC-4396)
  • Fixed an issue with the Multiple Versions Detected dialog displaying on the Dashboard when restarting nodes. (OPSC-1407)
  • Fixed an issue with writing multiple stomp_interface properties to address.yaml during failover. (OPSC-5048)
  • Fixed an issue that prevented scrolling the cluster navigation list when necessary. (OPSC-5465)
  • Fixed an issue that required restarting opscenterd after JMX authentication was enabled on a cluster. (OPSC-5524)
  • Fixed an issue causing browsers to incorrectly autofill some credential fields in the Edit Cluster Connection Settings dialog. (OPSC-5886)
  • Fixed an issue with high CPU usage by agents on some cluster topologies. (OPSC-6045)
  • Fixed an issue where updating Edit Cluster Connection Settings through the UI would remove other properties set manually in cluster_name.conf. (OPSC-6078)
  • Fixed an issue causing the startup_sleep property to not be respected. This property controls an optional delay between clusters when opscenterd is starting up to alleviate performance issues. (OPSC-7334)
  • Fixed an issue in tarball installations that excluded opscenter_system_key_tool. (OPSC-7347)
  • Fixed an issue with the Role list not selecting the correct role when editing a user. (OPSC-7861)
  • Fixed an issue where opscenterd would not use the broadcast address to connect to DataStax agents when it was set. (OPSC-7897)
  • Fixed an issue that prevented trying to create the OpsCenter keyspace again if it failed. (OPSC-8336)
  • Fixed an issue editing keyspace replication when datacenter names contained hyphens. (OPSC-6137)
  • Fixed an issue with the repair service causing errors via overlapping repairs. This fixes the "Cannot start multiple repair sessions over the same sstables" error. (OPSC-8202)
  • Fixed an issue where Repair Service would block on replication settings of system_distributed (it was not ignored as with all other system keyspaces). (OPSC-7993)
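
The log redaction change above (OPSC-6572) can be pictured with a small sketch. This is not OpsCenter's implementation: the list of sensitive key names and the key=value message format are assumptions made for illustration only. It shows the general idea of keeping a log line intact while replacing the sensitive value with *REDACTED*.

```python
import re

# Hypothetical list of option-name fragments treated as sensitive.
# These names are illustrative and are not taken from OpsCenter.
SENSITIVE_KEYS = ("password", "passwd", "secret", "token")

# Matches key=value pairs whose key contains one of the sensitive fragments.
_PATTERN = re.compile(
    r"(?P<key>\b\w*(?:%s)\w*)\s*=\s*(?P<value>\S+)" % "|".join(SENSITIVE_KEYS),
    re.IGNORECASE,
)


def redact(message: str) -> str:
    """Keep the key in the message but replace its value with *REDACTED*."""
    return _PATTERN.sub(lambda m: "%s=*REDACTED*" % m.group("key"), message)


if __name__ == "__main__":
    # The key stays visible for debugging; only the value is hidden.
    print(redact("connecting with username=admin password=hunter2"))
```

Keeping the key name in the log line, rather than dropping the whole entry, makes it possible to see that a setting was present without exposing its value.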

Monitoring 

  • When upgrading DSE, alerts that exist for metrics that no longer exist in DSE will be automatically deleted. Users will be notified of this and the automatic removal of deprecated graphs on Dashboard presets when the OpsCenter UI is loaded. (OPSC-7763)
  • Non-percentile latency metrics (that is, Read/Write Request Latency) were removed from DSE and from OpsCenter. Placeholder metrics generated from the percentile latency metrics take their place but are calculated a little differently. The average metric is now actually the median, and the minimum and maximum are the actual minimum and maximum latencies that occurred rather than the minimum and maximum of the collected averages. (OPSC-8458)
  • The Nodes section is now shown by default rather than the Dashboard when loading the OpsCenter monitoring UI. (OPSC-8234)
  • Added support for DSE Tiered storage including metrics for storage tiers. (OPSC-7458)
  • Added SNMP integration in alert notifications. (OPSC-309)
  • Added a new Agent Issue alert type. Users can now be proactively alerted when an agent installation or configuration may need attention. (OPSC-1862)
  • Added HipChat integration to alert notifications. (OPSC-2750)
  • Added ability to export metrics directly to Graphite. (OPSC-4499)
  • The global dashboard now displays a warning message when OpsCenter cannot connect to a cluster. (OPSC-6966)
  • Threshold information is now included in the body of email alerts. For example: (Current value is 30; threshold is >10). (OPSC-3827)
  • Cluster name has been added as a property in POST URL alert notifications. (OPSC-4786)
  • Multiple recipients can now be specified in email alerts. (OPSC-5193)
  • Preset labels on the dashboard range selector are now marked bold upon selection. (OPSC-5760)
  • Added ability to set up alerts on percentile metrics, such as read and write latency. (OPSC-6791)
  • Creating and updating alerts with a duration set to zero now returns an HTTP 400 response. (OPSC-7474)
  • Added validation to start and end timestamps on /new-metrics API call. (OPSC-7666)
  • Values shown on Thread Pool Stats are now color-coded to bring attention to important values. (OPSC-4641)
  • Fixed an issue causing the Activity listing to reset scroll position upon update in the Activities tab. (OPSC-7519)
  • Fixed issues with and updated schema for maintaining rollup states across agent restarts. (OPSC-4190)
  • Fixed an issue causing zoomed Dashboard graphs to not reflect the proper graph parameters in some cases. (OPSC-7478)
  • Fixed an issue that caused the Down Node alert to not disable properly. (OPSC-7766)
  • Fixed an issue that caused graphs using All Nodes to improperly load. (OPSC-8035)
  • Fixed an issue that caused the Storage Capacity dashboard widget to appear blank. The issue was due to OpsCenter failing to parse mounted filesystems in some environments. (OPSC-8215)
  • Fixed an issue with mini-graph label layouts on the Overview page in Firefox. (OPSC-6024)
  • Fixed UI layout overlap issues when nodes had many tasks running at the same time. (OPSC-5374)
  • Fixed an issue where filters on the Nodes List View would not be cleared until users clicked away from the list in the UI. (OPSC-6399)
  • Fixed an issue with node load decimal precision being too long. (OPSC-7969)
  • Fixed an issue that prevented Search metrics from being available when running Search Analytics workloads. (OPSC-5002)
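
The change to latency metrics described above (OPSC-8458) is easier to see with numbers. The sketch below is an illustration under assumptions, not the OpsCenter calculation: it works on raw latency samples grouped into collection intervals, whereas OpsCenter derives the placeholder from DSE percentile metrics. The function names and sample values are hypothetical.

```python
from statistics import mean, median


def old_style_summary(intervals):
    """Earlier behavior: min, max, and average over per-interval averages."""
    averages = [mean(samples) for samples in intervals]
    return {"min": min(averages), "max": max(averages), "average": mean(averages)}


def new_style_summary(intervals):
    """Placeholder behavior: actual min and max, with the median as 'average'."""
    latencies = [value for samples in intervals for value in samples]
    return {"min": min(latencies), "max": max(latencies), "average": median(latencies)}


if __name__ == "__main__":
    # Two collection intervals of latencies in milliseconds (made-up values).
    intervals = [[1, 2, 30], [4, 5, 6]]
    print("old:", old_style_summary(intervals))
    print("new:", new_style_summary(intervals))
```

With one slow request in the sample, the old-style maximum understates the worst latency because it is a maximum of averages, while the median-based "average" is less affected by the outlier than a mean would be.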

Backup Service 

  • Added the ability to back up to and restore from a user-defined directory on a local filesystem. (OPSC-5185)
  • Added the ability to back up multiple keyspaces in a single backup. (OPSC-7712)
  • Browsing S3 backups to restore now leverages the default_api_timeout setting for environments where the API call takes longer than 60 seconds. (OPSC-8863)
  • Added new sstableloader_max_heap_size property to agent configuration to increase the MAX_HEAP_SIZE of sstableloader during restore. (OPSC-7225)
  • Improved connection retry handling when checking if a blob exists in S3. (OPSC-7146)
  • Improved error reporting during restore when sstableloader runs out of memory. (OPSC-7180)
  • Exposed new configuration options to increase the history of job statuses stored on the agent. More information about the symptoms and solution can be found here: https://support.datastax.com/hc/en-us/articles/206456076. (OPSC-6917)
  • Data size for backups now displays in all cases. (OPSC-7686)
  • Scheduled backups can no longer be created in the past, which prevents accidental unexpected behavior. (OPSC-7421)
  • Provided clear messaging when attempting to back up very large keyspaces that might exceed currently configured limits. (OPSC-7537)
  • Attempting to restore from a deleted backup is no longer allowed. (OPSC-7647)
  • Improved validation handling for S3 bucket names that are not lowercase. (OPSC-8015)
  • A manual backup deletion has been re-labelled "Deletion Complete" in the Backup Service activity table. The backup report for a deleted backup now contains the heading "Pre-Deletion Summary" and label "This backup has been deleted". (OPSC-8656)
  • Fixed issues when running many backup and restore operations back-to-back. (OPSC-7125)
  • Fixed an issue with lost+found directories causing backups to fail. Only directories for existing tables are scanned. (OPSC-5389)
  • Fixed an issue that caused Data Size to always appear as N/A on the Restore Status dialog. (OPSC-4498)
  • Fixed an issue when restoring to a cluster with client-to-node encryption enabled and a custom keystore path. (OPSC-6692)
  • Fixed error handling when restoring a nonexistent table. (OPSC-7094)
  • Fixed an issue with post-backup scripts causing files to be sent as a JSON blob rather than a file-per-newline. (OPSC-7108)
  • Fixed an issue with some multi-part S3 backups failing when using S3 server-side encryption. (OPSC-7247)
  • Fixed consistency issues with PIT restores. (OPSC-7639)
  • Fixed an issue that created some duplicate directories during backups. (OPSC-7655)
  • Fixed an issue when backing up DataStax Enterprise encrypted tables multiple times. (OPSC-7709)
  • Fixed an issue in cleaning up destination after a Restore. (OPSC-7767)
  • Fixed an issue with the display of Restore dialog pushing buttons beyond view. (OPSC-7858)
  • Fixed the display of the Restore Report dialog after an S3 bucket has been removed. (OPSC-8212)
  • Fixed an issue that caused the agent config property backup_file_queue_max to not be respected. (OPSC-8868)
  • Fixed issues with the Backup Service and Best Practice Service that corrupted schedules and expected jobs to fire in the past, which caused those services to run more aggressively than they should. (OPSC-7350)
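
The rule that scheduled backups can no longer be created in the past (OPSC-7421) amounts to a simple check at creation time. The following is a minimal sketch of such a check, assuming timezone-aware datetime values; the function name and error message are hypothetical and do not reflect the OpsCenter API.

```python
from datetime import datetime, timedelta, timezone


def validate_backup_schedule(start, now=None):
    """Return start unchanged if it lies in the future; otherwise raise ValueError.

    Both values are assumed to be timezone-aware datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if start <= now:
        raise ValueError(
            "scheduled backup time %s is not in the future" % start.isoformat()
        )
    return start


if __name__ == "__main__":
    in_one_hour = datetime.now(timezone.utc) + timedelta(hours=1)
    print(validate_backup_schedule(in_one_hour))
```

Rejecting the request up front avoids a schedule whose first run is already overdue, which is the kind of state that could otherwise trigger unexpected immediate runs.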

Diagnostic Tarball 

  • The cluster_name.conf file is now included in the diagnostic tarball. (OPSC-7157)
  • The agent address.yaml configuration is now included in the diagnostic tarball. (OPSC-7177)
  • Changed the way a new browser tab is opened for the Diagnostics Tarball as a workaround to popup blockers in some browsers. (OPSC-8869)
  • Added a diagnostic_tarball_download_timeout config property that allows users to increase the timeout for downloading information from a single node. (OPSC-8891)
  • Fixed an issue with the diagnostic tarball not collecting cqlsh output for clusters running DSE 4.7 or later. (OPSC-7053)

Performance Service 

  • Users of Lifecycle Manager who want to persist Performance Object settings in DSE are notified to make those changes manually in Lifecycle Manager rather than through the Performance Service. The Performance Service still applies any changes via JMX, but the changes do not persist after a DSE restart. (OPSC-8355)
  • Fixed some issues with link rendering. (OPSC-5787, OPSC-7404)
  • Fixed an issue that truncated the titles of some graphs in the Performance Service. (OPSC-5788)
  • Fixed an issue that caused stack overflows in some Performance Service edge cases. Most users would not see the symptoms of this issue. (OPSC-7648)

Best Practice Service 

  • Fixed an issue with properly showing disabled Best Practice Service rules. (OPSC-5779)
  • Fixed an issue with the "Security superuser has default setting" Best Practice Service rule that prevented the rule from warning properly. (OPSC-7281)
  • Fixed issues with the Backup Service and Best Practice Service that corrupted schedules and expected jobs to fire in the past, which caused those services to run more aggressively than they should. (OPSC-7350)

Known Issues 

The following are new issues that exist in OpsCenter 6.0.x versions. Each item has a link to more details including workarounds when available. These issues will be addressed in future releases where possible. If you have any questions, please contact DataStax Support for assistance.

  • Users may observe a large number of log messages about requests to /pit-cleanup if there are a large number of existing commitlogs in the staging directory. (OPSC-8349)
  • Insufficient permissions on the staging directory can cause the agent to exhaust inotify watches on the system over time. (OPSC-10732)
  • Users will see an unhandled error and stack trace in opscenterd.log when accessing a cluster through the UI or API that no longer exists. The error message contains "ERROR: Unhandled error in Deferred: There are no clusters with name or ID...". This error message is harmless. (OPSC-8819)
  • Enabling SNMP alerts may cause opscenterd to hang on startup in some slower environments. (OPSC-9314; see More Details)
  • Failure to follow the required prerequisite instructions to install Oracle Java SE Runtime Environment 8 (JRE or JDK) before installing OpsCenter 6.0 on Ubuntu 16.04 results in installation of OpenJDK 9, which is not currently supported. (OPSC-10778)
  • Kerberos authentication will not work when the rpc_address setting in cassandra.yaml is 0.0.0.0. The symptom of this issue is reflected in the Agent Status view, which shows the storage database as up but the monitored database as down according to the agent. (OPSC-11217)
  • For DSE versions 5.0 and later, object permissions currently are not persisted with an OpsCenter backup and thus are not re-applied when that backup is restored. As a result, users must manually manage object permissions externally from OpsCenter. For more details (no workaround available at this time), see the KB support article. (OPSC-11015)
  • (Applicable to OpsCenter version 6.0.10) When running DSE nodes that use two different network interfaces to separate client traffic from internode traffic, the OpsCenter agent will fail to establish a STOMP connection. For more details, please see the KB support article. (OPSC-13016)
  • Lifecycle Manager (LCM)
    • Under certain circumstances, OpsCenter Lifecycle Manager may fail to install Java unless the OpsCenter version being used is at least 6.0.11 in the 6.0.x series. For more details, please see the KB support article. (OPSC-13332)
    • Lifecycle Manager is not currently compatible with DSE Configuration Encryption. See Encrypted DSE configuration values for more details. (OPSC-7529)
    • The fix for OPSC-8851 in 6.0.2 improved the resiliency of Lifecycle Manager in situations where there is high latency between the OpsCenter daemon and nodes in the cluster. However, there are still known issues in high-latency scenarios that will be addressed in a future release. (OPSC-9853)
    • DSE Graph properties (DSE 5.0.1+ only): DSE Graph configuration in dse.yaml is configurable through LCM Config Profiles. All Graph properties in dse.yaml can be managed through the LCM UI with the exception of gremlin_server.serializers and gremlin_server.scriptEngines. If you are using LCM and need to customize these properties, use the LCM API to make the changes. Future changes to the Config Profile via the LCM UI will retain properties set through the API.
    • When configuring credentials in a Repository, special characters such as #, $, and so forth are supported, but non-ASCII Unicode characters are not. (OPSC-8921)
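
For the Kerberos issue listed above (OPSC-11217), it can help to check whether a node's cassandra.yaml sets rpc_address to 0.0.0.0 before enabling Kerberos authentication. The sketch below is a hedged illustration: it uses a simple line-based match instead of a YAML parser, assumes rpc_address appears as a top-level "key: value" line, and the function names are hypothetical.

```python
import re

# Matches an uncommented top-level "rpc_address: value" line.
_RPC_LINE = re.compile(r"^rpc_address\s*:\s*(?P<value>[^#\s]+)")


def rpc_address(yaml_text):
    """Return the rpc_address value found in cassandra.yaml text, or None."""
    for line in yaml_text.splitlines():
        match = _RPC_LINE.match(line)
        if match:
            return match.group("value").strip("'\"")
    return None


def kerberos_issue_applies(yaml_text):
    """True when rpc_address is 0.0.0.0, the setting named in OPSC-11217."""
    return rpc_address(yaml_text) == "0.0.0.0"


if __name__ == "__main__":
    sample = "cluster_name: 'Test'\n# rpc_address: localhost\nrpc_address: 0.0.0.0\n"
    print(kerberos_issue_applies(sample))
```

To check a real node, read the file contents (for example with open(path).read()) and pass the text to kerberos_issue_applies; the path to cassandra.yaml depends on the installation type.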

Known Issues Fixed in OpsCenter 6.0.8 

Known Issues Fixed in OpsCenter 6.0.6 

  • When modifying the Config Profile of an existing cluster in Lifecycle Manager, the Cluster Connection Settings in OpsCenter are now automatically updated after running a configure job. (OPSC-8544)
  • Fixed an issue where OpsCenter failed to retrieve the diagnostic tarballs from the agents if SSL was enabled between OpsCenter and the agents. (OPSC-10701)

Known Issues Fixed in OpsCenter 6.0.5 

  • Fixed LCM repository authentication bug when special characters exist in credentials (as with most DataStax Academy usernames). (OPSC-10817)

Known Issues Fixed in OpsCenter 6.0.3 

  • Any encrypted config values generated since OpsCenter 6.0 (and prior to the fix in 6.0.3) will need to be re-encrypted. (OPSC-10244)
  • Backups/Restores will not work with keyspace names longer than 32 characters on DSE 4.7 or 4.8. (OPSC-9563)
  • The Repair Service fails to auto-restart after a node is decommissioned. Manually starting the Repair Service resolves this issue. (OPSC-9244)
  • Some items in the Lifecycle Manager UI may not automatically update if they are modified outside of the current UI session; for example, via the API directly or in another UI session. If multiple users might be concurrently modifying the same cluster, please be sure to refresh the UI before making any changes. (OPSC-9306)

Known Issues Fixed in OpsCenter 6.0.2 

  • g1-gc-opts in cassandra-env.sh are not immediately editable when using G1 garbage collection by default. To work around this issue, change the garbage collector to something other than G1 and back again. (OPSC-9556)
  • Max heap size in cassandra-env.sh is not editable using the LCM UI. To work around this issue, users can set the -Xmx and -Xms JVM properties directly via additional-jvm-opts further down in the cassandra-env.sh section on the Config Properties page. (OPSC-9546)
  • The use_tls setting in email alerts does not currently work as expected. Users can still configure email alerts to work with TLS-enabled servers by setting use_ssl=1 and use_tls=0. Please contact DataStax Support if you have any issues. (OPSC-9451)
  • Automatic definition file updates are not dynamically reloaded for new versions of DSE. If you see an error for "Unsupported or invalid version of DSE" in the UI, try restarting opscenterd. (OPSC-9468)
  • Some users may see intermittent job failures with an IncompleteRead error. (OPSC-8851; see More Details)
  • The LCM UI has some rendering issues in older versions of Safari (<=8). The workaround is to use a newer version of Safari or another supported browser. (OPSC-9123)

Known Issues Fixed in OpsCenter 6.0.1 

  • opscenterd fails to properly resolve relative symlinks to Java. (OPSC-9344; see More Details)
  • When installing an agent on a node for the first time, address.yaml is owned by the root user. The only OpsCenter functionality this affects directly is Automatic Failover, which will not work until ownership or permissions are updated. If the agent has previously been installed on the node, ownership is not affected. (OPSC-9336; see More Details)
  • S3 and Local FS backups fail for keyspaces leveraging the new Materialized Views feature in DSE 5.0. On Server backups are not affected. (OPSC-9328; see More Details)
  • Users must ensure tables that leverage the new User Defined Aggregates and User Defined Functions features in DSE 5.0 exist prior to running a restore. OpsCenter cannot automatically re-create these tables, but can successfully restore the data to existing tables. (OPSC-9261; see More Details)