Best Practice Rules Reference
Reference of the available rules in the Best Practice Service, organized alphabetically by Advisor section.
Backup Advisor
| Rule | Description/Recommendation | Importance | Scope | Interval (default) | Alert Level | 
|---|---|---|---|---|---|
| Auto Snapshot not enabled | Checks to make sure auto snapshot isn't turned off in production. | High | Node | Daily | Info | 
| Auto snapshot is not enabled, which can lead to data loss on truncation or drop. Please update your cassandra.yaml to enable auto_snapshot and prevent data loss (see the example after this table). Tip: Use LCM Config Profiles to enable auto_snapshot in the Snapshots section of cassandra.yaml. The auto_snapshot setting is enabled by default in LCM config profiles. | |||||
| Commit Log Archiving Setting Enabled Consistency Note: This rule is available in OpsCenter versions 6.1 and later. | Commit Log Archiving has been turned off due to inconsistent settings across the nodes in the cluster. | High | Node, Cluster | Hourly | Alert | 
| Commit Log Archiving is not enabled for all nodes within the cluster, which can result in data loss when performing a Point-in-Time restore. Re-enable Commit Log Archiving so that the setting is consistently enabled on all nodes in the cluster. | 
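A minimal cassandra.yaml sketch with auto snapshots enabled (true is the stock Cassandra default; verify against your version's configuration):

```yaml
# cassandra.yaml
# Automatically take a snapshot of a table before it is truncated or dropped.
# Enabled by default; disabling it risks irrecoverable data loss on TRUNCATE or DROP.
auto_snapshot: true
```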
Config Advisor
| Rule | Description/Recommendation | Importance | Scope | Interval (default) | Alert Level | 
|---|---|---|---|---|---|
| NodeSync Not Running Note: This rule is available in OpsCenter versions 6.5 and later. | The NodeSync service is intended to run on every node in the cluster. If any nodes are not running NodeSync, the data segments for which those nodes are replicas are not validated and synchronized. | High | Node | Daily | Alert | 
| Ensure NodeSync is running on every node. The NodeSync Service is enabled by default; if it was disabled for some reason, re-enable it manually using nodetool nodesyncservice enable. See Enabling keyspaces and tables for monitoring NodeSync in OpsCenter. | |||||
| Repair service not enabled | Verifies that the repair service is enabled. | High | Cluster | Daily | Info | 
| Running regular repair ensures data consistency across a cluster. Enable the repair service. | |||||
| Repair service not configured correctly | Verifies that the repair service is configured correctly for your cluster. For more information, see basic, advanced, and expert repair configuration. | High | Cluster | Daily | Info | 
| It is recommended to enable the OpsCenter repair service to run within the smallest gc_grace window configured on your cluster. | |||||
| Security not enabled for DataStax agents | Checks that OpsCenter authentication is enabled in conjunction with SSL between daemon and agent. | High | Cluster | Daily | Alert | 
| Please enable SSL for communicating with agents. | |||||
| Swap space is enabled | Checks that you do not have swap space enabled on any node. Swap space should not be used in a production environment. | Medium | Node | Daily | Alert | 
| Please disable swap space. | |||||
| Seed node configuration | Each DC should have at least two seed nodes, provided the DC contains at least two nodes. IP addresses should be used rather than hostnames. All nodes should have the same seed list. | Low | Node, Cluster | Daily | Alert | 
| To correct this, please use the same seed list of IP addresses on all nodes (see the example after this table). | 
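A sketch of a consistent seed configuration in cassandra.yaml; the same list appears verbatim on every node, and the addresses below are placeholders for your own seed IPs:

```yaml
# cassandra.yaml (identical on every node in the cluster)
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # At least two seeds per DC, listed by IP address rather than hostname.
      # Placeholder addresses: 10.0.1.1/10.0.1.2 in DC1, 10.0.2.1/10.0.2.2 in DC2.
      - seeds: "10.0.1.1,10.0.1.2,10.0.2.1,10.0.2.2"
```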
Network Advisor
| Rule | Description/Recommendation | Importance | Scope | Interval (default) | Alert Level | 
|---|---|---|---|---|---|
| Different Listen and RPC Addresses | Checks that, if there are multiple network interfaces, Cassandra has been configured to use separate networks for the listen and rpc addresses. Note: When the listen_address field in the cassandra.yaml file is left blank, OpsCenter agents default to the same listen address as DSE in OpsCenter version 6.1.2 and later. | Medium | Node | Daily | Info | 
| Multiple networks have been detected, but you are using the same network for client and internal cluster communication (see the example after this table). | 
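For example, a node with separate private and client-facing interfaces would bind the two addresses to different networks in cassandra.yaml; the addresses below are placeholders:

```yaml
# cassandra.yaml
# Internal node-to-node traffic on a private interface (placeholder address):
listen_address: 10.0.0.5
# Client driver traffic on a separate interface (placeholder address):
rpc_address: 192.168.1.5
```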
OpsCenter Config Advisor
| Rule | Description/Recommendation | Importance | Scope | Interval (default) | Alert Level | 
|---|---|---|---|---|---|
| OpsCenter Failover Enabled | DataStax recommends configuring OpsCenter failover for high availability. | Low | OpsC | Daily | Alert | 
| There is no backup OpsCenter configured. Please enable failover for OpsCenter. | 
OS Advisor
| Rule | Description/Recommendation | Importance | Scope | Interval (default) | Alert Level | 
|---|---|---|---|---|---|
| Clocks in cluster out of sync | Checks that clocks across the cluster are in sync within a 2 second tolerance. | High | Node, Cluster | Daily | Alert | 
| The total drift across the cluster exceeds the tolerance of 2 seconds; please sync clocks on your nodes. Warning: Clock drift can cause issues when LCM attempts to generate SSL certificates. Keeping clocks synchronized is critical to ensure accurate timestamps for database operations and logging. | |||||
| Cassandra-user and agent-user match | Checks that Cassandra and the agent run as the same user. | High | Node | Daily | Alert | 
| Cassandra and the agent are not run as the same user. Please ensure that they run as the same user. | |||||
| Clocks in UTC | Checks that clocks across the nodes are in Coordinated Universal Time (UTC). | Low | Node | Daily | Alert | 
| Not all nodes are set to Coordinated Universal Time (UTC). Please ensure that all nodes use UTC. | |||||
| Require Oracle Java | Checks to make sure that Oracle Java is being used on the node. | Medium | Node | Daily | Alert | 
| An unsupported JDK is in use on the node. The Oracle/Sun HotSpot JDK is the preferred JDK and is well-tested with DataStax Enterprise. Switch to the Oracle HotSpot JDK if you are currently using OpenJDK (the default Java environment on many Linux distributions). Tip: Use LCM Config Profiles to manage Java installations. | 
Performance Advisor
Rules for read and write performance on nodes (the Performance Advisor is not to be confused with the Performance Services).
Tip: Use LCM Config Profiles to adjust the request timeout settings in cassandra.yaml and run a configuration job.
| Rule | Description/Recommendation | Importance | Scope | Interval (default) | Alert Level | 
|---|---|---|---|---|---|
| Read request timeout not optimal | Checks that the read request timeout on your nodes is not set above recommended values. | Medium | Node | Daily | Alert | 
| Significantly increasing the read request timeout on your nodes is not recommended. Please update cassandra.yaml on your nodes and lower the value of read_request_timeout_in_ms (see the example after this table). | |||||
| Write request timeout not optimal | Checks that the write request timeout on your nodes is not set above recommended values. | Medium | Node | Daily | Alert | 
| Significantly increasing the write request timeout on your nodes is not recommended. Please update cassandra.yaml on your nodes and lower the value of write_request_timeout_in_ms. | |||||
| Range request timeout not optimal | Checks that the range request timeout on your nodes is not set above recommended values. | Medium | Node | Daily | Alert | 
| Significantly increasing the range request timeout on your nodes is not recommended. Please update cassandra.yaml on your nodes and lower the value of range_request_timeout_in_ms. | 
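For reference, a cassandra.yaml sketch with the three request timeouts at their stock values (the milliseconds shown are the Cassandra defaults; verify them against your version's documentation):

```yaml
# cassandra.yaml
# How long the coordinator waits for read operations to complete (5000 ms default).
read_request_timeout_in_ms: 5000
# How long the coordinator waits for write operations to complete (2000 ms default).
write_request_timeout_in_ms: 2000
# How long the coordinator waits for range scans to complete (10000 ms default).
range_request_timeout_in_ms: 10000
```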
Performance Service - Slow Queries Advisor
For more information, see Slow Queries in the Performance Service.
| Rule | Description/Recommendation | Importance | Scope | Interval (default) | Alert Level | 
|---|---|---|---|---|---|
| Use prepared statements | Prepared statements reduce the workload on the coordinator by removing the overhead of parsing the query. | Medium | Cluster | Hourly | Info | 
| Use prepared statements for your queries. | |||||
| Avoid ALLOW FILTERING | Checks that ALLOW FILTERING is not used in queries. | Medium | Cluster | Hourly | Info | 
| ALLOW FILTERING causes a query to scan all data within a token range, which might be desired with analytic workloads but is not recommended for non-analytic workloads. ALLOW FILTERING can cause long running queries and consume excessive system resources. If using ALLOW FILTERING outside of an analytics workload, please consider a new data model based on the query pattern instead. | |||||
| Avoid using large batches | Using large batches seems like an optimization but doing so puts extra load on the coordinator, which can cause hotspots in the cluster. Queries run faster after breaking large batches into individual queries and distributing them to different nodes. | Medium | Cluster | Hourly | Info | 
| Break the batches into individual queries and distribute them to different nodes. | |||||
| Use counter instead of count | A count(*) query can be expensive, even with smaller limits. | Medium | Cluster | Hourly | Info | 
| Replace the logic with a counter you maintain. | |||||
| Minimize keys in IN clause | Large IN clauses give the impression of a single query, but they actually execute as multiple queries. | Medium | Cluster | Hourly | Info | 
| Issue individual asynchronous queries distributed among more coordinators. | 
Performance Service - Table Metrics Advisor
For more information, see Table Metrics in the Performance Service.
| Rule | Description/Recommendation | Importance | Scope | Interval (default) | Alert Level | 
|---|---|---|---|---|---|
| Wide partitions | Checks for excessively wide partitions. Excessively wide partitions have a negative impact on performance and are not recommended. A partition is considered to be wide when the size is greater than 100 MB. | Low | Node, Cluster | Hourly | Alert | 
| Excessively wide partitions have a negative impact on performance and are not recommended. Consider remodeling your data to break up wide partitions. | |||||
| Secondary indexes cardinality | Checks for secondary indexes with too many distinct values. | Low | Node, Cluster | Hourly | Alert | 
| High-cardinality secondary indexes can have a negative impact on system performance. Consider denormalizing the indexed data. | |||||
| Tombstone count | Number of tombstones processed during reads. | Low | Node, Cluster | Hourly | Alert | 
| Too many tombstones can cause a degradation of performance. This can even lead to query failures. | |||||
| Compaction Strategy | The compaction strategy you use should be based on your data and environment. This Best Practice rule is set to run so that you are aware of the importance of choosing a compaction strategy. If you have already chosen the correct compaction strategy based on your environment, please disable this rule if you do not want to see a reminder about compaction strategy again. | Low | Cluster | Hourly | Alert | 
| Choose the compaction strategy that best fits your data and environment. See Compaction strategies. | 
Performance Service - Thread Pools Advisor
For more information, see Thread Pool Statistics in the Performance Service.
| Rule | Description/Recommendation | Importance | Scope | Interval (default) | Alert Level | 
|---|---|---|---|---|---|
| Read Stage | Number of pending reads. | Low | Node | Hourly | Alert | 
| Too many pending reads, which could be related to disk problems, poor tuning, or cluster overload. Consider adding new nodes, tuning the system, and revisiting your data model. If not CPU or IO bound, try increasing concurrent_reads (see the example after this table). | |||||
| Mutation Stage | Number of pending mutations. | Low | Node | Hourly | Alert | 
| Too many pending mutations, which could be related to disk problems, poor tuning, or cluster overload. Please consider adding new nodes, tuning the system, and revisiting your data model. If not CPU or IO bound, try increasing concurrent_writes. | |||||
| ReplicateOnWriteStage Stress | Be careful when using CL.ONE counter increments: each increment kicks off an asynchronous task, involving a read, that runs after the increment completes. Too many processes in this pool will begin to block writes. | Medium | Node | Hourly | Info | 
| Reduce the use of CL.ONE counter increments or upgrade to Cassandra 2.1 or higher. | 
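If a node is neither CPU- nor I/O-bound, the concurrency settings live in cassandra.yaml; a sketch with the stock defaults of 32 for both, with common sizing rules of thumb as comments (general guidance, not OpsCenter output):

```yaml
# cassandra.yaml
# Concurrent read threads; a common rule of thumb is 16 x number of data disks.
concurrent_reads: 32
# Concurrent write threads; a common rule of thumb is 8 x number of CPU cores.
concurrent_writes: 32
```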
Replication Advisor
| Rule | Description/Recommendation | Importance | Scope | Interval (default) | Alert Level | 
|---|---|---|---|---|---|
| Replication factor out of bounds | Checks that your cluster does not have a replication factor higher than it can support. | Info | Cluster | Daily | Info | 
| Lists keyspaces that have a total RF higher than the number of nodes. Please update the replication factor for the appropriate keyspaces, or add additional nodes to your cluster. | |||||
| SimpleSnitch usage found | Checks to make sure SimpleSnitch isn't used in production. | Medium | Node | Daily | Info | 
| SimpleSnitch is not recommended for production clusters because it does not recognize datacenter or rack information. Please update the snitch to a topology-enabled snitch (see the example after this table). | |||||
| SimpleStrategy keyspace usage found | Checks that you are not using SimpleStrategy for any keyspaces in a multi-datacenter environment. | Medium | Cluster | Daily | Alert | 
| Please update the replication strategies of the relevant keyspace(s) to use NetworkTopologyStrategy. | 
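As an example of moving off SimpleSnitch, the snitch is set in cassandra.yaml; GossipingPropertyFileSnitch is one common topology-aware choice, but pick whichever snitch matches your deployment:

```yaml
# cassandra.yaml
# SimpleSnitch ignores datacenter and rack topology; not for production:
# endpoint_snitch: SimpleSnitch
# A topology-aware snitch that reads DC/rack from cassandra-rackdc.properties:
endpoint_snitch: GossipingPropertyFileSnitch
```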
Search Advisor
Advice for Solr search nodes. For more information, see DSE Search.
| Rule | Description/Recommendation | Importance | Scope | Interval (default) | Alert Level | 
|---|---|---|---|---|---|
| Vnodes enabled on Search nodes | Checks that vnodes are not in use on DataStax Enterprise search nodes for version 4.8 and below, or checks that there are either 16 or 32 vnodes on DataStax Enterprise search nodes for version 5.0 and above. | High | Node | Daily | Alert | 
| Please replace the current search nodes that have vnodes enabled with nodes without vnodes for version 4.8 and below, or with nodes with the correct number of vnodes for version 5.0 and up. | |||||
| Search nodes enabled with bad autocommit | Checks to see if a running Solr node has autocommit within 5-10 seconds. | Medium | Cluster | Daily | Alert | 
| Please modify your autocommit threshold to within 5-10 seconds. | |||||
| Search nodes enabled with query result cache | Checks to see if a running Solr node has query result cache disabled. | Medium | Cluster | Daily | Alert | 
| Please modify your Solr config to disable the queryResultCache. | |||||
| Search nodes with bad filter cache size | Checks to see if filter cache size is optimized for a running Solr node. | Medium | Cluster | Daily | Alert | 
| Please modify your filter cache size attribute to 128 if using solr.LRUCache. Otherwise, if using solr.search.SolrFilterCache, modify the highWaterMarkMB attribute to 256. | |||||
| Search nodes enabled with row cache | Checks to see if a Solr node has row cache enabled. | Medium | Node | Daily | Alert | 
| To optimize memory use for DSE Search with Solr, the row cache should be disabled. Edit the cassandra.yaml file and disable the row cache (see the example after this table). Tip: If using LCM, adjust the value in the Caches pane of cassandra.yaml in the appropriate LCM Config Profiles and run a configure job. | |||||
| Search nodes have default key cache size | Checks to see if a Solr node has key cache set to default size. | Medium | Node | Daily | Alert | 
| To optimize memory use for DSE Search with Solr, the key cache size should be set to its default size. Edit the cassandra.yaml file and ensure the key cache size is set to the recommended default size. Tip: If using LCM, adjust the value in the Caches pane of cassandra.yaml in the appropriate LCM Config Profiles and run a configure job. | |||||
| Search nodes have improper heap size | Checks to see if a Solr node has enough heap space. | Medium | Node | Daily | Alert | 
| For optimizing memory use for DSE search with Solr, the heap should be set to at least 14GB. Set the Solr node max heap to at least 14GB. | 
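For the row cache and key cache rules above, a cassandra.yaml sketch for a DSE Search node; 0 disables the row cache, and leaving key_cache_size_in_mb empty keeps the key cache at its default of min(5% of heap in MB, 100 MB):

```yaml
# cassandra.yaml
# Disable the row cache on DSE Search nodes (0 = disabled):
row_cache_size_in_mb: 0
# Leave empty for the default key cache size, min(5% of heap in MB, 100 MB):
key_cache_size_in_mb:
```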
Security Advisor
| Rule | Description/Recommendation | Importance | Scope | Interval (default) | Alert Level | 
|---|---|---|---|---|---|
| Security keyspace not properly replicated | Checks that the auth keyspace is replicated correctly when using PasswordAuthenticator. | High | Node, Cluster | Daily | Alert | 
| Please increase the replication of the system_auth keyspace. | |||||
| Security superuser has default setting | Checks that the default cassandra superuser and password has been changed from the default. | High | Cluster | Daily | Alert | 
| The security superuser has the default setting. Please update the password for the user 'cassandra'. Tip: Change the default password for the cassandra user in the Edit Cluster dialog of LCM for OpsCenter versions 6.5 and later. | |||||
| Improper Security authentication setting | Checks that the cassandra authentication is enabled and not set to AllowAllAuthenticator. | Medium | Node | Daily | Alert | 
| AllowAllAuthenticator performs no security checks and is not recommended. Please update cassandra.yaml on your nodes and change the authenticator from org.apache.cassandra.auth.AllowAllAuthenticator to org.apache.cassandra.auth.PasswordAuthenticator (see the example after this table). Tip: Change the authenticator in the Security pane of cassandra.yaml in the appropriate LCM Config Profiles. | |||||
| Incorrect OpsCenter authentication setting | Checks that the OpsCenter authentication is not set to the default if you are using DatastaxEnterpriseAuth. | High | Cluster | Daily | Alert | 
| Please change the default password of the admin user for OpsCenter authentication. | |||||
| Sensitive Config Value Encryption | It is recommended to enable encryption of sensitive config values in cassandra.yaml. | Medium | Node | Daily | Info | 
| Config value encryption is not enabled; the rule reports the nodes on which it failed. In dse.yaml, set config_encryption_active to true and use dsetool encryptconfigvalue to create encrypted config values for the sensitive fields (see the example after this table). For more information, see config_encryption_active and Transparent data encryption. Tip: If using LCM, adjust dse.yaml in the Encryption settings pane of the appropriate LCM Config Profiles. | 
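Tying the authenticator and config-encryption rules together, a sketch of the relevant settings; the authenticator lives in cassandra.yaml and config value encryption in dse.yaml, and both values below show the recommended state rather than the defaults:

```yaml
# cassandra.yaml
# Replace the permissive default with password-based authentication:
authenticator: org.apache.cassandra.auth.PasswordAuthenticator

# dse.yaml
# Require encrypted values for sensitive configuration fields;
# generate them with: dsetool encryptconfigvalue
config_encryption_active: true
```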
