General System/OS checks for DSE/Cassandra clusters

General System and OS checks.

Disclaimer

This document gives general recommendations for DataStax Enterprise (DSE) and Cassandra and requires basic DSE/Cassandra knowledge. It doesn’t replace the official documentation.

Each use case can be different. The numbers provided here are not hard limits, but general recommendations to remain in a comfortable/safe zone; they may need to be adjusted depending on your requirements.

Applying all the recommendations doesn’t strictly guarantee cluster health; likewise, breaking a rule isn’t necessarily an issue for a specific use case.

This document doesn’t cover troubleshooting of specific problems.

This document recommends a number of system-level checks to execute when troubleshooting or planning a new setup of Cassandra or DSE clusters. DataStax documentation includes chapters on recommended production settings (DSE 5.1 | 6.0 | 6.7 | 6.8). The DSE Troubleshooting guide includes a section on troubleshooting Linux-related issues, and DataStax Support maintains a knowledge base that should be checked for troubleshooting information. Most system-level recommendations are applicable to both DSE and Cassandra.

While it’s possible to perform the listed checks on individual nodes of the cluster, doing so may not be practical for multiple servers. In this case, it's easier to use diagnostic collection scripts that gather much of the information listed in this document.

Please note that Cassandra (since version 3.0) and DSE (since 5.0) perform some of the checks described below on every start. You can identify misconfigurations by looking in the logs for lines like this:
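(The exact wording and values vary by version and configuration; the sample below is only illustrative.)

    WARN  [main] ... - Cassandra server running in degraded mode. Is swap disabled? : false,  Address space adequate? : true,  nofile limit adequate? : true, nproc limit adequate? : false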



Infrastructure checks

Check network

  • Check for network-related errors (see the sketch after this list). For example, ifconfig reports per-interface packet counters. Investigate if you see a significant number (several percent of the total packets) in the RX and TX rows for:
    • errors - total number of packets received with errors (CRC, overruns, and so on). This could be a symptom of malfunctioning network hardware, either on the server or elsewhere in the network.
    • overruns - number of received packets that experienced FIFO overruns, which happen when packets arrive faster than the kernel can drain the receive buffer. You may need to tune the TCP parameters of the Linux kernel, as described below.
    • carrier (TX only) - number of packets that experienced loss of carrier. This could be a symptom of a flapping link.
  • Check network latency and throughput between nodes, and between clients and the cluster. You can do this with ping (simple case), iperf, mtr, iftop, or other tools. Inside the datacenter network, latency shouldn’t be more than 1 ms (millisecond).
  • Check connectivity between Cassandra/DSE nodes (DSE 5.1 | 6.0 | 6.7 | 6.8), OpsCenter (6.1 | 6.5 | 6.7 | 6.8), and clients on the specific ports used for communication, to make sure that a firewall doesn’t block communication. Use nc -zv host port to check whether a specific port is reachable.
  • Check that the Linux kernel is configured with the optimal parameters (DSE 5.1 | 6.0 | 6.7 | 6.8) for networking.
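
A minimal sketch of these network checks, assuming an interface named eth0, a peer node at 10.0.0.2, and the default Cassandra ports (all hypothetical; substitute your own interface, hosts, and ports):

    # Per-interface error/overrun/carrier counters
    ifconfig eth0            # or: ip -s link show eth0
    # Round-trip latency to another node (should stay around 1 ms inside a datacenter)
    ping -c 10 10.0.0.2
    # Verify that the internode (7000) and CQL (9042) ports are reachable
    nc -zv 10.0.0.2 7000
    nc -zv 10.0.0.2 9042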

Check disks

  • Check that SAN/NAS are not used for Cassandra data.
  • Check the disk type - spinning vs SSD (for example, using smartctl).
  • Check that no errors are reported by S.M.A.R.T. (for example, using smartmontools) or visible in the system logs.
  • Check if RAID is used. Don’t use RAID-5, RAID-6, or their variants, such as RAID-50/RAID-60, which have worse performance characteristics. Cassandra natively supports JBOD configurations, although RAID-0, RAID-1, or RAID-10 can be used as well.
  • Check disk latencies using tools like iostat (or better, iostat-cli); see the sketch after this list. Look for %iowait, avgrq-sz, avgqu-sz, await, r_await, and w_await.
    • await shows the average time (in ms) that an I/O request spends from the moment it is issued until it completes. High numbers (> 5 ms) show that the I/O system can’t cope with the required throughput.
    • r_await and w_await show whether the bottleneck is in read or write operations.
    • avgqu-sz shows the average queue length of requests issued to the given disk device. For SSDs it should be smaller than 10; it could be higher for HDDs.
  • When using rotational disks, check that you’re using separate disks for Cassandra data and commit logs. Be sure to separate them from the system disks.
  • Check that DSE Search data is placed on a separate disk (even on SSD).
  • Check that disk settings for DSE/Cassandra (DSE 5.1 | 6.0 | 6.7 | 6.8) data (readahead/scheduler…) are set correctly. Note that these settings may be different for SSDs and spinning disks. The readahead setting is very important for Cassandra’s performance: values lower than the default prevent the kernel from reading data that isn’t actually needed.
  • For Cassandra, use a filesystem that provides good performance, such as ext4 or xfs. These filesystems support large files and journaling, and perform better than older filesystems, such as ext2 and ext3.
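
A minimal sketch of these disk checks, assuming the data disk is /dev/sda (hypothetical; substitute your own device):

    # S.M.A.R.T. health, error counters, and disk type
    smartctl -a /dev/sda
    # Extended per-device latency and queue statistics, refreshed every 5 seconds
    iostat -x 5
    # Current readahead (in 512-byte sectors) and I/O scheduler
    blockdev --getra /dev/sda
    cat /sys/block/sda/queue/scheduler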

Check CPUs

  • Use lscpu or check /proc/cpuinfo to determine the CPU model, number of cores, frequency, and other information.
  • Check that the CPU scaling governor (DSE 5.1 | 6.0 | 6.7 | 6.8) is set to performance and that NUMA zone_reclaim_mode is disabled (see the sketch after this list).
  • On systems with multiple physical CPUs, check with lstopo or numactl to ensure that all PCI components are in NUMA mode.
  • Check /proc/interrupts to verify that interrupts are distributed between the different CPUs and not all assigned to the same one, which can overload a specific CPU. If necessary, tune the SMP affinity for specific interrupt requests.
  • Check CPU stats reported by iostat, vmstat, and so on. A lot of steal time or CPU ready time suggests that your virtual machine is over-allocated and/or that you have noisy neighbors (other workloads running on the same physical server as your virtual machine).
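
A minimal sketch of these CPU checks (cpu0 is shown as an example; in practice check every CPU):

    # CPU model, core count, frequency, and NUMA layout
    lscpu
    # Scaling governor should report "performance"
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    # zone_reclaim_mode should report 0 (disabled)
    cat /proc/sys/vm/zone_reclaim_mode
    # Interrupt distribution across CPUs
    cat /proc/interrupts
    # CPU utilization, including steal time (the "st" column)
    vmstat 5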

Operating system-level checks

For DSE [1], check that you’re using a supported operating system. It’s very important to ensure that compatible components are used.

System configuration checks

Make sure that the recommended production settings (DSE 5.1 | 6.0 | 6.7 | 6.8) are applied:
  1. Make sure that Hugepage defragmentation (DSE 5.1 | 6.0 | 6.7 | 6.8) is disabled; otherwise it can lead to unexpected pauses in garbage collection (see the sketch after this list).
  2. Check that Swap (DSE 5.1 | 6.0 | 6.7 | 6.8) is disabled.
  3. Set the system clock source to TSC when it's available:
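     A minimal sketch using the sysfs clocksource interface (run as root):

       # Check available and current clock sources
       cat /sys/devices/system/clocksource/clocksource0/available_clocksource
       cat /sys/devices/system/clocksource/clocksource0/current_clocksource
       # Switch to TSC if it is listed as available
       echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource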

  4. Check that system clocks (DSE 5.1 | 6.0 | 6.7 | 6.8) are synchronized on all nodes in the Cassandra cluster and on all clients that use Cassandra.
  5. Ensure that resource limits (DSE 5.1 | 6.0 | 6.7 | 6.8), such as the maximum number of open files, maximum number of processes, and memory lock limit, are configured correctly in /etc/security/limits.conf or a related file:
    1. Allow at least 100,000 open files (the nofile parameter).
    2. Allow at least 32,000 processes (the nproc parameter).
    3. Set the address space (as parameter) and maximum locked-in-memory address space (the memlock parameter) to unlimited.
  6. Set the maximum number of memory map areas per process (the vm.max_map_count parameter in /etc/sysctl.conf) to 1048575 or a higher value.
  7. Use a newer version of the Linux kernel. Kernels 3.13 or later have enhanced SSD support and provide better overall performance.
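
A minimal sketch for verifying some of these settings at runtime (paths may vary slightly between distributions):

    # Transparent hugepage defragmentation should report [never]
    cat /sys/kernel/mm/transparent_hugepage/defrag
    # No swap devices should be listed
    swapon --summary
    # Effective limits for the current shell/user (open files, processes, memlock, address space)
    ulimit -a
    # Current memory map area limit
    sysctl vm.max_map_count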

Make sure that the settings persist after a reboot. For example, Linux kernel parameters specified in the /etc/sysctl.conf file are automatically loaded from this file on reboot, swap is disabled via /etc/fstab, and resource limits are configured via /etc/security/limits.conf. A sketch of such persistent configuration is shown below.
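
A minimal sketch of the persistent configuration, assuming the DSE/Cassandra service runs as a hypothetical cassandra user (adjust the user name and values to your environment and distribution):

    # /etc/security/limits.conf (or a file under /etc/security/limits.d/)
    cassandra - nofile  100000
    cassandra - nproc   32768
    cassandra - memlock unlimited
    cassandra - as      unlimited

    # /etc/sysctl.conf (apply immediately with: sysctl -p)
    vm.max_map_count = 1048575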

[1] For Cassandra, no official list of supported platforms exists, but the requirements are the same as for DSE, so you can use the DSE supported platforms and settings as a base.

Check OS messages

Search system messages for any entries about out-of-memory (OOM) killer activity, segfaults, disk problems, network issues (typically, TCP SYN flood), and so on. Execute the following command to get system messages. The -T flag enforces human-readable timestamps.
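    dmesg -T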

Then search for the following lines (see the grep example after this list):
  • For OOM killer: Out of memory: Killed process NNN …; check whether the killed process was a Cassandra/DSE process.
  • For segfaults, shown as name[proc_id]: segfault at …, check whether it was a Cassandra/DSE process.
  • For TCP SYN flood (TCP: Possible SYN flooding on port NNN, especially if the given port belongs to Cassandra or DSE), follow the instructions specific to your Linux distribution to solve the problem (for example, see the RedHat documentation).
  • For disk problems, search for strings containing disk_id: failed command... or disk_id: exception…, where disk_id could be ata7.00, or similar, depending on the disk type.
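
A minimal sketch of searching the kernel log for these patterns in one pass (adjust the patterns as needed):

    dmesg -T | grep -iE 'out of memory|segfault|SYN flooding|failed command|exception'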

Permissions of the /tmp folder

The /tmp folder must allow execution, as it’s used for extracting and loading native code. If it doesn’t (for example, when it’s mounted with the noexec option), add -Dio.netty.native.workdir=<path> and -Djna.tmpdir=<path> to the jvm.options file, as in the sketch below.
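
A minimal sketch of the corresponding jvm.options entries, using /var/lib/cassandra/tmp as a hypothetical directory that must exist and be writable and executable by the DSE/Cassandra user:

    # Hypothetical paths - point these at a directory without the noexec restriction
    -Dio.netty.native.workdir=/var/lib/cassandra/tmp
    -Djna.tmpdir=/var/lib/cassandra/tmp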