General System/OS checks for DSE/Cassandra clusters

General System and OS checks.

Disclaimer

This document gives general recommendations for DataStax Enterprise (DSE) and Cassandra and requires basic DSE/Cassandra knowledge. It doesn’t replace the official documentation.

Each use case can be different. The numbers provided here are not hard limits, but general recommendations to remain in a comfortable/safe zone; they may need to be adjusted depending on your requirements.

Applying all the recommendations doesn’t strictly guarantee cluster health; likewise, breaking a rule isn’t necessarily an issue for a specific use case.

This document doesn’t cover troubleshooting of specific problems.

This document recommends a number of system-level checks to execute when troubleshooting or planning a new setup of Cassandra or DSE clusters. DataStax documentation includes chapters on recommended production settings (DSE 5.1 | 6.0 | 6.7 | 6.8). The DSE Troubleshooting guide includes a section on troubleshooting Linux-related issues, and DataStax Support maintains a knowledge base that should be checked for troubleshooting information. Most system-level recommendations are applicable to both DSE and Cassandra.

While it’s possible to perform the listed checks on individual nodes of the cluster, doing so may not be practical for multiple servers. In this case, it's easier to use diagnostic collection scripts that gather much of the information listed in this document.

Please note that Cassandra (since version 3.0) and DSE (since 5.0) perform some of the checks described below on every start. You can identify misconfigurations by looking in the logs for lines like this:
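(The exact wording and values vary by version and configuration; the sample below is only illustrative.)

    WARN  [main] ... - Cassandra server running in degraded mode. Is swap disabled? : false,  Address space adequate? : true,  nofile limit adequate? : true, nproc limit adequate? : false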



Infrastructure checks

Check network

  • Check for network-related errors (see the sketch after this list). For example, ifconfig reports per-interface packet counters. Investigate if you see a significant number (several percent of the total packets) in the RX and TX rows for:
    • errors - total number of packets received with errors (CRC, overruns, and so on). This could be a symptom of malfunctioning network hardware, either on the server or elsewhere in the network.
    • overruns - number of received packets that experienced FIFO overruns, which happen when packets arrive faster than the kernel can drain the receive buffer. You may need to tune the TCP parameters of the Linux kernel, as described below.
    • carrier (TX only) - number of packets that experienced loss of carrier. This could be a symptom of a flapping link.
  • Check network latency and throughput between nodes, and between clients and the cluster. You can do this with ping (simple case), iperf, mtr, iftop, or other tools. Inside the datacenter network, latency shouldn’t be more than 1 ms (millisecond).
  • Check connectivity between Cassandra/DSE nodes (DSE 5.1 | 6.0 | 6.7 | 6.8), OpsCenter (6.1 | 6.5 | 6.7 | 6.8), and clients on the specific ports used for communication, to make sure that a firewall doesn’t block communication. Use nc -zv host port to check whether a specific port is reachable.
  • Check that the Linux kernel is configured with the optimal parameters (DSE 5.1 | 6.0 | 6.7 | 6.8) for networking.
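
A minimal sketch of these network checks, assuming an interface named eth0, a peer node at 10.0.0.2, and the default Cassandra ports (all hypothetical; substitute your own interface, hosts, and ports):

    # Per-interface error/overrun/carrier counters
    ifconfig eth0            # or: ip -s link show eth0
    # Round-trip latency to another node (should stay around 1 ms inside a datacenter)
    ping -c 10 10.0.0.2
    # Verify that the internode (7000) and CQL (9042) ports are reachable
    nc -zv 10.0.0.2 7000
    nc -zv 10.0.0.2 9042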

Check disks

  • Check that SAN/NAS are not used for Cassandra data.
  • Check the disk type - spinning vs SSD (for example, using smartctl).
  • Check that no errors are reported by S.M.A.R.T. (for example, using smartmontools) or visible in the system logs.
  • Check if RAID is used. Don’t use RAID-5, RAID-6, or their variants, such as RAID-50/RAID-60, which have worse performance characteristics. Cassandra natively supports JBOD configurations, although RAID-0, RAID-1, or RAID-10 can be used as well.
  • Check disk latencies using tools like iostat (or better, iostat-cli); see the sketch after this list. Look for %iowait, avgrq-sz, avgqu-sz, await, r_await, and w_await.
    • await shows the average time (in ms) that an I/O request spends from the moment it is issued until it completes. High numbers (> 5 ms) show that the I/O system can’t cope with the required throughput.
    • r_await and w_await show whether the bottleneck is in read or write operations.
    • avgqu-sz shows the average queue length of requests issued to the given disk device. For SSDs it should be smaller than 10; it could be higher for HDDs.
  • When using rotational disks, check that you’re using separate disks for Cassandra data and commit logs. Be sure to separate them from the system disks.
  • Check that DSE Search data is placed on a separate disk (even on SSD).
  • Check that disk settings for DSE/Cassandra (DSE 5.1 | 6.0 | 6.7 | 6.8) data (readahead/scheduler…) are set correctly. Note that these settings may be different for SSDs and spinning disks. The readahead setting is very important for Cassandra’s performance: values lower than the default prevent the kernel from reading data that isn’t actually needed.
  • For Cassandra, use a filesystem that provides good performance, such as ext4 or xfs. These filesystems support large files and journaling, and perform better than older filesystems, such as ext2 and ext3.
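
A minimal sketch of these disk checks, assuming the data disk is /dev/sda (hypothetical; substitute your own device):

    # S.M.A.R.T. health, error counters, and disk type
    smartctl -a /dev/sda
    # Extended per-device latency and queue statistics, refreshed every 5 seconds
    iostat -x 5
    # Current readahead (in 512-byte sectors) and I/O scheduler
    blockdev --getra /dev/sda
    cat /sys/block/sda/queue/scheduler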

Check CPUs

  • Use lscpu or check /proc/cpuinfo to determine the CPU model, number of cores, frequency, and other information.
  • Check that the CPU scaling governor (DSE 5.1 | 6.0 | 6.7 | 6.8) is set to performance and that NUMA zone_reclaim_mode is disabled (see the sketch after this list).
  • On systems with multiple physical CPUs, check with lstopo or numactl to ensure that all PCI components are in NUMA mode.
  • Check /proc/interrupts to verify that interrupts are distributed between the different CPUs and not all assigned to the same one, which can overload a specific CPU. If necessary, tune the SMP affinity for specific interrupt requests.
  • Check CPU stats reported by iostat, vmstat, and so on. A lot of steal time or CPU ready time suggests that your virtual machine is over-allocated and/or that you have noisy neighbors (other workloads running on the same physical server as your virtual machine).
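
A minimal sketch of these CPU checks (cpu0 is shown as an example; in practice check every CPU):

    # CPU model, core count, frequency, and NUMA layout
    lscpu
    # Scaling governor should report "performance"
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    # zone_reclaim_mode should report 0 (disabled)
    cat /proc/sys/vm/zone_reclaim_mode
    # Interrupt distribution across CPUs
    cat /proc/interrupts
    # CPU utilization, including steal time (the "st" column)
    vmstat 5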

Operating system-level checks

For DSE [1], check that you’re using a supported operating system. It’s very important to ensure that compatible components are used.

System configuration checks

Make sure that the recommended production settings (DSE 5.1 | 6.0 | 6.7 | 6.8) are applied:
  1. Make sure that Hugepage defragmentation (DSE 5.1 | 6.0 | 6.7 | 6.8) is disabled; otherwise it can lead to unexpected pauses in garbage collection (see the sketch after this list).
  2. Check that Swap (DSE 5.1 | 6.0 | 6.7 | 6.8) is disabled.
  3. Set the system clock source to TSC when it's available:
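     A minimal sketch using the sysfs clocksource interface (run as root):

       # Check available and current clock sources
       cat /sys/devices/system/clocksource/clocksource0/available_clocksource
       cat /sys/devices/system/clocksource/clocksource0/current_clocksource
       # Switch to TSC if it is listed as available
       echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource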

  4. Check that system clocks (DSE 5.1 | 6.0 | 6.7 | 6.8) are synchronized on all nodes in the Cassandra cluster and on all clients that use Cassandra.
  5. Ensure that resource limits (DSE 5.1 | 6.0 | 6.7 | 6.8), such as the maximum number of open files, maximum number of processes, and memory lock limit, are configured correctly in /etc/security/limits.conf or a related file:
    1. Allow at least 100,000 open files (the nofile parameter).
    2. Allow at least 32,000 processes (the nproc parameter).
    3. Set the address space (as parameter) and maximum locked-in-memory address space (the memlock parameter) to unlimited.
  6. Set the maximum number of memory map areas per process (the vm.max_map_count parameter in /etc/sysctl.conf) to 1048575 or a higher value.
  7. Use a newer version of the Linux kernel. Kernels 3.13 or later have enhanced SSD support and provide better overall performance.
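
A minimal sketch for verifying some of these settings at runtime (paths may vary slightly between distributions):

    # Transparent hugepage defragmentation should report [never]
    cat /sys/kernel/mm/transparent_hugepage/defrag
    # No swap devices should be listed
    swapon --summary
    # Effective limits for the current shell/user (open files, processes, memlock, address space)
    ulimit -a
    # Current memory map area limit
    sysctl vm.max_map_count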

Make sure that the settings persist after a reboot. For example, Linux kernel parameters specified in the /etc/sysctl.conf file are automatically loaded from this file on reboot, swap is disabled via /etc/fstab, and resource limits are configured via /etc/security/limits.conf. A sketch of such persistent configuration is shown below.
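
A minimal sketch of the persistent configuration, assuming the DSE/Cassandra service runs as a hypothetical cassandra user (adjust the user name and values to your environment and distribution):

    # /etc/security/limits.conf (or a file under /etc/security/limits.d/)
    cassandra - nofile  100000
    cassandra - nproc   32768
    cassandra - memlock unlimited
    cassandra - as      unlimited

    # /etc/sysctl.conf (apply immediately with: sysctl -p)
    vm.max_map_count = 1048575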

[1] For Cassandra, no official list of supported platforms exists, but the requirements are the same as for DSE, so you can use the DSE supported platforms and settings as a base.

Check OS messages

Search system messages for any entries about out-of-memory (OOM) killer activity, segfaults, disk problems, network issues (typically, TCP SYN flood), and so on. Execute the following command to get system messages. The -T flag enforces human-readable timestamps.
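    dmesg -T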

Then search for the following lines (see the grep example after this list):
  • For OOM killer: Out of memory: Killed process NNN …; check whether the killed process was a Cassandra/DSE process.
  • For segfaults, shown as name[proc_id]: segfault at …, check whether it was a Cassandra/DSE process.
  • For TCP SYN flood (TCP: Possible SYN flooding on port NNN, especially if the given port belongs to Cassandra or DSE), follow the instructions specific to your Linux distribution to solve the problem (for example, see the RedHat documentation).
  • For disk problems, search for strings containing disk_id: failed command... or disk_id: exception…, where disk_id could be ata7.00, or similar, depending on the disk type.
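
A minimal sketch of searching the kernel log for these patterns in one pass (adjust the patterns as needed):

    dmesg -T | grep -iE 'out of memory|segfault|SYN flooding|failed command|exception'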

Permissions of the /tmp folder

The /tmp folder must allow execution, as it’s used for extracting and loading native code. If it doesn’t (for example, when it’s mounted with the noexec option), add -Dio.netty.native.workdir=<path> and -Djna.tmpdir=<path> to the jvm.options file, as in the sketch below.
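
A minimal sketch of the corresponding jvm.options entries, using /var/lib/cassandra/tmp as a hypothetical directory that must exist and be writable and executable by the DSE/Cassandra user:

    # Hypothetical paths - point these at a directory without the noexec restriction
    -Dio.netty.native.workdir=/var/lib/cassandra/tmp
    -Djna.tmpdir=/var/lib/cassandra/tmp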