Recommended production settings

DataStax recommends the following settings for using Hyper-Converged Database (HCD) in production environments.

Depending on your environment, some of the following settings might not persist after reboot. Check with your system administrator to ensure these settings are viable for your environment.

Java Virtual Machine

Hyper-Converged Database (HCD) is built to run on Java 11.

The Technology Compatibility Kit (TCK) for Java Standard Edition (Java SE) is a suite of tests that checks Java implementations for compliance with the JSR specification. The equivalent Java Compatibility Kit (JCK) checks for compliance of Java implementations based on OpenJDK.

Configure your operating system to use the latest build of a TCK- or JCK-certified version of Java 11, such as OpenJDK 11 or Oracle Java SE 11 (JRE or JDK).

HCD does not support earlier Java versions (for example, Java 8 or 9) or later Java versions (for example, Java 17 or 21).
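As a quick way to confirm which Java release a node would pick up, the following sketch extracts the major version from `java -version` output. The function name `java_major_version` is hypothetical and not part of HCD. The parsing assumes the usual `version "x.y.z"` banner, including the legacy `1.8` style used by Java 8.

```shell
# Hypothetical helper (not part of HCD): prints the Java major version
# found in the text passed as $1, which should be `java -version` output.
java_major_version() {
    # Pull the quoted version string out of the banner line.
    jmv_version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$jmv_version" in
        # Legacy scheme: "1.8.0_392" means Java 8.
        1.*) jmv_version=${jmv_version#1.}; printf '%s\n' "${jmv_version%%.*}" ;;
        # Modern scheme: "11.0.22" means Java 11.
        *)   printf '%s\n' "${jmv_version%%[.+-]*}" ;;
    esac
}
```

For example, `java_major_version "$(java -version 2>&1)"` should print `11` on a correctly configured node. The redirect is needed because `java -version` writes to standard error.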

Heap

The default JVM garbage collector for HCD is G1 GC.

If the heap size is not explicitly set, cassandra-env.sh automatically allocates between a quarter (¼) and half (½) of system memory, capped at 8 GB, to the JVM heap. However, G1 GC works best with larger heap sizes:

System memory	Recommended heap size	Applies to
32 GB	24 GB	Pure Cassandra OLTP workloads
64 GB	24 GB	Pure Cassandra OLTP workloads; HCD with vector search
Greater than 64 GB	31 GB	HCD with vector search

Do not set the young generation size with -Xmn.

G1 GC dynamically adjusts the young generation size to meet the GC pause target -XX:MaxGCPauseMillis (JVM default is 200ms). A fixed young gen size overrides the GC pause target.
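The heap table can be turned into explicit JVM flags with a small helper. This is a sketch under stated assumptions, not the HCD configuration mechanism. The function names are hypothetical. Memory sizes between 32 GB and 64 GB are assumed to use the 24 GB row. Sizes below 32 GB produce no output because the table gives no recommendation for them. Setting `-Xms` equal to `-Xmx` is a common practice assumed here, not something stated above.

```shell
# Hypothetical helper: prints the recommended heap size in GB for a
# given amount of system memory in GB ($1), following the table above.
recommended_heap_gb() {
    if [ "$1" -gt 64 ]; then
        echo 31
    elif [ "$1" -ge 32 ]; then
        # Assumption: 32 GB through 64 GB all map to the 24 GB row.
        echo 24
    fi
    # Below 32 GB the table gives no recommendation, so print nothing.
}

# Hypothetical helper: prints fixed-size heap flags for that heap size.
# Note that no -Xmn flag is emitted, in line with the guidance above.
heap_flags() {
    hf_heap=$(recommended_heap_gb "$1")
    [ -n "$hf_heap" ] && printf '%s\n' "-Xms${hf_heap}G -Xmx${hf_heap}G"
}
```

For example, `heap_flags 128` prints `-Xms31G -Xmx31G`.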

System clock

Use Network Time Protocol (NTP) to synchronize the clocks on all HCD nodes and application servers.

Synchronized clocks are required because HCD overwrites a column only if another version has a more recent timestamp. When clocks drift, which can happen when machines are in different locations, a newer write can carry an older timestamp and be ignored.

HCD timestamps are encoded as microseconds since the UNIX epoch, which does not include timezone information. The timestamp for all writes in HCD is Coordinated Universal Time (UTC). DataStax recommends converting to local time only when generating output to be read by humans.

Make sure NTP is installed and operational to prevent clock drift.

Linux distribution	NTP package
Debian systems	Use `timedatectl` and `timesyncd` installed by default as part of `systemd`. Alternatively, use `chrony`.
RHEL-based systems	Use `chrony`.
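On systems that use `timedatectl`, synchronization status can be checked from a script. The following is a minimal sketch. The function name `clock_is_synchronized` is hypothetical, and it assumes a systemd version whose `timedatectl` supports the `show` subcommand.

```shell
# Hypothetical helper: succeeds when the text in $1 reports a synchronized
# clock. $1 is expected to be the output of:
#   timedatectl show --property=NTPSynchronized
clock_is_synchronized() {
    printf '%s\n' "$1" | grep -qx 'NTPSynchronized=yes'
}
```

For example, `clock_is_synchronized "$(timedatectl show --property=NTPSynchronized)" || echo "clock is not synchronized"`. On systems that use `chrony`, `chronyc tracking` reports comparable status information.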

Kernel

Configure the following kernel parameters for optimal traffic and user limits.

Run the following command to view all current Linux kernel settings:

sudo sysctl -a

TCP settings

keepalive

During low traffic periods, a firewall configured with an idle connection timeout can close connections to local nodes and nodes in other data centers. To prevent connections between nodes from timing out, set the following TCP keepalive values:

sudo sysctl -w \
net.ipv4.tcp_keepalive_time=60 \
net.ipv4.tcp_keepalive_probes=3 \
net.ipv4.tcp_keepalive_intvl=10

These values set the TCP keepalive timeout to 60 seconds, followed by 3 probes with a 10-second gap between probes. With these settings, dead TCP connections are detected after 90 seconds (60 + 10 + 10 + 10). When the additional traffic is negligible, it is safe to persist these TCP keepalive timeout settings. For more information, see streaming_keep_alive_period_in_secs.

In addition to the TCP keepalive settings, you can prevent reset connections during streaming by tuning the streaming_keep_alive_period_in_secs setting in cassandra.yaml.
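The 90-second detection time follows directly from the three keepalive values: the idle time plus one interval per probe. The sketch below makes that arithmetic explicit so that other combinations can be evaluated. The function name is hypothetical.

```shell
# Hypothetical helper: prints the number of seconds before a dead
# connection is detected.
# $1 = tcp_keepalive_time, $2 = tcp_keepalive_probes, $3 = tcp_keepalive_intvl
keepalive_detection_seconds() {
    echo $(( $1 + $2 * $3 ))
}

keepalive_detection_seconds 60 3 10   # prints 90
```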

Concurrent connections

Change the following settings to handle thousands of concurrent connections used by the database:

sudo sysctl -w \
net.core.rmem_max=16777216 \
net.core.wmem_max=16777216 \
net.core.rmem_default=16777216 \
net.core.wmem_default=16777216 \
net.core.optmem_max=40960 \
net.ipv4.tcp_rmem='4096 87380 16777216' \
net.ipv4.tcp_wmem='4096 65536 16777216'

Persist settings

To persist the kernel settings across server reboots, add the following values to the /etc/sysctl.conf file:

net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=10
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.rmem_default=16777216
net.core.wmem_default=16777216
net.core.optmem_max=40960
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216

Load the settings using one of the following commands:

sudo sysctl -p /etc/sysctl.conf

sudo sysctl -p /etc/sysctl.d/*.conf
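After loading the settings, it can be useful to confirm that the live kernel values match the file. The following sketch compares each `key=value` line in a file with the corresponding entry under `/proc/sys`. The function names and the `SYSCTL_ROOT` variable are hypothetical. The sketch assumes simple keys in which every dot is a path separator.

```shell
# Root of the procfs sysctl tree. Hypothetical override point for testing.
SYSCTL_ROOT=${SYSCTL_ROOT:-/proc/sys}

# Converts a key such as net.core.rmem_max to its path under SYSCTL_ROOT.
sysctl_key_to_path() {
    printf '%s/%s\n' "$SYSCTL_ROOT" "$(printf '%s' "$1" | tr '.' '/')"
}

# Collapses runs of whitespace (including tabs) and trims both ends.
normalize_ws() {
    printf '%s' "$1" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//'
}

# Reads key=value lines from the file in $1 and reports every key whose
# live value differs. Returns non-zero if any mismatch is found.
check_sysctl_file() {
    csf_mismatches=0
    while IFS='=' read -r csf_key csf_value || [ -n "$csf_key" ]; do
        csf_key=$(normalize_ws "$csf_key")
        # Skip blank lines and comments.
        case "$csf_key" in ''|'#'*|';'*) continue ;; esac
        csf_expected=$(normalize_ws "$csf_value")
        csf_path=$(sysctl_key_to_path "$csf_key")
        csf_actual=$(normalize_ws "$(cat "$csf_path" 2>/dev/null)")
        if [ "$csf_actual" != "$csf_expected" ]; then
            echo "MISMATCH $csf_key: expected '$csf_expected', found '$csf_actual'"
            csf_mismatches=1
        fi
    done < "$1"
    return $csf_mismatches
}
```

For example, `check_sysctl_file /etc/sysctl.conf` prints nothing when every listed setting is in effect.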

To confirm the user limits are applied to the HCD process, run the following command where <pid> is the process ID of the currently running HCD process:
```
cat /proc/<pid>/limits
```

Set user resource limits

Use the ulimit -a command to view the current limits. Although limits can also be temporarily set using this command, DataStax recommends making the changes permanent.

For Debian-based systems

Edit the /etc/pam.d/su file and uncomment the following line to enable the pam_limits.so module:

session    required   pam_limits.so

This change to the PAM configuration file ensures that the system reads the files in the /etc/security/limits.d directory.

If you run HCD as root, some Linux distributions (such as Ubuntu) require setting the limits for the root user explicitly instead of the cassandra user:

root - memlock unlimited
root - nofile 1048576
root - nproc 32768
root - as unlimited

For RHEL-based systems

Set the nproc limits to 32768 in the /etc/security/limits.d/90-nproc.conf configuration file:

cassandra_user - nproc 32768

For all systems

Add the following line to /etc/sysctl.conf:
```
vm.max_map_count = 1048575
```

Set the following limits for the HCD user (OS user cassandra) in /etc/security/limits.d/cassandra.conf:

cassandra - memlock unlimited
cassandra - nofile 1048576
cassandra - nproc 32768
cassandra - as unlimited

Reboot the server or run the following command to make all changes take effect:
```
sudo sysctl -p
```
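Once the HCD process has been restarted, the limits it actually runs with can be read from its `/proc/<pid>/limits` file. The following sketch extracts the soft limit for one row of that file. The function name `soft_limit` is hypothetical. The row labels correspond to the limits above as follows: `nofile` is `Max open files`, `nproc` is `Max processes`, `memlock` is `Max locked memory`, and `as` is `Max address space`.

```shell
# Hypothetical helper: prints the soft limit for one row of a limits file.
# $1 = path to a /proc/<pid>/limits file
# $2 = row label, for example "Max open files"
soft_limit() {
    awk -v label="$2" '
        index($0, label) == 1 {
            # Everything after the label is: soft limit, hard limit, units.
            rest = substr($0, length(label) + 1)
            split(rest, fields, " ")
            print fields[1]
        }' "$1"
}
```

For example, `soft_limit /proc/<pid>/limits "Max open files"` should print `1048576` when the settings above are in effect.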

Performance throttles

Disable the following settings, which can cause issues with performance.

CPU frequency scaling

Recent Linux systems include a feature called CPU frequency scaling or CPU speed scaling. This feature allows a server’s clock speed to be dynamically adjusted so that the server can run at lower clock speeds when the demand or load is low. This change reduces the server’s power consumption and heat output, which significantly impacts cooling costs. Unfortunately, this behavior has a detrimental effect on servers running HCD because throughput can be capped at a lower rate.

On most Linux systems, a CPUfreq governor manages the scaling of frequencies based on defined rules. The default ondemand governor switches the clock frequency to maximum when demand is high, and switches to the lowest frequency when the system is idle.

Do not use governors that lower the CPU frequency. To ensure optimal performance, reconfigure all CPUs to use the performance governor, which locks the frequency at maximum.

The performance governor will not switch frequencies, which means that power savings will be bypassed to always run at maximum throughput. On most systems, run the following command to set the governor:

# Set the performance governor on every CPU that exposes cpufreq (run as root).
for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
    [ -f "$CPUFREQ" ] || continue
    echo -n performance > "$CPUFREQ"
done

If this directory does not exist on your system, refer to one of the following pages based on your operating system:

Debian-based systems: cpufreq-set utility
RHEL-based systems: CPUfreq setup

For more information, see High server load and latency when CPU frequency scaling is enabled in the DataStax Help Center.
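To confirm that every CPU uses the performance governor, the same sysfs files can be read back. The following is a minimal sketch. The function name and the `CPU_SYSFS_ROOT` variable are hypothetical.

```shell
# Root of the per-CPU sysfs tree. Hypothetical override point for testing.
CPU_SYSFS_ROOT=${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}

# Prints one line for each CPU whose governor is not "performance".
# Prints nothing when all CPUs are configured as recommended.
non_performance_governors() {
    for npg_file in "$CPU_SYSFS_ROOT"/cpu*/cpufreq/scaling_governor; do
        [ -f "$npg_file" ] || continue
        npg_governor=$(cat "$npg_file")
        [ "$npg_governor" = "performance" ] || echo "$npg_file: $npg_governor"
    done
}
```

Running `non_performance_governors` with no output indicates that the governor setting is applied on all CPUs that expose cpufreq.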

Zone reclaim mode

The Linux kernel can be inconsistent in enabling/disabling zone_reclaim_mode, which can result in odd performance problems.

To ensure that zone_reclaim_mode is disabled:

echo 0 > /proc/sys/vm/zone_reclaim_mode

Swap

Failure to disable swap entirely can severely lower performance. Because the database has multiple replicas and transparent failover, it is preferable for a replica to be killed immediately when memory is low rather than go into swap. This allows traffic to be immediately redirected to a functioning replica instead of continuing to hit the replica that has high latency due to swapping. If your system has a lot of DRAM, swapping still lowers performance significantly because the OS swaps out executable code so that more DRAM is available for caching disks.

If you must use swap, set vm.swappiness=1, which allows the kernel to swap out only the least-used pages. Otherwise, disable swap on all devices immediately:

sudo swapoff --all

To make this change permanent, remove all swap file entries from /etc/fstab.
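Both halves of this change can be verified from a script: that no swap is active now, and that no swap entry remains to re-enable it at boot. The following sketch reads `/proc/swaps` and `/etc/fstab`. The function names are hypothetical.

```shell
# Hypothetical helper: prints the active swap devices listed in a
# /proc/swaps-style file (default: /proc/swaps). The first line is a header.
active_swap_devices() {
    awk 'NR > 1 { print $1 }' "${1:-/proc/swaps}"
}

# Hypothetical helper: prints the device of each uncommented entry whose
# filesystem type is "swap" in an fstab-style file (default: /etc/fstab).
fstab_swap_entries() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "${1:-/etc/fstab}"
}
```

When swap is fully disabled, both `active_swap_devices` and `fstab_swap_entries` print nothing.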

Disk drives

The default disk configurations on most Linux distributions are not optimal. Follow these steps to optimize settings for your Solid State Drives (SSDs) or spinning disks.

Complete the optimization settings for either SSDs or spinning disks. Do not complete both procedures for either storage type.

Optimize SSDs

Complete the following steps to ensure the best settings for SSDs.

Ensure that the SysFS rotational flag is set to false (zero).

This overrides any detection by the operating system to ensure the drive is considered an SSD.
Apply the same rotational flag setting for any block devices created from SSD storage, such as mdarrays.

Determine your devices by running lsblk:

lsblk

NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda    253:0    0  32G  0 disk
|
|-sda1 253:1    0   8M  0 part
|-sda2 253:2    0  32G  0 part /

In this example, the current devices are sda1 and sda2.

Set the IO scheduler to either mq-deadline or none for each of the listed devices.

The none scheduler is the right choice when the target block device is an array of SSDs behind a high-end IO controller that performs IO optimization.
```
echo none > /sys/block/<device_name>/queue/scheduler
```
The mq-deadline scheduler optimizes requests to minimize IO latency. If in doubt, use the mq-deadline scheduler.
```
echo mq-deadline > /sys/block/<device_name>/queue/scheduler
```
Set the nr_requests value to indicate the maximum number of read and write requests that can be queued. For large machines, the recommended queue size is 128. Otherwise, set the queue size to 32. For example:
```
echo 128 > /sys/block/<device_name>/queue/nr_requests
```
Set the readahead value for the block device to 8 KB.

This setting tells the operating system not to read extra bytes, which can increase IO time and pollute the cache with bytes that weren’t requested by the user.

The recommended readahead setting for RAID on SSDs is the same as that for SSDs that are not in a RAID configuration.

Add the following lines to /etc/rc.local:
```
touch /var/lock/subsys/local
echo 0 > /sys/class/block/sda/queue/rotational
echo 8 > /sys/class/block/sda/queue/read_ahead_kb
```
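The SSD steps above can be collected into one function that is applied to each device. This is a sketch, not a DataStax-provided script. The function name and the `SYSFS_BLOCK_ROOT` variable are hypothetical. It must run as root, and some devices or schedulers might reject a given `nr_requests` value.

```shell
# Root of the block device sysfs tree. Hypothetical override point for testing.
SYSFS_BLOCK_ROOT=${SYSFS_BLOCK_ROOT:-/sys/block}

# Applies the SSD settings described above to one block device.
# $1 = device name (for example sda)
# $2 = IO scheduler, mq-deadline (default) or none
# $3 = nr_requests, 128 for large machines, otherwise 32 (default)
apply_ssd_settings() {
    ass_queue="$SYSFS_BLOCK_ROOT/$1/queue"
    [ -d "$ass_queue" ] || { echo "no queue directory for $1" >&2; return 1; }
    echo 0 > "$ass_queue/rotational"                   # treat the device as an SSD
    echo "${2:-mq-deadline}" > "$ass_queue/scheduler"  # IO scheduler
    echo "${3:-32}" > "$ass_queue/nr_requests"         # request queue size
    echo 8 > "$ass_queue/read_ahead_kb"                # 8 KB readahead
}
```

For example, `apply_ssd_settings sda none 128` applies the settings for a large machine whose SSD array sits behind a high-end IO controller.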

Optimize spinning disks

Check that the readahead value is not set to 65536:
```
sudo blockdev --report /dev/<spinning_disk>
```
Set the readahead to 128, which is the recommended value:
```
sudo blockdev --setra 128 /dev/<spinning_disk>
```
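The `blockdev` readahead value is expressed in 512-byte sectors, not kilobytes, so it is not directly comparable with the `read_ahead_kb` value used for SSDs. The following sketch converts between the two units. The function name is hypothetical.

```shell
# Hypothetical helper: converts a blockdev readahead value, which is
# counted in 512-byte sectors, to kilobytes.
readahead_sectors_to_kb() {
    echo $(( $1 * 512 / 1024 ))
}

readahead_sectors_to_kb 128     # prints 64
readahead_sectors_to_kb 65536   # prints 32768
```

The recommended value of 128 therefore corresponds to 64 KB of readahead, while 65536 corresponds to 32 MB.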

Transparent Hugepages

Many modern Linux distributions ship with the Transparent Hugepages (THP) feature enabled by default.

When Linux uses THP, the kernel tries to allocate memory in large chunks (usually 2 MB) rather than in 4 KB pages. This allocation can improve performance by reducing the number of pages the CPU must track. However, some applications still allocate memory based on 4 KB pages, which can cause noticeable performance problems when Linux tries to defragment 2 MB pages.

DataStax recommends disabling defrag for THP:

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
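The kernel reports the active THP option in square brackets when the file is read back, for example `always madvise [never]`. The following sketch extracts that active value so the setting can be checked from a script. The function name is hypothetical. Because a value written to sysfs does not survive a reboot on its own, a check like this is also useful after restarts.

```shell
# Hypothetical helper: prints the active option (the one in brackets)
# from a THP sysfs file. Defaults to the defrag file used above.
active_thp_setting() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "${1:-/sys/kernel/mm/transparent_hugepage/defrag}"
}
```

After applying the recommendation, `active_thp_setting` should print `never`.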

See the following references for more information:

Amy Tobey’s blog post TL;DR: Cassandra, Java, Huge Pages
Red Hat bug report #879801 khugepaged eating 100% CPU
