Recommended production settings

DataStax recommends the following settings for using DataStax Enterprise in production environments.

Depending on your environment, some of the following settings might not be persisted after reboot. Check with your system administrator to ensure they are viable for your environment.

Run the following command to view all current Linux kernel settings:

sudo sysctl -a

Use the Preflight check tool to run a collection of tests on a DataStax Enterprise node to detect and fix node configurations. The tool can detect and optionally fix many invalid or suboptimal configuration settings, such as user resource limits, swap, and disk settings.

Install the latest Java Virtual Machine

Configure your operating system to use the latest build of a Technology Compatibility Kit (TCK) Certified OpenJDK version 8. For example, OpenJDK 8 (1.8.0_151 minimum). Java 9 is not supported.

Although Oracle JRE/JDK 8 is supported, DataStax does more extensive testing on OpenJDK 8. This change is due to the end of public updates for Oracle JRE/JDK 8.

See the installation instructions for your operating system:

Synchronize clocks

Use Network Time Protocol (NTP) to synchronize the clocks on all nodes and application servers.

Synchronizing clocks is required because DataStax Enterprise (DSE) overwrites a column only if another version has a more recent timestamp, and timestamps can disagree when machines are in different locations.

DSE timestamps are encoded as microseconds since the UNIX epoch, without timezone information. The timestamp for all writes in DSE is Coordinated Universal Time (UTC). DataStax recommends converting to local time only when generating output to be read by humans.
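As a rough illustration of that recommendation, the following sketch converts a write timestamp from microseconds since the UNIX epoch to a UTC string for display. The function name `micros_to_utc` is hypothetical, and the `-d @seconds` form assumes GNU date, which most Linux distributions provide.

```shell
# Hypothetical helper: format a write timestamp (microseconds since the
# UNIX epoch) as a human-readable UTC string.
# Assumes GNU date, which accepts "-d @<seconds>".
micros_to_utc() {
    # Integer division drops the sub-second part.
    date -u -d "@$(( $1 / 1000000 ))" '+%Y-%m-%dT%H:%M:%SZ'
}

# Example: micros_to_utc 1500000000000000
```

Keeping the stored value in UTC and formatting only at display time avoids ambiguity when nodes and clients are in different time zones.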

Install NTP for your operating system:

Operating system	Command
Debian-based system	`sudo apt-get install ntpdate`
RHEL-based system¹	`sudo yum install ntpdate`
¹On RHEL 7 and later, chrony is the default network time protocol daemon. The configuration file for chrony is located in `/etc/chrony.conf` on these systems.

Start the NTP service on all nodes:
```
sudo service ntp start -x
```
Run the ntpdate command to synchronize clocks:
```
sudo ntpdate 1.ro.pool.ntp.org
```
Verify that your NTP configuration is working:
```
ntpstat
```

Set kernel parameters

Configure the following kernel parameters for optimal traffic and user limits.

TCP settings

During low traffic intervals, a firewall configured with an idle connection timeout can close connections to local nodes and nodes in other data centers. To prevent connections between nodes from timing out, set the following network kernel settings:

Set the following TCP keepalive timeout values:
```
sudo sysctl -w \
net.ipv4.tcp_keepalive_time=60 \
net.ipv4.tcp_keepalive_probes=3 \
net.ipv4.tcp_keepalive_intvl=10
```
These values set the TCP keepalive timeout to 60 seconds with 3 probes and a 10-second gap between each probe. The settings detect dead TCP connections after 90 seconds (60 + 10 + 10 + 10). The additional traffic is negligible, and leaving these settings in place permanently is not an issue. See Firewall idle connection timeout causes nodes to lose communication during low traffic times on Linux.
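The arithmetic in the paragraph above can be sketched as a small function, which may help when evaluating other combinations of values against a firewall's idle timeout. The function name `keepalive_detection_seconds` is hypothetical and only mirrors the calculation described in the text.

```shell
# Hypothetical helper: approximate time, in seconds, before an idle dead
# TCP connection is detected, following the arithmetic in the text.
# Arguments: tcp_keepalive_time, tcp_keepalive_probes, tcp_keepalive_intvl
keepalive_detection_seconds() {
    echo $(( $1 + $2 * $3 ))
}

# With the recommended values: keepalive_detection_seconds 60 3 10
```

The first keepalive probe must be sent before the firewall's idle connection timeout expires, so the first argument is the value to compare against that timeout.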

Change the following settings to handle thousands of concurrent connections used by the database:

sudo sysctl -w \
net.core.rmem_max=16777216 \
net.core.wmem_max=16777216 \
net.core.rmem_default=16777216 \
net.core.wmem_default=16777216 \
net.core.optmem_max=40960 \
net.ipv4.tcp_rmem='4096 87380 16777216' \
net.ipv4.tcp_wmem='4096 65536 16777216'

Instead of changing the system TCP settings, you can prevent reset connections during streaming by tuning the streaming_keep_alive_period_in_secs setting in cassandra.yaml.

The location of the cassandra.yaml file depends on the type of installation:

Installation Type	Location
Package installations + Installer-Services installations	`/etc/dse/cassandra/cassandra.yaml`
Tarball installations + Installer-No Services installations	`installation_location/resources/cassandra/conf/cassandra.yaml`

Set user resource limits

Use the ulimit -a command to view the current limits. Although limits can also be temporarily set using this command, DataStax recommends making the changes permanent.

For more information, see Recommended production settings.

Debian-based systems

Edit the /etc/pam.d/su file and uncomment the following line to enable the pam_limits.so module:
```
session    required   pam_limits.so
```
This change to the PAM configuration file ensures that the system reads the files in the /etc/security/limits.d directory.
If you run DSE as root, some Linux distributions (such as Ubuntu) require setting the limits for the root user explicitly instead of using cassandra_user:
```
root - memlock unlimited
root - nofile 1048576
root - nproc 32768
root - as unlimited
```

RHEL-based systems

Set the nproc limits to 32768 in the /etc/security/limits.d/90-nproc.conf configuration file:
```
cassandra_user - nproc 32768
```

All systems

Add the following line to /etc/sysctl.conf:
```
vm.max_map_count = 1048575
```
Open the configuration file for your installation type:

Installation type	Configuration file
Tarball installation	`/etc/security/limits.conf`
Package installation	`/etc/security/limits.d/cassandra.conf`

Configure the following settings for the <cassandra_user> in the configuration file:

<cassandra_user> - memlock unlimited
<cassandra_user> - nofile 1048576
<cassandra_user> - nproc 32768
<cassandra_user> - as unlimited

Reboot the server or run the following command to make all changes take effect:
```
sudo sysctl -p
```

Persist updated settings

Add the following values to the /etc/sysctl.conf file:

net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=10
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.rmem_default=16777216
net.core.wmem_default=16777216
net.core.optmem_max=40960
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216

Load the settings using one of the following commands:

sudo sysctl -p /etc/sysctl.conf

sudo sysctl -p /etc/sysctl.d/*.conf
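After loading the settings, it can be useful to confirm that the running kernel reports the recommended values. The following is a minimal sketch, not a DataStax tool: the function name `check_sysctl` and the `SYSCTL_ROOT` variable are hypothetical, and the sketch assumes that kernel parameters are exposed as files under /proc/sys, as they are on Linux.

```shell
# Hypothetical helper: compare one kernel parameter with a recommended value.
# SYSCTL_ROOT is an assumption added so the function can be pointed at a
# test directory; it defaults to /proc/sys.
check_sysctl() {
    cs_key=$1
    cs_expected=$2
    # net.core.rmem_max -> /proc/sys/net/core/rmem_max
    cs_file="${SYSCTL_ROOT:-/proc/sys}/$(printf '%s' "$cs_key" | tr . /)"
    if [ ! -r "$cs_file" ]; then
        echo "MISSING  $cs_key"
        return 2
    fi
    # Multi-value keys such as tcp_rmem are tab-separated in /proc/sys;
    # collapse all whitespace to single spaces before comparing.
    cs_actual=$(tr -s '[:space:]' ' ' < "$cs_file" | sed 's/^ //; s/ $//')
    if [ "$cs_actual" = "$cs_expected" ]; then
        echo "OK       $cs_key = $cs_actual"
    else
        echo "MISMATCH $cs_key = $cs_actual (recommended: $cs_expected)"
        return 1
    fi
}

# Examples:
#   check_sysctl net.ipv4.tcp_keepalive_time 60
#   check_sysctl net.ipv4.tcp_rmem '4096 87380 16777216'
```

The function returns 0 on a match, 1 on a mismatch, and 2 when the key cannot be read, so it can be called in a loop over the values listed above.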

To confirm the user limits are applied to the DSE process, run the following command where pid is the process ID of the currently running DSE process:
```
cat /proc/pid/limits
```
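The limits file lists one row per limit with soft and hard columns, so a single value can be pulled out for comparison with the configured settings. The sketch below assumes the standard Linux layout of /proc/pid/limits (row label, soft limit, hard limit, units); the function name `soft_limit` is hypothetical.

```shell
# Hypothetical helper: print the soft limit for a named row of a
# /proc/<pid>/limits listing read from standard input.
soft_limit() {
    awk -v name="$1" '
        index($0, name) == 1 {
            # Drop the row label; the first remaining field is the soft limit.
            split(substr($0, length(name) + 1), fields, " ")
            print fields[1]
        }'
}

# Example, where pid is the process ID of the running DSE process:
#   soft_limit "Max open files" < /proc/pid/limits
```

For the settings in this section, the rows of interest are "Max open files" (nofile), "Max processes" (nproc), "Max locked memory" (memlock), and "Max address space" (as).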

Disable settings that impact performance

Disable the following settings, which can cause issues with performance.

Disable CPU frequency scaling

Recent Linux systems include a feature called CPU frequency scaling or CPU speed scaling. This feature allows a server’s clock speed to be dynamically adjusted so that the server can run at lower clock speeds when the demand or load is low. This change reduces the server’s power consumption and heat output, which significantly impacts cooling costs. Unfortunately, this behavior has a detrimental effect on servers running DSE, because throughput can be capped at a lower rate.

On most Linux systems, a CPUfreq governor manages the scaling of frequencies based on defined rules. The default ondemand governor switches the clock frequency to maximum when demand is high, and switches to the lowest frequency when the system is idle.

Do not use governors that lower the CPU frequency. To ensure optimal performance, reconfigure all CPUs to use the performance governor, which locks the frequency at maximum.

The performance governor will not switch frequencies, which means that power savings will be bypassed to always run at maximum throughput. On most systems, run the following command to set the governor:

for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
    [ -f "$CPUFREQ" ] || continue
    echo -n performance > "$CPUFREQ"
done

If this directory does not exist on your system, refer to one of the following pages based on your operating system:

Debian-based systems: cpufreq-set command on Debian systems
RHEL-based systems: CPUfreq setup on RHEL systems

For more information, see High server load and latency when CPU frequency scaling is enabled in the DataStax Help Center.
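To confirm that the governor change took effect on every CPU, a check along the following lines may help. This is a sketch under assumptions: the function name `non_performance_cpus` and the `CPU_ROOT` variable are hypothetical, and the sysfs path is the same one used by the loop earlier in this section.

```shell
# Hypothetical helper: list CPUs whose governor is not "performance".
# CPU_ROOT is an assumption added for testing; it defaults to the sysfs
# directory used when setting the governor.
non_performance_cpus() {
    for npc_file in "${CPU_ROOT:-/sys/devices/system/cpu}"/cpu*/cpufreq/scaling_governor
    do
        [ -f "$npc_file" ] || continue
        npc_governor=$(cat "$npc_file")
        if [ "$npc_governor" != "performance" ]; then
            echo "$npc_file: $npc_governor"
        fi
    done
}

# Example: non_performance_cpus
```

Empty output means that every CPU with a cpufreq directory reports the performance governor, or that no such directory exists on the system.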

Disable `zone_reclaim_mode` on NUMA systems

The Linux kernel can be inconsistent in enabling/disabling zone_reclaim_mode, which can result in odd performance problems.

To ensure that zone_reclaim_mode is disabled:

echo 0 > /proc/sys/vm/zone_reclaim_mode

For more information, see Peculiar Linux kernel performance problem on NUMA systems.

Disable swap

Failure to disable swap entirely can severely lower performance. Because the database has multiple replicas and transparent failover, it is preferable for a replica to be killed immediately when memory is low rather than go into swap. This allows traffic to be immediately redirected to a functioning replica instead of continuing to hit the replica that has high latency due to swapping. If your system has a lot of DRAM, swapping still lowers performance significantly because the OS swaps out executable code so that more DRAM is available for caching disks.

If you insist on using swap, you can set vm.swappiness=1. This allows the kernel to swap out only the least-used parts.

sudo swapoff --all

To make this change permanent, remove all swap file entries from /etc/fstab.

For more information, see Nodes seem to freeze after some period of time.
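One way to verify that swap is fully disabled is to count the active swap areas that the kernel reports. The sketch below assumes the Linux /proc/swaps listing, which has one header line followed by one line per active swap area; the function name `active_swap_count` and its optional file argument are hypothetical.

```shell
# Hypothetical helper: count active swap areas in a /proc/swaps listing.
# The first line of the listing is a header, so it is skipped.
# The optional file argument is an assumption added for testing.
active_swap_count() {
    awk 'NR > 1 && NF > 0 { count++ } END { print count + 0 }' "${1:-/proc/swaps}"
}

# Example:
#   [ "$(active_swap_count)" -eq 0 ] && echo "swap is disabled"
```

A result of 0 after a reboot suggests that no swap entries remain in /etc/fstab.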

Optimize disk settings

The default disk configurations on most Linux distributions are not optimal. Follow these steps to optimize settings for your Solid State Drives (SSDs) or spinning disks.

Complete the optimization settings for either SSDs or spinning disks. Do not complete both procedures for either storage type.

Optimize SSDs

Complete the following steps to ensure the best settings for SSDs.

Ensure that the SysFS rotational flag is set to false (zero).

This overrides any detection by the operating system to ensure the drive is considered an SSD.
Apply the same rotational flag setting for any block devices created from SSD storage, such as mdarrays.

Determine your devices by running lsblk:

lsblk

NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda    253:0    0  32G  0 disk
|
|-sda1 253:1    0   8M  0 part
|-sda2 253:2    0  32G  0 part /

In this example, the current devices are sda1 and sda2.

Set the IO scheduler to either deadline or noop for each of the listed devices:

For example:
```
echo deadline > /sys/block/device_name/queue/scheduler
```
where device_name is the name of the device you want to apply settings for.
- The deadline scheduler optimizes requests to minimize IO latency. If in doubt, use the deadline scheduler.
  echo deadline > /sys/block/device_name/queue/scheduler
- The noop scheduler is the right choice when the target block device is an array of SSDs behind a high-end IO controller that performs IO optimization.
  echo noop > /sys/block/device_name/queue/scheduler
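After setting the scheduler, reading the same file back shows which scheduler is active: the kernel lists the available schedulers and marks the active one with square brackets. The following sketch extracts that name; the function name `active_scheduler` is hypothetical.

```shell
# Hypothetical helper: print the active IO scheduler from the contents of
# a queue/scheduler file read on standard input. The active scheduler is
# the entry in square brackets, for example: noop [deadline] cfq
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Example: active_scheduler < /sys/block/device_name/queue/scheduler
```

The function prints nothing if the file content contains no bracketed entry.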

Set the nr_requests value to indicate the maximum number of read and write requests that can be queued:

Machine size	Value
Large machines	`echo 128 > /sys/block/device_name/queue/nr_requests`
Small machines	`echo 32 > /sys/block/device_name/queue/nr_requests`

Set the readahead value for the block device to 8 KB.

This setting tells the operating system not to read extra bytes, which can increase IO time and pollute the cache with bytes that weren’t requested by the user.

The recommended readahead setting for RAID on SSDs is the same as that for SSDs that are not being used in a RAID installation.
1. Open /etc/rc.local for editing.
2. Add the following lines to set the readahead on startup:
  touch /var/lock/subsys/local
  echo 0 > /sys/class/block/sda/queue/rotational
  echo 8 > /sys/class/block/sda/queue/read_ahead_kb
3. Save and close /etc/rc.local.

Optimize spinning disks

The readahead value specified with the blockdev command doesn’t persist after a reboot. DataStax recommends that you use a run-level init.d script to set readahead during the boot process.

Make sure readahead isn’t set to 65536:

sudo blockdev --report /dev/<spinning_disk>

Set readahead to the DataStax recommended value of 256 (512-byte sectors):
```
sudo blockdev --setra 256 /dev/<spinning_disk>
```

DataStax recommends a 128 KB readahead for spinning disks. To set the recommended readahead, you must first set the blockdev readahead to 256, as blockdev --setra sets the readahead in 512-byte sectors (see the blockdev utility). Use the following formula to calculate the actual readahead, in bytes:

RA * SSZ = readahead in bytes
256 sectors * 512 B per sector = 131072 B
131072 B / 1024 = 128 KB
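The conversion between a blockdev --setra value and kilobytes can be sketched in shell arithmetic. The function names `readahead_kb` and `setra_for_kb` are hypothetical; the only assumption taken from the text is that --setra counts 512-byte sectors.

```shell
# Hypothetical helpers for the readahead arithmetic described above.
# blockdev --setra counts 512-byte sectors.

# readahead_kb: convert a --setra value (sectors) to kilobytes.
readahead_kb() {
    echo $(( $1 * 512 / 1024 ))
}

# setra_for_kb: convert a target readahead in KB to a --setra value.
setra_for_kb() {
    echo $(( $1 * 1024 / 512 ))
}

# Examples:
#   readahead_kb 256    (the recommended --setra value for spinning disks)
#   setra_for_kb 128    (the recommended readahead in KB)
```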

Set the heap size for Java garbage collection

The location of the jvm.options file depends on the type of installation:

Installation Type	Location
Package installations + Installer-Services installations	`/etc/dse/cassandra/jvm.options`
Tarball installations + Installer-No Services installations	`installation_location/resources/cassandra/conf/jvm.options`

The default JVM garbage collection (GC) for DSE 5.1 is G1.

DataStax does not recommend using G1 when using Java 7. This is due to a problem with class unloading in G1. In Java 7, PermGen fills up indefinitely until a full GC is performed.

Heap size is usually between ¼ and ½ of system memory. Do not devote all memory to heap because it is also used for offheap cache and file system cache.
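As a starting point for that range, the quarter and half marks of system memory can be computed from the total. This is only a sketch of the rule of thumb stated above, not a sizing recommendation; the function name `heap_range_mb` is hypothetical.

```shell
# Hypothetical helper: print the 1/4 and 1/2 marks of system memory in MB,
# the range the text describes as usual for the heap.
# $1 = total system memory in MB
heap_range_mb() {
    echo "$(( $1 / 4 )) $(( $1 / 2 ))"
}

# Example, reading MemTotal (reported in kB) from /proc/meminfo:
#   total_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
#   heap_range_mb "$total_mb"
```

The two numbers only bound the search; the procedure later in this section is still needed to find a value that suits the workload.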

See Tuning Java resources for more information on tuning the Java Virtual Machine (JVM).

If you want to use Concurrent-Mark-Sweep (CMS) garbage collection, contact the DataStax Services team for configuration help. Tuning Java resources provides details on circumstances where CMS is recommended, though using CMS requires time, expertise, and repeated testing to achieve optimal results.

The easiest way to determine the optimum heap size for your environment is:

1. Set the MAX_HEAP_SIZE in the jvm.options file to a high arbitrary value on a single node.
2. View the heap used by that node:
  - Enable GC logging and check the logs to see trends.
  - Use List view in OpsCenter.
3. Use the value for setting the heap size in the cluster.

This method decreases performance for the test node, but generally does not significantly reduce cluster performance.

If you don’t see improved performance, contact the DataStax Services team for additional help in tuning the JVM.

Check Java Hugepages settings

Many modern Linux distributions ship with the Transparent Hugepages feature enabled by default. When Linux uses Transparent Hugepages, the kernel tries to allocate memory in large chunks (usually 2MB), rather than 4K. This allocation can improve performance by reducing the number of pages the CPU must track. However, some applications still allocate memory based on 4K pages, which can cause noticeable performance problems when Linux tries to defragment 2MB pages.

For more information, see the Cassandra Java Huge Pages blog and this RedHat bug report.

To solve this problem, disable defrag for Transparent Hugepages:

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

For more information, including a temporary fix, see No DSE processing but high CPU usage.
