DataStax Enterprise recommended settings for Docker

To ensure your success when using Docker, follow the recommended guidance and settings for using DataStax Enterprise (DSE) with Docker.

Important: Although DataStax provides the following guidance, adaptations of these instructions might be necessary depending on the deployment. It is highly recommended to rigorously test the use cases under consideration before deploying a DSE installation on Docker in production environments.

General guidance

DSE achieves resilience and high availability through a cluster of nodes that replicate data across the cluster. This replication ensures that if any individual node fails, access to data is not lost and performance is maintained. However, in a containerized environment, running multiple DSE nodes on the same physical hardware will introduce a single point of failure.
Important: To avoid a single point of failure, run only one DataStax container from any given DSE cluster on each Docker host. If running multiple DataStax containers on a single Docker host, ensure that the containers belong to different DSE clusters.

Software versions

DataStax Agent versions

The official DSE images include the latest DataStax Agent version available at the time the official image was built. If you require a version of the DataStax Agent that differs from the one included with the official image, you must build an image that includes the required version.

Using specific DSE versions

The DSE version that the latest tag points to changes with each release. DataStax recommends specifying an explicit DSE version in the docker run command to avoid mixing DSE versions.

For example, use the following command to use DSE 6.0.1:

docker run datastax/dse-server:6.0.1
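
Because an image reference without a tag resolves to latest, a deployment script can reject floating tags before it calls docker run. The following is a minimal sketch of such a guard; the function name is_pinned_image is hypothetical and is not part of Docker or DSE.

```shell
# Hypothetical helper (not part of Docker or DSE): succeeds only when the
# image reference carries an explicit tag other than "latest".
is_pinned_image() {
    # Drop any registry/repository prefix so that a registry port, as in
    # localhost:5000/dse-server, is not mistaken for a tag.
    case "${1##*/}" in
        *:latest) return 1 ;;  # floating tag
        *:*)      return 0 ;;  # explicit tag or digest
        *)        return 1 ;;  # no tag: Docker falls back to "latest"
    esac
}
```

With this guard, is_pinned_image datastax/dse-server:6.0.1 succeeds, while is_pinned_image datastax/dse-server fails.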

Building images

To build an image, clone the DataStax Docker GitHub repository, make any necessary changes (such as versions), and follow the instructions in the repository for building the image using Gradle.

Hardware settings

Docker container resource requirements

For minimum container resource requirements, follow the guidance in Selecting hardware for DataStax Enterprise implementations for production environments.

Optimizing SSDs

The default SSD configurations on most Linux distributions are not optimal. To ensure the best settings, see the recommended production settings to optimize SSDs for DSE 6.7 | DSE 6.0 | DSE 5.1.

Optimizing settings for RAID on SSD

The optimum readahead setting for RAID on SSDs (in Amazon EC2) is 8 KB, the same as it is for non-RAID SSDs. For details, see Optimizing SSDs.

Optimizing RAID settings for spinning disks on the host

Typically, a readahead of 128 is recommended. The blockdev command counts readahead in 512-byte sectors, so 128 corresponds to 64 KB.

Check that the readahead (the RA column of the report) is not set to 65536:

sudo blockdev --report /dev/spinning_disk

To set the readahead:

sudo blockdev --setra 128 /dev/spinning_disk
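
The readahead values in this section mix units: blockdev counts 512-byte sectors, while the SSD recommendation is stated in KB. The following sketch converts between the two; the function names are hypothetical helpers, not part of blockdev.

```shell
# blockdev --getra/--setra count 512-byte sectors, not kilobytes.
# Hypothetical conversion helpers:
ra_sectors_to_kb() { echo $(( $1 * 512 / 1024 )); }
ra_kb_to_sectors() { echo $(( $1 * 1024 / 512 )); }

ra_sectors_to_kb 128     # 64: the readahead recommended for spinning disks
ra_sectors_to_kb 65536   # 32768 (32 MB): the value to avoid
ra_kb_to_sectors 8       # 16: sector count equivalent to an 8 KB readahead
```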

System settings

Synchronizing clocks

Because time is not namespaced in the Linux kernel, containers share the clock with the Docker host machine. Ensure that clocks are synchronized on the host machines and containers by configuring NTP or other methods on the host machines.

Disabling swap

Swapping must be disabled for performance and node stability. Run the following command on the Docker host to disable swap. The Docker host passes this setting to the container.

sudo swapoff --all

See Disabling swap for more information.
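
To confirm that swap is off on the Docker host, the SwapTotal value in /proc/meminfo can be inspected. The following is a minimal sketch; swap_total_kb is a hypothetical helper name, and the optional file argument exists only so that the function can be exercised against sample data.

```shell
# Hypothetical helper: print the configured swap size in kB.
# $1 is a meminfo-style file; it defaults to /proc/meminfo on Linux.
swap_total_kb() {
    awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}
```

After swapoff --all, swap_total_kb is expected to print 0. Note that swapoff does not persist across reboots when swap entries remain in /etc/fstab.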

Disabling CPU frequency scaling on the Docker host

To ensure optimal performance, do not use governors that lower the CPU frequency. Instead, reconfigure all CPUs on the Docker hosts to use the performance governor by running the following loop as root:

for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
    # Skip CPUs that do not expose a frequency governor.
    [ -f "$CPUFREQ" ] || continue
    echo -n performance > "$CPUFREQ"
done

See Disabling CPU frequency scaling for DSE 6.7 | DSE 6.0 | DSE 5.1.
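
To verify the result, the governor files can be read back. The following sketch lists every governor file whose value is not performance; non_performance_cpus is a hypothetical helper name, and the optional directory argument exists only so that the function can be tested against a sample tree.

```shell
# Hypothetical helper: print each scaling_governor file that is not set
# to "performance". $1 defaults to the real sysfs CPU directory.
non_performance_cpus() {
    for f in "${1:-/sys/devices/system/cpu}"/cpu*/cpufreq/scaling_governor; do
        [ -f "$f" ] || continue
        [ "$(cat "$f")" = "performance" ] || echo "$f"
    done
}
```

An empty result from non_performance_cpus suggests that every CPU exposing a governor is using the performance governor.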

Disabling THP on the Docker host

Transparent Huge Pages (THP) can cause performance issues in DSE when the kernel defragments 4 KB pages into 2 MB pages. To disable defragmentation, run the following command on the Docker host:

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

See Checking the Linux Hugepage settings.
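
The kernel reports the active THP setting in square brackets, for example always madvise [never]. The following sketch extracts that bracketed value; thp_active_setting is a hypothetical helper name, and the optional file argument exists only for testing against sample content.

```shell
# Hypothetical helper: print the bracketed (active) value from a THP
# sysfs file, e.g. "always madvise [never]" yields "never".
thp_active_setting() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' \
        "${1:-/sys/kernel/mm/transparent_hugepage/defrag}"
}
```

After the tee command above, thp_active_setting is expected to print never.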

Increasing user resource limits

All containers by default inherit user limits from the Docker daemon. In production environments, DSE expects the following changes to ulimit:

ulimit -n 100000 # nofile: maximum number of open files
ulimit -l unlimited # memlock: maximum locked-in-memory address space

  1. Run the following command to check the Docker daemon defaults for ulimits:
    docker run --rm ubuntu /bin/bash -c 'ulimit -a'
  2. To set ulimit for Docker containers, run the docker run command with the following ulimit options:
    --ulimit nofile=100000:100000 --ulimit nproc=32768 --ulimit memlock=-1:-1

DSE tries to lock memory using mlock. When running in Docker, that capability is disabled. To enable mlock, add the following option to the docker run command:

--cap-add=IPC_LOCK
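
The ulimit and capability options above can be collected in one place so that every docker run invocation uses the same set. The following is a minimal sketch; dse_limit_flags is a hypothetical helper name, and the flag values are the ones listed in this section.

```shell
# Hypothetical helper: print the resource-limit and capability flags
# from this section, one word per line, for reuse in docker run.
dse_limit_flags() {
    printf '%s\n' \
        --ulimit nofile=100000:100000 \
        --ulimit nproc=32768 \
        --ulimit memlock=-1:-1 \
        --cap-add=IPC_LOCK
}

# Usage (not executed here):
#   docker run -d $(dse_limit_flags) datastax/dse-server:6.0.1
```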

On the Docker host, check the value of vm.max_map_count, which should be set to 1048575.

cat /proc/sys/vm/max_map_count

To set the value of vm.max_map_count, add the following line to /etc/sysctl.conf:

vm.max_map_count = 1048575

Then run sysctl -p to apply the change:

sudo sysctl -p

See Setting user resource limits.
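
A startup script can compare the current vm.max_map_count with the recommended value before launching a container. The following is a minimal sketch; max_map_count_ok is a hypothetical helper name, and treating any value at or above 1048575 as acceptable is an assumption of this sketch, not a statement from DataStax.

```shell
# Hypothetical helper: succeed when the given value meets the minimum.
# $1: current value, $2: required minimum (defaults to 1048575).
# Assumption: values above the recommended setting are also acceptable.
max_map_count_ok() {
    [ "$1" -ge "${2:-1048575}" ]
}

# Usage on a Linux host (not executed here):
#   max_map_count_ok "$(cat /proc/sys/vm/max_map_count)" \
#       || echo "vm.max_map_count is too low" >&2
```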

Configuring heap settings

For each container in production environments, explicitly set the JVM heap size using the JVM_EXTRA_OPTS environment variable with the docker run command.

For example, to use 16 GB for the JVM heap, run the docker run command with the following option:

docker run -e JVM_EXTRA_OPTS="-Xms16g -Xmx16g"
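
The example above sets the initial and maximum heap to the same value. The following sketch builds that option string from a single size so the two values cannot drift apart; jvm_heap_opts is a hypothetical helper name.

```shell
# Hypothetical helper: print matching -Xms/-Xmx options for a heap size
# given in gigabytes.
jvm_heap_opts() {
    echo "-Xms${1}g -Xmx${1}g"
}

# Usage (not executed here):
#   docker run -e JVM_EXTRA_OPTS="$(jvm_heap_opts 16)" datastax/dse-server:6.0.1
```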

Storage and resource requirements

Mounting configuration volumes

For advanced configuration management, DataStax provides a mechanism for modifying configurations without replacing or customizing DataStax Docker containers. When any of the approved configuration files are mounted to a host volume, the files are mapped automatically within the container. See Using the DSE configuration volume.

Mapping node data to a local folder on the host

The DSE Docker container writes all node-specific data in the directories under /var/lib/cassandra/ by default. To persist this data, map the data directories inside the container to a directory on the host file system using the -v option with the docker run command, or by using a volume driver.

For example, to mount the DSE data volume to the /dse/data directory on the Docker host, run the docker run command with the following option:

docker run -v /dse/data:/var/lib/cassandra

Hosting the /var/lib/cassandra directory outside the container with the -v option allows the container to be deleted and recreated without losing data. See Persisting data.

Configuring storage drivers

If using the Docker devicemapper storage driver, do not use the default loop-lvm mode, which is only appropriate for testing. Instead, configure docker-engine to use direct-lvm mode, which is suitable for production environments.

Network considerations

Configuring network settings

Because the default network settings in Docker (via Linux bridge) slow networking considerably, do not use these network settings in production environments. Instead, use Docker host networking by adding the --network host option to the docker run command, or use a plugin that can manage IP ranges across clusters of hosts. Host networking limits the number of DSE nodes per Docker host to one, which is the recommended configuration to use in production.

docker run -d --network host --name container_name

Configuring ports

DSE communicates on many different ports. Account for these ports when binding ports to the Docker host.
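
When host networking is not used, each required port must be published explicitly with the -p option of docker run. The following sketch generates those options from a list of ports; publish_flags is a hypothetical helper name, and the two ports shown (9042 for CQL clients and 7000 for inter-node communication, the Cassandra defaults) are only an illustrative subset, so check the DSE port list for the features you use.

```shell
# Hypothetical helper: print one "-p host:container" option per port,
# publishing each container port on the same host port.
publish_flags() {
    for port in "$@"; do
        printf '%s %s:%s\n' -p "$port" "$port"
    done
}

# Usage (not executed here):
#   docker run -d $(publish_flags 9042 7000) datastax/dse-server:6.0.1
```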