DataStax Enterprise (DSE) recommended settings for Docker
To ensure your success when using Docker, follow the recommended guidance and settings for using DataStax Enterprise (DSE) with Docker.
Although DataStax provides the following guidance, you might need to adapt these instructions to your deployment. DataStax strongly recommends rigorously testing the use cases under consideration before deploying a DataStax installation on Docker in production environments.
General guidance
DSE achieves resilience and high availability through a cluster of nodes that replicate data across the cluster. This replication ensures that if any individual node fails, access to data is not lost and performance is maintained. However, in a containerized environment, running multiple DSE nodes on the same physical hardware will introduce a single point of failure.
To avoid a single point of failure, run only one DataStax container from a given DSE cluster on each Docker host. If running multiple DataStax containers on a single Docker host, ensure that the containers belong to different DSE clusters.
Software versions
The official DataStax images include the latest DataStax Agent version available when the official image was built. If you require a version of the DataStax Agent that differs from the one included with the official image, you must build an image that includes the required version.
Hardware settings
- Docker container resource requirements
-
For minimum container resource requirements, follow the capacity planning guidance for selecting hardware for production environments.
- Optimizing SSDs
-
The default SSD configurations on most Linux distributions are not optimal. To ensure the best settings, see the recommended production settings to optimize SSDs.
- Optimizing settings for RAID on SSD
-
The optimum readahead setting for RAID on SSDs (in Amazon EC2) is 8 KB, the same as it is for non-RAID SSDs. For details, see Optimize SSDs.
- Optimizing RAID settings for spinning disks on the host
-
Typically, a readahead of 128 is recommended. Check to ensure setra is not set to 65536:
sudo blockdev --report /dev/spinning_disk
To set setra:
sudo blockdev --setra 128 /dev/spinning_disk
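The readahead figures above mix units: the SSD value is given in KB, while blockdev reports and sets readahead in 512-byte sectors. The following sketch converts between the two so the values can be compared. The helper name sectors_to_kb is hypothetical and is not part of blockdev.

```shell
# blockdev expresses readahead in 512-byte sectors; convert a sector count to KB.
# sectors_to_kb is a hypothetical helper name used only for illustration.
sectors_to_kb() {
    echo $(( $1 * 512 / 1024 ))
}

sectors_to_kb 128     # the readahead suggested above for spinning disks
sectors_to_kb 65536   # the value the text says to avoid
```

Under this conversion, a setra of 128 corresponds to 64 KB, and an 8 KB readahead corresponds to 16 sectors. To inspect a live device, the current sector count can be read with sudo blockdev --getra /dev/spinning_disk and passed to the helper.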
System settings
- Synchronizing clocks
-
Because time is not namespaced in the Linux kernel, containers share the clock with the Docker host machine. Ensure that clocks are synchronized on the host machines and containers by configuring NTP or other methods on the host machines.
- Disabling swap
-
Swapping must be disabled for performance and node stability. Run the following command on the Docker host to disable swap. The Docker host passes this setting to the container.
sudo swapoff --all
To make this change permanent, remove all swap file entries from /etc/fstab. To disable swap per container, see Preventing a container from using SWAP in the Docker documentation.
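To confirm that swap is actually off and will stay off after a reboot, a check along the following lines might help. This is a minimal sketch: the helper names swap_is_off and fstab_swap_entries are hypothetical, and the file arguments exist only so the helpers can be pointed at copies of the files.

```shell
# Hypothetical helper: succeeds when the given /proc/swaps-style file lists
# no active swap areas (only the header line is present).
swap_is_off() {
    swaps_file="${1:-/proc/swaps}"
    # Line 1 is the column header; any further line is an active swap area.
    awk 'NR > 1 { found = 1 } END { exit found + 0 }' "$swaps_file"
}

# Hypothetical helper: print uncommented swap entries from an fstab-style file.
# The third field of an fstab entry is the file system type.
fstab_swap_entries() {
    fstab_file="${1:-/etc/fstab}"
    awk '$1 !~ /^#/ && $3 == "swap"' "$fstab_file"
}
```

On a Docker host, running swap_is_off && echo "swap is disabled" checks the running system, and fstab_swap_entries prints any entries that would re-enable swap at boot.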
- Disabling CPU frequency scaling on the Docker host
-
To ensure optimal performance, don’t use governors that lower the CPU frequency. Instead, reconfigure all CPUs to use the performance governor on the Docker hosts:
for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
    [ -f "$CPUFREQ" ] || continue
    echo -n performance > "$CPUFREQ"
done
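After changing the governor, it can be useful to verify that no CPU was left on a different one. The following sketch lists every CPU whose governor is not performance. The helper name non_performance_governors is hypothetical, and the optional directory argument is only there so the check can be run against a test tree.

```shell
# Hypothetical helper: print each scaling_governor file under the given sysfs
# CPU directory whose governor is not "performance". Prints nothing when all
# CPUs already use the performance governor.
non_performance_governors() {
    cpu_dir="${1:-/sys/devices/system/cpu}"
    for gov_file in "$cpu_dir"/cpu*/cpufreq/scaling_governor; do
        [ -f "$gov_file" ] || continue
        governor=$(cat "$gov_file")
        [ "$governor" = "performance" ] || echo "$gov_file: $governor"
    done
}
```

Calling non_performance_governors with no arguments inspects the Docker host itself. Empty output means every CPU that exposes a governor is set to performance.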
- Disabling THP on the Docker host
-
THP can cause performance issues in DSE when it defragments 4 KB chunks into 2 MB chunks. To disable defrag, run the following command on the Docker host:
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
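The kernel reports the active THP setting by wrapping it in square brackets, for example always madvise [never]. The following sketch extracts that value so the setting can be verified after the change. The helper name thp_active_value is hypothetical.

```shell
# Hypothetical helper: print the active value (the one in square brackets)
# from a transparent_hugepage settings file.
thp_active_value() {
    thp_file="${1:-/sys/kernel/mm/transparent_hugepage/defrag}"
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$thp_file"
}
```

On the Docker host, thp_active_value with no arguments should print never once defrag has been disabled.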
- Increasing user resource limits
-
All containers by default inherit user limits from the Docker daemon. In production environments, DSE expects the following changes to ulimit:
ulimit -n 100000 # nofile: max number of open files
ulimit -l unlimited # memlock: maximum locked-in-memory address space
Run the following command to check the Docker daemon defaults for ulimits:
docker run --rm ubuntu /bin/bash -c 'ulimit -a'
To set ulimit for Docker containers, run the docker run command with the following ulimit options:
--ulimit nofile=100000:100000 --ulimit nproc=32768 --ulimit memlock=-1:-1
DSE tries to lock memory using mlock. When running in Docker, that capability is disabled. To enable mlock, add the following option to the docker run command:
--cap-add=IPC_LOCK
On the Docker host, check the value of vm.max_map_count, which should be set to 1048575:
cat /proc/sys/vm/max_map_count
To set the value of vm.max_map_count, add the following line to /etc/sysctl.conf:
vm.max_map_count = 1048575
Then, run sysctl -p to apply the changes.
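The vm.max_map_count check can be scripted so that it fails loudly in a provisioning step. This is a minimal sketch: the helper name map_count_ok is hypothetical, and treating any value at or above 1048575 as acceptable is an assumption, since the guidance above only names the exact value.

```shell
# Hypothetical helper: succeeds when the value in the given file is at least
# 1048575. Accepting higher values is an assumption made for this sketch.
map_count_ok() {
    map_file="${1:-/proc/sys/vm/max_map_count}"
    [ "$(cat "$map_file")" -ge 1048575 ]
}
```

For example, map_count_ok || echo "vm.max_map_count is too low" reports a host that still needs the sysctl change.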
- Configuring heap settings
-
For each container in production environments, explicitly set the JVM heap size using the JVM_EXTRA_OPTS environment variable with the docker run command. For example, to use 16 GB for the JVM heap, run the docker run command with the following option:
docker run -e JVM_EXTRA_OPTS="-Xms16g -Xmx16g"
Storage and resource requirements
- Mounting configuration volumes
-
For advanced configuration management, DataStax provides a mechanism for modifying configurations without replacing or customizing DataStax Docker containers. When any of the approved configuration files are mounted to a host volume, the files are mapped automatically within the container. See Using the DSE configuration volume.
- Mapping node data to a local folder on the host
-
The DSE Docker container writes all node-specific data in the directories under /var/lib/cassandra/ by default. To persist this data, map the data directories inside the container to a directory on the host file system using the -v option with the docker run command, or by using a volume driver. For example, to mount the DSE data volume to the /dse/data directory on the Docker host, run the docker run command with the following option:
docker run -v /dse/data:/var/lib/cassandra
Hosting the /var/lib/cassandra directory outside the container with the -v option allows the container to be deleted and recreated without losing data. See Persisting data with volumes and directories.
- Configuring storage drivers
-
If using the Docker devicemapper storage driver, do not use the default loop-lvm mode, which is only appropriate for testing. Instead, configure docker-engine to use direct-lvm mode, which is suitable for production environments.
- Resources allocated to Linux VM in Docker for Windows
-
When running Docker for Windows, the default resources allocated to the Linux VM running Docker are 2 GB RAM and 2 CPUs. Adjust these resources as appropriate to meet the requirements for your containers. For more information, see the Docker documentation.
Host networking
Because the default network settings in Docker (through the Linux bridge) slow networking considerably, don’t use these network settings in production environments.
Instead, use Docker host networking by adding the --network host option to the docker run command, or use a plugin that can manage IP ranges across clusters of hosts.
Host networking limits the number of nodes per Docker host to one, which is the recommended configuration for production.
docker run -d --network host --name container_name
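The options recommended so far (heap size, data volume, host networking, user limits, and IPC_LOCK) are each shown in isolation. The following sketch assembles them into one docker run command and prints it for review. Several names are assumptions and do not come from this guidance: the image name datastax/dse-server, the container name my-dse, the helper name build_dse_run_command, and the use of DS_LICENSE=accept for a DSE node (it is carried over from the OpsCenter example).

```shell
# Sketch: assemble the docker run options recommended in this guidance.
# Assumptions: datastax/dse-server as the image, my-dse as the container name,
# DS_LICENSE=accept for the DSE node, and this helper name itself.
build_dse_run_command() {
    heap="${1:-16g}"            # JVM heap size used for both -Xms and -Xmx
    data_dir="${2:-/dse/data}"  # host directory that persists node data
    set -- docker run -d \
        -e DS_LICENSE=accept \
        -e "JVM_EXTRA_OPTS=-Xms$heap -Xmx$heap" \
        --network host \
        --cap-add=IPC_LOCK \
        --ulimit nofile=100000:100000 \
        --ulimit nproc=32768 \
        --ulimit memlock=-1:-1 \
        -v "$data_dir:/var/lib/cassandra" \
        --name my-dse \
        datastax/dse-server
    # Print one argument per line so the command can be reviewed before use.
    printf '%s\n' "$@"
}
```

Calling build_dse_run_command 8g /mnt/dse prints the argument list for an 8 GB heap with data stored under /mnt/dse. Printing rather than executing is a deliberate choice for this sketch, so that the options can be checked against the deployment before anything is started.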
Ports
Communication occurs on many different ports. Account for required communication and security for these ports when binding ports to the Docker host.
To allow remote hosts to access a DSE node, DSE OpsCenter, or DataStax Studio container, map the DSE public port to a host port using the docker run command with the -p option.
For example, to allow access to DSE OpsCenter from a browser on a remote host, open port 8888:
docker run -e DS_LICENSE=accept --name my-opscenter -p 8888:8888 \
-d datastax/dse-opscenter
When mapping a container port to a local host port, ensure the host port is not already in use by another container or the host.
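One way to check whether a host port is already taken is to scan the list of listening TCP sockets. The following sketch reads output in the format produced by ss -ltn and reports whether a given port appears in the local address column. The helper name port_in_listing is hypothetical, and the sketch assumes the local address is the fourth column, as in the default ss output.

```shell
# Hypothetical helper: read "ss -ltn"-style output on stdin and succeed when
# the given port appears in the local address column (assumed to be column 4).
port_in_listing() {
    awk -v port="$1" '
        NR > 1 {
            # The port is the text after the last colon, which also handles
            # IPv6 addresses such as [::]:22.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

For example, ss -ltn | port_in_listing 8888 && echo "port 8888 is already in use" flags a conflict before the container is started.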
See also
-
Use Environment variables to change the configuration at runtime.
DSE uses the default values defined for the environment variables unless explicitly set at runtime. Custom configuration files override the default or explicitly set environment variables.
-
Use the DSE configuration volume to get configuration files from a mounted host directory without replacing or customizing configuration files in the container.
-
Data volumes can be mounted to persist data beyond the lifetime of a container.