Recommended settings for DataStax Enterprise (DSE) Docker containers

DataStax provides the following general recommendations for running DataStax Enterprise (DSE) Docker containers. You might need to adapt the settings for your use case or environment. Test these settings in isolation before applying them in production.

Container architecture for replication and high availability

DSE achieves resilience and high availability through a group of nodes that replicate data across the cluster. This replication ensures that if any individual node fails, access to data is not lost and performance is maintained. However, in a containerized environment, running multiple DSE nodes on the same physical hardware introduces a single point of failure.

To avoid a single point of failure, run only one DataStax container per DSE cluster on each Docker host. If you run multiple DataStax containers on a single Docker host, ensure that each container belongs to a different DSE cluster.

Hardware and system settings

Docker container resource requirements

For minimum container resource requirements, follow the capacity planning guidance for selecting hardware for production environments.

Optimize disk settings

The default SSD configurations on most Linux distributions are not optimal. For recommended settings, see Optimize SSDs.

The optimum readahead setting for RAID on SSDs (in Amazon EC2) is 8 KB, the same as it is for non-RAID SSDs. To optimize RAID settings for spinning disks on the host, DataStax recommends readahead of 128 KB. For more information, see Optimize spinning disks.

Synchronize clocks

Because time is not namespaced in the Linux kernel, containers share the clock with the Docker host machine. Ensure that clocks are synchronized on the host machines and containers by configuring NTP or other methods on the host machines.

Disable swap

Swapping must be disabled for performance and node stability.

If you disable swap on the Docker host, the host passes that setting to each container. For more information and instructions, see Disable swap.

Alternatively, to disable swap for specific containers, see Preventing a container from using SWAP.
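
As a quick check before starting containers, the following sketch counts the swap areas that are still active on the Docker host. The function name `active_swap_areas` is hypothetical, and the file argument exists only so the function can be pointed at a test file. The commented per-container flags rely on Docker's documented behavior that equal `--memory` and `--memory-swap` values leave the container no swap. The 16g value is an arbitrary illustration, not a sizing recommendation.

```shell
# Count active swap areas listed in a /proc/swaps-style file.
# The first line of /proc/swaps is a header, so only later lines are counted.
active_swap_areas() {
    swaps_file="${1:-/proc/swaps}"
    awk 'NR > 1 { n++ } END { print n + 0 }' "$swaps_file"
}

# Typical use on the Docker host:
#   [ "$(active_swap_areas)" -eq 0 ] || echo "swap is still enabled; run: sudo swapoff -a"
#
# Per container (illustrative values only):
#   docker run --memory=16g --memory-swap=16g ...
```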

Disable CPU frequency scaling

To ensure optimal performance, don’t use governors that lower the CPU frequency. Instead, reconfigure all CPUs to use the performance governor on the Docker hosts. For more information and instructions, see Disable CPU frequency scaling.
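
To see which CPUs still need to be reconfigured, a check along these lines can be run on the Docker host. This is a minimal sketch: the function name `non_performance_cpus` is hypothetical, and it assumes the standard Linux sysfs layout for cpufreq, which might be absent on virtual machines that do not expose frequency scaling.

```shell
# Print every CPU whose cpufreq governor is not "performance".
# cpu_root defaults to the standard sysfs location; it is a parameter
# only so the function can be pointed at a test directory.
non_performance_cpus() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for gov_file in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded glob when no cpufreq files exist.
        [ -r "$gov_file" ] || continue
        governor=$(cat "$gov_file")
        if [ "$governor" != "performance" ]; then
            cpu_dir=${gov_file%/cpufreq/scaling_governor}
            echo "${cpu_dir##*/}: $governor"
        fi
    done
}
```

Calling `non_performance_cpus` with no argument inspects the live host. Empty output means every CPU that exposes a governor is already set to performance.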

Disable THP on the Docker host

Disable defrag on the Docker host to avoid performance issues caused by Transparent Hugepages (THP) defragmentation of 4 KB chunks into 2 MB chunks. For more information and instructions, see Check Java Hugepages settings.
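
The kernel reports the active THP defrag mode as the bracketed word in a sysfs file. The following sketch reads that value. The function name `thp_active_value` is hypothetical, and the default path assumes the usual sysfs location on current Linux kernels, so verify it on your distribution.

```shell
# Print the active value from a THP sysfs file, that is, the word in [brackets].
thp_active_value() {
    thp_file="${1:-/sys/kernel/mm/transparent_hugepage/defrag}"
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$thp_file"
}

# Typical use on the Docker host:
#   [ "$(thp_active_value)" = "never" ] || echo "THP defrag is still enabled"
#
# To disable defrag until the next reboot:
#   echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
```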

Increase user resource limits

All containers by default inherit user limits from the Docker daemon. In production environments, DSE expects the following ulimit settings:

ulimit -n 100000 # nofile: max number of open files
ulimit -l unlimited # memlock: maximum locked-in-memory address space

To configure user resource limits for Docker containers, do the following:

Check the Docker daemon defaults for ulimits:

docker run --rm ubuntu /bin/bash -c 'ulimit -a'

Configure ulimit when starting Docker containers by appending ulimit options to the docker run command:
```
--ulimit nofile=100000:100000 --ulimit nproc=32768 --ulimit memlock=-1:-1
```
Enable mlock by appending the --cap-add option to the docker run command:
```
--cap-add=IPC_LOCK
```
This is required because DSE tries to lock memory using mlock, but Docker disables memory lock by default.
On the Docker host, get the value of vm.max_map_count:
```
cat /proc/sys/vm/max_map_count
```
If it isn’t set to 1048575, add the following line to /etc/sysctl.conf:
```
vm.max_map_count = 1048575
```
Run sysctl -p to propagate the changes.

For more information, see Set user resource limits.
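
The steps above can be gathered into a small helper script. The following is a sketch under stated assumptions: the variable name `DSE_LIMIT_FLAGS` and the function name `check_max_map_count` are hypothetical, and the file argument exists only so the check can be run against a test file.

```shell
# Flags from the steps above, collected so they can be reused across docker run calls.
DSE_LIMIT_FLAGS="--ulimit nofile=100000:100000 --ulimit nproc=32768 --ulimit memlock=-1:-1 --cap-add=IPC_LOCK"

# Compare vm.max_map_count with the value this page expects.
check_max_map_count() {
    expected=1048575
    current=$(cat "${1:-/proc/sys/vm/max_map_count}") || return 2
    if [ "$current" -eq "$expected" ]; then
        echo "vm.max_map_count OK ($current)"
    else
        echo "vm.max_map_count is $current, expected $expected"
        return 1
    fi
}

# Example (requires Docker): print the limits a container actually receives.
#   docker run --rm $DSE_LIMIT_FLAGS ubuntu /bin/bash -c 'ulimit -n; ulimit -l'
```

`$DSE_LIMIT_FLAGS` is left unquoted in the example on purpose so that the shell splits it into separate options.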

Configure heap settings

For each container in production environments, explicitly set the JVM heap size using the JVM_EXTRA_OPTS environment variable with the docker run command. For example, to use 16 GB for the JVM heap, use the following option:

docker run -e JVM_EXTRA_OPTS="-Xms16g -Xmx16g"

If not explicitly set, DSE sets the heap to 25 percent of the physical RAM of the Docker host, which is not optimal for performance and stability.

Host storage and resource recommendations

Mount volumes to load custom configuration files: Use the DSE configuration volume to load custom configuration files without creating a custom image.
Mount volumes to persist data: To avoid data loss when deleting and recreating containers, mount volumes to persist container data.

The DSE Docker container writes all node-specific data in the directories under /var/lib/cassandra/ by default. To persist this data, you must map the data directories inside the container to a directory on the host file system. To do this, you can use the -v option with the docker run command, or use a volume driver.
Docker storage driver mode: If you use the Docker devicemapper storage driver, don’t use the default loop-lvm mode, which is only appropriate for testing. Instead, configure docker-engine to use direct-lvm mode, which is suitable for production environments.
Docker host VM resources for macOS and Microsoft Windows: On macOS and Windows, the default resources allocated to the Linux VM that runs Docker are generally insufficient for running DataStax containers, particularly in production or with simulated production-level workloads. Adjust these resources as appropriate to meet the requirements for your containers. For more information, see the Docker documentation.
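
For the volume recommendations above, the following sketch prepares host directories and prints the matching `-v` options. It rests on several assumptions that you should verify for your image version: the function name `dse_volume_options` and the host path are hypothetical, the configuration volume is assumed to be exposed at `/config` inside the container, and `datastax/dse-server` is assumed to be the image name.

```shell
# Create host directories for DSE data and custom configuration, then print
# the matching -v options for docker run.
# Assumption: the container reads custom configuration files from /config.
dse_volume_options() {
    host_root="$1"
    mkdir -p "$host_root/data" "$host_root/config" || return 1
    echo "-v $host_root/data:/var/lib/cassandra -v $host_root/config:/config"
}

# Example (requires Docker; /srv/dse/node1 is a hypothetical host path):
#   docker run -e DS_LICENSE=accept $(dse_volume_options /srv/dse/node1) \
#     -d datastax/dse-server
```

The unquoted command substitution in the example depends on the host path containing no spaces.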

Host networking

Because the default network settings in Docker (through Linux bridge) slow networking considerably, don’t use the default network settings in production environments.

Instead, use Docker host networking. This limits the number of nodes per Docker host to one, which is the recommended configuration to use in production.

To enable Docker host networking, append the --network host option to the docker run command, or use a plugin that can manage IP ranges across clusters of hosts:

docker run -d --network host --name container_name

Ports

Communication occurs on many different ports. Account for required communication and security for DSE ports when binding ports to the Docker host.

To allow remote hosts to access a DSE, DSE OpsCenter, or DataStax Studio container, map the DSE public port to a host port using the -p option with the docker run command.

For example, to allow access to a DSE OpsCenter container from a browser on a remote host, open port 8888:

docker run -e DS_LICENSE=accept --name my-opscenter -p 8888:8888 \
-d datastax/dse-opscenter

When mapping a container port to a local host port, make sure the host port is not in use by another container or the host.
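
One way to check whether a host port is already taken is to inspect the listening TCP sockets before starting the container. The following sketch parses `ss -ltn` style output from standard input. The function name `port_is_listening` is hypothetical, and the parsing assumes the usual `ss` column layout, in which the fourth column holds the local address and port.

```shell
# Read `ss -ltn` style output on stdin and succeed if any socket
# listens on the given port. The first line is treated as a header.
port_is_listening() {
    awk -v port="$1" '
        NR > 1 {
            n = split($4, parts, ":")      # the port is the last colon-separated field
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Typical use on the Docker host:
#   ss -ltn | port_is_listening 8888 && echo "port 8888 is already in use"
```

This check covers only sockets that are listening at the moment it runs.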