Capacity planning and hardware selection for Apache Cassandra implementations

General guidelines

Follow these guidelines when choosing hardware for your Apache Cassandra™ database:

  • Hardware choices depends on your particular use case. The right balance of CPUs, memory, disks, number of nodes, and network are vastly different for environments with static data that are accessed infrequently than for volatile data that is accessed frequently.

  • The suggested guidelines are the minimum required. You may need to increase CPU capacity, memory, and disk space beyond the recommended minimums.

  • Be sure to read Anti-patterns for important information about SAN storage, NAS devices, and NFS.

  • Thoroughly test your configuration before deployment.

In addition, for optimal performance in Linux environments, DataStax recommends using the most recent version of the Linux operating system. More recent versions of Linux handle highly concurrent workloads more efficiently.

Memory

The more memory a Cassandra node has, the better its read performance. More RAM also allows memory tables (memtables) to hold more recently written data. Larger memtables lead to a fewer number of SSTables being flushed to disk, more data being held in the operating system page cache, and fewer files being scanned from disk during a read operation. The ideal amount of RAM depends on the anticipated size of your hot data.

Environment type System memory Heap

Cassandra production

32 GB

8 GB

Development (non-load testing)

8 GB

4 GB

16 GB

8 GB

CPUs

Insert-heavy workloads are CPU-bound in Cassandra before becoming memory-bound. All writes go to the commit log, but the database is so efficient in writing that the CPU is the limiting factor. The DataStaxCassandra database is highly concurrent and uses as many CPU cores as available. Recommendations:

  • Minimum dedicated hardware per node for production: 16-core CPU processors (logical).

  • Dedicated hardware in development in non-loading testing environments: 2-core CPU processors (logical) are sufficient.

Disk space

Disk space depends on usage, so it’s important to understand the mechanism. The database writes data to disk when appending data to the commit log for durability and when flushing memtables to SSTable data files for persistent storage. The commit log has a different access pattern (read/writes ratio) than the pattern for reading data from SSTables. This is more important for spinning disks than for Solid State Drives (SSDs).

SSTables are periodically compacted. Compaction improves performance by merging and rewriting data and discarding old data. However, depending on the type of compaction and size of the compactions, disk utilization and data directory volume temporarily increase during compaction. For this reason, be sure to leave an adequate amount of free disk space available on a node.

The following table provides guidelines for the minimum disk space requirements based on the compaction strategy:

DateTieredCompactionStrategy (DTCS) is deprecated in Cassandra 3.0.8/3.8.

Compaction strategy Minimum requirements

SizeTieredCompactionStrategy (STCS)

Sum of all the SSTables compacting must be smaller than the remaining disk space.

Worst-case: 50% of free disk space. This scenario can occur in a manual compaction where all SSTables are merged into one large SSTable.

LeveledCompactionStrategy (LCS)

Generally 10%.

Worst-case: 50% when the Level 0 backlog exceeds 32 SSTables (LCS uses STCS for Level 0).

TimeWindowCompactionStrategy (TWCS)

TWCS requirements are similar to STCS. TWCS requires approximately 50% extra disk space for the total size of SSTables in the last created bucket. To ensure adequate disk space, determine the size of the largest bucket ever generated and add 50% extra disk space.

Additional Cassandra resources:

Estimating usable disk capacity

To estimate how much data your nodes can hold, calculate the usable disk capacity per node and then multiply that by the number of nodes in your cluster. For a production cluster, DataStax recommends separating the commit log and the data directories onto different disks.

  1. Start with the raw capacity of the physical disks:

    raw_capacity = disk_size * number_of_data_disks
  2. Calculate the usable disk space accounting for file system formatting overhead (roughly 10 percent):

    formatted_disk_space = (raw_capacity * 0.9)
  3. Calculate the recommended working disk capacity:

    usable_disk_space = formatted_disk_space * (0.5 to 0.8)

During normal operations, the database routinely requires disk capacity for compaction and repair operations. For optimal performance and cluster health, DataStax recommends not filling your disks to capacity, but running at 50% to 80% capacity. See table compaction options (CQL 3.x | CQL 3.3 | CQL 3.1) and size of the compactions.

Minimum disk space recommendations

Test thoroughly before deploying to production.

DataStax highly recommends testing with the NoSQLBench or the cassandra-stress tool (4.x | 3.x | 3.0 | 2.2 | 2.1) at your desired configuration. Be sure to test common administrative operations, such as bootstrap, repair, and failure, to make certain your hardware selections meet your business needs. See testing your cluster before production.

  • Capacity per node (node density)

    Node capacity is highly dependent on the environment. Determining node density depends on many factors, including:

    • Data frequency change and access frequency.

    • Using HDDs or SSDs.

    • Storage speed and whether the storage is local.

    • SLAs (service-level agreements) and ability to handle outages.

    • Data compression.

    • Compaction strategy: choice of compaction strategy (3.x | 3.0 | 2.2 | 2.1) depends of whether the workload is write-intensive or read-intensive or time dependent. See Disk space above.

    • Network performance: remote links can limit storage bandwidth and increase latency.

    • Replication factor: See About data distribution and replication (3.x | 3.0 | 2.2 | 2.1). To avoid problems, DataStax recommends keeping data per node near or below 1TB. Exceeding this value has the following effects:

    • Extremely long times (days) for bootstrapping new nodes.

    • Impacts maintenance (day-to-day operations), such as recovering, adding, and replacing nodes.

    • Reduces efficiency when running repairs.

    • Significantly extends the time it takes to expand datacenters.

    • Substantially increases compactions per node. Higher capacity nodes works best with static data and low access rates.

      Additional data density is possible. Contact the DataStax Services team or the DataStax Luna team to determine if the workload and hardware being used is appropriate for higher densities.

  • Capacity and I/O

    When choosing disks for your nodes, consider both capacity (how much data you plan to store) and I/O (the write/read throughput rate). Some workloads are best served by using less expensive SATA disks and scaling disk capacity and I/O by adding more nodes (with more RAM).

  • Number of disks - HDD

    DataStax recommends using at least two disks per node; one for the commit log and the other for the data directories. At a minimum, the commit log should be on its own partition.

  • Commit log disk - HDD

    The hard disk drive does not need to be large, but it should be fast enough to receive all of the write operations as appends (sequential I/O).

  • Commit log disk - SSD

    Unlike spinning disks, there is less of a penalty for sharing commit logs and data directories on a SSD than there is on a HDD. DataStax recommends separating commit logs and data for highest performance and resiliency.

  • Data disks

    Use one or more disks per node. Make sure they are large enough for the data volume and fast enough to satisfy reads that are not cached in memory and to keep up with compaction.

  • RAID on data disks

    It is generally not necessary to use RAID for the following reasons:

    • Data is replicated across the cluster based on the replication factor that is chosen.

    • Cassandra includes a JBOD (Just a Bunch Of Disks) feature for disk management. The database responds according to your availability and/or consistency requirements to a disk failure. The database either stops the affected node or blacklists the failed drive. This feature allows you to deploy nodes with large disk arrays without the overhead of RAID 10. You can configure the database to stop the affected node or blacklist the drive according to your availability and/or consistency requirements. See also "Recovering from a single disk failure using JBOD" (3.x | 3.0 | 2.2 | 2.1).

  • RAID on the commit log disk

    Generally RAID is not needed for the commit log disk. Replication adequately prevents data loss. If you need extra redundancy, use RAID 1.

  • Extended File Systems

    DataStax recommends deploying on XFS or ext4. On ext2 or ext3, the maximum file size is 2TB even using a 64-bit kernel. On ext4 the filesystem limitations is 16TB and not to be confused with Cassandra disk limits.

    Because the database can use almost half of your disk space for a single file when using SizeTieredCompactionStrategy (STCS), use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and are essentially unlimited on 64-bit.

Estimating partition size

For efficient operation, partitions must be sized within certain limits. Two measures of a partition’s size are the number of values in a partition and the partition size on disk. The practical limit of cells per partition is 2 billion. Sizing the disk space is more complex, and involves the number of rows and the number of columns, primary key columns and static columns in each table. Each application has different efficiency parameters, but a good rule-of-thumb is to keep the number of rows per partition below 100,000 items and the partition size under 100 MB.

Network

Minimum recommended bandwidth: 1000 Mb/s (gigabit).

A distributed data store puts load on the network to handle read and write requests and replication of data across nodes. Be sure that your network can handle inter-node traffic without bottlenecks. DataStax recommends binding your interfaces to separate Network Interface Cards (NIC). You can use public or private NICs depending on your requirements.

The database efficiently routes requests to replicas that are geographically closest to the coordinator node and chooses a replica in the same rack when possible. The database always chooses replicas located in the same datacenter over replicas in a remote datacenter.

Firewall

If using a firewall, make sure that nodes within a cluster can communicate with each other.

The following ports must be open to allow bi-directional communication between nodes. Configure the firewall running on nodes in your cluster accordingly. Without the following open ports, nodes act as a standalone database server and do not join the cluster.

Public Port
Port number Description

22

SSH port

Inter-node Ports
Port number Description

7000

Inter-node cluster communication

7001

SSL inter-node cluster communication

7199

JMX monitoring port

Client Ports
Port number Description

9042

Client port

9160

Client port (Thrift)

9142

Default for native_transport_port_ssl (3.x | 3.0), useful when both encrypted and unencrypted connections are required

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com