Choosing the appropriate hardware for Apache Cassandra depends on selecting the right balance of memory, CPU, disks, number of nodes, and network.
Choosing the appropriate hardware for Apache Cassandra depends on your use case. The right balance of memory, CPU, disks, number of nodes, and network are vastly different for environments with static data that are accessed infrequently than for volatile data that is accessed frequently. Be sure to read Anti-patterns in Apache Cassandra™ for important information about SAN storage, NAS devices, and NFS.
For testing use the cassandra-stress tool at your desired configuration. Be sure to test common administrative operations, such as bootstrap, repair, and failure, to make certain your hardware selections meet your business needs. See Testing your cluster before production.
The more memory a Cassandra node has, the better read performance. More RAM also allows memory tables (memtables) to hold more recently written data. Larger memtables lead to a fewer number of SSTables being flushed to disk, more data held in the operating system page cache, and fewer files to scan from disk during a read. The ideal amount of RAM depends on the anticipated size of your hot data.
For both dedicated hardware and virtual environments, the recommended memory is
Insert-heavy workloads are CPU-bound in Cassandra before becoming memory-bound. (All writes go to the commit log, but Cassandra is so efficient in writing that the CPU is the limiting factor.) Cassandra is highly concurrent and uses as many CPU cores as available. Recommendations:
- Dedicated hardware in production environments: 16-core CPU processors are the current price-performance sweet spot.
- Dedicated hardware in development in non-loading testing environments: 2-core CPU processors are sufficient.
SSDs are recommended for Cassandra nodes. The NAND Flash chips that power SSDs provide extremely low-latency response times for random reads while supplying ample sequential write performance for compaction operations. In recent years, drive manufacturers have improved overall endurance, usually in conjunction with spare (unexposed) capacity. Additionally, because PBW/DWPD ratings are probabilistic estimates based on worst case scenarios, such as random write workloads, and because Cassandra does only large sequential writes, drives significantly exceed their endurance ratings. However, it is important to plan for drive failures and have spares available. A large variety of SSDs are available from server vendors and third-party drive manufacturers.
- If drives are quickly available, buy the cheapest drives that provide the performance you want.
- If it is more challenging to swap the drives, consider higher endurance models, possibly starting in the mid range, and then choose replacements of higher or lower endurance based on the failure rates of the initial model chosen.
For additional help in determining the most cost-effective option for a given deployment and workload, Consult the Cassandra community for more information.
Disk space depends on usage, so it's important to understand the mechanism. Cassandra writes data to disk when appending data to the commitlog for durability and when flushing memtables to SSTable data files for persistent storage. The commit log has a different access pattern (read/writes ratio) than the pattern for accessing data from SSTables. This is more important for spinning disks than for SSDs.
SSTables are periodically compacted. Compaction improves performance by merging and rewriting data and discarding old data. However, depending on the type of compaction and size of the compactions, during compaction disk utilization and data directory volume temporarily increases. For this reason, be sure to leave an adequate amount of free disk space available on a node.
|SizeTieredCompactionStrategy (STCS)||Sum of all the SSTables compacting must be smaller than the remaining disk
Note: Worst case: 50% of free disk space. This scenario can occur in a manual compaction where all SSTables are merged into one giant SSTable.
|LeveledCompactionStrategy (LCS)||Generally 10%. Worse case: 50% if the Level 0 backlog exceeds 32 SSTables (LCS uses STCS for Level 0).|
|DateTieredCompactionStrategy (DSCS)||Amount of space equal to the amount of data that the system writes during the time period specified by the table's max_window_size_seconds property.|
- Capacity per node (node density)
- Determining node density depends on many factors. Some factors to consider are:
- Use case: for example, whether the data changes frequently or infrequently and how often is it accessed.
- Compaction strategy: choice of compaction strategy depends of whether the workload is write- or read-intensive or time dependent. See Disk space above.
- Network performance: remote links can be a limiting factor for many use cases and expected bandwidth.
- Storage speed and whether the storage is local or not.
- Replication factor: See Data distribution.
- SLAs and ability to handle outages.
With SSDs, you can use a maximum of 3 to 5 TB per node of disk space for uncompressed data depending on the SLA (service-level agreement), access frequency, and frequency of data updates. However, using this much capacity is dependant on the environment and works best with static data and low access rates. Regardless of the environment, performance is impacted for maintenance (day-to-day operations) and activities such as adding and replacing nodes. In certain situations, some operations can take days.
- Per compaction strategy for most, but not all workloads:
- STCS (SizeTieredCompactionStrategy):
- HDD: 500 GB
- SSD: 2 TB
- LCS (LeveledCompactionStrategy):
- HDD: 500 GB
- SSD: 600 GB
- DTCS (DateTieredCompactionStrategy): (assuming the DTCS is the best fit for the
- Highly dependant on data model. Consult the Cassandra community.
- TWCS (TimeWindowCompactionStrategy):
- Highly dependant on data model. Contact the Cassandra community.
- STCS (SizeTieredCompactionStrategy):
- Search node capacity
- For best performance, use a maximum of 500 GB capacity for search nodes. If you require higher capacity, perform extensive testing or consulting assistance before deployment. See Capacity planning for DSE Search.
- Capacity and I/O
- When choosing disks for your nodes, consider both capacity (how much data you plan to store) and I/O (the write/read throughput rate). Some workloads are best served by using less expensive SATA disks and scaling disk capacity and I/O by adding more nodes (with more RAM).
- Number of disks - SATA
- Ideally Cassandra need at least two disks per node, one for the commit log and the other for the data directories. At a minimum the commit log should be on its own partition.
- Commit log disk - SATA
- The disk need not be large, but it should be fast enough to receive all of your writes as appends (sequential I/O).
- Commit log disk - SSD
- Unlike spinning disks, it's alright to store both commit logs and SSTables are on the same mount point.
- Data disks
- Use one or more disks per node and make sure they are large enough for the data volume and fast enough to both satisfy reads that are not cached in memory and to keep up with compaction.
- RAID on data disks
- It is generally not necessary to use RAID for the following reasons:
- Data is replicated across the cluster based on the replication factor you've chosen.
- Cassandra includes a JBOD (Just a bunch of disks) feature to take care of disk management. Because Cassandra responds according to your availability/consistency requirements to a disk failure either by stopping the affected node or by blacklisting the failed drive, you can deploy nodes with large disk arrays without the overhead of RAID 10. You can configure Cassandra to stop the affected node or blacklist the drive according to your availability/consistency requirements. Also see Recovering using JBOD.
- RAID on the commit log disk
- Generally RAID is not needed for the commit log disk. Replication adequately prevents data loss. If you need extra redundancy, use RAID 1.
- Extended file systems
- It is recommended to deploy Cassandra on XFS or ext4. On ext2 or ext3, the maximum
file size is 2 TB even using a 64-bit kernel. On ext4 it is 16 TB.
Because Cassandra can use almost half your disk space for a single file when using
SizeTieredCompactionStrategy, use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16 TB max on a 32-bit kernel, and essentially unlimited on 64-bit.
Since Cassandra is a distributed data store, it puts load on the network to handle read/write requests and replication of data across nodes. Be sure that your network can handle inter-node traffic without bottlenecks. It is recommended binding your interfaces to separate Network Interface Cards (NIC). You can use public or private NICs depending on your requirements.
- Recommended bandwidth is 1000 Mb/s (gigabit) or greater.
- Thrift/native protocols use the rpc_address.
- Cassandra's internal storage protocol uses the listen_address.
Cassandra efficiently routes requests to replicas that are geographically closest to the coordinator node and chooses a replica in the same rack when possible. Cassandra will always choose replicas located in the same datacenter over replicas in a remote datacenter.
If using a firewall, make sure that nodes within a cluster can reach each other. See Configuring firewall port access: