Estimating usable disk capacity for Apache Cassandra 

Determining how much data your Apache Cassandra nodes can hold.

Attention: DataStax Enterprise customers, see the topics in Planning and testing DataStax Enterprise deployments in the DataStax Enterprise documentation instead.

To estimate how much data your Apache Cassandra™ nodes can hold, calculate the usable disk capacity per node and then multiply that by the number of nodes in your cluster. In a production cluster, the commit log and data directories are typically on different disks, so count only the disks that hold the data directories.

Procedure

  1. Start with the raw capacity of the physical disks:
    raw_capacity = disk_size * number_of_data_disks
  2. Calculate the usable disk space accounting for file system formatting overhead (roughly 10 percent):
    formatted_disk_space = raw_capacity * 0.9
  3. Calculate the recommended working disk capacity:

    During normal operations, Cassandra routinely requires disk capacity for compaction and repair operations. For optimal performance and cluster health, do not fill your disks to capacity. Instead, run at 50% to 80% capacity, depending on the compaction strategy and the size of the compactions.

    usable_disk_space = formatted_disk_space * (0.5 to 0.8)
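
The three steps above, plus the per-cluster multiplication from the introduction, can be sketched in Python as follows. This is only an illustrative calculation, not a Cassandra tool: the function names, the `fill_ratio` parameter, and the choice of gigabytes as the unit are assumptions made for the example, and the 10 percent formatting overhead is the rough figure from step 2.

```python
def usable_disk_space(disk_size, number_of_data_disks, fill_ratio=0.5):
    """Estimate usable disk space per node, in the same unit as disk_size.

    fill_ratio is a hypothetical parameter standing in for the 0.5 to 0.8
    factor in step 3; pick it based on your compaction strategy.
    """
    if not 0.5 <= fill_ratio <= 0.8:
        raise ValueError("fill_ratio should be between 0.5 and 0.8")

    # Step 1: raw capacity of the physical data disks.
    raw_capacity = disk_size * number_of_data_disks

    # Step 2: subtract roughly 10 percent for file system formatting overhead.
    formatted_disk_space = raw_capacity * 0.9

    # Step 3: keep headroom for compaction and repair operations.
    return formatted_disk_space * fill_ratio


def cluster_capacity(per_node_usable, number_of_nodes):
    """Multiply the per-node estimate by the number of nodes in the cluster."""
    return per_node_usable * number_of_nodes


if __name__ == "__main__":
    # Example figures only: two 1000 GB data disks per node, six nodes.
    per_node = usable_disk_space(disk_size=1000, number_of_data_disks=2)
    total = cluster_capacity(per_node, number_of_nodes=6)
    print(f"per node: {per_node:.0f} GB, cluster: {total:.0f} GB")
```

With these example figures, the conservative 0.5 factor gives roughly 900 GB of usable space per node. Rerunning with `fill_ratio=0.8` shows the upper end of the recommended range.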