Estimating usable disk capacity 

Determining how much data your Cassandra nodes can hold.

Attention: DataStax Enterprise customers. Do not use the topics in this section. See Planning and testing DataStax Enterprise deployments.

To estimate how much data your Apache Cassandra™ nodes can hold, calculate the usable disk capacity per node and then multiply that by the number of nodes in your cluster. In a production cluster, the commit log and data directories are typically on different disks, so count only the data disks in the calculation.

Procedure

  1. Start with the raw capacity of the physical disks:
    raw_capacity = disk_size * number_of_data_disks
  2. Calculate the usable disk space accounting for file system formatting overhead (roughly 10 percent):
    formatted_disk_space = raw_capacity * 0.9
  3. Calculate the recommended working disk capacity:

    During normal operations, Cassandra routinely requires free disk space for compaction and repair operations. For optimal performance and cluster health, do not fill your disks to capacity. Instead, run at 50% to 80% of capacity, depending on the compaction strategy and the size of the compactions.

    usable_disk_space = formatted_disk_space * (0.5 to 0.8)
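
The three steps above can be sketched as a short Python calculation. This is an illustrative sketch only: the function names, the choice of gigabytes as the unit, and the default fill ratio of 0.5 are assumptions made for the example, not part of Cassandra or any of its tools. The 10 percent formatting overhead and the 0.5 to 0.8 range come from the procedure itself.

```python
def usable_disk_space(disk_size_gb, number_of_data_disks,
                      fill_ratio=0.5, format_overhead=0.10):
    """Estimate usable disk space per node, in the same unit as disk_size_gb.

    fill_ratio is the fraction of formatted space you plan to use
    (0.5 to 0.8 per the procedure); the 0.5 default is an assumption
    chosen here as the most conservative end of that range.
    """
    if not 0.5 <= fill_ratio <= 0.8:
        raise ValueError("fill_ratio should be between 0.5 and 0.8")

    # Step 1: raw capacity of the data disks (commit log disks excluded).
    raw_capacity = disk_size_gb * number_of_data_disks

    # Step 2: subtract file system formatting overhead (roughly 10 percent).
    formatted_disk_space = raw_capacity * (1 - format_overhead)

    # Step 3: leave headroom for compaction and repair.
    return formatted_disk_space * fill_ratio


def cluster_capacity(per_node_space, number_of_nodes):
    """Multiply the per-node estimate by the number of nodes in the cluster."""
    return per_node_space * number_of_nodes


if __name__ == "__main__":
    # Hypothetical cluster: 6 nodes, each with two 1000 GB data disks.
    per_node = usable_disk_space(1000, 2, fill_ratio=0.5)
    total = cluster_capacity(per_node, 6)
    print(f"per node: {per_node:.0f} GB, cluster: {total:.0f} GB")
```

For the hypothetical node above, the estimate works out to about 900 GB per node at a fill ratio of 0.5 and about 1440 GB at 0.8, which shows how strongly the choice of ratio affects the result.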