How off-heap components affect reads
The following off-heap components are factors in increasing the data handling capacity per node:
- Bloom filter
- Compression offsets map
- Partition summary
Of the components in memory, only the partition key cache is a fixed size. Other components grow as the data set grows.
Bloom filter
The Bloom filter grows to approximately 1-2 GB per billion partitions. In the extreme case, you can have one partition per row, so you can easily have billions of these entries on a single machine. The Bloom filter is tunable if you want to trade memory for performance.
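The memory-for-performance trade can be made concrete with the textbook sizing formula for an optimally sized Bloom filter, which needs about -ln(p) / (ln 2)^2 bits per entry for a false-positive probability p. The sketch below is only an estimate under that assumption; the function name and the fp_chance parameter are illustrative, and the actual on-node footprint may differ from what the formula predicts.

```python
import math


def bloom_filter_bytes(num_partitions: int, fp_chance: float) -> float:
    """Approximate size in bytes of an optimally sized Bloom filter.

    Uses the standard formula bits_per_key = -ln(p) / (ln 2)^2.
    This is an assumption for illustration, not a measured value.
    """
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    bits_per_key = -math.log(fp_chance) / (math.log(2) ** 2)
    return num_partitions * bits_per_key / 8


# Compare the estimated footprint for one billion partitions
# at a few hypothetical false-positive targets.
for p in (0.1, 0.01, 0.001):
    gb = bloom_filter_bytes(1_000_000_000, p) / 1e9
    print(f"fp_chance={p}: ~{gb:.2f} GB per billion partitions")
```

A lower false-positive target costs more memory per partition but avoids more unnecessary disk reads, which is the trade described above.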
Partition summary
By default, the partition summary is a sample of the partition index. You configure the sample frequency by changing the index_interval property in the table definition, which likewise lets you trade memory for performance.
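The sampling idea can be sketched as follows. This is a simplified model, not the actual on-disk format: it assumes the partition index is a sorted list of integer keys, and the function names are hypothetical. A smaller interval keeps more samples in memory but leaves a shorter stretch of the index to scan for each lookup.

```python
from bisect import bisect_right


def build_summary(index_keys, index_interval):
    """Keep every index_interval-th key together with its position in the index."""
    return [(index_keys[i], i) for i in range(0, len(index_keys), index_interval)]


def search_range(summary, key, index_len, index_interval):
    """Return the [start, end) slice of the index that could contain key."""
    sampled_keys = [k for k, _ in summary]
    slot = bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return (0, 0)  # key sorts before the first index entry
    start = summary[slot][1]
    return (start, min(start + index_interval, index_len))


# Hypothetical index of 1000 sorted keys, sampled at two intervals.
index_keys = list(range(1000))
for interval in (128, 32):
    summary = build_summary(index_keys, interval)
    start, end = search_range(summary, 300, len(index_keys), interval)
    print(f"interval={interval}: {len(summary)} samples in memory, "
          f"scan index entries {start}..{end - 1}")
```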
Compression offsets map
The compression offsets map grows to 1-3 GB per terabyte of compressed data. The more you compress data, the greater the number of compressed blocks you have and the larger the compression offsets map. Compression is enabled by default, even though going through the compression offsets map consumes CPU resources. Having compression enabled makes the page cache more effective and almost always pays off.
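Why better compression enlarges the map for the same amount of compressed data can be sketched with a rough model. The model assumes one fixed-size offset per block, with blocks cut at a fixed uncompressed length; the 8-byte offset, the 64 KB block length, and all names are assumptions for illustration, so the absolute numbers are not meant to reproduce the 1-3 GB figure, only the direction of the relationship.

```python
import math


def offsets_map_bytes(compressed_bytes, compression_ratio,
                      chunk_bytes=64 * 1024, bytes_per_offset=8):
    """Estimate the offsets map size for a given amount of compressed data.

    compression_ratio is compressed size / uncompressed size (0 < ratio <= 1).
    chunk_bytes and bytes_per_offset are illustrative assumptions.
    """
    if not 0.0 < compression_ratio <= 1.0:
        raise ValueError("compression_ratio must be in (0, 1]")
    uncompressed_bytes = compressed_bytes / compression_ratio
    num_chunks = math.ceil(uncompressed_bytes / chunk_bytes)
    return num_chunks * bytes_per_offset


# One terabyte of compressed data at several hypothetical ratios:
# a smaller ratio means more uncompressed data, hence more blocks and offsets.
one_tb = 10**12
for ratio in (0.8, 0.5, 0.25):
    mb = offsets_map_bytes(one_tb, ratio) / 1e6
    print(f"ratio={ratio}: ~{mb:.0f} MB of offsets")
```

Under the same assumptions, a smaller block length also enlarges the map, because the same data is cut into more blocks.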