Tuning Bloom filters
DataStax Enterprise uses Bloom filters to determine whether an SSTable has data for a particular partition. Bloom filters are unused for range scans, but are used for index scans. Bloom filters are probabilistic sets that allow you to trade memory for accuracy. This means that higher Bloom filter attribute settings bloom_filter_fp_chance use less memory, but will result in more disk I/O if the SSTables are highly fragmented. Bloom filter settings range from 0 to 1.0 (disabled). The default value of bloom_filter_fp_chance depends on the compaction strategy.
The LeveledCompactionStrategy (LCS) uses a higher default value (0.1) than the SizeTieredCompactionStrategy (STCS), which has a default of 0.01. Memory savings are nonlinear; going from 0.01 to 0.1 saves about one third of the memory. SSTables using LCS contain a relatively smaller ranges of keys than those using STCS, which facilitates efficient exclusion of the SSTables even without a bloom filter; however, adding a small bloom filter helps when there are many levels in LCS.
The settings you choose depend the type of workload. For example, to run an analytics application that heavily scans a particular table, you would want to inhibit the Bloom filter on the table by setting it high.
To view the observed Bloom filters false positive rate and the number of SSTables consulted per read use tablestats in the nodetool utility.
Bloom filters are stored off-heap so you don’t need include it when determining the -Xmx settings (the maximum memory size that the heap can reach for the JVM).
To change the bloom filter property on a table, use CQL. For example:
ALTER TABLE addamsFamily WITH bloom_filter_fp_chance = 0.1;
After updating the value of bloom_filter_fp_chance on a table, Bloom filters need to be regenerated in one of these ways:
-
Upgrade the SSTables to compute new bloom filters:
-
Force all SSTables to be rewritten
nodetool upgradesstables -a
-
Force upgrade of target SSTables
nodetool upgradesstables -a keyspace_name table_name
If the SSTables are already on the current version, the nodetool upgradesstables command returns immediately and no action is taken. You must use the
-a
command argument to force the SSTable upgrade. -
You do not have to restart DataStax Enterprise after regenerating SSTables.