Tuning Bloom filters

DataStax Enterprise uses Bloom filters to determine whether an SSTable has data for a particular partition. Bloom filters are unused for range scans, but are used for index scans. Bloom filters are probabilistic sets that allow you to trade memory for accuracy. This means that higher Bloom filter attribute settings bloom_filter_fp_chance use less memory, but will result in more disk I/O if the SSTables are highly fragmented. Bloom filter settings range from 0 to 1.0 (disabled). The default value of bloom_filter_fp_chance depends on the compaction strategy.

The LeveledCompactionStrategy (LCS) uses a higher default value (0.1) than theSizeTieredCompactionStrategy (STCS), which has a default of 0.01. Memory savings are nonlinear; going from 0.01 to 0.1 saves about one third of the memory. SSTables using LCS contain a relatively smaller ranges of keys than those using STCS, which facilitates efficient exclusion of the SSTables even without a bloom filter; however, adding a small bloom filter helps when there are many levels in LCS.

The settings you choose depend the type of workload. For example, to run an analytics application that heavily scans a particular table, you would want to inhibit the Bloom filter on the table by setting it high.

To view the observed Bloom filters false positive rate and the number of SSTables consulted per read use tablestats in the nodetool utility.

Bloom filters are stored off-heap so you don’t need include it when determining the -Xmx settings (the maximum memory size that the heap can reach for the JVM).

To change the bloom filter property on a table, use CQL. For example:

ALTER TABLE addamsFamily WITH bloom_filter_fp_chance = 0.1;

After updating the value of bloom_filter_fp_chance on a table, Bloom filters need to be regenerated in one of these ways:

  • Initiate compaction

  • Upgrade the SSTables to compute new bloom filters:

    • Force all SSTables to be rewritten

      nodetool upgradesstables -a
    • Force upgrade of target SSTables

      nodetool upgradesstables -a keyspace_name table_name

    If the SSTables are already on the current version, the nodetool upgradesstables command returns immediately and no action is taken. You must use the -a command argument to force the SSTable upgrade.

You do not have to restart DataStax Enterprise after regenerating SSTables.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com