Using predicate push down on search indexes in Spark SQL
Search predicate push down allows queries in SearchAnalytics datacenters to use Solr-indexed columns in Spark SQL queries.
To enable Search predicate push down, set the
spark.sql.dse.search.enableOptimization property to
spark.sql.dse.search.enableOptimization is set to
When in auto mode the predicate push down will do a COUNT operation against the Search indices both with and without the predicate filters applied. If the number of records with the predicate filter is less than the result of the following formula:
spark.sql.dse.search.autoRatio * <the total number of records>
the optimization occurs automatically.
spark.sql.dse.search.autoRatio is user configurable.
The default value is 0.03.
The performance of DSE Search is directly related to the number of records returned in a query. Requests which require a large portion of the dataset are likely better served by a full table scan without using predicate push downs.
To enable Solr predicate push down on a Scala dataset:
val solrEnabledDataSet = spark.read .format("org.apache.spark.sql.cassandra") .options(Map( "keyspace" -> "ks", "table" -> "tab", "spark.sql.dse.search.enableOptimization" -> "on") .load()
To create a temporary table in Spark SQL with Solr predicate push down enabled:
CREATE TEMPORARY TABLE temp USING org.apache.spark.sql.cassandra OPTIONS ( table "tab", keyspace "ks", spark.sql.dse.search.enableOptimization "on");
spark.sql.dse.search.enableOptimization property globally by adding it to the server configuration file.
The optimizer works on the push down level so only predicates which are being pushed to the source can be optimized.
explain command to see exactly what predicates are being pushed to the
val query = spark.sql("<query>") query.explain
The optimization plans for a query using predicate push downs are logged by setting the
org.apache.spark.sql.SolrPredicateRules logger to
DEBUG in the Spark logging configuration files.
<logger name="org.apache.spark.sql.SolrPredicateRules" level="DEBUG"/>