Using predicate push down in Spark SQL

Solr predicate push down allows queries in SearchAnalytics datacenters to use Solr-indexed columns in Spark SQL queries. To enable it, set the spark.sql.dse.solr.enable_optimization property to true globally, per table, or per dataset.

The performance of DSE Search is directly related to the number of records a query returns. Requests that need a large portion of the dataset are likely better served by a full table scan without predicate push down.

To enable Solr predicate push down on a Scala dataset:

val solrEnabledDataSet = spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(Map(
        "keyspace" -> "ks",
        "table" -> "tab",
        "spark.sql.dse.solr.enable_optimization" -> "true"))
    .load()
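
Because push down helps selective queries and can hurt scans that need most of the table, it can be convenient to switch the option per dataset. The following is a minimal sketch of that idea, not part of DSE: the object name SolrPushDownOptions and the helper cassandraOptions are hypothetical, and only the option keys come from the example above.

```scala
object SolrPushDownOptions {
  // Property name as documented above.
  val SolrOptimizationKey = "spark.sql.dse.solr.enable_optimization"

  /** Builds the reader options for one table (hypothetical helper).
    * Pass solrPushDown = false for reads that need most of the table.
    */
  def cassandraOptions(
      keyspace: String,
      table: String,
      solrPushDown: Boolean): Map[String, String] =
    Map(
      "keyspace" -> keyspace,
      "table" -> table,
      SolrOptimizationKey -> solrPushDown.toString
    )
}

// Usage with an existing SparkSession named `spark`:
//   spark.read
//     .format("org.apache.spark.sql.cassandra")
//     .options(SolrPushDownOptions.cassandraOptions("ks", "tab", solrPushDown = true))
//     .load()
```

Keeping the flag as an explicit parameter makes the choice visible at each read site instead of relying on a global default.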

To create a temporary table in Spark SQL with Solr predicate push down enabled:

CREATE TEMPORARY TABLE temp USING org.apache.spark.sql.cassandra OPTIONS (
table "tab",
keyspace "ks",
spark.sql.dse.solr.enable_optimization "true");

Set the spark.sql.dse.solr.enable_optimization property globally by adding it to the server configuration file.

The optimizer works at the push down level, so only predicates that are pushed to the source can be optimized. Use the explain command to see exactly which predicates are pushed to the CassandraSourceRelation.

val query = spark.sql("query")
query.explain
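
Plan output can be long, so it may help to narrow it to the lines that list pushed predicates. The sketch below assumes the physical plan labels them with the text PushedFilters, as Spark commonly does for data source scans; the exact label may differ between versions. The object name PushedFilterLines and its helper are hypothetical.

```scala
object PushedFilterLines {
  /** Returns the trimmed lines of a plan dump that mention pushed filters.
    * Assumes the label "PushedFilters" appears in the plan text.
    */
  def pushedFilterLines(planText: String): List[String] =
    planText
      .split("\n")
      .toList
      .map(_.trim)
      .filter(_.contains("PushedFilters"))
}

// Usage with the `query` DataFrame from the snippet above:
//   val planText = query.queryExecution.executedPlan.toString
//   PushedFilterLines.pushedFilterLines(planText).foreach(println)
```

If the helper returns nothing for a query that has a WHERE clause, the predicate was probably not pushed to the source and so cannot be optimized.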

Logging optimization plans

To log the optimization plans for queries that use predicate push down, set the org.apache.spark.sql.SolrPredicateRules logger to DEBUG in the Spark logging configuration files.

<logger name="org.apache.spark.sql.SolrPredicateRules" level="DEBUG"/>

