Running the Wikipedia demo with SearchAnalytics

The Wikipedia Solr demo can be run on a SearchAnalytics node to retrieve Spark RDDs using Solr queries.

The following instructions describe how to use Solr queries in the Spark console on SearchAnalytics nodes using the Wikipedia demo.

Prerequisites

You must have created a new SearchAnalytics cluster as described in the single datacenter deployment scenario.

Procedure

  1. Start the node or nodes in SearchAnalytics mode.
  2. Verify that the cluster is running correctly using dsetool ring. The node type of each node should be SearchAnalytics.
    dsetool ring
    The default location of the dsetool command depends on the type of installation:
        Package installations: /usr/bin/dsetool
        Installer-Services installations: /usr/bin/dsetool
        Installer-No Services and Tarball installations: install_location/bin/dsetool
  3. In a terminal, go to the Wikipedia demo directory.
    The default Wikipedia demo location depends on the type of installation:
        Installer-No Services and Tarball installations: install_location/demos/wikipedia
        Installer-Services and Package installations: /usr/share/dse/demos/wikipedia
    cd /usr/share/dse/demos/wikipedia
  4. Add the schema by running the 1-add-schema.sh script.
    ./1-add-schema.sh
  5. Create the Solr indexes.
    ./2-index.sh
  6. Start the Spark console.
    dse spark
  7. Create an RDD based on the wiki.solr table.
    scala> val table = sc.cassandraTable("wiki","solr")
    table: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15
  8. Run a query using the title Solr index and collect the results.
    scala> val result = table.select("id","title").where("solr_query='title:Boroph*'").collect
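    Equivalent JSON query (the inner double quotes must be escaped inside the Scala string):
    scala> val result = table.select("id","title").where("solr_query='{\"q\": \"title:Boroph*\"}'").collect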
    result: Array[com.datastax.spark.connector.CassandraRow] = Array(
        CassandraRow{id: 23729958, title: Borophagus parvus},
        CassandraRow{id: 23730195, title: Borophagus dudleyi},
        CassandraRow{id: 23730528, title: Borophagus hilli},
        CassandraRow{id: 23730810, title: Borophagus diversidens},
        CassandraRow{id: 23730974, title: Borophagus littoralis},
        CassandraRow{id: 23731282, title: Borophagus orc},
        CassandraRow{id: 23731616, title: Borophagus pugnator},
        CassandraRow{id: 23732450, title: Borophagus secundus})
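
The quoting in step 8 is easy to get wrong, because the Solr query sits inside a CQL string literal (single quotes) that in turn sits inside a Scala string (double quotes). The following is a minimal sketch of two helpers that build the solr_query predicate in its plain and JSON forms. The SolrQueryClause object and its method names are hypothetical and are not part of DSE or the Spark Cassandra connector; the escaping assumes standard CQL rules (a single quote is doubled) and standard JSON rules (backslash and double quote are backslash-escaped).

```scala
// Hypothetical helper, for illustration only; not a DSE or connector API.
object SolrQueryClause {
  // CQL string literals escape a single quote by doubling it.
  private def cqlQuote(s: String): String = s.replace("'", "''")

  // Plain form, e.g. solr_query='title:Boroph*'
  def plain(q: String): String = s"solr_query='${cqlQuote(q)}'"

  // JSON form, e.g. solr_query='{"q": "title:Boroph*"}'
  def json(q: String): String = {
    // Escape backslashes first, then double quotes, for the JSON string value.
    val jsonEscaped = q.replace("\\", "\\\\").replace("\"", "\\\"")
    s"""solr_query='{"q": "${cqlQuote(jsonEscaped)}"}'"""
  }
}

// Prints: solr_query='title:Boroph*'
println(SolrQueryClause.plain("title:Boroph*"))
// Prints: solr_query='{"q": "title:Boroph*"}'
println(SolrQueryClause.json("title:Boroph*"))
```

After pasting the object into the Spark console, the query from step 8 could be written as table.select("id","title").where(SolrQueryClause.json("title:Boroph*")).collect, which should pass the same predicate string as the hand-written version.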

What's next

For details on using Solr query syntax in CQL, see CQL queries.