Running the Wikipedia demo with SearchAnalytics

The Wikipedia Solr demo can be run on a SearchAnalytics node to retrieve Spark RDDs using search queries.

The following steps use the Wikipedia demo to show how to run search queries from the Spark console on SearchAnalytics nodes.

Prerequisites

You must have created a new SearchAnalytics datacenter as described in the single datacenter deployment scenario.

Procedure

  1. Start the node or nodes in SearchAnalytics mode.
  2. Verify that the cluster is up by running dsetool ring. The node type reported for each node should be SearchAnalytics.

    Package installations: dsetool ring

    Tarball installations: installation_location/bin/dsetool ring

  3. In a terminal, go to the Wikipedia demo directory.
    The default Wikipedia demo location depends on the type of installation:
    • Package installations: /usr/share/dse/demos/wikipedia
    • Tarball installations: installation_location/demos/wikipedia
    For example, on a package installation:
    cd /usr/share/dse/demos/wikipedia
  4. Add the schema by running the 1-add-schema.sh script.
    ./1-add-schema.sh
  5. Create the search indexes.
    ./2-index.sh
  6. Start the Spark console.
    dse spark
  7. Create an RDD based on the wiki.solr table.
    scala> val table = sc.cassandraTable("wiki","solr")
    table: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15
  8. Run a query using the title Solr index and collect the results.
    scala> val result = table.select("id","title").where("solr_query='title:Boroph*'").collect
    Equivalent JSON query (the inner double quotes must be escaped inside the Scala string):
    where("solr_query='{\"q\": \"title:Boroph*\"}'")
    result: Array[com.datastax.spark.connector.CassandraRow] = Array(
        CassandraRow{id: 23729958, title: Borophagus parvus},
        CassandraRow{id: 23730195, title: Borophagus dudleyi},
        CassandraRow{id: 23730528, title: Borophagus hilli},
        CassandraRow{id: 23730810, title: Borophagus diversidens},
        CassandraRow{id: 23730974, title: Borophagus littoralis},
        CassandraRow{id: 23731282, title: Borophagus orc},
        CassandraRow{id: 23731616, title: Borophagus pugnator},
        CassandraRow{id: 23732450, title: Borophagus secundus})
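
The JSON form of solr_query in step 8 nests double quotes inside a CQL string literal, which in turn sits inside a Scala string, so the quoting is easy to get wrong. The following is a minimal sketch of a helper that builds the predicate string. The names SolrQueryClause and json are hypothetical, introduced here for illustration only; they are not part of DSE or the Spark Cassandra Connector. The sketch assumes only the {"q": ...} form shown in step 8.

```scala
// Hypothetical helper for illustration; not part of DSE or the Spark Cassandra Connector.
object SolrQueryClause {
  // Returns a predicate string such as: solr_query='{"q": "title:Boroph*"}'
  def json(q: String): String = {
    // Escape backslashes and double quotes so q is a valid JSON string value.
    val jsonValue = q.replace("\\", "\\\\").replace("\"", "\\\"")
    val jsonBody = "{\"q\": \"" + jsonValue + "\"}"
    // Double any single quotes so the JSON fits inside a CQL string literal.
    val cqlLiteral = jsonBody.replace("'", "''")
    "solr_query='" + cqlLiteral + "'"
  }
}
```

Assuming the object has been pasted into the Spark console, the query from step 8 could then be written as:

    scala> val result = table.select("id","title").where(SolrQueryClause.json("title:Boroph*")).collect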

What's next

For details on using search query syntax in CQL, see CQL queries.