Running the Wikipedia demo with SearchAnalytics
The Wikipedia Solr demo can be run on a SearchAnalytics node to retrieve Spark RDDs using Solr queries.
The following instructions describe how to use Solr queries in the Spark console on SearchAnalytics nodes using the Wikipedia demo.
Prerequisites
You must have created a new SearchAnalytics cluster as described in the single datacenter deployment scenario.
Procedure
-
Start the node or nodes in SearchAnalytics mode.
- Packages/Services: See Starting DataStax Enterprise as a service.
- Tarball/No Services: See Starting DataStax Enterprise as a stand-alone process.
-
Ensure that the cluster is running correctly by running dsetool ring. The node type should be SearchAnalytics.
dsetool ring
The default location of the dsetool command depends on the type of installation:
- Package installations: /usr/bin/dsetool
- Installer-Services installations: /usr/bin/dsetool
- Installer-No Services and Tarball installations: install_location/bin/dsetool
-
In a terminal, go to the Wikipedia demo directory.
The default Wikipedia demo location depends on the type of installation:
- Installer-No Services and Tarball installations: install_location/demos/wikipedia
- Installer-Services and Package installations: /usr/share/dse/demos/wikipedia
cd /usr/share/dse/demos/wikipedia
-
Add the schema by running the 1-add-schema.sh script.
./1-add-schema.sh
-
Create the Solr indexes.
./2-index.sh
-
Start the Spark console.
dse spark
-
Create an RDD based on the wiki.solr table.
scala> val table = sc.cassandraTable("wiki","solr")
table: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15
-
Run a query using the title Solr index and collect the results.
scala> val result = table.select("id","title").where("solr_query='title:Boroph*'").collect
result: Array[com.datastax.spark.connector.CassandraRow] = Array( CassandraRow{id: 23729958, title: Borophagus parvus}, CassandraRow{id: 23730195, title: Borophagus dudleyi}, CassandraRow{id: 23730528, title: Borophagus hilli}, CassandraRow{id: 23730810, title: Borophagus diversidens}, CassandraRow{id: 23730974, title: Borophagus littoralis}, CassandraRow{id: 23731282, title: Borophagus orc}, CassandraRow{id: 23731616, title: Borophagus pugnator}, CassandraRow{id: 23732450, title: Borophagus secundus})
Equivalent JSON query (the inner double quotes must be escaped inside a Scala string literal):
where("solr_query='{\"q\": \"title:Boroph*\"}'")
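The plain and JSON forms of the where clause differ only in how the string is quoted, which is easy to get wrong by hand. The following is a minimal sketch of a helper that assembles both forms. The object SolrPredicate and its methods are hypothetical names introduced for illustration; they are not part of DataStax Enterprise or the Spark Cassandra Connector. The sketch assumes the usual CQL rule that a single quote inside a string literal is escaped by doubling it.

```scala
// Hypothetical helper (not a DSE or connector API): it only builds the
// string that is later passed to where(...).
object SolrPredicate {
  // Wrap a value as a CQL string literal, doubling embedded single quotes.
  private def cqlQuote(s: String): String =
    "'" + s.replace("'", "''") + "'"

  // Plain form, for example: solr_query='title:Boroph*'
  def plain(query: String): String =
    "solr_query=" + cqlQuote(query)

  // JSON form, for example: solr_query='{"q": "title:Boroph*"}'
  def json(query: String): String = {
    // Escape backslashes and double quotes so the value stays valid JSON.
    val escaped = query.replace("\\", "\\\\").replace("\"", "\\\"")
    "solr_query=" + cqlQuote("{\"q\": \"" + escaped + "\"}")
  }
}
```

After pasting the object into the Spark console, the query from this step could be written as table.select("id","title").where(SolrPredicate.plain("title:Boroph*")).collect, or with SolrPredicate.json for the JSON form.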
What's next
For details on using Solr query syntax in CQL, see CQL queries.