Running the Wikipedia demo with SearchAnalytics
The Wikipedia Solr demo can be run on a SearchAnalytics node to retrieve Spark RDDs using search queries.
The following instructions describe how to use search queries in the Spark console on SearchAnalytics nodes using the Wikipedia demo.
Prerequisites
You must have created a new SearchAnalytics datacenter as described in the single datacenter deployment scenario.
Procedure
-
Start the node or nodes in SearchAnalytics mode.
- Package installations: See Starting DataStax Enterprise as a service.
- Tarball installations: See Starting DataStax Enterprise as a stand-alone process.
-
Ensure that the cluster is running correctly by running dsetool ring. The node type should be SearchAnalytics.
Package installations:
dsetool ring
Tarball installations:
installation_location/bin/dsetool ring
-
In a terminal, go to the Wikipedia demo directory.
The default Wikipedia demo location depends on the type of installation:
- Package installations: /usr/share/dse/demos/wikipedia
- Tarball installations: installation_location/demos/wikipedia
cd /usr/share/dse/demos/wikipedia
-
Add the schema by running the 1-add-schema.sh script.
./1-add-schema.sh
-
Create the search indexes.
./2-index.sh
-
Start the Spark console.
dse spark
-
Create an RDD based on the wiki.solr table.
scala> val table = sc.cassandraTable("wiki","solr")
table: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15
-
Run a query using the title Solr index and collect the results.
scala> val result = table.select("id","title").where("solr_query='title:Boroph*'").collect
Equivalent JSON query: where("""solr_query='{"q": "title:Boroph*"}'""")
result: Array[com.datastax.spark.connector.CassandraRow] = Array( CassandraRow{id: 23729958, title: Borophagus parvus}, CassandraRow{id: 23730195, title: Borophagus dudleyi}, CassandraRow{id: 23730528, title: Borophagus hilli}, CassandraRow{id: 23730810, title: Borophagus diversidens}, CassandraRow{id: 23730974, title: Borophagus littoralis}, CassandraRow{id: 23731282, title: Borophagus orc}, CassandraRow{id: 23731616, title: Borophagus pugnator}, CassandraRow{id: 23732450, title: Borophagus secundus})
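The command-line steps of this procedure (checking the node type, changing to the demo directory, adding the schema, and creating the search indexes) could be wrapped in a small shell script. The following is a minimal sketch, not part of DataStax Enterprise: the function names ring_has_search_analytics, wikipedia_demo_dir, and setup_wikipedia_demo are hypothetical, and the check assumes only that the word SearchAnalytics appears somewhere in the dsetool ring output.

```shell
#!/bin/sh
# Sketch only: all function names below are hypothetical helpers,
# not commands shipped with DataStax Enterprise.

# Succeeds if the ring output read from stdin mentions the SearchAnalytics node type.
ring_has_search_analytics() {
  grep -q 'SearchAnalytics'
}

# Prints the Wikipedia demo directory.
# $1 is the tarball installation location; leave it empty for package installations.
wikipedia_demo_dir() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1/demos/wikipedia"
  else
    printf '%s\n' "/usr/share/dse/demos/wikipedia"
  fi
}

# Checks the node type, then adds the schema and creates the search indexes.
# Note: this changes the current directory of the calling shell.
setup_wikipedia_demo() {
  if [ -n "$1" ]; then
    dsetool_cmd="$1/bin/dsetool"   # tarball installation
  else
    dsetool_cmd="dsetool"          # package installation, dsetool on PATH
  fi
  "$dsetool_cmd" ring | ring_has_search_analytics || {
    echo "No SearchAnalytics node found in dsetool ring output" >&2
    return 1
  }
  cd "$(wikipedia_demo_dir "$1")" || return 1
  ./1-add-schema.sh && ./2-index.sh
}
```

For a package installation, source the script and call setup_wikipedia_demo with no arguments; for a tarball installation, pass the installation location as the first argument. After it completes, start the Spark console with dse spark and run the queries as described above.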
What's next
For details on using search query syntax in CQL, see CQL queries.