Running the Wikipedia demo with SearchAnalytics

The Wikipedia Solr demo can be run on a SearchAnalytics node to retrieve Spark RDDs using search queries.

About this task

The following instructions describe how to use search queries in the Spark console on SearchAnalytics nodes using the Wikipedia demo.

Prerequisites

You must have created a new SearchAnalytics datacenter as described in the single datacenter deployment scenario.

Procedure

  1. Start the node or nodes in SearchAnalytics mode.

  2. Verify that the cluster is healthy by running dsetool ring. The node type of each node should be SearchAnalytics.

    Package installations: dsetool ring

    Tarball installations: installation_location/bin/dsetool ring

  3. In a terminal, go to the Wikipedia demo directory.

    The default Wikipedia demo location depends on the type of installation:

    • Package installations: /usr/share/dse/demos/wikipedia

    • Tarball installations: installation_location/demos/wikipedia

    For example, for a package installation:

      cd /usr/share/dse/demos/wikipedia
  4. Add the schema by running the 1-add-schema.sh script.

    ./1-add-schema.sh
  5. Create the search indexes.

    ./2-index.sh
  6. Start the Spark console.

    dse spark
  7. Create an RDD based on the wiki.solr table.

    scala> val table = sc.cassandraTable("wiki","solr")
    table: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15
  8. Run a query using the title Solr index and collect the results.

    scala> val result = table.select("id","title").where("solr_query='title:Boroph*'").collect
    result: Array[com.datastax.spark.connector.CassandraRow] = Array(
        CassandraRow{id: 23729958, title: Borophagus parvus},
        CassandraRow{id: 23730195, title: Borophagus dudleyi},
        CassandraRow{id: 23730528, title: Borophagus hilli},
        CassandraRow{id: 23730810, title: Borophagus diversidens},
        CassandraRow{id: 23730974, title: Borophagus littoralis},
        CassandraRow{id: 23731282, title: Borophagus orc},
        CassandraRow{id: 23731616, title: Borophagus pugnator},
        CassandraRow{id: 23732450, title: Borophagus secundus})

    Equivalent JSON query, with the inner double quotes escaped so that the Scala string literal is valid:

    where("solr_query='{\"q\": \"title:Boroph*\"}'")
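
The checks in steps 2 and 3 of the procedure can also be scripted. The following is a minimal sketch, not part of the demo: both function names are hypothetical, and the first function assumes only that dsetool ring prints one line per node that includes the text SearchAnalytics. It reads that output from standard input, so it makes no assumption about where dsetool is installed.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical, not part of DSE.

# Count the lines on standard input that mention the SearchAnalytics workload.
# Assumes `dsetool ring` prints one line per node, including its node type.
count_search_analytics_nodes() {
    # grep -c prints the match count but exits non-zero on zero matches,
    # so "|| true" keeps the function's exit status at zero.
    grep -c 'SearchAnalytics' || true
}

# Print the first argument that is an existing directory; fail if none is.
# Pass the candidate demo locations for your installation type.
find_demo_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    echo "Wikipedia demo directory not found" >&2
    return 1
}
```

For example, piping dsetool ring into count_search_analytics_nodes should print a count that matches the number of nodes started in step 1 if every node reports that type. Passing /usr/share/dse/demos/wikipedia and installation_location/demos/wikipedia to find_demo_dir prints whichever location exists, where installation_location stands for your own tarball directory.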

What’s next

For details on using search query syntax in CQL, see CQL queries.

