Running the Wikipedia demo with SearchAnalytics

The following instructions describe how to run search queries from the Spark console on SearchAnalytics nodes, using the Wikipedia demo as the data set.

Before you begin, create a new SearchAnalytics datacenter as described in the single datacenter deployment scenario.

  1. Start the node or nodes in SearchAnalytics mode.

  2. Ensure that the cluster is running correctly by running dsetool ring. The node type should be SearchAnalytics.

    Package and Installer-Services installations: dsetool ring

    Tarball and Installer-No Services installations: installation_location/bin/dsetool ring

  3. In a terminal, go to the Wikipedia demo directory.

    The default Wikipedia demo location depends on the type of installation:

    • Package installations and Installer-Services: /usr/share/dse/demos/wikipedia

    • Tarball installations and Installer-No Services: installation_location/demos/wikipedia

    For example, on a package installation:

      $ cd /usr/share/dse/demos/wikipedia
  4. Add the schema by running the 1-add-schema.sh script.

    $ ./1-add-schema.sh
  5. Create the search indexes.

    $ ./2-index.sh
  6. Start the Spark console.

    $ dse spark
  7. Create an RDD based on the wiki.solr table.

    scala> val table = sc.cassandraTable("wiki","solr")
    table: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15
  8. Run a query using the title Solr index and collect the results.

    scala> val result = table.select("id","title").where("solr_query='title:Boroph*'").collect

    Equivalent JSON query (the inner double quotes must be escaped inside the Scala string):

    where("solr_query='{\"q\": \"title:Boroph*\"}'")

    Either form returns:

    result: Array[com.datastax.spark.connector.CassandraRow] = Array(
            CassandraRow{id: 23729958, title: Borophagus parvus},
            CassandraRow{id: 23730195, title: Borophagus dudleyi},
            CassandraRow{id: 23730528, title: Borophagus hilli},
            CassandraRow{id: 23730810, title: Borophagus diversidens},
            CassandraRow{id: 23730974, title: Borophagus littoralis},
            CassandraRow{id: 23731282, title: Borophagus orc},
            CassandraRow{id: 23731616, title: Borophagus pugnator},
            CassandraRow{id: 23732450, title: Borophagus secundus})
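
Steps 2 through 5 can be scripted. The following is a minimal shell sketch, not part of the demo itself. The function names ring_has_search_analytics and wikipedia_demo_dir are hypothetical, and the check assumes only that the workload name SearchAnalytics appears literally somewhere in the dsetool ring output.

```shell
#!/bin/sh

# Hypothetical helper: succeeds if the text on stdin mentions a
# SearchAnalytics node. Assumption: the workload name appears literally
# in the output of `dsetool ring`; no other output format is assumed.
ring_has_search_analytics() {
    grep -q 'SearchAnalytics'
}

# Hypothetical helper: prints the default Wikipedia demo directory.
#   $1  installation type: "package" or "tarball"
#   $2  installation_location (tarball installations only)
wikipedia_demo_dir() {
    case "$1" in
        package) printf '%s\n' "/usr/share/dse/demos/wikipedia" ;;
        tarball) printf '%s\n' "$2/demos/wikipedia" ;;
        *)       return 1 ;;
    esac
}
```

On a package installation, a guarded run of the setup steps might then look like this: dsetool ring | ring_has_search_analytics && cd "$(wikipedia_demo_dir package)" && ./1-add-schema.sh && ./2-index.sh. The schema and index scripts run only if the ring check succeeds.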

For details on using search query syntax in CQL, see Search index syntax.
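
The same search can also be issued as plain CQL outside Spark. The sketch below builds the statement as a string; the function name solr_cql is hypothetical, and it assumes the wiki.solr table and title index created in the steps above. It does not escape single quotes, so it is only suitable for simple expressions such as the one used in this demo.

```shell
#!/bin/sh

# Hypothetical helper: prints a CQL statement that searches wiki.solr
# with the given Solr query expression.
#   $1  Solr query expression, for example title:Boroph*
# Note: a single quote inside $1 would break the statement; this sketch
# does not attempt to escape it.
solr_cql() {
    printf "SELECT id, title FROM wiki.solr WHERE solr_query='%s';\n" "$1"
}
```

With a running node, the statement can be passed to cqlsh, for example: cqlsh -e "$(solr_cql 'title:Boroph*')".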

© 2024 DataStax