Running the Wikipedia demo using DSE Search

Run the Wikipedia demo using DSE Search on a single node to download Wikipedia articles, create a CQL table, store the articles, and index the articles in Solr.

The following instructions describe how to run the Wikipedia demo on a single node. You run scripts that download 3,000 Wikipedia articles, create a CQL table, store the articles, and index the articles in Solr. The demo includes a web interface for querying the articles. You can also use the Solr HTTP API or CQL to query the articles.

The scripts in this demo are written to run against localhost, and they fail if the default interface of the node is not 127.0.0.1.
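Because the demo runs against localhost, a query through the Solr HTTP API mentioned above can be expressed as a plain URL. The following Python sketch builds such a URL. It assumes that the core is named wiki.solr (the name used in the schema upload URLs in the procedure) and that the standard Solr select handler is exposed under http://localhost:8983/solr/. The helper name solr_select_url and the wt and rows parameter choices are illustrative and are not part of the demo scripts. The sketch only constructs the URL; it does not contact the server.

```python
from urllib.parse import urlencode

# Assumption: Solr is reachable on the demo's default host and port.
SOLR_BASE = "http://localhost:8983/solr"


def solr_select_url(core, query, rows=10):
    """Build a select URL for a Solr core (hypothetical helper, not part of the demo)."""
    # urlencode percent-encodes characters such as ':' and '*' in the query string.
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"{SOLR_BASE}/{core}/select?{params}"


# Assumption: the core created by the demo is named wiki.solr.
url = solr_select_url("wiki.solr", "title:natio*")
print(url)
```

If these assumptions hold, fetching the printed URL with a browser or any HTTP client after the articles are indexed should return the matching documents as JSON.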

Procedure

  1. Start DataStax Enterprise as a Solr node if you haven't already done so.
  2. In cqlsh, paging is on by default. Queries with small result sets perform better when paging is turned off, so use the CQL PAGING command to disable it:
    PAGING OFF
  3. Go to the Wikipedia demo directory.
    • Installer-Services and Package installations: $ cd /usr/share/dse/demos/wikipedia
    • Installer-No Services and Tarball installations: $ cd install_location/demos/wikipedia
  4. Upload the schema by running the add schema script. On Linux, for example:
    ./1-add-schema.sh
    The script posts solrconfig.xml and schema.xml to these locations:
    • http://localhost:8983/solr/resource/wiki.solr/solrconfig.xml
    • http://localhost:8983/solr/resource/wiki.solr/schema.xml

    The script also creates the Solr index and core. The wiki.solr part of the URL creates the keyspace (wiki) and the column family (solr) in Cassandra.

  5. Index the articles contained in the wikipedia-sample.bz2 file in the demo directory by running the index script.
    ./2-index.sh --wikifile wikipedia-sample.bz2
    Three thousand articles load.
  6. Open the Solr Admin tool at the following URL.
    Be sure to include the trailing "/".
    http://localhost:8983/solr/

  7. Inspect the schema. In the Solr Admin, select wiki.solr from the Core Selector drop-down. Click Schema in the vertical navigation bar.

    You can use the Solr Admin to query the Wikipedia database in Cassandra. You can also use the Solr HTTP API or cqlsh to query the database.
  8. Start cqlsh, and use the wiki keyspace. Execute a CQL SELECT statement with the solr_query expression to find the titles in the table named solr that contain a word beginning with natio:
    USE wiki;
    
    SELECT title FROM solr WHERE solr_query='title:natio*';
    Output similar to the following appears:
     title
    --------------------------------------------------------------------------
                                         Kenya national under-20 football team
                                           Bolivia national football team 2002
                                      Israel men's national inline hockey team
      List of French born footballers who have played for other national teams
                                           Bolivia national football team 1999
                                           Bolivia national football team 2001
                                           Bolivia national football team 2000

    In CQL, enclose the Solr query string in single quotation marks. For example, after running the demo, you can use these Solr query strings:

    Type of Query     Example                                     Description
    ----------------  ------------------------------------------  -----------
    Field search      'title:natio* AND Kenya'                    You can use multiple fields defined in the schema: 'title:natio* AND body:CarlosAragonés'
    Wildcard search   'Ken?a'                                     Use ? for a single-character wildcard or * for a multi-character wildcard.
    Fuzzy search      'Kenya~'                                    Use with caution; many hits can occur.
    Phrase search     '"American football player"'                Searches for the phrase enclosed in double quotation marks.
    Proximity search  '"football Bolivia"~10'                     Searches for football and Bolivia within 10 words of each other.
    Range search      'title:[football TO soccer}'                Supports inclusive bounds (square brackets) and exclusive bounds (curly braces).
    Term boosting     '"football"^4 "soccer"'                     The boost factor must be a positive number. The default is 1.
    Boolean operator  '+Macedonian football'                      You can use AND, +, OR, NOT, and -.
    Grouping          '(football OR soccer) AND Carlos Aragonés'  Use parentheses to group clauses.
    Field grouping    'title:(+football +"Bolivia")'              Use parentheses to group multiple clauses into one field.
  9. To see the sample Wikipedia search UI, open your web browser and go to this URL:
    http://localhost:8983/demos/wikipedia

  10. To search in the bodies of the articles, enter a word in the Search field, and press Enter.
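The solr_query statements shown in step 8 can also be generated programmatically. The following Python sketch is one way to do that. The helper name solr_query_cql is hypothetical and is not part of DataStax Enterprise or the demo. It only builds the statement text and doubles any single quotation mark inside the Solr query string, which is how CQL escapes a quote within a string literal. It does not validate the table or column names, so it is not a safe way to build statements from untrusted input.

```python
def solr_query_cql(table, column, solr_query):
    """Return a CQL SELECT statement that uses solr_query (hypothetical helper)."""
    # CQL string literals use single quotes; an embedded quote is written as ''.
    escaped = solr_query.replace("'", "''")
    return f"SELECT {column} FROM {table} WHERE solr_query='{escaped}';"


# The query from step 8 of the procedure.
print(solr_query_cql("solr", "title", "title:natio*"))

# A phrase search whose text contains an apostrophe.
print(solr_query_cql("solr", "title", "title:\"men's national\""))
```

The printed statements can be pasted into cqlsh after running USE wiki; as described in step 8.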