Term and phrase searches using the Wikipedia demo
The Wikipedia demo scripts automatically download 3,000+ Wikipedia articles, create a CQL keyspace and table, insert the articles, and create a search index on both the title and body columns.
Prerequisites
The demo scripts connect to localhost on the Solr port.
Ensure that the Solr interface and port (127.0.0.1:8983) are accessible.
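One way to confirm this prerequisite before running the scripts is to probe the port from the shell. The following is a minimal sketch that assumes bash, because it relies on the bash /dev/tcp redirection; the function name port_open is illustrative and is not part of the demo.

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of the demo scripts): succeeds when a TCP
# connection to HOST:PORT can be opened. Relies on bash's /dev/tcp feature.
port_open() {
  local host=$1 port=$2
  # Open the connection in a subshell so the descriptor is closed on exit.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if port_open 127.0.0.1 8983; then
  echo "Solr port 8983 is reachable"
else
  echo "Solr port 8983 is not reachable; check that the search node is running"
fi
```

An open port only shows that something is listening on 8983; it does not prove that the search index is ready.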
Procedure
-
Start DataStax Enterprise as a search node.
-
Go to <installation_directory>/demos/wikipedia.
-
Run the script to add the Wikipedia schema:
./1-add-schema.sh
This script creates the wiki keyspace with a single table, solr.
-
To use the demo in a cluster that has more than one node, change the keyspace replication from SimpleStrategy to NetworkTopologyStrategy, and set the replication factor to 1 in each datacenter:
cqlsh -e "ALTER KEYSPACE wiki WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'Cassandra' : 1, 'Solr' : 1};"
In this example, the cluster has two datacenters, Cassandra and Solr. Datacenter names are case-sensitive.
-
Load the data and index the table using the second script (2-index.sh):
./2-index.sh --wikifile wikipedia-sample.bz2
More than 3,000 articles are loaded into the solr table and then indexed.
Start indexing wikipedia...
------------> config properties:
docs.file = wikipedia-sample.bz2
keep.image.only.docs = false
-------------------------------
Indexed 1000
Indexed 2000
Indexed 3000
Finished
Visit http://localhost:8983/demos/wikipedia/ to see data
-
Verify that the data was successfully loaded into the keyspace/table:
cqlsh -e 'DESC KEYSPACE wiki; SELECT count(*) FROM wiki.solr;'
The results show the details of the keyspace, table schema, index settings, and number of articles.
CREATE KEYSPACE wiki WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

CREATE TABLE wiki.solr (
    id text PRIMARY KEY,
    body text,
    date text,
    solr_query text,
    title text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND speculative_retry = '99PERCENTILE';

CREATE CUSTOM INDEX wiki_solr_solr_query_index ON wiki.solr (solr_query) USING 'com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex';

 count
-------
  3579

(1 rows)

Warnings :
Aggregation query used without partition key
-
Start cqlsh using the wiki keyspace:
cqlsh -k wiki
A CQL shell session starts on the localhost in the wiki keyspace.
Connected to pw-search at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.0.1805 | DSE 5.1.3 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh:wiki>
-
Disable paging for faster query results on small data sets:
PAGING off
Paging is turned off only for the current session and is enabled again when cqlsh restarts. To change the default startup parameters for cqlsh, use a cqlshrc file.
Disabled Query paging.
-
Display the solr table search index schema:
DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON solr;
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.TextField" name="TextField">
      <analyzer>
        <tokenizer class="solr.WikipediaTokenizerFactory"/>
      </analyzer>
    </fieldType>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
  </types>
  <fields>
    <field indexed="true" multiValued="false" name="body" stored="true" type="TextField"/>
    <field indexed="true" multiValued="false" name="title" stored="true" type="TextField"/>
    <field docValues="true" indexed="true" multiValued="false" name="id" stored="true" type="StrField"/>
    <field docValues="true" indexed="true" multiValued="false" name="date" stored="true" type="StrField"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
-
Execute queries against the table using the index:
-
Return the titles of articles that contain the word national:
SELECT title FROM solr WHERE solr_query='title:national';
Seven records are returned.
 title
--------------------------------------------------------------------------
 Bolivia national football team 1999
 Bolivia national football team 2000
 Kenya national under-20 football team
 Bolivia national football team 2001
 Bolivia national football team 2002
 Israel men's national inline hockey team
 List of French born footballers who have played for other national teams

(7 rows)
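The query above is a term search: it matches a single word in the title field. A phrase search uses the same solr_query column but wraps the words in double quotes inside the single-quoted CQL string. The sketch below only prints the two statement forms so the quoting is easy to see. The helper names term_query and phrase_query are hypothetical, the phrase "national football team" is chosen purely for illustration, and the results depend on the data that was indexed.

```shell
# Illustrative helpers (names are hypothetical): print a CQL statement that
# searches FIELD of the solr table through the solr_query column.
term_query() {
  # $1 = field, $2 = single term
  printf "SELECT title FROM solr WHERE solr_query='%s:%s';\n" "$1" "$2"
}

phrase_query() {
  # $1 = field, $2 = phrase; the phrase is wrapped in double quotes
  printf "SELECT title FROM solr WHERE solr_query='%s:\"%s\"';\n" "$1" "$2"
}

term_query title national
phrase_query title "national football team"

# To run a statement against the demo keyspace from the shell:
#   cqlsh -k wiki -e "$(phrase_query title 'national football team')"
```

The helpers do no escaping, so a term or phrase that contains quotes or Solr special characters needs to be escaped by hand.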
- Using secure cluster
Information about running the Wikipedia demo on a secure cluster.
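For the multi-node step of the procedure, the ALTER KEYSPACE statement can be assembled from a list of datacenter names instead of being typed by hand, which keeps the nested single quotes intact. This is a minimal sketch: the function name build_alter_keyspace is hypothetical, and the datacenter names Cassandra and Solr are the ones used in the example above. Substitute the names reported for your own cluster, for instance by nodetool status, and remember that they are case-sensitive.

```shell
# Illustrative helper (hypothetical name): print an ALTER KEYSPACE statement
# that applies the same replication factor to every listed datacenter.
# Usage: build_alter_keyspace KEYSPACE RF DC [DC...]
build_alter_keyspace() {
  keyspace=$1
  rf=$2
  shift 2
  map="'class': 'NetworkTopologyStrategy'"
  for dc in "$@"; do
    # Append one 'datacenter': factor entry per datacenter name.
    map="$map, '$dc': $rf"
  done
  printf "ALTER KEYSPACE %s WITH REPLICATION = {%s};\n" "$keyspace" "$map"
}

stmt=$(build_alter_keyspace wiki 1 Cassandra Solr)
echo "$stmt"

# To apply the statement, pass it to cqlsh in double quotes so that the
# single quotes inside the replication map reach CQL unchanged:
#   cqlsh -e "$stmt"
```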