Indexing a column for different analysis
DSE Search supports indexing a CQL table column for different types of analysis using the Solr copyField
directive.
For a complete explanation, see the Solr Reference Guide Copying fields. |
When specified during search index creation, DSE automatically defines a new index string field and sets up the data copy. The new field is not stored in the database or returned in query results.
Restriction: Copying from/to the same dynamic field and setting the maximum number of characters (maxChars
) in the copyField definition are unsupported.
The following example uses copy fields to copy various CQL columns, such as a twitter name and email, to a multiValued field. You can then query the multiValued field using a term to search for all columns in a single query.
The DSE per-segment filter cache is moved off-heap by using native memory to reduce on-heap memory consumption and garbage collection overhead. |
The off-heap filter cache is enabled by default. To disable, pass the offheap JVM system property at startup time: -Dsolr.offheap.enable.
Procedure
-
Create a keyspace using the replication strategy and replication factor that makes sense for your environment. The following example is for a single node test cluster:
CREATE KEYSPACE user_info WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
-
Create a table:
CREATE TABLE user_info.users ( id text PRIMARY KEY, name text, email text, skype text, irc text, twitter text ) ;
-
Insert some data:
INSERT INTO user_info.users (id, name, email, skype, irc, twitter) VALUES ('user1', 'john smith', 'jsmith@abc.com', 'johnsmith', 'smitty', '@johnsmith'); INSERT INTO user_info.users (id, name, email, skype, irc, twitter) VALUES ('user2', 'elizabeth doe', 'lizzy@swbell.net', 'roadwarriorliz', 'elizdoe', '@edoe576'); INSERT INTO user_info.users (id, name, email, skype, irc, twitter) VALUES ('user3', 'dan graham', 'etnaboy1@aol.com', 'danielgra', 'dgraham', '@dannyboy'); INSERT INTO user_info.users (id, name, email, skype, irc, twitter) VALUES ('user4', 'john smith', 'jonsmit@fyc.com', 'johnsmith', 'jsmith345', '@johnrsmith'); INSERT INTO user_info.users (id, name, email, skype, irc, twitter) VALUES ('user5', 'john smith', 'jds@adeck.net', 'jdsmith', 'jdansmith', '@smithjd999'); INSERT INTO user_info.users (id, name, email, skype, irc, twitter) VALUES ('user6', 'dan graham', 'hacker@legalb.com', 'dangrah', 'dgraham', '@graham222');
-
Create a search index on the table:
CREATE SEARCH INDEX ON user_info.users;
-
Create a field that is only in the index that will contain all the data:
ALTER SEARCH INDEX SCHEMA ON user_info.users ADD fields.field[ @name='all', @type='StrField', @multiValued='true'];
-
Use
copyField
to copy the data from all the CQL columns into the newall
field of the index:ALTER SEARCH INDEX SCHEMA ON user_info.users ADD copyField[@source='id', @dest='all']; ALTER SEARCH INDEX SCHEMA ON user_info.users ADD copyField[@source='name', @dest='all']; ALTER SEARCH INDEX SCHEMA ON user_info.users ADD copyField[@source='email', @dest='all']; ALTER SEARCH INDEX SCHEMA ON user_info.users ADD copyField[@source='skype', @dest='all']; ALTER SEARCH INDEX SCHEMA ON user_info.users ADD copyField[@source='irc', @dest='all']; ALTER SEARCH INDEX SCHEMA ON user_info.users ADD copyField[@source='twitter', @dest='all'];
-
To allow faceting on the name column, set
docValues
totrue
:ALTER SEARCH INDEX SCHEMA ON user_info.users SET fields.field[@name='name']@docValues='true';
-
Reload the schema to make the pending changes active:
RELOAD SEARCH INDEX ON user_info.users;
-
Rebuild the index to apply the new schema to the existing data:
REBUILD SEARCH INDEX ON user_info.users;
-
Filter the query using the index to return all records that contain
smitty
in any of the columns.SELECT * FROM user_info.users WHERE solr_query = 'all:smitty';
The output is:
id | email | irc | name | skype | solr_query | twitter -------+----------------+--------+------------+-----------+------------+------------ user1 | jsmith@abc.com | smitty | john smith | johnsmith | null | @johnsmith (1 rows)
-
Get a count of unique names (skip nulls):
SELECT name FROM user_info.users WHERE solr_query= '{"q":"*","facet":{"field":"name","mincount":"1"}}';
At the bottom of the output, the facet results appear: 3 instances of john smith, 2 instances of dan graham, and 1 instance of elizabeth doe.
facet_fields ------------------------------------------------------------ {"name":{"john smith":3,"dan graham":2,"elizabeth doe":1}} (1 rows)