Indexing a column for different analysis

DSE Search supports indexing a CQL table column for different types of analysis using the Solr copyField directive.

For a complete explanation, see the Solr Reference Guide Copying fields.

When specified during search index creation, DSE automatically defines a new index string field and sets up the data copy. The new field is not stored in the database or returned in query results.

Copying from/to the same dynamic field and setting the maximum number of characters (maxChars) in the copyField definition are unsupported.

The following example uses copy fields to copy various CQL columns, such as a twitter name and email, to a multiValued field. You can then query the multiValued field using a term to search for all columns in a single query.

Procedure

Create a keyspace using the replication strategy and replication factor that makes sense for your environment. The following example is for a single node test cluster:
```
CREATE KEYSPACE user_info
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
```

Create a table:

CREATE TABLE user_info.users (
  id text PRIMARY KEY,
  name text,
  email text,
  skype text,
  irc text,
  twitter text
) ;

Insert some data:

INSERT INTO user_info.users (id, name, email, skype, irc, twitter) VALUES
  ('user1', 'john smith', 'jsmith@abc.com', 'johnsmith', 'smitty', '@johnsmith');

INSERT INTO user_info.users (id, name, email, skype, irc, twitter) VALUES
  ('user2', 'elizabeth doe', 'lizzy@swbell.net', 'roadwarriorliz', 'elizdoe',  '@edoe576');

INSERT INTO user_info.users (id, name, email, skype, irc, twitter) VALUES
  ('user3', 'dan graham', 'etnaboy1@aol.com', 'danielgra', 'dgraham', '@dannyboy');

INSERT INTO user_info.users (id, name, email, skype, irc, twitter) VALUES
  ('user4', 'john smith', 'jonsmit@fyc.com', 'johnsmith', 'jsmith345', '@johnrsmith');

INSERT INTO user_info.users (id, name, email, skype, irc, twitter) VALUES
  ('user5', 'john smith', 'jds@adeck.net', 'jdsmith', 'jdansmith',  '@smithjd999');

INSERT INTO user_info.users (id, name, email, skype, irc, twitter) VALUES
  ('user6', 'dan graham', 'hacker@legalb.com', 'dangrah', 'dgraham', '@graham222');

Create a search index on the table:
```
CREATE SEARCH INDEX ON user_info.users;
```

Create a field that is only in the index that contains all the data:

ALTER SEARCH INDEX SCHEMA ON user_info.users
ADD fields.field[ @name='all',
                  @type='StrField',
                  @multiValued='true'];

Use copyField to copy the data from all the CQL columns into the new all field of the index:

ALTER SEARCH INDEX SCHEMA ON user_info.users
   ADD copyField[@source='id', @dest='all'];
ALTER SEARCH INDEX SCHEMA ON user_info.users
   ADD copyField[@source='name', @dest='all'];
ALTER SEARCH INDEX SCHEMA ON user_info.users
  ADD copyField[@source='email', @dest='all'];
ALTER SEARCH INDEX SCHEMA ON user_info.users
  ADD copyField[@source='skype', @dest='all'];
ALTER SEARCH INDEX SCHEMA ON user_info.users
  ADD copyField[@source='irc', @dest='all'];
ALTER SEARCH INDEX SCHEMA ON user_info.users
  ADD copyField[@source='twitter', @dest='all'];

To allow faceting on the name column, set docValues to true:

ALTER SEARCH INDEX SCHEMA ON user_info.users
  SET fields.field[@name='name']@docValues='true';

Reload the schema to make the pending changes active:
```
RELOAD SEARCH INDEX ON user_info.users;
```
Rebuild the index to apply the new schema to the existing data:
```
REBUILD SEARCH INDEX ON user_info.users;
```

Filter the query using the index to return all records that contain smitty in any of the columns.

SELECT * FROM user_info.users WHERE solr_query = 'all:smitty';

The output is:

 id    | email          | irc    | name       | skype     | solr_query | twitter
-------+----------------+--------+------------+-----------+------------+------------
 user1 | jsmith@abc.com | smitty | john smith | johnsmith |       null | @johnsmith

(1 rows)

Get a count of unique names (skip nulls):

SELECT name FROM user_info.users
   WHERE solr_query= '{"q":"*","facet":{"field":"name","mincount":"1"}}';

At the bottom of the output, the facet results appear: 3 instances of john smith, 2 instances of dan graham, and 1 instance of elizabeth doe:

 facet_fields
------------------------------------------------------------
 {"name":{"john smith":3,"dan graham":2,"elizabeth doe":1}}

(1 rows)

Indexing a column for different analysis

Procedure

Was this helpful?

Give Feedback