Using LowerCaseStrField with search indexes
DataStax Enterprise 5.1.15 introduces a custom field type, LowerCaseStrField, which provides the following features:
- Converts the data into lowercase and correctly stores the lowercase data in docValues.
- Converts the query values to lowercase.
|
You cannot apply |
DataStax advises against using TextField with solr.KeywordTokenizer and solr.LowerCaseFilterFactory.
With that configuration, the raw data is not stored as lowercase in docValues, contrary to expectations, so searches can return unintended results.
Instead, use the custom LowerCaseStrField type as described in this topic.
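The difference described above can be illustrated with a toy model. This is only a sketch, not how DSE Search is implemented: the two functions and the facet-style count are hypothetical, and the assumption is that an analyzer chain lowercases only the indexed terms while docValues keep the raw input, whereas LowerCaseStrField lowercases both.

```python
from collections import Counter


def index_textfield_with_lowercase_filter(values):
    """Toy model (assumption): indexed terms are lowercased, docValues keep raw input."""
    indexed_terms = [v.lower() for v in values]
    doc_values = list(values)  # case preserved
    return indexed_terms, doc_values


def index_lowercase_strfield(values):
    """Toy model (assumption): indexed terms and docValues are both lowercased."""
    lowered = [v.lower() for v in values]
    return lowered, list(lowered)


birthplaces = ["Paris", "paris", "PARIS", "Lyon"]

_, raw_doc_values = index_textfield_with_lowercase_filter(birthplaces)
_, lower_doc_values = index_lowercase_strfield(birthplaces)

# A facet-style count reads docValues, so mixed-case raw values split into
# separate buckets, while lowercased values group together.
text_counts = Counter(raw_doc_values)
lower_counts = Counter(lower_doc_values)

print(dict(text_counts))
print(dict(lower_counts))
```

In this model the raw docValues produce four buckets, while the lowercased docValues produce two, with all spellings of "paris" counted together.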
For example, to use LowerCaseStrField on a field in a new index:
cqlsh -e "CREATE SEARCH INDEX ON healthcare.health_data WITH COLUMNS *, birthplace { lowerCase : true };"
The command creates a search index with birthplace using the LowerCaseStrField field type.
The field type is added automatically.
To view the elements in the generated schema XML, you can use a cqlsh or dsetool command.
Examples:
dsetool get_core_schema healthcare.health_data
cqlsh -e "DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;"
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
<fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
</types>
<fields>
<field indexed="true" multiValued="false" name="grade_completed" type="StrField"/>
...
<field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
<field docValues="true" indexed="true" multiValued="false" name="income_group" type="TrieIntField"/>
...
</fields>
<uniqueKey>(id,age)</uniqueKey>
</schema>
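To confirm which fields use the new type, the schema output can also be checked programmatically. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module; the helper name fields_of_type and the trimmed sample schema are assumptions for illustration, not part of DSE.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the schema shown above (elided fields omitted).
SCHEMA_XML = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
<fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
</types>
<fields>
<field indexed="true" multiValued="false" name="grade_completed" type="StrField"/>
<field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
<field docValues="true" indexed="true" multiValued="false" name="income_group" type="TrieIntField"/>
</fields>
<uniqueKey>(id,age)</uniqueKey>
</schema>"""


def fields_of_type(schema_xml: str, type_name: str) -> list:
    """Return the names of all <field> elements whose type attribute matches."""
    # Parse from bytes so the encoding declaration in the prolog is honored.
    root = ET.fromstring(schema_xml.encode("utf-8"))
    return [f.get("name") for f in root.iter("field") if f.get("type") == type_name]


lowercase_fields = fields_of_type(SCHEMA_XML, "LowerCaseStrField")
print(lowercase_fields)  # prints ['birthplace']
```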
To add a new field to an existing index schema with the LowerCaseStrField field type, you can:
- Use the ALTER SEARCH INDEX SCHEMA command in cqlsh.
- Or display the current schema with dsetool get_core_schema, edit the XML manually, and use dsetool write_resource to update the schema by specifying your edited schema XML. Refer to dsetool get_core_schema and dsetool write_resource.
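For the second option, the manual XML edit could be scripted. The sketch below is one possible approach using Python's standard xml.etree.ElementTree module. The helper add_lowercase_field and the sample schema are hypothetical, and the field attributes simply mirror those in the generated schema shown earlier; whether your index needs the same attributes is an assumption to verify. Retrieving the schema and uploading the result with dsetool are not shown.

```python
import xml.etree.ElementTree as ET

LOWERCASE_CLASS = "com.datastax.bdp.search.solr.core.types.LowerCaseStrField"

# Small sample standing in for the output of dsetool get_core_schema.
SAMPLE_SCHEMA = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
</types>
<fields>
<field indexed="true" multiValued="false" name="grade_completed" type="StrField"/>
</fields>
<uniqueKey>(id,age)</uniqueKey>
</schema>"""


def add_lowercase_field(schema_xml: str, field_name: str) -> str:
    """Return schema XML with field_name added as a LowerCaseStrField field."""
    root = ET.fromstring(schema_xml.encode("utf-8"))
    types = root.find("types")
    fields = root.find("fields")
    if types is None or fields is None:
        raise ValueError("schema must contain <types> and <fields> elements")

    # Add the field type only if the schema does not already declare it.
    if not any(t.get("name") == "LowerCaseStrField" for t in types.findall("fieldType")):
        ET.SubElement(types, "fieldType",
                      {"class": LOWERCASE_CLASS, "name": "LowerCaseStrField"})

    # Add the field only if no field with that name exists yet.
    if not any(f.get("name") == field_name for f in fields.findall("field")):
        ET.SubElement(fields, "field",
                      {"docValues": "true", "indexed": "true",
                       "multiValued": "false", "name": field_name,
                       "type": "LowerCaseStrField"})

    return ET.tostring(root, encoding="unicode")


updated_schema = add_lowercase_field(SAMPLE_SCHEMA, "medicalNotes")
print(updated_schema)
```

The function leaves existing entries untouched, so running it twice with the same field name produces the same schema.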
For example, in cqlsh, the following command adds the LowerCaseStrField field type to the new field medicalNotes if it does not exist:
ALTER SEARCH INDEX SCHEMA ON healthcare.health_data ADD lowerCaseString medicalNotes;
DESCRIBE PENDING SEARCH INDEX SCHEMA ON healthcare.health_data;
No matter which command you choose, using cqlsh or dsetool, be sure to reload and rebuild the search index:
RELOAD SEARCH INDEX ON healthcare.health_data;
REBUILD SEARCH INDEX ON healthcare.health_data;
DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
<fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
</types>
<fields>
...
<field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
...
<field docValues="true" indexed="true" multiValued="false" name="medicalNotes" type="LowerCaseStrField"/>
</fields>
<uniqueKey>(id,age)</uniqueKey>
</schema>
There is a workaround that uses a copy field: add a new LowerCaseStrField field, then copy the source column into it with copyField.
Example:
ALTER SEARCH INDEX SCHEMA ON <table> ADD lowerCaseString key_column_copy;
ALTER SEARCH INDEX SCHEMA ON <table> ADD copyField[@source='key_column', @dest='key_column_copy'];
RELOAD SEARCH INDEX ON <table>;
REBUILD SEARCH INDEX ON <table>;
Searches on a LowerCaseStrField field are case insensitive. Query values are converted to lowercase, so searches for the following values return the same result:
- name
- Name
- NAME
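The query-side behavior can be sketched in a few lines. This is a toy lookup, not DSE code: the normalize, add, and search helpers are hypothetical, and Python's str.lower() stands in for whatever lowercasing the field type applies.

```python
def normalize(value: str) -> str:
    """Stand-in (assumption) for the lowercasing applied to data and queries."""
    return value.lower()


# Toy index: normalized value -> list of row ids.
index = {}


def add(row_id: str, value: str) -> None:
    """Store a row under the lowercased form of its value."""
    index.setdefault(normalize(value), []).append(row_id)


def search(query: str) -> list:
    """Lowercase the query value before looking it up."""
    return index.get(normalize(query), [])


add("row-1", "Name")
add("row-2", "NAME")
add("row-3", "Other")

results = [search(q) for q in ("name", "Name", "NAME")]
print(results)
```

Because both the stored values and the query values pass through the same normalization, all three queries return the same rows.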