Using LowerCaseStrField with search indexes
DataStax Enterprise 5.1.15 introduces a custom field type, LowerCaseStrField, which provides the following features:
- Converts the data into lowercase and correctly stores the lowercase data in docValues.
- Converts the query values to lowercase.
You cannot apply |
DataStax advises against using TextField with solr.KeywordTokenizer and solr.LowerCaseFilterFactory.
Unintended search results could occur because the raw data was not stored as lowercase in docValues, contrary to expectations.
Instead, use the custom LowerCaseStrField type as described in this topic.
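The mismatch described above can be illustrated with a toy model. The following Python sketch is not DSE or Solr code. It assumes a simplified index in which each field keeps a set of indexed terms (used for matching) and a separate docValues list (used for sorting and faceting). All function names and sample data are hypothetical.

```python
from collections import Counter


def index_text_field(values):
    """Toy model of TextField with a keyword tokenizer and lowercase filter:
    indexed terms are lowercased, but docValues keep the raw input."""
    return {"terms": {v.lower() for v in values}, "doc_values": list(values)}


def index_lowercase_str_field(values):
    """Toy model of LowerCaseStrField: both the indexed terms and
    the docValues hold the lowercased value."""
    lowered = [v.lower() for v in values]
    return {"terms": set(lowered), "doc_values": lowered}


def facet_counts(index):
    """Count values from docValues, as a facet operation would in this model."""
    return dict(Counter(index["doc_values"]))


birthplaces = ["Chicago", "CHICAGO", "chicago"]  # hypothetical sample data

text_index = index_text_field(birthplaces)
lower_index = index_lowercase_str_field(birthplaces)

# Matching works in both models because the indexed terms are lowercase.
print("chicago" in text_index["terms"], "chicago" in lower_index["terms"])

# Faceting differs: raw docValues split one city into three buckets.
print(facet_counts(text_index))
print(facet_counts(lower_index))
```

In this model, matching behaves the same for both field types, while any operation that reads docValues sees three distinct values in the first case and one value in the second.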
For example, to use LowerCaseStrField on a field in a new index:
cqlsh -e "CREATE SEARCH INDEX ON healthcare.health_data WITH COLUMNS *, birthplace { lowerCase : true };"
The command creates a search index in which the birthplace field uses the LowerCaseStrField field type.
The field type is added to the schema automatically.
To view the elements in the generated schema XML, you can use a cqlsh or dsetool command.
Examples:
dsetool get_core_schema healthcare.health_data
cqlsh -e "DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;"
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
<fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
</types>
<fields>
<field indexed="true" multiValued="false" name="grade_completed" type="StrField"/>
...
<field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
<field docValues="true" indexed="true" multiValued="false" name="income_group" type="TrieIntField"/>
...
</fields>
<uniqueKey>(id,age)</uniqueKey>
</schema>
To add a new field to an existing index schema with the LowerCaseStrField field type, you can:
- Use the ALTER SEARCH INDEX SCHEMA command in cqlsh.
- Or display the current schema with dsetool get_core_schema, edit the XML manually, and use dsetool write_resource to update the schema by specifying your edited schema XML. Refer to dsetool get_core_schema and dsetool write_resource.
For example, in cqlsh, the following command adds the LowerCaseStrField field type to the new field medicalNotes if it does not exist:
ALTER SEARCH INDEX SCHEMA ON healthcare.health_data ADD lowerCaseString medicalNotes;
DESCRIBE PENDING SEARCH INDEX SCHEMA ON healthcare.health_data;
No matter which command you choose, using cqlsh or dsetool, be sure to |
RELOAD SEARCH INDEX ON healthcare.health_data;
REBUILD SEARCH INDEX ON healthcare.health_data;
DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
<fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
</types>
<fields>
...
<field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
...
<field docValues="true" indexed="true" multiValued="false" name="medicalNotes" type="LowerCaseStrField"/>
</fields>
<uniqueKey>(id,age)</uniqueKey>
</schema>
There is a workaround to apply |
Example:
ALTER SEARCH INDEX SCHEMA ON <table> ADD lowerCaseString key_column_copy;
ALTER SEARCH INDEX SCHEMA ON <table> ADD copyField[@source='key_column', @dest='key_column_copy'];
RELOAD SEARCH INDEX ON <table>;
REBUILD SEARCH INDEX ON <table>;
The search query is case insensitive. All query values are converted to lowercase, so queries that differ only in case return the same result. For example, searches for the following values return the same result:
- name
- Name
- NAME
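The case-insensitive behavior can be sketched as two lowercase conversions, one at index time and one at query time. The Python below is a minimal toy model, not DSE code. The function names, document ids, and sample rows are hypothetical.

```python
def index_value(value):
    """Index-time step in this model: store the lowercased value."""
    return value.lower()


def search(indexed_values, query):
    """Query-time step in this model: lowercase the query, then match exactly."""
    needle = query.lower()
    return [doc_id for doc_id, v in indexed_values.items() if v == needle]


# Hypothetical rows: document id -> original column value.
rows = {1: "Name", 2: "other", 3: "NAME"}
indexed = {doc_id: index_value(v) for doc_id, v in rows.items()}

# Each spelling of the query matches the same documents.
results = [search(indexed, q) for q in ("name", "Name", "NAME")]
print(results)
```

Because both sides are lowercased before comparison, the three query spellings select the same documents in this model.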