Using LowerCaseStrField with search indexes
This topic describes the custom LowerCaseStrField field type, which provides the following features:
- Converts the data to lowercase and correctly stores the lowercase data in docValues.
- Converts the query values to lowercase.
You cannot apply |
DataStax advises against using TextField with solr.KeywordTokenizer and solr.LowerCaseFilterFactory. That combination can return unintended search results because the raw data is not stored as lowercase in docValues, contrary to expectations. Instead, use the custom LowerCaseStrField type as described in this topic.
For example, to use LowerCaseStrField on a field in a new index:
cqlsh -e "CREATE SEARCH INDEX ON healthcare.health_data WITH COLUMNS *, birthplace { lowerCase : true };"
The command creates a search index with birthplace using the LowerCaseStrField field type. The field type is added automatically.
To view the elements in the generated schema XML, you can use a cqlsh or dsetool command.
Examples:
dsetool get_core_schema healthcare.health_data
cqlsh -e "DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;"
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
    <fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
    <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
  </types>
  <fields>
    <field indexed="true" multiValued="false" name="grade_completed" type="StrField"/>
    ...
    <field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
    <field docValues="true" indexed="true" multiValued="false" name="income_group" type="TrieIntField"/>
    ...
  </fields>
  <uniqueKey>(id,age)</uniqueKey>
</schema>
To add a new field with the LowerCaseStrField field type to an existing index schema, you can either:
- Use the ALTER SEARCH INDEX SCHEMA command in cqlsh.
- Display the current schema with dsetool get_core_schema, edit the XML manually, and then use dsetool write_resource to update the schema by specifying your edited schema XML. Refer to dsetool get_core_schema and dsetool write_resource.
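The manual dsetool route can be sketched as a short shell script. This is a minimal sketch under stated assumptions, not a definitive procedure: the file names schema.xml and schema_edited.xml and the helper add_lowercase_field are hypothetical, the name= and file= arguments to dsetool write_resource are an assumed form that should be checked against the dsetool reference for your version, and the script assumes the LowerCaseStrField field type is already declared in the types section of the schema.

```shell
#!/bin/sh
# Hypothetical helper: copy a schema file, inserting a LowerCaseStrField
# field definition just before the closing </fields> tag.
# Usage: add_lowercase_field FIELD_NAME INPUT_XML OUTPUT_XML
add_lowercase_field() {
  field_name=$1
  input_xml=$2
  output_xml=$3
  new_field="<field docValues=\"true\" indexed=\"true\" multiValued=\"false\" name=\"${field_name}\" type=\"LowerCaseStrField\"/>"
  awk -v line="$new_field" '
    /<\/fields>/ { print line }   # emit the new field before the closing tag
    { print }                     # copy every original line unchanged
  ' "$input_xml" > "$output_xml"
}

# Only attempt the cluster steps when dsetool is available on this host.
if command -v dsetool >/dev/null 2>&1; then
  # Save the current schema to a local file (hypothetical file name).
  dsetool get_core_schema healthcare.health_data > schema.xml
  add_lowercase_field medicalNotes schema.xml schema_edited.xml
  # Assumed argument form; verify against the dsetool write_resource reference.
  dsetool write_resource healthcare.health_data name=schema.xml file=schema_edited.xml
fi
```

Writing the edited schema to a separate file keeps the original available for comparison before anything is uploaded.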
For example, in cqlsh, the following command adds the LowerCaseStrField field type to the new field medicalNotes if it does not exist:
ALTER SEARCH INDEX SCHEMA ON healthcare.health_data ADD lowerCaseString medicalNotes;
DESCRIBE PENDING SEARCH INDEX SCHEMA ON healthcare.health_data;
Remember: Whether you use cqlsh or dsetool, be sure to RELOAD and REBUILD the search index in each datacenter in the cluster.
RELOAD SEARCH INDEX ON healthcare.health_data;
REBUILD SEARCH INDEX ON healthcare.health_data;
DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
    <fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField" name="LowerCaseStrField"/>
    <fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
  </types>
  <fields>
    ...
    <field docValues="true" indexed="true" multiValued="false" name="birthplace" type="LowerCaseStrField"/>
    ...
    <field docValues="true" indexed="true" multiValued="false" name="medicalNotes" type="LowerCaseStrField"/>
  </fields>
  <uniqueKey>(id,age)</uniqueKey>
</schema>
There is a workaround to apply
The search query is case-insensitive. Query values are converted to lowercase, so queries that differ only in letter case return the same result. For example, searches for the following values return the same result:
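As a rough illustration of the case-insensitive behavior, the sketch below models the lowercase conversion with tr and, when cqlsh is available, runs one query per spelling. The spellings of the birthplace value are hypothetical (they are assumptions, not values from this topic), the helper to_lower only models the conversion and is not how the field type is implemented, and the solr_query form shown assumes the birthplace field is indexed as LowerCaseStrField.

```shell
#!/bin/sh
# Model of the normalization only: lowercase a value, as the field type is
# described as doing for both stored data and query values.
to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical spellings of one birthplace value.
for value in Chicago CHICAGO chiCAGO; do
  term=$(to_lower "$value")
  echo "query value '${value}' is matched as '${term}'"
  # Run the real query only when cqlsh is available on this host.
  if command -v cqlsh >/dev/null 2>&1; then
    cqlsh -e "SELECT * FROM healthcare.health_data WHERE solr_query = 'birthplace:${value}';"
  fi
done
```

Because every spelling reduces to the same lowercase term, each of the three queries would be expected to return the same rows.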